# Basic Example

**Demonstrating current iteration of tablevault: command line version**



# Setup

Installing package:

- Download the package from the GitHub repository and build it
- Add your OpenAI API key to the expected location (the examples below read it from `./open_ai_key/key.txt`)

# General Functionalities

- Every Table Modification is Logged
- Multi-Process Safe -> Locking supports Single Writes and Multiple Reads
- Restart Safe (as long as an operation is logged, it will execute as intended after a restart)
- Robust Versioning
- Granular (Manual) History Tracking
- Inter-Table Linking (Similar to Foreign Keys)
- Allow both Row and Column Table Updates
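
The single-writer, multiple-reader behaviour can be illustrated with advisory file locks. This is only a sketch and not TableVault's implementation: it assumes a POSIX system, uses Python's `fcntl.flock`, and the lock-file name and the helpers `table_lock` and `demo_locking` are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def table_lock(lock_path, write=False, blocking=True):
    """Hold a shared (read) or exclusive (write) advisory lock on lock_path."""
    mode = fcntl.LOCK_EX if write else fcntl.LOCK_SH
    if not blocking:
        mode |= fcntl.LOCK_NB  # raise BlockingIOError instead of waiting
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, mode)
        try:
            yield handle
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def demo_locking():
    """Return (two_readers_ok, writer_blocked) for a throwaway lock file."""
    lock_path = os.path.join(tempfile.mkdtemp(), "stories.lock")  # hypothetical name
    two_readers_ok = False
    writer_blocked = False
    with table_lock(lock_path):                      # first reader
        with table_lock(lock_path, blocking=False):  # second reader is admitted
            two_readers_ok = True
        try:
            with table_lock(lock_path, write=True, blocking=False):
                pass
        except BlockingIOError:
            writer_blocked = True                    # a writer must wait for readers
    return two_readers_ok, writer_blocked
```

Several readers can hold the shared lock at once, while a writer only gets the exclusive lock when no other holder remains.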

# Initial/External Data
- YAML Files
- Data Source (PDF): an OpenAI-generated short story
- OpenAI API Key

## Example YAML Files

In [22]:
import yaml
# Table Generation File
rel_path = './test_data/test_data_db/stories/fetch_stories.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'code',
 'dependencies': [],
 'changed_columns': ['paper_name', 'paper_path'],
 'function': 'create_paper_table_from_folder',
 'code_file': 'table_generation.py',
 'is_global': True,
 'is_udf': False,
 'arguments': {'folder_dir': './test_data/stories', 'copies': 1},
 'table_creation': True,
 'n_threads': 1}

In [23]:
# Code Execution File

rel_path = './test_data/test_data_db/llm_storage/upload_openai.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'code',
 'dependencies': ['self.paper_name',
  'stories.paper_name',
  'stories.paper_path'],
 'changed_columns': ['openai_id'],
 'function': 'upload_file_from_table',
 'code_file': 'open_ai_store.py',
 'is_global': True,
 'is_udf': True,
 'arguments': {'file_path': '<<stories.paper_path[paper_name:self.paper_name]>>',
  'key_file': './open_ai_key/key.txt'},
 'n_threads': 1}
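
The `<<stories.paper_path[paper_name:self.paper_name]>>` argument reads like a lookup: take `paper_path` from the `stories` row whose `paper_name` equals this row's `paper_name`. The sketch below parses and resolves such a reference under that reading. The interpretation, the regular expression, and the helper names `parse_reference` and `resolve_reference` are assumptions for illustration, not TableVault's own code.

```python
import re

# Assumed shape: <<table.column[key_column:self.other_column]>>
REFERENCE = re.compile(
    r"<<(?P<table>\w+)\.(?P<column>\w+)"
    r"\[(?P<key>\w+):self\.(?P<self_column>\w+)\]>>"
)


def parse_reference(text):
    """Return the parts of a <<...>> reference, or None if text is not one."""
    match = REFERENCE.fullmatch(text)
    return match.groupdict() if match else None


def resolve_reference(text, tables, current_row):
    """Look up the referenced value for one row of the table being built.

    tables maps a table name to a list of row dicts (hypothetical layout).
    """
    ref = parse_reference(text)
    if ref is None:
        return text  # plain argument, passed through unchanged
    wanted = current_row[ref["self_column"]]
    for row in tables[ref["table"]]:
        if row[ref["key"]] == wanted:
            return row[ref["column"]]
    raise KeyError(f"no row in {ref['table']} with {ref['key']}={wanted!r}")
```

Plain arguments such as `key_file` are returned unchanged, so the same function can be applied to every value under `arguments`.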

In [24]:
# LLM Execution File (Only supports OpenAI Threads for now)

rel_path = './test_data/test_data_db/llm_questions/question_3.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'llm',
 'dependencies': ['llm_storage.openai_id',
  'llm_storage.paper_name',
  'self.paper_name'],
 'changed_columns': ['q3'],
 'retry': 5,
 'n_threads': 1,
 'context_files': ['<<llm_storage.openai_id[paper_name:self.paper_name]>>'],
 'context_msgs': ['Use this story to answer questions'],
 'questions': ['Would you best classify Clara as a: CATEGORIES?'],
 'open_ai_key': './open_ai_key/key.txt',
 'output_type': 'category',
 'entity_name': 'motivations',
 'category_names': ['Protagonist seeking knowledge',
  'Reckless adventurer',
  'Reluctant hero forced into action',
  'Naive child learning responsibility'],
 'model': 'gpt-4o-mini',
 'temperature': 0.2}

# Make Base Folder

In [25]:
# Command to initialize folder

!tablevault database -db test_database -r

In [26]:
# View current folder structure
import os

def print_directory_tree(root_dir, indent=" ", files = True):
    for item in os.listdir(root_dir):
        path = os.path.join(root_dir, item)
        if os.path.isdir(path):
            print(indent + "📁 " + item)
            print_directory_tree(path, indent + "    ", files)
        else:
            if files:
                print(indent + "📄 " + item)


print_directory_tree(root_dir = 'test_database', indent=" ")

 📁 locks
     📄 DATABASE.lock
     📁 RESTART
         📄 RESTART.lock
 📁 metadata
     📄 active_log.json
     📄 log.txt
     📄 tables_history.json
     📄 tables_multiple.json
     📄 columns_history.json
 📁 code_functions


# Make Table Folders

In [27]:
# Commands to initialize table folders

! tablevault setup-table -db test_database -t stories

! tablevault setup-table -db test_database -t llm_storage

! tablevault setup-table -db test_database -t llm_questions -m # -m allows multiple active versions (used for the versioning tests below)

In [28]:
# View Current folder structure

print_directory_tree(root_dir = 'test_database', indent=" ")

 📁 llm_questions
     📁 prompts
 📁 locks
     📁 llm_questions
         📄 llm_questions.lock
     📄 DATABASE.lock
     📁 RESTART
         📄 RESTART.lock
     📁 llm_storage
         📄 llm_storage.lock
     📁 stories
         📄 stories.lock
 📁 llm_storage
     📁 prompts
 📁 stories
     📁 prompts
 📁 metadata
     📄 active_log.json
     📄 log.txt
     📄 tables_history.json
     📄 LOG.lock
     📄 tables_multiple.json
     📄 columns_history.json
 📁 code_functions


In [29]:
# View example log and metadata files
log_file = 'test_database/metadata/log.txt'
with open(log_file, 'r') as f:
    content = f.read()
    print(content)

{"process_id": "379af146-1a3b-417f-a492-5e0b480bfdf5", "author": "command_line", "start_time": 1738552629.203393, "log_time": 1738552629.20468, "table_name": "stories", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738552629.2041652], "data": {"allow_multiple": false}, "success": true}
{"process_id": "09ebbc3f-24d0-43ec-8d00-4b03ed899933", "author": "command_line", "start_time": 1738552629.802677, "log_time": 1738552629.803997, "table_name": "llm_storage", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738552629.803471], "data": {"allow_multiple": false}, "success": true}
{"process_id": "a6e8a098-f813-4b34-baf0-bfba0f9f69c1", "author": "command_line", "start_time": 1738552630.402587, "log_time": 1738552630.403903, "table_name": "llm_questions", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [173
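
The log output above has one JSON object per line, so it can be parsed line by line instead of printed raw. A minimal sketch, assuming that layout holds for the whole file; the helpers `parse_log_lines` and `summarize` are hypothetical, and the sample uses only a few of the fields shown above.

```python
import json


def parse_log_lines(text):
    """Parse one JSON object per line, skipping blank or incomplete lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # e.g. a line that was cut off
    return entries


def summarize(entries):
    """Return (table_name, operation, success) for each parsed entry."""
    return [(e["table_name"], e["operation"], e["success"]) for e in entries]


sample = "\n".join([
    '{"table_name": "stories", "operation": "setup_table", "success": true}',
    '{"table_name": "llm_storage", "operation": "setup_table", "success": true}',
    '{"table_name": "llm_questions", "operation": "setup_ta',  # truncated line
])
print(summarize(parse_log_lines(sample)))
```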

In [30]:
import json
log_file = 'test_database/metadata/tables_multiple.json'
with open(log_file, 'r') as f:
    content = json.load(f)
    print(content)

{'stories': False, 'llm_storage': False, 'llm_questions': True}


**Example Active Log**

Shown as text because the file is deleted once the operation has executed.

# Make Table Instances

## First we add the prompts to each table's "prompts" directory

In [31]:
import shutil

def copy_files_to_table(base_dir, db_dir, table_name):
    org_path = os.path.join(base_dir, table_name)
    new_path = os.path.join(db_dir, table_name)
    new_path = os.path.join(new_path, 'prompts')
    for file in os.listdir(org_path):
        if file.endswith('.yaml'):
            org_path_ = os.path.join(org_path, file)
            new_path_ = os.path.join(new_path, file)
            if os.path.exists(new_path_):
                os.remove(new_path_)
            shutil.copy2(org_path_, new_path_)

yaml_base_dir = './test_data/test_data_db'
db_dire = './test_database'

copy_files_to_table(yaml_base_dir, db_dire, "stories")
copy_files_to_table(yaml_base_dir, db_dire, "llm_storage")
copy_files_to_table(yaml_base_dir, db_dire, "llm_questions")

## Then we make instances with specified prompts

In [32]:
# Make Temporary (Not executed) Instances

!tablevault setup-temp -db test_database -t stories -p fetch_stories -gp fetch_stories

!tablevault setup-temp -db test_database -t llm_storage -p fetch_llm_storage -p upload_openai -gp fetch_llm_storage

!tablevault setup-temp -db test_database -t llm_questions -v version_1 \
-p fetch_llm_question -p question_1 -p question_2 -p  question_3 -gp fetch_llm_question


In [33]:
# View Current Folder Structure
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

 📁 TEMP_
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


In [34]:

rel_path = './test_database/stories/TEMP_/prompts/description.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'copied_prompts': ['fetch_stories'], 'table_generator': 'fetch_stories'}

# Execute Table Instances

In [35]:
# Make PDF Table
! tablevault execute -db test_database -t stories
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_stories
FINISHED INSTANCE: _1738552641_bAMKC
 📁 _1738552641_bAMKC
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


In [36]:
story_instance_id = !tablevault list-instances -db test_database -t stories
story_instance_id = story_instance_id[0]


In [37]:
import pandas as pd
df_path = f'test_database/stories/{story_instance_id}/table.csv'
df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,paper_path
0,The_Clockmakers_Secret,./test_data/stories/The_Clockmakers_Secret.pdf


In [38]:
# Make OpenAI Storage File (upload pdf to OpenAI) (Allow Multiple Threads)
! tablevault execute -db test_database -t llm_storage

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_llm_storage
FINISHED STEP
upload_openai
FINISHED INSTANCE: _1738552643_UbwUu


In [39]:
# View Table
code_instance_id = !tablevault list-instances -db test_database -t llm_storage
code_instance_id = code_instance_id[0]

df_path = f'test_database/llm_storage/{code_instance_id}/table.csv'
df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,openai_id
0,The_Clockmakers_Secret,file-GgbX2BsfqTW6MUBj1KtASg


In [40]:
# Make LLM Question Response Table (Allow Multiple Threads)
! tablevault execute -db test_database -t llm_questions -v version_1

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_llm_question
FINISHED STEP
question_1
FINISHED STEP
question_2
FINISHED STEP
question_3
FINISHED INSTANCE: version_1_1738552646_mJNuy
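
The instance IDs printed so far (`_1738552641_bAMKC`, `version_1_1738552646_mJNuy`) look like `<version>_<unix seconds>_<random suffix>`. The sketch below splits an ID on that assumption, which makes it possible to pick the most recent instance explicitly instead of relying on list order. The ID layout and the helpers `parse_instance_id` and `latest_instance` are inferred from the output and are not a documented TableVault format.

```python
from datetime import datetime, timezone


def parse_instance_id(instance_id):
    """Split '<version>_<unix seconds>_<suffix>' into its parts (assumed layout)."""
    parts = instance_id.rsplit("_", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        raise ValueError(f"unrecognized instance ID: {instance_id!r}")
    version, seconds, suffix = parts
    return {
        "version": version,  # empty string when no version was given
        "created": datetime.fromtimestamp(int(seconds), tz=timezone.utc),
        "suffix": suffix,
    }


def latest_instance(instance_ids):
    """Return the ID with the most recent creation time."""
    return max(instance_ids, key=lambda i: parse_instance_id(i)["created"])
```

Temporary instances such as `TEMP_version_2` do not follow this layout and raise a `ValueError`.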


In [41]:
llm_instance_id = !tablevault list-instances -db test_database -t llm_questions
llm_instance_id = llm_instance_id[0]

df_path = f'test_database/llm_questions/{llm_instance_id}/table.csv'
pd.set_option("display.max_colwidth", None)

df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,q1,q2_1,q2,q3_1,q3
0,The_Clockmakers_Secret,"with a deeper understanding of time's value and the impact of her choices. Years later, she reflects on her experiences and carries the lesson that time is precious, indicating a maturation in her character and a recognition of the weight of her curiosity【5:0†source】. \n\nIn summary, Clara's curiosity serves as both a catalyst for her adventures and a means of personal development, leading her to learn about the complexities of time and the responsibilities that come with knowledge and exploration.","Percival's motivation to help Clara, even after she disrupts time, can be attributed to several reasons:\n\n1. **Trust and Mentorship**: Percival had developed a relationship with Clara over time, trusting her with tasks in his shop. This bond likely motivated him to guide her through the consequences of her actions rather than abandoning her【5:0†source】.\n\n2. **Responsibility**: As the clockmaker, Percival understands the significance of time and its manipulation. He feels a sense of responsibility to help Clara correct her mistake, as he is aware of the potential chaos that can ensue from tampering with the Master Clock【5:0†source】.\n\n3. **Empathy**: Percival's reaction to Clara's panic suggests that he empathizes with her situation. He recognizes that her actions were not intentional and that she is genuinely distressed about the disruption she caused【5:0†source】.\n\n4. **Desire to Restore Balance**: Percival is motivated by the need to restore the balance of time. He understands the importance of fixing the disruption Clara caused and is willing to work with her to achieve that【5:0†source】.\n\n5. **Teaching Moment**: Percival likely sees this as an opportunity to teach Clara about the complexities and responsibilities that come with understanding time. 
By helping her, he can impart valuable lessons about the consequences of her actions【5:0†source】.\n\nThese motivations reflect Percival's character as a wise mentor who values the integrity of time and the growth of those he guides.","['Trust and Mentorship', 'Responsibility', 'Empathy', 'Desire to Restore Balance', 'Teaching Moment']","Based on the story, Clara can best be classified as a **""Protagonist seeking knowledge.""** Throughout the narrative, she demonstrates a strong curiosity and desire to learn, particularly about the magical clocks and their abilities to manipulate time. Her initial exploration of the clockmaker's shop and her subsequent actions, such as asking questions and engaging with the clocks, highlight her quest for understanding and knowledge.\n\nWhile she does exhibit some reckless behavior by tampering with the Master Clock, her primary motivation seems to stem from a thirst for knowledge rather than a desire for adventure or heroism. Therefore, the classification of ""Protagonist seeking knowledge"" fits her character best【5:0†source】.",Protagonist seeking knowledge


# Table Versioning

Tables are not re-executed if nothing has changed. By default, execution only happens when the rows (samples), columns (prompts), or dependencies (other tables) have changed.
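
One way to picture this check is a fingerprint over the three inputs. The sketch below is a hypothetical illustration, not the mechanism TableVault uses (this document does not describe it); the names `fingerprint` and `needs_execution` and the input shapes are assumptions.

```python
import hashlib
import json


def fingerprint(prompts, row_keys, dependencies):
    """Hash the prompts, row keys, and dependency instances of a table.

    prompts: dict of prompt name -> prompt content (as loaded from YAML)
    row_keys: list of row identifiers, e.g. paper names
    dependencies: dict of upstream table name -> instance ID
    """
    payload = json.dumps(
        {"prompts": prompts, "rows": sorted(row_keys), "dependencies": dependencies},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_execution(previous, current):
    """Re-execute only when the fingerprint differs from the last run."""
    return previous != current
```

With this framing, copying an instance without touching prompts, rows, or dependencies yields the same fingerprint, which corresponds to the `NO UPDATES` case.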

## No Changes

In [42]:
# Copy Previous Instances
!tablevault setup-temp -db test_database -t stories -pid {story_instance_id}

!tablevault setup-temp -db test_database -t llm_storage -pid {code_instance_id}

!tablevault setup-temp -db test_database -t llm_questions -pid {llm_instance_id} -v version_1

In [43]:
# Execute new Instances
! tablevault execute -db test_database -t stories
! tablevault execute -db test_database -t llm_storage
! tablevault execute -db test_database -t llm_questions -v version_1

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
NO UPDATES: NOTHING HAPPENS.
PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
NO UPDATES: NOTHING HAPPENS.
PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
NO UPDATES: NOTHING HAPPENS.


In [44]:
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

 📁 _1738552641_bAMKC
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 TEMP_
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


## Column Update 

What happens to execution of LLM questions if we change a column? We will test this on a copied database so the original remains clean.

We generate a new prompt question for llm_questions, and create a new instance with that question.

In [45]:
# function to replace question_1 prompt with question_1a
def replace_prompt(prompt, instance_id, table_name, base_dir = './test_data/test_data_db', db_dir = './test_database_2', prev_prompt= None):
    org_path = os.path.join(base_dir, table_name, prompt)
    new_dir = os.path.join(db_dir, table_name, instance_id, 'prompts')
    new_path = os.path.join(new_dir, prompt)
    if prev_prompt is not None:
        prev_prompt_path = os.path.join(new_dir, prev_prompt)
        if os.path.exists(prev_prompt_path):
            print('Removed')
            print(prev_prompt_path)
            os.remove(prev_prompt_path)
    shutil.copy2(org_path, new_path)

In [46]:
# Copy database
def copy_db(new_dir = 'test_database_2', orig_dir = 'test_database'):
    if os.path.exists(new_dir):
        shutil.rmtree(new_dir)
    shutil.copytree(orig_dir, new_dir)

copy_db() 
# since everything is saved in the directory, 
# once we copy the data over, we can execute like normal 
# as if we are just continuing previous operations

In [47]:
# generate new temporary instance and replace prompt
!tablevault setup-temp -db test_database_2 -t llm_questions -pid {llm_instance_id} -v version_2

replace_prompt("question_1a.yaml", "TEMP_version_2", "llm_questions", prev_prompt= "question_1.yaml")

print_directory_tree(root_dir = 'test_database_2/llm_questions/TEMP_version_2', indent=" ") # Note how we have a new prompt (user changed)

Removed
./test_database_2/llm_questions/TEMP_version_2/prompts/question_1.yaml
 📄 table.csv
 📁 prompts
     📄 fetch_llm_question.yaml
     📄 question_2.yaml
     📄 description.yaml
     📄 question_3.yaml
     📄 question_1a.yaml


In [48]:
# Execute with new prompt
!tablevault execute -db test_database_2 -t llm_questions -v version_2

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_llm_question
FINISHED STEP
question_1a
FINISHED INSTANCE: version_2_1738552797_TFhjO


In [49]:
# If we compare the two dataframes, note that only the question 1 column (q1) changed
llm_instance_ids = !tablevault list-instances -db test_database_2 -t llm_questions

for instance_id in llm_instance_ids:
    df_path = f'test_database_2/llm_questions/{instance_id}/table.csv'
    df = pd.read_csv(df_path)
    print(instance_id)
    display(df)


version_1_1738552646_mJNuy


Unnamed: 0,paper_name,q1,q2_1,q2,q3_1,q3
0,The_Clockmakers_Secret,"with a deeper understanding of time's value and the impact of her choices. Years later, she reflects on her experiences and carries the lesson that time is precious, indicating a maturation in her character and a recognition of the weight of her curiosity【5:0†source】. \n\nIn summary, Clara's curiosity serves as both a catalyst for her adventures and a means of personal development, leading her to learn about the complexities of time and the responsibilities that come with knowledge and exploration.","Percival's motivation to help Clara, even after she disrupts time, can be attributed to several reasons:\n\n1. **Trust and Mentorship**: Percival had developed a relationship with Clara over time, trusting her with tasks in his shop. This bond likely motivated him to guide her through the consequences of her actions rather than abandoning her【5:0†source】.\n\n2. **Responsibility**: As the clockmaker, Percival understands the significance of time and its manipulation. He feels a sense of responsibility to help Clara correct her mistake, as he is aware of the potential chaos that can ensue from tampering with the Master Clock【5:0†source】.\n\n3. **Empathy**: Percival's reaction to Clara's panic suggests that he empathizes with her situation. He recognizes that her actions were not intentional and that she is genuinely distressed about the disruption she caused【5:0†source】.\n\n4. **Desire to Restore Balance**: Percival is motivated by the need to restore the balance of time. He understands the importance of fixing the disruption Clara caused and is willing to work with her to achieve that【5:0†source】.\n\n5. **Teaching Moment**: Percival likely sees this as an opportunity to teach Clara about the complexities and responsibilities that come with understanding time. 
By helping her, he can impart valuable lessons about the consequences of her actions【5:0†source】.\n\nThese motivations reflect Percival's character as a wise mentor who values the integrity of time and the growth of those he guides.","['Trust and Mentorship', 'Responsibility', 'Empathy', 'Desire to Restore Balance', 'Teaching Moment']","Based on the story, Clara can best be classified as a **""Protagonist seeking knowledge.""** Throughout the narrative, she demonstrates a strong curiosity and desire to learn, particularly about the magical clocks and their abilities to manipulate time. Her initial exploration of the clockmaker's shop and her subsequent actions, such as asking questions and engaging with the clocks, highlight her quest for understanding and knowledge.\n\nWhile she does exhibit some reckless behavior by tampering with the Master Clock, her primary motivation seems to stem from a thirst for knowledge rather than a desire for adventure or heroism. Therefore, the classification of ""Protagonist seeking knowledge"" fits her character best【5:0†source】.",Protagonist seeking knowledge


version_2_1738552797_TFhjO


Unnamed: 0,paper_name,q1,q2_1,q2,q3_1,q3
0,The_Clockmakers_Secret,"Clara's actions in the story illustrate a fundamental aspect of human nature: the irresistible pull of curiosity. Her initial exploration of the clockmaker's shop reflects a natural desire to seek out the unknown and understand the world around her. This curiosity drives her to engage with the mysterious clocks, each representing a different facet of time and reality. \n\nDespite warnings from Percival about the dangers of tampering with the Master Clock, Clara's intrigue ultimately leads her to act against her better judgment. This moment of decision highlights a common theme in human behavior—curiosity can lead to both discovery and peril. When she opens the cabinet and disrupts the flow of time, it serves as a metaphor for the consequences that can arise from unchecked curiosity. \n\nThe narrative suggests that while curiosity is a powerful motivator for exploration and learning, it can also lead to unintended consequences. Clara's journey emphasizes the duality of human nature: the quest for knowledge can bring enlightenment, but it also carries risks that must be navigated carefully【5:0†source】.","Percival's motivation to help Clara, even after she disrupts time, can be attributed to several reasons:\n\n1. **Trust and Mentorship**: Percival had developed a relationship with Clara over time, trusting her with tasks in his shop. This bond likely motivated him to guide her through the consequences of her actions rather than abandoning her【5:0†source】.\n\n2. **Responsibility**: As the clockmaker, Percival understands the significance of time and its manipulation. He feels a sense of responsibility to help Clara correct her mistake, as he is aware of the potential chaos that can ensue from tampering with the Master Clock【5:0†source】.\n\n3. **Empathy**: Percival's reaction to Clara's panic suggests that he empathizes with her situation. 
He recognizes that her actions were not intentional and that she is genuinely distressed about the disruption she caused【5:0†source】.\n\n4. **Desire to Restore Balance**: Percival is motivated by the need to restore the balance of time. He understands the importance of fixing the disruption Clara caused and is willing to work with her to achieve that【5:0†source】.\n\n5. **Teaching Moment**: Percival likely sees this as an opportunity to teach Clara about the complexities and responsibilities that come with understanding time. By helping her, he can impart valuable lessons about the consequences of her actions【5:0†source】.\n\nThese motivations reflect Percival's character as a wise mentor who values the integrity of time and the growth of those he guides.","['Trust and Mentorship', 'Responsibility', 'Empathy', 'Desire to Restore Balance', 'Teaching Moment']","Based on the story, Clara can best be classified as a **""Protagonist seeking knowledge.""** Throughout the narrative, she demonstrates a strong curiosity and desire to learn, particularly about the magical clocks and their abilities to manipulate time. Her initial exploration of the clockmaker's shop and her subsequent actions, such as asking questions and engaging with the clocks, highlight her quest for understanding and knowledge.\n\nWhile she does exhibit some reckless behavior by tampering with the Master Clock, her primary motivation seems to stem from a thirst for knowledge rather than a desire for adventure or heroism. Therefore, the classification of ""Protagonist seeking knowledge"" fits her character best【5:0†source】.",Protagonist seeking knowledge
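
Comparing long answer columns by eye is error-prone, so the difference can also be checked programmatically. A small sketch with pandas, using toy data in place of the two `table.csv` files; the helper `changed_columns` is hypothetical.

```python
import pandas as pd


def changed_columns(old, new):
    """Names of shared columns whose values differ between two tables."""
    shared = [c for c in old.columns if c in new.columns]
    return [c for c in shared if not old[c].equals(new[c])]


old = pd.DataFrame({
    "paper_name": ["The_Clockmakers_Secret"],
    "q1": ["first answer"],
    "q3": ["Protagonist seeking knowledge"],
})
new = old.assign(q1=["second answer"])  # only q1 was regenerated
print(changed_columns(old, new))
```

Applied to the `version_1` and `version_2` dataframes loaded above, the same function would list the columns that were regenerated.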


## Row Update
What happens to execution of LLM questions if we add a row (pdf file)? We will test this on a copied database so the original remains clean.

We insert a new story in the stories folder. We will then re-execute the stories and the OpenAI storage dataframes to see what happens.

In [50]:
# Functions to copy
def copy_story(base_dir= './test_data/stories', story_name = 'The_Clockmakers_Secret.pdf'):
    org_path = os.path.join(base_dir, story_name)
    new_name = story_name.split(".")[0] + '_copy.pdf'
    new_path = os.path.join(base_dir, new_name)
    shutil.copy2(org_path, new_path)

# Clean Up
def delete_story(base_dir= './test_data/stories', story_name = 'The_Clockmakers_Secret_copy.pdf'):
    story_path  = os.path.join(base_dir, story_name)
    os.remove(story_path)

In [51]:
copy_db()
copy_story()

In [52]:
# make new table instances
!tablevault setup-temp -db test_database_2 -t stories -pid {story_instance_id}
!tablevault setup-temp -db test_database_2 -t llm_storage -pid {code_instance_id}


In [55]:
# Execute New instances
! tablevault execute -db test_database_2 -t stories
! tablevault execute -db test_database_2 -t llm_storage

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_stories
FINISHED INSTANCE: _1738552851_OvSnn
PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
FINISHED STEP
fetch_llm_storage
FINISHED STEP
upload_openai
FINISHED INSTANCE: _1738552851_KDEmv


In [60]:
# If we compare the dataframes, note that only a new row was added (the existing row is unchanged)
story_instance_ids = !tablevault list-instances -db test_database_2 -t stories
code_instance_ids = !tablevault list-instances -db test_database_2 -t llm_storage

for s_instance_id, c_instance_id in zip(story_instance_ids, code_instance_ids):
    df_path = f'test_database_2/stories/{s_instance_id}/table.csv'
    df = pd.read_csv(df_path)
    display(df)
    df_path = f'test_database_2/llm_storage/{c_instance_id}/table.csv'
    df = pd.read_csv(df_path)
    display(df)
    


Unnamed: 0,paper_name,paper_path
0,The_Clockmakers_Secret,./test_data/stories/The_Clockmakers_Secret.pdf


Unnamed: 0,paper_name,openai_id
0,The_Clockmakers_Secret,file-GgbX2BsfqTW6MUBj1KtASg


Unnamed: 0,paper_name,paper_path
0,The_Clockmakers_Secret,./test_data/stories/The_Clockmakers_Secret.pdf
1,The_Clockmakers_Secret_copy,./test_data/stories/The_Clockmakers_Secret_copy.pdf


Unnamed: 0,paper_name,openai_id
0,The_Clockmakers_Secret,file-GgbX2BsfqTW6MUBj1KtASg
1,The_Clockmakers_Secret_copy,file-EGsvC5DNzCuW2kbueCEoAn
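
The output above shows one added row, while the existing row keeps its `openai_id`. The sketch below checks both properties with pandas on toy data shaped like the `llm_storage` tables; the helpers `added_rows` and `reused_values` and the placeholder IDs are hypothetical.

```python
import pandas as pd


def added_rows(old, new, key="paper_name"):
    """Rows of new whose key does not appear in old."""
    merged = new.merge(old[[key]], on=key, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


def reused_values(old, new, key, column):
    """True if every row present in both tables kept the same value in column."""
    both = old.merge(new, on=key, suffixes=("_old", "_new"))
    return bool((both[f"{column}_old"] == both[f"{column}_new"]).all())


old = pd.DataFrame({"paper_name": ["The_Clockmakers_Secret"], "openai_id": ["file-A"]})
new = pd.DataFrame({
    "paper_name": ["The_Clockmakers_Secret", "The_Clockmakers_Secret_copy"],
    "openai_id": ["file-A", "file-B"],
})
print(added_rows(old, new)["paper_name"].tolist())
print(reused_values(old, new, "paper_name", "openai_id"))
```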


In [61]:
# IMPORTANT: Cleanup
delete_story()

# Delete Instance and Delete Table

Deletes are simple but need to be executed through the command line so that the metadata is updated correctly. By default, nothing is ever deleted.

In [62]:
# Delete table instance 
! tablevault delete-instance -db test_database -t stories -id {story_instance_id}

In [63]:
# Delete table
! tablevault delete-table -db test_database -t llm_storage

In [64]:
# Show new folder and metadata
print_directory_tree(root_dir = 'test_database', indent=" ", files = False)

 📁 llm_questions
     📁 TEMP_version_1
         📁 prompts
     📁 version_1_1738552646_mJNuy
         📁 prompts
     📁 prompts
 📁 locks
     📁 llm_questions
     📁 RESTART
     📁 stories
 📁 stories
     📁 _1738552829_onJsA
         📁 prompts
     📁 prompts
 📁 metadata
 📁 code_functions


# Restarts

Restarts are not included in this demo because they require stopping a process mid-operation, but they have been tested.

Restarts are executed with:

! tablevault restart -db test_database 
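
The log entries shown earlier carry `complete_steps` and `restarts` fields, which suggests that an interrupted operation can resume from the last recorded step. The sketch below illustrates that idea in memory only. It is a hypothetical model, not TableVault's restart code, and `run_with_resume` and `demo_restart` are made-up names.

```python
def run_with_resume(steps, log):
    """Run (name, function) steps in order, skipping names already logged."""
    for name, func in steps:
        if name in log["complete_steps"]:
            continue  # finished before the interruption
        func()
        log["complete_steps"].append(name)  # record only after success
    log["success"] = True
    return log


def demo_restart():
    """Simulate a crash in the second step, then resume."""
    calls = []
    attempts = {"upload": 0}

    def fetch():
        calls.append("fetch")

    def upload():
        attempts["upload"] += 1
        if attempts["upload"] == 1:
            raise RuntimeError("simulated crash")
        calls.append("upload")

    steps = [("fetch", fetch), ("upload", upload)]
    log = {"operation": "execute", "complete_steps": [], "restarts": [], "success": False}
    try:
        run_with_resume(steps, log)
    except RuntimeError:
        log["restarts"].append("restart-1")  # placeholder marker
    run_with_resume(steps, log)  # resumes at "upload"; "fetch" is not repeated
    return calls, log
```

In a real system the log would have to be written to disk after each step so that it survives the process being stopped.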


# IMPORTANT - Cleanup
Clean up OpenAI storage after the tests. Note: this deletes all OpenAI files, vector stores, and assistants on the account, so it might destroy other experiments.
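
If other experiments share the account, a narrower option is to collect only the file IDs recorded in the `llm_storage` tables before the database folders are removed. The sketch below gathers those IDs with the standard library. It assumes the `<db>/llm_storage/<instance>/table.csv` layout and the `openai_id` column shown earlier, and `collect_openai_ids` is a hypothetical helper.

```python
import csv
import glob
import os


def collect_openai_ids(db_dir, table_name="llm_storage", column="openai_id"):
    """Gather the distinct values of column from every instance's table.csv."""
    ids = set()
    pattern = os.path.join(db_dir, table_name, "*", "table.csv")
    for csv_path in glob.glob(pattern):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                value = (row.get(column) or "").strip()
                if value:
                    ids.add(value)
    return ids
```

Each collected ID could then be passed to `client.files.delete(file_id=...)`, the same call used in the cleanup function in this section. Vector stores and assistants are not covered by this sketch.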

In [65]:
def clean_up_open_ai(key_file = "open_ai_key/key.txt"):
    import openai
    from tqdm import tqdm
    # Load the API key; strip() drops a trailing newline that could otherwise end up in the key
    with open(key_file, 'r') as f:
        os.environ["OPENAI_API_KEY"] = f.read().strip()
    client = openai.OpenAI()
    files = list(client.files.list())
    vector_stores = list(client.beta.vector_stores.list())
    my_assistants = list(client.beta.assistants.list())
    # Deletion is best effort: an item that cannot be deleted is skipped
    for store in tqdm(vector_stores):
        try:
            client.beta.vector_stores.delete(vector_store_id=store.id)
        except Exception:
            pass
    for f in tqdm(files):
        try:
            client.files.delete(file_id=f.id)
        except Exception:
            pass
    for assistant in tqdm(my_assistants):
        try:
            client.beta.assistants.delete(assistant.id)
        except Exception:
            pass

    # Print what remains so the result can be checked
    print(client.beta.vector_stores.list())
    print(client.files.list())
    print(client.beta.assistants.list())

In [67]:
clean_up_open_ai()
shutil.rmtree('test_database')
shutil.rmtree('test_database_2')


0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]


SyncCursorPage[VectorStore](data=[], object='list', first_id=None, last_id=None, has_more=False)
SyncCursorPage[FileObject](data=[], object='list', has_more=False, first_id=None, last_id=None)
SyncCursorPage[Assistant](data=[], object='list', first_id=None, last_id=None, has_more=False)
