# Basic Example

**Demonstrating the current iteration of tablevault (command-line version)**



# Setup

Installing package:

- Download from the GitHub repository and build
- Add the OpenAI API key to the appropriate location (the example YAML files below read it from `./open_ai_key/key.txt`)

# General Functionalities

- Every Table Modification is Logged
- Multi-Process Safe: Locking supports Single Writes and Multiple Reads
- Restart Safe (as long as an operation is logged, it will execute as intended after a restart)
- Robust Versioning
- Granular (Manual) History Tracking
- Inter-Table Linking (Similar to Foreign Keys)
- Supports both Row and Column Table Updates

# Initial/External Data
- YAML Files
- Data Source (PDF): OpenAI-generated short story
- OpenAI API Key

## Example YAML Files

In [1]:
import yaml
# Table Generation File
rel_path = './test_data/test_data_db/stories/fetch_stories.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'code',
 'dependencies': [],
 'changed_columns': ['paper_name', 'paper_path'],
 'function': 'create_paper_table_from_folder',
 'code_file': 'table_generation.py',
 'is_global': True,
 'is_udf': False,
 'arguments': {'folder_dir': './test_data/stories', 'copies': 1},
 'table_creation': True,
 'n_threads': 1}

In [2]:
# Code Execution File

rel_path = './test_data/test_data_db/llm_storage/upload_openai.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'code',
 'dependencies': ['self.paper_name',
  'stories.paper_name',
  'stories.paper_path'],
 'changed_columns': ['openai_id'],
 'function': 'upload_file_from_table',
 'code_file': 'open_ai_store.py',
 'is_global': True,
 'is_udf': True,
 'arguments': {'file_path': '<<stories.paper_path[paper_name:self.paper_name]>>',
  'key_file': './open_ai_key/key.txt'},
 'n_threads': 1}

In [3]:
# LLM Execution File (only supports OpenAI Threads for now)

rel_path = './test_data/test_data_db/llm_questions/question_3.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'type': 'llm',
 'dependencies': ['llm_storage.openai_id',
  'llm_storage.paper_name',
  'self.paper_name'],
 'changed_columns': ['q3'],
 'retry': 5,
 'n_threads': 1,
 'context_files': ['<<llm_storage.openai_id[paper_name:self.paper_name]>>'],
 'context_msgs': ['Use this story to answer questions'],
 'questions': ['Would you best classify Clara as a: CATEGORIES?'],
 'open_ai_key': './open_ai_key/key.txt',
 'output_type': 'category',
 'entity_name': 'motivations',
 'category_names': ['Protagonist seeking knowledge',
  'Reckless adventurer',
  'Reluctant hero forced into action',
  'Naive child learning responsibility'],
 'model': 'gpt-4o-mini',
 'temperature': 0.2}
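
The `<<table.column[key:self.key]>>` strings in the files above link a row of the current table to a row of another table. The sketch below shows one plausible reading of that syntax: look up `column` in `table` on the row whose `key` equals the current row's `key`. It is not tablevault's own parser; the regular expression, the `resolve_reference` helper, and the in-memory `tables` dictionary are all assumptions made for illustration.

```python
import re

# Assumed shape of a reference: <<table.column[key_column:self.key_column]>>
REFERENCE = re.compile(
    r"<<(?P<table>\w+)\.(?P<column>\w+)"
    r"\[(?P<key>\w+):self\.(?P<self_key>\w+)\]>>"
)


def resolve_reference(template, tables, current_row):
    """Replace each reference in template with the value found in another table."""
    def lookup(match):
        wanted = current_row[match["self_key"]]
        # Scan the referenced table for the row with the matching key value
        for row in tables[match["table"]]:
            if row[match["key"]] == wanted:
                return str(row[match["column"]])
        raise KeyError(f"no row in {match['table']} with {match['key']}={wanted!r}")

    return REFERENCE.sub(lookup, template)


# Hypothetical in-memory stand-ins for the tables (one row each)
tables = {
    "stories": [
        {
            "paper_name": "The_Clockmakers_Secret",
            "paper_path": "./test_data/stories/The_Clockmakers_Secret.pdf",
        }
    ]
}
current_row = {"paper_name": "The_Clockmakers_Secret"}

resolved = resolve_reference(
    "<<stories.paper_path[paper_name:self.paper_name]>>", tables, current_row
)
print(resolved)
```

Under this reading, the `file_path` argument of `upload_openai.yaml` resolves to the PDF path stored in the `stories` table for the same `paper_name`.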

# Make Base Folder

In [19]:
# Command to initialize folder

!tablevault database -db test_database -r

In [54]:
# View current folder structure
import os

def print_directory_tree(root_dir, indent=" ", files=True):
    """Recursively print the folder tree under root_dir; set files=False to list folders only."""
    for item in os.listdir(root_dir):
        path = os.path.join(root_dir, item)
        if os.path.isdir(path):
            print(indent + "📁 " + item)
            # Recurse with a deeper indent for the folder's contents
            print_directory_tree(path, indent + "    ", files)
        elif files:
            print(indent + "📄 " + item)


print_directory_tree(root_dir='test_database', indent=" ")

 📁 llm_questions
     📁 TEMP_version_1
         📄 table.csv
         📁 prompts
             📄 fetch_llm_question.yaml
             📄 question_1.yaml
             📄 question_2.yaml
             📄 description.yaml
             📄 question_3.yaml
     📁 version_1_1738336233_URPVb
         📄 table.csv
         📁 prompts
             📄 fetch_llm_question.yaml
             📄 question_1.yaml
             📄 question_2.yaml
             📄 description.yaml
             📄 question_3.yaml
     📁 prompts
         📄 fetch_llm_question.yaml
         📄 question_1_multi.yaml
         📄 question_1.yaml
         📄 question_2.yaml
         📄 question_3.yaml
         📄 question_1a.yaml
 📁 locks
     📁 llm_questions
         📄 version_1_1738336233_URPVb.lock
         📄 llm_questions.lock
         📄 TEMP_version_1.lock
     📄 DATABASE.lock
     📁 RESTART
         📄 RESTART.lock
     📁 stories
         📄 TEMP_.lock
         📄 _1738336171_Fbikh.lock
         📄 stories.lock
 📁 stories
     📁 TEMP_
         📄 tab

# Make Table Folders

In [21]:
# Commands to initialize table folders

! tablevault setup-table -db test_database -t stories

! tablevault setup-table -db test_database -t llm_storage

! tablevault setup-table -db test_database -t llm_questions -m # allow multiple active versions, used here to test versioning

In [22]:
# View Current folder structure

print_directory_tree(root_dir = 'test_database', indent=" ")

 📁 llm_questions
     📁 prompts
 📁 locks
     📁 llm_questions
         📄 llm_questions.lock
     📄 DATABASE.lock
     📁 RESTART
         📄 RESTART.lock
     📁 llm_storage
         📄 llm_storage.lock
     📁 stories
         📄 stories.lock
 📁 llm_storage
     📁 prompts
 📁 stories
     📁 prompts
 📁 metadata
     📄 active_log.json
     📄 log.txt
     📄 tables_history.json
     📄 LOG.lock
     📄 tables_multiple.json
     📄 columns_history.json
 📁 code_functions


In [23]:
# View example log and metadata files
log_file = 'test_database/metadata/log.txt'
with open(log_file, 'r') as f:
    content = f.read()
    print(content)

{"process_id": "b2945c09-5666-479f-90d1-cf7978cf4677", "author": "command_line", "start_time": 1738336152.337733, "log_time": 1738336152.3392901, "table_name": "stories", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738336152.338526], "data": {"allow_multiple": false}, "success": true}
{"process_id": "a1abb1de-b12a-49c3-a2a5-6606ed1f13fd", "author": "command_line", "start_time": 1738336152.93056, "log_time": 1738336152.9323149, "table_name": "llm_storage", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738336152.931417], "data": {"allow_multiple": false}, "success": true}
{"process_id": "b6620f3e-b760-412b-8493-9a6f592fa123", "author": "command_line", "start_time": 1738336153.520171, "log_time": 1738336153.5217128, "table_name": "llm_questions", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1

In [24]:
import json
log_file = 'test_database/metadata/tables_multiple.json'
with open(log_file, 'r') as f:
    content = json.load(f)
    print(content)

{'stories': False, 'llm_storage': False, 'llm_questions': True}
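
Each line of `log.txt` printed above is a standalone JSON object, so the log can be read line by line. The sketch below parses the two complete entries from that output and reports how long each operation took; the `parse_log` helper and the `sample_lines` list are illustrative names, not part of tablevault.

```python
import json

# The two complete entries copied from the log output above
sample_lines = [
    '{"process_id": "b2945c09-5666-479f-90d1-cf7978cf4677", "author": "command_line", "start_time": 1738336152.337733, "log_time": 1738336152.3392901, "table_name": "stories", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738336152.338526], "data": {"allow_multiple": false}, "success": true}',
    '{"process_id": "a1abb1de-b12a-49c3-a2a5-6606ed1f13fd", "author": "command_line", "start_time": 1738336152.93056, "log_time": 1738336152.9323149, "table_name": "llm_storage", "instance_id": "", "restarts": [], "operation": "setup_table", "complete_steps": ["write_log"], "step_times": [1738336152.931417], "data": {"allow_multiple": false}, "success": true}',
]


def parse_log(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


entries = parse_log(sample_lines)
for entry in entries:
    # Time between the start of the operation and the moment it was logged
    elapsed = entry["log_time"] - entry["start_time"]
    print(entry["table_name"], entry["operation"], entry["success"], f"{elapsed:.4f}s")
```

To read the real file instead, pass the open file handle for `test_database/metadata/log.txt` to `parse_log`.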


**Example Active Log**

Shown as text because the file is deleted once the operation has executed.

# Make Table Instances

## First, we add the prompts to each table's "prompts" directory

In [25]:
import shutil

def copy_files_to_table(base_dir, db_dir, table_name):
    """Copy every YAML prompt file for table_name into that table's 'prompts' folder."""
    org_path = os.path.join(base_dir, table_name)
    new_path = os.path.join(db_dir, table_name, 'prompts')
    for file in os.listdir(org_path):
        if file.endswith('.yaml'):
            org_path_ = os.path.join(org_path, file)
            new_path_ = os.path.join(new_path, file)
            # Replace any existing copy of the prompt file
            if os.path.exists(new_path_):
                os.remove(new_path_)
            shutil.copy2(org_path_, new_path_)

yaml_base_dir = './test_data/test_data_db'
db_dire = './test_database'

copy_files_to_table(yaml_base_dir, db_dire, "stories")
copy_files_to_table(yaml_base_dir, db_dire, "llm_storage")
copy_files_to_table(yaml_base_dir, db_dire, "llm_questions")

## Then we make instances with specified prompts

In [26]:
# Make Temporary (Not executed) Instances

!tablevault setup-temp -db test_database -t stories -p fetch_stories -gp fetch_stories

!tablevault setup-temp -db test_database -t llm_storage -p fetch_llm_storage -p upload_openai -gp fetch_llm_storage

!tablevault setup-temp -db test_database -t llm_questions -v version_1 \
-p fetch_llm_question -p question_1 -p question_2 -p question_3 -gp fetch_llm_question


In [27]:
# View Current Folder Structure
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

 📁 TEMP_
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


In [28]:

rel_path = './test_database/stories/TEMP_/prompts/description.yaml'

with open(rel_path, "r") as file:
    yaml_data = yaml.safe_load(file)
yaml_data

{'copied_prompts': ['fetch_stories'], 'table_generator': 'fetch_stories'}

# Execute Table Instances

In [30]:
# Make PDF Table
! tablevault execute -db test_database -t stories
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
{'all_columns': ['paper_name', 'paper_path'],
 'external_deps': {'fetch_stories': []},
 'gen_columns': ['paper_name', 'paper_path'],
 'internal_prompt_deps': {'fetch_stories': []},
 'origin': None,
 'perm_instance_id': '_1738336210_hUshR',
 'start_time': 1738336210.472285,
 'to_change_columns': ['paper_name', 'paper_path'],
 'to_execute': ['fetch_stories'],
 'top_pnames': ['fetch_stories']}
execute code
FINISHED STEP
fetch_stories
FINISHED INSTANCE: _1738336210_hUshR
 📁 _1738336210_hUshR
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


In [31]:
import pandas as pd
df_path = 'test_database/stories/_1738336210_hUshR/table.csv'
df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,paper_path
0,The_Clockmakers_Secret,./test_data/stories/The_Clockmakers_Secret.pdf


In [32]:
# Make OpenAI Storage File (upload pdf to OpenAI) (Allow Multiple Threads)
! tablevault execute -db test_database -t llm_storage

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
{'all_columns': ['paper_name', 'openai_id'],
 'external_deps': {'fetch_llm_storage': [('stories',
                                          'paper_name',
                                          '_1738336210_hUshR',
                                          1738336210.48013,
                                          True)],
                   'upload_openai': [('stories',
                                      'paper_name',
                                      '_1738336210_hUshR',
                                      1738336210.48013,
                                      True),
                                     ('stories',
                                      'paper_path',
                                      '_1738336210_hUshR',
                                      1738336210.48013,
                                      True)]},
 'gen_columns': ['paper_name'],
 'internal_prompt_deps': {'fetch_llm_storage': [],
                       

In [35]:
# View Table
df_path = 'test_database/llm_storage/_1738336228_ltwSq/table.csv'
df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,openai_id
0,The_Clockmakers_Secret,file-LffagJ4CsHeE4Z4phAsW7Q


In [33]:
# Make LLM Question Response Table (Allow Multiple Threads)
! tablevault execute -db test_database -t llm_questions -v version_1

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
{'all_columns': ['paper_name', 'q1', 'q3_1', 'q3', 'q2_1', 'q2'],
 'external_deps': {'fetch_llm_question': [('stories',
                                           'paper_name',
                                           '_1738336210_hUshR',
                                           1738336210.48013,
                                           True)],
                   'question_1': [('llm_storage',
                                   'openai_id',
                                   '_1738336228_ltwSq',
                                   1738336229.295321,
                                   True),
                                  ('llm_storage',
                                   'paper_name',
                                   '_1738336228_ltwSq',
                                   1738336229.295321,
                                   True)],
                   'question_2': [('llm_storage',
                                   'openai_id',
    

In [37]:
df_path = 'test_database/llm_questions/version_1_1738336233_URPVb/table.csv'
pd.set_option("display.max_colwidth", None)

df = pd.read_csv(df_path)
df

Unnamed: 0,paper_name,q1,q3_1,q3,q2_1,q2
0,The_Clockmakers_Secret,"Clara's curiosity is a driving force in her journey and development throughout the story. Initially, her inquisitive nature leads her to discover the clockmaker's shop, a place that seems to reveal itself only to those who truly need it. This curiosity propels her into a world filled with magical clocks that not only measure time but also shape it, challenging her understanding of reality【5:1†source】.\n\nAs Clara visits the shop repeatedly, her curiosity deepens, allowing her to learn about the intricate workings of the clocks and the responsibilities that come with such knowledge. She is entrusted with small tasks by Percival Hawthorne, the clockmaker, which signifies her growing involvement and trustworthiness【5:0†source】. However, her curiosity also leads her to the forbidden Master Clock, where her desire to explore the unknown results in a significant disruption of time itself. This pivotal moment serves as a catalyst for her growth, forcing her to confront the consequences of her actions and the complexities of time【5:0†source】.\n\nThrough her journey, Clara learns valuable lessons about the nature of time, the importance of choices, and the impact of her actions on the world around her. By the end of the story, she emerges with a deeper understanding of time as a precious gift, carrying the lessons from her adventure into her future【5:0†source】. Clara's curiosity, while initially a source of wonder, ultimately becomes a means of personal growth and maturity as she navigates the challenges that arise from her explorations.","Based on the story, Clara can best be classified as a **""Protagonist seeking knowledge.""** Throughout the narrative, she exhibits a strong curiosity and a desire to explore the mysteries of time and the clocks in the shop. Her initial intrigue leads her to repeatedly visit the clockmaker's shop, where she learns about the special clocks and their powers. 
Although her actions lead to unintended consequences, her journey is driven by a quest for understanding and knowledge about time, rather than recklessness or reluctance【5:0†source】.",Protagonist seeking knowledge,"Percival's motivation to help Clara, even after she disrupts time, can be attributed to several reasons:\n\n1. **Sense of Responsibility**: Percival feels a duty to guide Clara through the consequences of her actions. He understands the gravity of tampering with time and takes it upon himself to help her fix the chaos she has caused.\n\n2. **Mentorship**: Throughout their interactions, Percival has developed a mentor-mentee relationship with Clara. He has invested time in teaching her about the clocks and the nature of time, which likely motivates him to assist her in rectifying her mistake.\n\n3. **Empathy**: Percival shows a mix of anger and sorrow when Clara disrupts time, indicating that he empathizes with her plight. He recognizes that her actions were not malicious but stemmed from curiosity and a desire to learn.\n\n4. **Belief in Potential**: Percival may see potential in Clara. By helping her, he not only aids her in correcting her mistake but also helps her grow and learn from the experience, reinforcing the idea that mistakes can lead to valuable lessons.\n\n5. **Connection to Time**: As a clockmaker, Percival has a deep connection to the concept of time. His understanding of its complexities may drive him to ensure that time is restored to its proper flow, as it aligns with his life's work and passion.\n\nThese motivations reflect a blend of personal responsibility, mentorship, empathy, and a commitment to the integrity of time itself【5:0†source】.","['Sense of Responsibility', 'Mentorship', 'Empathy', 'Belief in Potential', 'Connection to Time']"
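
The three tables above were executed in an order that respects their `dependencies` entries: `stories` first, then `llm_storage`, then `llm_questions`. As a minimal sketch (not tablevault's own scheduler), that order can be derived from the dependency lists with the standard-library `graphlib` module (Python 3.9+). The `table_graph` helper is a hypothetical name, and the dependency lists are merged by hand from the YAML files and execution output shown earlier.

```python
from graphlib import TopologicalSorter

# Column-level dependencies per table, taken from the YAML files and output above
dependencies = {
    "stories": [],
    "llm_storage": ["self.paper_name", "stories.paper_name", "stories.paper_path"],
    "llm_questions": [
        "stories.paper_name",
        "llm_storage.openai_id",
        "llm_storage.paper_name",
        "self.paper_name",
    ],
}


def table_graph(deps):
    """Map each table to the set of other tables it reads from, ignoring 'self'."""
    graph = {}
    for table, columns in deps.items():
        sources = {column.split(".")[0] for column in columns}
        graph[table] = sources - {"self"}
    return graph


# static_order() yields every table after all the tables it depends on
order = list(TopologicalSorter(table_graph(dependencies)).static_order())
print(order)
```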


# Table Versioning

By default, a table is not re-executed if nothing has changed. Execution happens only where the rows (samples), columns (prompts), or dependencies (other tables) have changed.
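
To illustrate the prompt (column) case only, here is a minimal sketch of change detection by content fingerprint: a prompt is re-executed when it is new or its definition differs from the previous instance. This is an assumption about how such a check could work, not a description of tablevault's internals; `fingerprint`, `prompts_to_execute`, and the edited `copies` value are hypothetical.

```python
import copy
import hashlib
import json


def fingerprint(prompt):
    """Stable hash of a prompt definition (keys sorted so ordering does not matter)."""
    encoded = json.dumps(prompt, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def prompts_to_execute(previous, current):
    """Return the names of prompts that are new or whose definition changed."""
    return [
        name
        for name, prompt in current.items()
        if name not in previous or fingerprint(previous[name]) != fingerprint(prompt)
    ]


# Prompt definition taken from fetch_stories.yaml above (abbreviated to a few keys)
previous = {
    "fetch_stories": {
        "type": "code",
        "function": "create_paper_table_from_folder",
        "arguments": {"folder_dir": "./test_data/stories", "copies": 1},
    }
}

unchanged = copy.deepcopy(previous)
edited = copy.deepcopy(previous)
edited["fetch_stories"]["arguments"]["copies"] = 2  # hypothetical edit

print(prompts_to_execute(previous, unchanged))
print(prompts_to_execute(previous, edited))
```

A fuller check would also compare the set of rows and the instance of each dependency table, the other two triggers named above.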

In [40]:
# Copy Previous Instances
!tablevault setup-temp -db test_database -t stories -pid _1738336210_hUshR

!tablevault setup-temp -db test_database -t llm_storage -pid _1738336228_ltwSq

!tablevault setup-temp -db test_database -t llm_questions -pid version_1_1738336233_URPVb -v version_1

TEMP_ folder already exists for stories
TEMP_ folder already exists for llm_storage


In [41]:
# Execute new Instances
! tablevault execute -db test_database -t stories
! tablevault execute -db test_database -t llm_storage
! tablevault execute -db test_database -t llm_questions -v version_1

PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
{'all_columns': ['paper_name', 'paper_path'],
 'external_deps': {'fetch_stories': []},
 'gen_columns': ['paper_name', 'paper_path'],
 'internal_prompt_deps': {'fetch_stories': []},
 'origin': '_1738336210_hUshR',
 'perm_instance_id': '_1738336922_ktuop',
 'start_time': 1738336922.105401,
 'to_change_columns': [],
 'to_execute': [],
 'top_pnames': ['fetch_stories']}
execute code
NO UPDATES: NOTHING HAPPENS.
PARSED DEPENDENCIES AND TO EXECUTE PROMPTS
{'all_columns': ['paper_name', 'openai_id'],
 'external_deps': {'fetch_llm_storage': [('stories',
                                          'paper_name',
                                          '_1738336210_hUshR',
                                          1738336210.48013,
                                          True)],
                   'upload_openai': [('stories',
                                      'paper_path',
                                      '_1738336210_hUshR',
                 

In [43]:
print_directory_tree(root_dir = 'test_database/stories', indent=" ")

 📁 _1738336210_hUshR
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 TEMP_
     📄 table.csv
     📁 prompts
         📄 fetch_stories.yaml
         📄 description.yaml
 📁 prompts
     📄 fetch_stories_5.yaml
     📄 fetch_stories.yaml


**NOTE:** If changes do occur (in columns, rows, or dependencies), only the table entries affected by those changes are executed.

# Delete Instance and Delete Table

Deletes are simple but must be run through the command line so that the metadata is updated correctly. By default, nothing is ever deleted.

In [44]:
# Delete table instance 
! tablevault delete-instance -db test_database -t stories -id _1738336210_hUshR

In [48]:
# Delete table
! tablevault delete-table -db test_database -t llm_storage

In [55]:
# Show new folder and metadata
print_directory_tree(root_dir = 'test_database', indent=" ", files = False)

 📁 llm_questions
     📁 TEMP_version_1
         📁 prompts
     📁 version_1_1738336233_URPVb
         📁 prompts
     📁 prompts
 📁 locks
     📁 llm_questions
     📁 RESTART
     📁 stories
 📁 stories
     📁 TEMP_
         📁 prompts
     📁 prompts
 📁 metadata
 📁 code_functions


# Restarts

Not included in this demo because it requires stopping a process mid-execution, but restarts have been tested.