# A Tutorial for the phi_generation Library: Data-Driven Prompting and RAG (Direct Function Calls)

## Introduction

This tutorial demonstrates the key features of the `phi_generation` library, focusing on data ingestion, preparing data for a Retrieval-Augmented Generation (RAG) pipeline, and querying the RAG database with optional CSV context.

The library aims to provide focused and relevant answers from language models by grounding them in specific data.

**Note:** This tutorial assumes you have the `phi_generation` library installed. Its modules are imported from the `code_files` package, as the cells below show.

If you change the library's code, rebuild and reinstall the package by running the following commands from your `phi_generation` project root:

In [3]:
# The leading "!" runs each line as a shell command; without it, Jupyter parses the line as Python and raises a SyntaxError
!python -m build
!pip install dist/code_files-0.1.0.tar.gz --upgrade
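
After reinstalling, it can be worth confirming that Python can actually find the package before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of `phi_generation`.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# "code_files" is the import name used throughout this tutorial.
if is_importable("code_files"):
    print("code_files is installed and importable.")
else:
    print("code_files was not found; rerun the build and install commands.")
```

If the package was already imported in the current session, restart the notebook kernel after reinstalling so that the new version is loaded.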

## 1. Preparing Data from CSV for RAG

This section demonstrates how to convert structured data in a CSV file into a Markdown format suitable for the RAG database using the `utils` module.

In [1]:
import os
from code_files import utils as ut

# Define the path to your CSV file
csv_file_path = r"C:\Users\noliv\Downloads\structured_data_filled.csv"  

# Convert the CSV to a Markdown table
markdown_table = ut.csv_to_markdown_table(csv_file_path)

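The next cell treats a returned string that starts with "Error" as a failed conversion; otherwise it previews the table and saves it: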

In [2]:
if isinstance(markdown_table, str) and markdown_table.startswith("Error"):
    print(markdown_table)
else:
    print("Successfully converted CSV to Markdown table.")
    print("\n--- Markdown Table Preview ---")
    print(markdown_table[:500] + "...\n--- End Preview ---") # Preview a part of the table

    # Save the Markdown table to the data directory
    save_message = ut.var_markdown_to_data_dir(markdown_table, filename="from_csv_data.md")
    print(save_message)

    # Copy the generated Markdown file to the RAG data directory
    # Assuming your notebook is in the project root
    local_md_path = "code_files/rag_module/data/from_csv_data.md"
    ut.local_markdown_to_data_dir(local_md_path)

Successfully converted CSV to Markdown table.

--- Markdown Table Preview ---
| Patient Name   |   Patient Age |   Alzheimers |   Anxiety |   Arthritis |   Behavior |   Bipolar |   Cannabis |   Cardio |   Chronic Disease |   Depression |   Diabetes |   Dieting |   Disabilities |   Drug-Induced Delirium |   Exercise |   Gastrointestinal |   Getting Worse or Not Better |   Hospital Admission |   Hospital Readmission |   Hypertension |   Kidney Disease |   Long-Term Care |   Memory Care |   Mental Health Questionnaire |   Obesity/Metabolic |   Osteoarthritis |   Pain |   Pre...
--- End Preview ---
Markdown written to c:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\..\code_files/data\from_csv_data.md
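Error: File not found at code_files/rag_module/data/from_csv_data.md

**Note:** The last line of the output above is an error from the copy step. `var_markdown_to_data_dir` wrote the file under `code_files/data`, while `local_md_path` points to `code_files/rag_module/data/from_csv_data.md`, which does not exist. Because the file is already in the data directory, the copy step appears to be unnecessary in this run; if you do copy a file, point `local_md_path` at a file that exists.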
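
To make the conversion step less of a black box, here is a minimal sketch of how CSV text can be rendered as a Markdown table using only the standard library. It is an illustration, not the implementation of `ut.csv_to_markdown_table`: the function name `csv_text_to_markdown` is hypothetical, the sample row is made up, and the column spacing differs from the preview above.

```python
import csv
import io


def csv_text_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def format_row(cells):
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [format_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(format_row(row) for row in body)
    return "\n".join(lines)


# Made-up sample data, for illustration only.
sample = "Patient Name,Patient Age\nJane Doe,70\n"
print(csv_text_to_markdown(sample))
```

A table in this form keeps each value next to its column name, which is what lets a retrieval step match a question against individual rows.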


## 2. Adding Additional Documents to the Data Directory (Optional)

You can add more Markdown files to the 'data' directory, which will be processed by the database creation step.

In [4]:
# Example 1: Adding content directly
document_content = """
# This is an example document

This document contains some additional information for the RAG database.
"""
ut.var_markdown_to_data_dir(document_content, filename="sample_text.md")
print("Added sample_text.md to the data directory.")

Added sample_text.md to the data directory.


In [None]:
# Example 2: Copying a local Markdown file

# Replace this with the path to your local .md file; this function can also add many files from a path at once
local_file_to_copy = "path/to/your/local_document.md"
copy_result = ut.copy_file_to_dir(local_file_to_copy, "code_files/data", enforce_md=True)
print(copy_result)
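
For readers who want to see what a copy step with a Markdown-only check involves, the sketch below shows one way to write it with `shutil` and `pathlib`. It is not the code behind `ut.copy_file_to_dir`; the function name `copy_markdown` and its messages are hypothetical, chosen to mirror the "Error: ..." strings seen earlier in this tutorial.

```python
import shutil
from pathlib import Path


def copy_markdown(src: str, dest_dir: str) -> str:
    """Copy one .md file into dest_dir and return a status message."""
    src_path = Path(src)
    if not src_path.is_file():
        return f"Error: File not found at {src}"
    if src_path.suffix.lower() != ".md":
        return f"Error: {src} is not a Markdown (.md) file"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    target = dest / src_path.name
    shutil.copy2(src_path, target)  # copy2 also preserves file timestamps
    return f"Copied {src_path.name} to {target}"


# Placeholder path from the cell above; replace it with a real file.
print(copy_markdown("path/to/your/local_document.md", "code_files/data"))
```

Checking the extension before copying keeps non-Markdown files out of the data directory, so the database creation step only sees documents in the expected format.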

## 3. Creating the RAG Database

Now that we have our data in the 'data' directory (including the Markdown generated from the CSV and any additional documents), we'll create the Chroma vector database using the `create_database` module.

In [None]:
from code_files.rag_module import create_database as crd

# Run the database generation process
crd.generate_data_store()
print("\nRAG database generation complete.")
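
If the resulting database seems to be missing content, one thing to check is which Markdown files are present in the data directory. The sketch below lists them with `pathlib`. It assumes the data directory is `code_files/data` relative to the project root, as the saved path in Section 1 suggests; the helper name `list_markdown_files` is hypothetical.

```python
from pathlib import Path


def list_markdown_files(data_dir: str) -> list[str]:
    """Return the sorted names of the .md files directly inside data_dir."""
    directory = Path(data_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.md"))


# Assumption: the data directory is code_files/data under the project root.
for name in list_markdown_files("code_files/data"):
    print(name)
```

After Sections 1 and 2, this list would be expected to include `from_csv_data.md` and `sample_text.md`.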

## 4. Querying the RAG Database with Optional CSV Context (Direct Function Call)

This section demonstrates querying the RAG database using the `query_rag_with_csv` function from the `query_data` module.


In [None]:
from code_files.rag_module import query_data as qd

# Define the query text (a placeholder; replace it with a specific question about your data)
query = "Answer the question based on the information provided."

In [None]:
# Option A: Query without providing a CSV file
print("\n--- Querying without CSV Context ---")
response_no_csv = qd.query_rag_with_csv(query)
print(response_no_csv)

In [None]:
# Option B: Query with a CSV file providing additional context (Direct Function Call)
print("\n--- Querying with CSV Context ---")
csv_file_for_query = "path/to/your/supplemental_data.csv" # Replace with a relevant CSV path
response_with_csv = qd.query_rag_with_csv(query, csv_file_for_query)
print(response_with_csv)
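
The difference between the two options comes down to what ends up in the prompt. The sketch below illustrates the general idea of combining retrieved text, optional CSV content, and the question into one prompt string. The tutorial does not show how `query_rag_with_csv` builds its prompt, so the function name `build_prompt` and the template wording are assumptions made for illustration.

```python
from typing import List, Optional


def build_prompt(question: str, retrieved_chunks: List[str],
                 csv_text: Optional[str] = None) -> str:
    """Assemble a prompt from retrieved chunks, optional CSV text, and a question."""
    sections = ["Context from the database:", "\n\n---\n\n".join(retrieved_chunks)]
    if csv_text:
        # The CSV section is added only when CSV content is supplied.
        sections += ["Additional CSV data:", csv_text.strip()]
    sections += ["Question:", question]
    return "\n\n".join(sections)


# Made-up inputs, for illustration only.
chunks = ["First retrieved passage.", "Second retrieved passage."]
print(build_prompt("What are some key details?", chunks, "name,age\nJane Doe,70\n"))
```

Keeping the CSV content in its own labeled section makes it easier to see, when reading a response, whether an answer drew on the database or on the supplemental file.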

## 5. Comparing Embeddings (Direct Function Call)

The `compare_embeddings` module allows you to query the database directly and see the retrieved documents along with their relevance scores.

In [None]:
from code_files.rag_module import compare_embeddings as ce
print("\n--- Comparing Embeddings ---")
query_for_comparison = "What are some key details?"
embedding_comparison_results = ce.compare_embeddings_query(query_for_comparison)
print(embedding_comparison_results)
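
As background for reading relevance scores, the sketch below ranks a few toy vectors by cosine similarity, a common way to compare embeddings. This is a generic illustration with made-up two-dimensional vectors; the scores reported by `compare_embeddings_query` may be computed differently, depending on how the vector database is configured. The function names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
for name, score in rank_by_similarity([1.0, 0.0], docs):
    print(f"{name}: {score:.3f}")
```

In this toy example, `doc_a` points in the same direction as the query and ranks first, `doc_c` partly overlaps and ranks second, and `doc_b` is orthogonal and ranks last.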

## Conclusion

This tutorial demonstrated how to use the `phi_generation` library to ingest data from CSV files and local Markdown files, build a RAG database, query it with the option to include CSV data directly in the prompt, and inspect retrieved documents with their relevance scores. The `utils` module provides helper functions for data handling, and calling the functions directly keeps the whole workflow inside the Jupyter Notebook.