# Tutorial

## RAG tool

The first step of this pipeline is answering a prompt based on data. Specifically, this package takes in structured data (in the form of a CSV file) along with style supplements. When querying a non-RAG model such as ChatGPT, the model often inserts unwanted context (e.g., the history of a medication or disease, or other uses unrelated to the patient's needs) that does not reflect the tone of real doctor's notes. This tool is designed to mitigate those effects, or to assess the future need for fine-tuning an LLM on patient data.

In [3]:
from code_files import utils as ut
from code_files.telephone_module import telephone as tm
from code_files.rag_module import create_database as crd


In [None]:
# Whenever you change the package code, rebuild and reinstall it.
# Run from the phi_generation project root:
#   python -m build
#   pip install dist/code_files-0.1.0.tar.gz
# Then restart the kernel and rerun the imports.
crd.generate_data_store()

--- Generating Data Store ---


libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


Loaded 3 documents.
Split documents into 97 chunks.
Saved 97 chunks to chroma.
--- Data Store Generation Complete ---


  vector_store.persist()


As we have seen, the RAG is not capable of viewing style context, only of performing retrieval.

To add your style supplements to the database, which is drawn from `code_files/rag_module/data`, structure them as `.md` documents. You can use the built-in functions in `utils.py` to add them.

Any data referenced in your prompt must be present in both the database and the prompt to achieve a relevance score high enough to produce a response. As with your context documents, use the `utils.py` functions to add it.

In [None]:
import csv

csv_file_path = "path/to/your/patient_data.csv"  # Replace with your actual path

try:
    with open(csv_file_path, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        patient_data = list(reader)

    print("\n--- Loaded CSV Data ---")
    for row in patient_data:
        print(row)
    print("--- End Loaded CSV Data ---")

except FileNotFoundError:
    print(f"Error: CSV file not found at {csv_file_path}")
except Exception as e:
    print(f"An error occurred while loading CSV: {e}")

# A Tutorial for the phi_generation Library: Data-Driven Prompting and RAG (Direct Function Calls)

## Introduction

This tutorial demonstrates the key features of the `phi_generation` library, focusing on data ingestion, preparing data for a Retrieval-Augmented Generation (RAG) pipeline, and querying the RAG database with optional CSV context.

The library aims to provide focused and relevant answers from language models by grounding them in specific data.

**Note:** This tutorial assumes you have the `phi_generation` library installed.

After making changes to the code, you need to rebuild and reinstall the package.
Run the following commands in your `phi_generation` project root:

In [None]:
# The leading "!" runs each line as a shell command; without it the cell raises a SyntaxError
!python -m build
!pip install dist/code_files-0.1.0.tar.gz --upgrade

## 1. Preparing Data from CSV for RAG

This section demonstrates how to convert structured data in a CSV file into a Markdown format suitable for the RAG database using the `utils` module.

In [None]:
import os
from code_files import utils as ut

# Define the path to your CSV file
csv_file_path = r"C:\Users\noliv\Downloads\structured_data_filled.csv"  

# Convert the CSV to a Markdown table
markdown_table = ut.csv_to_markdown_table(csv_file_path)

The conversion function reports failure by returning a string that starts with "Error", so check the result before saving it:

In [None]:
if isinstance(markdown_table, str) and markdown_table.startswith("Error"):
    print(markdown_table)
else:
    print("Successfully converted CSV to Markdown table.")
    print("\n--- Markdown Table Preview ---")
    print(markdown_table[:500] + "...\n--- End Preview ---") # Preview a part of the table

    # Save the Markdown table to the data directory
    save_message = ut.var_markdown_to_data_dir(markdown_table, filename="from_csv_data.md")
    print(save_message)

    # Copy the generated Markdown file to the RAG data directory
    # Assuming your notebook is in the project root
    local_md_path = "code_files/rag_module/data/from_csv_data.md"
    ut.local_markdown_to_data_dir(local_md_path)

Successfully converted CSV to Markdown table.

--- Markdown Table Preview ---
| Patient Name   |   Patient Age |   Alzheimers |   Anxiety |   Arthritis |   Behavior |   Bipolar |   Cannabis |   Cardio |   Chronic Disease |   Depression |   Diabetes |   Dieting |   Disabilities |   Drug-Induced Delirium |   Exercise |   Gastrointestinal |   Getting Worse or Not Better |   Hospital Admission |   Hospital Readmission |   Hypertension |   Kidney Disease |   Long-Term Care |   Memory Care |   Mental Health Questionnaire |   Obesity/Metabolic |   Osteoarthritis |   Pain |   Pre...
--- End Preview ---
Markdown written to c:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\..\code_files/rag_module/data\from_csv_data.md
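The internals of `csv_to_markdown_table` are not shown in this tutorial. As a rough illustration of the conversion step, here is a minimal standard-library sketch. The function name `rows_to_markdown_table` and the three-column sample are hypothetical and are not part of `phi_generation`.

```python
import csv
import io


def rows_to_markdown_table(rows):
    """Render a list of dicts (as produced by csv.DictReader) as a Markdown table."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Escape pipes so a cell value cannot break the table layout.
        cells = [str(row.get(h) or "").replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Two rows shaped like the tutorial's patient data, reduced to three columns.
sample_csv = "Patient Name,Patient Age,Anxiety\nKathryn,26,1\nMichael,38,0\n"
sample_rows = list(csv.DictReader(io.StringIO(sample_csv)))
table = rows_to_markdown_table(sample_rows)
print(table)
```

Keeping one patient per table row means each chunk stored in the database still carries the column names only if the header ends up in the same chunk, which is worth checking when the table is wide.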


## 2. Adding Additional Documents to the Data Directory (Optional)

You can add more Markdown files to the 'data' directory, which will be processed by the database creation step.

In [None]:
# Example 1: Adding content directly
document_content = """
# This is an example document

This document contains some additional information for the RAG database.
"""
ut.var_markdown_to_data_dir(document_content, filename = "sample_text.md")
print("Added sample_text.md to the data directory.")

Added sample_text.md to the data directory.


In [None]:
# Example 2: Copying a local Markdown file

# Replace this with the path to your local .md file; the same function can also add many files from one path at once

# This file is a PDF, not an .md, so the call returns an error message
local_file_to_copy = r"C:\Users\noliv\Downloads\s41592-022-01728-4.pdf" 
copy_result = ut.local_markdown_to_data_dir(local_file_to_copy, "code_files/data", enforce_md=True)
print(copy_result)

Error: Only files with the '.md' extension can be copied.
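The extension check above can be pictured with a small stand-alone sketch. It assumes the helper returns status messages rather than raising, as the output above suggests. The function `copy_markdown` is hypothetical and only imitates the behaviour of `local_markdown_to_data_dir`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_markdown(src, dest_dir, enforce_md=True):
    """Copy src into dest_dir and return a status message instead of raising."""
    src = Path(src)
    if enforce_md and src.suffix.lower() != ".md":
        return "Error: Only files with the '.md' extension can be copied."
    if not src.is_file():
        return f"Error: File not found at {src}"
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    shutil.copy2(src, target)
    return f"Copied {src.name} to {target}"


# Demonstrate both outcomes in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "style_guide.md"
    note.write_text("# Style guide\n", encoding="utf-8")
    data_dir = Path(tmp) / "data"
    ok_message = copy_markdown(note, data_dir)
    rejected_message = copy_markdown(Path(tmp) / "paper.pdf", data_dir)
    copied_exists = (data_dir / "style_guide.md").exists()

print(ok_message)
print(rejected_message)
```

Checking the extension before checking that the file exists means a wrong file type is reported even when the path is also wrong.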


## 3. Creating the RAG Database

Now that we have our data in the 'data' directory (including the Markdown generated from the CSV and any additional documents), we'll create the Chroma vector database using the `create_database` module.

In [None]:
from code_files.rag_module import create_database as crd

# Run the database generation process
crd.generate_data_store()
print("\nRAG database generation complete.")

--- Generating Data Store ---


libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


Loaded 5 documents.
Split documents into 100 chunks.
Error saving to Chroma: [WinError 32] The process cannot access the file because it is being used by another process: 'chroma\\daee5f4c-9b22-493e-b188-e19f8df24101\\data_level0.bin'
--- Data Store Generation Complete ---

RAG database generation complete.
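The "Split documents into N chunks" line in the log refers to cutting each document into overlapping pieces before embedding. The splitter that `generate_data_store` actually uses is not shown here, so the following is only a character-window sketch of the idea. The function name and the `chunk_size` and `overlap` values are assumptions chosen for illustration.

```python
def split_into_chunks(text, chunk_size=300, overlap=100):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


document = "Subjective: patient reports anxiety. " * 20
chunks = split_into_chunks(document, chunk_size=300, overlap=100)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.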


## 4. Querying the RAG Database with Optional CSV Context (Direct Function Call)

This section demonstrates querying the RAG database using the `query_rag_with_csv` function from the `query_data` module.


The first query is deliberately unhelpful, and we expect the response to say as much.

In [None]:
from code_files.rag_module import query_data as qd

# Define the query text
query = "Answer the question based on the information provided."

In [None]:
# Option A: Query without providing a CSV file
print("\n--- Querying without CSV Context ---")
response_no_csv = qd.query_rag_with_csv(query)
print(response_no_csv)


--- Querying without CSV Context ---


  db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)
  response_text = model.predict(prompt)


Response: I'm sorry, but you have not provided any specific question or CSV data to work with. Can you please provide more details or information related to your question?
Sources: ['C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md', 'C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md', 'C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md']


In [None]:

# Option B: Query with a CSV file providing additional context (Direct Function Call)
print("\n--- Querying with CSV Context ---")

better_query = "Create a patient profile for Kathryn in SOAP format using the structured patient data provided for Kathryn in the .csv file attached, as well as style supplements for writing a SOAP note in the database. Do not list features of patient, but rather explain in sentences."
# Replace with your CSV path:
csv_file_for_query = r"C:\Users\noliv\Downloads\structured_data_filled.csv"
response_with_csv = qd.query_rag_with_csv(better_query, csv_file_for_query)
print(response_with_csv)


--- Querying with CSV Context ---
Response: Subjective: Kathryn is a 26-year-old individual who reports experiencing anxiety and behavioral issues. She denies a history of Alzheimer's Disease and chronic diseases.

Objective: Kathryn's structured patient data indicates that she does not have Alzheimer's Disease, diabetes, or disabilities. She does, however, report anxiety, behavior problems, and a regular exercise routine.

Assessment: Kathryn's overall physical health appears stable, with a focus on managing anxiety and behavioral concerns.

Plan: Recommend continued monitoring of anxiety symptoms and behavior patterns, as well as encouraging the maintenance of regular exercise regimen for overall well-being. Consider further evaluation or interventions if necessary for managing anxiety and behavior issues.
Sources: ['C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\meta_report_apso_soap.md', 'C:\\Users\\noliv\\phi_generation_repo_desti

This looks promising, and it even drew from the style guide without metadata indicating that it was a style guide (the functions that add to the database have options for this). Note, however, that the Subjective and Objective sections are nearly the same. Future tasks include adjusting the context window, trying various temperatures (and observing error propagation), and improving the prompting.
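One way to picture what a query with CSV context involves is a prompt that places the retrieved chunks and the CSV rows side by side. The template and the `build_prompt` function below are hypothetical. The prompt that `query_rag_with_csv` actually builds is not shown in this tutorial and may differ.

```python
import csv
import io

PROMPT_TEMPLATE = """Answer the question using only the context below.

Retrieved context:
{context}

Structured patient data (CSV):
{csv_block}

Question: {question}
"""


def build_prompt(question, retrieved_chunks, csv_text=None):
    """Combine retrieved chunks and optional CSV rows into a single prompt string."""
    context = "\n---\n".join(retrieved_chunks) if retrieved_chunks else "(none)"
    if csv_text:
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        # One line per patient, written as column=value pairs.
        csv_block = "\n".join(
            ", ".join(f"{k}={v}" for k, v in row.items()) for row in rows
        )
    else:
        csv_block = "(no CSV provided)"
    return PROMPT_TEMPLATE.format(context=context, csv_block=csv_block, question=question)


prompt = build_prompt(
    "Create a SOAP note for Kathryn.",
    ["Plan:", "Objective:"],
    "Patient Name,Patient Age,Anxiety\nKathryn,26,1\n",
)
print(prompt)
```

Keeping the CSV in its own labelled section makes it easier to experiment with the prompting improvements mentioned above, for example by telling the model which section holds facts and which holds style guidance.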

## 5. Comparing Embeddings (Optional: Direct Function Call for Error Handling)

The `compare_embeddings` module allows you to query the database directly and see the retrieved documents along with their relevance scores.

In [None]:
from code_files.rag_module import compare_embeddings as ce
print("\n--- Comparing Embeddings ---")
query_for_comparison = "What are some key details?"
embedding_comparison_results = ce.compare_embeddings_query(query_for_comparison)
print(embedding_comparison_results)


--- Comparing Embeddings ---
--- Matching Results ---
Score: 0.7201408587167824
Content: Plan:
Source: C:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\rag_module\data\connor_soap_therapy.md
---
Score: 0.7193233152227922
Content: Guiding Principles
Source: C:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\rag_module\data\meta_report_apso_soap.md
---
Score: 0.710408245238374
Content: Objective:
Source: C:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\rag_module\data\connor_soap_therapy.md
---
--- Query Ended ---
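To make the scores above less opaque, here is a toy ranking by cosine similarity with a cut-off. The three-dimensional vectors, the `min_score` value of 0.7, and both function names are illustrative assumptions. The metric and threshold used by `compare_embeddings_query` depend on how the Chroma store is configured and are not confirmed by this tutorial.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, chunk_vecs, k=3, min_score=0.7):
    """Return up to k (score, name) pairs, best first, dropping scores below min_score."""
    scored = [(cosine_similarity(query_vec, v), name) for name, v in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [(score, name) for score, name in scored[:k] if score >= min_score]


# Toy embeddings standing in for real model output.
toy_chunks = {
    "Plan:": [1.0, 0.0, 1.0],
    "Objective:": [1.0, 1.0, 1.0],
    "Unrelated": [0.0, 1.0, 0.0],
}
matches = top_matches([1.0, 0.0, 1.0], toy_chunks)
for score, name in matches:
    print(f"Score: {score:.4f}  Content: {name}")
```

Very short chunks such as bare section headings can still score well, as in the results above, which is one reason to inspect retrieved content and not only the scores.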



## 6. Generating Structured to Unstructured PHI

This section uses the `st_to_unst` module to query a vanilla LLM (i.e., without RAG) to create a doctor's record from a .csv file containing structured PHI for one or more patients. For more specific formatting instructions, such as record templates, compare results qualitatively and with the telephone module. 

It shows how to use the module to:
1. Generate patient health records from structured CSV data.
2. Create an AnnData object to store both structured and unstructured data.
3. Convert the unstructured data to a JSON file.

If you do not have CSV data at hand, you can use the following example CSV:

In [None]:
# ```csv
# PatientID,Name,Age,Gender,ConditionA,ConditionB,MedicationX,MedicationY
# 1,Alice,30,Female,0,1,1,0
# 2,Bob,25,Male,1,0,0,1
# ```
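The three steps listed above can be sketched offline with the example CSV. This sketch replaces the LLM call with a fixed sentence template so that it runs without an API key, and it uses a plain dict in place of an AnnData object. The names `describe_patient`, `ID_COLUMNS`, and `records` are hypothetical and are not part of `st_to_unst`.

```python
import csv
import io
import json

EXAMPLE_CSV = """PatientID,Name,Age,Gender,ConditionA,ConditionB,MedicationX,MedicationY
1,Alice,30,Female,0,1,1,0
2,Bob,25,Male,1,0,0,1
"""

# Columns that identify or describe the patient and are not 0/1 flags.
ID_COLUMNS = ["PatientID", "Name", "Age", "Gender"]


def describe_patient(row):
    """Turn one CSV row into a plain-language summary of its 0/1 flags."""
    present = [k for k, v in row.items() if k not in ID_COLUMNS and v == "1"]
    absent = [k for k, v in row.items() if k not in ID_COLUMNS and v == "0"]
    return (
        f"{row['Name']} is a {row['Age']}-year-old {row['Gender'].lower()} patient. "
        f"Flags set: {', '.join(present) or 'none'}. "
        f"Flags not set: {', '.join(absent) or 'none'}."
    )


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
# Map each patient ID to its unstructured text, then serialise the mapping to JSON.
records = {row["PatientID"]: describe_patient(row) for row in rows}
records_json = json.dumps(records, indent=2)
print(records_json)
```

A template like this can also serve as a deterministic baseline when comparing LLM-generated records qualitatively.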

In [None]:
from code_files.st_to_unst_module import st_to_unst as sun
import pandas as pd

# Define the path to your patient data CSV file
patient_csv_file = r"C:\Users\noliv\Downloads\structured_data_filled.csv" # Replace with the actual path


# 1. Generate an AnnData object
print("\n--- Creating AnnData Object ---")
try:
    adata = sun.create_anndata_from_csv(patient_csv_file, patient_id_columns=["PatientID", "Name"])
    print(adata)
    print("\nSample patient records:")
    print(adata.obs.head())
    print("\nSample unstructured data:")
    print(adata.X[:5])  # slicing works whether adata.X is an array or a DataFrame; arrays have no .head()
except FileNotFoundError:
    print(f"Error: CSV file not found at {patient_csv_file}")
except Exception as e:
    print(f"An error occurred: {e}")


--- Creating AnnData Object ---


  llm = OpenAI(temperature=0.7)  # You can adjust temperature
  report = llm.predict(introduction_prompt)


AnnData object with n_obs × n_vars = 7 × 2
    obs: 'Patient Name', 'Patient Age', 'Alzheimers', 'Anxiety', 'Arthritis', 'Behavior', 'Bipolar', 'Cannabis', 'Cardio', 'Chronic Disease', 'Depression', 'Diabetes', 'Dieting', 'Disabilities', 'Drug-Induced Delirium', 'Exercise', 'Gastrointestinal', 'Getting Worse or Not Better', 'Hospital Admission', 'Hospital Readmission', 'Hypertension', 'Kidney Disease', 'Long-Term Care', 'Memory Care', 'Mental Health Questionnaire', 'Obesity/Metabolic', 'Osteoarthritis', 'Pain', 'Prediabetes', 'Quality of Life', 'Semaglutide', 'Sleep', 'Stress'
    uns: 'csv_filename'

Sample patient records:
  Patient Name Patient Age Alzheimers Anxiety Arthritis Behavior Bipolar  \
0      Kathryn          26          0       1       N/A        1       0   
1      Michael          38          0       0         0        0       0   
2        Kevin          17          0       1         0        0       0   
3        Megan          25          0       1         0        



In [None]:
from code_files.st_to_unst_module import st_to_unst as sun
import pandas as pd

# Define the path to your patient data CSV file
patient_csv_file = r"C:\Users\noliv\Downloads\structured_data_filled.csv" # Replace with the actual path



print("\n--- Creating AnnData Object from CSV ---")
try:
    adata = sun.create_anndata_from_csv(patient_csv_file, patient_id_columns=["PatientID", "Name"])

    # Explore the AnnData object
    print(adata)  # Print basic information
    print("\nSample of structured patient data (adata.obs):")
    print(adata.obs.head())  # Show the first few rows of the original CSV data

    print("\nSample of unstructured patient reports (adata.X):")
    if isinstance(adata.X, pd.DataFrame):
        print(adata.X.head())
    else:
        print("adata.X is not a DataFrame. Showing raw data:")
        print(adata.X)  # Or print(adata.X.toarray()) if it's a sparse matrix

except FileNotFoundError:
    print(f"Error: CSV file not found at {patient_csv_file}")
except Exception as e:
    print(f"An error occurred: {e}")


--- Creating AnnData Object from CSV ---
AnnData object with n_obs × n_vars = 7 × 2
    obs: 'Patient Name', 'Patient Age', 'Alzheimers', 'Anxiety', 'Arthritis', 'Behavior', 'Bipolar', 'Cannabis', 'Cardio', 'Chronic Disease', 'Depression', 'Diabetes', 'Dieting', 'Disabilities', 'Drug-Induced Delirium', 'Exercise', 'Gastrointestinal', 'Getting Worse or Not Better', 'Hospital Admission', 'Hospital Readmission', 'Hypertension', 'Kidney Disease', 'Long-Term Care', 'Memory Care', 'Mental Health Questionnaire', 'Obesity/Metabolic', 'Osteoarthritis', 'Pain', 'Prediabetes', 'Quality of Life', 'Semaglutide', 'Sleep', 'Stress'
    uns: 'csv_filename'

Sample of structured patient data (adata.obs):
  Patient Name Patient Age Alzheimers Anxiety Arthritis Behavior Bipolar  \
0      Kathryn          26          0       1       N/A        1       0   
1      Michael          38          0       0         0        0       0   
2        Kevin          17          0       1         0        0       0  



In [None]:
import csv

csv_file_path =  r"C:\Users\noliv\Downloads\structured_data_filled.csv" # Replace with your actual path

try:
    with open(csv_file_path, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        patient_data = list(reader)

    print("\n--- Loaded CSV Data ---")
    for row in patient_data:
        print(row)
    print("--- End Loaded CSV Data ---")

except FileNotFoundError:
    print(f"Error: CSV file not found at {csv_file_path}")
except Exception as e:
    print(f"An error occurred while loading CSV: {e}")


--- Loaded CSV Data ---
{'Patient Name': 'Kathryn', 'Patient Age': '26', 'Alzheimers': '0', 'Anxiety': '1', 'Arthritis': 'N/A', 'Behavior': '1', 'Bipolar': '0', 'Cannabis': '1', 'Cardio': '1', 'Chronic Disease': '0', 'Depression': 'N/A', 'Diabetes': '0', 'Dieting': '0', 'Disabilities': '0', 'Drug-Induced Delirium': '0', 'Exercise': '0', 'Gastrointestinal': '0', 'Getting Worse or Not Better': '0', 'Hospital Admission': '0', 'Hospital Readmission': '0', 'Hypertension': '0', 'Kidney Disease': '0', 'Long-Term Care': '0', 'Memory Care': '0', 'Mental Health Questionnaire': '0', 'Obesity/Metabolic': '0', 'Osteoarthritis': '0', 'Pain': '0', 'Prediabetes': '0', 'Quality of Life': '0', 'Semaglutide': '0', 'Sleep': '0', 'Stress': '0'}
{'Patient Name': 'Michael', 'Patient Age': '38', 'Alzheimers': '0', 'Anxiety': '0', 'Arthritis': '0', 'Behavior': '0', 'Bipolar': '0', 'Cannabis': '0', 'Cardio': '1', 'Chronic Disease': '0', 'Depression': '0', 'Diabetes': '0', 'Dieting': '0', 'Disabilities': '0', '
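Every value in the loaded rows is a string, and some flags are `'N/A'` in place of `'0'` or `'1'`. Before passing such rows to a model it may help to make that three-way distinction explicit. The sketch below is one hedged way to do so. The `parse_flag` helper is hypothetical, and the row is a shortened copy of Kathryn's record from the output above.

```python
def parse_flag(value):
    """Map '1'/'0' to True/False and anything else (e.g. 'N/A', '') to None."""
    text = (value or "").strip()
    if text == "1":
        return True
    if text == "0":
        return False
    return None


row = {
    "Patient Name": "Kathryn",
    "Patient Age": "26",
    "Anxiety": "1",
    "Arthritis": "N/A",
    "Diabetes": "0",
}
# Convert only the flag columns; name and age stay as they are.
flags = {k: parse_flag(v) for k, v in row.items() if k not in ("Patient Name", "Patient Age")}
unknown = [k for k, v in flags.items() if v is None]
print(flags)
print("Unknown:", unknown)
```

Separating "unknown" from "absent" lets a prompt avoid claiming that a patient denies a condition when the data simply has no entry for it.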


## Conclusion

This tutorial demonstrated how to use the `phi_generation` library to ingest data from CSV files and local Markdown files, build a RAG database, and query it with the option to include CSV data directly in the prompt. The `utils` module provides helpful functions for data handling, and the direct function calls offer a more integrated experience within the Jupyter Notebook. 

# A Tutorial for the phi_generation Library: Data-Driven Prompting and RAG


## Introduction


This tutorial demonstrates the key features of the `phi_generation` library, focusing on data ingestion from CSV, preparing data for a Retrieval-Augmented Generation (RAG) pipeline, and querying the RAG database with CSV context.

The library is designed to address the challenge of language models inserting unwanted context in responses, especially in sensitive domains like medical notes. By grounding the model in specific data and controlling the prompt, this tool aims to provide more focused and relevant answers.



**Note:** This tutorial assumes you have the `phi_generation` library installed.

After making changes to the code, you need to rebuild and reinstall the package.
Run the following commands in your `phi_generation` project root:

In [None]:
# !python -m build
# !pip install dist/phi_generation-0.1.0.tar.gz --upgrade

Then, restart the Jupyter kernel to ensure the updated package is used.

## 1. Preparing Data from CSV for RAG

This section demonstrates how to convert structured data in a CSV file into a format suitable for the RAG database. We'll use the `utils` module to convert the CSV to a Markdown table and then save it as a `.md` file in the data directory.

### Converting CSV to Markdown Table

In [None]:
import os
from code_files import utils as ut

# Define the path to your CSV file (example path; replace it with your own)

csv_file_path = r"C:\Users\noliv\Downloads\structured_data_filled.csv"

# Convert the CSV to a Markdown table
markdown_table = ut.csv_to_markdown_table(csv_file_path)

if isinstance(markdown_table, str) and markdown_table.startswith("Error"):
    print(markdown_table)
else:
    print("Successfully converted CSV to Markdown table.")
    print("\n--- Markdown Table Preview ---")
    print(markdown_table[:500] + "...\n--- End Preview ---") # Preview a part of the table

    # Save the Markdown table to the data directory
    save_message = ut.markdown_table_to_data_dir(markdown_table, filename="from_csv_data.md")
    print(save_message)

    # Copy the generated Markdown file to the RAG data directory
    # Assuming your notebook is in the project root
    local_md_path = "code_files/data/from_csv_data.md"
    ut.copy_markdown_to_rag_data(local_md_path)


Successfully converted CSV to Markdown table.

--- Markdown Table Preview ---
| Patient Name   |   Patient Age |   Alzheimers |   Anxiety |   Arthritis |   Behavior |   Bipolar |   Cannabis |   Cardio |   Chronic Disease |   Depression |   Diabetes |   Dieting |   Disabilities |   Drug-Induced Delirium |   Exercise |   Gastrointestinal |   Getting Worse or Not Better |   Hospital Admission |   Hospital Readmission |   Hypertension |   Kidney Disease |   Long-Term Care |   Memory Care |   Mental Health Questionnaire |   Obesity/Metabolic |   Osteoarthritis |   Pain |   Pre...
--- End Preview ---
Markdown table written to c:\Users\noliv\phi_generation_repo_destination\phi_generation\code_files\..\data\from_csv_data.md
Error: File not found at code_files/data/from_csv_data.md


## 2. Creating the RAG Database

Now that we have our data (potentially supplemented by other `.md` files in the `code_files/rag_module/data` directory), we'll create the Chroma vector database using the `create_database` module. This process loads the documents, splits them into chunks, generates embeddings, and saves them to the `chroma` directory.

In [None]:
from code_files.rag_module import create_database as crd

# Run the database generation process
crd.generate_data_store()
print("\nRAG database generation complete.")

libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


--- Generating Data Store ---
Loaded 3 documents.
Split documents into 97 chunks.
Error saving to Chroma: [WinError 32] The process cannot access the file because it is being used by another process: 'chroma\\daee5f4c-9b22-493e-b188-e19f8df24101\\data_level0.bin'
--- Data Store Generation Complete ---

RAG database generation complete.


## 3. Querying the RAG Database with Optional CSV Context

This section demonstrates how to query the RAG database using the `query_data` module. Notably, this version of the query tool allows you to optionally provide a CSV file path, whose content will be included in the prompt to the language model along with the retrieved context.

### Option A: Querying without a CSV file

Example with an unhelpful query

In [None]:
from code_files.rag_module import query_data as qd

# Define the query text
query = "Answer the question based on the information provided."

# Option A: Query without providing a CSV file
print("\n--- Querying without CSV Context ---")
response_no_csv = qd.query_rag_with_csv(query)
print(response_no_csv)


--- Querying without CSV Context ---
Response: I'm sorry, there is no question provided in the context given. Can you please provide more information or specify the question you would like me to answer?
Sources: ['C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md', 'C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md', 'C:\\Users\\noliv\\phi_generation_repo_destination\\phi_generation\\code_files\\rag_module\\data\\connor_soap_therapy.md']


Example with a more helpful query

### Option B: Querying with a CSV file that provides additional context to the prompt

In [None]:
import subprocess

print("\n--- Querying with CSV Context ---")
csv_file_for_query = "path/to/your/supplemental_data.csv" # Replace with a relevant CSV path

try:
    # Run the query script as a separate process and capture its output
    result = subprocess.run(
        ["python", "code_files/rag_module/query_data.py", query, "--csv_file", csv_file_for_query],
        capture_output=True,
        text=True,
        check=True
    )
    print(result.stdout)
    if result.stderr:
        print(f"Error during query with CSV: {result.stderr}")
except subprocess.CalledProcessError as e:
    # Raised when the script exits with a non-zero status (for example, a bad CSV path)
    print(f"Error running query_data.py with CSV: {e}")
except FileNotFoundError:
    # subprocess.run raises this only when the "python" executable itself cannot be found
    print("Error: the 'python' executable was not found. Ensure it is on your PATH.")



--- Querying with CSV Context ---
Error running query_data.py with CSV: Command '['python', 'code_files/rag_module/query_data.py', 'Answer the question based on the information provided.', '--csv_file', 'path/to/your/supplemental_data.csv']' returned non-zero exit status 1.
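The message above reports only the exit status, not why the script failed. When `subprocess.run` is called with `capture_output=True` and `check=True`, the raised `CalledProcessError` also carries the child's error output in its `stderr` attribute. The sketch below shows this with a stand-in child process. The command and its message are made up for illustration and do not come from `query_data.py`.

```python
import subprocess
import sys

# A stand-in child process that writes to stderr and exits with status 1.
failing_command = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('missing CSV file'); sys.exit(1)",
]

captured_stderr = ""
try:
    subprocess.run(failing_command, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as e:
    # e.stderr holds whatever the child wrote to its standard error stream.
    captured_stderr = e.stderr
    print(f"Exit status: {e.returncode}")
    print(f"Child stderr: {captured_stderr}")
```

Printing `e.stderr` in the `except` branch of the cell above would likely reveal the underlying traceback from the script.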


## 4. Comparing Embeddings (Optional)

The `compare_embeddings` module allows you to query the database directly and see the retrieved documents along with their relevance scores. This can be useful for understanding how well the embeddings are working.

In [None]:
import subprocess

print("\n--- Comparing Embeddings ---")
query_for_comparison = "What are some key details?"

try:
    # Run the comparison script as a separate process and capture its output
    result = subprocess.run(
        ["python", "code_files/rag_module/compare_embeddings.py", query_for_comparison],
        capture_output=True,
        text=True,
        check=True
    )
    print(result.stdout)
    if result.stderr:
        print(f"Error during embedding comparison: {result.stderr}")
except subprocess.CalledProcessError as e:
    # Raised when the script exits with a non-zero status
    print(f"Error running compare_embeddings.py: {e}")
except FileNotFoundError:
    # subprocess.run raises this only when the "python" executable itself cannot be found
    print("Error: the 'python' executable was not found. Ensure it is on your PATH.")


--- Comparing Embeddings ---
Error running compare_embeddings.py: Command '['python', 'code_files/rag_module/compare_embeddings.py', 'What are some key details?']' returned non-zero exit status 1.


## Conclusion

This tutorial demonstrated how to use the `phi_generation` library to ingest data from CSV files, build a RAG database, and query it with the option to include CSV data directly in the prompt. This approach allows for more controlled and context-aware interactions with language models. Remember to replace the placeholder file paths with your actual file locations.