# Query ChromaDB Vector Store

This notebook demonstrates how to query the ChromaDB vector store we created for SQL RAG.

In [112]:
# Import necessary modules
import sys
import os
import pandas as pd

# Add the src directory to path so we can import our modules
sys.path.append("../src")

# Import our ChromaVectorStore class
from rag.vectorstore.chroma import ChromaVectorStore

## Connect to the Existing Vector Store

First, let's connect to our existing ChromaDB vector store that we populated with SQL examples.

In [113]:
# Path to the vector store
vector_store_path = "../src/rag/vectorstore"

# Initialize the vector store with the existing data
vector_store = ChromaVectorStore(persist_directory=vector_store_path)

print("Connected to the ChromaDB vector store")

Connected to the ChromaDB vector store
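
The message above only shows that the constructor returned without raising. Whether a missing or empty directory counts as an error depends on how `ChromaVectorStore` is implemented, which this notebook does not show. A minimal sketch of a pre-check, assuming the store persists to a local directory; `persist_dir_looks_populated` is a hypothetical helper and not part of `ChromaVectorStore`:

```python
from pathlib import Path


def persist_dir_looks_populated(path: str) -> bool:
    """Return True if `path` is an existing directory with at least one entry.

    Hypothetical helper for illustration; it checks the filesystem only
    and knows nothing about ChromaDB's on-disk format.
    """
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


# Same path as in the cell above
vector_store_path = "../src/rag/vectorstore"

if not persist_dir_looks_populated(vector_store_path):
    print(f"Warning: {vector_store_path} is missing or empty")
```

Running this before constructing the store makes a wrong relative path visible early, for example when the notebook is started from a different working directory.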


## Retrieve the Top Matching Results

Now let's query the vector store for the three most relevant results.

In [114]:
# Define a sample question to query the vector store
sample_question = "How many singers do we have?"

# Set k=3 to get the three closest matches
results = vector_store.retrieve_relevant_question_sql(sample_question, k=3)

for result in results:
    print(f"Similarity: {result['similarity']:.4f}")
    print(f"Question: {result['question']}")
    print(f"SQL: {result['sql']}")
    print(f"Schema: {result['schema']}")
    print("---")

Similarity: 1.0000
Question: 
SQL: 
Schema: 
---
Similarity: 0.9430
Question: 
SQL: 
Schema: 
---
Similarity: 0.8723
Question: 
SQL: 
Schema: 
---
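
Each result carries a `similarity` score, so weak matches can be dropped before they are used downstream. A minimal sketch, assuming results are dictionaries shaped like the ones printed above; `filter_by_similarity` is a hypothetical helper and the 0.9 cutoff is an arbitrary illustrative value, not one taken from this project:

```python
def filter_by_similarity(results, min_similarity=0.9):
    """Keep only results whose 'similarity' is at least `min_similarity`."""
    return [r for r in results if r["similarity"] >= min_similarity]


# Scores copied from the output above; the other fields are omitted here.
sample_results = [
    {"similarity": 1.0000},
    {"similarity": 0.9430},
    {"similarity": 0.8723},
]

strong_matches = filter_by_similarity(sample_results, min_similarity=0.9)
print(len(strong_matches))  # 2
```

In the notebook itself, the same function could be applied directly to `results`.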


## View All Available Data

Let's count the examples in the vector store and inspect the first one.

In [111]:
# Get all data from the vector store
all_data = vector_store.fetch_all_vectorstore_data()

print(f"Total number of examples in the vector store: {len(all_data)}")

# Show the first result if any exists
if not all_data.empty:
    print("\nFirst example:")
    print(all_data.iloc[0])
else:
    print(
        "No data found in the vector store. Please run the load_data.py script first."
    )

Total number of examples in the vector store: 1034

First example:
id            ca2b9ef2-4696-4030-9eb3-32b0c2cce8ba
collection                            question_sql
schema                                            
question                                          
sql                                               
Name: 0, dtype: object


In [98]:
print(results)

[{'id': 'ca2b9ef2-4696-4030-9eb3-32b0c2cce8ba', 'schema': '', 'question': '', 'sql': '', 'similarity': 1.0000001192092896}, {'id': '2d4370c5-dac8-430d-9924-943fe95f750e', 'schema': '', 'question': '', 'sql': '', 'similarity': 0.9430298209190369}, {'id': '43165215-fc93-4ffb-8285-77a259ba4165', 'schema': '', 'question': '', 'sql': '', 'similarity': 0.8723382353782654}]
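
The records printed above have empty `question`, `sql`, and `schema` strings. A minimal sketch for flagging such records, assuming the list-of-dicts shape shown in that output; `find_blank_results` is a hypothetical helper, and the second sample record is invented purely to show a non-blank case:

```python
def find_blank_results(results, fields=("question", "sql", "schema")):
    """Return the ids of results where every listed field is empty or whitespace."""
    return [
        r["id"]
        for r in results
        if all(not str(r.get(field, "")).strip() for field in fields)
    ]


sample = [
    # First record, copied from the printed output above
    {
        "id": "ca2b9ef2-4696-4030-9eb3-32b0c2cce8ba",
        "schema": "",
        "question": "",
        "sql": "",
        "similarity": 1.0000001192092896,
    },
    # Hypothetical populated record, for contrast only
    {
        "id": "example-populated-record",
        "schema": "CREATE TABLE singer (id INTEGER)",
        "question": "How many singers do we have?",
        "sql": "SELECT count(*) FROM singer",
        "similarity": 0.5,
    },
]

blank_ids = find_blank_results(sample)
print(blank_ids)
```

Calling `find_blank_results(results)` in the notebook would list every retrieved record whose text fields are all blank, which helps tell a display issue apart from records that were stored without content.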
