# [SOLUTION] Udaplay Project - Part 01

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is in the `project/starter/games` folder. Each file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
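
Because each JSON file becomes one document and its fields are later stored as metadata, it can help to check a record's shape before indexing. The sketch below is illustrative only: `REQUIRED_FIELDS` and `validate_game` are hypothetical names, and the field list and types are inferred from the example record above rather than from a documented schema.

```python
# Hypothetical helper: field names and types are inferred from the example record.
REQUIRED_FIELDS = {
    "Name": str,
    "Platform": str,
    "Genre": str,
    "Publisher": str,
    "Description": str,
    "YearOfRelease": int,
}


def validate_game(game: dict) -> list[str]:
    """Return a list of problems found in one game record (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in game:
            problems.append(f"missing field: {field}")
        elif not isinstance(game[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator featuring a wide array of cars and tracks.",
    "YearOfRelease": 1997,
}
print(validate_game(example))  # prints []
```

Running such a check over every file before indexing would surface a malformed record early, instead of as a `KeyError` halfway through the load.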


### Setup

In [1]:
# %pip installs into the environment of the running kernel
%pip install chromadb
import chromadb





In [2]:
# Only needed for Udacity workspace
import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
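
The shim above swaps in `pysqlite3` for environments where the bundled SQLite is too old for Chroma. A minimal sketch for checking which SQLite version the interpreter links against follows. The 3.35.0 threshold is an assumption based on Chroma's documented minimum at the time of writing, and `sqlite_is_recent_enough` is a hypothetical helper name.

```python
import sqlite3

# Assumed minimum; confirm against the Chroma docs for the version you install.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_recent_enough(minimum=MIN_SQLITE):
    """Return True if the linked SQLite library is at least `minimum`."""
    return sqlite3.sqlite_version_info >= minimum


print(f"SQLite version: {sqlite3.sqlite_version}")
print(f"Meets assumed minimum: {sqlite_is_recent_enough()}")
```

If the check reports `False`, the `pysqlite3` swap shown in the cell above is one way to get a newer SQLite.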

In [3]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv

In [4]:
# Load environment variables from .env file
# Make sure you have created a .env file with:
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"

load_dotenv()

# Verify that the API keys are loaded
assert os.getenv('OPENAI_API_KEY') is not None, "OPENAI_API_KEY not found in .env file"
assert os.getenv('TAVILY_API_KEY') is not None, "TAVILY_API_KEY not found in .env file"

print("✓ Environment variables loaded successfully")

✓ Environment variables loaded successfully
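
The assertions above stop at the first missing key. As an alternative, the sketch below reports every missing key in one pass. `REQUIRED_KEYS` and `missing_env_keys` are hypothetical names, and the key list simply mirrors the two keys asserted in the cell above.

```python
import os

# Keys mirrored from the assertions in the cell above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_env_keys(keys=REQUIRED_KEYS, environ=None):
    """Return the names in `keys` that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [key for key in keys if not env.get(key)]


missing = missing_env_keys()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set")
```

Unlike `assert`, this check still runs when Python is started with `-O`, which strips assertions.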


### VectorDB Instance

In [5]:
# Create a persistent ChromaDB client
# This will store the database in a folder called 'chromadb' in the current directory
chroma_client = chromadb.PersistentClient(path="chromadb")

print("✓ ChromaDB client created successfully")

✓ ChromaDB client created successfully


### Collection

In [6]:
# Set up the OpenAI embedding function
# The model is set explicitly to text-embedding-ada-002
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv('OPENAI_API_KEY'),
    model_name="text-embedding-ada-002"
)

print("✓ Embedding function configured")

✓ Embedding function configured


In [7]:
# Try to get existing collection or create a new one
try:
    # Try to get existing collection
    collection = chroma_client.get_collection(
        name="udaplay",
        embedding_function=embedding_fn
    )
    print(f"✓ Loaded existing collection 'udaplay' with {collection.count()} documents")
except Exception:
    # The exact exception type for a missing collection varies across chromadb versions;
    # create a new collection if it doesn't exist
    collection = chroma_client.create_collection(
        name="udaplay",
        embedding_function=embedding_fn,
        metadata={"description": "Video game information database for UdaPlay agent"}
    )
    print("✓ Created new collection 'udaplay'")

✓ Loaded existing collection 'udaplay' with 0 documents
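
The get-then-create pattern above can also be written with the client's `get_or_create_collection` method, which avoids the `try`/`except` entirely. The sketch below wraps that call in a hypothetical helper, `open_collection`. It takes the client and embedding function as arguments, so it assumes nothing about how they were built.

```python
def open_collection(client, embedding_fn, name="udaplay"):
    """Return the named collection, creating it if it does not exist yet."""
    return client.get_or_create_collection(
        name=name,
        embedding_function=embedding_fn,
        metadata={"description": "Video game information database for UdaPlay agent"},
    )


# Usage with the objects created in the cells above:
# collection = open_collection(chroma_client, embedding_fn)
# print(f"Collection 'udaplay' holds {collection.count()} documents")
```

The trade-off is that this version no longer reports whether the collection was loaded or newly created; `collection.count()` remains available to tell whether it still needs to be populated.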


### Add documents

In [8]:
# Load and index all game data from JSON files
data_dir = "games"

# Check if collection is empty before adding documents
if collection.count() == 0:
    print("Adding documents to collection...")
    
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue

        file_path = os.path.join(data_dir, file_name)
        with open(file_path, "r", encoding="utf-8") as f:
            game = json.load(f)

        # Create a rich text representation for better semantic search
        content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

        # Use file name (like 001) as ID
        doc_id = os.path.splitext(file_name)[0]

        # Add document with metadata
        collection.add(
            ids=[doc_id],
            documents=[content],
            metadatas=[game]
        )
        print(f"  Added: {game['Name']} ({game['Platform']})")
    
    print(f"\n✓ Successfully added {collection.count()} documents to the collection")
else:
    print(f"✓ Collection already contains {collection.count()} documents")

Adding documents to collection...
  Added: Gran Turismo (PlayStation 1)
  Added: Grand Theft Auto: San Andreas (PlayStation 2)
  Added: Gran Turismo 5 (PlayStation 3)
  Added: Marvel's Spider-Man (PlayStation 4)
  Added: Marvel's Spider-Man 2 (PlayStation 5)
  Added: Pokémon Gold and Silver (Game Boy Color)
  Added: Pokémon Ruby and Sapphire (Game Boy Advance)
  Added: Super Mario World (Super Nintendo Entertainment System (SNES))
  Added: Super Mario 64 (Nintendo 64)
  Added: Super Smash Bros. Melee (GameCube)
  Added: Wii Sports (Wii)
  Added: Mario Kart 8 Deluxe (Nintendo Switch)
  Added: Kinect Adventures! (Xbox 360)
  Added: Minecraft (Xbox One)
  Added: Halo Infinite (Xbox Series X|S)

✓ Successfully added 15 documents to the collection
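
The loop above calls `collection.add` once per file, so each game is embedded in its own call. Since `add` accepts parallel lists of IDs, documents, and metadata, the records can instead be collected first and added together. The sketch below uses a hypothetical helper, `build_records`, that produces the same IDs and text format as the cell above.

```python
import json
import os


def build_records(data_dir):
    """Read every .json file in data_dir and return (ids, documents, metadatas)."""
    ids, documents, metadatas = [], [], []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        # File name without extension (like 001) is the document ID
        ids.append(os.path.splitext(file_name)[0])
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)
    return ids, documents, metadatas


# Usage with the collection from the cells above (one add call for all files):
# ids, documents, metadatas = build_records("games")
# collection.add(ids=ids, documents=documents, metadatas=metadatas)
```

For 15 files the difference is small, but batching keeps the number of embedding requests down as the dataset grows.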


### Test the Vector Database

In [9]:
# Test semantic search with a sample query
test_query = "racing games for PlayStation"
results = collection.query(
    query_texts=[test_query],
    n_results=3
)

print(f"Query: '{test_query}'\n")
print("Top 3 Results:")
for i, (doc, metadata) in enumerate(zip(results['documents'][0], results['metadatas'][0]), 1):
    print(f"\n{i}. {metadata['Name']}")
    print(f"   Platform: {metadata['Platform']}")
    print(f"   Year: {metadata['YearOfRelease']}")
    print(f"   Genre: {metadata['Genre']}")

Query: 'racing games for PlayStation'

Top 3 Results:

1. Gran Turismo
   Platform: PlayStation 1
   Year: 1997
   Genre: Racing

2. Gran Turismo 5
   Platform: PlayStation 3
   Year: 2010
   Genre: Racing

3. Grand Theft Auto: San Andreas
   Platform: PlayStation 2
   Year: 2004
   Genre: Action-adventure
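
The third hit is not a racing game: a similarity query returns the `n_results` nearest documents whether or not they match the intent of the query. One way to tighten this is to combine the query with a metadata filter through the `where` argument of `collection.query`. The sketch below uses a hypothetical helper, `search_games`, and assumes the `Genre` metadata field holds values like those shown in the results above.

```python
def search_games(collection, query, genre=None, n_results=3):
    """Query the collection, optionally restricted to one Genre value."""
    kwargs = {"query_texts": [query], "n_results": n_results}
    if genre is not None:
        # Metadata filter: only documents whose Genre equals this value are considered
        kwargs["where"] = {"Genre": genre}
    results = collection.query(**kwargs)
    # Result lists are nested per query; index 0 is the single query sent here
    return [
        (meta["Name"], distance)
        for meta, distance in zip(results["metadatas"][0], results["distances"][0])
    ]


# Usage with the collection from the cells above:
# for name, distance in search_games(collection, "racing games for PlayStation", genre="Racing"):
#     print(f"{name}: distance {distance:.3f}")
```

Printing the distance alongside each name also shows how close each hit is; a lower distance means a closer match.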


## Summary

In this notebook, we:
1. ✓ Set up ChromaDB as a persistent vector database
2. ✓ Configured OpenAI embeddings for semantic search
3. ✓ Created a collection to store game information
4. ✓ Loaded and indexed all game data from JSON files
5. ✓ Tested the semantic search functionality

The vector database is now ready to be used in Part 2 for the AI agent implementation.