# [STARTER] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data lives in the `project/starter/games` folder. Each JSON file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
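
Since every field of a record is later used for the indexed text or the metadata, it can help to check a file before adding it. The following is a minimal sketch and not part of the starter code: `REQUIRED_FIELDS` and `validate_game` are hypothetical names, and the expected types are only inferred from the example above.

```python
import json

# Field names and types inferred from the example record; adjust if your files differ.
REQUIRED_FIELDS = {
    "Name": str,
    "Platform": str,
    "Genre": str,
    "Publisher": str,
    "Description": str,
    "YearOfRelease": int,
}


def validate_game(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


example = json.loads("""
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
""")
print(validate_game(example))  # an empty list for the example record
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a file in a single pass.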


### Setup

In [1]:
# # Only needed for Udacity workspace

# import importlib.util
# import sys

# # Check if 'pysqlite3' is available before importing
# if importlib.util.find_spec("pysqlite3") is not None:
#     import pysqlite3
#     sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [2]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv

In [3]:
# TODO: Create a .env file with the following variables
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"

In [4]:
# Load environment variables
load_dotenv()

# Verify environment variables are loaded
openai_key = os.getenv('OPENAI_API_KEY')
tavily_key = os.getenv('TAVILY_API_KEY')

if openai_key:
    print(f"✅ OpenAI API key loaded: {openai_key[:10]}...")
else:
    print("❌ OpenAI API key not found in environment")

if tavily_key:
    print(f"✅ Tavily API key loaded: {tavily_key[:10]}...")
else:
    print("❌ Tavily API key not found in environment")

# Test Vocareum API connection
print("\n🔗 Testing Vocareum API connection...")
try:
    import openai
    client = openai.OpenAI(
        api_key=openai_key,
        base_url="https://openai.vocareum.com/v1"
    )
    
    # Test with a simple completion
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from UdaPlay!"}],
        max_tokens=10
    )
    
    print(f"✅ Vocareum API working! Response: {response.choices[0].message.content}")
    
except Exception as e:
    print(f"❌ Vocareum API test failed: {e}")
    print("   Make sure you're using the correct Vocareum API key")


✅ OpenAI API key loaded: voc-170784...
✅ Tavily API key loaded: tvly-dev-T...

🔗 Testing Vocareum API connection...
✅ Vocareum API working! Response: Hello UdaPlay! How can I assist you


### VectorDB Instance

In [5]:
# TODO: Instantiate your ChromaDB Client
# Choose any path you want
chroma_client = chromadb.PersistentClient(path="chromadb")
print("✅ Chroma client ready")

✅ Chroma client ready


### Collection

In [6]:
# Pick one embedding function.
# If you choose something other than OpenAI, make sure you pass
# the same embedding function whenever you load this collection later.
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    api_key=os.getenv('OPENAI_API_KEY'),
    api_base="https://openai.vocareum.com/v1"
)

print("✅ OpenAI embedding function created with Vocareum base URL")

✅ OpenAI embedding function created with Vocareum base URL


In [7]:
# TODO: Create a collection (choose any name you want).
# get_or_create_collection is used instead of create_collection so that
# re-running this cell reuses the existing "udaplay" collection instead of raising an error.
collection = chroma_client.get_or_create_collection(
    name="udaplay",
    embedding_function=embedding_fn
)
print("✅ Collection created")


✅ Collection created
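
The collection above is created without specifying a distance metric, so Chroma falls back to its default. If you would rather rank by cosine distance, one option is sketched below. This is an assumption-laden sketch, not part of the starter code: `get_cosine_collection` is a hypothetical helper, and it relies on the `"hnsw:space"` metadata key, which newer Chroma releases may replace with a dedicated configuration argument.

```python
def get_cosine_collection(client, name, embedding_function):
    """Create (or load) a collection that ranks results by cosine distance.

    Assumption: the installed Chroma version honours the "hnsw:space"
    metadata key when the collection is first created.
    """
    return client.get_or_create_collection(
        name=name,
        embedding_function=embedding_function,
        # The metric is fixed at creation time; an existing collection
        # keeps whatever metric it was created with.
        metadata={"hnsw:space": "cosine"},
    )
```

A call such as `get_cosine_collection(chroma_client, "udaplay_cosine", embedding_fn)` would use a fresh name so that the metric actually takes effect; reusing the name of an already created collection would leave its original metric in place.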


### Add documents

In [8]:
# The path is relative to the notebook's working directory, so make sure
# "project/starter/games" exists and the notebook runs from "project/starter".
data_dir = "games"

print("📚 Loading game data into ChromaDB...")
games_loaded = 0

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # You can change what text you want to index
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

    # Use file name (like 001) as ID
    doc_id = os.path.splitext(file_name)[0]

    collection.add(
        ids=[doc_id],
        documents=[content],
        metadatas=[game]
    )
    
    games_loaded += 1
    print(f"  ✅ Loaded: {game['Name']} ({game['Platform']}, {game['YearOfRelease']})")

print(f"\n🎮 Successfully loaded {games_loaded} games into ChromaDB!")
print(f"📊 Collection count: {collection.count()}")

📚 Loading game data into ChromaDB...
  ✅ Loaded: Gran Turismo (PlayStation 1, 1997)
  ✅ Loaded: Grand Theft Auto: San Andreas (PlayStation 2, 2004)
  ✅ Loaded: Gran Turismo 5 (PlayStation 3, 2010)
  ✅ Loaded: Marvel's Spider-Man (PlayStation 4, 2018)
  ✅ Loaded: Marvel's Spider-Man 2 (PlayStation 5, 2023)
  ✅ Loaded: Pokémon Gold and Silver (Game Boy Color, 1999)
  ✅ Loaded: Pokémon Ruby and Sapphire (Game Boy Advance, 2002)
  ✅ Loaded: Super Mario World (Super Nintendo Entertainment System (SNES), 1990)
  ✅ Loaded: Super Mario 64 (Nintendo 64, 1996)
  ✅ Loaded: Super Smash Bros. Melee (GameCube, 2001)
  ✅ Loaded: Wii Sports (Wii, 2006)
  ✅ Loaded: Mario Kart 8 Deluxe (Nintendo Switch, 2017)
  ✅ Loaded: Kinect Adventures! (Xbox 360, 2010)
  ✅ Loaded: Minecraft (Xbox One, 2014)
  ✅ Loaded: Halo Infinite (Xbox Series X|S, 2021)

🎮 Successfully loaded 15 games into ChromaDB!
📊 Collection count: 15
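
The loop above calls `collection.add` once per file, and since the collection is persistent, re-running the cell sends IDs that already exist. A possible alternative is to collect everything first and send a single `upsert`, which overwrites existing IDs. The sketch below assumes the same file layout and document format as the cell above; `load_game_batch` and `upsert_games` are hypothetical helper names.

```python
import json
import os


def load_game_batch(data_dir):
    """Read every .json file in data_dir and return parallel lists for Chroma."""
    ids, documents, metadatas = [], [], []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        ids.append(os.path.splitext(file_name)[0])  # file name without extension
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)
    return ids, documents, metadatas


def upsert_games(collection, data_dir):
    """Send all games in one call; upsert overwrites IDs that already exist."""
    ids, documents, metadatas = load_game_batch(data_dir)
    if ids:  # skip the call entirely when the folder has no JSON files
        collection.upsert(ids=ids, documents=documents, metadatas=metadatas)
    return len(ids)
```

With the objects from this notebook, `upsert_games(collection, "games")` would return the number of games sent. One batched call also means one embedding request instead of one per file, which matters more as the dataset grows.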


### Semantic Search Demonstration

Now let's test the retrieval side of our RAG pipeline by running semantic searches against the game collection. These queries check that the vector database finds relevant games by meaning rather than by exact text match.


In [9]:
# Test 1: Search for racing games
print("🏎️ Test 1: Searching for 'racing games'")
print("=" * 50)

results = collection.query(
    query_texts=["racing games"],
    n_results=3
)

print(f"Found {len(results['documents'][0])} results:")
for i, (doc, metadata, distance) in enumerate(zip(results['documents'][0], results['metadatas'][0], results['distances'][0])):
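    # "1 - distance" is only a rough relative score. With Chroma's default
    # squared-L2 metric it is not a true similarity and can go negative.
    similarity_score = 1 - distance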
    print(f"\n{i+1}. {metadata['Name']} ({metadata['Platform']}, {metadata['YearOfRelease']})")
    print(f"   Genre: {metadata['Genre']}")
    print(f"   Similarity Score: {similarity_score:.3f}")
    print(f"   Description: {metadata['Description']}")
    print(f"   Document: {doc[:100]}...")


🏎️ Test 1: Searching for 'racing games'
Found 3 results:

1. Gran Turismo (PlayStation 1, 1997)
   Genre: Racing
   Similarity Score: 0.044
   Description: A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.
   Document: [PlayStation 1] Gran Turismo (1997) - A realistic racing simulator featuring a wide array of cars an...

2. Gran Turismo 5 (PlayStation 3, 2010)
   Genre: Racing
   Similarity Score: 0.022
   Description: A comprehensive racing simulator featuring a vast selection of vehicles and tracks, with realistic driving physics.
   Document: [PlayStation 3] Gran Turismo 5 (2010) - A comprehensive racing simulator featuring a vast selection ...

3. Pokémon Ruby and Sapphire (Game Boy Advance, 2002)
   Genre: Role-playing
   Similarity Score: -0.331
   Description: Third-generation Pokémon games set in the Hoenn region, featuring new Pokémon and double battles.
   Document: [Game Boy Advance] Pokémon Ruby and Sapphire (2002
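
The low and negative scores above come from how the score is computed, not necessarily from poor matches. Under two assumptions, namely that the collection uses Chroma's default `l2` space (which reports squared Euclidean distance) and that the embeddings are unit-length, the identity ‖a − b‖² = 2 − 2·cos(a, b) gives the cosine similarity as `1 - distance / 2`. A small sketch with the hypothetical helper `cosine_from_squared_l2`:

```python
import math


def cosine_from_squared_l2(distance: float) -> float:
    """Cosine similarity implied by a squared-L2 distance between unit-length vectors."""
    return 1.0 - distance / 2.0


# Sanity check with two hand-made unit vectors 60 degrees apart.
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))

# The two printed values should agree.
print(round(cosine_from_squared_l2(squared_l2), 6), round(cosine, 6))
```

If those assumptions hold, the first result above (printed score 0.044, so a distance of 0.956) would correspond to a cosine similarity of roughly 0.52.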

In [10]:
# Test 2: Search for Nintendo games
print("\n🎮 Test 2: Searching for 'Nintendo games'")
print("=" * 50)

results = collection.query(
    query_texts=["Nintendo games"],
    n_results=3
)

print(f"Found {len(results['documents'][0])} results:")
for i, (doc, metadata, distance) in enumerate(zip(results['documents'][0], results['metadatas'][0], results['distances'][0])):
    similarity_score = 1 - distance
    print(f"\n{i+1}. {metadata['Name']} ({metadata['Platform']}, {metadata['YearOfRelease']})")
    print(f"   Publisher: {metadata['Publisher']}")
    print(f"   Similarity Score: {similarity_score:.3f}")
    print(f"   Description: {metadata['Description']}")
    print(f"   Document: {doc[:100]}...")



🎮 Test 2: Searching for 'Nintendo games'
Found 3 results:

1. Super Smash Bros. Melee (GameCube, 2001)
   Publisher: Nintendo
   Similarity Score: -0.152
   Description: A crossover fighting game featuring characters from various Nintendo franchises battling it out in dynamic arenas.
   Document: [GameCube] Super Smash Bros. Melee (2001) - A crossover fighting game featuring characters from vari...

2. Super Mario World (Super Nintendo Entertainment System (SNES), 1990)
   Publisher: Nintendo
   Similarity Score: -0.158
   Description: A classic platformer where Mario embarks on a quest to save Princess Toadstool and Dinosaur Land from Bowser.
   Document: [Super Nintendo Entertainment System (SNES)] Super Mario World (1990) - A classic platformer where M...

3. Pokémon Gold and Silver (Game Boy Color, 1999)
   Publisher: Nintendo
   Similarity Score: -0.163
   Description: Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.
   Document: [Game Boy
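
The query above relies on the text alone to find Nintendo titles. Because each document stores its full record as metadata, a metadata filter can make the publisher constraint exact while the query text handles the semantic part. The sketch below uses the `where` argument of `collection.query`; `query_by_publisher` is a hypothetical helper name.

```python
def query_by_publisher(collection, query_text, publisher, n_results=3):
    """Semantic search restricted to records whose Publisher metadata matches exactly."""
    return collection.query(
        query_texts=[query_text],
        n_results=n_results,
        where={"Publisher": publisher},  # exact-match metadata filter
    )
```

For example, `query_by_publisher(collection, "racing games", "Nintendo")` would only consider records whose `Publisher` field is exactly `"Nintendo"`, then rank those by distance to the query.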

In [11]:
# Test 3: Search for platform games
print("\n🦘 Test 3: Searching for 'platform jumping games'")
print("=" * 50)

results = collection.query(
    query_texts=["platform jumping games"],
    n_results=3
)

print(f"Found {len(results['documents'][0])} results:")
for i, (doc, metadata, distance) in enumerate(zip(results['documents'][0], results['metadatas'][0], results['distances'][0])):
    similarity_score = 1 - distance
    print(f"\n{i+1}. {metadata['Name']} ({metadata['Platform']}, {metadata['YearOfRelease']})")
    print(f"   Genre: {metadata['Genre']}")
    print(f"   Similarity Score: {similarity_score:.3f}")
    print(f"   Description: {metadata['Description']}")
    print(f"   Document: {doc[:100]}...")



🦘 Test 3: Searching for 'platform jumping games'
Found 3 results:

1. Kinect Adventures! (Xbox 360, 2010)
   Genre: Party
   Similarity Score: -0.289
   Description: A collection of mini-games designed to showcase the capabilities of the Kinect motion sensor.
   Document: [Xbox 360] Kinect Adventures! (2010) - A collection of mini-games designed to showcase the capabilit...

2. Super Mario 64 (Nintendo 64, 1996)
   Genre: Platformer
   Similarity Score: -0.312
   Description: A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.
   Document: [Nintendo 64] Super Mario 64 (1996) - A groundbreaking 3D platformer that set new standards for the ...

3. Super Mario World (Super Nintendo Entertainment System (SNES), 1990)
   Genre: Platformer
   Similarity Score: -0.371
   Description: A classic platformer where Mario embarks on a quest to save Princess Toadstool and Dinosaur Land from Bowser.
   Document: [Super Nintendo Enter
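
The three test cells repeat the same result-printing loop. One way to factor it out is sketched below; `format_results` is a hypothetical helper, and it reports the raw distance instead of a derived score so that no assumption about the metric is baked in.

```python
def format_results(results, extra_field="Genre"):
    """Turn a single-query Chroma result dict into printable lines."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    lines = [f"Found {len(docs)} results:"]
    for rank, (doc, meta, dist) in enumerate(zip(docs, metas, dists), start=1):
        lines.append("")
        lines.append(f"{rank}. {meta['Name']} ({meta['Platform']}, {meta['YearOfRelease']})")
        lines.append(f"   {extra_field}: {meta[extra_field]}")
        lines.append(f"   Distance: {dist:.3f}")
        lines.append(f"   Document: {doc[:100]}...")
    return lines
```

Each test cell could then reduce to a query followed by `print("\n".join(format_results(results, "Publisher")))`, with the extra field chosen per test.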