# GameAnswerAgent Project

## Offline RAG

In this part of the project, we build a vector database with ChromaDB.

The game data lives in the `project/starter/games` folder. Each JSON file becomes one document in the collection you will create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
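Because every file is turned into a document with the same template, a record that lacks a field will break indexing later. The sketch below is a minimal check you could run on each record first. It assumes the six fields in the example above are required for every game; `REQUIRED_FIELDS` and `validate_game` are hypothetical names, not part of the starter code.

```python
import json

# Assumption: every game file carries the same six fields as the example above.
REQUIRED_FIELDS = {"Name", "Platform", "Genre", "Publisher", "Description", "YearOfRelease"}


def validate_game(record: dict) -> list[str]:
    """Return a list of problems found in a game record (empty if it looks valid)."""
    problems = []
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    year = record.get("YearOfRelease")
    # The year is used in numeric range filters later, so it should be an int.
    if year is not None and not isinstance(year, int):
        problems.append("YearOfRelease should be an integer")
    return problems


example = json.loads("""
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator.",
  "YearOfRelease": 1997
}
""")
print(validate_game(example))  # → []
```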


## Workspace compatibility setup

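This cell avoids SQLite compatibility issues in Jupyter environments, where the system SQLite can be older than ChromaDB requires.  
If `pysqlite3` is installed, it is registered under the name `sqlite3`, so every later `import sqlite3` (including ChromaDB's) receives `pysqlite3` instead. It must therefore run before `chromadb` is imported.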

```python
import importlib.util
import sys

# Swap in pysqlite3 only if it is installed; this must run before chromadb is imported
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3  # noqa: F401  (ensures the module is present in sys.modules)
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
```
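To see whether the swap is needed at all, you can compare the SQLite library version Python actually loaded against ChromaDB's minimum. The sketch below assumes a minimum of 3.35.0, which is the figure we recall from Chroma's troubleshooting notes; treat it as an assumption and confirm it for the Chroma version you install. `sqlite_is_new_enough` is a hypothetical helper.

```python
import sqlite3

# Assumed minimum SQLite version for ChromaDB; verify against the Chroma docs.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(version: str, minimum: tuple = MIN_SQLITE) -> bool:
    """Compare a dotted version string such as '3.31.1' with a minimum version tuple."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= minimum


# sqlite3.sqlite_version is the version of the SQLite library loaded by Python,
# not the version of the Python module itself.
print(sqlite3.sqlite_version, sqlite_is_new_enough(sqlite3.sqlite_version))
```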

## Import required libraries

This cell imports essential Python libraries and modules for environment variable management, file operations, and ChromaDB setup.

```python
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv
```

## Load environment variables

This cell loads environment variables from the `.env` file so they can be used in the notebook.

```python
load_dotenv()
```
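`load_dotenv()` does not fail when a variable is missing; the problem only surfaces later as an authentication error from the embedding API. A minimal fail-fast check is sketched below. It assumes `CHROMA_OPENAI_API_KEY` is the only required variable, as in the embedding-function cell of this notebook (`CHROMA_OPENAI_API_BASE` has a default there). `missing_env_vars` is a hypothetical helper.

```python
import os

# Assumption: only the API key is mandatory; the API base falls back to a default.
REQUIRED_VARS = ["CHROMA_OPENAI_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {missing}")
```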

## Instantiate ChromaDB client

This cell creates a persistent ChromaDB client. The database files are written to the `chromadb2` directory (relative to the notebook's working directory), so the collection survives kernel restarts.

```python
chroma_client = chromadb.PersistentClient(path="chromadb2")
```

## Set up embedding function

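This cell configures ChromaDB's OpenAI embedding function, which turns text into vectors using the `text-embedding-3-small` model.  
The API key is read from `CHROMA_OPENAI_API_KEY`, and the endpoint defaults to `https://api.openai.com/v1` unless `CHROMA_OPENAI_API_BASE` is set.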

```python
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("CHROMA_OPENAI_API_KEY"),
    model_name="text-embedding-3-small",
    api_base=os.getenv("CHROMA_OPENAI_API_BASE", "https://api.openai.com/v1")
)
```
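Semantic search works by comparing the query's embedding with each document's embedding and returning the closest ones. As we understand Chroma's docs, a collection uses squared L2 distance by default, and cosine can be selected through the collection's `hnsw:space` metadata. The toy sketch below (made-up 3-dimensional vectors; real embeddings have far more dimensions) shows both measures. OpenAI documents its embeddings as normalized to length 1, and for unit vectors the squared L2 distance equals `2 - 2 * cosine`, so both metrics should then rank results the same way.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_squared(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance: 0.0 means identical vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors standing in for a query, a racing game, and an RPG.
query = [0.9, 0.1, 0.0]
racing = [0.8, 0.2, 0.1]
rpg = [0.0, 0.3, 0.9]
print(cosine_similarity(query, racing) > cosine_similarity(query, rpg))  # → True
print(l2_squared(query, racing) < l2_squared(query, rpg))  # → True
```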

## Create ChromaDB collection

This cell creates the collection that stores the game documents.  
You can choose any valid collection name. The embedding function passed here is used both when documents are added and when the collection is queried.

```python
# get_or_create_collection lets you rerun this cell without an "already exists" error
collection = chroma_client.get_or_create_collection(
    name="udaplay",
    embedding_function=embedding_fn
)
```

## Add documents to the collection

This cell loads the game data from the JSON files in the `games` directory and adds each one to the collection.  
The indexed text combines platform, name, year, and description, while the full JSON record is stored as metadata so that queries can filter on it.

```python
# Path is relative to the notebook, which is expected to live in project/starter/
data_dir = "games"

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # You can change what text you want to index
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

    # Use the file name without extension (e.g. "001") as the document ID
    doc_id = os.path.splitext(file_name)[0]

    collection.add(
        ids=[doc_id],
        documents=[content],
        metadatas=[game]
    )
```
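The loop above calls `collection.add` once per file, which means one embedding request per game. A common alternative is to collect everything first and add it in a single call. The sketch below builds the three parallel lists with the same ID scheme and document template; `load_game_batch` is a hypothetical helper, not part of the starter code.

```python
import json
import os


def load_game_batch(data_dir: str):
    """Read every *.json file in data_dir and return (ids, documents, metadatas)."""
    ids, documents, metadatas = [], [], []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        # Same ID scheme and indexed text as the per-file loop
        ids.append(os.path.splitext(file_name)[0])
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)
    return ids, documents, metadatas
```

Usage: `ids, documents, metadatas = load_game_batch("games")` followed by `collection.add(ids=ids, documents=documents, metadatas=metadatas)`. Very large folders may need to be split into chunks, since both Chroma and the embedding API limit how much a single request can carry.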

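## Query the collection

The cells below run semantic searches, with and without metadata filters, fetch documents by ID, and summarize the collection's contents.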

```python
# Basic query: Search for racing games
query_text = "racing games with cars"
results = collection.query(
    query_texts=[query_text],
    n_results=3
)

print(f"Query: '{query_text}'")
print(f"Found {len(results['ids'][0])} results:")
for i, (doc_id, document, metadata) in enumerate(zip(results['ids'][0], results['documents'][0], results['metadatas'][0])):
    print(f"\n{i+1}. ID: {doc_id}")
    print(f"   Game: {metadata['Name']}")
    print(f"   Platform: {metadata['Platform']}")
    print(f"   Genre: {metadata['Genre']}")
    print(f"   Year: {metadata['YearOfRelease']}")
    print(f"   Document: {document}")
```
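`collection.query` returns one inner list per query text, which is why the loop indexes `[0]`. The sketch below flattens that nested structure into one dict per hit, keeping the distance if it was returned (we assume distances are included by default, as they are in the Chroma versions we know). `flatten_query_results` is a hypothetical helper.

```python
def flatten_query_results(results: dict, query_index: int = 0) -> list[dict]:
    """Turn Chroma's per-query nested lists into one dict per hit."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    metadatas = results["metadatas"][query_index]
    # Distances may be missing or None depending on the `include` argument
    all_distances = results.get("distances")
    distances = all_distances[query_index] if all_distances else [None] * len(ids)
    return [
        {"id": doc_id, "document": doc, "metadata": meta, "distance": dist}
        for doc_id, doc, meta, dist in zip(ids, documents, metadatas, distances)
    ]
```

Usage: `for hit in flatten_query_results(results): print(hit["metadata"]["Name"], hit["distance"])`. A smaller distance means a closer match.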

```python
# Query with metadata filtering: Find SNES games from the 1990s
query_text = "quest"
results = collection.query(
    query_texts=[query_text],
    n_results=5,
    where={
        "$and": [
            {"Platform": {"$eq": "Super Nintendo Entertainment System (SNES)"}},
            {"YearOfRelease": {"$gte": 1990}},
            {"YearOfRelease": {"$lt": 2000}}
        ]
    }
)

print(f"Query: '{query_text}' with filters (SNES, 1990s)")
print(f"Found {len(results['ids'][0])} results:")
for i, (doc_id, document, metadata) in enumerate(zip(results['ids'][0], results['documents'][0], results['metadatas'][0])):
    print(f"\n{i+1}. {metadata['Name']} ({metadata['YearOfRelease']})")
    print(f"   Genre: {metadata['Genre']}")
    print(f"   Publisher: {metadata['Publisher']}")
    print(f"   Description: {metadata['Description']}")
```
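The `where` filter above is a plain dict, so it can be generated for any platform and decade instead of being written by hand. The sketch reuses the same `$and`, `$eq`, `$gte`, and `$lt` operators as the cell above; `platform_decade_filter` is a hypothetical helper, and the platform string must match the stored metadata exactly.

```python
def platform_decade_filter(platform: str, decade_start: int) -> dict:
    """Build a Chroma `where` filter: exact platform match within a 10-year window."""
    return {
        "$and": [
            {"Platform": {"$eq": platform}},
            {"YearOfRelease": {"$gte": decade_start}},
            {"YearOfRelease": {"$lt": decade_start + 10}},
        ]
    }
```

Usage: `collection.query(query_texts=["quest"], n_results=5, where=platform_decade_filter("PlayStation 1", 1990))` would look for PlayStation 1 games released from 1990 to 1999.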

```python
# Query by genre: Find all RPG games
results = collection.query(
    query_texts=["role playing fantasy adventure"],
    n_results=10,
    where={"Genre": {"$eq": "Role-playing"}}
)

print("RPG Games in the collection:")
for i, (doc_id, document, metadata) in enumerate(zip(results['ids'][0], results['documents'][0], results['metadatas'][0])):
    print(f"{i+1}. {metadata['Name']} ({metadata['Platform']}, {metadata['YearOfRelease']})")
    print(f"   {metadata['Description'][:100]}...")
    print()
```
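With a `where` filter, `n_results` is an upper bound: the query returns at most as many documents as match the filter. To match several genres at once, Chroma's metadata filters also support an `$in` operator. The sketch below builds either form; `genres_filter` is a hypothetical helper, and the genre strings must match the stored values exactly.

```python
def genres_filter(genres: list[str]) -> dict:
    """Match any of several genres; a single genre becomes a plain $eq filter."""
    if not genres:
        raise ValueError("genres must not be empty")
    if len(genres) == 1:
        return {"Genre": {"$eq": genres[0]}}
    return {"Genre": {"$in": list(genres)}}
```

Usage: `where=genres_filter(["Role-playing", "Racing"])` in the `collection.query` call above.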

```python
# Get specific documents by ID
specific_ids = ["001", "005", "010"]
results = collection.get(
    ids=specific_ids,
    include=["documents", "metadatas"]
)

print("Specific games by ID:")
for doc_id, document, metadata in zip(results['ids'], results['documents'], results['metadatas']):
    print(f"ID {doc_id}: {metadata['Name']}")
    print(f"  Platform: {metadata['Platform']}")
    print(f"  Genre: {metadata['Genre']}")
    print(f"  Year: {metadata['YearOfRelease']}")
    print(f"  Document: {document}")
    print()
```
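Unlike `query`, `collection.get` returns flat lists rather than one list per query. As far as we know, IDs that do not exist are simply left out of the result, and we would not rely on the returned order matching `specific_ids`. The sketch below reorders the results to follow the requested IDs and silently skips missing ones; `order_by_ids` is a hypothetical helper.

```python
def order_by_ids(results: dict, requested_ids: list[str]) -> list[tuple]:
    """Return (id, document, metadata) tuples in the order of requested_ids."""
    by_id = {
        doc_id: (doc_id, document, metadata)
        for doc_id, document, metadata in zip(
            results["ids"], results["documents"], results["metadatas"]
        )
    }
    # IDs that were not found are skipped rather than raising KeyError
    return [by_id[doc_id] for doc_id in requested_ids if doc_id in by_id]
```

Usage: `for doc_id, document, metadata in order_by_ids(results, specific_ids): ...`.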

```python
from collections import Counter

# Collection statistics and exploration
print("Collection Information:")
print(f"Collection name: {collection.name}")
print(f"Total documents: {collection.count()}")
print()

# Get all documents to explore the data
all_docs = collection.get(include=["metadatas"])
platforms = [doc['Platform'] for doc in all_docs['metadatas']]
genres = [doc['Genre'] for doc in all_docs['metadatas']]
years = [doc['YearOfRelease'] for doc in all_docs['metadatas']]

print("Platform distribution:")
for platform, count in Counter(platforms).most_common():
    print(f"  {platform}: {count}")

print("\nGenre distribution:")
for genre, count in Counter(genres).most_common():
    print(f"  {genre}: {count}")

print(f"\nYear range: {min(years)} - {max(years)}")
```
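The year range shows the extremes but not how the games are spread between them. A decade histogram fills that gap and also helps pick sensible ranges for `YearOfRelease` filters. The sketch below works on the `years` list built in the previous cell; `decade_counts` is a hypothetical helper.

```python
from collections import Counter


def decade_counts(years: list[int]) -> dict[str, int]:
    """Count release years per decade (e.g. 1997 -> '1990s'), ordered chronologically."""
    counts = Counter(f"{(year // 10) * 10}s" for year in years)
    # Sort numerically on the decade, not alphabetically on the label
    return dict(sorted(counts.items(), key=lambda item: int(item[0][:-1])))
```

Usage: `for decade, count in decade_counts(years).items(): print(f"  {decade}: {count}")`.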