# Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is in the folder `project/starter/games`. Each file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
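
Because every file is expected to follow this shape, it can help to check a record before indexing it. The following is a minimal sketch, assuming all six fields in the example above are required; `EXPECTED_FIELDS` and `missing_fields` are hypothetical names and are not part of the starter code.

```python
# Hypothetical helper: field names and types are taken from the example record above.
EXPECTED_FIELDS = {
    "Name": str,
    "Platform": str,
    "Genre": str,
    "Publisher": str,
    "Description": str,
    "YearOfRelease": int,
}


def missing_fields(game: dict) -> list[str]:
    """Return the expected fields that are absent or have an unexpected type."""
    return [
        field
        for field, expected_type in EXPECTED_FIELDS.items()
        if not isinstance(game.get(field), expected_type)
    ]


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
    "YearOfRelease": 1997,
}

print(missing_fields(example))  # prints []
```

A non-empty result names the fields to fix before the record is turned into a document.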


### Setup

In [1]:
# Only needed for Udacity workspace

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
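
The swap above replaces the standard-library `sqlite3` module with `pysqlite3`, which is typically done when the system SQLite is older than what Chroma supports. As a rough sketch, assuming the minimum is SQLite 3.35.0 (verify this against the Chroma release you install), the check below reports whether the bundled SQLite is likely recent enough. `MIN_SQLITE` and `sqlite_is_new_enough` are hypothetical names.

```python
import sqlite3

# Assumed minimum version; confirm against the Chroma release you are using.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(version_info: tuple, minimum: tuple = MIN_SQLITE) -> bool:
    """Compare a SQLite version tuple against the assumed minimum."""
    return tuple(version_info) >= minimum


print("SQLite version:", sqlite3.sqlite_version)
print("Meets assumed minimum:", sqlite_is_new_enough(sqlite3.sqlite_version_info))
```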

In [2]:
import os
import json
import chromadb
# from chromadb.utils import embedding_functions
from lib.documents import Document
from lib.vector_db import VectorStoreManager
from dotenv import load_dotenv

In [3]:
# TODO: Create a .env file with the following variables #DONE
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"

In [4]:
# TODO: Load environment variables
load_dotenv()

True
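
A `True` result from `load_dotenv()` only indicates that at least one variable was loaded, not that every key from the template is present. A small sketch of an explicit check, using the three variable names listed above; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ["OPENAI_API_KEY", "CHROMA_OPENAI_API_KEY", "TAVILY_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.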

### VectorDB Instance

In [5]:
# TODO: Instantiate your ChromaDB Client
# Choose any path you want
chroma_client = chromadb.PersistentClient(path="chromadb")

### Collection

In [6]:
# TODO: Pick one embedding function
# If you pick something other than OpenAI,
# make sure you use the same one when loading the collection

# Already handled in VectorStoreManager 
# embedding_fn = embedding_functions.OpenAIEmbeddingFunction(api_base="https://openai.vocareum.com/v1")

In [7]:
# TODO: Create a collection
# Choose any name you want
# collection = chroma_client.create_collection(
#    name="udaplay",
#    embedding_function=embedding_fn
#)

# We'll use the VectorStoreManager, but override its client to be persistent.
manager = VectorStoreManager(openai_api_key=os.getenv("OPENAI_API_KEY"))
manager.chroma_client = chroma_client
# manager.embedding_function = embedding_fn
store = manager.get_or_create_store("udaplay")

### Add documents

In [8]:
# Make sure the directory "project/starter/games" exists.
# The relative path below assumes the notebook runs from "project/starter".
data_dir = "games"
documents = []

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # You can change what text you want to index
    content = (
        f"The game \"{game['Name']}\", released in {game['YearOfRelease']}, "
        f"is a {game['Genre']} game for the {game['Platform']} platform, "
        f"published by {game['Publisher']}. "
        f"Here is a short description: {game['Description']}"
    )

    # Use file name (like 001) as ID
    doc_id = os.path.splitext(file_name)[0]

    documents.append(
        Document(
            id=doc_id,
            content=content,
            metadata=game
        )
    )

store.add(documents)
print(f"Added {len(documents)} documents to the 'udaplay' store.")


Added 15 documents to the 'udaplay' store.
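
The loading logic in the cell above can also be factored into plain functions, which makes it possible to inspect the indexed text without touching the vector store. This is a sketch only: `build_content` and `load_games` are hypothetical helper names, and the text template is copied from the cell above.

```python
import json
import os


def build_content(game: dict) -> str:
    """Render one game record as the sentence-style text that gets indexed."""
    return (
        f"The game \"{game['Name']}\", released in {game['YearOfRelease']}, "
        f"is a {game['Genre']} game for the {game['Platform']} platform, "
        f"published by {game['Publisher']}. "
        f"Here is a short description: {game['Description']}"
    )


def load_games(data_dir: str) -> list[tuple[str, str, dict]]:
    """Return (doc_id, content, metadata) for each .json file, in file-name order."""
    records = []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue  # skip anything that is not a game file
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        doc_id = os.path.splitext(file_name)[0]  # e.g. "001" from "001.json"
        records.append((doc_id, build_content(game), game))
    return records
```

Each tuple could then be passed to `Document(id=..., content=..., metadata=...)` exactly as in the cell above.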
