# Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is inside the `project/starter/games` folder. Each JSON file becomes one document in the collection you'll create.
For example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
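Each record's fields are flattened into one text string for embedding, and the raw record is stored as metadata. The hypothetical helper below (`game_to_record` and `REQUIRED_FIELDS` are illustrative names, not part of the starter code) sketches that mapping for a single file. It rejects records Chroma is likely to refuse, assuming Chroma metadata values must be scalars (`str`, `int`, `float` or `bool`) and that files are named like `001.json`.

```python
import os
from typing import Dict, Tuple, Union

# Assumption: Chroma metadata values must be scalars
Scalar = Union[str, int, float, bool]
REQUIRED_FIELDS = ("Name", "Platform", "Genre", "Publisher", "Description", "YearOfRelease")


def game_to_record(file_name: str, game: dict) -> Tuple[str, str, Dict[str, Scalar]]:
    """Hypothetical helper: turn one game file into (id, document text, metadata)."""
    missing = [field for field in REQUIRED_FIELDS if field not in game]
    if missing:
        raise ValueError(f"{file_name}: missing fields {missing}")
    for key, value in game.items():
        # Lists, dicts and None would not be valid scalar metadata
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"{file_name}: metadata field {key!r} is not a scalar")
    doc_id = os.path.splitext(file_name)[0]  # "001.json" -> "001"
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
    return doc_id, content, dict(game)


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
    "YearOfRelease": 1997,
}
doc_id, content, metadata = game_to_record("001.json", example)
print(doc_id)
print(content)
```

Validating up front means a malformed file fails with its file name in the message, instead of surfacing later as a less specific error from the vector store.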


### Setup

```python
import os
import json
import chromadb
from dotenv import load_dotenv
```


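```python
from typing import List
from openai import OpenAI


class CustomOpenAIEmbedder:
    """Embedding function for Chroma that calls an OpenAI-compatible embeddings endpoint."""

    def __init__(self, api_key: str, base_url: str, model: str = "text-embedding-3-small"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Chroma passes a list of texts; also accept a single string for convenience
        if isinstance(input, str):
            input = [input]
        response = self.client.embeddings.create(input=input, model=self.model)
        # One embedding per input text, in the same order as the input
        return [r.embedding for r in response.data]
```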
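`CustomOpenAIEmbedder` needs network access and an API key on every call. For experimenting offline, a deterministic stand-in with the same call shape can be handy. The sketch below is an assumption, not part of the project: the hypothetical `HashingEmbedder` uses feature hashing (token counts bucketed by an MD5 hash, then L2-normalised), so it only captures word overlap, not meaning. Its vectors also have a different dimension from `text-embedding-3-small`, so it should be used with a separate collection rather than `udaplay`.

```python
import hashlib
import math
from typing import List, Union


class HashingEmbedder:
    """Hypothetical offline stand-in for CustomOpenAIEmbedder (feature hashing, not semantic)."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def _embed_one(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # MD5 gives a stable bucket across runs (Python's built-in hash() is salted per process)
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        # L2-normalise; an empty text stays an all-zero vector
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        # Same input handling as CustomOpenAIEmbedder: wrap a single string in a list
        if isinstance(input, str):
            input = [input]
        return [self._embed_one(text) for text in input]


embedder = HashingEmbedder(dim=8)
vectors = embedder(["Gran Turismo racing", "racing Gran Turismo"])
print(len(vectors), len(vectors[0]))  # 2 8
```

Because it ignores word order, the two example strings map to identical vectors, which makes the stand-in predictable for tests but unsuitable for real retrieval quality.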

```python
# Load environment variables (such as CHROMA_OPENAI_API_KEY) from a .env file
load_dotenv()
```
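If `CHROMA_OPENAI_API_KEY` is missing, `os.environ[...]` in the Collection step fails with a bare `KeyError`. A small guard like the hypothetical `require_env` below (not part of the starter code) gives a clearer message right after `load_dotenv()`. The demo uses made-up variable names so it never touches or prints the real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail early with a readable error."""
    # Treat unset and whitespace-only values the same way
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it before running the notebook"
        )
    return value


# Demo with hypothetical variable names
os.environ["UDAPLAY_DEMO_KEY"] = "demo-value"
os.environ.pop("UDAPLAY_UNSET_VAR", None)

print(require_env("UDAPLAY_DEMO_KEY"))  # demo-value

missing_error = ""
try:
    require_env("UDAPLAY_UNSET_VAR")
except RuntimeError as exc:
    missing_error = str(exc)
print(missing_error)
```

In the notebook this could be used as `api_key=require_env("CHROMA_OPENAI_API_KEY")` when building the embedding function.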

### VectorDB Instance

```python
# Persistent client: collections are stored on disk in the ./chromadb directory
chroma_client = chromadb.PersistentClient(path="chromadb")
```

### Collection

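```python
# Embeds both the documents and, later, the queries; base_url points at
# the OpenAI-compatible endpoint provided by Vocareum
embedding_function = CustomOpenAIEmbedder(
    api_key=os.environ["CHROMA_OPENAI_API_KEY"],
    base_url="https://openai.vocareum.com/v1"
)
```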

```python
# Create the collection, or reuse it if it already exists
collection = chroma_client.get_or_create_collection(
    name="udaplay",
    embedding_function=embedding_function,
)
```

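```python
# Re-open the existing collection (for example in a new session). Pass the same
# embedding function, otherwise Chroma may fall back to its default embedding model
collection = chroma_client.get_collection("udaplay", embedding_function=embedding_function)
```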

### Add documents

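```python
# Assumes the notebook runs from project/starter, so the game files are in ./games
data_dir = "games"

ids, documents, metadatas = [], [], []
for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # Text that gets embedded; change this to index different fields
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

    ids.append(os.path.splitext(file_name)[0])  # file name without extension (like 001) as ID
    documents.append(content)
    metadatas.append(game)  # keep the original fields as metadata for filtering

# upsert instead of add: re-running the cell updates existing IDs rather than
# triggering duplicate-ID warnings, and one call embeds all documents as a batch
collection.upsert(ids=ids, documents=documents, metadatas=metadatas)
```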
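Once the documents are in, retrieval goes through `collection.query(query_texts=[...], n_results=k)`, which embeds the query with the collection's embedding function and returns the nearest documents with their distances. The pure-Python sketch below only illustrates that idea with exact brute-force search over toy 2-D vectors; the IDs and values are made up. Chroma itself uses an approximate HNSW index, and its default distance is, to our understanding, squared L2, which is what the sketch ranks by.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: Sequence[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbour search, smallest distance first."""
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings (hypothetical values)
index = {
    "001": [0.9, 0.1],
    "002": [0.1, 0.9],
    "003": [0.7, 0.3],
}
print(top_k([1.0, 0.0], index, k=2))
```

Brute force costs one distance computation per stored document per query, which is fine for a few dozen games; an approximate index like HNSW matters once the collection grows much larger.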