# Derivative
Derived from the [LlamaIndex Chroma example](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/ChromaIndexDemo.ipynb).

## Prerequisites

Run `docker compose up` in the project root folder to bring up **ChromaDB** and **VectorAdmin**.
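
Before running the cells below, it can help to confirm that the Chroma container is accepting connections. The following is a minimal sketch using only the standard library. The helper name `is_port_open` is hypothetical, and the host and port are assumptions: `chromadb.HttpClient()` defaults to `localhost:8000`, but your compose file may publish a different port.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("localhost", 8000)` should return `True` once the container is up, assuming the port mapping above. This only checks that something is listening on the port, not that it is Chroma.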

# Basic Example

In this basic example, we take the Paul Graham essay, split it into chunks, embed the chunks using an OpenAI embedding model, load them into Chroma, and then query the resulting index.
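
To make the "split it into chunks" step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: LlamaIndex performs chunking for you inside `VectorStoreIndex.from_documents` with its own node parser, and the function name `chunk_text`, the character-based windows, and the default sizes are all assumptions made for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means neighbouring chunks share some text, so a sentence that straddles a boundary still appears whole in at least one chunk.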

In [None]:
%pip install llama-index-vector-stores-chroma
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install openai

#### Get OpenAI API Key

In [2]:
# set up OpenAI
import os
import getpass
import openai

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
openai.api_key = os.environ["OPENAI_API_KEY"]
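
If the key is already exported in your shell, you may not want to be prompted on every run. A minimal sketch of that variation follows. The helper name `get_openai_api_key` is hypothetical and not part of the `openai` package; it reads the environment variable first and only falls back to `getpass` when the variable is missing or empty.

```python
import getpass
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # fall back to an interactive prompt and remember the answer for later cells
        key = getpass.getpass("OpenAI API Key:").strip()
        os.environ[env_var] = key
    return key
```

In the cell above, this would be used as `openai.api_key = get_openai_api_key()`.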

#### Download the Essay

In [None]:
!mkdir -p '../data/downloaded/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '../data/downloaded/paul_graham_essay.txt'

In [4]:
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding

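# connect to the Chroma server (HttpClient defaults to localhost:8000)
# and get the "quickstart" collection, creating it if it does not exist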
chroma_client = chromadb.HttpClient()
chroma_collection = chroma_client.get_or_create_collection("quickstart")

# define the embedding model
embed_model = OpenAIEmbedding(model="text-embedding-3-large")

# load documents
documents = SimpleDirectoryReader("../data/downloaded/").load_data()

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

## Query

In [None]:
from IPython.display import Markdown, display

# Query Data from the Chroma Docker index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))

In [None]:
response = query_engine.query("What are the different timelines in the essay?")
display(Markdown(f"<b>{response}</b>"))
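
Behind `query_engine.query(...)`, the question is embedded, the most similar chunks are retrieved from Chroma, and those chunks are passed to the LLM to compose the answer. The retrieval step can be sketched as a brute-force top-k search. This is an illustration under assumptions, not Chroma's implementation: the names `cosine_similarity` and `top_k` are hypothetical, the vectors are toy two-dimensional values, and the actual distance metric and index structure depend on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2):
    """Return the k (id, score) pairs most similar to `query_vec`, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# toy "embeddings" keyed by chunk id
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in top_k([1.0, 0.0], stored, k=2)])
# prints ['a', 'c']
```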

## Update and Delete

When building toward a real application, you will want to go beyond adding data and also update and delete it.

Chroma has users provide `ids` to simplify the bookkeeping here. An `id` can be the name of the file, or a combined key such as `filename_paragraphNumber`.
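
As a small illustration of the `filename_paragraphNumber` scheme, the sketch below builds such ids from a file path and a paragraph index. The helper name `make_chunk_id` is hypothetical and not part of Chroma or LlamaIndex.

```python
from pathlib import Path


def make_chunk_id(file_path: str, paragraph_number: int) -> str:
    """Build an id of the form '<file name>_<paragraph number>'."""
    return f"{Path(file_path).name}_{paragraph_number}"


print(make_chunk_id("../data/downloaded/paul_graham_essay.txt", 3))
# prints paul_graham_essay.txt_3
```

Because the id is derived from the source rather than generated at random, re-processing the same file yields the same ids, which makes it straightforward to target a specific record in a later `update` or `delete` call.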

Here is a basic example that updates a record's metadata and then deletes the record:

In [None]:
# fetch one record and add an "author" field to its metadata
doc_to_update = chroma_collection.get(limit=1)
doc_to_update["metadatas"][0] = {
    **doc_to_update["metadatas"][0],
    "author": "Paul Graham",
}
chroma_collection.update(
    ids=[doc_to_update["ids"][0]], metadatas=[doc_to_update["metadatas"][0]]
)
# re-fetch the same record by id to confirm the update
updated_doc = chroma_collection.get(ids=[doc_to_update["ids"][0]])
print(updated_doc["metadatas"][0])

# delete the record that was just updated
print("count before", chroma_collection.count())
chroma_collection.delete(ids=[doc_to_update["ids"][0]])
print("count after", chroma_collection.count())