**ChromaDB** is an open-source vector database. It stores documents as embedding vectors and efficiently retrieves the documents most relevant to a user's query.
By default, ChromaDB measures similarity between a query and the documents in a collection with squared L2 (Euclidean) distance. Cosine distance or inner product can be selected per collection through the `hnsw:space` metadata key.

**Common Distance Metrics for Multimodal Use Cases**

**01: Cosine Distance (1 - Cosine Similarity)**
      Best for images, audio, and text

**02: L2 Distance (Euclidean)**
      Best for spatial data and high-dimensional embeddings
      Use when absolute magnitude matters

**03: Inner Product (IP)**
      Best for ranking-based retrieval
      Use with dot-product-based models (e.g., CLIP with max similarity search)
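
To make the three metrics concrete, here is a minimal pure-Python sketch of the distance forms Chroma documents for its `l2`, `ip`, and `cosine` spaces (squared L2, one minus the dot product, and one minus cosine similarity). The function names and the toy vectors are illustrative assumptions, not part of chromadb.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sensitive to vector magnitude."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """One minus the dot product: smaller means a larger dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity: depends only on direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors: same direction as the query but three times longer,
# and a vector at a right angle to the query.
query = [1.0, 0.0]
same_direction = [3.0, 0.0]
orthogonal = [0.0, 1.0]

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(name,
          "cosine:", cosine_distance(query, vec),
          "l2:", squared_l2(query, vec),
          "ip:", inner_product_distance(query, vec))
```

For `same_direction`, cosine distance is zero because only direction counts, while squared L2 is nonzero because the lengths differ. This is why the choice of metric matters when embeddings are not normalized.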

In [1]:
!pip install chromadb

Collecting chromadb
  Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting build>=1.0.3 (from chromadb)
  Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)
Collecting chroma-hnswlib==0.7.6 (from chromadb)
  Downloading chroma_hnswlib-0.7.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (252 bytes)
Collecting fastapi>=0.95.2 (from chromadb)
  Downloading fastapi-0.115.8-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb)
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-3.11.0-py2.py3-none-any.whl.metadata (2.9 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.29.0-py3

In [3]:
!pip install requests




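**Chroma DB Client**

Chroma provides several client types:

Ephemeral Client (in-memory; data is lost when the process exits)

Persistent Client (saves data to local disk)

Client-Server Mode (connects to a running Chroma server)

Python HTTP-Only Client (lightweight client package for client-server mode)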

In [4]:
import chromadb
chroma_client = chromadb.Client() # Ephemeral Client
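
The cell above creates an ephemeral client. As a rough sketch of how the other client types might be selected, the hypothetical helper below wraps `chromadb.EphemeralClient`, `chromadb.PersistentClient`, and `chromadb.HttpClient`. The helper name, the `./chroma_data` path, and the `localhost:8000` address are placeholder assumptions, and the HTTP mode needs a Chroma server that is already running.

```python
def make_chroma_client(mode="ephemeral", path="./chroma_data",
                       host="localhost", port=8000):
    """Return a Chroma client for the requested mode (hypothetical helper)."""
    if mode not in ("ephemeral", "persistent", "http"):
        raise ValueError(f"unknown client mode: {mode!r}")

    # Imported here so the mode check above runs before chromadb is needed.
    import chromadb

    if mode == "ephemeral":
        # In-memory only; everything is lost when the process exits.
        return chromadb.EphemeralClient()
    if mode == "persistent":
        # Data is written under `path` and reloaded on the next start.
        return chromadb.PersistentClient(path=path)
    # Client-server mode: talks to a separately started Chroma server.
    return chromadb.HttpClient(host=host, port=port)
```

For a notebook whose collections should survive a runtime restart, `make_chroma_client("persistent")` would be the natural choice. The ephemeral client is convenient for quick experiments like this one.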


**Create Collection**

In [5]:
collection = chroma_client.get_or_create_collection(name="RAG_Application")
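
The collection above is created with Chroma's default distance metric. A minimal sketch of choosing a different one, assuming the `hnsw:space` metadata key that chromadb 0.6.x reads at collection creation, follows. `hnsw_space_metadata` is a hypothetical helper and not part of chromadb.

```python
# Distance spaces accepted by Chroma's HNSW index.
VALID_SPACES = {"l2", "cosine", "ip"}


def hnsw_space_metadata(space="l2"):
    """Build the collection metadata that selects a distance metric."""
    if space not in VALID_SPACES:
        raise ValueError(f"space must be one of {sorted(VALID_SPACES)}, got {space!r}")
    return {"hnsw:space": space}


print(hnsw_space_metadata("cosine"))
```

Passing the result as `metadata=hnsw_space_metadata("cosine")` to `chroma_client.get_or_create_collection(...)` should select cosine distance for a newly created collection. A collection that already exists keeps the metric it was created with, so use a new collection name when experimenting.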


**Add External Resources to VectorDB (Chroma DB)**

In [14]:
# switch `add` to `upsert` to avoid adding the same documents every time
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges",
        "My name is M Sheraz Rana and I am 25 years old. I am working on Learning Generative AI development."
    ],
    ids=["id1", "id2", "id3"]
)


**Adding Extra Documents From External Resources**

In [32]:
# Assuming each document in the .txt file is on its own line
file_path = "/content/PersonalData.txt"  # Path to your .txt file

# Open and read the file, trimming whitespace and skipping blank lines
with open(file_path, "r", encoding="utf-8") as file:
    documents = [line.strip() for line in file if line.strip()]

# Generate unique ids; the "file_id" prefix avoids overwriting
# the "id1".."id3" documents upserted earlier
ids = [f"file_id{i+1}" for i in range(len(documents))]

# Upsert the documents read from the file
collection.upsert(
    documents=documents,
    ids=ids
)
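
As a variation on the cell above, each line can also carry metadata recording where it came from, which later appears in the `metadatas` field of query results. The sketch below is pure Python. `build_upsert_payload`, the `file_id` prefix, and the `source`/`line` metadata keys are illustrative assumptions rather than anything Chroma requires.

```python
def build_upsert_payload(lines, source, id_prefix="file_id"):
    """Turn raw text lines into keyword arguments for collection.upsert()."""
    documents, ids, metadatas = [], [], []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines; they carry nothing to retrieve
        documents.append(text)
        # The id is derived from the line number in the source file.
        ids.append(f"{id_prefix}{line_number}")
        metadatas.append({"source": source, "line": line_number})
    return {"documents": documents, "ids": ids, "metadatas": metadatas}


payload = build_upsert_payload(
    ["First fact\n", "\n", "  Second fact  \n"],
    source="PersonalData.txt",
)
print(payload["ids"])        # ['file_id1', 'file_id3']
print(payload["documents"])  # ['First fact', 'Second fact']
```

With a real file, the lines would come from `open(file_path, encoding="utf-8")`, and the payload could be passed along as `collection.upsert(**payload)`.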


**Now, Query the DB and Get the Results**

In [30]:


user_query = "what is Orange?"
response = collection.query(
    query_texts=[user_query],  # Chroma will embed this for you
    n_results=2  # how many results to return
)

# Results are nested per query; index 0 holds the hits for our single query
retrieved_documents = response["documents"][0]
context = "\n".join(retrieved_documents)
final_promp = f"Context:\n{context}\n\nUser Query: {user_query}\n Answer:"
print(final_promp)
print(response)

Context:
This is a document about oranges
This is a document about pineapple

User Query: what is Orange?
 Answer:
{'ids': [['id2', 'id1']], 'embeddings': None, 'documents': [['This is a document about oranges', 'This is a document about pineapple']], 'uris': None, 'data': None, 'metadatas': [[None, None]], 'distances': [[0.4265100955963135, 1.3607076406478882]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
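
In the output above, the pineapple document comes back with a much larger distance (about 1.36) than the oranges document (about 0.43), yet both end up in the prompt. One possible refinement, sketched below, keeps only hits under a distance cutoff. `build_prompt` is a hypothetical helper and the `max_distance=1.0` cutoff is an arbitrary assumption that would need tuning for a given embedding model and metric.

```python
def build_prompt(response, user_query, max_distance=1.0):
    """Build a RAG prompt from the hits whose distance is within the cutoff."""
    documents = response["documents"][0]  # hits for the first (only) query
    distances = response["distances"][0]
    relevant = [doc for doc, dist in zip(documents, distances)
                if dist <= max_distance]
    context = "\n".join(relevant)
    return f"Context:\n{context}\n\nUser Query: {user_query}\nAnswer:"


# Same shape as the response printed above, reduced to the fields used here.
sample_response = {
    "documents": [["This is a document about oranges",
                   "This is a document about pineapple"]],
    "distances": [[0.4265100955963135, 1.3607076406478882]],
}

prompt = build_prompt(sample_response, "what is Orange?")
print(prompt)
```

With the sample values, only the oranges document passes the cutoff, so the LLM sees less unrelated context.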


**Now, Using an LLM and Observing Its Response**

In [31]:
import google.generativeai as genai
from google.colab import userdata

# Read the API key from Colab's secret store and configure the SDK
key = userdata.get('GEMINI_API_KEY')
genai.configure(api_key=key)

# Generation settings
generation_config = {
    "temperature": 1.25,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

# Create the model
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    generation_config=generation_config,
)

# Start a chat session with no prior history
chat_session = model.start_chat(history=[])

# Send the prompt built from the retrieved context and the user query
response = chat_session.send_message(final_promp)

print(response.text)

Based on the provided context, the answer should be:

**A document about oranges.**

