# Vector Databases: Focus on ChromaDB
An introduction to vector databases, with a detailed look at ChromaDB and its integration with LangChain.

## What are Vector Databases?
- Specialized databases for storing and querying large volumes of high-dimensional vectors (typically embeddings produced by machine-learning models).
- Enable semantic search by using similarity metrics like cosine similarity or Euclidean distance.
- Essential for retrieval-augmented generation (RAG), recommendation engines, and AI-powered search.
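To make the two metrics concrete, here is a minimal pure-Python sketch (no database involved) that ranks a few stored vectors against one query vector by brute force. The 3-dimensional vectors are made up for illustration. For reference, Chroma's default `l2` distance space reports *squared* Euclidean distance and its `cosine` space reports `1 - cosine similarity`, so in both cases smaller numbers mean closer matches.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite direction
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def cosine_distance(a, b):
    # Distance form of cosine similarity: 0.0 for identical directions
    return 1.0 - cosine_similarity(a, b)


def squared_l2(a, b):
    # Squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


# Toy "embeddings"; real ones have hundreds of dimensions
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]

# Brute-force nearest neighbours: score every vector, sort by ascending distance
ranking = sorted(stored, key=lambda key: cosine_distance(query, stored[key]))
print(ranking)

# For unit-length vectors the two metrics agree: squared L2 = 2 * (1 - cosine similarity)
u, w = normalize(query), normalize(stored["doc_b"])
print(math.isclose(squared_l2(u, w), 2 * cosine_distance(u, w), abs_tol=1e-12))
```

The last check is why normalizing embeddings makes Euclidean and cosine search return the same ranking.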

# ChromaDB Overview
ChromaDB is an open-source vector database designed for speed, scalability, and ease of use.
- Supports collections (analogous to tables) that hold documents and their embedding vectors.
- Supports metadata for filtering and advanced queries.
- Supports real-time inserts and updates.
- Provides a rich Python SDK for programmatic access.

## ChromaDB Core API Concepts
**Key classes and concepts:**
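- `chromadb.Client()`: creates a client that manages the storage backend (in-memory by default; `chromadb.PersistentClient` and `chromadb.HttpClient` cover on-disk and client/server setups).
- `Collection`: holds embeddings together with their documents and metadata.
- Collection methods: `add()`, `query()`, `update()`, `delete()` (plus `get()` and `upsert()`).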

**Use case:** create a collection, add embeddings, and query by similarity with optional metadata filters.
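The workflow above can be modelled in a few lines of plain Python. The `ToyCollection` class below is a hypothetical, deliberately simplified illustration of the semantics (store ids, vectors, documents and metadata together; filter on metadata; rank by distance). It is not Chroma's implementation: Chroma uses an approximate nearest-neighbour index (HNSW) rather than the brute-force scan shown here, and its filter language is richer than the exact-match dictionary used in this sketch.

```python
import heapq
from dataclasses import dataclass, field


def squared_l2(a, b):
    # Squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class ToyCollection:
    """Hypothetical brute-force stand-in for a vector collection (not Chroma)."""
    name: str
    # id -> (embedding, document, metadata)
    records: dict = field(default_factory=dict)

    def add(self, ids, embeddings, documents, metadatas):
        for rid, emb, doc, meta in zip(ids, embeddings, documents, metadatas):
            self.records[rid] = (emb, doc, meta)  # toy behaviour: same id overwrites

    def query(self, query_embedding, n_results=2, where=None):
        # 1) Metadata filter: keep records matching every key/value pair in `where`
        candidates = [
            (rid, rec) for rid, rec in self.records.items()
            if not where or all(rec[2].get(k) == v for k, v in where.items())
        ]
        # 2) Rank the remaining records by distance and keep the n_results closest
        best = heapq.nsmallest(
            n_results, candidates,
            key=lambda item: squared_l2(query_embedding, item[1][0]),
        )
        return [(rid, doc, squared_l2(query_embedding, emb)) for rid, (emb, doc, _) in best]


col = ToyCollection(name="my_documents")
col.add(
    ids=["0", "1", "2"],
    embeddings=[[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]],
    documents=["Doc 1 text", "Doc 2 text", "Doc 3 text"],
    metadatas=[{"source": "source_a"}, {"source": "source_b"}, {"source": "source_a"}],
)
print(col.query([0.9, 0.1], n_results=2))
print(col.query([0.9, 0.1], n_results=2, where={"source": "source_a"}))
```

Filtering before ranking means the filtered query can return records that would not have made the unfiltered top 2.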

In [None]:
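# Connect and create (or fetch) a collection.
# Client() returns an in-memory client: data is lost when the process exits.
# Use chromadb.PersistentClient(path="./chroma_db") to keep data on disk.
from chromadb import Client

client = Client()
collection = client.get_or_create_collection(name="my_documents")
print(collection.name)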

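## Adding Embeddings to a Collection
Method: `collection.add()`
Input parameters:
- `ids`: list of unique string IDs, one per record (required)
- `embeddings`: list of vectors; if omitted, Chroma computes them from `documents` using the collection's embedding function
- `metadatas`: optional list of metadata dictionaries, one per record
- `documents`: optional list of text strings, one per record

Output:
- Returns `None`; the records are stored in the collection and can be inspected with `collection.get()` or counted with `collection.count()`.

Use case: index documents together with their embeddings and metadata.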
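Because `add()` takes several parallel lists, a common mistake is passing lists that do not line up. The hypothetical helper below (`check_add_batch` is our own name, not part of Chroma) runs the kind of sanity checks worth doing before calling `collection.add()`: IDs are unique strings, every supplied list has one entry per ID, and all embeddings share one dimensionality. Chroma performs its own validation as well; this sketch only surfaces problems earlier and with clearer messages.

```python
def check_add_batch(ids, embeddings=None, documents=None, metadatas=None):
    """Hypothetical pre-flight check for collection.add(); raises ValueError on problems."""
    if not all(isinstance(i, str) for i in ids):
        raise ValueError("ids must be strings")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a batch")
    for name, values in (("embeddings", embeddings), ("documents", documents), ("metadatas", metadatas)):
        # Optional lists may be omitted, but if present they must align with ids
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} items but ids has {len(ids)}")
    if embeddings:
        dims = {len(e) for e in embeddings}
        if len(dims) != 1:
            raise ValueError(f"embeddings have mixed dimensions: {sorted(dims)}")
    return len(ids)


# A well-formed batch passes and returns the record count
print(check_add_batch(["0", "1"], embeddings=[[0.1, 0.2], [0.3, 0.4]], documents=["a", "b"]))

# A misaligned batch is rejected before it reaches the database
try:
    check_add_batch(["0", "1"], documents=["only one"])
except ValueError as err:
    print(err)
```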

In [None]:
# Example: add three sample records.
# Random 128-dim vectors stand in for real model embeddings.
import numpy as np

ids = [str(i) for i in range(3)]  # IDs must be unique strings
embeddings = [np.random.rand(128).tolist() for _ in range(3)]
documents = ["Doc 1 text", "Doc 2 text", "Doc 3 text"]
metadatas = [{"source": "source_a"}, {"source": "source_b"}, {"source": "source_c"}]

collection.add(ids=ids, embeddings=embeddings, documents=documents, metadatas=metadatas)
print("Added embeddings and documents.")

## Querying Embeddings
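Method: `collection.query()`
Inputs:
- `query_embeddings`: list of query vectors (one result set is returned per vector)
- `n_results`: number of nearest neighbours to return per query
- `where` / `where_document`: optional filters on metadata and on document text
- `include`: optional list of fields to return

Outputs:
- A dictionary with keys such as `ids`, `documents`, `metadatas`, and `distances`, each holding one inner list per query vector. Smaller distances mean closer matches; embeddings are returned only if requested via `include`.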
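Since every field is a list of lists, even a single query needs `[0]` indexing. The hypothetical helper `flatten_query_result` below turns that nested shape into one row per hit. The sample dictionary is hand-written to mimic the shape of a query result; its ids, documents and distances are made up.

```python
def flatten_query_result(result):
    """Hypothetical helper: turn Chroma-style nested query output into flat rows.

    Each field holds one inner list per query vector; fields that are missing
    or None are reported as None in every row.
    """
    fields = (("documents", "document"), ("metadatas", "metadata"), ("distances", "distance"))
    rows = []
    for q_idx, ids in enumerate(result["ids"]):
        for rank, doc_id in enumerate(ids):
            row = {"query": q_idx, "rank": rank, "id": doc_id}
            for key, name in fields:
                values = result.get(key)
                row[name] = values[q_idx][rank] if values else None
            rows.append(row)
    return rows


# Made-up example with the same nesting as a single-query result
sample = {
    "ids": [["2", "0"]],
    "documents": [["Doc 3 text", "Doc 1 text"]],
    "metadatas": [[{"source": "source_c"}, {"source": "source_a"}]],
    "distances": [[17.8, 19.2]],  # smaller = closer
    "embeddings": None,           # not requested via `include`
}
for row in flatten_query_result(sample):
    print(row["rank"], row["id"], row["document"], row["distance"])
```

With a real collection you would pass the dictionary returned by `collection.query(...)` instead of `sample`.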

In [None]:
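# Query: find the 2 stored embeddings closest to a random query vector
query_vector = np.random.rand(128).tolist()  # must match the 128-dim vectors added above
results = collection.query(query_embeddings=[query_vector], n_results=2)
print("Query Results:")
print(results)  # dict of lists-of-lists: one inner list per query vector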
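The query above has no filter. Chroma accepts a metadata filter through the `where` argument, for example `collection.query(query_embeddings=[query_vector], n_results=2, where={"source": "source_a"})`, with operators such as `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$and` and `$or`. To show what such a filter means, here is a small hypothetical evaluator (`matches` is our own function). It models the semantics only; edge cases such as missing keys or several top-level keys may be handled differently by Chroma itself, and the `year` field is invented for the example.

```python
import operator

# Comparison operators in the style of Chroma's `where` filters
OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, where):
    """Hypothetical evaluator: does `metadata` satisfy the `where` filter?"""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # in this sketch a missing key never matches
            # A bare value is shorthand for {"$eq": value}
            ops = condition if isinstance(condition, dict) else {"$eq": condition}
            if not all(OPS[op](metadata[key], arg) for op, arg in ops.items()):
                return False
    return True


metadatas = [
    {"source": "source_a", "year": 2021},  # "year" is an invented field
    {"source": "source_b", "year": 2023},
    {"source": "source_c", "year": 2024},
]
print([m["source"] for m in metadatas if matches(m, {"source": "source_a"})])
print([m["source"] for m in metadatas
       if matches(m, {"$and": [{"year": {"$gte": 2022}}, {"source": {"$nin": ["source_c"]}}]})])
```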

# Integration with LangChain
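LangChain integrates with ChromaDB through its `Chroma` vector store class (`from langchain_chroma import Chroma` in recent releases; older releases exposed it as `langchain.vectorstores.Chroma`).
- Simplifies embedding storage and similarity search for LangChain chains.
- Implements LangChain's standard vector-store interface, including `add_texts()` and `similarity_search()`.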

## LangChain Chroma API Overview
Key classes/methods:
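- `Chroma`: the constructor takes a `collection_name` and an `embedding_function` (optionally a `persist_directory` or an existing `client`).
- `add_texts(texts)`: embeds the texts, stores them in the Chroma collection, and returns their IDs.
- `similarity_search(query, k)`: returns the `k` most similar documents as LangChain `Document` objects (`k` defaults to 4).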
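The embedding function only has to embed a batch of documents and a single query: LangChain's `Embeddings` interface defines exactly those two methods, `embed_documents(texts)` and `embed_query(text)`. For offline experiments you might swap `OpenAIEmbeddings` for a deterministic stand-in like the hypothetical `HashingEmbeddings` below, a hashed bag-of-words of our own with no real semantic understanding. It is plain Python so it runs without LangChain; to pass it to `Chroma` you would subclass `langchain_core.embeddings.Embeddings`.

```python
import math
import re
import zlib


class HashingEmbeddings:
    """Hypothetical deterministic embedder with the embed_documents/embed_query shape."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # crc32 is stable across runs (unlike built-in hash(), which is salted)
            vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]  # unit length, so dot product = cosine similarity

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


emb = HashingEmbeddings(dim=64)
docs = ["AI is transforming many fields", "LangChain simplifies application development"]
doc_vecs = emb.embed_documents(docs)
q = emb.embed_query("How is AI transforming fields?")
scores = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
print(docs[scores.index(max(scores))])
```

Word overlap drives the score here; a real embedding model would also match paraphrases that share no words.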

In [None]:
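# Set up the LangChain Chroma integration.
# Requires: pip install langchain-chroma langchain-openai
# (Older LangChain versions imported from langchain.vectorstores and langchain.embeddings.openai.)
import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before running this cell"

embedding_function = OpenAIEmbeddings()
vectorstore = Chroma(collection_name="my_documents_lc", embedding_function=embedding_function)
print("Chroma vectorstore initialized")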

## Adding Texts to LangChain Chroma Store
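`add_texts(texts, metadatas=None, ids=None)`
- Accepts a list of texts, embeds them with the store's embedding function, adds them to the Chroma collection, and returns the list of IDs (randomly generated unless `ids` is given).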
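Because the default IDs are random, re-running an `add_texts` cell can store the same text twice. One option is to pass content-derived IDs via the `ids` parameter, so that re-adding the same text reuses its ID and the existing record is updated or skipped rather than duplicated (the exact behaviour depends on the library version). The helper `stable_ids` below is a hypothetical sketch of that idea; identical texts map to the same ID, so deduplicate a batch before adding it.

```python
import hashlib


def stable_ids(texts, namespace="my_documents_lc"):
    """Hypothetical helper: derive a deterministic ID from each text's content."""
    return [
        # The namespace keeps IDs distinct across collections; 16 hex chars is plenty here
        hashlib.sha256(f"{namespace}:{t}".encode("utf-8")).hexdigest()[:16]
        for t in texts
    ]


texts = ["AI is transforming many fields", "LangChain simplifies application development"]
ids = stable_ids(texts)
print(ids)
print(ids == stable_ids(texts))  # same input, same IDs on every run
# With a real store this would be: vectorstore.add_texts(texts, ids=ids)
```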

In [None]:
# Add texts example: each text is embedded and stored; the generated IDs are returned
texts = ["AI is transforming many fields", "LangChain simplifies application development"]
ids = vectorstore.add_texts(texts)
print(f"Added {len(ids)} texts to LangChain Chroma vectorstore")

## Querying LangChain Chroma Vectorstore
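`similarity_search(query: str, k: int = 4)`
- Embeds the query string and returns the top-`k` most similar documents as `Document` objects (text in `page_content`, metadata in `metadata`). An optional `filter` dictionary restricts results by metadata.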

In [None]:
# Similarity search example
query = "How is AI changing industries?"
results = vectorstore.similarity_search(query, k=2)
print("Top matches:")
for r in results:
    print(r.page_content)