# Chroma DB

Chroma DB is an open-source vector database designed for storing, managing, and searching high-dimensional embeddings generated by machine learning models. It is commonly used in AI and NLP applications for tasks like semantic search, retrieval-augmented generation (RAG), and recommendation systems.

## Key Features of Chroma DB

- Efficiently stores and indexes vector embeddings for fast similarity search.
- Supports metadata storage alongside vectors, enabling rich filtering and retrieval.
- Provides a simple Python API for adding, querying, and managing data.
- Integrates well with popular frameworks like LangChain for building AI-powered applications.
- Runs in memory, persisted to local disk, or in client-server mode against a separate Chroma server.
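
As a rough picture of what "similarity search with metadata filtering" means, the sketch below ranks a few hand-written vectors by cosine similarity in plain Python. The records, the `search` helper, and its `where` argument are hypothetical illustrations, not Chroma's API; a real vector database uses an index rather than scanning every record.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy records: (embedding, metadata, text).
records = [
    ([1.0, 0.0, 0.0], {"source": "blog"}, "post about Python"),
    ([0.9, 0.1, 0.0], {"source": "docs"}, "Python API reference"),
    ([0.0, 1.0, 0.0], {"source": "blog"}, "post about cooking"),
]

def search(query_vec, k, where=None):
    """Return the k texts most similar to query_vec, optionally filtered by metadata."""
    candidates = [
        r for r in records
        if where is None or all(r[1].get(key) == val for key, val in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r[0]),
        reverse=True,  # highest similarity first
    )
    return [r[2] for r in ranked[:k]]

top_all = search([1.0, 0.0, 0.0], k=2)
top_blog = search([1.0, 0.0, 0.0], k=2, where={"source": "blog"})
```

Without a filter the two Python-related texts rank first; with `where={"source": "blog"}` the docs entry is excluded before ranking.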

## Summary

Chroma DB is a vector store suited to applications that need fast, flexible similarity search over embeddings. The sections that follow crawl a website, embed its content, and query it through Chroma.

### Crawl the website

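```python
from langchain_text_splitters import HTMLHeaderTextSplitter

url = "https://www.chandanys.in/"

# Each tuple maps an HTML header tag to the metadata key stored on the resulting chunks.
headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
    ("h4", "Header 4"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on)
# Fetch the page and split it into documents at the header boundaries listed above.
crawled_docs = html_splitter.split_text_from_url(url)
```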
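
To make the idea of header-based splitting concrete, here is a minimal sketch using only the standard library's `html.parser`. The `HeaderSplitter` class and the sample HTML are hypothetical; this is a simplified illustration of the concept, not how `HTMLHeaderTextSplitter` is implemented.

```python
from html.parser import HTMLParser

class HeaderSplitter(HTMLParser):
    """Toy splitter: starts a new chunk at every <h1>/<h2> and records its header text."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # list of {"header": str, "text": str}
        self._in_header = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._in_header = True
            self.chunks.append({"header": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_header = False

    def handle_data(self, data):
        text = data.strip()
        # Text that appears before the first header is dropped in this toy version.
        if not text or not self.chunks:
            return
        key = "header" if self._in_header else "text"
        self.chunks[-1][key] = (self.chunks[-1][key] + " " + text).strip()

html = "<h1>About</h1><p>Hello.</p><h2>Stack</h2><p>Python.</p><p>Rust.</p>"
splitter = HeaderSplitter()
splitter.feed(html)
splitter.close()
chunks = splitter.chunks
```

Each chunk keeps the header it appeared under, which is the kind of metadata that later helps filter or label search results.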

### Init embedding model

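```python
from langchain_openai import AzureOpenAIEmbeddings
from dotenv import load_dotenv

# Load Azure OpenAI settings (e.g. AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT) from a local .env file.
load_dotenv()

azOpenAIembeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    api_version="2023-05-15",
)
```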

### Chroma Vector DB

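```python
from langchain_chroma import Chroma

# Embed every chunk and index it in a Chroma collection (in memory, since no persist_directory is given).
vectorstore = Chroma.from_documents(
    documents=crawled_docs,
    embedding=azOpenAIembeddings,
)
```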

### Semantic Search

```python
vectorstore.similarity_search("Chandan's technical stack", k=3)
```

### Search with Score

```python
# Returns (document, score) pairs; the score is a distance, so lower means more similar.
vectorstore.similarity_search_with_score(query="technical stack", k=3)
```
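
Because the score is a distance rather than a similarity, the best match has the smallest value. The sketch below shows this with a squared Euclidean distance over hand-written vectors; it assumes the collection uses a squared-L2 distance (which I believe is Chroma's default), and the vectors and names are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = [1.0, 0.0]
# Hypothetical document vectors keyed by a label.
docs = {"close": [0.9, 0.1], "far": [0.0, 1.0]}

# Sort ascending: the smallest distance is the best match.
scored = sorted(
    ((name, squared_l2(query, vec)) for name, vec in docs.items()),
    key=lambda pair: pair[1],
)
```

When comparing results across queries, keep in mind that a threshold on this score is an upper bound on distance, not a lower bound on similarity.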

### Search by Vector

```python
user_query = "Chandan's technical stack"
user_query_vector = azOpenAIembeddings.embed_query(user_query)
vectorstore.similarity_search_by_vector(embedding=user_query_vector, k=3)
```

### Vector Store as Retriever

The vector store can also be wrapped as a Retriever object. Many LangChain components expect a retriever rather than a vector store, so this acts as a convenient adapter: the same search is exposed through a standard `invoke` call, with the search parameters fixed up front.

```python
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever.invoke("technical stack")
```
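
Conceptually, a retriever is little more than a search function bundled with preset arguments. The sketch below shows that pattern in plain Python; `SimpleRetriever` and `keyword_search` are hypothetical stand-ins and do not reflect how LangChain implements its retrievers.

```python
from typing import Callable, List

class SimpleRetriever:
    """Hypothetical retriever: a search function plus fixed keyword arguments."""

    def __init__(self, search_fn: Callable[..., List[str]], search_kwargs: dict):
        self.search_fn = search_fn
        self.search_kwargs = search_kwargs

    def invoke(self, query: str) -> List[str]:
        # Callers only pass the query; k and other options were fixed at construction.
        return self.search_fn(query, **self.search_kwargs)

def keyword_search(query: str, k: int) -> List[str]:
    """Toy search: rank texts by how many lowercase words they share with the query."""
    corpus = ["python and rust stack", "favourite recipes", "cloud stack on azure"]
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda t: len(words & set(t.split())), reverse=True)
    return ranked[:k]

toy_retriever = SimpleRetriever(keyword_search, {"k": 2})
results = toy_retriever.invoke("technical stack")
```

Because the caller only needs `invoke(query)`, the underlying search can be swapped (keyword, vector, hybrid) without changing downstream code.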

### Saving Chroma Vector DB Locally

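```python
# Passing persist_directory writes the collection to disk instead of keeping it only in memory.
vectorstore = Chroma.from_documents(
    documents=crawled_docs,
    embedding=azOpenAIembeddings,
    persist_directory="./chroma_vector_db",
)
```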

### Load Chroma Vector DB from Local

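```python
# Reopen the persisted collection; use the same embedding model that built it so query vectors are comparable.
new_db = Chroma(persist_directory="./chroma_vector_db", embedding_function=azOpenAIembeddings)
new_db.similarity_search("Chandan's technical stack", k=3)
```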
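
The point of persisting is that texts and their embeddings are written to disk once and reloaded later without calling the embedding model again. The sketch below illustrates that round trip with a JSON file in a temporary directory; the file name and record layout are hypothetical, and Chroma's actual on-disk format is different.

```python
import json
import os
import tempfile

# Hypothetical records: each text stored together with its precomputed embedding.
records = [
    {"text": "python and rust stack", "embedding": [0.9, 0.1]},
    {"text": "favourite recipes", "embedding": [0.0, 1.0]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_store.json")

    # "Save": write texts and vectors to disk.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # "Load": a later session reads the vectors back without re-embedding.
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)
```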