## Chroma DB
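Chroma DB is an open-source vector database for storing embeddings, with built-in on-disk persistence.
It is used for semantic (similarity) search over any content that can be embedded, such as text, images, or audio.
Unlike FAISS, which is a similarity-search library rather than a database, it supports metadata filtering and query-based retrieval out of the box.
Through the LangChain wrapper used in this notebook, the Python interface is simple: Chroma.from_documents(documents, embedding) embeds and stores the documents, and db.similarity_search(query) retrieves the closest ones.
It can run in-process for experiments or as a separate server, and it fits naturally as a retriever in ML/NLP pipelines.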
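
To make "similarity search with metadata filtering" concrete, here is a minimal pure-Python sketch of the idea. It is not how Chroma is implemented: the three-dimensional vectors, the `records` list, and the `search` helper are all made up for illustration, and cosine similarity is an assumed metric (a real store uses an embedding model and an approximate index).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy records: hand-written 3-d "embeddings" plus metadata (all hypothetical).
records = [
    {"text": "learning and growth", "vector": [0.9, 0.1, 0.0], "metadata": {"source": "speech"}},
    {"text": "quarterly revenue", "vector": [0.0, 0.2, 0.9], "metadata": {"source": "report"}},
    {"text": "lifelong education", "vector": [0.8, 0.3, 0.1], "metadata": {"source": "report"}},
]

def search(query_vector, k=2, where=None):
    """Return the k most similar texts, optionally restricted by exact-match metadata."""
    candidates = [
        r for r in records
        if where is None or all(r["metadata"].get(key) == val for key, val in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vector, r["vector"]),
        reverse=True,
    )
    return [r["text"] for r in ranked[:k]]

query_vector = [1.0, 0.2, 0.0]  # stands in for an embedded query
top = search(query_vector, k=2)
filtered = search(query_vector, k=1, where={"source": "report"})
print(top)       # ['learning and growth', 'lifelong education']
print(filtered)  # ['lifelong education']
```

The metadata filter narrows the candidate set before ranking, which is why the filtered query skips the best overall match and returns the best match among the "report" records.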

In [2]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

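# Load the text file; load() returns a list of Document objects.
loader = TextLoader("speech.txt")
document = loader.load()
# Split into small chunks (at most 50 characters, up to 10 shared between neighbours).
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
split = text_splitter.split_documents(document)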
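
The effect of chunk_size and chunk_overlap can be sketched with a plain sliding window over characters. This is a simplification, not the algorithm RecursiveCharacterTextSplitter uses: the real splitter first breaks text on separators such as paragraphs, newlines, and spaces, and treats chunk_size as an upper bound. The `sliding_window_chunks` helper is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a chunk boundary still appears with some of its context in the next chunk.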

In [6]:
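# Requires a running Ollama server with the gemma:2b model pulled.
embedding = OllamaEmbeddings(model="gemma:2b")
# Embed every chunk and store it in an in-memory Chroma collection.
db = Chroma.from_documents(documents=split, embedding=embedding)
db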

<langchain_chroma.vectorstores.Chroma at 0x1e6cb44d340>

In [7]:
query = "Find speeches or texts about the importance of continuous learning, personal growth, and motivation."
# Returns the most similar chunks (k=4 by default), best match first.
docs = db.similarity_search(query)
docs[0].page_content

'about the importance of learning and growth.'

### Save and load

In [8]:
# Build the store again, this time writing it to disk under ./chroma_db.
db = Chroma.from_documents(documents=split, embedding=embedding, persist_directory="./chroma_db")

In [9]:
# Reopen the persisted store; the same embedding function is needed to embed queries.
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs = db2.similarity_search(query)
docs[0].page_content

'about the importance of learning and growth.'
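
The save-and-reload round trip can be sketched with a toy store that writes its records to a JSON file. This only illustrates the idea and says nothing about Chroma's actual on-disk format: `save_store`, `load_store`, and the `store.json` file name are hypothetical. Note that only texts and vectors are written, not the embedding model, which is why the reloaded Chroma store above is given embedding_function again.

```python
import json
import tempfile
from pathlib import Path

def save_store(records, directory):
    """Write the records to a JSON file inside directory (toy format)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    (path / "store.json").write_text(json.dumps(records), encoding="utf-8")

def load_store(directory):
    """Read the records back from directory."""
    return json.loads((Path(directory) / "store.json").read_text(encoding="utf-8"))

# One stored chunk with a made-up 2-d vector.
records = [{"text": "about the importance of learning and growth.", "vector": [0.9, 0.1]}]

with tempfile.TemporaryDirectory() as tmp:
    save_store(records, tmp)
    reloaded = load_store(tmp)  # a "second store" built only from what is on disk

print(reloaded == records)  # True
```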

### Retriever

In [10]:
# Wrap the vector store in the retriever interface that LangChain chains expect.
ret = db.as_retriever()
ret.invoke(query)[0].page_content

'about the importance of learning and growth.'