# Chroma
Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

In [3]:
# building a sample vectordb
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [4]:
# Load the text file into a list of Document objects
loader = TextLoader("speech.txt")
document = loader.load()
document

[Document(metadata={'source': 'speech.txt'}, page_content='LangChain is a powerful framework designed to simplify the development of applications powered by large language models (LLMs). It helps developers create chains of components, such as prompt templates, memory, LLMs, and agents, to build context-aware, intelligent applications.\n\nOne of the core advantages of LangChain is its modularity. Developers can start with basic chains and progressively add more complex functionality such as custom tools, retrieval augmentation using vector stores, or multi-agent collaboration systems.\n\nLangChain supports multiple LLM providers such as OpenAI, Anthropic, Cohere, and Hugging Face. It also integrates with various vector stores like FAISS, Pinecone, Weaviate, and Chroma to enable efficient document retrieval and semantic search capabilities.\n\nFor data ingestion, LangChain provides several document loaders. These include loaders for plain text, PDFs, CSVs, Notion, GitHub repositories, a

In [5]:
# Split the document into ~100-character chunks with a 10-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
splits = text_splitter.split_documents(document)
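To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping character windows. It is a simplification and not the splitter's actual algorithm: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs, newlines, and spaces, which is why the chunks in this notebook end on word boundaries. The helper name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk.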

In [6]:
# Embed with a local Ollama model
embeddings = OllamaEmbeddings(model="gemma:2b")
# Create an in-memory Chroma vector store from the chunks
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)

In [7]:
# Query the store and show the closest chunk
query = "For data ingestion, LangChain provides several document loaders. These include loaders for plain text,"

docs = vectordb.similarity_search(query)
docs[0].page_content

'For data ingestion, LangChain provides several document loaders. These include loaders for plain'

In [8]:
# Persist the vector store to disk
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings, persist_directory="./Chroma_db")
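The score output later in this notebook lists each chunk twice. That is consistent with the same chunks having been added to the collection more than once, for example by re-running a cell that calls `from_documents`, although the notebook does not confirm the cause. One possible mitigation, sketched here under that assumption, is to derive a stable ID from each chunk's content so that identical text always maps to the same ID. The helper `content_id` and the sample `texts` are hypothetical.

```python
import hashlib


def content_id(text: str) -> str:
    """Return a deterministic ID derived from the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical chunk texts; the first and last are identical on purpose.
texts = [
    "For data ingestion, LangChain provides several document loaders.",
    "To process and manage long documents, LangChain includes text splitters.",
    "For data ingestion, LangChain provides several document loaders.",
]

# Keep the first occurrence of each distinct chunk, keyed by its content ID.
unique = {}
for text in texts:
    unique.setdefault(content_id(text), text)

ids = list(unique.keys())
unique_texts = list(unique.values())
print(len(texts), "chunks in,", len(unique_texts), "unique chunks out")

# Assumed usage with this notebook's objects (not executed here):
# Chroma.from_documents(documents=..., embedding=embeddings, ids=ids,
#                       persist_directory="./Chroma_db")
```

`Chroma.from_documents` accepts an `ids` argument, so passing content-derived IDs should let a repeated run reuse existing entries instead of appending new ones; it is worth verifying that behavior against the installed `langchain_chroma` version.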

In [9]:
# Load the persisted store from disk
db2 = Chroma(persist_directory="./Chroma_db", embedding_function=embeddings)
# Query it again
docs = db2.similarity_search(query)
print(docs[0].page_content)


For data ingestion, LangChain provides several document loaders. These include loaders for plain


In [10]:
# Similarity search with score (the score is a distance, so lower means a closer match)
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(id='9dd76164-43cf-4322-ba24-1a21bc33c0b1', metadata={'source': 'speech.txt'}, page_content='For data ingestion, LangChain provides several document loaders. These include loaders for plain'),
  3096.984375),
 (Document(id='8174e4bd-e1f2-48de-929c-ed3d2f7356d4', metadata={'source': 'speech.txt'}, page_content='For data ingestion, LangChain provides several document loaders. These include loaders for plain'),
  3096.984375),
 (Document(id='27703f0d-85fe-472f-a598-3f53f8ce05c6', metadata={'source': 'speech.txt'}, page_content='To process and manage long documents, LangChain includes powerful text splitters like'),
  4377.20458984375),
 (Document(id='2fc72cd8-0b7a-47bd-8bde-018603c72e8c', metadata={'source': 'speech.txt'}, page_content='To process and manage long documents, LangChain includes powerful text splitters like'),
  4377.20458984375)]
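The numbers returned next to each document are distances, not similarities, so the best match has the smallest value. Assuming the collection uses Chroma's default squared L2 metric (the notebook does not set one explicitly), the ranking can be sketched with toy vectors as follows. The vectors and chunk names are made up for illustration; real embeddings have far more dimensions, and large values such as 3096.98 suggest that the embedding vectors are not normalized.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 3-dimensional "embeddings" (hypothetical values).
query_vec = [1.0, 2.0, 3.0]
stored = {
    "chunk_a": [1.0, 2.0, 4.0],
    "chunk_b": [4.0, 6.0, 3.0],
    "chunk_c": [1.0, 2.0, 3.0],
}

# Rank stored chunks by ascending distance: the closest match comes first.
ranked = sorted(
    ((name, squared_l2(query_vec, vec)) for name, vec in stored.items()),
    key=lambda pair: pair[1],
)
print(ranked)
# [('chunk_c', 0.0), ('chunk_a', 1.0), ('chunk_b', 25.0)]
```

An identical vector scores 0.0. In the output above the top hit is not 0 because the query string is longer than the stored chunk, so their embeddings differ.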

In [11]:
# Retriever option: wrap the vector store in the standard retriever interface
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

'For data ingestion, LangChain provides several document loaders. These include loaders for plain'