# ChromaDB

[ChromaDB](https://www.trychroma.com/) is an open-source embedding database designed for building AI applications with large-scale vector search and retrieval. It is commonly used for storing and querying vector embeddings generated from text, images, or other data types, enabling efficient similarity search and retrieval-augmented generation (RAG) workflows.

**Key Features:**
- Fast and scalable vector search
- Simple Python API
- Integration with popular embedding models
- Persistent and in-memory storage options

**Typical Use Cases:**
- Semantic search
- Document retrieval
- Recommendation systems
- Retrieval-augmented generation (RAG) for LLMs

**Example Workflow:**
1. Generate embeddings for your data (e.g., using OpenAI, HuggingFace, etc.).
2. Store embeddings in ChromaDB.
3. Query ChromaDB to find similar items based on vector similarity.
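
The three steps above can be sketched without any external service. This is only a toy illustration: `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words over a fixed vocabulary), and a plain Python list plays the role of the vector store. It is not how ChromaDB is implemented internally.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical embedding: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = [
    "rockets carry satellites into orbit",
    "telescopes capture images of distant galaxies",
    "bread is baked in an oven",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})

# Steps 1 and 2: embed every document and keep (text, vector) pairs in a "store".
store = [(doc, toy_embed(doc, vocabulary)) for doc in documents]


def search(query, k=1):
    """Step 3: return the k stored documents most similar to the query."""
    query_vector = toy_embed(query, vocabulary)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]


print(search("images of galaxies"))
```

A real setup replaces `toy_embed` with a learned embedding model and the list with a ChromaDB collection, as the cells below do through LangChain.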

For more information, visit the [ChromaDB documentation](https://docs.trychroma.com/).

In [6]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
# Newer LangChain releases also provide OllamaEmbeddings in the separate langchain_ollama package.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


In [7]:
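# Load the raw text file as a list of LangChain Document objects
loader = TextLoader("speech.txt")
data = loader.load()

# Split the documents into overlapping chunks so each one fits the embedding model
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(data)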


In [8]:
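# Requires a local Ollama server with the gemma:2b model already pulled
embedding = OllamaEmbeddings(model="gemma:2b")

# Embed every chunk and store it in an in-memory Chroma collection
vectorstore = Chroma.from_documents(documents=texts, embedding=embedding)
vectorstore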

<langchain_chroma.vectorstores.Chroma at 0x12562f0f650>

In [9]:
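query = "What is the main topic of the speech?"

# Return the chunks whose embeddings are closest to the query embedding
docs = vectorstore.similarity_search(query)

# Text of the best-matching chunk
splits = docs[0].page_content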

In [None]:
# Save the collection to disk so it can be reloaded later
db = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory="./chroma_db")


In [12]:
# Reload the persisted collection from disk and query it
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs2 = db2.similarity_search(query)
print(docs2[0].page_content)

In conclusion, space exploration is a testament to human curiosity and determination. It brings nations together, drives technological progress, and expands our understanding of the universe. The journey is far from over, and the next chapter promises even greater discoveries.


In [14]:
# Each result is a (Document, score) tuple; the score is a distance, so lower means more similar
docs = vectorstore.similarity_search_with_score(query)
docs

[(Document(id='9f4d69e7-0290-4fe8-b54d-d513a7043398', metadata={'source': 'speech.txt'}, page_content='In conclusion, space exploration is a testament to human curiosity and determination. It brings nations together, drives technological progress, and expands our understanding of the universe. The journey is far from over, and the next chapter promises even greater discoveries.'),
  2960.291748046875),
 (Document(id='eac50de6-1a20-44a1-bc0f-3fffb4601d0c', metadata={'source': 'speech.txt'}, page_content='In conclusion, space exploration is a testament to human curiosity and determination. It brings nations together, drives technological progress, and expands our understanding of the universe. The journey is far from over, and the next chapter promises even greater discoveries.'),
  2960.291748046875),
 (Document(id='3daf5525-da98-4165-a4b8-0f8e94dcbb86', metadata={'source': 'speech.txt'}, page_content='The Hubble Space Telescope has provided breathtaking images of distant galaxies, nebu
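
The scores in the output above are distances, not similarities, so smaller values indicate closer matches. Assuming the collection uses Chroma's default `l2` space (squared Euclidean distance), the number for each result can be thought of as computed like this. The vectors and names here are made up for illustration and are far shorter than real embeddings.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 2-dimensional "embeddings" for a query and two stored chunks
query_vec = [1.0, 2.0]
candidates = {"near": [1.0, 3.0], "far": [4.0, 6.0]}

# Rank candidates by ascending distance: the smallest distance is the best match
ranked = sorted(candidates, key=lambda name: squared_l2(query_vec, candidates[name]))
print(ranked)
```

Large absolute values such as those in the output above are expected when the embedding vectors are high-dimensional and not normalized to unit length; only the relative ordering matters for ranking.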

In [None]:
# Wrap the persisted vector store as a retriever and fetch the best-matching chunk
retriever = db.as_retriever()
retriever.invoke(query)[0].page_content

'In conclusion, space exploration is a testament to human curiosity and determination. It brings nations together, drives technological progress, and expands our understanding of the universe. The journey is far from over, and the next chapter promises even greater discoveries.'
