## Vector Database

A vector database is designed to store and manage high-dimensional vectors, which are often the outputs of machine learning models, particularly embeddings from neural networks. These vectors represent data (such as words, images, or any other object) in numerical form, enabling efficient similarity search. The goal is to find the data points (vectors) closest to a given query vector, using a metric such as Euclidean distance or cosine similarity.
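
To make the two metrics concrete, here is a minimal sketch in plain Python. The function names `euclidean_distance` and `cosine_similarity` and the toy three-dimensional vectors are illustrative assumptions, not part of any library; real embeddings usually have hundreds or thousands of dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for illustration.
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0]

print(euclidean_distance(query, same_direction))  # 1.0
print(cosine_similarity(query, same_direction))   # 1.0
print(cosine_similarity(query, orthogonal))       # 0.0
```

Note that the two metrics can disagree: `same_direction` points exactly the same way as `query` (cosine similarity 1.0) yet sits at a non-zero Euclidean distance, because Euclidean distance also reflects vector length.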

### Why use a vector database?

- Fast similarity search: quickly find similar vectors (documents, images, etc.) for recommendations or matching.
- Scalability: designed to handle millions or billions of vectors.
- Embeddings-based search: useful in scenarios where traditional keyword search doesn't work well (e.g. semantic search, recommendation engines).
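
The points above can be sketched with a brute-force search over a tiny in-memory store. This only illustrates the idea: the `top_k` helper, the labels, and the hand-made "embeddings" are all hypothetical, and a scan like this touches every stored vector on each query, which is exactly the cost that vector databases typically avoid by using approximate nearest-neighbor indexes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (label, similarity) pairs most similar to query_vec."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]

# Hypothetical hand-made vectors; a real embedding model would produce these.
store = {
    "mars": [0.9, 0.1, 0.0],
    "jupiter": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], store, k=2)
print([label for label, _ in results])  # ['mars', 'jupiter']
```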


## ChromaDB

ChromaDB is an open-source vector database built for embeddings, used in applications such as semantic search, recommendation systems, and generative AI tasks. It is designed to integrate with a range of machine learning frameworks and models, allowing you to manage and query large-scale vectorized data efficiently.

In [2]:
from langchain_community.vectorstores import Chroma 
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [11]:
# Load the documents 
loader = TextLoader("solar_system.txt")
data = loader.load()
data 


[Document(metadata={'source': 'solar_system.txt'}, page_content="The Solar System is a vast and fascinating part of the universe, consisting of the Sun and everything bound to it by gravity. This includes eight planets, their moons, asteroids, comets, and meteoroids. The Sun, located at the center, is by far the largest object in the Solar System, containing about 99.8% of the system's total mass. The planets orbit the Sun in elliptical paths, with varying distances from the Sun.\n\nThe four inner planets—Mercury, Venus, Earth, and Mars—are terrestrial planets, composed mostly of rock and metal. The outer planets—Jupiter, Saturn, Uranus, and Neptune—are gas giants, consisting mainly of hydrogen and helium. Each planet has its unique characteristics and features, such as rings around Saturn and the massive storm on Jupiter known as the Great Red Spot.\n\nApart from the planets, the Solar System is also home to dwarf planets like Pluto, Ceres, and Eris. These celestial bodies share chara

In [12]:
# Split the document into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
spilts = text_splitter.split_documents(data)
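
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-width chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to break on separators such as paragraph and line breaks, which is why the chunks in this notebook end on sentence boundaries); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

A non-zero overlap repeats some text across neighboring chunks, which can help when a relevant sentence would otherwise be cut at a chunk boundary, at the cost of storing more vectors.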

In [13]:
print(spilts)

[Document(metadata={'source': 'solar_system.txt'}, page_content="The Solar System is a vast and fascinating part of the universe, consisting of the Sun and everything bound to it by gravity. This includes eight planets, their moons, asteroids, comets, and meteoroids. The Sun, located at the center, is by far the largest object in the Solar System, containing about 99.8% of the system's total mass. The planets orbit the Sun in elliptical paths, with varying distances from the Sun."), Document(metadata={'source': 'solar_system.txt'}, page_content='The four inner planets—Mercury, Venus, Earth, and Mars—are terrestrial planets, composed mostly of rock and metal. The outer planets—Jupiter, Saturn, Uranus, and Neptune—are gas giants, consisting mainly of hydrogen and helium. Each planet has its unique characteristics and features, such as rings around Saturn and the massive storm on Jupiter known as the Great Red Spot.'), Document(metadata={'source': 'solar_system.txt'}, page_content='Apart 

In [14]:
embeddings = OllamaEmbeddings(model='mxbai-embed-large')
vectordb = Chroma.from_documents(documents=spilts, embedding=embeddings)
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x151785703b0>

In [15]:
# Query the vector store for the most similar chunks
query = "How does the asteroid belt between Mars and Jupiter?"
docs = vectordb.similarity_search(query)
docs[0].page_content

"Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."

In [16]:
# Build the vector store again, this time persisting it to disk
vectordb = Chroma.from_documents(documents=spilts, embedding=embeddings, persist_directory="./chroma.db")

In [17]:
# Load the persisted vector store from disk
db2 = Chroma(persist_directory="./chroma.db", embedding_function=embeddings)
docs = db2.similarity_search(query)
print(docs[0].page_content)

Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids.


In [18]:
# Similarity search with score (the score is a distance: lower means more similar)
docs = vectordb.similarity_search_with_score(query)
docs[0]

(Document(metadata={'source': 'solar_system.txt'}, page_content="Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."),
 181.56572332382945)
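
A score this large is easier to interpret with a little arithmetic. Assuming the store uses a squared Euclidean (L2) distance, which appears to be Chroma's default but is worth verifying against your installed version, the value grows with vector magnitude, so unnormalized embeddings can produce numbers far above 1. The sketch below uses hypothetical helper names and toy vectors; it also checks the identity that, for unit-length vectors, the squared distance equals 2 - 2·cos(angle).

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [4.0, 3.0]

print(squared_l2(a, b))                            # 2.0
# Same directions, ten times the magnitude: the distance grows a hundredfold.
print(squared_l2([30.0, 40.0], [40.0, 30.0]))      # 200.0

# For unit vectors, squared L2 distance and cosine similarity carry the same information.
lhs = squared_l2(normalize(a), normalize(b))
rhs = 2 - 2 * cosine_similarity(a, b)
print(math.isclose(lhs, rhs))                      # True
```

In practice this means raw scores are best compared against each other within one store rather than read as absolute quality measures.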

In [19]:
# Wrap the vector store as a retriever and fetch the top match
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

"Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."