## Vector Database 
A vector database stores and indexes high-dimensional vectors, typically embeddings produced by machine learning models such as neural networks. Each vector represents an item (a word, a document, an image, or any other object) in numerical form, so that similar items end up close to each other. Given a query vector, the database finds the stored vectors nearest to it under a measure such as Euclidean distance or cosine similarity.
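To make the two measures concrete, here is a minimal pure-Python sketch. The helper names and the toy 3-dimensional vectors are made up for illustration; real embeddings usually have hundreds of dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values)
query = [1.0, 0.0, 0.0]
close = [2.0, 0.0, 0.0]  # same direction as the query, different length
far = [0.0, 3.0, 0.0]    # perpendicular to the query

print(euclidean_distance(query, close))  # 1.0
print(cosine_similarity(query, close))   # 1.0
print(cosine_similarity(query, far))     # 0.0
```

Note that the two measures point in opposite directions: a smaller distance means more similar, while a larger cosine similarity means more similar.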

#### Why Use a Vector Database?
- Fast Similarity Search: Quickly find similar vectors (documents, images, etc.) for recommendations or matching.
- Scalability: Designed to handle millions or billions of vectors.
- Embedding-based Search: Useful in scenarios where traditional keyword search doesn't work well (e.g., semantic search, recommendation engines).
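As a rough picture of what a similarity search does, the sketch below scans every stored vector and keeps the best matches. The `top_k` helper, the document ids, and the vectors are all hypothetical. A real vector database typically avoids this full scan by using an approximate nearest-neighbor index, which is what lets it scale to millions of vectors.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (doc_id, similarity) pairs most similar to query_vec.

    Brute force: every stored vector is compared, so cost grows linearly
    with the number of vectors.
    """
    scored = ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical store mapping document ids to toy embeddings
store = {
    "doc_planets": [0.9, 0.1, 0.0],
    "doc_asteroids": [0.1, 0.9, 0.1],
    "doc_comets": [0.2, 0.8, 0.3],
}

results = top_k([0.0, 1.0, 0.1], store, k=2)
print([doc_id for doc_id, _ in results])  # ['doc_asteroids', 'doc_comets']
```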

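Popular vector stores include FAISS (a similarity search library rather than a full database), Pinecone, and ChromaDB.

For the list of vector stores that LangChain integrates with, see https://python.langchain.com/docs/integrations/vectorstores/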

## ChromaDB 
ChromaDB is an open-source vector database built for embeddings. It is commonly used in semantic search, recommendation systems, and generative AI applications, and it integrates with many machine learning frameworks and embedding models, so you can store and query large collections of vectors efficiently.

In [11]:
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [12]:
# Load your documents
loader = TextLoader("solar_system.txt")
data = loader.load()
data



[Document(metadata={'source': 'solar_system.txt'}, page_content="The Solar System is a vast and fascinating part of the universe, consisting of the Sun and everything bound to it by gravity. This includes eight planets, their moons, asteroids, comets, and meteoroids. The Sun, located at the center, is by far the largest object in the Solar System, containing about 99.8% of the system's total mass. The planets orbit the Sun in elliptical paths, with varying distances from the Sun.\n\nThe four inner planets—Mercury, Venus, Earth, and Mars—are terrestrial planets, composed mostly of rock and metal. The outer planets—Jupiter, Saturn, Uranus, and Neptune—are gas giants, consisting mainly of hydrogen and helium. Each planet has its unique characteristics and features, such as rings around Saturn and the massive storm on Jupiter known as the Great Red Spot.\n\nApart from the planets, the Solar System is also home to dwarf planets like Pluto, Ceres, and Eris. These celestial bodies share chara

In [13]:
# Split the document into chunks of at most 500 characters, with no overlap between chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
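To see what `chunk_size` and `chunk_overlap` control, here is a simplified stand-in written from scratch. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter first tries to break on separators such as paragraph breaks, newlines, and spaces, while this hypothetical `chunk_text` helper only slides a fixed-size character window.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

text = "abcdefghij"
print(chunk_text(text, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text(text, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when a sentence relevant to a query straddles a chunk boundary.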

In [14]:
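# Embed each chunk with a local Ollama model and index the vectors in Chroma
# (no persist_directory is given, so this store lives in memory only)
embedding = OllamaEmbeddings(model="mxbai-embed-large")
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb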

<langchain_community.vectorstores.chroma.Chroma at 0x19265ab4f20>

In [15]:
# Query the store; the most similar chunk comes first
query = "How does the asteroid belt between Mars and Jupiter ?"
docs = vectordb.similarity_search(query)
docs[0].page_content

"Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."

In [16]:
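# Save to disk

vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")

In [23]:
# Load from disk. persist_directory must match the path used when saving:
# a mismatch (such as "./chroma.db" vs "./chroma_db") opens an empty store,
# and docs[0] then fails with "IndexError: list index out of range".
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs = db2.similarity_search(query)
docs[0].page_content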

In [17]:
# Similarity search with score (the score is a distance: lower means more similar)
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(metadata={'source': 'solar_system.txt'}, page_content="Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."),
  181.56572332382945),
 (Document(metadata={'source': 'solar_system.txt'}, page_content="Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."),
  181.56572332382945),
 (Document(metadata={'source': 'solar_system.txt'}, page_content='Apart 
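The scores above are distances, so results are ordered from the smallest score upward. The sketch below assumes the collection uses Chroma's default squared Euclidean (L2) distance, which the notebook does not state explicitly. The vectors and names are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors (hypothetical values)
query_vec = [1.0, 2.0, 3.0]
candidates = {
    "near": [1.0, 2.0, 4.0],
    "far": [4.0, 6.0, 3.0],
}

# Rank by ascending distance: the best match has the lowest score
ranked = sorted(candidates, key=lambda name: squared_l2(query_vec, candidates[name]))
print([(name, squared_l2(query_vec, candidates[name])) for name in ranked])
# [('near', 1.0), ('far', 25.0)]
```

Two results with identical text and identical scores, as in the output above, most likely mean the same chunk was added to the collection more than once, for example by running `Chroma.from_documents` repeatedly against the same store.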

In [18]:
# Expose the vector store as a retriever
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

"Asteroids, mostly found in the asteroid belt between Mars and Jupiter, are rocky remnants from the early Solar System. Comets, with their glowing tails, originate from the outer regions of the Solar System and are composed of ice, dust, and rocky material. Meteoroids, which become meteors when they enter Earth's atmosphere, are small particles from comets or asteroids."
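A retriever returns several documents per query (four by default, as far as I know). If you want a different number, `as_retriever` accepts `search_kwargs`. The small wrapper below is a hypothetical helper, not part of LangChain; it only forwards `k` to the vector store.

```python
def build_retriever(vectordb, k=2):
    """Wrap a LangChain vector store as a retriever that returns k documents."""
    # search_kwargs is passed through to the underlying similarity search
    return vectordb.as_retriever(search_kwargs={"k": k})

# Usage with the objects defined in the cells above:
#   retriever = build_retriever(vectordb, k=2)
#   for doc in retriever.invoke(query):
#       print(doc.page_content[:80])
```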