##### Chroma

Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

In [3]:
### Building a simple vector DB
from langchain_chroma import Chroma

In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [6]:
loader = TextLoader("speech.txt")
data = loader.load()
data

[Document(metadata={'source': 'speech.txt'}, page_content="Today, I want to highlight a crucial element in the success of machine learning \napplications: data ingestion, particularly within the LangChain framework.\n\nData ingestion is the foundation that enables our language models to thrive. \nIt involves the systematic collection and preparation of data from various sources, ensuring that our models receive high-quality, relevant information. \nIn the context of LangChain, this process is vital for transforming raw data into actionable insights.\n\nWhy is data ingestion so important? First, the quality of the data directly influences the model's performance. \nBy providing diverse and accurate datasets, we empower our models to generate meaningful responses and understand context effectively. \nFurthermore, as we scale our applications, efficient data ingestion becomes essential. LangChain offers robust tools and integrations to simplify this process, allowing developers to automat

In [7]:
### Split the document into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
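
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch in plain Python. It is a simplified fixed-window splitter, not the separator-aware logic that `RecursiveCharacterTextSplitter` actually uses, and the helper name `split_by_chars` is hypothetical.

```python
def split_by_chars(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Illustration only: consecutive windows share chunk_overlap characters.
    The real splitter also tries to break on separators such as blank
    lines, which this sketch ignores.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    sample = "abcdefghij"
    print(split_by_chars(sample, chunk_size=4, chunk_overlap=0))
    print(split_by_chars(sample, chunk_size=4, chunk_overlap=2))
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when a relevant sentence straddles a chunk boundary.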

In [8]:
embedding = OllamaEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb


<langchain_chroma.vectorstores.Chroma at 0x28fa76bde40>

In [10]:
### Query it

qurey = "is data ingestion so important? First, the quality"
docs= vectordb.similarity_search(qurey)
docs[0].page_content

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


'Today, I want to highlight a crucial element in the success of machine learning \napplications: data ingestion, particularly within the LangChain framework.\n\nData ingestion is the foundation that enables our language models to thrive. \nIt involves the systematic collection and preparation of data from various sources, ensuring that our models receive high-quality, relevant information. \nIn the context of LangChain, this process is vital for transforming raw data into actionable insights.'

In [11]:
### Save to disk

vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")

In [13]:
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs = db2.similarity_search(qurey)
print(docs[0].page_content)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


Today, I want to highlight a crucial element in the success of machine learning 
applications: data ingestion, particularly within the LangChain framework.

Data ingestion is the foundation that enables our language models to thrive. 
It involves the systematic collection and preparation of data from various sources, ensuring that our models receive high-quality, relevant information. 
In the context of LangChain, this process is vital for transforming raw data into actionable insights.


In [14]:
### Similarity search with score

docs= vectordb.similarity_search_with_score(qurey)
docs

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[(Document(metadata={'source': 'speech.txt'}, page_content='Today, I want to highlight a crucial element in the success of machine learning \napplications: data ingestion, particularly within the LangChain framework.\n\nData ingestion is the foundation that enables our language models to thrive. \nIt involves the systematic collection and preparation of data from various sources, ensuring that our models receive high-quality, relevant information. \nIn the context of LangChain, this process is vital for transforming raw data into actionable insights.'),
  19838.82554154424),
 (Document(metadata={'source': 'speech.txt'}, page_content='In conclusion, prioritizing effective data ingestion not only enhances the capabilities of our language models but also leads to more impactful applications. \nBy leveraging the tools available in LangChain, we can unlock the full potential of our data and deliver exceptional user experiences.'),
  27097.61618238708),
 (Document(metadata={'source': 'speech
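
Two details in the output above are worth unpacking. The scores look like distances rather than similarities, so a lower score means a closer match (the best hit has the smallest number). The warning appears because the search asks for 4 results by default while the index holds only 3 chunks. The sketch below imitates both behaviours in plain Python. It assumes a squared Euclidean (L2) distance, which I believe is Chroma's default metric but have not verified here, and the names `squared_l2` and `top_k` are hypothetical helpers, not part of Chroma or LangChain.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: list[float],
          indexed: list[tuple[str, list[float]]],
          k: int = 4) -> list[tuple[str, float]]:
    """Return up to k (text, distance) pairs, closest first.

    k is clamped to the number of indexed items, mirroring the
    "updating n_results" warning in the notebook output.
    """
    k = min(k, len(indexed))
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
    index = [("intro", [0.0, 0.0]), ("tools", [3.0, 4.0]), ("conclusion", [1.0, 1.0])]
    for text, distance in top_k([0.0, 0.0], index):
        print(text, distance)
```

In the notebook itself, passing `k=3` to `similarity_search` or `similarity_search_with_score` should request only as many results as the index holds and avoid the warning.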

In [15]:
### Retriever option

retriever = vectordb.as_retriever()
retriever.invoke(qurey)[0].page_content

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


'Today, I want to highlight a crucial element in the success of machine learning \napplications: data ingestion, particularly within the LangChain framework.\n\nData ingestion is the foundation that enables our language models to thrive. \nIt involves the systematic collection and preparation of data from various sources, ensuring that our models receive high-quality, relevant information. \nIn the context of LangChain, this process is vital for transforming raw data into actionable insights.'