# Create and query a Chroma DB
This notebook shows how to store a few short texts in a Chroma vector database, embed them with a sentence-transformers model, and query them, as a first step toward building a retrieval-augmented generation (RAG) system.

In [1]:
# https://python.langchain.com/docs/integrations/vectorstores/chroma/

In [2]:
#!pip install chromadb sentence-transformers langchain_huggingface langchain_chroma

Create a list of texts to store in the vector database

In [3]:
textos = ["I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
          "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
          "Building an exciting new project with LangChain - come check it out!",
          "Robbers broke into the city bank and stole $1 million in cash.",
          "Wow! That was an amazing movie. I can't wait to see it again.",
          "Is the new iPhone worth the price? Read this review to find out.",
          "The top 10 soccer players in the world right now.",
          "LangGraph is the best framework for building stateful, agentic applications!",
          "The stock market is down 500 points today due to fears of a recession.",
          "I have a bad feeling I am going to get deleted :(",]

Select the **model** used to compute the embeddings

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")



Populate the database with **texts and embeddings**

In [5]:
from langchain_chroma import Chroma


vector_store = Chroma.from_texts(
    texts=textos,
    collection_name="some_facts",
    embedding=embeddings,
    persist_directory="./chroma_some_facts",
)
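
Because `persist_directory` keeps the collection on disk, re-running the cell above is likely to store the same texts a second time unless explicit IDs are supplied. The sketch below derives a deterministic ID from each text; `stable_ids` is a hypothetical helper, not part of LangChain or Chroma, and the final comment assumes `Chroma.from_texts` is called with its `ids` parameter.

```python
import hashlib


def stable_ids(texts):
    """Derive one deterministic ID per text from its content (SHA-256 hex digest)."""
    return [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]


sample = [
    "Robbers broke into the city bank and stole $1 million in cash.",
    "The top 10 soccer players in the world right now.",
]
ids = stable_ids(sample)

# The same text always maps to the same ID, and different texts get different IDs.
assert ids == stable_ids(sample)
assert len(set(ids)) == len(sample)

# Assumed usage with the cell above (not executed here):
# vector_store = Chroma.from_texts(texts=textos, ids=stable_ids(textos), ...)
```

Content-derived IDs also mean that two identical texts collapse into one entry, which may or may not be what you want.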

**Searching** the vector store

In [6]:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=3,  # number of results to return
)
for res in results:
    print(res)

page_content='Building an exciting new project with LangChain - come check it out!'
page_content='LangGraph is the best framework for building stateful, agentic applications!'
page_content='Is the new iPhone worth the price? Read this review to find out.'
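
Conceptually, a similarity search embeds the query and returns the stored texts whose vectors lie closest to it. The following is a minimal brute-force sketch of that idea with invented 2-D vectors; `squared_l2`, `top_k`, and `toy_store` are illustrative names, and a real vector store uses an index rather than scanning every entry.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, stored, k=3):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D "embeddings" invented for illustration; real ones have hundreds of dimensions.
toy_store = [
    ("LangChain project", [0.9, 0.1]),
    ("LangGraph framework", [0.8, 0.3]),
    ("weather forecast", [0.1, 0.9]),
    ("bank robbery", [-0.7, 0.2]),
]

nearest = top_k([1.0, 0.0], toy_store, k=2)
for text, dist in nearest:
    print(f"{dist:.2f}  {text}")
```

Note that `k` results always come back, even when only some of them are related to the query, which is why the third hit in the output above is an unrelated sentence.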


In [7]:
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=3,
)
# Despite the "SIM" label, the score returned here is a distance: lower means more similar.
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content}")

* [SIM=0.809472] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.
* [SIM=1.515494] I have a bad feeling I am going to get deleted :(
* [SIM=1.540267] I had chocolate chip pancakes and scrambled eggs for breakfast this morning.


We can configure a **RETRIEVER**, a key LangChain component for finding relevant information in document collections

In [8]:
retriever = vector_store.as_retriever(
    search_type="similarity", search_kwargs={"k": 3}
)
retriever.invoke("Stealing from the bank is a crime")

[Document(metadata={}, page_content='Robbers broke into the city bank and stole $1 million in cash.'),
 Document(metadata={}, page_content='The stock market is down 500 points today due to fears of a recession.'),
 Document(metadata={}, page_content='I had chocolate chip pancakes and scrambled eggs for breakfast this morning.')]
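
In a RAG system, the retrieved documents are typically pasted into the prompt sent to the language model. The sketch below shows one possible way to do that; `build_prompt` and the prompt wording are hypothetical, and `SimpleNamespace` objects stand in for the `Document` instances so that the example runs without LangChain installed.

```python
from types import SimpleNamespace


def build_prompt(question, docs):
    """Join retrieved passages into a numbered context block followed by the question."""
    context = "\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Stand-ins for the Document objects returned by retriever.invoke(...);
# only the page_content attribute is used.
docs = [
    SimpleNamespace(page_content="Robbers broke into the city bank and stole $1 million in cash."),
    SimpleNamespace(page_content="The stock market is down 500 points today due to fears of a recession."),
]

prompt = build_prompt("How much was stolen from the bank?", docs)
print(prompt)
```

With the real retriever, the same function could be called as `build_prompt(question, retriever.invoke(question))`, since it only reads `page_content`.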