### Vectorstores and Retrievers

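Vectorstores and retrievers are abstractions provided by LangChain. Vectorstores store embedded documents and search them by similarity, while retrievers provide a common interface for fetching relevant documents from vectorstores and other sources. Both are used throughout LLM workflows, especially retrieval-augmented generation (RAG).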

In [21]:
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv()

True

In [22]:
model = ChatGroq(model="llama-3.1-8b-instant")
model

ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 8192, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x169919590>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x169a12510>, model_name='llama-3.1-8b-instant', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [23]:
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.",
        metadata={"source": "mammal_pet_doc"},
    ),
    Document(
        page_content="Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.",
        metadata={"source": "mammal_pet_doc"},
    ),
    Document(
        page_content="Fish are a diverse group of aquatic creatures, known for their unique characteristics and behavior. They are relatively easy to take care.",
        metadata={"source": "fish_pet_doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech",
        metadata={"source": "bird_pet_doc"},
    ),
    Document(
        page_content="Rabbits are social animals, they require plenty of space to hop around and eat.",
        metadata={"source": "mammal_pet_doc"},
    ),
]

In [24]:
documents

[Document(metadata={'source': 'mammal_pet_doc'}, page_content='Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.'),
 Document(metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.'),
 Document(metadata={'source': 'fish_pet_doc'}, page_content='Fish are a diverse group of aquatic creatures, known for their unique characteristics and behavior. They are relatively easy to take care.'),
 Document(metadata={'source': 'bird_pet_doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech'),
 Document(metadata={'source': 'mammal_pet_doc'}, page_content='Rabbits are social animals, they require plenty of space to hop around and eat.')]

In [25]:
# embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [26]:
# vectorstore
chroma_store = Chroma.from_documents(documents, embeddings)


In [27]:
chroma_store.similarity_search("cat")

[Document(id='8e2bf76b-bb8f-4984-8bb8-3a752f62559c', metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.'),
 Document(id='c613baa4-2a37-48b5-a145-efc366a71f25', metadata={'source': 'fish_pet_doc'}, page_content='Fish are a diverse group of aquatic creatures, known for their unique characteristics and behavior. They are relatively easy to take care.'),
 Document(id='e89b4af7-1c62-4d11-b4f0-c2f7c5ac539a', metadata={'source': 'mammal_pet_doc'}, page_content='Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.'),
 Document(id='fecaaafe-7476-4d41-a911-0bcef523e5f8', metadata={'source': 'mammal_pet_doc'}, page_content='Rabbits are social animals, they require plenty of space to hop around and eat.')]
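Conceptually, similarity_search embeds the query with the same embedding model used for the documents, compares it with every stored vector, and returns the k closest documents (the call above returned four, which matches LangChain's default of k=4). The plain-Python sketch below mimics that flow. It is an illustration only: the bag-of-words toy_embed function and the ToyVectorStore class are hypothetical stand-ins for OpenAIEmbeddings and Chroma, and real vector databases typically use an approximate nearest-neighbour index rather than scanning every row.

```python
import math
import re
from collections import Counter

# Hypothetical fixed vocabulary for the toy embedding.
VOCAB = ["dog", "cat", "fish", "parrot", "rabbit", "furry", "loyal", "bird"]

TEXTS = [
    "Dogs are loyal companions",
    "Cats are small, furry and independent",
    "Fish are aquatic creatures",
    "Parrots are birds that mimic speech",
    "Rabbits are furry social animals",
]


def toy_embed(text):
    """Hypothetical embedding: unit-length word counts over VOCAB."""
    # Lowercase, split into words, and crudely strip a plural "s" ("cats" -> "cat").
    words = [w.rstrip("s") for w in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(words)
    vec = [float(counts[term]) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyVectorStore:
    """Hypothetical brute-force store: embeds texts once, scans them per query."""

    def __init__(self, texts):
        self.rows = [(text, toy_embed(text)) for text in texts]

    def similarity_search_with_score(self, query, k=4):
        q = toy_embed(query)  # the query must use the same embedding as the texts
        scored = [(text, squared_l2(q, vec)) for text, vec in self.rows]
        scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
        return scored[:k]

    def similarity_search(self, query, k=4):
        return [text for text, _ in self.similarity_search_with_score(query, k)]


store = ToyVectorStore(TEXTS)
print(store.similarity_search("furry cat", k=2))
print(store.similarity_search_with_score("furry cat", k=2))
```

The query "furry cat" ranks the cat text first and the rabbit text second, because the rabbit text shares the word "furry".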

In [28]:
# async query: top-level await works in Jupyter; in a plain script, use asyncio.run()

query = "cat"
results = await chroma_store.asimilarity_search(query, k=2)
print(results)

[Document(id='8e2bf76b-bb8f-4984-8bb8-3a752f62559c', metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.'), Document(id='c613baa4-2a37-48b5-a145-efc366a71f25', metadata={'source': 'fish_pet_doc'}, page_content='Fish are a diverse group of aquatic creatures, known for their unique characteristics and behavior. They are relatively easy to take care.')]
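The top-level await above works because Jupyter runs cells inside an event loop. A plain Python script has no running loop, so the coroutine must be driven with asyncio.run(). The async API pays off when several queries run concurrently. The sketch below shows that pattern with a hypothetical blocking search function standing in for chroma_store.similarity_search, offloaded to worker threads with asyncio.to_thread and awaited together; with the real store you would gather chroma_store.asimilarity_search(...) coroutines instead.

```python
import asyncio
import time


def search(query: str, k: int = 2) -> list[str]:
    """Hypothetical blocking search standing in for a vectorstore call."""
    time.sleep(0.1)  # simulate network latency, e.g. an embedding API request
    return [f"{query}-result-{i}" for i in range(k)]


async def search_many(queries: list[str], k: int = 2) -> list[list[str]]:
    # Run each blocking call in a worker thread and await them together, so the
    # queries overlap instead of running one after another.
    tasks = [asyncio.to_thread(search, q, k) for q in queries]
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    # A plain script has no running event loop, so start one with asyncio.run().
    start = time.perf_counter()
    print(asyncio.run(search_many(["cat", "dog", "fish"])))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

asyncio.gather returns results in the same order as the queries, regardless of which finishes first.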


In [30]:
chroma_store.similarity_search_with_score("cat")

[(Document(id='8e2bf76b-bb8f-4984-8bb8-3a752f62559c', metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.'),
  1.1800775527954102),
 (Document(id='c613baa4-2a37-48b5-a145-efc366a71f25', metadata={'source': 'fish_pet_doc'}, page_content='Fish are a diverse group of aquatic creatures, known for their unique characteristics and behavior. They are relatively easy to take care.'),
  1.5863585472106934),
 (Document(id='e89b4af7-1c62-4d11-b4f0-c2f7c5ac539a', metadata={'source': 'mammal_pet_doc'}, page_content='Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.'),
  1.6029309034347534),
 (Document(id='fecaaafe-7476-4d41-a911-0bcef523e5f8', metadata={'source': 'mammal_pet_doc'}, page_content='Rabbits are social animals, they require plenty of space to hop around and eat.'),
  1.62816238
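These scores are distances rather than similarities, so lower means closer, which is why the cat document has the smallest value. By default Chroma collections use squared L2 (Euclidean) distance, and OpenAI embeddings are normalised to unit length. For unit vectors, squared L2 distance d and cosine similarity are related by d = 2 - 2*cos(theta): 0 means the same direction, 2 means orthogonal, and 4 means opposite. The sketch below checks that identity on small hand-made vectors (the 3-dimensional "embeddings" are hypothetical). If you would rather have scores in [0, 1] where higher is better, LangChain vectorstores also provide similarity_search_with_relevance_scores().

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-d "embeddings", scaled to unit length like OpenAI embeddings.
query = normalize([1.0, 0.2, 0.0])
docs = {
    "close": normalize([0.9, 0.3, 0.1]),
    "orthogonal": normalize([0.0, 0.0, 1.0]),
    "opposite": normalize([-1.0, -0.2, 0.0]),
}

for name, vec in docs.items():
    d = squared_l2(query, vec)
    c = cosine(query, vec)
    # For unit vectors, d equals 2 - 2*c up to floating-point error.
    print(f"{name:>10}: squared L2 = {d:.3f}, cosine = {c:.3f}, 2 - 2*cos = {2 - 2 * c:.3f}")
```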

### Retrievers

Vectorstores are not subclasses of Runnable in LangChain, so they cannot be dropped directly into a chain.

Retrievers, on the other hand, are Runnables: they implement the standard methods such as invoke() and its async counterpart ainvoke(), as well as batch() and abatch(), and are designed to be composed into LangChain Expression Language (LCEL) chains.
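To see why being a Runnable matters, here is a deliberately tiny imitation of the interface. It is not LangChain's implementation: the ToyRunnable class and the step functions are hypothetical, and the real Runnable also adds async variants, streaming, and configuration. The point is that every step exposes the same invoke() and batch() methods, so `|` can pipe one step's output into the next.

```python
from typing import Any, Callable, List


class ToyRunnable:
    """Hypothetical minimal stand-in for LangChain's Runnable interface."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # The real Runnable.batch runs inputs in parallel; a loop is enough here.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # Composition: feed this step's output into the next step.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


PETS = {"cat": "Cats are independent.", "dog": "Dogs are loyal."}

lookup = ToyRunnable(lambda q: PETS.get(q, "unknown"))
shout = ToyRunnable(str.upper)
chain = lookup | shout

print(chain.invoke("cat"))
print(chain.batch(["cat", "dog", "fish"]))
```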

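Let's start with a simple implementation: wrapping the vectorstore's similarity_search method in a RunnableLambda turns it into a Runnable.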

In [31]:
from langchain_core.runnables import RunnableLambda

In [35]:
vs_retriever = RunnableLambda(chroma_store.similarity_search).bind(k=1)
vs_retriever.batch(["cat", "dog"])

[[Document(id='8e2bf76b-bb8f-4984-8bb8-3a752f62559c', metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.')],
 [Document(id='e89b4af7-1c62-4d11-b4f0-c2f7c5ac539a', metadata={'source': 'mammal_pet_doc'}, page_content='Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.')]]
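Here .bind(k=1) fixes a keyword argument that is passed on every call, so the Runnable still accepts just the query string. In plain Python this is roughly the same idea as functools.partial. The sketch below uses a hypothetical substring-matching search function in place of chroma_store.similarity_search.

```python
from functools import partial

DOCS = ["Cats are independent.", "Dogs are loyal.", "Cats and dogs can be friends."]


def search(query: str, k: int = 4) -> list[str]:
    """Hypothetical stand-in for similarity_search: case-insensitive substring match."""
    return [d for d in DOCS if query.lower() in d.lower()][:k]


# Roughly what RunnableLambda(search).bind(k=1) does: pre-fill k on every call.
search_top1 = partial(search, k=1)

print([search(q) for q in ["cat", "dog"]])
print([search_top1(q) for q in ["cat", "dog"]])
```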

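Vectorstores also implement an as_retriever() method, which wraps the store in a VectorStoreRetriever. The search_type argument selects the search method, and search_kwargs are passed through to it (here k=1 returns only the top match). Let's see how that is done.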

In [39]:
retriever = chroma_store.as_retriever(
    search_kwargs={"k": 1},
    search_type="similarity",
)
retriever.batch(["cat", "dog"])

[[Document(id='8e2bf76b-bb8f-4984-8bb8-3a752f62559c', metadata={'source': 'mammal_pet_doc'}, page_content='Cats are small, furry, and nocturnal creatures. They are independent and often enjoy their own space.')],
 [Document(id='e89b4af7-1c62-4d11-b4f0-c2f7c5ac539a', metadata={'source': 'mammal_pet_doc'}, page_content='Dogs are great companions, known for their loyalty and companionship. They are also known for their intelligence and ability to learn new tricks and behaviors.')]]
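Roughly speaking, the retriever just remembers which search to run and with which arguments; with search_type="similarity" it forwards search_kwargs to the store's similarity_search. The hypothetical KeywordStore and ToyRetriever classes below mimic that delegation in plain Python, using keyword overlap instead of embeddings. The real VectorStoreRetriever also supports other search types, such as "mmr" and "similarity_score_threshold", plus async methods.

```python
class KeywordStore:
    """Hypothetical store with a similarity_search-like method (keyword overlap)."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        words = set(query.lower().split())
        # Rank texts by how many query words they share (stable for ties).
        ranked = sorted(
            self.texts,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_type="similarity", search_kwargs=None):
        return ToyRetriever(self, search_type, search_kwargs or {})


class ToyRetriever:
    """Hypothetical retriever: stores the search settings and delegates to the store."""

    def __init__(self, store, search_type, search_kwargs):
        if search_type != "similarity":
            raise ValueError(f"unsupported search_type: {search_type}")
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)

    def batch(self, queries):
        return [self.invoke(q) for q in queries]


store = KeywordStore(["cats are independent", "dogs are loyal", "fish swim"])
retriever = store.as_retriever(search_kwargs={"k": 1})
print(retriever.batch(["independent cats", "loyal dogs"]))
```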

Let us now integrate the retriever into a RAG chain.

In [40]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer the question as best you can, based on the provided context only.

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_messages(messages=[('human', message)])
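When the chain runs, the template's {context} and {question} placeholders are filled with f-string-style formatting. Because the retriever returns a list of Document objects, {context} receives that list's string representation, metadata included. A common refinement is to join only each document's page_content first. The plain-Python sketch below shows the difference; the Doc dataclass stands in for Document, and format_docs is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """
Answer the question as best you can, based on the provided context only.

Context: {context}

Question: {question}
"""


def format_docs(docs):
    """Hypothetical helper: keep only the text of each retrieved document."""
    return "\n\n".join(d.page_content for d in docs)


retrieved = [Doc("Dogs are great companions.", {"source": "mammal_pet_doc"})]
question = "which pet is considered a good companion?"

raw = TEMPLATE.format(context=retrieved, question=question)  # list repr leaks metadata
clean = TEMPLATE.format(context=format_docs(retrieved), question=question)
print(raw)
print(clean)
```

In an LCEL chain, such a helper could be piped after the retriever (retriever | format_docs), since LCEL wraps plain functions in a RunnableLambda when they are composed with `|`.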

In [41]:
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | model
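In this chain the dict literal is coerced into a RunnableParallel: the same input string is sent to both branches, so the retriever produces the context while RunnablePassthrough forwards the question unchanged, and the resulting dict is what the prompt template receives. The sketch below traces that flow with plain functions; fake_retriever and fake_model are hypothetical stand-ins for the retriever and the Groq model.

```python
def fake_retriever(question):
    """Hypothetical retriever: always returns the same document text."""
    return ["Dogs are great companions, known for their loyalty."]


def passthrough(value):
    # Like RunnablePassthrough: return the input unchanged.
    return value


def run_parallel(branches, value):
    # Like RunnableParallel: every branch receives the same input.
    return {key: fn(value) for key, fn in branches.items()}


def prompt(inputs):
    return f"Context: {inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_model(text):
    """Hypothetical model: answers "Dogs." if the prompt mentions dogs."""
    return "Dogs." if "Dogs" in text else "I don't know."


def rag_chain(question):
    inputs = run_parallel({"context": fake_retriever, "question": passthrough}, question)
    return fake_model(prompt(inputs))


print(rag_chain("which pet is considered a good companion?"))
```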

In [43]:
response = rag_chain.invoke("which pet is considered a good companion?")
response.content

'Dogs.'