# Vector Stores and Retrievers

This notebook explains how LangChain integrates with vector stores to retrieve data through the retriever abstraction. This matters when building RAG applications, where relevant data has to be fetched for an LLM.

## Document
LangChain implements a `Document` abstraction to represent a unit of text together with its associated metadata. Keeping the text alongside its contextual information supports document-based operations such as semantic search and information retrieval.

The `Document` class has three main attributes:
- `page_content`: a string containing the text content of the document.
- `metadata`: a dictionary holding arbitrary metadata associated with the document, such as the source, author, or creation date.
- `id`: an optional string identifier for the document.

This structure lets LangChain manage both the textual data and its contextual information, which many applications rely on.

In [2]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"}
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"}
    ),
    Document(
        page_content="Parrots are highly intelligent birds that can mimic human speech.",
        metadata={"source": "bird-pets-doc"}
    ),
    Document(
        page_content="Goldfish are low-maintenance aquatic pets ideal for beginners.",
        metadata={"source": "fish-pets-doc"}
    ),
    Document(
        page_content="Hamsters are small rodents that enjoy running on wheels and burrowing.",
        metadata={"source": "rodent-pets-doc"}
    ),
    Document(
        page_content="Rabbits are social animals that can be litter trained and love to hop around.",
        metadata={"source": "mammal-pets-doc"}
    ),
    Document(
        page_content="Turtles have long lifespans and require both water and dry basking areas.",
        metadata={"source": "reptile-pets-doc"}
    ),
    Document(
        page_content="Snakes can be docile pets, but they require a specific habitat and diet.",
        metadata={"source": "reptile-pets-doc"}
    ),
    Document(
        page_content="Guinea pigs are gentle and vocal, making them great for young children.",
        metadata={"source": "rodent-pets-doc"}
    ),
    Document(
        page_content="Ferrets are playful and curious pets that need plenty of exercise.",
        metadata={"source": "mammal-pets-doc"}
    ),
]


In [3]:
documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'bird-pets-doc'}, page_content='Parrots are highly intelligent birds that can mimic human speech.'),
 Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are low-maintenance aquatic pets ideal for beginners.'),
 Document(metadata={'source': 'rodent-pets-doc'}, page_content='Hamsters are small rodents that enjoy running on wheels and burrowing.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that can be litter trained and love to hop around.'),
 Document(metadata={'source': 'reptile-pets-doc'}, page_content='Turtles have long lifespans and require both water and dry basking areas.'),
 Document(metadata={'source': 'reptile-pets-doc'}, pa

In [4]:
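import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load GROQ_API_KEY and HF_TOKEN from a local .env file
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

# os.environ only accepts strings, so set HF_TOKEN only when it is defined
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token

llm = ChatGroq(groq_api_key=groq_api_key, model="Llama3-8b-8192")
llm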

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000200CED91DB0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000200CEEC99F0>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [5]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name = "all-MiniLM-L6-v2")

In [6]:
### VectorStore
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents,embeddings)
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x200cd10eb30>

In [7]:
vectorstore.similarity_search("cat")

[Document(id='dc6182e4-b821-4eb5-a6f8-b5f22860a257', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(id='5deb6a2f-23d9-434f-9c23-48325429efc1', metadata={'source': 'mammal-pets-doc'}, page_content='Ferrets are playful and curious pets that need plenty of exercise.'),
 Document(id='18f22ecf-617f-4e29-92ad-561058027bc9', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that can be litter trained and love to hop around.'),
 Document(id='fdeaf258-d43a-4c89-b77d-1572657a33de', metadata={'source': 'rodent-pets-doc'}, page_content='Hamsters are small rodents that enjoy running on wheels and burrowing.')]

In [8]:
### Async query
await vectorstore.asimilarity_search("cat")

[Document(id='dc6182e4-b821-4eb5-a6f8-b5f22860a257', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(id='5deb6a2f-23d9-434f-9c23-48325429efc1', metadata={'source': 'mammal-pets-doc'}, page_content='Ferrets are playful and curious pets that need plenty of exercise.'),
 Document(id='18f22ecf-617f-4e29-92ad-561058027bc9', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that can be litter trained and love to hop around.'),
 Document(id='fdeaf258-d43a-4c89-b77d-1572657a33de', metadata={'source': 'rodent-pets-doc'}, page_content='Hamsters are small rodents that enjoy running on wheels and burrowing.')]
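
The top-level `await` in the cell above works because a notebook already runs an event loop. In a plain Python script the call would need to be wrapped in a coroutine and started with `asyncio.run`. The sketch below shows that shape under stated assumptions: `fake_asimilarity_search` is a hypothetical stand-in for a call such as `vectorstore.asimilarity_search`, so the example runs without a vector store.

```python
import asyncio


async def fake_asimilarity_search(query):
    """Hypothetical stand-in for an async vector store search."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return [f"result for {query}"]


async def main():
    # gather schedules both searches concurrently and keeps the input order
    return await asyncio.gather(
        fake_asimilarity_search("cat"),
        fake_asimilarity_search("dog"),
    )


# asyncio.run is for scripts; inside a notebook, use `await main()` instead
results = asyncio.run(main())
print(results)
```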

In [9]:
vectorstore.similarity_search_with_score("cat")

[(Document(id='dc6182e4-b821-4eb5-a6f8-b5f22860a257', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
  0.9351056814193726),
 (Document(id='5deb6a2f-23d9-434f-9c23-48325429efc1', metadata={'source': 'mammal-pets-doc'}, page_content='Ferrets are playful and curious pets that need plenty of exercise.'),
  1.4927723407745361),
 (Document(id='18f22ecf-617f-4e29-92ad-561058027bc9', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that can be litter trained and love to hop around.'),
  1.506852388381958),
 (Document(id='fdeaf258-d43a-4c89-b77d-1572657a33de', metadata={'source': 'rodent-pets-doc'}, page_content='Hamsters are small rodents that enjoy running on wheels and burrowing.'),
  1.5450692176818848)]
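
In the output above the best match ("Cats ...") has the smallest score, which suggests the scores are distances rather than similarities: lower means closer. The sketch below illustrates that reading with toy vectors, assuming a squared Euclidean (L2) distance. That metric is an assumption here; the distance function actually used depends on how the Chroma collection is configured. The vectors and names are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384 dimensions
toy_vectors = {
    "cat doc": [0.9, 0.1, 0.0],
    "ferret doc": [0.6, 0.5, 0.2],
    "goldfish doc": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

# Sort ascending: the smallest distance is the most similar document
ranked = sorted(
    ((name, squared_l2(query, vec)) for name, vec in toy_vectors.items()),
    key=lambda pair: pair[1],
)
for name, dist in ranked:
    print(f"{name}: {dist:.2f}")
```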

### Retrievers
- VectorStores: primarily store and query embeddings (vector representations of data). They implement methods such as `similarity_search_with_score`.
- Retrievers: interfaces that accept a query and return relevant documents. They are more general than vector stores and can sit in front of various data sources, not just vector stores.

Key takeaway: a VectorStore is not itself a Runnable, but you can get a Retriever from one with a method like `.as_retriever()`. The Retriever is a Runnable and therefore has methods like `invoke`. Because retrievers implement the standard Runnable interface, they support synchronous and asynchronous operations, batching, and integration into LangChain Expression Language (LCEL) chains.
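
To make the "query in, documents out" idea concrete, here is a minimal sketch of a retriever that uses no vector store at all. It is a toy, not LangChain code: `ToyDocument` and `KeywordRetriever` are hypothetical names, and the `invoke` and `batch` methods only mimic the shape of the Runnable interface described above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Stand-in with the same attribute names as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class KeywordRetriever:
    """Hypothetical retriever that ranks documents by word overlap."""

    def __init__(self, docs, k=1):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        query_words = tokens(query)
        scored = [(len(query_words & tokens(d.page_content)), d) for d in self.docs]
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in matches[: self.k]]

    def batch(self, queries):
        # One result list per query, in the same order as the input
        return [self.invoke(q) for q in queries]


docs = [
    ToyDocument("Dogs are great companions.", {"source": "mammal-pets-doc"}),
    ToyDocument("Cats are independent pets.", {"source": "mammal-pets-doc"}),
]
toy_retriever = KeywordRetriever(docs, k=1)
results = toy_retriever.batch(["cats", "dogs"])
for hits in results:
    print([d.page_content for d in hits])
```

The caller only sees `invoke` and `batch`, so the keyword lookup could be swapped for a vector store search without changing the calling code.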

In [10]:
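# Wrap the vector store as a retriever that returns only the closest match (k=1)
retriver = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)
# batch runs one retrieval per query and returns one result list per query
retriver.batch(["cat", "dog"])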

[[Document(id='dc6182e4-b821-4eb5-a6f8-b5f22860a257', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')],
 [Document(id='02e85714-dcec-4589-b23e-6cfc73a01ad0', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]

In [14]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
          Answer this question using the provided context only.
          {question}

          Context:
          {context}
         """
prompt = ChatPromptTemplate.from_messages(
    [
        ("human",message)
    ]
)
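# The dict sends the input to both branches: the retriever fetches context
# for the question, and RunnablePassthrough forwards the question unchanged
rag_chain = {
    "context": retriver,
    "question": RunnablePassthrough()
} | prompt | llm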

response = rag_chain.invoke("tell me about dogs?")
response.content

'According to the provided context, dogs are great companions, known for their loyalty and friendliness.'
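
In the chain above the retriever returns a list of `Document` objects, and the prompt template turns that list into text as-is, so the model sees the objects' printed form, including metadata. A common refinement is to join only the text of each document before it reaches the prompt. The sketch below shows such a helper under stated assumptions: `format_docs` is a hypothetical name, and `ToyDocument` is a stand-in used so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Stand-in with the same attribute names as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    ToyDocument("Dogs are great companions, known for their loyalty and friendliness."),
    ToyDocument("Ferrets are playful and curious pets that need plenty of exercise."),
]
context = format_docs(retrieved)
print(context)
```

In the chain, the helper could be placed after the retriever, for example `"context": retriver | format_docs`, since LCEL accepts a plain function as a step when it is piped after a Runnable.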