## VECTOR STORES

LangChain provides vector store and retriever abstractions. These abstractions are designed to support retrieval of data, from vector databases and other sources, for integration with LLM workflows. They are important for applications that fetch data to be reasoned over as part of model inference, as in the case of retrieval-augmented generation (RAG).

In [1]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source-doc": "mammal-pet-doc"},
    ),
    Document(
        page_content="Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.",
        metadata={"source-doc": "mammal-pet-doc"},
    ),
    Document(
        page_content="Goldfish are popular aquatic pets, appreciated for their vibrant colors and relatively simple care requirements.",
        metadata={"source-doc": "fish-pet-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds known for their ability to mimic sounds and their colorful feathers.",
        metadata={"source-doc": "bird-pet-doc"},
    ),
    Document(
        page_content="Rabbits are gentle herbivorous mammals, often kept as pets for their soft fur and playful nature.",
        metadata={"source-doc": "mammal-pet-doc"},
    ),
]


In [2]:
documents


[Document(metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.'),
 Document(metadata={'source-doc': 'fish-pet-doc'}, page_content='Goldfish are popular aquatic pets, appreciated for their vibrant colors and relatively simple care requirements.'),
 Document(metadata={'source-doc': 'bird-pet-doc'}, page_content='Parrots are intelligent birds known for their ability to mimic sounds and their colorful feathers.'),
 Document(metadata={'source-doc': 'mammal-pet-doc'}, page_content='Rabbits are gentle herbivorous mammals, often kept as pets for their soft fur and playful nature.')]

In [14]:
import os

from dotenv import load_dotenv

load_dotenv()  # load GROQ_API_KEY and HF_TOKEN from a local .env file

# Workaround: force the import of LanguageModelInput to fix a dynamic import issue
from langchain_core.language_models.base import LanguageModelInput

from langchain_groq import ChatGroq

groq_api_key = os.getenv("GROQ_API_KEY")

# Only export HF_TOKEN if it is set; assigning None to os.environ raises a TypeError
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token

llm = ChatGroq(groq_api_key=groq_api_key, model="llama-3.1-8b-instant")
llm


ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 8192, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x00000257DBEE2290>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000257DBEE2320>, model_name='llama-3.1-8b-instant', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [5]:
# Build a Chroma vector store by embedding the documents
from langchain_chroma import Chroma

vectorStore = Chroma.from_documents(documents, embedding=embeddings)



In [6]:
vectorStore.similarity_search("cat")

[Document(id='50f2fdc2-0d91-4a85-89ba-fac66fd0c19a', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.'),
 Document(id='b2b36756-798a-4243-bdc0-fc3b7bfe12e3', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Rabbits are gentle herbivorous mammals, often kept as pets for their soft fur and playful nature.'),
 Document(id='4253eb80-4f99-4a11-971e-8502fab82f75', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(id='4f56664a-d9cc-48eb-9765-0ad74fd7ade0', metadata={'source-doc': 'bird-pet-doc'}, page_content='Parrots are intelligent birds known for their ability to mimic sounds and their colorful feathers.')]
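
The call above returns four documents because `similarity_search` defaults to `k=4`. Conceptually, the store embeds the query and returns the `k` stored vectors closest to it. The following is a minimal pure-Python sketch of that idea, not Chroma's implementation: `toy_embed` (a normalized letter-count vector) and `top_k` are hypothetical stand-ins for a real embedding model and index, and the letter-count embedding captures no meaning.

```python
import math
from collections import Counter


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: unit-normalized counts of the letters a-z."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    vec = [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: str, texts: list[str], k: int = 4) -> list[tuple[float, str]]:
    """Return the k texts whose toy embeddings are closest to the query."""
    q = toy_embed(query)
    scored = [(squared_l2(q, toy_embed(t)), t) for t in texts]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:k]


texts = ["cat", "act", "dog", "bird", "goldfish"]
results = top_k("cat", texts, k=2)
print(results)
```

Because the toy embedding only counts letters, "cat" and "act" are indistinguishable to it. A real embedding model such as the one used in this notebook is what makes the neighbors semantically meaningful.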

In [7]:
# Async search: top-level await works inside a notebook cell
await vectorStore.asimilarity_search("cat")

[Document(id='50f2fdc2-0d91-4a85-89ba-fac66fd0c19a', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.'),
 Document(id='b2b36756-798a-4243-bdc0-fc3b7bfe12e3', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Rabbits are gentle herbivorous mammals, often kept as pets for their soft fur and playful nature.'),
 Document(id='4253eb80-4f99-4a11-971e-8502fab82f75', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(id='4f56664a-d9cc-48eb-9765-0ad74fd7ade0', metadata={'source-doc': 'bird-pet-doc'}, page_content='Parrots are intelligent birds known for their ability to mimic sounds and their colorful feathers.')]

In [8]:
vectorStore.similarity_search_with_score("cat")

[(Document(id='50f2fdc2-0d91-4a85-89ba-fac66fd0c19a', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.'),
  0.9809685945510864),
 (Document(id='b2b36756-798a-4243-bdc0-fc3b7bfe12e3', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Rabbits are gentle herbivorous mammals, often kept as pets for their soft fur and playful nature.'),
  1.473861575126648),
 (Document(id='4253eb80-4f99-4a11-971e-8502fab82f75', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
  1.574089765548706),
 (Document(id='4f56664a-d9cc-48eb-9765-0ad74fd7ade0', metadata={'source-doc': 'bird-pet-doc'}, page_content='Parrots are intelligent birds known for their ability to mimic sounds and their colorful feathers.'),
  1.7242729663848877)]
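
The scores returned here are distances, so lower values indicate closer matches; the cat document has the lowest score. Assuming the store uses a squared-L2 distance and the embeddings are unit-normalized (both are assumptions about this setup that the output alone does not confirm), a score d² relates to cosine similarity as cos = 1 - d²/2. The sketch below checks that identity on two made-up vectors.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors, for illustration only
a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, 0.5])

d2 = squared_l2(a, b)       # distance-style score: lower means closer
cos = dot(a, b)             # cosine similarity of unit vectors
cos_from_d2 = 1 - d2 / 2    # recover cosine similarity from the distance

print(d2, cos, cos_from_d2)
```

If both assumptions hold here, the 0.98 score for the cat document would correspond to a cosine similarity of roughly 0.51.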

## RETRIEVERS

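LangChain `VectorStore` objects do not subclass `Runnable`, so they cannot be used directly in LangChain Expression Language (LCEL) chains. One simple way to get a Runnable is to wrap one of the store's search methods in a `RunnableLambda`.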

In [9]:
from langchain_core.runnables import RunnableLambda

# Wrap the search method as a Runnable and fix k=1 so each query returns one document
retriever = RunnableLambda(vectorStore.similarity_search).bind(k=1)
retriever.batch(["cat", "dog"])

[[Document(id='50f2fdc2-0d91-4a85-89ba-fac66fd0c19a', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.')],
 [Document(id='4253eb80-4f99-4a11-971e-8502fab82f75', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]
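
Wrapping `similarity_search` in a `RunnableLambda` and binding `k=1` is conceptually similar to fixing a keyword argument with `functools.partial` and then mapping the function over a list of inputs. The sketch below uses a hypothetical `fake_search` function and a canned `CORPUS` in place of the vector store. It illustrates the shape of the result only and does not reproduce how LangChain executes `batch`.

```python
from functools import partial

# Canned results standing in for a vector store (hypothetical data)
CORPUS = {
    "cat": ["Cats are independent pets.", "Rabbits are gentle mammals."],
    "dog": ["Dogs are great companions.", "Cats are independent pets."],
}


def fake_search(query: str, k: int = 4) -> list[str]:
    """Hypothetical stand-in for similarity_search: up to k canned results."""
    return CORPUS.get(query, [])[:k]


# Analogue of .bind(k=1): fix the keyword argument ahead of time
search_top1 = partial(fake_search, k=1)

# Analogue of .batch([...]): one result list per input query
batch_results = [search_top1(q) for q in ["cat", "dog"]]
print(batch_results)
```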

Vector stores implement an `as_retriever` method that returns a retriever, specifically a `VectorStoreRetriever`. These retrievers include `search_type` and `search_kwargs` attributes that identify which methods of the underlying vector store to call and how to parameterize them. For instance, we can replicate the above with the following:

In [10]:
retriever = vectorStore.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)

retriever.batch(["cat","dog"])

[[Document(id='50f2fdc2-0d91-4a85-89ba-fac66fd0c19a', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Cats are independent pets, valued for their curiosity, agility, and affectionate behavior.')],
 [Document(id='4253eb80-4f99-4a11-971e-8502fab82f75', metadata={'source-doc': 'mammal-pet-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]

In [15]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough


message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Format documents to string for context
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the prompt from a (role, template) tuple
prompt = ChatPromptTemplate.from_messages([("human", message)])

# The retrieved documents are formatted into the context; the input string is passed through as the question
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

response = rag_chain.invoke("tell me about dogs")

print(response.content)

Dogs are known for their loyalty and friendliness.
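
To see roughly what the model receives, it can help to render the prompt by hand. The sketch below reuses `format_docs` with a lightly tidied copy of the template above, using plain strings only: `FakeDoc` is a hypothetical stand-in for `Document`, and `str.format` is used in place of `ChatPromptTemplate`, so the result approximates the human message rather than reproducing LangChain's exact output.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def format_docs(docs) -> str:
    """Join document contents with blank lines, as in the chain above."""
    return "\n\n".join(doc.page_content for doc in docs)


template = (
    "Answer this question using the provided context only.\n\n"
    "{question}\n\n"
    "Context:\n"
    "{context}\n"
)

docs = [
    FakeDoc("Dogs are great companions, known for their loyalty and friendliness."),
]

rendered = template.format(
    question="tell me about dogs",
    context=format_docs(docs),
)
print(rendered)
```

With `k=1`, the context holds a single document, which is why the model's answer closely paraphrases that one sentence.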
