# SESSION 10 : Retrievers in LangChain | Generative AI using LangChain | Video 13

https://youtu.be/pJdMxwXBsk0?list=PLKnIA16_RmvaTbihpo4MtzVm4XOQa0ER0

### What are Retrievers

- A retriever is a component in LangChain that __fetches relevant documents__ from a data source in response to a user’s query.

- __There are multiple types of retrievers__, grouped below by data source and by retrieval strategy.

- All retrievers in LangChain are runnables, so they can be called with `invoke` and composed into chains.

- They are used in **RAG pipelines** (Retrieve → Augment → Generate).

- Input = query string, Output = list of `Document` objects.

![Screenshot%202025-08-25%20031833.png](attachment:Screenshot%202025-08-25%20031833.png)
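To make the input/output contract concrete, here is a minimal pure-Python sketch of the retriever interface. The `Document` dataclass and `KeywordRetriever` class are hypothetical stand-ins written for these notes, not LangChain's own classes; the only thing they share with the real ones is the shape of the call: query string in, list of documents out.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for LangChain's Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Hypothetical retriever: ranks documents by words shared with the query."""

    def __init__(self, documents, k=2):
        self.documents = documents
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        query_words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


corpus = [
    Document("LangChain is great", {"source": "notes"}),
    Document("I love pizza", {"source": "food"}),
    Document("LLMs are powerful", {"source": "notes"}),
]
retriever = KeywordRetriever(corpus)
results = retriever.invoke("Tell me about LangChain")
print(results)
```

Real retrievers differ in how they score documents, but callers only depend on this `invoke` shape, which is why one retriever can be swapped for another inside a RAG pipeline.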

## Types of Retrievers (Data source based) :

### Wikipedia Retriever

A Wikipedia Retriever is a retriever that queries the Wikipedia API to fetch relevant content for a given query.

![image.png](attachment:image.png)

code : https://colab.research.google.com/drive/1vuuIYmJeiRgFHsH-ibH_NUFjtdc5D9P6?usp=sharing
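For quick reference, a minimal sketch of how the Wikipedia retriever is typically used. It assumes the `langchain-community` and `wikipedia` packages are installed and that network access is available; the wrapper function `fetch_wikipedia_docs` and its `top_k` parameter are names made up for these notes, and the linked Colab may do this differently.

```python
def fetch_wikipedia_docs(query: str, top_k: int = 2):
    """Return up to top_k Wikipedia pages as Document objects for the query."""
    # Imported inside the function because it is an optional dependency
    # (assumed install: pip install langchain-community wikipedia)
    from langchain_community.retrievers import WikipediaRetriever

    retriever = WikipediaRetriever(top_k_results=top_k, lang="en")
    return retriever.invoke(query)
```

Calling `fetch_wikipedia_docs("Large language model")` should return a list of `Document` objects whose `page_content` holds the article text.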

## Types of Retrievers (Strategy based) :

### 1. **VectorStoreRetriever** (most common)


- A Vector Store Retriever is the most common type of retriever in LangChain. It embeds the query and fetches the documents from a vector store that are closest to it, i.e. it searches by __semantic similarity using vector embeddings.__

![image.png](attachment:image.png)

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = ["LangChain is great", "I love pizza", "LLMs are powerful"]
embeddings = OpenAIEmbeddings()
db = FAISS.from_texts(docs, embeddings)

retriever = db.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke("Tell me about LangChain"))
```
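What "semantic similarity" means under the hood can be sketched without any library. The vectors below are hand-made, hypothetical 3-dimensional "embeddings", not real model output, and cosine similarity is just one common choice; the metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical embeddings: (text, vector)
index = [
    ("LangChain is great", [0.9, 0.1, 0.0]),
    ("I love pizza", [0.0, 0.1, 0.9]),
    ("LLMs are powerful", [0.7, 0.6, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "Tell me about LangChain"
top_two = top_k(query_vector, index, k=2)
print(top_two)
```

The `k` argument here plays the same role as `search_kwargs={"k": 2}` in the snippet above: it caps how many of the nearest documents are returned.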

### 2. **MultiQueryRetriever**

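Uses an LLM to expand the query into multiple variations, runs each variation against the base retriever, and merges the unique results → better coverage when the original wording is ambiguous.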

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

retriever = MultiQueryRetriever.from_llm(retriever=db.as_retriever(), llm=ChatOpenAI())
print(retriever.invoke("What is LangChain?"))
```
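The merging step can be illustrated with a small pure-Python sketch. The query variations are hard-coded stand-ins for what an LLM would normally generate, and `keyword_search` is a toy base retriever invented for these notes; only the "run every variation, keep the unique union" idea is the point.

```python
def keyword_search(query, corpus, k=1):
    """Toy base retriever: rank texts by number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(text.lower().split())), text) for text in corpus]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in hits[:k]]


def multi_query_search(queries, corpus, k=1):
    """Run every query variation and return the de-duplicated union of results."""
    merged = []
    for query in queries:
        for text in keyword_search(query, corpus, k):
            if text not in merged:
                merged.append(text)
    return merged


corpus = [
    "LangChain is a framework for LLM apps",
    "Retrievers fetch documents from a data source",
    "Agents decide which tools to call",
    "I love pizza",
]
# Hard-coded stand-ins for LLM-generated variations of the first query
variations = [
    "how does langchain find documents",
    "retrievers fetch documents",
    "langchain framework",
]
merged_results = multi_query_search(variations, corpus)
print(merged_results)
```

With `k=1`, the original wording alone surfaces only the first text; the second variation brings in the retriever text, and the third variation's duplicate hit is dropped.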

### 3. **ParentDocumentRetriever**

Indexes small chunks for precise matching but keeps the large parent docs in a separate store → a matching chunk returns its whole parent document.

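```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_overlap must not exceed chunk_size (the default overlap is larger than 50)
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
docstore = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=db, docstore=docstore, child_splitter=splitter
)

# Add documents through the retriever so each chunk is linked to its parent
retriever.add_documents([Document(page_content=text) for text in docs])
print(retriever.invoke("Tell me about LangChain"))
```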
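The chunk-to-parent bookkeeping can be sketched in plain Python. Everything here is a toy written for these notes: sentences act as the child chunks, word overlap acts as the similarity search, and a dict plays the role of the docstore.

```python
parents = {
    "doc-1": "LangChain is a framework. It provides retrievers for RAG pipelines.",
    "doc-2": "Pizza is a dish from Italy. It is usually baked in a very hot oven.",
}

# Index small chunks (here: sentences), each remembering its parent's ID
child_index = []
for parent_id, text in parents.items():
    for chunk in text.split(". "):
        child_index.append((chunk, parent_id))


def retrieve_parents(query, k=1):
    """Match the query against child chunks, then return the full parent texts."""
    query_words = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda item: len(query_words & set(item[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


parent_results = retrieve_parents("retrievers for RAG")
print(parent_results)
```

The query matches only the second sentence of `doc-1`, yet the whole parent text comes back, which gives the LLM the surrounding context.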

### 4. **EnsembleRetriever**

Combines multiple retrievers (e.g., vector + keyword) with weights.

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_texts(docs)
ensemble = EnsembleRetriever(retrievers=[db.as_retriever(), bm25], weights=[0.5, 0.5])

print(ensemble.invoke("LangChain use cases"))
```
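As far as I know, `EnsembleRetriever` merges the ranked lists with weighted Reciprocal Rank Fusion: each document scores `weight / (rank + c)` in every list it appears in, and the scores are summed. A minimal sketch of that fusion rule, assuming the commonly used constant `c = 60` and made-up result lists:

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc scores sum(weight / (rank + c)) over the lists."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a vector retriever and a keyword (BM25) retriever
vector_hits = ["LangChain is great", "LangChain powers RAG apps", "LLMs are powerful"]
keyword_hits = ["LangChain powers RAG apps", "Use cases for agents"]

fused = weighted_rrf([vector_hits, keyword_hits], weights=[0.5, 0.5])
print(fused)
```

The document that both retrievers found ends up first even though the vector retriever ranked it second, which is the main benefit of combining retrievers.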

### 5. **TimeWeightedVectorStoreRetriever**

Prioritizes documents by **recency + relevance**.

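```python
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_core.documents import Document

retriever = TimeWeightedVectorStoreRetriever(vectorstore=db, decay_rate=0.01)
# Add documents through the retriever so each one gets an access timestamp;
# documents inserted directly into the vector store are not tracked
retriever.add_documents([Document(page_content="LangChain is great")])
print(retriever.invoke("LangChain"))
```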
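The scoring idea, as documented for this retriever, is roughly `semantic_similarity + (1.0 - decay_rate) ** hours_passed`, where the hours are counted since the document was last accessed. A small sketch with made-up relevance values and ages shows how `decay_rate` shifts the ranking:

```python
def combined_score(relevance, hours_since_access, decay_rate):
    """Semantic relevance plus a recency term that decays with elapsed hours."""
    recency = (1.0 - decay_rate) ** hours_since_access
    return relevance + recency


def rank(candidates, decay_rate):
    """Sort (text, relevance, hours_since_access) tuples by combined score."""
    ordered = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], decay_rate),
        reverse=True,
    )
    return [text for text, _, _ in ordered]


# Hypothetical values: (text, relevance to the query, hours since last access)
candidates = [
    ("old but relevant", 0.9, 200.0),
    ("fresh, less relevant", 0.6, 1.0),
]
print(rank(candidates, decay_rate=0.01))  # recency matters
print(rank(candidates, decay_rate=0.0))   # recency term is constant, relevance decides
```

A `decay_rate` close to 0 means documents stay "fresh" for a long time, while a rate close to 1 makes the recency term vanish almost immediately for anything not just accessed.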

### 6. **ContextualCompressionRetriever**

Uses an LLM or another compression step to **shrink retrieved docs** down to the content most relevant to the query.

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

compressor = LLMChainExtractor.from_llm(ChatOpenAI())
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=db.as_retriever())

print(compression_retriever.invoke("Summarize LangChain"))
```
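The effect of the compression step can be sketched without an LLM. In this toy version, written only for these notes, a sentence is kept if it shares a word with the query, and documents with nothing left are dropped; a real compressor such as `LLMChainExtractor` asks a model to pick the relevant passages instead.

```python
import re


def compress(text, query):
    """Keep only the sentences that share at least one word with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if query_words & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


# Pretend these came back from the base retriever
retrieved = [
    "The weather was nice. LangChain provides retrievers. We had lunch.",
    "Pizza is tasty. Nothing else here.",
]
query = "LangChain retrievers"
# Compress every document and drop the ones with nothing relevant left
compressed = [c for c in (compress(doc, query) for doc in retrieved) if c]
print(compressed)
```

Only the relevant sentence survives, so the prompt sent to the LLM in the Generate step is shorter and less noisy.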

✅ **In short**:

* **VectorStoreRetriever** → embeddings + similarity search

* **MultiQueryRetriever** → multiple query variations

* **ParentDocumentRetriever** → retrieves parent docs for context

* **EnsembleRetriever** → combines different retrievers

* **TimeWeightedVectorStoreRetriever** → time + relevance aware

* **ContextualCompressionRetriever** → compresses retrieved docs
