# LangChain Retrieval Agents

Conversational agents can struggle with data freshness, domain-specific knowledge, or access to internal documentation. Coupling agents with retrieval augmentation tools mitigates these problems.

On the other hand, using "naive" retrieval augmentation without an agent means we retrieve contexts with *every* query. This isn't always ideal, as not every query requires access to external knowledge.

Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.

To begin, we must install the prerequisite libraries that we will be using in this notebook.

In [13]:
!pip install --upgrade datasets numpy langchain langchain_openai langchain-pinecone pinecone-notebooks python-dotenv tqdm

Collecting numpy
  Using cached numpy-2.2.4-cp310-cp310-macosx_14_0_arm64.whl.metadata (62 kB)
Using cached numpy-2.2.4-cp310-cp310-macosx_14_0_arm64.whl (5.4 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.4
    Uninstalling numpy-1.26.4:
      Successfully uninstalled numpy-1.26.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.3 requires numpy<2.0,>=1.18.5, but you have numpy 2.2.4 which is incompatible.
langchain-chroma 0.2.2 requires numpy<2.0.0,>=1.22.4; python_version < "3.12", but you have numpy 2.2.4 which is incompatible.
scipy 1.12.0 requires numpy<1.29.0,>=1.22.4, but you have numpy 2.2.4 which is incompatible.
tensorflow 2.19.0 requires numpy<2.2.0,>=1.26.0, but you have numpy 2.2.4 which is incompatible.
Successfully installed numpy-2.2.4


If you're using a Jupyter notebook or a similar environment, restart the kernel to ensure the changes take effect.

## Building the Knowledge Base

We start by constructing our knowledge base. We'll use a mostly prepared dataset called **S**tanford **Qu**estion-**A**nswering **D**ataset (SQuAD) hosted on Hugging Face *Datasets*. We download it like so:

In [14]:
from datasets import load_dataset

data = load_dataset('squad', split='train')
data

Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 87599
})

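The dataset contains duplicate contexts, since each passage is paired with several questions. To remove them, we first convert the dataset to a pandas DataFrame and then drop the duplicates: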

In [15]:
data = data.to_pandas()
data.head()

| | id | title | context | question | answers |
|---|---|---|---|---|---|
| 0 | 5733be284776f41900661182 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | {'text': ['Saint Bernadette Soubirous'], 'answ... |
| 1 | 5733be284776f4190066117f | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What is in front of the Notre Dame Main Building? | {'text': ['a copper statue of Christ'], 'answe... |
| 2 | 5733be284776f41900661180 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | The Basilica of the Sacred heart at Notre Dame... | {'text': ['the Main Building'], 'answer_start'... |
| 3 | 5733be284776f41900661181 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What is the Grotto at Notre Dame? | {'text': ['a Marian place of prayer and reflec... |
| 4 | 5733be284776f4190066117e | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What sits on top of the Main Building at Notre... | {'text': ['a golden statue of the Virgin Mary'... |


In [16]:
data.drop_duplicates(subset='context', keep='first', inplace=True)
data.head()

| | id | title | context | question | answers |
|---|---|---|---|---|---|
| 0 | 5733be284776f41900661182 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | {'text': ['Saint Bernadette Soubirous'], 'answ... |
| 5 | 5733bf84d058e614000b61be | University_of_Notre_Dame | As at most other universities, Notre Dame's st... | When did the Scholastic Magazine of Notre dame... | {'text': ['September 1876'], 'answer_start': [... |
| 10 | 5733bed24776f41900661188 | University_of_Notre_Dame | The university is the major seat of the Congre... | Where is the headquarters of the Congregation ... | {'text': ['Rome'], 'answer_start': [119]} |
| 15 | 5733a6424776f41900660f51 | University_of_Notre_Dame | The College of Engineering was established in ... | How many BS level degrees are offered in the C... | {'text': ['eight'], 'answer_start': [487]} |
| 20 | 5733a70c4776f41900660f64 | University_of_Notre_Dame | All of Notre Dame's undergraduate students are... | What entity provides help with the management ... | {'text': ['Learning Resource Center'], 'answer... |
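To make the deduplication step concrete, here is a minimal pure-Python sketch of what `drop_duplicates(subset='context', keep='first')` does conceptually. The toy records and the helper name `drop_duplicate_contexts` are made up for illustration; this is not how pandas implements it internally.

```python
# Toy records standing in for SQuAD rows (values are invented for illustration).
records = [
    {"id": "a1", "context": "Passage one.", "question": "Q1?"},
    {"id": "a2", "context": "Passage one.", "question": "Q2?"},
    {"id": "b1", "context": "Passage two.", "question": "Q3?"},
]

def drop_duplicate_contexts(rows):
    """Keep only the first row seen for each distinct context."""
    seen = set()
    unique = []
    for row in rows:
        if row["context"] not in seen:
            seen.add(row["context"])
            unique.append(row)
    return unique

deduped = drop_duplicate_contexts(records)
print([row["id"] for row in deduped])  # ['a1', 'b1']
```

Only the first question attached to each passage survives, which is why the index column in the deduplicated DataFrame jumps (0, 5, 10, ...).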


### Initialize the Embedding Model and Vector DB

We'll be using OpenAI's `text-embedding-ada-002` model, initialized via LangChain, and the Pinecone vector DB. We start by initializing the embedding model, for which we need an [OpenAI API key](https://platform.openai.com/).

*(Note that OpenAI is a paid service and so running the remainder of this notebook may incur some small cost)*

In [17]:
import os
from getpass import getpass
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv, find_dotenv

# Load variables from a local .env file, if one exists
_ = load_dotenv(find_dotenv())

# Read the key from the environment, falling back to an interactive prompt
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or getpass("Enter your OpenAI API key: ")
model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

Now we create our vector DB to store our vectors. For this we need a [free Pinecone API key](https://app.pinecone.io), which can be found under "API Keys" in the left navbar of the Pinecone dashboard.

In [18]:
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or getpass("Enter your Pinecone API key: ")

# configure client
pc = Pinecone(api_key=api_key)

Now we set up our index specification, which defines the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [19]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

When creating the index, we set `dimension` equal to the dimensionality of Ada-002 (`1536`) and use a `metric` compatible with Ada-002 (either `cosine` or `dotproduct`). We also pass our `spec` to the index initialization.

In [20]:
import time

# Define the name for our vector index that will power retrieval
index_name = "langchain-retrieval-agent"
# Get list of existing indexes to avoid recreating one that exists
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# Check if our vector index already exists
if index_name not in existing_indexes:
    # Create a new Pinecone vector index with appropriate dimensions for OpenAI embeddings
    pc.create_index(
        index_name,
        dimension=1536,  # text-embedding-ada-002 produces 1536-dimensional, unit-length (normalized) vectors
        metric='dotproduct',  # For unit-length vectors dotproduct ranks like cosine and forgoes the normalization step
        spec=spec  # Configuration for index storage and performance characteristics
    )
    # Poll until index is ready for use - initialization takes time
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# Connect to our vector index now that we know it exists
index = pc.Index(index_name)
time.sleep(1)  # Brief pause to ensure connection is established
# Check index statistics - useful to confirm document count and vector dimensions
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {'': {'vector_count': 18891}},
 'total_vector_count': 18891,
 'vector_type': 'dense'}

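On a freshly created index, `total_vector_count` should be `0`, as we haven't added any vectors yet. The output above reports `18891` instead, which suggests the index already existed from an earlier run of this notebook.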
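The `dotproduct` metric is interchangeable with `cosine` here only because the embeddings have unit length. A small worked check with hand-picked 2-d vectors (not real embeddings) illustrates the point; the helper names are ours.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

raw_u, raw_v = [3.0, 4.0], [1.0, 2.0]
u, v = normalize(raw_u), normalize(raw_v)

# For unit-length vectors the two scores coincide...
print(math.isclose(dot(u, v), cosine(u, v)))          # True
# ...but for unnormalized vectors they generally do not.
print(math.isclose(dot(raw_u, raw_v), cosine(raw_u, raw_v)))  # False
```

If you swap in an embedding model whose vectors are not normalized, `cosine` is the safer choice.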

## Indexing

We could perform the indexing task with the LangChain vector store object, but for now it is much faster to do it via the Pinecone Python client directly. We will do this in batches of `100`.

In [21]:
from tqdm.auto import tqdm

batch_size = 100  # Embed and upsert in chunks to keep each request small

for i in tqdm(range(0, len(data), batch_size)):
    # Calculate end index, ensuring we don't exceed data length
    i_end = min(len(data), i + batch_size)
    batch = data.iloc[i:i_end]

    # Metadata stored alongside each vector; 'text' is what we retrieve later
    metadatas = [
        {'title': record['title'], 'text': record['context']}
        for _, record in batch.iterrows()
    ]

    # Raw passages to embed, as a plain list of strings
    documents = batch['context'].tolist()

    # Convert each passage to a vector embedding
    embeds = embed.embed_documents(documents)

    # Reuse the SQuAD record IDs as vector IDs
    ids = batch['id'].tolist()

    # Upsert (id, embedding, metadata) tuples into Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadatas)))


We've indexed everything. Now we can check the number of vectors in our index like so:

In [22]:
# View index statistics to confirm data ingestion
index.describe_index_stats()

# Output shows:
# - dimension: 1536 (matches text-embedding-ada-002 dimensions)
# - index_fullness: 0.0 (plenty of space available)
# - metric: dotproduct (efficient similarity search)
# - vector_count: 18891 (documents successfully ingested)
# - vector_type: dense (standard embedding format)

{'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {'': {'vector_count': 18891}},
 'total_vector_count': 18891,
 'vector_type': 'dense'}
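The batching pattern used during indexing can be isolated into a small helper. The sketch below is ours (the name `batch_ranges` does not come from any library) and only computes the index ranges, without calling OpenAI or Pinecone.

```python
def batch_ranges(n_items, batch_size):
    """Yield (start, end) index pairs that cover n_items in order."""
    for start in range(0, n_items, batch_size):
        yield start, min(n_items, start + batch_size)

ranges = list(batch_ranges(18891, 100))
print(len(ranges))   # 189 batches, one upsert call each
print(ranges[-1])    # (18800, 18891): the final, shorter batch
```

With 18,891 deduplicated passages and a batch size of 100, the loop makes 189 embedding and upsert round trips, the last one carrying 91 records.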

## Creating a Vector Store and Querying

Now that we've built our index, we can switch back over to LangChain. We start by initializing a vector store using the same index we just built:

In [28]:
# Create a vector store abstraction to enable semantic search
from langchain_pinecone import PineconeVectorStore

text_field = "text"  # Metadata field containing document content
# Connect LangChain's retrieval interfaces to our Pinecone index
vectorstore = PineconeVectorStore(
   index=index,      # Our Pinecone index
   embedding=embed,  # Same embedding model used during indexing
   text_key=text_field  # Field to return in search results
)

As in previous examples, we can use the `similarity_search` method to do a pure semantic search (without the generation component).

In [29]:
# Perform semantic search to find relevant documents for our query
query = "when was the college of engineering in the University of Notre Dame established?"
vectorstore.similarity_search(
   query,  # Question we want to answer
   k=3     # Retrieve top 3 most semantically similar documents
)
# This search uses vector embeddings to find contextually relevant information
# even if exact keywords don't match (unlike traditional search)

[Document(id='57338724d058e614000b5c9f', metadata={'title': 'University_of_Notre_Dame'}, page_content="In 1919 Father James Burns became president of Notre Dame, and in three years he produced an academic revolution that brought the school up to national standards by adopting the elective system and moving away from the university's traditional scholastic and classical emphasis. By contrast, the Jesuit colleges, bastions of academic conservatism, were reluctant to move to a system of electives. Their graduates were shut out of Harvard Law School for that reason. Notre Dame continued to grow over the years, adding more colleges, programs, and sports teams. By 1921, with the addition of the College of Commerce, Notre Dame had grown from a small college to a university with five colleges and a professional law school. The university continued to expand and add new residence halls and buildings with each subsequent president."),
 Document(id='5733a6424776f41900660f51', metadata={'title': '

Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent.
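Conceptually, a similarity search scores the query vector against the stored vectors and returns the `k` best matches. The brute-force sketch below illustrates that idea with hypothetical 2-d vectors and a helper name of our own; it is not how Pinecone performs the search internally, and real ada-002 vectors have 1536 dimensions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, stored, k=3):
    """Return the ids of the k stored vectors with the highest dot-product score."""
    scored = sorted(
        stored.items(),
        key=lambda item: dot(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical unit-length "embeddings" keyed by document id.
stored = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.8, 0.6],
    "doc-c": [0.0, 1.0],
    "doc-d": [-1.0, 0.0],
}
query_vec = [0.6, 0.8]

print(top_k(query_vec, stored, k=3))  # ['doc-b', 'doc-c', 'doc-a']
```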

## Initializing the Conversational Agent

Our conversational agent needs a Chat LLM, conversational memory, and a `RetrievalQA` chain to initialize. We create these using:

In [30]:
from langchain_openai import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

# Initialize LLM for generating answers from retrieved context
llm = ChatOpenAI(
   openai_api_key=OPENAI_API_KEY,
   model_name='gpt-3.5-turbo',
   temperature=0.0  # Deterministic responses for predictable behavior
)

# Set up memory to maintain conversation context
conversational_memory = ConversationBufferWindowMemory(
   memory_key='chat_history',
   k=5,  # Store last 5 interactions
   return_messages=True  # Return as message objects for LLM chain
)

# Create RAG pipeline:
# 1. Query → 2. Retrieve relevant docs → 3. Generate answer
qa = RetrievalQA.from_chain_type(
   llm=llm,
   chain_type="stuff",  # "Stuff" method passes all retrieved docs to LLM at once
   retriever=vectorstore.as_retriever()  # Connect to our vector database
)



With these we can generate an answer using the `run` method:

In [31]:
qa.run(query)



'The College of Engineering at the University of Notre Dame was established in 1920.'
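The `"stuff"` chain type gets its name from stuffing every retrieved document into a single prompt. The sketch below shows the idea only: the function name and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuffed_prompt(question, documents):
    """Concatenate every retrieved document into one prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder passages standing in for retriever results.
docs = ["Passage A about the topic.", "Passage B about the topic."]
prompt = build_stuffed_prompt("What is the topic?", docs)
print(prompt)
```

Because all passages share one prompt, this approach is simple but can exceed the model's context window when many or long documents are retrieved.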

But the `qa` chain isn't yet ready for our conversational agent. For that we need to wrap this retrieval chain in a tool. We do that like so:

In [32]:
from langchain.agents import Tool

# Define tools that the agent can use to solve tasks
tools = [
   Tool(
       name='Knowledge Base',  # Tool identifier
       func=qa.run,            # Function to execute (our RAG pipeline)
       description=(
           'use this tool when answering general knowledge queries to get '
           'more information about the topic'
       )  # Helps agent decide when to use this tool
   )
]
# Tools enable agents to interact with external systems
# This gives our agent access to the RAG system we built

Now we can initialize the agent like so:

In [33]:
from langchain.agents import initialize_agent

# Create an LLM agent that can use tools to solve complex tasks
agent = initialize_agent(
   agent='chat-conversational-react-description',  # Agent type that maintains conversation and explains actions
   tools=tools,                                    # Tools the agent can use (our RAG system)
   llm=llm,                                        # Language model for reasoning
   verbose=True,                                   # Show agent's thought process
   max_iterations=3,                               # Prevent infinite loops
   early_stopping_method='generate',               # On hitting the limit, make one final LLM pass to produce an answer
   memory=conversational_memory                    # Access to conversation history
)
# This agent can now use RAG to answer questions about domain-specific content



With that our retrieval augmented conversational agent is ready and we can begin using it.
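Under the hood, an agent of this kind runs a loop: the model replies with a JSON blob naming an action, the executor runs the matching tool, and the observation is fed back until the model emits a final answer or `max_iterations` is reached. The sketch below is a heavily simplified illustration, not LangChain's implementation; `run_agent`, `fake_llm`, and the canned tool are stand-ins we made up, and returning a fixed message at the limit is a simplification.

```python
import json

def run_agent(user_input, llm, tools, max_iterations=3):
    """Loop until the model emits a final answer or the iteration budget runs out."""
    scratchpad = []  # (action, observation) pairs from earlier steps
    for _ in range(max_iterations):
        step = json.loads(llm(user_input, scratchpad))
        if step["action"] == "Final Answer":
            return step["action_input"]
        observation = tools[step["action"]](step["action_input"])
        scratchpad.append((step["action"], observation))
    return "Stopped: iteration limit reached."

# Stand-ins for illustration only: a scripted "LLM" and a canned tool.
def fake_llm(user_input, scratchpad):
    if "2 * 7" in user_input:
        return json.dumps({"action": "Final Answer", "action_input": "14"})
    if not scratchpad:
        return json.dumps({"action": "Knowledge Base", "action_input": user_input})
    return json.dumps({"action": "Final Answer", "action_input": scratchpad[-1][1]})

tools = {"Knowledge Base": lambda q: f"(retrieved answer for: {q})"}

print(run_agent("what is 2 * 7?", fake_llm, tools))
print(run_agent("when was the college founded?", fake_llm, tools))
```

The first call answers directly without touching the tool, while the second makes one tool call before answering, which is the behaviour we want from a retrieval agent.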

### Using the Conversational Agent

To make queries we simply call the `agent` directly.

In [34]:
agent(query)

> Entering new AgentExecutor chain...
```json
{
    "action": "Knowledge Base",
    "action_input": "Establishment date of the College of Engineering at the University of Notre Dame"
}
```
Observation: The College of Engineering at the University of Notre Dame was established in 1920.
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "The College of Engineering at the University of Notre Dame was established in 1920."
}
```

> Finished chain.


{'input': 'when was the college of engineering in the University of Notre Dame established?',
 'chat_history': [],
 'output': 'The College of Engineering at the University of Notre Dame was established in 1920.'}

Looks great. Now what if we ask it a question that doesn't call for the knowledge base?

In [35]:
agent("what is 2 * 7?")

> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "The result of 2 * 7 is 14."
}
```

> Finished chain.


{'input': 'what is 2 * 7?',
 'chat_history': [HumanMessage(content='when was the college of engineering in the University of Notre Dame established?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The College of Engineering at the University of Notre Dame was established in 1920.', additional_kwargs={}, response_metadata={})],
 'output': 'The result of 2 * 7 is 14.'}

Perfect, the agent recognizes that it doesn't need to refer to its general knowledge tool for that question. Let's try some more questions.

In [36]:
agent("can you tell me some facts about the University of Notre Dame?")

> Entering new AgentExecutor chain...
```json
{
    "action": "Knowledge Base",
    "action_input": "University of Notre Dame facts"
}
```
Observation: The University of Notre Dame, located in South Bend, Indiana, is a Catholic research university with a large undergraduate and graduate program. It is known for its strong alumni network, research institutes, and notable landmarks like the Golden Dome and the Basilica. The university offers a variety of degree programs, including a MD-PhD program in collaboration with IU medical School. Notre Dame has a diverse student body representing all 50 states and 100 countries. The university is also recognized for its intramural sports program and annual events like the Bookstore Basketball tournament and the Bengal Bouts tournament.
Thought:```json
{
    "action": "Final Answer",
    "action_input": "The University of Notre Dame, located in South Bend, Indiana, is a Catholic research uni

{'input': 'can you tell me some facts about the University of Notre Dame?',
 'chat_history': [HumanMessage(content='when was the college of engineering in the University of Notre Dame established?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The College of Engineering at the University of Notre Dame was established in 1920.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The result of 2 * 7 is 14.', additional_kwargs={}, response_metadata={})],
 'output': 'The University of Notre Dame, located in South Bend, Indiana, is a Catholic research university with a large undergraduate and graduate program. It is known for its strong alumni network, research institutes, and notable landmarks like the Golden Dome and the Basilica. The university offers a variety of degree programs, including a MD-PhD program in collaboration with IU medical School. Notre Dame has a div

In [37]:
agent("can you summarize these facts in two short sentences")

> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "The University of Notre Dame is a Catholic research university known for its strong alumni network, research institutes, and notable landmarks. It offers a variety of degree programs and has a diverse student body representing all 50 states and 100 countries."
}
```

> Finished chain.


{'input': 'can you summarize these facts in two short sentences',
 'chat_history': [HumanMessage(content='when was the college of engineering in the University of Notre Dame established?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The College of Engineering at the University of Notre Dame was established in 1920.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The result of 2 * 7 is 14.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='can you tell me some facts about the University of Notre Dame?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The University of Notre Dame, located in South Bend, Indiana, is a Catholic research university with a large undergraduate and graduate program. It is known for its strong alumni network, research institutes, and notable landmarks like the Golden Dome and the Basilica. The univ

Looks great! We can also ask questions that refer to previous interactions in the conversation, and the agent uses the conversation history as a source of information.
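The windowed memory behind this behaviour keeps only the most recent `k` exchanges and silently drops older ones. A minimal sketch of that idea, using a hypothetical `WindowMemory` class of our own rather than LangChain's `ConversationBufferWindowMemory`:

```python
from collections import deque

class WindowMemory:
    """Keep only the most recent k (human, ai) exchanges."""

    def __init__(self, k=5):
        self.exchanges = deque(maxlen=k)

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def history(self):
        return list(self.exchanges)

memory = WindowMemory(k=2)
memory.save("q1", "a1")
memory.save("q2", "a2")
memory.save("q3", "a3")  # the oldest exchange (q1, a1) is dropped
print(memory.history())  # [('q2', 'a2'), ('q3', 'a3')]
```

With `k=5` as configured in this notebook, a follow-up question can only draw on the last five exchanges, so references to much earlier turns will not be resolved.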

That's all for this example of building a retrieval augmented conversational agent with OpenAI and Pinecone (the OP stack) and LangChain.

Once finished, we delete the Pinecone index to save resources:

In [38]:
pc.delete_index(index_name)

---