[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb)

#### [LangChain Handbook](https://www.pinecone.io/learn/series/langchain/)

# Retrieval Agents

We've seen in previous chapters how powerful [retrieval augmentation](https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/) and [conversational agents](https://www.pinecone.io/learn/series/langchain/langchain-agents/) can be. They become even more impressive when we begin using them together.

Conversational agents can struggle with data freshness, knowledge about specific domains, or accessing internal documentation. By coupling agents with retrieval augmentation tools we can largely mitigate these problems.

On the other hand, using "naive" retrieval augmentation without an agent means we retrieve contexts for *every* query. This isn't always ideal, as not every query requires access to external knowledge.

Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.

[![Open full notebook](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)


# Prerequisites

To begin, we must install several libraries that we will be using in this notebook.

In [1]:
!pip install -qU \
  pinecone==5.4.2 \
  pinecone-datasets==1.0.2 \
  pinecone-notebooks==0.1.1 \
  langchain==0.3.20 \
  langchain-openai==0.3.9 \
  langchain-pinecone==0.2.3 \
  langgraph==0.3.14 \
  tqdm

## Building the Knowledge Base

For this demonstration, we will download a pre-embedded dataset using `pinecone-datasets`. This lets us skip the data preparation steps; if you'd rather work through those steps, you can find the [full notebook here](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb).

We will be using embeddings prepared from a subset of the [Stanford Question Answering Dataset (SQuAD)](https://huggingface.co/datasets/rajpurkar/squad). SQuAD is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

In this demo, we will use context about each topic from the dataset and incorporate that into a chat retrieval agent's knowledge base.


In [2]:
from pinecone_datasets import load_dataset

dataset = load_dataset("squad-text-embedding-ada-002")
dataset.head()

Loading documents parquet files: 100%|██████████| 1/1 [00:35<00:00, 35.11s/it]


|   | id | values | sparse_values | metadata | blob |
|---|---|---|---|---|---|
| 0 | 5733be284776f41900661182 | [-0.010262451963272523, 0.02222637996192584, -... |  | {'text': 'Architecturally, the school has a Ca... |  |
| 1 | 5733bf84d058e614000b61be | [-0.009786712423983223, -0.013988726438873078,... |  | {'text': 'As at most other universities, Notre... |  |
| 2 | 5733bed24776f41900661188 | [0.013343917696606181, -0.0007001232846109822,... |  | {'text': 'The university is the major seat of ... |  |
| 3 | 5733a6424776f41900660f51 | [-0.0085222901071539, 0.004399558219521822, -0... |  | {'text': 'The College of Engineering was estab... |  |
| 4 | 5733a70c4776f41900660f64 | [-0.006695996885869355, -0.02067068565761649, ... |  | {'text': 'All of Notre Dame's undergraduate st... |  |


In [3]:
len(dataset)

18891

We'll format the dataset ready for upsert by dropping the columns we don't need.

In [4]:
# we drop sparse_values and blob as they are not needed for this example
dataset.documents.drop(['sparse_values', 'blob'], axis=1, inplace=True)

dataset.head()

|   | id | values | metadata |
|---|---|---|---|
| 0 | 5733be284776f41900661182 | [-0.010262451963272523, 0.02222637996192584, -... | {'text': 'Architecturally, the school has a Ca... |
| 1 | 5733bf84d058e614000b61be | [-0.009786712423983223, -0.013988726438873078,... | {'text': 'As at most other universities, Notre... |
| 2 | 5733bed24776f41900661188 | [0.013343917696606181, -0.0007001232846109822,... | {'text': 'The university is the major seat of ... |
| 3 | 5733a6424776f41900660f51 | [-0.0085222901071539, 0.004399558219521822, -0... | {'text': 'The College of Engineering was estab... |
| 4 | 5733a70c4776f41900660f64 | [-0.006695996885869355, -0.02067068565761649, ... | {'text': 'All of Notre Dame's undergraduate st... |


In [None]:
topics = set()

print("Here are some example topics in our Knowledge Base:\n")
for r in dataset.documents.iloc[:].to_dict(orient="records"):
    topics.add(r['metadata']['title'])

for topic in sorted(topics)[50:75]:
    print(f"- {topic}")

## Initializing the Pinecone client

Now that the data is ready, we can set up our index to store it.

We begin by instantiating a Pinecone client. To do this we need a [free API key](https://app.pinecone.io).

In [6]:
import os

if not os.environ.get("PINECONE_API_KEY"):
    from pinecone_notebooks.colab import Authenticate
    Authenticate()

In [7]:
from pinecone import Pinecone

# Instantiate client
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

### Creating a Pinecone Index

When creating the index we need to define several configuration properties. 

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. 
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. Our pre-embedded dataset was created with OpenAI's `text-embedding-ada-002`, which produces 1536-dimensional vectors.
- `spec` holds a specification which tells Pinecone how you would like to deploy your index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/troubleshooting/available-cloud-regions).

There are more configurations available, but this minimal set will get us started.
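A note on the `metric` choice: OpenAI states that `text-embedding-ada-002` embeddings are normalized to unit length, and for unit-length vectors the dot product equals cosine similarity, so `dotproduct` and `cosine` should rank results the same way here. The minimal stdlib sketch below illustrates this with made-up 3-dimensional vectors standing in for real embeddings.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (norm(a) * norm(b))

# Hypothetical toy vectors, normalized the way ada-002 embeddings already are.
query = normalize([0.2, 0.9, 0.1])
doc_a = normalize([0.3, 0.8, 0.0])
doc_b = normalize([0.9, 0.1, 0.4])

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    # For unit vectors both scores print the same value.
    print(name, round(dot(query, doc), 6), round(cosine(query, doc), 6))
```

With vectors that were not normalized, the two metrics could disagree, which is why the choice of metric should follow the embedding model.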

In [8]:
from pinecone import ServerlessSpec

index_name = 'langchain-retrieval-agent-fast'

if not pc.has_index(name=index_name):
    # Create a new index
    pc.create_index(
        name=index_name,
        dimension=1536,  # dimensionality of text-embedding-ada-002
        metric='dotproduct',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

pc.describe_index(name=index_name)

{
    "name": "langchain-retrieval-agent-fast",
    "dimension": 1536,
    "metric": "dotproduct",
    "host": "langchain-retrieval-agent-fast-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "deletion_protection": "disabled"
}

## Upserting data into your Pinecone Index

In [None]:
# Instantiate an Index client
index = pc.Index(name=index_name)

index.describe_index_stats()

We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.

Now we upsert the data to Pinecone:

In [10]:
index.upsert_from_dataframe(dataset.documents, batch_size=100)

sending upsert requests: 100%|██████████| 18891/18891 [02:39<00:00, 118.24it/s]


{'upserted_count': 18891}
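The progress bar above counts vectors, while `batch_size=100` controls how many vectors go into each upsert request. As a rough sketch of the batching arithmetic (plain Python, not the client's actual implementation; `batched` is a hypothetical helper), 18,891 rows in batches of 100 means 189 requests, with 91 vectors in the last one:

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

# Stand-in for the 18,891 dataset rows; only the count matters here.
rows = range(18891)
batches = list(batched(rows, 100))

print(len(batches), math.ceil(len(rows) / 100))  # 189 189
print(len(batches[-1]))                          # 91
```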

Now that everything is indexed, we can check the number of vectors in our index. `total_vector_count` may briefly be lower than the number of vectors in our dataset; this is expected, as Pinecone is eventually consistent. If you check again a few moments later you should see the expected total.
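If you would rather have the notebook wait for the count to settle than re-run the stats cell by hand, a small polling loop is enough. This is a sketch: `wait_for_vector_count` is a hypothetical helper, and `get_count` is any zero-argument callable returning the current total.

```python
import time
from typing import Callable

def wait_for_vector_count(
    get_count: Callable[[], int],
    expected: int,
    timeout: float = 60.0,
    interval: float = 2.0,
) -> int:
    """Poll get_count() until it reaches `expected` or `timeout` seconds pass.

    Returns the last observed count so the caller can decide what to do next.
    """
    deadline = time.monotonic() + timeout
    count = get_count()
    while count < expected and time.monotonic() < deadline:
        time.sleep(interval)
        count = get_count()
    return count

# Simulated eventually-consistent counter, for demonstration only.
observed = iter([18700, 18850, 18891])
final = wait_for_vector_count(lambda: next(observed), 18891, timeout=5, interval=0.01)
print(final)  # 18891
```

Against the real index you might pass something like `lambda: index.describe_index_stats().total_vector_count`; attribute access on the stats response is an assumption worth checking against your client version.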

In [13]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 18891}},
 'total_vector_count': 18891}

## Working with LangChain

Now that we've built our index we can switch over to LangChain. LangChain defines standard interfaces that are helpful for using Pinecone with other components in your AI stack.

We start by initializing `PineconeVectorStore` which implements LangChain's standard interface for vector stores. We configure it to interact with the `'langchain-retrieval-agent-fast'` index we just built. 

We'll also need to set up an embedding model component to embed our queries using `text-embedding-ada-002`, the same OpenAI model that was used to create the embeddings in the pre-embedded dataset we upserted into our Pinecone index.

We do that like so:

In [14]:
from langchain_openai import OpenAIEmbeddings

openai_api_key = os.environ.get('OPENAI_API_KEY') or 'OPENAI_API_KEY'

embed = OpenAIEmbeddings(
    model='text-embedding-ada-002',
    openai_api_key=openai_api_key
)

In [15]:
from langchain_pinecone import PineconeVectorStore

pinecone_vectorstore = PineconeVectorStore(
    index_name=index_name, 
    embedding=embed, 
    text_key="text"
)

As in previous examples, we can use the `similarity_search` method to do a pure semantic search (without the generation component).
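Conceptually, `similarity_search` embeds the query with the same model used for the documents and asks Pinecone for the `k` highest-scoring vectors under the index metric. The brute-force version of that idea is sketched below with made-up IDs and 3-dimensional vectors; Pinecone itself relies on approximate nearest-neighbour search rather than scoring every stored vector.

```python
import heapq
from typing import Dict, List, Tuple

def dot(a: List[float], b: List[float]) -> float:
    """Dot-product score, matching the index's 'dotproduct' metric."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest dot-product score."""
    scored = ((vid, dot(query, vec)) for vid, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical toy "index": IDs mapped to vectors.
vectors = {
    "doc-engineering": [0.9, 0.1, 0.0],
    "doc-football": [0.1, 0.9, 0.1],
    "doc-history": [0.7, 0.3, 0.2],
    "doc-chemistry": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for vid, score in top_k(query, vectors, k=3):
    print(vid, round(score, 3))
```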

In [16]:
from pprint import pprint

query = "When was the college of engineering in the University of Notre Dame established?"

documents = pinecone_vectorstore.similarity_search(
    query=query,
    k=3  # return 3 most relevant docs
)

for doc in documents:
    pprint(doc.__dict__)
    print()

{'id': '57338724d058e614000b5c9f',
 'metadata': {'title': 'University_of_Notre_Dame'},
 'page_content': 'In 1919 Father James Burns became president of Notre Dame, '
                 'and in three years he produced an academic revolution that '
                 'brought the school up to national standards by adopting the '
                 "elective system and moving away from the university's "
                 'traditional scholastic and classical emphasis. By contrast, '
                 'the Jesuit colleges, bastions of academic conservatism, were '
                 'reluctant to move to a system of electives. Their graduates '
                 'were shut out of Harvard Law School for that reason. Notre '
                 'Dame continued to grow over the years, adding more colleges, '
                 'programs, and sports teams. By 1921, with the addition of '
                 'the College of Commerce, Notre Dame had grown from a small '
                 'college to a university w

Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent.

## Initializing the Conversational Agent

Our conversational agent needs a Chat LLM component. We create that using:

In [17]:
from langchain_openai import ChatOpenAI

# Chat completion LLM
llm = ChatOpenAI(
    openai_api_key=openai_api_key,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

Next we need to build a chain that can incorporate context from our `PineconeVectorStore` instance into prompts passed to the LLM. 

In [18]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Based on the RAG template from https://smith.langchain.com/hub/rlm/rag-prompt
# Newlines keep the instructions, question, context, and answer cue on separate lines
template = (
    "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)
prompt = PromptTemplate(input_variables=["question", "context"], template=template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieval Question-Answer chain
qa_chain = (
    {
        "context": pinecone_vectorstore.as_retriever() | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

Using these we can generate an answer by invoking the chain:

In [19]:
qa_chain.invoke("When was the college of engineering in the University of Notre Dame established?")

'The College of Engineering in the University of Notre Dame was established in 1920. Today, the college includes five departments of study and offers eight B.S. degrees.'
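To make the data flow inside `qa_chain` explicit, here is a plain-Python sketch of the same four steps: retrieve, format the documents, fill the prompt, and call the model. `Doc`, `fake_retriever`, and `fake_llm` are hypothetical stand-ins for LangChain's `Document`, the Pinecone retriever, and `ChatOpenAI`, so the example runs offline. In the real chain the `context` and `question` branches are computed by a runnable map; here they run sequentially.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str

TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)

def format_docs(docs: List[Doc]) -> str:
    """Same joining rule as the notebook: a blank line between passages."""
    return "\n\n".join(doc.page_content for doc in docs)

def run_qa_chain(
    question: str,
    retriever: Callable[[str], List[Doc]],
    llm: Callable[[str], str],
) -> str:
    # 1. retrieve, 2. format, 3. fill the prompt, 4. call the model
    context = format_docs(retriever(question))
    prompt = TEMPLATE.format(question=question, context=context)
    return llm(prompt)

def fake_retriever(question: str) -> List[Doc]:
    return [Doc("Passage one."), Doc("Passage two.")]

def fake_llm(prompt: str) -> str:
    # Echo the prompt so we can inspect exactly what the model would receive.
    return prompt

print(run_qa_chain("When was it founded?", fake_retriever, fake_llm))
```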

## Integrating a retrieval chain into a Tool

But this isn't yet ready for our conversational agent. For that we need to convert this retrieval chain into a LangChain tool: a component with a name and a description that an agent can decide to invoke. Runnables such as our chain provide an `as_tool` method for this. We give the tool the name `knowledge-base`, which we will see later on when the LLM invokes it:

In [None]:
# Wrap the chain as a tool; the description tells the LLM when to use it
knowledge_base_tool = qa_chain.as_tool(
    name='knowledge-base',
    description=(
        'use this tool when answering general knowledge queries to get '
        'more information about the topic'
    )
)
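What the model receives is the tool's name, its description, and an argument schema. Because `qa_chain` takes a plain string, the tool exposes a single argument, which is why the tool calls later in this notebook show `__arg1`. The dictionary below is a hand-written, approximate illustration in OpenAI's function-calling format, not the exact payload LangChain generates; `validate_args` is a hypothetical helper.

```python
import json

knowledge_base_schema = {
    "type": "function",
    "function": {
        "name": "knowledge-base",
        "description": (
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
        "parameters": {
            "type": "object",
            "properties": {"__arg1": {"title": "__arg1", "type": "string"}},
            "required": ["__arg1"],
        },
    },
}

def validate_args(schema: dict, args: dict) -> bool:
    """Check that every required parameter is present and is a string."""
    params = schema["function"]["parameters"]
    return all(isinstance(args.get(name), str) for name in params["required"])

# A tool call from the model carries its arguments as a JSON string.
tool_call_arguments = '{"__arg1": "University of Notre Dame"}'
args = json.loads(tool_call_arguments)
print(validate_args(knowledge_base_schema, args), args["__arg1"])  # True University of Notre Dame
```

A clear, specific description matters because it is the main signal the model uses to decide whether a query needs the knowledge base at all.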

Now we are ready to incorporate these pieces into a LangGraph agent.

## Building a knowledgeable chatbot agent with LangGraph

**LangGraph** is a framework from **LangChain** for building AI applications using a state machine to model complex workflows. 

To begin, we first define the `State` for our agent, which keeps track of a list of messages.

In [24]:
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

graph_builder = StateGraph(State)
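`Annotated[list, add_messages]` tells LangGraph to merge each node's returned `messages` into the existing list with the `add_messages` reducer instead of overwriting it. The stdlib sketch below shows a simplified version of that merge rule: new messages are appended, and a message whose ID already exists replaces the old one. `merge_messages` is a hypothetical name; the real reducer also converts dicts into message objects, assigns missing IDs, and supports removals, none of which is modelled here.

```python
from typing import Dict, List

Message = Dict[str, str]  # e.g. {"id": "1", "role": "user", "content": "hi"}

def merge_messages(left: List[Message], right: List[Message]) -> List[Message]:
    """Append new messages; a message whose id already exists replaces the old one."""
    merged = list(left)  # copy so the previous state is not mutated
    position = {msg["id"]: i for i, msg in enumerate(merged)}
    for msg in right:
        if msg["id"] in position:
            merged[position[msg["id"]]] = msg
        else:
            position[msg["id"]] = len(merged)
            merged.append(msg)
    return merged

state = [{"id": "1", "role": "user", "content": "Hi there! My name is Jen."}]
update = [{"id": "2", "role": "assistant", "content": "Hello Jen!"}]
state = merge_messages(state, update)
print([m["content"] for m in state])  # ['Hi there! My name is Jen.', 'Hello Jen!']
```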

Next we add two nodes to our graph, a chatbot node and a tools node, along with edges that describe how the state machine represented by the graph transitions from one node to the next.

In [25]:
from langgraph.prebuilt import ToolNode, tools_condition

tools = [knowledge_base_tool]
llm_with_tools = llm.bind_tools(tools)

def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder.add_node("chatbot", chatbot)

tool_node = ToolNode(tools=tools)
graph_builder.add_node("tools", tool_node)

graph_builder.add_conditional_edges(
    "chatbot",
    tools_condition,
)
graph_builder.add_edge("tools", "chatbot")
graph_builder.set_entry_point("chatbot")

<langgraph.graph.state.StateGraph at 0xffff2e2bea20>
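The `tools_condition` edge routes to the `tools` node when the latest AI message contains tool calls and ends the run otherwise, and the `tools` node's output flows back to `chatbot`. The loop below is a hypothetical, offline sketch of that control flow with a scripted stand-in for the LLM; it is not LangGraph's implementation.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

def run_graph(
    messages: List[Message],
    chatbot: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[Message]:
    """chatbot -> (tools -> chatbot)* -> end, mirroring the edges defined above."""
    for _ in range(max_steps):
        reply = chatbot(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:  # like tools_condition routing to END
            return messages
        for call in tool_calls:  # like the tools node executing each call
            result = tools[call["name"]](call["args"]["__arg1"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("graph did not finish within max_steps")

def scripted_chatbot(messages: List[Message]) -> Message:
    """Hypothetical LLM stand-in: asks for the tool only for Notre Dame questions."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"From the knowledge base: {last['content']}"}
    if "Notre Dame" in last["content"]:
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "knowledge-base",
                                "args": {"__arg1": "University of Notre Dame"}}]}
    return {"role": "assistant", "content": "14 * 9 = 126"}

tools = {"knowledge-base": lambda q: f"(retrieved answer about {q})"}

history = run_graph([{"role": "user", "content": "Tell me about Notre Dame"}], scripted_chatbot, tools)
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant']

math_history = run_graph([{"role": "user", "content": "What is 14 * 9?"}], scripted_chatbot, tools)
print([m["role"] for m in math_history])  # ['user', 'assistant']
```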

We also want to add a `checkpointer` and compile the graph. The checkpointer saves the graph's state as it runs, which is what gives our agent context on earlier messages in the same conversation thread.

In [26]:
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
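`MemorySaver` keeps checkpoints in process memory, keyed by the `thread_id` passed in the config, so the history only lasts as long as the Python process. The idea can be sketched with a plain dictionary; `ThreadMemory` is a hypothetical class, and a real checkpointer stores snapshots of the full graph state rather than just message strings.

```python
from collections import defaultdict
from typing import Dict, List

class ThreadMemory:
    """Toy per-thread conversation store keyed by thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._threads[thread_id])

toy_memory = ThreadMemory()
toy_memory.append("1", "Hi there! My name is Jen.")
toy_memory.append("1", "Do you remember my name?")
toy_memory.append("2", "What is 14 * 9?")

print(toy_memory.history("1"))  # ['Hi there! My name is Jen.', 'Do you remember my name?']
print(toy_memory.history("2"))  # ['What is 14 * 9?']
```

This is also why passing the same `thread_id` on every call continues one conversation, while a new ID would start a fresh one.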

Finally, let's wrap our graph in an `agent` function to simplify interacting with it.

In [27]:
def agent(user_message):
    config = {"configurable": {"thread_id": "1"}}
    
    # The config is the **second positional argument** to stream() or invoke()!
    events = graph.stream(
        {"messages": [{"role": "user", "content": user_message}]},
        config,
        stream_mode="values",
    )
    for event in events:
        event["messages"][-1].pretty_print()

## Using the chat agent

Now we are ready to chat with our agent.

In [28]:
agent("Hi there! My name is Jen.")


Hi there! My name is Jen.

Hello Jen! How can I assist you today?


#### Testing the conversational memory

Let's see if it remembers what we just told it.

In [29]:
agent("Do you remember my name?")


Do you remember my name?

Yes, your name is Jen. How can I help you today, Jen?


#### Leveraging context from our Knowledge Base

Next let's try asking it a question that requires the LLM to invoke our knowledge base for context. Recall that our Knowledge Base has been filled with facts from Wikipedia in the [SQuAD dataset](https://huggingface.co/datasets/rajpurkar/squad), including several entries about the **University of Notre Dame**.

In the output you can see the `Tool Calls` where `knowledge-base` is invoked. `knowledge-base` is the name of the tool we built from the `qa_chain` question-answering chain above, which uses OpenAI to embed queries, Pinecone to find context relevant to the question, and an OpenAI chat model to compose the answer.

In [30]:
agent("Do you know anything about the University of Notre Dame?")


Do you know anything about the University of Notre Dame?
Tool Calls:
  knowledge-base (call_32ppeZqB9OoORtqlL85wlFSw)
 Call ID: call_32ppeZqB9OoORtqlL85wlFSw
  Args:
    __arg1: University of Notre Dame
Name: knowledge-base

The University of Notre Dame is a Catholic research university located in South Bend, Indiana. It is known for its recognizable landmarks such as the Golden Dome and the Basilica. Notre Dame offers undergraduate programs in four colleges and has a strong alumni network.

The University of Notre Dame is a Catholic research university located in South Bend, Indiana. It is known for its recognizable landmarks such as the Golden Dome and the Basilica. Notre Dame offers undergraduate programs in four colleges and has a strong alumni network. If you would like more information, feel free to ask!


#### Follow-up question 

We can ask a follow-up question, and the agent remembers that "it" in this case is the University of Notre Dame.

In [32]:
agent("When was it founded?")


When was it founded?
Tool Calls:
  knowledge-base (call_UZa9Ptw4U2GTL8O7KjweZnit)
 Call ID: call_UZa9Ptw4U2GTL8O7KjweZnit
  Args:
    __arg1: University of Notre Dame founding date
Name: knowledge-base

The University of Notre Dame was founded in 1842.

The University of Notre Dame was founded in 1842. If you have any more questions or need further information, feel free to ask!


#### General knowledge question

Let's try asking it about something that is not in the knowledge-base.

In [33]:
agent("What is 14 * 9?")


What is 14 * 9?

The result of 14 multiplied by 9 is 126.


Great, no `Tool Calls` are shown, which means the agent correctly recognized it did not need to invoke the knowledge base to answer this question.

## Interactive chat

We've left the following cell commented by default so as not to break our automated testing of this notebook.

But if you want to try a continuous series of interactions with the chat bot, uncomment the code below.

In [None]:
# print("Type 'quit' to exit")
# while True:
#     user_input = input("User: ")
#     if user_input.lower() in ["quit", "exit", "q"]:
#         print("Goodbye!")
#         break

#     agent(user_input)

## Wrap-up

That's all for this example of building a retrieval augmented conversational agent with OpenAI and Pinecone (the OP stack) using LangChain.

To recap, we:
- Built a Pinecone index using embeddings derived from facts in the SQuAD dataset
- Configured a `PineconeVectorStore` instance to interact with Pinecone
- Built a RAG chain using our Pinecone-backed knowledge base
- Integrated that chain into an LLM-powered chat agent

## Demo cleanup

Once finished, we delete the Pinecone index to save resources:

In [None]:
pc.delete_index(name=index_name)

---