[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb)


# Retrieval Agents with Pinecone Assistant, LangChain and LangGraph

We've seen in previous chapters how powerful [retrieval augmentation](https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/) and [conversational agents](https://www.pinecone.io/learn/series/langchain/langchain-agents/) can be. They become even more impressive when we begin using them together.

Conversational agents can struggle with data freshness, knowledge about specific domains, or access to internal documentation. By coupling agents with retrieval augmentation tools, we can largely overcome these limitations.

On the other hand, using "naive" retrieval augmentation without an agent means we retrieve contexts with *every* query. This isn't always ideal, because not every query requires access to external knowledge.

Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.


# Prerequisites

To begin, we must install several libraries that we will be using in this notebook.

In [3]:
!pip install -qU \
  pinecone==7.0.2 \
  pinecone-notebooks==0.1.1 \
  langchain \
  langchain-openai \
  langchain-pinecone \
  langgraph==0.3.14 \
  tqdm \
  pinecone-plugin-assistant

## Building our knowledge agent: Pinecone Assistant

In this demo, we'll create a Pinecone Assistant and load it with a textbook to build a learning assistant of sorts! We'll use an OpenAI LLM as a tutor and the Pinecone Assistant as a subject matter expert on our chosen textbook. This is a simple multi-agent system: the tutor writes queries and passes them to the subagent to answer.

To orchestrate this, we'll use LangChain and LangGraph, two popular frameworks for agentic development.

In [2]:
import os
from getpass import getpass

def get_pinecone_api_key():
    """
    Get Pinecone API key from environment variable or prompt user for input.
    Returns the API key as a string.

    Only necessary for notebooks. When using Pinecone yourself, 
    you can use environment variables or the like to set your API key.
    """
    api_key = os.environ.get("PINECONE_API_KEY")
    
    if api_key is None:
        try:
            # Try Colab authentication if available
            from pinecone_notebooks.colab import Authenticate
            Authenticate()
            # If successful, key will now be in environment
            api_key = os.environ.get("PINECONE_API_KEY")
        except ImportError:
            # Not running in Colab (pinecone_notebooks.colab is unavailable), so prompt for the key instead
            print("Pinecone API key not found in environment.")
            api_key = getpass("Please enter your Pinecone API key: ")
            # Save to environment for future use in session
            os.environ["PINECONE_API_KEY"] = api_key
    
    return api_key

api_key = get_pinecone_api_key()

Pinecone API key not found in environment.


In [4]:
from pinecone import Pinecone

pc = Pinecone(
    # source_tag isn't necessary for projects, so feel free to remove in production
    source_tag="pinecone_examples:docs:langchain_retrieval_agent",
    api_key=api_key)




In [5]:
assistant = pc.assistant.create_assistant(
    assistant_name="textbook-assistant",
    instructions="Help answer questions about provided textbooks with the aim of creating study guides and grounded learning materials", # Description or directive for the assistant to apply to all responses.
    region="us", # Region to deploy assistant. Options: "us" (default) or "eu".
    timeout=30 # Maximum seconds to wait for assistant status to become "Ready" before timing out.
)
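If you rerun this notebook, calling `create_assistant` with a name that already exists will likely fail. A minimal get-or-create sketch follows, assuming `pc.assistant.list_assistants()` returns objects with a `name` attribute (check this against your plugin version). `get_or_create_assistant` is a hypothetical helper, not part of the Pinecone SDK:

```python
def get_or_create_assistant(pc, assistant_name, **create_kwargs):
    """Return an existing assistant by name, or create it if it doesn't exist.

    Hypothetical helper. Assumes `pc.assistant.list_assistants()` returns
    objects exposing a `.name` attribute.
    """
    existing_names = {a.name for a in pc.assistant.list_assistants()}
    if assistant_name in existing_names:
        # Reuse the existing assistant instead of trying to re-create it
        return pc.assistant.Assistant(assistant_name=assistant_name)
    # Otherwise create it; instructions, region, timeout, etc. pass straight through
    return pc.assistant.create_assistant(assistant_name=assistant_name, **create_kwargs)
```

You would call it as `assistant = get_or_create_assistant(pc, "textbook-assistant", instructions=..., region="us", timeout=30)`, passing the same keyword arguments as `create_assistant` above.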

## Uploading our data

We'll upload a massive 1,000-page textbook to our assistant to show how easy it is to get started. For this demo we use *Introduction to Computer Science*, a freely available textbook from OpenStax ([available online here](https://openstax.org/details/books/introduction-computer-science)).



In [7]:
# Download our textbook (thank you, OpenStax!)
import requests

url = "https://assets.openstax.org/oscms-prodcms/media/documents/Introduction_To_Computer_Science_-_WEB.pdf"

response = requests.get(url, timeout=120)
response.raise_for_status()  # fail loudly instead of saving an HTML error page as textbook.pdf

with open("textbook.pdf", "wb") as f:
    f.write(response.content)
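If the download is redirected to an error page, we could end up uploading HTML saved under a `.pdf` name. A quick sanity check is to look for the `%PDF-` header that PDF files typically start with. `looks_like_pdf` is a hypothetical helper for illustration:

```python
def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes."""
    with open(path, "rb") as f:
        # PDF files typically begin with a header such as "%PDF-1.7"
        return f.read(5) == b"%PDF-"
```

For example, run `assert looks_like_pdf("textbook.pdf")` before calling `upload_file`.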


In [None]:
assistant = pc.assistant.Assistant(
    assistant_name="textbook-assistant", 
)

# Upload the file. Processing takes about three minutes; timeout=None waits until it finishes.
response = assistant.upload_file(
    file_path="textbook.pdf",
    timeout=None
)

In [18]:
# Let's try querying it!
from pinecone_plugins.assistant.models.chat import Message

msg = Message(role="user", content="Can you teach me about how simple web applications are architected? I don't understand how they work.")

# include_highlights=True adds the supporting text excerpt from the source to each citation. Handy for fact-checking!
resp = assistant.chat(messages=[msg], include_highlights=True, model="claude-3-7-sonnet")


In [19]:
print(resp["message"]["content"])

# Web Application Architecture Basics

## Evolution of Web Applications

The World Wide Web started as a way to link content (primarily text and images) stored on different servers or machines. It was invented by Tim Berners-Lee in 1989 while working at CERN. He created the Hypertext Transfer Protocol (HTTP) that operates on top of TCP/IP, the principal protocols used on the Internet.

In the early days (Web 1.0), user interaction was limited primarily to reading and selecting web pages. A web page is a document commonly written in HTML and viewed in a browser. This simple request and response paradigm used a client-server model that was easy to implement.

## Core Web Technologies

Three technologies dominate web programming:
1. HTML (hypertext markup language): used to describe the structure and content of web pages
2. CSS (cascading style sheets): used to alter the presentation style of the content found in HTML
3. JavaScript (JS): a scripting language that adds interactivity to web

In [20]:
resp["citations"]

[{'position': 192,
  'references': [{'file': {'name': 'textbook.pdf',
     'id': 'e33526ed-1e5c-4263-9cc0-b225ec93c734',
     'metadata': None,
     'created_on': '2025-06-16T20:09:50.703258033Z',
     'updated_on': '2025-06-16T20:11:32.980858639Z',
     'status': 'Available',
     'percent_done': 1.0,
     'signed_url': 'https://storage.googleapis.com/knowledge-prod-files/168c07ce-d86e-4e2c-86b8-fe2b7124a845%2F3e9a6c82-b4e9-4163-8f36-d8543b9ecc87%2Fe33526ed-1e5c-4263-9cc0-b225ec93c734.pdf?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=ke-prod-1%40pc-knowledge-prod.iam.gserviceaccount.com%2F20250616%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20250616T201810Z&X-Goog-Expires=3600&X-Goog-SignedHeaders=host&response-content-disposition=inline&response-content-type=application%2Fpdf&X-Goog-Signature=0d1d8ebd9203dfe33333c19f40ec6b85cb8af33484479b0810d7b7d9aedbb0eaf6144942ffa6df86147e8b904620f800ad6f2d2a0c141368c04327177aa1db0900522d4feae55ab23a5f90b34a5fa6c267145e400570a17f79a2da22e0b
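Each citation carries a `position` and a list of `references` to the source file. Assuming `position` is a character offset into `resp["message"]["content"]`, and that each reference may include a `pages` list (the output above is truncated before any page numbers appear, so verify this against your own response), a small hypothetical helper can turn citations into inline markers plus footnotes. It inserts markers starting from the end of the text so that earlier offsets stay valid:

```python
def add_citation_markers(content, citations):
    """Insert numbered markers like "[1]" into `content` and append footnotes.

    Hypothetical helper. Assumes each citation is a dict with an integer
    "position" (a character offset into `content`) and a "references" list
    whose entries look like {"file": {"name": ...}, "pages": [...]}, where
    "pages" may be absent.
    """
    # Number citations in reading order (ascending position)
    ordered = sorted(citations, key=lambda c: c["position"])

    footnotes = []
    for number, citation in enumerate(ordered, start=1):
        sources = []
        for ref in citation.get("references", []):
            name = ref["file"]["name"]
            pages = ref.get("pages") or []
            if pages:
                sources.append(f"{name} p. {', '.join(map(str, pages))}")
            else:
                sources.append(name)
        footnotes.append(f"[{number}] {'; '.join(sources)}")

    # Insert markers from the end backwards so earlier offsets remain valid
    for number, citation in reversed(list(enumerate(ordered, start=1))):
        pos = citation["position"]
        content = content[:pos] + f"[{number}]" + content[pos:]

    return content + "\n\n" + "\n".join(footnotes)
```

Usage: `print(add_citation_markers(resp["message"]["content"], resp["citations"]))`.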

## Receiving context snippets instead of responses

That's not all! Instead of offloading generation to our Pinecone Assistant, we can use only its query-understanding and subquerying capabilities and have it return the most relevant context snippets. This is handy when we want to run our own specialized workflow on the retrieved snippets instead of receiving a generated answer directly.

To do so, we use the Context API like this:



In [27]:
context_result = assistant.context(query="Can you teach me about how simple web applications are architected? I don't understand how they work.")



In [30]:
context_snippets = context_result.snippets


for num, snippet in enumerate(context_snippets):
    print(f"Snippet {num+1}:")
    print(snippet.content)
    print("-"*100)
    print("\n")

Snippet 1:
Figure11.1 It takes many roles to build a responsive design in web applications development for multiple system applications.
(credit: modification of “190827-F-ND912-035” by Tech. Sgt. R. J. Biermann/Lt. Col. Wilson/U.S. Air Force, Public Domain)
Chapter Outline
11.1Modern Web Applications Architectures
11.2Sample Responsive WAD with Bootstrap and Django
11.3Sample Responsive WAD with Bootstrap/React and Node
11.4Sample Responsive WAD with Bootstrap/React and Django
11.5Sample Native WAD with React Native and Node or Django
11.6Sample Ethereum Blockchain Web 2.0/Web 3.0 Application
Introduction
TechWorks is creating several web applications this year for a new product line. One application is an AI-image
generator website and auction house for selling images. An outside consultant has been brought in, and they
have determined that a hybrid Web 2.0/3.0 architecture is best suited for this solution. However, the
engineering team who will perform the work needs to gain experie
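With raw snippets in hand, the "specialized workflow" could be as simple as packing them into a prompt for our own LLM. A minimal sketch, where the character budget is a crude stand-in for a real token limit and `build_grounded_prompt` is a hypothetical helper:

```python
def build_grounded_prompt(question, snippets, max_chars=4000):
    """Pack retrieved snippet texts into a single prompt for a downstream LLM.

    Hypothetical helper. `snippets` is a list of strings, kept in the order
    given. Whole snippets are added until the next one would exceed the
    `max_chars` budget (a crude stand-in for a real token limit).
    """
    header = (
        "Answer the question using only the numbered context below, "
        "and cite the snippet numbers you rely on.\n\n"
    )
    footer = f"Question: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)

    blocks = []
    for number, text in enumerate(snippets, start=1):
        block = f"[{number}] {text.strip()}\n\n"
        if len(block) > budget:
            break  # stop at a whole snippet instead of truncating one mid-text
        blocks.append(block)
        budget -= len(block)

    return header + "".join(blocks) + footer
```

For example, `prompt = build_grounded_prompt(question, [s.content for s in context_snippets])`. Stopping at a whole snippet, rather than cutting one mid-sentence, keeps each numbered context intact for citation.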

## Working with LangChain

Now that our assistant is ready, we can switch over to LangChain and LangGraph. LangChain defines standard interfaces (chat models, tools, messages) that make it easy to connect Pinecone with other components in your AI stack, and LangGraph lets us wire those components into an agent.

Rather than managing our own vector index and embedding model, we'll wrap the Pinecone Assistant's Context API as a LangChain tool. An OpenAI chat model (`gpt-4o-mini`) will act as the tutor agent and decide when to call that tool.

First, we need an OpenAI API key:

In [31]:
# Set up the OpenAI API key

def get_openai_api_key():
    """
    Get OpenAI API key from environment variable or prompt user for input.
    Returns the API key as a string.
    """
    api_key = os.environ.get("OPENAI_API_KEY")
    
    if api_key is None:
        try:
            api_key = getpass("Please enter your OpenAI API key: ")
            # Save to environment for future use in session
            os.environ["OPENAI_API_KEY"] = api_key
        except Exception as e:
            print(f"Error getting OpenAI API key: {e}")
            return None
    
    return api_key

OPENAI_API_KEY = get_openai_api_key()

In [46]:
from typing import Annotated
from langchain.chat_models import init_chat_model
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages


class State(TypedDict):
    messages: Annotated[list, add_messages]


graph_builder = StateGraph(State)


llm = init_chat_model("openai:gpt-4o-mini")
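The `Annotated[list, add_messages]` type tells LangGraph to merge each node's returned `messages` into the existing history instead of overwriting it. The toy function below mimics that merge rule on plain dicts. It is a simplified stand-in for illustration, not LangGraph's implementation (the real `add_messages` also converts dicts to LangChain message objects and assigns missing ids):

```python
def merge_messages(left, right):
    """Simplified stand-in for LangGraph's `add_messages` reducer.

    Messages are dicts with an "id" key. A message in `right` whose id already
    appears in `left` replaces it; every other message is appended.
    """
    merged = list(left)  # copy so the existing history is not mutated
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for message in right:
        if message["id"] in index_by_id:
            # Same id: replace the earlier version in place
            merged[index_by_id[message["id"]]] = message
        else:
            # New id: append to the end of the history
            index_by_id[message["id"]] = len(merged)
            merged.append(message)
    return merged
```

This is why our `chatbot` node can return just `{"messages": [new_message]}`: the reducer appends it to the conversation so far.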




In [47]:
# Let's add our assistant as a tool to our graph
from langchain_core.tools import tool

@tool("ask_textbook_subagent_tool", parse_docstring=True)
def ask_textbook_assistant(subquery: str) -> str:
    """Request information from our Pinecone Assistant.

    The Pinecone textbook assistant has access to a 1,000-page computer science
    textbook and returns relevant context snippets from it given a query.

    Useful for constructing study guides, writing informative answers, or quickly
    looking up information in our textbook.

    Args:
        subquery: A focused question or search query about the textbook's content.
    """
    assistant = pc.assistant.Assistant(
        assistant_name="textbook-assistant",
    )

    response = assistant.context(query=subquery)

    # Format each snippet with its number and the textbook pages it came from
    formatted = []
    for i, snippet in enumerate(response.snippets):
        pages = ", ".join(map(str, snippet["reference"]["pages"]))
        formatted.append(f"Snippet {i+1}:\n{snippet['content']}\nCited Pages: {pages}")

    return "\n\n".join(formatted)
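The LLM never sees the body of `ask_textbook_assistant`; it only sees the tool's name, description, and argument schema, which is why the docstring wording matters (with `parse_docstring=True`, LangChain also reads argument descriptions from a Google-style `Args:` section). The hypothetical helper below roughly illustrates how a Python signature and docstring map onto an OpenAI-style function schema. It is a simplified sketch, not LangChain's actual conversion code:

```python
import inspect


def function_to_tool_schema(fn, name=None):
    """Build a simplified OpenAI-style function schema from a Python function.

    Hypothetical illustration: the docstring's first paragraph becomes the
    description and each parameter becomes a JSON-schema property. Only
    str/int/float/bool hints are mapped; anything else falls back to "string".
    """
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for param in inspect.signature(fn).parameters.values():
        properties[param.name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(param.name)  # no default means the LLM must supply it
    doc = inspect.getdoc(fn) or ""
    return {
        "name": name or fn.__name__,
        "description": doc.split("\n\n")[0],
        "parameters": {"type": "object", "properties": properties, "required": required},
    }
```

Unannotated parameters fall back to `"string"` in this sketch; annotating `subquery: str` makes the intended type explicit rather than left to a default.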


In [48]:
tools = [ask_textbook_assistant]
print(ask_textbook_assistant.invoke("What's an algorithm?"))

Snippet 1:
 manageable parts, identifying patterns, extracting essential
information, and devising systematic solutions. This process not only applies to technical fields, but also to
everyday situations.
For example, imagine someone trying to manage their monthly expenses within a tight budget. Here's how
you might apply computational thinking to this common problem of managing a monthly budget:
1. Decomposition: Break down the financial challenge into different categories such as rent, groceries,
utilities, and entertainment.
2. Pattern recognition: Analyze past spending to identify patterns.
3. Abstraction: Focus on key areas where costs can be reduced.
2.1 • Computational Thinking 454. Algorithmic thinking: Develop a systematic approach to allocate monthly income.
By using computational thinking, you can manage your finances more effectively, ensuring they cover
essential costs while maximizing their savings.
Abstraction
Abstraction makes it possible to pull out the important detai

In [49]:
from langgraph.prebuilt import ToolNode, tools_condition


llm_with_tools = llm.bind_tools(tools)

def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder.add_node("chatbot", chatbot)


tool_node = ToolNode(tools=tools)
graph_builder.add_node("tools", tool_node)

graph_builder.add_conditional_edges(
    "chatbot",
    tools_condition,
)
# Any time a tool is called, we return to the chatbot to decide the next step
graph_builder.add_edge("tools", "chatbot")
graph_builder.add_edge(START, "chatbot")
graph = graph_builder.compile()
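The compiled graph alternates between the `chatbot` and `tools` nodes: `tools_condition` routes to `tools` when the latest AI message contains tool calls and ends the run otherwise, and the `tools -> chatbot` edge lets the LLM read the results. Here is a plain-Python sketch of that control flow, where `call_llm` is a hypothetical stand-in for `llm_with_tools.invoke` and messages are simple dicts:

```python
def run_agent_loop(call_llm, tools, messages, max_steps=10):
    """Plain-Python sketch of the chatbot <-> tools loop built above.

    `call_llm(messages)` is a hypothetical stand-in for `llm_with_tools.invoke`;
    it returns a dict with "content" and an optional "tool_calls" list of
    {"name": ..., "args": {...}} dicts. `tools` maps tool names to callables.
    """
    for _ in range(max_steps):
        reply = call_llm(messages)  # the "chatbot" node
        messages = messages + [reply]
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return messages  # tools_condition routes to END
        for call in tool_calls:  # the "tools" node runs each requested tool
            result = tools[call["name"]](**call["args"])
            messages = messages + [{"role": "tool", "content": result}]
        # The "tools" -> "chatbot" edge: loop back so the LLM can use the results
    raise RuntimeError("Agent did not finish within max_steps")
```

The `max_steps` guard plays a role similar to LangGraph's recursion limit, which stops a graph that keeps looping.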

In [50]:
def stream_graph_updates(user_input: str):
    for event in graph.stream({"messages": [{"role": "user", "content": user_input}]}):
        for value in event.values():
            print("Assistant:", value["messages"][-1].content)

while True:
    try:
        user_input = input("User: ")
    except (EOFError, NotImplementedError):
        # Fallback if input() is not available (e.g. IPython's StdinNotImplementedError)
        user_input = "How do programming algorithms work?"
        print("User: " + user_input)
        stream_graph_updates(user_input)
        break

    if user_input.lower() in ["quit", "exit", "q"]:
        print("Goodbye!")
        break

    stream_graph_updates(user_input)

Assistant: 
Assistant: Snippet 1:
.1Introduction to Data Structures and Algorithms
3.2Algorithm Design and Discovery
3.3Formal Properties of Algorithms
3.4Algorithmic Paradigms
3.5Sample Algorithms by Problem
3.6Computer Science Theory
Introduction
Online maps help people navigate a rapidly changing world. It was not long ago that maps were on paper and
that knowledge came from non-digital, trusted sources. In this chapter, we will study how computer scientists
design and analyze the foundational structures behind many of today’s technologies. Data structures and
algorithms are not only foundational to map apps, but also enable an amazing variety of other technologies
too. From self-driving cars to inventory management to simulating the movement of galaxies to transferring
data between computers—all these applications use data structur es and algorithms to efficiently organize and
process large amounts of information.
3.1 Introduction to Data Structures and Algorithms
Learning Objectiv
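In the output above, the first `Assistant:` line is empty because that update comes from the `chatbot` node deciding to call the tool (its text content is empty), and the second line is actually the tool's output, because `stream_graph_updates` labels every node's update as "Assistant". A sketch of a formatter that labels updates by node, assuming each streamed event maps a node name to that node's state update, as it does in the loop above. `format_stream_event` is a hypothetical helper:

```python
def format_stream_event(event):
    """Format one `graph.stream(...)` update as labelled lines.

    Hypothetical helper. Assumes `event` maps a node name ("chatbot" or
    "tools") to the state update that node returned.
    """
    lines = []
    for node_name, update in event.items():
        message = update["messages"][-1]
        tool_calls = getattr(message, "tool_calls", None)
        if tool_calls:
            # The chatbot chose to call a tool; its text content is usually empty here
            for call in tool_calls:
                lines.append(f"Assistant (calling {call['name']}): {call['args']}")
        elif node_name == "tools":
            lines.append(f"Tool result: {message.content}")
        else:
            lines.append(f"Assistant: {message.content}")
    return lines
```

To use it, replace the inner loop of `stream_graph_updates` with `for line in format_stream_event(event): print(line)`.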

---