[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)

#### [LangChain Handbook](https://pinecone.io/learn/langchain)

# Retrieval Agents

We've seen in previous chapters how powerful [retrieval augmentation](https://www.pinecone.io/learn/langchain-retrieval-augmentation/) and [conversational agents](https://www.pinecone.io/learn/langchain-agents/) can be. They become even more impressive when we begin using them together.

Conversational agents can struggle with data freshness, knowledge about specific domains, or accessing internal documentation. By coupling agents with retrieval augmentation tools we no longer have these problems.

One the other side, using "naive" retrieval augmentation without the use of an agent means we will retrieve contexts with *every* query. Again, this isn't always ideal as not every query requires access to external knowledge.

Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.

To begin, we must install the prerequisite libraries that we will be using in this notebook.

In [1]:
!pip install -qU \
    openai==0.27.7 \
    "pinecone-client[grpc]"==2.2.1 \
    langchain==0.0.162 \
    tiktoken==0.4.0 \
    datasets==2.12.0

## Building the Knowledge Base

We start by constructing our knowledge base. We'll use a mostly prepared dataset called **S**tanford **Qu**estion-**A**nswering **D**ataset (SQuAD) hosted on Hugging Face *Datasets*. We download it like so:

In [2]:
from datasets import load_dataset

data = load_dataset('json', data_files='documents.jsonl')
data

  from .autonotebook import tqdm as notebook_tqdm


Downloading and preparing dataset json/default to C:/Users/rshul/.cache/huggingface/datasets/json/default-688b2138d7811470/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...


Downloading data files: 100%|██████████| 1/1 [00:00<?, ?it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00,  1.69it/s]
                                                        

Dataset json downloaded and prepared to C:/Users/rshul/.cache/huggingface/datasets/json/default-688b2138d7811470/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.


100%|██████████| 1/1 [00:00<00:00, 66.65it/s]


DatasetDict({
    train: Dataset({
        features: ['id', 'text', 'source', 'chapter'],
        num_rows: 1603
    })
})

The dataset does contain duplicate contexts, which we can remove like so:

In [3]:
import pandas as pd

df = pd.read_json('documents.jsonl', lines=True)

# Display the DataFrame
df.head()

Unnamed: 0,id,text,source,chapter
0,dffded34c968-0,"Last updated: September 10, 2012 \n \n \n \n \...",Page 1,
1,9e23eabf1336-0,Copyright © 2012 by Ivan Marsic. All rights re...,Page 2,
2,6da08886c042-0,i \nPreface \n \nThis book reviews important ...,Page 3,
3,6da08886c042-1,focus on core concepts should be appealing to ...,Page 3,
4,6da08886c042-2,software engineering of Web applications. It a...,Page 3,


### Initialize the Embedding Model and Vector DB

We'll be using OpenAI's `text-embedding-ada-002` model initialize via LangChain and the Pinecone vector DB. We start by initializing the embedding model, for this we need an [OpenAI API key](https://platform.openai.com/).

*(Note that OpenAI is a paid service and so running the remainder of this notebook may incur some small cost)*

In [4]:
from getpass import getpass
from langchain.embeddings.openai import OpenAIEmbeddings

OPENAI_API_KEY = "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"  # platform.openai.com
model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

Next we initialize the vector database. For this we need a [free API key](https://app.pinecone.io/), then we create the index:

In [9]:
import pinecone

# find API key in console at app.pinecone.io
PINE_API_KEY = "9b578655-1032-4ab4-af7f-4e7ff8386769"
# find ENV (cloud region) next to API key in console
PINE_ENV = "gcp-starter"

index_name = 'combined'

pinecone.init(
    api_key=PINE_API_KEY,
    environment=PINE_ENV
)
print(pinecone.list_indexes())

['combined']


Then connect to the index:

In [33]:

pinecone.delete_index("combined")

pinecone.create_index(name=index_name, metric="dotproduct", dimension=1536)

In [35]:
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.

## Indexing

We can perform the indexing task using the LangChain vector store object. But for now it is much faster to do it via the Pinecone python client directly. We will do this in batches of `100` or more.

In [39]:
from tqdm.auto import tqdm
from uuid import uuid4

batch_size = 100

texts = []
metadatas = []
data = df

for i in tqdm(range(0, len(data), batch_size)):
    # get end of batch
    i_end = min(len(data), i+batch_size)
    batch = data.iloc[i:i_end]

    # first get metadata fields for this record
    for j, record in batch.iterrows():
        if (record['chapter'] == "N/A"):
            metadatas.append( {
            'text': record['text'],
            'page': int(record['source'][5:]),
        })
        else:
            metadatas.append({
                'text': record['text'],
                'page': int(record['source'][5:]),
                'chapter': int(record['chapter']),
            })

    # get the list of contexts / documents
    documents = batch['text']

    # create document embeddings
    embeds = embed.embed_documents(documents)

    # Ensure embeds is in the right format (e.g., list or numpy array)
    embeds = list(embeds) # Or numpy.array(embeds)

    # Convert IDs to string format
    ids = batch['id'].astype(str)

    # add everything to pinecone
    index.upsert(vectors=zip(ids, embeds, metadatas))


  0%|          | 0/17 [00:00<?, ?it/s]

100%|██████████| 17/17 [00:35<00:00,  2.06s/it]


We've indexed everything, now we can check the number of vectors in our index like so:

In [40]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.01603,
 'namespaces': {'': {'vector_count': 1603}},
 'total_vector_count': 1603}

## Creating a Vector Store and Querying

Now that we've build our index we can switch back over to LangChain. We start by initializing a vector store using the same index we just built. We do that like so:

In [41]:
from langchain.vectorstores import Pinecone

text_field = "text"

vectorstore = Pinecone(
    index, embed.embed_query, text_field
)

As in previous examples, we can use the `similarity_search` method to do a pure semantic search (without the generation component).

In [42]:
query = "Make a study guide on HTTP protocol as it relates to RESTful APIs"

vectorstore.similarity_search(
    query,  # our search query
    k=6  # return 3 most relevant docs
)

[Document(page_content='Ivan Marsic \uf0b7 Rutgers University \n 20\n\uf0b7 The Rube Goldberg design makes unrealistic assumptions, such as that the rabbit will not \nmove unless frightened by an exploding cap. \n\uf0b7 The Rube Goldberg design uses unneces sary links in the operational chain. \nWe will continue discussion of software design wh en we introduce the object model in Section \n1.4. Recurring issues of software design include: \n\uf0b7 Design quality evaluation : Optimal design may be an unrealistic goal given the \ncomplexity of real-world applications. A mo re reasonable goal is to find criteria for \ncomparing two designs and deciding which one is better. The principles for good object-\noriented design are introduced in Section \uf0202.6 and elaborated in subsequent chapters. \n\uf0b7 Design for change : Useful software lives for years or decades and must undergo \nmodifications and extensions to account for the changing world in which it operates. \nChapter 5 describes

Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent.

## Initializing the Conversational Agent

Our conversational agent needs a Chat LLM, conversational memory, and a `RetrievalQA` chain to initialize. We create these using:

In [43]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQAChain

# chat completion llm
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)
# conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)
# retrieval qa chain
qa_with_sources = RetrievalQAChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

But this isn't yet ready for our conversational agent. For that we need to convert this retrieval chain into a tool. We do that like so:

In [47]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Create retriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers import SelfQueryRetriever


# Define our metadata
metadata_field_info = [
    AttributeInfo(
        name="chapter",
        description="the chapter of the book that the text is from",
        type="integer",
    ),
    AttributeInfo(
        name="page",
        description="the page number that the text is from",
        type="integer",
    )
]
document_content_description = "Text from a book"

# Define self query retriver

retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
qa_with_chaper_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever,)

In [70]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=vectorstore.as_retriever())

In [76]:
retriever.get_relevant_documents("chapter 1")

query='chapter 1' filter=None


[Document(page_content='Chapter 1 \uf0b7 Introduction 13\nAs can be observed throughout this  text, the graphic notation is of ten trivial and can be mastered \nrelatively quickly. The key is in the skills in cr eating various models—it can take considerable \namount of time to gain this expertise. \n1.2.2 Requirements Analysis and System Specification \nWe start with the customer statement of work  (also known as customer statement of \nrequirements ), if the project is sponsored by a specific customer, or the vision statement , if the \nproject does not have a sponsor. The statement of  work describes what the envisioned system-to-\nbe is about, followed by a list of features /services  it will provide or tasks/activities it will support. \nGiven the statement of work, the first step in the software development process is called \nrequirements analysis  or systems analysis . During this activity the developer attempts to \nunderstand the problem and delimit its scope. Th e result is 

In [77]:
from langchain.agents import Tool

tools = [
    Tool(
        name='Textbook Search',
        func=qa.run,
        description=(
            #play around with this prompt to describe your document
            'use this tool when answering general knowledge queries about the textbook.'
        )
    ),
    Tool(
        name='Chapter Search',
        func=retriever.get_relevant_documents,
        description=(
            'use this tool when you need to retrieve information from a specific chapter of the textbook. Always pass the entire query to this tool.'
        )
    )
]

Now we can initialize the agent like so:

In [78]:
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conversational_memory
)

With that our retrieval augmented conversational agent is ready and we can begin using it.

### Using the Conversational Agent

To make queries we simply call the `agent` directly.

In [79]:
agent("What is chapter 3 about?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
    "action": "Chapter Search",
    "action_input": "Chapter 3"
}[0mquery='Chapter 3' filter=None

Observation: [33;1m[1;3m[Document(page_content='1.3.1 Case Study 1: From Home Access Control to \nAdaptive Homes \nFigure 1-16 illustrates our case-study system that is used in the rest of the text to illustrate the \nsoftware engineering methods. In a basic versi on, the system offers house access control. The \nsystem could be required to authenticate  (“Are you who you claim to be?”) and validate  (“Are \nyou supposed to be entering this building?”) pe ople attempting to enter a building. Along with \ncontrolling the locks, the system may also contro l other household devices, such as the lighting, \nair conditioning, heating, alarms, etc. \nAs typical of most software engineering project s, a seemingly innocuous problem actually hides \nmany complexities, which will be  revealed as we progress thr ough the development c

{'input': 'What is chapter 3 about?',
 'chat_history': [HumanMessage(content="I still don't understand chapter 3", additional_kwargs={}, example=False),
  AIMessage(content="Chapter 3 is titled 'Modeling and System Specification.' It covers topics such as what a system is, world phenomena and their abstractions, states and state variables, events, signals, and messages.", additional_kwargs={}, example=False),
  HumanMessage(content='What is chapter 1 about?', additional_kwargs={}, example=False),
  AIMessage(content='Chapter 1 is the introduction chapter of the book. It discusses topics such as graphic notation, requirements analysis and system specification, object-oriented analysis and the domain model, and the practicality of the book with examples, code, and solved problems.', additional_kwargs={}, example=False),
  HumanMessage(content='What is chapter 3 about?', additional_kwargs={}, example=False),
  AIMessage(content='Chapter 3 is about modeling and system specification. It cov

Looks great, now what if we ask it a non-general knowledge question?

In [54]:
agent("what is 2 * 7?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
    "action": "Final Answer",
    "action_input": "The result of 2 * 7 is 14."
}[0m

[1m> Finished chain.[0m


{'input': 'what is 2 * 7?',
 'chat_history': [HumanMessage(content='What is chapter 1 about?', additional_kwargs={}, example=False),
  AIMessage(content='Chapter 1 is the introduction chapter of the book. It discusses topics such as graphic notation, requirements analysis and system specification, object-oriented analysis and the domain model, and the practicality of the book.', additional_kwargs={}, example=False),
  HumanMessage(content='What is the textbook about?', additional_kwargs={}, example=False),
  AIMessage(content='The introduction chapter of the textbook discusses the relationship between software engineering and programming. It emphasizes the importance of understanding business problems, inventing solutions, evaluating alternatives, and making design tradeoffs and choices. The chapter also highlights the importance of delivering value for the customer and mentions that both code and documentation are valuable in software engineering.', additional_kwargs={}, example=False

Perfect, the agent is able to recognize that it doesn't need to refer to it's general knowledge tool for that question. Let's try some more questions.

In [55]:
agent("")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
    "action": "Final Answer",
    "action_input": "The textbook is about software engineering and programming, emphasizing the importance of understanding business problems, inventing solutions, evaluating alternatives, and making design tradeoffs and choices."
}[0m

[1m> Finished chain.[0m


{'input': '',
 'chat_history': [HumanMessage(content='What is chapter 1 about?', additional_kwargs={}, example=False),
  AIMessage(content='Chapter 1 is the introduction chapter of the book. It discusses topics such as graphic notation, requirements analysis and system specification, object-oriented analysis and the domain model, and the practicality of the book.', additional_kwargs={}, example=False),
  HumanMessage(content='What is the textbook about?', additional_kwargs={}, example=False),
  AIMessage(content='The introduction chapter of the textbook discusses the relationship between software engineering and programming. It emphasizes the importance of understanding business problems, inventing solutions, evaluating alternatives, and making design tradeoffs and choices. The chapter also highlights the importance of delivering value for the customer and mentions that both code and documentation are valuable in software engineering.', additional_kwargs={}, example=False),
  HumanMess

In [57]:
agent("I still don't understand chapter 3")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
    "action": "Chapter Search",
    "action_input": "Chapter 3"
}[0mquery='Chapter 3' filter=None

Observation: [33;1m[1;3mChapter 3 is titled "Modeling and System Specification." It covers topics such as what a system is, world phenomena and their abstractions, states and state variables, events, signals, and messages.[0m
Thought:[32;1m[1;3m{
    "action": "Final Answer",
    "action_input": "Chapter 3 is titled 'Modeling and System Specification.' It covers topics such as what a system is, world phenomena and their abstractions, states and state variables, events, signals, and messages."
}[0m

[1m> Finished chain.[0m


{'input': "I still don't understand chapter 3",
 'chat_history': [HumanMessage(content='What is chapter 1 about?', additional_kwargs={}, example=False),
  AIMessage(content='Chapter 1 is the introduction chapter of the book. It discusses topics such as graphic notation, requirements analysis and system specification, object-oriented analysis and the domain model, and the practicality of the book.', additional_kwargs={}, example=False),
  HumanMessage(content='What is the textbook about?', additional_kwargs={}, example=False),
  AIMessage(content='The introduction chapter of the textbook discusses the relationship between software engineering and programming. It emphasizes the importance of understanding business problems, inventing solutions, evaluating alternatives, and making design tradeoffs and choices. The chapter also highlights the importance of delivering value for the customer and mentions that both code and documentation are valuable in software engineering.', additional_kwar

Looks great! We're also able to ask questions that refer to previous interactions in the conversation and the agent is able to refer to the conversation history to as a source of information.

That's all for this example of building a retrieval augmented conversational agent with OpenAI and Pinecone (the OP stack) and LangChain.

Once finished, we delete the Pinecone index to save resources:

In [None]:
pinecone.delete_index(index_name)

---