#### [LangChain Handbook](https://qdrant.tech/articles/langchain-integration/)

# LangChain Retrieval Agent

`Conversational agents` such as ChatGPT are often accurate, but they struggle with data freshness, with access to internal documentation, and with knowledge of specialized domains. `Retrieval augmentation` addresses these issues, but a pipeline that retrieves on every query is wasteful for the many simple questions that need no retrieval. Combining the two gives a system that answers simple questions directly and looks up extra knowledge only when a question calls for it. This notebook shows how to build such a system with LangChain and Qdrant.


## Install Dependencies
Let's get started by installing the packages needed for the notebook to run:

In [1]:
!pip install -qU openai==0.27.8 qdrant-client==1.3.1 langchain==0.0.225 datasets==2.13.1 tiktoken==0.4.0

## Import libraries

In [2]:
from datasets import load_dataset
import os
import openai
from langchain.embeddings.openai import OpenAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.http import models
from tqdm.auto import tqdm
from pathlib import Path
from langchain.vectorstores import Qdrant
import qdrant_client
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA
from langchain.agents import initialize_agent, Tool
from time import sleep



## Building the Knowledge Base

Our knowledge base is built from the Hugging Face dataset `vietgpt/multi_news_en`, which contains about 45k news articles together with human-written summaries of them.

In [12]:
data = load_dataset("vietgpt/multi_news_en", split="train")
data



Dataset({
    features: ['document', 'summary'],
    num_rows: 44972
})

We convert the dataset into a pandas dataframe for further use:

In [13]:
data = data.to_pandas()
data.head()

| | document | summary |
|---|---|---|
| 0 | National Archives \n \n Yes, it’s that time ag... | – The unemployment rate dropped to 8.2% last m... |
| 1 | LOS ANGELES (AP) — In her first interview sinc... | – Shelly Sterling plans "eventually" to divorc... |
| 2 | GAITHERSBURG, Md. (AP) — A small, private jet ... | – A twin-engine Embraer jet that the FAA descr... |
| 3 | Tucker Carlson Exposes His Own Sexism on Twitt... | – Tucker Carlson is in deep doodoo with conser... |
| 4 | A man accused of removing another man's testic... | – What are the three most horrifying words in ... |


### Initialize Embedding Model

To store our data in Qdrant, we first convert it into vector representations (embeddings) that capture its semantic meaning. At query time, the query is embedded the same way, and cosine similarity is used to find the stored vectors that match it best. There are many options for creating embeddings; we will use the OpenAI model `text-embedding-ada-002`.
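As a rough illustration of the matching step, here is a minimal pure-Python sketch of cosine similarity on toy vectors. The three-dimensional vectors and the names `query_vec`, `docs`, and `cosine_similarity` are made up for this example; real `text-embedding-ada-002` vectors have 1536 dimensions, and Qdrant performs this comparison internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, chosen only to show the ranking).
query_vec = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

# Rank documents from most to least similar to the query.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query_vec, docs[name]),
    reverse=True,
)
print(ranked)
```

Note that `doc_a` ranks first even though it is twice as long as the query: cosine similarity compares direction only, not magnitude.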

In [8]:
model_name = (
    "text-embedding-ada-002"  # will be used to create embeddings of summary column.
)

openai_api_key = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

embed = OpenAIEmbeddings(model=model_name, openai_api_key=openai_api_key)
embed  # will be used later when creating the vector store with langchain

OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='<redacted>', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None)

## Initialize Qdrant client

In [9]:
# Initialize Qdrant client

current_folder = Path.cwd()  # Get the current folder
qdrant_folder = current_folder / "qdrant"
qdrant_folder.mkdir(exist_ok=True)  # Create qdrant folder to store collection (no error on re-run)

client = QdrantClient(path=str(qdrant_folder.resolve()))  # path to new qdrant folder

collection_name = "langchain-retrieval-agent"

collections = client.get_collections()
print(collections)

# only create collection if it doesn't exist
# (get_collections() returns a response object, so compare against the collection names)
existing_names = [c.name for c in collections.collections]
if collection_name not in existing_names:
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=1536,  # specifying dimensionality of vectors output by model
            distance=models.Distance.COSINE,  # specifying which metric will be used to check similarity of vectors
        ),
    )
collections = client.get_collections()
print(collections)

collections=[]
collections=[CollectionDescription(name='langchain-retrieval-agent')]


## Generate Embeddings -> Store in Qdrant
Now we will generate embeddings for the summary column. We do this in batches, which is much faster than embedding one record at a time, and then upsert each batch with a single API call, which is also much faster than upserting points individually.

For each point, Qdrant needs an id (a unique value), a vector (the embedding of the summary), and a payload holding the metadata. Here the payload is a dictionary with the record's `document` and `summary` fields.

In [46]:
%%time
batch_size = 1024  # specify batch size according to your RAM and compute, higher batch size = more RAM usage

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i + batch_size)  # get end of batch
    batch = data.iloc[i:i_end]  # extract batch
    meta = batch.to_dict(orient="records")  # payload: one metadata dict per row in the batch
    # create embeddings, waiting 5 seconds and retrying whenever a RateLimitError is raised
    while True:
        try:
            res = openai.Embedding.create(
                input=batch["summary"].tolist(), engine=model_name
            )
            break
        except openai.error.RateLimitError:
            sleep(5)
    embeds = [record["embedding"] for record in res["data"]]
    ids = list(range(i, i_end))  # create unique IDs

    # upsert to qdrant
    client.upsert(
        collection_name=collection_name,
        points=models.Batch(ids=ids, vectors=embeds, payloads=meta),
    )

collection_vector_count = client.get_collection(
    collection_name=collection_name
).vectors_count
print(f"Vector count in collection: {collection_vector_count}")
assert collection_vector_count == len(data)

  0%|          | 0/44 [00:00<?, ?it/s]

Vector count in collection: 44972
CPU times: total: 3min 26s
Wall time: 13min 52s
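
The cell above retries at a fixed 5-second interval. One common alternative is exponential backoff with a cap on the number of attempts. The sketch below is not part of the original notebook: `call_with_backoff`, `FakeRateLimitError`, and `flaky_embed` are hypothetical names, and the fake function only stands in for the real embedding call so the example runs without network access.

```python
import time


def call_with_backoff(func, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class FakeRateLimitError(Exception):
    """Stand-in for the rate-limit error raised by an API client."""


calls = {"count": 0}


def flaky_embed():
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("slow down")
    return ["embedding-placeholder"]


delays = []  # record the requested waits instead of actually sleeping
result = call_with_backoff(
    flaky_embed, retry_on=(FakeRateLimitError,), sleep=delays.append
)
print(result, delays)
```

In the notebook, `func` would wrap the `openai.Embedding.create` call and `retry_on` would name the client's rate-limit exception, so that unrelated errors (for example, a bad API key) fail immediately instead of being retried forever.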


Let's check our collection info:

In [48]:
client.get_collection(collection_name=collection_name)

CollectionInfo(status=<CollectionStatus.GREEN: 'green'>, optimizer_status=<OptimizersStatusOneOf.OK: 'ok'>, vectors_count=44972, indexed_vectors_count=0, points_count=44972, segments_count=1, config=CollectionConfig(params=CollectionParams(vectors=VectorParams(size=1536, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None), shard_number=None, replication_factor=None, write_consistency_factor=None, on_disk_payload=None), hnsw_config=HnswConfig(m=16, ef_construct=100, full_scan_threshold=10000, max_indexing_threads=0, on_disk=None, payload_m=None), optimizer_config=OptimizersConfig(deleted_threshold=0.2, vacuum_min_vector_number=1000, default_segment_number=0, max_segment_size=None, memmap_threshold=None, indexing_threshold=20000, flush_interval_sec=5, max_optimization_threads=1), wal_config=WalConfig(wal_capacity_mb=32, wal_segments_ahead=0), quantization_config=None), payload_schema={})

## Creating a Vector Store

We will reuse the same collection to create a LangChain vector store.

In [50]:
qdrant = Qdrant(
    client=client,
    collection_name=collection_name,
    embeddings=embed,
    content_payload_key="summary",
)
qdrant

<langchain.vectorstores.qdrant.Qdrant at 0x1c4308d7e50>
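
The `content_payload_key="summary"` argument tells the vector store which payload field to treat as the document text. The following sketch shows the idea only; it assumes the store simply reads the text from `payload[content_payload_key]`. `Doc` and `payload_to_doc` are hypothetical stand-ins rather than LangChain classes or functions, and the way LangChain fills `metadata` (which depends on its separate metadata key setting) is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def payload_to_doc(payload, content_key="summary"):
    """Pick the text field named by content_key out of a stored payload."""
    return Doc(page_content=payload[content_key])


# A payload shaped like the records upserted earlier (placeholder text).
payload = {"document": "full article text ...", "summary": "short summary ..."}

doc = payload_to_doc(payload)
print(doc.page_content)
```

Because the embeddings were computed from the summaries, pointing the content key at `summary` keeps the returned text consistent with the vectors being searched.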

## Querying
Now, with the help of LangChain, we can run a `similarity search` directly (retrieval only, without the generation component).

In [61]:
query = "When did the biggest terror attack on USA happen?"
qdrant.similarity_search(query, k=3)

[Document(page_content='– A federal law enforcement official called today\'s fatal explosions in Boston a "terrorist attack" but said it wasn\'t clear whether the responsible party was foreign or domestic, CNN reports. Meanwhile speculation, dread, and political agendas are swirling around the Internet as the story unfolds. Slate notes the Patriot\'s Day connection, reporting that other major attacks occurred on or around the holiday, including the Columbine School shooting (1999), the Oklahoma City bombing (1995), and the Waco assault (1993). In fact, Waco inspired anti-government activists to hold their own darker version of Patriot\'s Day. Radio host Alex Jones tweets that the attacks look like a "false flag" operation, meaning he thinks the government or some other powerful group perpetrated the attacks for political reasons, reports Mediaite. The Washington Post reports that in the Middle East and elsewhere, the message "Please Don\'t Be a Muslim" has been retweeted hundreds of ti

Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent.

## Initializing the Conversational Agent

We will use `gpt-3.5-turbo` as our chat LLM. We also need `conversational memory` to store previous exchanges and a `RetrievalQA` chain to retrieve extra data when needed.

In [112]:
# chat completion llm
llm = ChatOpenAI(
    openai_api_key=openai_api_key, model_name="gpt-3.5-turbo", temperature=0.0
)
# conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True
)
# retrieval qa chain using vector store
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=qdrant.as_retriever()
)
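
To make the `k=5` setting concrete, here is a small sketch of what a windowed memory does: it keeps only the last `k` human/AI exchanges and drops older ones. `WindowMemory` is a hypothetical stand-in written for this illustration, not the LangChain class.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k):
        # deque with maxlen silently discards the oldest exchange when full
        self.exchanges = deque(maxlen=k)

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def history(self):
        # Flatten exchanges into an alternating list of messages.
        messages = []
        for human, ai in self.exchanges:
            messages.append(("human", human))
            messages.append(("ai", ai))
        return messages


memory = WindowMemory(k=5)
for n in range(7):
    memory.save(f"question {n}", f"answer {n}")

print(memory.history()[0])
print(len(memory.history()))
```

A bounded window keeps the prompt size predictable, at the cost of the agent forgetting anything said more than `k` exchanges ago.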

Now we can generate an answer to our query using the `run` method:

In [113]:
qa.run(query)

'The biggest terror attack on the USA happened on September 11, 2001, commonly referred to as 9/11.'

To let the agent use retrieval augmentation, we also need to wrap the retrieval chain initialized above as a tool.

In [114]:
tools = [
    Tool(
        name="Knowledge Base",
        func=qa.run,
        description=(
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
    )
]

Now we can initialize the agent like so:

In [115]:
agent = initialize_agent(
    agent="chat-conversational-react-description",  # the type of agent to use
    tools=tools,  # providing with retrieval chain tool
    llm=llm,  # chat llm
    verbose=True,  # to print additional information during execution
    max_iterations=3,  # maximum iterations agent performs before stopping
    early_stopping_method="generate",  # method used to determine when to stop early
    memory=conversational_memory,  # memory agent used, we are using ConversationBufferWindowMemory
    handle_parsing_errors=True,  # to handle parsing errors
)

Now all the components are ready.
We just need to pass our query to the agent to generate an answer.

In [116]:
agent(query)



> Entering new chain...
{
    "action": "Knowledge Base",
    "action_input": "biggest terror attack on USA date"
}
Observation: The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11.
Thought:{
    "action": "Final Answer",
    "action_input": "The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11."
}

> Finished chain.


{'input': 'When did the biggest terror attack on USA happen?',
 'chat_history': [],
 'output': 'The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11.'}

We get the correct answer, and the agent generates it from the tool's observation. Now let's try a simple question that needs no retrieval.

In [117]:
agent("what is 2 * 7?")



> Entering new chain...
{
    "action": "Final Answer",
    "action_input": "The product of 2 multiplied by 7 is 14."
}

> Finished chain.


{'input': 'what is 2 * 7?',
 'chat_history': [HumanMessage(content='When did the biggest terror attack on USA happen?', additional_kwargs={}, example=False),
  AIMessage(content='The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11.', additional_kwargs={}, example=False)],
 'output': 'The product of 2 multiplied by 7 is 14.'}

This time the agent does not consult the knowledge base and answers the question directly. We can also see the chat history in the output.
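
The two traces so far show the model emitting a JSON object with an `action` and an `action_input`, where the action is either a tool name or `Final Answer`. The sketch below illustrates how one such step could be dispatched. It is a simplified, hypothetical model of the loop, not LangChain's implementation: `run_step` and the toy `tools` dictionary are invented for this example, and the fallback message for unparseable output is an assumption.

```python
import json


def run_step(llm_output, tools):
    """Parse one JSON action from the model and either finish or call a tool."""
    try:
        action = json.loads(llm_output)
        name, action_input = action["action"], action["action_input"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Output was not a valid action object: report it back as an observation.
        return "observation", "Invalid or incomplete response"
    if name == "Final Answer":
        return "final", action_input
    if name not in tools:
        return "observation", f"{name} is not a valid tool"
    return "observation", tools[name](action_input)


# Toy tool standing in for the retrieval chain.
tools = {"Knowledge Base": lambda q: f"(retrieved context for: {q})"}

direct = run_step('{"action": "Final Answer", "action_input": "14"}', tools)
lookup = run_step('{"action": "Knowledge Base", "action_input": "GDP growth"}', tools)
broken = run_step("Certainly! Here is a summary...", tools)
print(direct, lookup, broken, sep="\n")
```

An observation is fed back to the model for another iteration, while a final answer ends the loop; the `max_iterations` setting bounds how many such steps are allowed.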

In [118]:
agent(
    "What are the most important factors that account for the GDP growth of a country?"
)



> Entering new chain...
{
    "action": "Knowledge Base",
    "action_input": "factors that account for GDP growth of a country"
}
Observation: There are several factors that can account for GDP growth of a country. Some of the key factors include:

1. Consumer spending: Increased consumer spending on goods and services can contribute to GDP growth. When consumers have more disposable income and confidence in the economy, they are more likely to spend, which stimulates economic growth.

2. Investment: Both private and public investment in infrastructure, businesses, and technology can drive GDP growth. Investment leads to increased production capacity, job creation, and innovation, all of which contribute to economic expansion.

3. Government spending: Government expenditure on public goods and services, such as education, healthcare, and defense, can have a significant impact on GDP growth. Government spending can stimulate economic activity an

{'input': 'What are the most important factors that account for the GDP growth of a country?',
 'chat_history': [HumanMessage(content='When did the biggest terror attack on USA happen?', additional_kwargs={}, example=False),
  AIMessage(content='The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11.', additional_kwargs={}, example=False),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),
  AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False)],
 'output': 'There are several factors that can account for GDP growth of a country. Some of the key factors include consumer spending, investment, government spending, exports and imports, labor force and productivity, technological advancements, and monetary and fiscal policies. These factors can interact with each other and vary in their significance depending on the specific country and its economic conditions.'}

In [119]:
agent("can you summarize these facts in two short sentences")



> Entering new chain...
Could not parse LLM output: Certainly! Here's a summary of the factors that account for GDP growth in a country: Consumer spending, investment, government spending, exports and imports, labor force and productivity, technological advancements, and monetary and fiscal policies all play important roles in driving GDP growth. These factors interact and vary in significance depending on the specific country and its economic conditions.
Observation: Invalid or incomplete response
Thought:{
    "action": "Final Answer",
    "action_input": "Several factors contribute to GDP growth in a country, including consumer spending, investment, government spending, exports and imports, labor force and productivity, technological advancements, and monetary and fiscal policies."
}

> Finished chain.


{'input': 'can you summarize these facts in two short sentences',
 'chat_history': [HumanMessage(content='When did the biggest terror attack on USA happen?', additional_kwargs={}, example=False),
  AIMessage(content='The biggest terror attack on the USA to date is the September 11, 2001 attacks, commonly referred to as 9/11.', additional_kwargs={}, example=False),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),
  AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False),
  HumanMessage(content='What are the most important factors that account for the GDP growth of a country?', additional_kwargs={}, example=False),
  AIMessage(content='There are several factors that can account for GDP growth of a country. Some of the key factors include consumer spending, investment, government spending, exports and imports, labor force and productivity, technological advancements, and monetary and fiscal policies. These factors can

We are getting the answers in the way we wanted. The agent can use the previous conversation as a source of information, and it decides when to look up external knowledge and when to answer without it.
That's all we wanted to showcase. Feel free to try more queries.

In [None]:
client.delete_collection(collection_name=collection_name)

---