![](images/promo.png)

# Question Answering with 🦜🔗 LangChain and Qdrant without boilerplate

Building applications with Large Language Models doesn’t have to be complicated. A lot of recent work has gone into simplifying development, so you can use pre-trained models and build even complex pipelines with a few lines of code. [LangChain](https://github.com/hwchase17/langchain) provides unified interfaces to different libraries, so you can avoid writing boilerplate code and focus on the value you want to bring.


![](images/chatgpt-babies.png)

*Source: https://github.com/giuven95/chatgpt-failures*

> *ChatGPT-like models struggle with generating factual statements if no context is provided.*

## Natural Question Answering

There are plenty of public datasets available, and [Natural Questions](https://ai.google.com/research/NaturalQuestions) is one of them. It contains the whole HTML content of the web pages the examples were scraped from, so we need some preprocessing to extract the plain text. As a result, we’re going to have two lists of strings: one for the questions and the other for the answers.
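To illustrate what such preprocessing could look like, here is a minimal sketch built on Python's standard `html.parser` module. The `TextExtractor` class, the `html_to_text` helper and the sample markup are hypothetical and only show the idea; the files downloaded in this notebook already contain plain text, so this step is not needed to follow along.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside of skipped elements
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: strip the markup and return plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up markup, used only to demonstrate the helper
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Plain <b>text</b> content.</p></body></html>"
)
print(html_to_text(sample))
```

A dedicated HTML library would handle malformed pages more gracefully; the standard library version is used here only to keep the sketch free of extra dependencies.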

In [1]:
# All the examples come from https://ai.google.com/research/NaturalQuestions
# This is a sample of the training set that we download and extract for some
# further processing.

! wget -c https://storage.googleapis.com/dataset-natural-questions/questions.json
! wget -c https://storage.googleapis.com/dataset-natural-questions/answers.json

--2023-02-21 14:16:07--  https://storage.googleapis.com/dataset-natural-questions/questions.json
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.75.16, 142.250.186.208, 142.250.203.208, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.75.16|:443... connected.
HTTP request sent, awaiting response... 416 Requested range not satisfiable

    The file is already fully retrieved; nothing to do.

--2023-02-21 14:16:08--  https://storage.googleapis.com/dataset-natural-questions/answers.json
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.16.48, 216.58.215.80, 142.250.186.208, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.16.48|:443... connected.
HTTP request sent, awaiting response... 416 Requested range not satisfiable

    The file is already fully retrieved; nothing to do.



In [2]:
import json

with open("questions.json", "r") as fp:
    questions = json.load(fp)

with open("answers.json", "r") as fp:
    answers = json.load(fp)

In [3]:
questions[0]

'when is the last episode of season 8 of the walking dead'

In [4]:
answers[0]

"No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millions ) 100 `` Mercy '' Greg Nicotero Scott M. Gimple October 22 , 2017 ( 2017 - 10 - 22 ) 11.44 Rick , Maggie , and Ezekiel rally their communities together to take down Negan . Gregory attempts to have the Hilltop residents side with Negan , but they all firmly stand behind Maggie . The group attacks the Sanctuary , taking down its fences and flooding the compound with walkers . With the Sanctuary defaced , everyone leaves except Gabriel , who reluctantly stays to save Gregory , but is left behind when Gregory abandons him . Surrounded by walkers , Gabriel hides in a trailer , where he is trapped inside with Negan . 101 `` The Damned '' Rosemary Rodriguez Matthew Negrete & Channing Powell October 29 , 2017 ( 2017 - 10 - 29 ) 8.92 Rick 's forces split into separate parties to attack several of the Saviors ' outposts , during which many members of the group are killed ; Eric is critically injur

## How can we fix Large Language Models?

Models such as ChatGPT have some general knowledge, but they cannot guarantee a valid answer every time. It is therefore better to provide some facts we know are true, so the model can pick the relevant parts of the provided context and build a comprehensive answer from them.

A vector database, such as [Qdrant](https://qdrant.tech), is of great help here: its ability to perform semantic search over a huge knowledge base is crucial for preselecting possibly relevant documents, which can then be passed to the LLM.

## Why do we need a vector database?

Deep neural models can be used to create fixed-dimensional vector representations of any data. These embeddings can then be easily compared, under the assumption that similar examples are represented by vectors that lie close to each other in space. That is exactly the objective those networks are trained for.
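The idea of "close in space" can be made concrete with cosine similarity, one of the common ways to compare embeddings. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embedding models produce vectors with hundreds of dimensions, and the values here do not come from any model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9],
}

sim_cat_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs kitten: {sim_cat_kitten:.3f}")
print(f"cat vs car:    {sim_cat_car:.3f}")
```

With these toy values, the related pair scores much higher than the unrelated one, which is the property a trained embedding model is expected to provide for real text.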

![](images/vector-embeddings.png)

## Why do we need a vector database?

The introduction of Approximate Nearest Neighbours (ANN) methods started the renaissance of k-NN-like approaches. We’re all in desperate need of scalable yet easily interpretable methods, and similarity-based ones were great candidates but offered poor performance back in the day. Exact k-NN doesn't scale well, while ANN trades a little accuracy for a much faster search. And if we want to include some additional filtering, [Qdrant](https://qdrant.tech) offers a custom implementation of HNSW (one of the best-performing ANN algorithms, according to benchmarks), with filters already built in.

![](images/filtered-vector-search.png)
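To show what filtered vector search means semantically, here is a deliberately naive sketch: an exact, brute-force search that only considers points whose payload matches the given conditions. It is not HNSW and not how Qdrant implements filtering; the `filtered_search` function, the point layout and the `lang` field are assumptions made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(points, query_vector, must_match, top_k=2):
    """Exact nearest-neighbour search restricted to points matching the filter."""
    # Keep only the points whose payload contains all required key/value pairs
    candidates = [
        point
        for point in points
        if all(point["payload"].get(k) == v for k, v in must_match.items())
    ]
    # Compare the query with every remaining point: O(n), unlike ANN indexes
    ranked = sorted(
        candidates,
        key=lambda point: cosine_similarity(point["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical points with 2-dimensional vectors and a small payload
points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"lang": "en"}},
    {"id": 2, "vector": [0.8, 0.3], "payload": {"lang": "de"}},
    {"id": 3, "vector": [0.1, 0.9], "payload": {"lang": "en"}},
]

hits = filtered_search(points, [1.0, 0.0], {"lang": "en"})
print([point["id"] for point in hits])
```

The brute-force scan is what makes exact k-NN slow on large collections; an ANN index avoids comparing the query with every stored vector.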

## How can we fix Large Language Models?

We cannot simply integrate our knowledge base into Large Language Models. We could fine-tune them for some custom tasks, but adding new facts that way would be an endless pain and a huge cost, as reality changes rapidly in most cases.

We can, however, put two pieces together. First, with a good semantic search mechanism, we can extract some candidate documents that should include the answer to our question. Then, those candidates can be provided to an LLM, so it can extract the exact answer from that context.

A vector database, such as Qdrant, is of great help here: its ability to perform semantic search over a huge knowledge base is crucial for preselecting possibly relevant documents, which can then be passed to the LLM. That’s also one of the chains implemented in LangChain, called `VectorDBQA`. Qdrant is integrated with the library, so it can be used to build that chain with little effort.

Two models are required to set things up. First, we need an **embedding model** that converts the set of facts into vectors, which are then stored in Qdrant. That process is identical to any other semantic search application. We’re going to use one of the `SentenceTransformers` models, so it can be hosted locally. The embeddings created by that model will be put into Qdrant and used to retrieve the most similar documents for a given query.

In [5]:
from langchain.embeddings import HuggingFaceEmbeddings

In [6]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

### Pipeline definition

Once we have chosen the embedding model, we can create a Qdrant collection in which we'll keep all the answers, along with their vector representations.

In [7]:
! docker run -d -p "6333:6333" -p "6334:6334" qdrant/qdrant:v1.0.1

e66ee367988abdc0e4786f3aacc074cdd2db7da60955cc498e79dcd57b0b89bc


In [8]:
from langchain.vectorstores import Qdrant

In [9]:
doc_store = Qdrant.from_texts(
    answers, embeddings, host="localhost" 
)

In [10]:
doc_store.similarity_search(questions[0])

[Document(page_content="No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millions ) 100 `` Mercy '' Greg Nicotero Scott M. Gimple October 22 , 2017 ( 2017 - 10 - 22 ) 11.44 Rick , Maggie , and Ezekiel rally their communities together to take down Negan . Gregory attempts to have the Hilltop residents side with Negan , but they all firmly stand behind Maggie . The group attacks the Sanctuary , taking down its fences and flooding the compound with walkers . With the Sanctuary defaced , everyone leaves except Gabriel , who reluctantly stays to save Gregory , but is left behind when Gregory abandons him . Surrounded by walkers , Gabriel hides in a trailer , where he is trapped inside with Negan . 101 `` The Damned '' Rosemary Rodriguez Matthew Negrete & Channing Powell October 29 , 2017 ( 2017 - 10 - 29 ) 8.92 Rick 's forces split into separate parties to attack several of the Saviors ' outposts , during which many members of the group are killed ; E

### Pipeline definition

The first part of the pipeline is now solved. We can perform semantic search over the answers, but instead of a short and meaningful extract we get a bunch of documents. The second piece is the part that LangChain simplifies.

When we receive a query, there are two steps involved. First, we ask Qdrant for the most relevant documents and simply combine all of them into a single text. Then, we build a prompt for the LLM that includes those documents as context, together with the question asked. The input to the LLM looks like the following:

> Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
> 
> It's as certain as 2 + 2 = 4
> 
> ...
> 
> Question: How much is 2 + 2?
> 
> Helpful Answer:

There might be several context documents combined, and it is solely up to the LLM to choose the right piece of content. Our expectation is that the model responds with just `4`.
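The prompt construction described above boils down to plain string formatting. The following sketch mirrors the template quoted in this section; the `build_stuff_prompt` helper and the second context document are hypothetical, and the way LangChain assembles the prompt internally may differ in details such as separators.

```python
# Template text taken from the example prompt shown above
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(documents, question):
    """Hypothetical helper: join all documents and place them in the prompt."""
    # Assumption: documents are separated by a blank line
    context = "\n\n".join(documents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["It's as certain as 2 + 2 = 4", "The sky is blue."],
    "How much is 2 + 2?",
)
print(prompt)
```

All retrieved documents end up in a single prompt, which is why the total length of the context matters for this chain type.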

Why do we need two different models? They solve different tasks. The first model performs feature extraction by converting text into vectors, while the second one handles text generation or summarization. *Disclaimer: This is not the only way to solve that task with LangChain. This particular chain type is called `stuff` in the library nomenclature.*

![](images/flow-diagram.png)

### Pipeline definition

This sounds like a pretty complex application, as it involves several systems. But with LangChain, it can be implemented in just a few lines of code, thanks to the integration with Qdrant. We’re not even going to work directly with `QdrantClient`, as everything is done in the background by LangChain.

LangChain provides wrappers for various LLMs, with OpenAI being one of them.

In [11]:
from langchain.llms import OpenAI
from langchain.chains import VectorDBQA

In [12]:
llm = OpenAI()  # the API key is read from the OPENAI_API_KEY environment variable

In [13]:
qa = VectorDBQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    vectorstore=doc_store,
    return_source_documents=False,
)

### Testing it out

Those few lines of code were enough to bring a QA system to life. It uses our knowledge base to serve the answers without any additional model fine-tuning. That enables really fast prototyping and addresses the issues with LLM factuality. At least if it works. Let's find out!


In [14]:
import random

random.seed(76)
selected_questions = random.choices(questions, k=5)

In [16]:
for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")

> what kind of music is scott joplin most famous for
 Scott Joplin is most famous for ragtime music.

> who died from the band faith no more
 Chuck Mosley

> when does maggie come on grey's anatomy
 Maggie (Kelly McCreary) made her first appearance in the season 10 episode "I'm Winning," which aired on April 10, 2014.

> can't take my eyes off you lyrics meaning
 I don't know.

> who lasted the longest on alone season 2
 David McIntyre lasted the longest on Alone Season 2. He won the competition after 66 days.



## LangChain chain types

The approach we have used so far relies on the `stuff` chain type, which is the simplest one. There are, however, other chain types available.

*Source: https://langchain.readthedocs.io/en/latest/modules/indexes/combine_docs.html*

### Stuffing

Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context to pass to the language model. 

**Pros:** Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

**Cons:** Most LLMs have a limited context length, and for large documents (or many documents) this will not work as it will result in a prompt larger than the context length.
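One simple way to stay within the context limit is to keep adding documents until a budget is exhausted. The sketch below is only an illustration: `fit_to_budget` is a hypothetical helper, and it counts whitespace-separated words as a rough stand-in for tokens, whereas a real application would count tokens with the tokenizer of the chosen model.

```python
def fit_to_budget(documents, max_words):
    """Keep documents, in order, while the running word count fits the budget."""
    selected = []
    used = 0
    for doc in documents:
        n_words = len(doc.split())  # crude proxy for the number of tokens
        if used + n_words > max_words:
            break  # the next document would overflow the budget
        selected.append(doc)
        used += n_words
    return selected


# Made-up documents, ordered from the most to the least relevant
docs = ["one two three", "four five", "six seven eight nine"]
kept = fit_to_budget(docs, max_words=6)
print(kept)
```

Since the search results are ordered by relevance, dropping documents from the end removes the least relevant context first.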

### Map Reduce

This method involves an initial prompt on each chunk of data (for summarization tasks, this could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk). Then a different prompt is run to combine all the initial outputs. 

**Pros:** Can scale to larger documents (and more documents) than stuffing. The calls to the LLM on individual documents are independent and can therefore be parallelized.

**Cons:** Requires many more calls to the LLM than stuffing. Loses some information during the final combining call.
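The control flow of Map Reduce can be sketched with a stub in place of the LLM. Everything in the block below is hypothetical: `map_reduce_answer` is not a LangChain function, the prompts are simplified, and `stub_llm` only returns numbered placeholders so that the number and order of calls become visible.

```python
def map_reduce_answer(documents, question, llm):
    """Answer with one LLM call per document plus one combining call."""
    # Map step: every document is processed independently of the others
    partial_answers = [
        llm(f"Context: {doc}\nQuestion: {question}\nAnswer:") for doc in documents
    ]
    # Reduce step: a different prompt merges all partial answers
    summaries = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{summaries}\nQuestion: {question}\nFinal answer:")


calls = []  # records every prompt sent to the stub


def stub_llm(prompt):
    """Hypothetical stand-in for a real LLM call."""
    calls.append(prompt)
    return f"<answer {len(calls)}>"


final = map_reduce_answer(["doc A", "doc B", "doc C"], "some question", stub_llm)
print(len(calls), final)
```

Three documents lead to four calls in this sketch: one per document and one for combining. The map calls do not depend on each other, which is what makes them easy to parallelize.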

### Refine

This method involves an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document, asking the LLM to refine the output based on the new document.

**Pros:** Can pull in more relevant context, and may be less lossy than Map Reduce.

**Cons:** Requires many more calls to the LLM than stuffing. The calls are also NOT independent, meaning they cannot be parallelized like Map Reduce. There are also some potential dependencies on the ordering of the documents.
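The sequential nature of Refine is easier to see in code. The block below is a simplified sketch with a stub in place of the LLM; `refine_answer`, the prompt wording and `stub_llm` are all assumptions made for illustration and do not reproduce the prompts used by LangChain.

```python
def refine_answer(documents, question, llm):
    """Build an answer from the first document, then refine it one document at a time."""
    if not documents:
        raise ValueError("at least one document is required")
    answer = llm(f"Context: {documents[0]}\nQuestion: {question}\nAnswer:")
    for doc in documents[1:]:
        # Each call needs the previous output, so the calls cannot run in parallel
        answer = llm(
            f"Existing answer: {answer}\nNew context: {doc}\n"
            f"Question: {question}\nRefined answer:"
        )
    return answer


prompts = []  # records every prompt sent to the stub


def stub_llm(prompt):
    """Hypothetical stand-in for a real LLM call."""
    prompts.append(prompt)
    return f"<answer {len(prompts)}>"


final = refine_answer(["doc A", "doc B", "doc C"], "some question", stub_llm)
print(len(prompts), final)
```

Every prompt after the first one embeds the previous answer, so changing the order of the documents changes the sequence of intermediate answers as well.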

### Map-Rerank

This method involves running an initial prompt on each chunk of data, that not only tries to complete a task but also gives a score for how certain it is in its answer. The responses are then ranked according to this score, and the highest score is returned.

**Pros:** Similar pros as Map Reduce, but requires fewer calls.

**Cons:** Cannot combine information between documents. This means it is most useful when you expect there to be a single simple answer in a single document.
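A minimal sketch of the Map-Rerank flow is shown below. The scoring function is a stub: instead of asking an LLM for an answer and a certainty score, `stub_scoring_llm` returns the document itself and counts the words it shares with the question. Both function names and the scoring rule are hypothetical and serve only to show the ranking step.

```python
def map_rerank(documents, question, scoring_llm):
    """Score an answer for each document separately and return the best one."""
    scored = [scoring_llm(doc, question) for doc in documents]
    # No combining call: the single highest-scored answer wins
    best_answer, best_score = max(scored, key=lambda pair: pair[1])
    return best_answer, best_score


def stub_scoring_llm(document, question):
    """Hypothetical stand-in: returns (answer, score) based on word overlap."""
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    return document, len(question_words & document_words)


# Made-up documents, used only for illustration
docs = [
    "Scott Joplin wrote ragtime music",
    "Faith No More is a rock band",
]
answer, score = map_rerank(docs, "what music is scott joplin famous for", stub_scoring_llm)
print(answer, score)
```

Because only one answer is returned, information spread across several documents is never merged, which matches the limitation listed above.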

## Going beyond a simple QA system

The Question Answering system we've just implemented solves the main problem. But what if we wanted a conversational agent that also remembers the previous conversation and takes it into account when answering? That's also feasible with a different chain: `ChatVectorDBChain`.

In [29]:
from langchain.chains import ChatVectorDBChain

In [30]:
chat_qa = ChatVectorDBChain.from_llm(
    llm=llm,
    vectorstore=doc_store,
)

In [32]:
chat_history = []
while True:
    query = input("> Question: ").strip()
    if len(query) == 0:
        break
    result = chat_qa({"question": query, "chat_history": chat_history})
    chat_history.append((query, result["answer"]))
    print(result["answer"], end="\n\n")

> Question: Who is a famous ragtime musician?
 Paul Williams

> Question: What is he known for?
 Paul Williams is a composer and singer known for his work on the musical film Bugsy Malone and songs like "Fat Sam's Grand Slam", "Tomorrow", "Bad Guys", "I'm Feeling Fine", "My Name Is Tallulah", "Liberty", "So You Wanna Be a Boxer?", "Ordinary Fool", "Down and Out" and "You Give a Little Love".

> Question: Does he play in any band?


 No, Paul Williams is not a member of any band.

> Question: 
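Follow-up questions such as "What is he known for?" can only be answered if the chat history is taken into account. A common approach, and to my understanding the one this chain follows, is to first ask the LLM to rewrite the follow-up as a standalone question, which is then used for the vector search. The sketch below only builds such a prompt; the helper names and the exact wording are assumptions and may differ from what LangChain uses.

```python
def format_chat_history(chat_history):
    """Turn a list of (question, answer) tuples into a plain-text transcript."""
    lines = []
    for human, assistant in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
    return "\n".join(lines)


def build_condense_prompt(chat_history, follow_up):
    """Hypothetical helper: ask the LLM for a standalone version of the question."""
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_chat_history(chat_history)}\n"
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


# Same structure as the chat_history list used in the loop above
chat_history = [("Who is a famous ragtime musician?", "Paul Williams")]
prompt = build_condense_prompt(chat_history, "What is he known for?")
print(prompt)
```

With the history included, the model has what it needs to resolve "he" to the person mentioned in the previous answer before any document is retrieved.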
