Suppose you have some text documents (PDFs, blog posts, Notion pages, etc.) and want to ask questions about their contents.

LLMs, given their proficiency in understanding text, are a great tool for this.

This walkthrough shows how to build a question-answering application over documents using LLMs.

Two closely related use cases, covered elsewhere, are:

* QA over structured data (e.g., SQL)
* QA over code (e.g., Python)

<img src="https://python.langchain.com/assets/images/qa_flow-9fbd91de9282eb806bda1c6db501ecec.jpeg" width="1000" >


In [4]:
from langchain.document_loaders import WebBaseLoader, CSVLoader  #more integrations here https://integrations.langchain.com/
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain import hub
from langchain.schema.runnable import RunnablePassthrough
from dotenv import load_dotenv
load_dotenv()

# Load documents
loader = WebBaseLoader(['https://python.langchain.com/docs/get_started/introduction', 'https://python.langchain.com/docs/integrations/document_loaders/pandas_dataframe'])

In [5]:
# Split documents
# RecursiveCharacterTextSplitter tries paragraph, line, and word boundaries in turn;
# each split keeps the metadata (e.g. the source URL) of its original Document

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
splits = text_splitter.split_documents(loader.load())
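To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as blank lines, newlines, and spaces); the `split_text` helper is hypothetical and only illustrates how the two parameters interact.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
no_overlap = split_text(sample, chunk_size=4)
with_overlap = split_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap = 0`, as in the cell above, no text is shared between neighboring chunks. A small overlap can help when an answer straddles a chunk boundary, at the cost of storing more chunks.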

In [6]:
splits[0]

Document(page_content='Introduction | 🦜️🔗 Langchain', metadata={'source': 'https://python.langchain.com/docs/get_started/introduction', 'title': 'Introduction | 🦜️🔗 Langchain', 'description': 'LangChain is a framework for developing applications powered by language models. It enables applications that:', 'language': 'en'})

In [7]:
# Embed and store splits
#https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html
#https://huggingface.co/models?library=sentence-transformers&sort=downloads

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}

# If the import fails with a urllib3 "cannot import name DEFAULT_CIPHERS" error, run: pip install "urllib3<2"
# https://stackoverflow.com/questions/76414514/cannot-import-name-default-ciphers-from-urllib3-util-ssl-on-aws-lambda-us
emb = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

#emb = OpenAIEmbeddings()

vectorstore = Chroma.from_documents(documents=splits, embedding=emb)  #can also use local FAISS db
retriever = vectorstore.as_retriever()
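Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with a toy bag-of-words "embedding" and cosine similarity. Everything in it (`embed`, `cosine`, `similarity_search`, the sample chunks) is hypothetical and stands in for the real embedding model and vector store; it is not how Chroma is implemented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real model returns a dense float vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "langchain is a framework for language model applications",
    "pandas dataframe loader reads rows into documents",
    "chroma stores embedding vectors",
]
top = similarity_search("what is langchain", chunks, k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk that shares no words with it. The ranking step works the same way.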

## Retriever configs:

* Retrieve more documents with higher diversity; useful if your dataset has many similar documents
> vectorstore.as_retriever(search_type="mmr", search_kwargs={'k': 6, 'lambda_mult': 0.25})

* Fetch more documents for the MMR algorithm to consider, but only return the top 5
> vectorstore.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 50})

* Only retrieve documents that have a relevance score above a certain threshold
> vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.8})

* Only get the single most similar document from the dataset
> vectorstore.as_retriever(search_kwargs={'k': 1})

* Use a filter to only retrieve documents from a specific paper
> vectorstore.as_retriever(search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}})
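The MMR options above are easier to reason about with the selection rule written out. The following is a minimal sketch of maximal marginal relevance, assuming cosine similarity and plain Python lists as vectors. The `mmr` function and the sample vectors are hypothetical, not LangChain's implementation, but `k`, `fetch_k`, and `lambda_mult` play the same roles: `fetch_k` candidates are shortlisted by relevance, then `k` are picked one at a time, each scored as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    # Shortlist the fetch_k candidates most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.0],   # 0: identical to the query
    [1.0, 0.05],  # 1: near-duplicate of document 0
    [0.6, 0.8],   # 2: less relevant, but points in a different direction
]
print(mmr(query, docs, k=2, lambda_mult=1.0))   # [0, 1]  pure relevance
print(mmr(query, docs, k=2, lambda_mult=0.25))  # [0, 2]  diversity wins
```

A `lambda_mult` close to 1 behaves like plain similarity search, while a value close to 0 penalizes near-duplicates heavily. A larger `fetch_k` gives the diversity step more candidates to choose from.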

In [9]:
# Prompt 
# https://smith.langchain.com/hub/rlm/rag-prompt

# You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
# Question: {question} 
# Context: {context} 
# Answer:


rag_prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
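The prompt is a template with two placeholders, `{question}` and `{context}`, that are filled in before the text reaches the model. Here is a rough sketch of that substitution using plain `str.format`. The template text is copied from the comment in the cell above; the version on the hub may differ, and `build_prompt` is a hypothetical helper rather than part of LangChain.

```python
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question: str, context: str) -> str:
    """Fill the two placeholders of the template."""
    return RAG_TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt(
    question="what is langchain?",
    context="LangChain is a framework for developing applications powered by language models.",
)
print(prompt_text)
```

The instruction to admit not knowing the answer matters here: it nudges the model to rely on the retrieved context instead of guessing.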

In [16]:
# Baseline without retrieval: the model has no context about LangChain and may make up an answer
llm.predict('what is langchain?')

'Langchain is a decentralized blockchain platform that aims to provide language services and solutions. It utilizes blockchain technology to create a secure and transparent ecosystem for language-related transactions, such as translation, interpretation, and language learning. Langchain aims to connect language service providers and users directly, eliminating intermediaries and reducing costs. It also offers features like smart contracts, reputation systems, and payment solutions to ensure efficient and reliable language services.'

In [37]:
# RAG chain 

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()} 
    | rag_prompt 
    | llm 
)

In [39]:
rag_chain.invoke('what is langchain?')

AIMessage(content='LangChain is a framework for developing applications powered by language models. It enables applications that are context-aware and can reason based on provided context. LangChain provides components that are modular and easy-to-use for working with language models.')
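The chain above reads left to right: the dict sends the same question to the retriever and to a passthrough, the prompt formats the resulting dict, and the model answers. The sketch below mimics that data flow with plain functions. `fan_out`, `pipe`, and the `fake_*` stand-ins are all hypothetical and return canned values; they only illustrate the shape of the data at each step, not LangChain's implementation.

```python
from functools import reduce


def fan_out(steps):
    """Run every step on the same input and collect the results in a dict."""
    return lambda value: {key: step(value) for key, step in steps.items()}


def pipe(*steps):
    """Compose steps left to right, feeding each output into the next step."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def passthrough(value):
    """Return the input unchanged, so the question reaches the prompt as-is."""
    return value


def fake_retriever(question):
    # Stand-in for the vector store retriever: always returns one canned chunk.
    return ["LangChain is a framework for developing applications powered by language models."]


def fake_prompt(inputs):
    # Stand-in for the prompt template: fills in the question and context.
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm(prompt):
    # Stand-in for the chat model: reports the prompt size instead of answering.
    return f"[fake answer to a {len(prompt)}-character prompt]"


gather = fan_out({"context": fake_retriever, "question": passthrough})
chain = pipe(gather, fake_prompt, fake_llm)

print(gather("what is langchain?"))  # dict with "context" and "question" keys
print(chain("what is langchain?"))
```

Because every stage only depends on the output of the previous one, any stage can be swapped out, for example replacing the model while keeping the retriever and prompt.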

# Testing another LLM

In [1]:
# OpenAI works fine
#llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# HuggingFaceEndpoint: fails with a network error
#from langchain.llms import HuggingFaceEndpoint
# endpoint_url = (
#                 "https://abcdefghijklmnop.us-east-1.aws.endpoints.huggingface.cloud"
#             )
# hf = HuggingFaceEndpoint(
#     endpoint_url=endpoint_url,
# )

# HuggingFaceHub: takes a long time to load
#from langchain.llms import HuggingFaceHub  # examples https://python.langchain.com/docs/integrations/llms/huggingface_hub.html
#repo_id = "databricks/dolly-v2-3b"         # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options
#repo_id = "Qwen/Qwen-7B"
#hf = HuggingFaceHub(repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64})   # "gpt2" gave very poor answers

from langchain.llms import GPT4All
model = GPT4All(model="./wizardlm-13b-v1.1-superhot-8k.ggmlv3.q4_0.bin", n_threads=8) #https://gpt4all.io/models/

Found model file at  ./wizardlm-13b-v1.1-superhot-8k.ggmlv3.q4_0.bin


In [2]:
# Baseline without retrieval: the local model also makes up an answer
model('what is langchain?')

'\n\nLangChain (LNG) is a decentralized platform for creating, managing and monetizing multilingual content. The LangChain ecosystem enables users to create, translate, edit, store, manage, and distribute multilingual content in an efficient, transparent, and secure manner.'

In [10]:
# RAG chain with the local GPT4All model

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()} 
    | rag_prompt 
    | model 
)

rag_chain.invoke('what is langchain?')

' LangChain is a framework for developing applications powered by language models, enabling context-aware and reasonable solutions with components that are modular, easy to use, and supported by various implementations.'

# Another approach: the RetrievalQA chain

In [23]:
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": rag_prompt}
)
question = 'what is langchain?'
result = qa_chain({"query": question})
result["result"]

'LangChain is a framework for developing applications powered by language models. It enables applications to be context-aware and rely on a language model to reason and provide responses based on provided context. LangChain provides components that are modular and easy-to-use for working with language models.'
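`RetrievalQA` wraps the same three steps (retrieve, build the prompt, call the model) behind one call that takes a dict with a `query` key and returns a dict with a `result` key. The sketch below assumes the default "stuff" strategy, in which the text of all retrieved documents is joined into a single context string. `Doc`, `stuff_documents`, and `retrieval_qa` are hypothetical stand-ins, and the blank-line separator is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_documents(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join the text of every retrieved document into one context string."""
    return separator.join(doc.page_content for doc in docs)


def retrieval_qa(query: str, retrieve, answer) -> dict:
    """Retrieve documents, stuff them into one context, and ask for an answer."""
    docs = retrieve(query)
    context = stuff_documents(docs)
    return {"query": query, "result": answer(question=query, context=context)}


docs = [Doc("LangChain is a framework."), Doc("It provides modular components.")]
out = retrieval_qa(
    "what is langchain?",
    retrieve=lambda query: docs,                        # canned retriever
    answer=lambda question, context: context.upper(),   # canned "model"
)
print(out["result"])
```

Stuffing every document into one prompt is the simplest strategy, but the combined text has to fit in the model's context window, which is one reason to keep `k` and `chunk_size` modest.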