# Generate Answers with Your Documents

## Table of Contents
- [RAG by Directly Passing Context](#RAG-by-Directly-Passing-Context)
- [RAG by Similarity Search](#RAG-by-Similarity-Search)

This notebook shows an example of Retrieval Augmented Generation that uses LangChain to perform question-answering tasks by combining retrieval and generation techniques.

We will practice retrieval augmented generation twice: once by passing the context straight to the language model, and once by using FAISS and similarity search.

[FAISS](https://faiss.ai/) is a high-performance library for efficient similarity search and clustering of dense vectors. It lets us quickly find the vectors (and therefore the text chunks) most similar to a query, even across very large collections.
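FAISS's simplest index type (a "flat" index such as `IndexFlatL2`) performs exact nearest-neighbor search: it compares the query vector against every stored vector and keeps the closest ones. The sketch below does that same computation in plain Python on a few made-up 2-D vectors; the function name `knn_l2` is hypothetical and this is not FAISS code. FAISS does the same work far faster with optimized (optionally GPU) code, and also offers approximate index types that avoid comparing against every vector.

```python
import heapq
import math


def knn_l2(query, vectors, k):
    """Exact k-nearest-neighbor search by Euclidean (L2) distance.

    Returns (index, distance) pairs, closest first. A brute-force
    illustration of what a flat index computes, not FAISS itself.
    """
    distances = (
        (i, math.dist(query, vec))  # L2 distance between the query and a stored vector
        for i, vec in enumerate(vectors)
    )
    # Keep only the k pairs with the smallest distance
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Toy 2-D "embeddings" (made-up values for illustration)
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = knn_l2([1.0, 1.0], stored, k=2)
print([i for i, _ in nearest])
# → [1, 3]
```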

---

## What is Retrieval Augmented Generation?
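The purpose of Retrieval Augmented Generation (RAG) is to increase the relevance, accuracy, and truthfulness of generated text by grounding it in retrieved documents. This helps mitigate the "data freshness" problem that LLMs inherently have, since a model on its own only knows what was in its training data.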

It allows you to retrieve relevant documents based on a query and use them as context to generate concise answers to user questions.
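To make the retrieve-then-generate flow concrete, here is a toy sketch in plain Python, under loud assumptions: the word-overlap scoring in the hypothetical `retrieve` function stands in for real embedding similarity, and the result is only the augmented prompt that would be sent to an LLM (no model is called).

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query (hypothetical scorer)."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Augment the question with the retrieved context, as a RAG prompt does."""
    context = "\n\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


corpus = [
    "The Capital of France is Paris.",
    "FAISS is a library for vector similarity search.",
    "LangChain composes LLM applications from Runnables.",
]
question = "What is the capital of France?"
print(build_prompt(question, retrieve(question, corpus, k=1)))
```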

For more information, see:

- [Retrieval Documentation](https://python.langchain.com/docs/modules/data_connection/)

- [Advanced Retrieval Types](https://python.langchain.com/docs/modules/data_connection/retrievers/)

- [QA with RAG Use Case Documentation](https://python.langchain.com/docs/use_cases/question_answering/)

---

### Dependencies

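The Retrieval Augmented Generation module relies on the following dependencies:

- `langchain`: The core LangChain library for building the pipeline and running the modules.

- `langchain_community`: The LangChain community package that provides additional modules and utilities, such as the document loaders and the FAISS vector store used below.

- `langchain_core`: The core components and utilities of LangChain (installed as a dependency of `langchain`).

- `langchain_openai`: The OpenAI integrations used below (`ChatOpenAI` and `OpenAIEmbeddings`).

- `unstructured` and `chardet`: Used by `UnstructuredFileLoader` to load and clean the local text file.

- `faiss-gpu` (or `faiss-cpu`): The FAISS library for vector similarity search.

- `beautifulsoup4`: Used by `WebBaseLoader` to parse HTML.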

In [1]:
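# Use faiss-cpu instead of faiss-gpu if you don't have a compatible GPU
%pip install -Uq langchain langchain-community langchain-openai openai unstructured chardet beautifulsoup4 faiss-gpu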

Note: you may need to restart the kernel to use updated packages.


### Enable LangChain tracing (Optional)

In [2]:
import os
os.environ['LANGCHAIN_TRACING_V2'] = "true"
#os.environ['LANGCHAIN_PROJECT'] = "unstructuredfileloader" # Defaults to 'default'
os.environ['LANGCHAIN_ENDPOINT'] = "https://api.smith.langchain.com"
os.environ['LANGCHAIN_API_KEY'] = ""  # paste your LangSmith API key here

# RAG by Directly Passing Context
For the first RAG example, we will load a small text file and pass its entire contents directly to the language model as context, without splitting it.

In [3]:
import os

import chardet  # not called directly below; unstructured uses it for encoding detection

from langchain_openai.chat_models import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.document_loaders.unstructured import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Set API Key (paste your OpenAI key between the quotes)
os.environ['OPENAI_API_KEY'] = ""

In [5]:
# Step 1: Load the document with UnstructuredFileLoader
loader = UnstructuredFileLoader(
    "./data/france.txt",
    post_processors=[clean_extra_whitespace],  # collapse extra whitespace in the loaded text
)
# The file's contents: "The Capital of France is Paris."
docs = loader.load()

In [6]:
docs[0].page_content[:40]

'The Capital of France is Paris.'

In [7]:
# Step 2: Define the prompt template; {context} and {question} are filled in when the chain runs
template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

In [8]:
print(prompt)

input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:\n"))]


### Creating a LangChain `chain`

• A Chain is a sequence of Runnables that are executed in a specific order. In the LangChain Expression Language (LCEL), you compose them with the `|` operator.

• Chains provide a way to string together multiple Runnables to create a workflow or pipeline.

• Each Runnable in the Chain takes the output of the previous Runnable as its input.

• Chains can be used to build complex applications by combining and orchestrating the execution of multiple Runnables.

• They provide a higher-level abstraction for organizing and structuring the flow of data and operations.

• Examples of Chains include data processing pipelines, machine learning workflows, and API request/response sequences.
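The toy sketch below models these ideas in plain Python, without LangChain. The helpers `pipe`, `parallel`, and `fake_llm` are hypothetical: `pipe` mimics how `|` feeds each step's output into the next, and `parallel` mimics what LangChain does with a dict at the start of a chain (it is coerced into a `RunnableParallel` that runs every entry on the same input and collects the results by key). The real Runnable classes add batching, streaming, async support, and tracing on top of this basic idea.

```python
from functools import reduce


def pipe(*steps):
    """Compose steps left to right: each step receives the previous step's output."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def parallel(mapping):
    """Run every function in `mapping` on the same input and collect results by key."""
    return lambda value: {key: fn(value) for key, fn in mapping.items()}


def fake_llm(prompt_text):
    """Stand-in for a model: 'answers' by echoing the context line, plus stray whitespace."""
    context_line = prompt_text.splitlines()[1]  # the "Context: ..." line
    return "  " + context_line[len("Context: "):] + "\n"


toy_template = "Question: {question}\nContext: {context}\nAnswer:"

toy_chain = pipe(
    parallel({
        "context": lambda _: "The Capital of France is Paris.",  # fixed context
        "question": lambda q: q,                                 # pass the input through
    }),
    lambda variables: toy_template.format(**variables),         # "prompt" step
    fake_llm,                                                    # "LLM" step
    str.strip,                                                   # "output parser" step
)

print(toy_chain("What is the capital of France?"))
# → The Capital of France is Paris.
```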

In [9]:
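# Step 3: Create the LangChain pipeline
from operator import itemgetter

llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0)
output_parser = StrOutputParser()

# Join the loaded document(s) into one string to use as the prompt's context
context_text = "\n\n".join(doc.page_content for doc in docs)

chain = (
    {
        "context": lambda _: context_text,   # always supply the file's text
        "question": itemgetter("question"),  # take the question from the input dict
    }
    | prompt
    | llm
    | output_parser
)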

It's worth noting that you could easily augment the above code cell to include a `retriever`, but we'll learn more about that later when we test out FAISS.

In [10]:
# Step 4: Invoke the LangChain pipeline with a question
question = "What is the capital of France?"
answer = chain.invoke({"question": question})

In [11]:
print(answer)

The capital of France is Paris.


### Explanation
In this example, the Retrieval Augmented Generation module is used to answer a user's question about the capital of France. The `UnstructuredFileLoader` loads the entire file (no query-based retrieval happens yet), and its contents are incorporated into the `{context}` variable of the prompt template.

The LangChain pipeline is then created by chaining together the context and question inputs, the prompt, the language model (LLM), and the output parser. Finally, the pipeline is invoked with the user's question, and the generated answer is printed.

## Customization

The Retrieval Augmented Generation module can be customized to fit your specific needs. Here are some areas to consider:

• Modify the prompt template to structure the prompt according to your requirements. You can include placeholders for the question, retrieved context, or any other information you want to provide to the language model.

• Use different language models by changing the `model_name` when creating the `ChatOpenAI` instance (for example, another OpenAI model or one of your own fine-tuned OpenAI models), or by swapping `ChatOpenAI` for a different chat model integration, such as one for open-source models on the Hugging Face Hub.

• Customize the output parser to parse the generated answer in a format that suits your application's needs.
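As an example of a custom parser, the sketch below turns the model's text into a list of at most three sentences, matching the prompt's "three sentences maximum" instruction. `parse_sentences` is a hypothetical helper, and its regular-expression splitting is naive (it will mis-split abbreviations such as "e.g."). In an LCEL chain, a plain function like this can be appended after the `StrOutputParser` (for example `... | output_parser | parse_sentences`), because LangChain wraps callables in a `RunnableLambda`.

```python
import re


def parse_sentences(text, max_sentences=3):
    """Split model output into sentences and keep at most `max_sentences`.

    Naive splitting: breaks after '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part.strip() for part in parts if part.strip()]
    return sentences[:max_sentences]


print(parse_sentences("The capital of France is Paris. It is on the Seine. It is large. Extra."))
# → ['The capital of France is Paris.', 'It is on the Seine.', 'It is large.']
```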

---

# RAG by Similarity Search
Now let's move on to the second example, using FAISS and similarity search. 

## Load Documents

We'll first need some new context to test out. Let's try using the `WebBaseLoader`, a flexible loader for online content. It works by fetching HTML pages over HTTP and parsing them into text with BeautifulSoup.

See the [API Documentation](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html?highlight=webbaseloader#langchain_community.document_loaders.web_base.WebBaseLoader) for more details.

In [None]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://python.langchain.com/docs/expression_language/")

docs = loader.load()

In [None]:
len(docs)

1

## Split documents

Previously, when working to pass context directly to the LLM, we didn't split our context into chunks. However, because we're going to perform similarity searching using a FAISS index, we need to split our documents into chunks.

That way, when we enter a query, only the most relevant chunks from the index are passed in as context.

### Split Data into Chunks with `RecursiveCharacterTextSplitter`

To make effective use of our loaded documents, we need to split them into manageable chunks.

Generally speaking, smaller chunks tend to give more focused, accurate retrieval results, but they can take longer to process.

### Go Deeper

#### Accuracy with Smaller Chunks
* **Increased Focus**: Smaller chunks of text allow the system to focus on a more specific set of information. Because each chunk carries less unrelated material, the chunks that match a query tend to be more accurate and relevant.
* **Contextual Relevance**: With a narrower focus, the retrieved information is more likely to be contextually relevant to a specific query, which improves the accuracy of the response.

#### Processing Time
* **Multiple Queries**: Smaller chunks might require multiple queries to cover a topic comprehensively. Each query involves a separate retrieval process, which cumulatively can take more time.
* **Trade-off Between Depth and Breadth**: While smaller chunks allow for depth in a specific area, they might necessitate multiple rounds of retrieval to get a broad understanding, thus increasing overall processing time.

#### System Limitations and Efficiency
* **Computational Load**: Smaller chunks mean more chunks to embed, store, and search, and potentially more frequent calls to the retrieval system. Depending on the efficiency of the system, this can either slow down the process due to computational load or, if the system is highly efficient, might not significantly impact the processing time.

The following cell demonstrates how to split the loaded data into chunks with LangChain. We'll instantiate a variable, `text_splitter`, with the `RecursiveCharacterTextSplitter` class.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
documents = text_splitter.split_documents(docs)

In [None]:
len(documents)

11
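To build intuition for what `chunk_size` controls, here is a deliberately simplified, hypothetical `recursive_split` function that follows the same general strategy as `RecursiveCharacterTextSplitter`: try coarse separators (paragraph breaks) first, fall back to finer ones (line breaks, spaces, single characters) only for pieces that are still longer than `chunk_size`, and merge small neighboring pieces back up to `chunk_size`. Like the real splitter's default, it measures length in characters. It ignores `chunk_overlap` (this notebook uses 0 anyway), simplifies separator handling, and will not reproduce LangChain's exact output.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified illustration: no chunk_overlap and simplified separator
    handling. Assumes chunk_size >= 1 and that the last separator is "".
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    # The empty separator means "fall back to individual characters"
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        if not piece:
            continue  # skip empty pieces, e.g. from repeated separators
        if len(piece) > chunk_size:
            # Still too big: flush what we have, then split it with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # merge small pieces up to chunk_size
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=8))
# → ['aaa bbb', 'ccc', 'ddd eee']
```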

## Index Documents

Next, we'll index the documents using FAISS. In the optional cell after that, you can practice saving and loading your FAISS index locally, but it's not a necessary step.

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
# Initialize FAISS as our vector database and embed our documents inside
vector = FAISS.from_documents(documents, embeddings)
# Now our vector database is ready to be queried
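A note on how this index ranks chunks: by default, LangChain's `FAISS` wrapper builds a flat index that uses Euclidean (L2) distance, where a smaller distance means a closer match. OpenAI documents its embeddings as normalized to length 1, and for unit vectors ||a − b||² = 2 − 2·cos(a, b), so ranking by smallest L2 distance gives the same order as ranking by highest cosine similarity. The sketch below checks that identity on random unit vectors (made-up data, standard library only; helper names are illustrative).

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two unit vectors, which is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
dim = 8
query = normalize([random.gauss(0, 1) for _ in range(dim)])
stored = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(20)]

# ||a - b||^2 == 2 - 2 * cos(a, b) for unit vectors (up to float rounding)
for vec in stored:
    assert math.isclose(math.dist(query, vec) ** 2, 2 - 2 * cosine(query, vec), abs_tol=1e-9)

by_l2 = sorted(range(len(stored)), key=lambda i: math.dist(query, stored[i]))
by_cosine = sorted(range(len(stored)), key=lambda i: -cosine(query, stored[i]))
print(by_l2 == by_cosine)
```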

Optional cell:

In [None]:
# Save the index to disk
vector.save_local("test_index")

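# Load the index from disk.
# Loading unpickles the stored docstore, so recent langchain_community versions
# require this explicit opt-in; only load index files you created yourself.
load_local_vector = FAISS.load_local(
    "test_index", embeddings, allow_dangerous_deserialization=True
)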

## Querying Documents

### Similarity Searching for relevant context from a swath of data

We'll search our index in two ways: first with a basic approach, then with a more advanced one.

## ***Basic Retrieval***:

Before we do retrieval, let's set up a prompt template we can easily pass data through when the user asks a question. We'll do this with `ChatPromptTemplate` and then pass it into `create_stuff_documents_chain`.

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Create a prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")
llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106")

document_chain = create_stuff_documents_chain(llm, prompt)
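In a "stuff" chain, every document passed in is inserted into the prompt's `{context}` variable in a single LLM call. By default, `create_stuff_documents_chain` renders each document's `page_content` and joins the results with a blank line (`"\n\n"`). The plain-Python sketch below imitates only that formatting step; the `Doc` dataclass and `stuff` function are hypothetical stand-ins, not LangChain objects, and the example chunks are made up.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (only page_content)."""
    page_content: str


# Same template text as the prompt defined above
PROMPT = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff(documents, question, separator="\n\n"):
    """'Stuff' every document into {context} and fill in {input}."""
    context = separator.join(doc.page_content for doc in documents)
    return PROMPT.format(context=context, input=question)


# Made-up example chunks
example_docs = [Doc("LangSmith logs traces."), Doc("Datasets support evaluation.")]
print(stuff(example_docs, "how can langsmith help with testing?"))
```

Because all retrieved chunks share one prompt, the number of chunks the retriever returns (and their `chunk_size`) directly sets how long the prompt gets.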

In [None]:
from langchain.chains import create_retrieval_chain

# Use our Vector Database as our `Retriever`
retriever = vector.as_retriever()

# Create our basic retrieval chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [None]:
# Invoke the basic chain
response = retrieval_chain.invoke(
    {"input": "how can langsmith help with testing?"}
    )
print(response["answer"])

## ***Advanced Retrieval***:

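We'll use the `MultiQueryRetriever` class to query our index with multiple queries.
Given a user question, this retriever uses an LLM to write several alternative phrasings of it, runs each phrasing against the base retriever, and returns the unique union of the documents found. This can surface relevant chunks that a single phrasing would miss.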
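The sketch below shows only the merging step of that process in plain Python. The query variants and the `fake_retriever` lookup table are hypothetical, hard-coded stand-ins for the LLM's output and for the FAISS retriever; `unique_union` drops duplicates while keeping the order in which documents were first seen.

```python
def unique_union(results_per_query):
    """Flatten per-query results, dropping duplicates but keeping first-seen order."""
    seen, merged = set(), []
    for results in results_per_query:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-ins: LLM-written query variants and a lookup-table "retriever"
query_variants = [
    "How does LangSmith support testing?",
    "What LangSmith features help evaluate an LLM app?",
    "Can LangSmith datasets be used for tests?",
]
fake_retriever = {
    query_variants[0]: ["chunk-3", "chunk-7"],
    query_variants[1]: ["chunk-7", "chunk-1"],
    query_variants[2]: ["chunk-3", "chunk-9"],
}

merged = unique_union(fake_retriever[q] for q in query_variants)
print(merged)
# → ['chunk-3', 'chunk-7', 'chunk-1', 'chunk-9']
```

Because each variant triggers its own retrieval, this approach trades extra LLM and retriever calls for broader recall.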

In [None]:
from langchain.retrievers import MultiQueryRetriever

# Pass the base `retriever` directly, NOT `retrieval_chain` (a chain) or `response` (its output dict)
advanced_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

In [None]:
# Create our advanced retrieval chain
retrieval_chain = create_retrieval_chain(advanced_retriever, document_chain)

In [None]:
# Invoke the advanced chain
response = retrieval_chain.invoke(
    {"input": "how can langsmith help with testing?"}
    )
print(response["answer"])

LangSmith can help with testing by allowing users to quickly edit examples and add them to datasets to expand the evaluation sets, fine-tune a model for improved quality or reduced costs, monitor application performance, log all traces, visualize latency and token usage statistics, troubleshoot specific issues, and extract insights from logged runs. Additionally, LangSmith simplifies the approach to constructing datasets and provides examples for integrating with third parties for testing purposes.


## Conclusion

The Retrieval Augmented Generation module provides a convenient way to perform question-answering tasks by combining retrieval and generation techniques. It allows you to retrieve relevant documents based on a query and generate concise answers using LangChain. With its customization options, you can tailor the module to suit your specific needs and integrate it into your applications for enhanced question-answering capabilities.