# Contextual compression in document retrieval

This notebook demonstrates how to implement contextual compression in a document retrieval system using LangChain and OpenAI's language models. In traditional document retrieval systems, queries typically return entire documents or large chunks of text, which may include irrelevant sections. Contextual compression helps address this issue by intelligently extracting only the most relevant content from the document, improving both the relevance and the efficiency of information retrieval.


In [1]:
import os
from dotenv import load_dotenv
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers import ContextualCompressionRetriever
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

# Load environment variables from a .env file
load_dotenv()

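# Read the API key and fail early with a clear message if it is missing
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
os.environ["OPENAI_API_KEY"] = api_key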

### Load PDF documents
We will use `PyPDFLoader` to extract text from the PDF. It reads the PDF page by page and stores the extracted text in a list of document objects, where each document contains the content of a single page.

In [2]:
# Path to the PDF file
path = "Understanding_Climate_Change.pdf"

loader = PyPDFLoader(path)
documents = loader.load()

### Preprocessing

##### Split the document into chunks
We use the `RecursiveCharacterTextSplitter` to split the document into smaller chunks. Splitting keeps large documents manageable for embedding generation and lets retrieval return focused passages instead of whole pages.

In [3]:
# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, length_function=len)
texts = text_splitter.split_documents(documents)

We split the document into chunks of up to 1000 characters, with up to 200 characters of overlap between consecutive chunks.
- `chunk_size` sets the maximum chunk length, which keeps each chunk small enough to embed, index, and retrieve on its own.
- `chunk_overlap` repeats the end of one chunk at the start of the next, so context that straddles a split point is preserved.
- `length_function` tells the splitter how to measure length. With `len`, length is counted in characters, so `chunk_size` is an upper bound rather than an exact size. The splitter prefers to break on natural separators such as paragraph and line breaks, so most chunks come out somewhat shorter than the limit.
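
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter also looks for natural separators); `split_with_overlap` is a hypothetical helper written only for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# A 2500-character dummy string stands in for the PDF text
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, which is the property that `chunk_overlap` is meant to provide.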

##### Replace tabs with spaces
In many cases, PDFs may contain tab characters (`\t`) that were used for indentation but aren't necessary for the final processed text. We will replace these with spaces.

In [4]:
# Replace tab characters with spaces in the text content
for text in texts:
    text.page_content = text.page_content.replace('\t', ' ')  # Replace tabs with spaces

### Generate embeddings using OpenAI
Once the text is cleaned and processed, we can create embeddings for each of the chunks using the OpenAI API. These embeddings represent the meaning of the text as vectors in a high-dimensional space, where chunks with similar meaning lie close together.

In [5]:
# Initializes the OpenAI embeddings model
embeddings = OpenAIEmbeddings()

We will also use FAISS to store and index the embeddings, which allows us to run similarity searches efficiently.

In [6]:
# Create vector store using FAISS
vector_store = FAISS.from_documents(texts, embeddings)

Here, we create a FAISS vector store by:
- Generating embeddings for each chunk of text.
- Storing the embeddings in FAISS, which allows fast similarity search later.

`FAISS.from_documents` creates a flat (brute-force) index by default, which compares the query against every stored vector.
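
To show what a flat (brute-force) search does, here is a small sketch in plain Python. It assumes Euclidean (L2) distance and uses made-up 3-dimensional vectors in place of real embeddings; `l2_distance` and `flat_search` are hypothetical helpers, not part of FAISS or LangChain.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query_vec: list[float], index_vecs: list[list[float]], k: int = 4):
    """Compare the query against every stored vector and return the k closest."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(index_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions
index_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.0, 0.0]
nearest = flat_search(query_vec, index_vecs, k=2)
print([i for i, _ in nearest])  # [0, 2]
```

Because every vector is compared, the result is exact, but the cost grows linearly with the number of chunks. For a single PDF this is usually fine.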

### Setup the retriever
Now that we have created the embeddings and stored them in FAISS, we can query the vector store to retrieve relevant information based on user queries. The retriever fetches the top `k` most similar document chunks for a given query. With the default settings used here, `k` is 4.

In [7]:
# Create a retriever
retriever = vector_store.as_retriever()

### LLM-based contextual compressor
Next, we introduce the LLM-based contextual compressor. This component uses an LLM to extract only the content that is relevant to the query from the retrieved document chunks. We use the `LLMChainExtractor` for this purpose: it processes each retrieved chunk and keeps only the most important information.

In [8]:
# Initialize the language model (ChatOpenAI) with specific configuration
llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini-2024-07-18", max_tokens=4000)

# Create the LLMChainExtractor as a compressor
compressor = LLMChainExtractor.from_llm(llm)

- We initialize the `gpt-4o-mini` language model. Setting `temperature=0` makes the output as deterministic and focused as possible, and `max_tokens=4000` caps the number of tokens the model may generate in a single response.
- Then, we use the `LLMChainExtractor` to extract relevant parts from the document chunks using the language model. It helps in compressing the document content, retaining only the most pertinent information. In other words, this process sets up an extraction chain that includes the following steps:
    - Processing: The model processes the document chunks.
    - Extracting: The model identifies and extracts the relevant sections based on the query and document context.
    - Compressing: The extracted information is compressed to retain only the essential parts, ensuring brevity while maintaining relevance.
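
The idea behind the extraction steps can be sketched without an LLM. The example below replaces the language model with a crude keyword-overlap rule, so it is only a stand-in for illustration and not the `LLMChainExtractor` implementation; `extract_relevant_sentences` and `STOPWORDS` are hypothetical names.

```python
import re

# Very small stopword list, just enough for this toy example
STOPWORDS = {"the", "a", "an", "what", "does", "do", "to", "is", "of", "in"}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def extract_relevant_sentences(query: str, chunk: str) -> str:
    """Keep only the sentences of chunk that share a content word with the query."""
    query_words = content_words(query)
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences if query_words & content_words(s)]
    return " ".join(kept)


chunk = (
    "The Paris Agreement was adopted in 2015. "
    "Bananas are rich in potassium. "
    "The agreement aims to limit global warming."
)
compressed = extract_relevant_sentences("What does the Paris Agreement aim to do?", chunk)
print(compressed)
# The Paris Agreement was adopted in 2015. The agreement aims to limit global warming.
```

An LLM judges relevance far better than word overlap does, but the shape of the operation is the same: a chunk and a query go in, and a shorter passage (possibly empty) comes out.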

### Contextual compression retriever
We now combine the retriever and compressor into a ContextualCompressionRetriever. This retriever first fetches the most relevant document chunks using the base retriever and then compresses them using the compressor to extract the most important content.

In [9]:
# Combine the retriever with the compressor into a contextual compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

The `ContextualCompressionRetriever` combines both of these components — the retriever and the compressor — to produce a more powerful and efficient retrieval process:
1. First stage: Retrieve relevant chunks - When a query is made, the base retriever searches the vector store (which contains vector representations of the document chunks) and returns the top document chunks that are relevant to the query.
2. Second stage: Compress and extract - Once the relevant chunks are retrieved, the compressor (here, the `LLMChainExtractor`) applies the language model to each chunk, extracts the parts that are relevant to the query, and removes the rest, leaving only the essential details.
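
The two stages above can be sketched as a small function that takes a retriever and a compressor as plain callables. This is a simplified illustration under our own assumptions, not LangChain's code: `retrieve_and_compress`, `toy_retrieve`, and `toy_compress` are hypothetical, and we assume that chunks with nothing relevant left after compression are dropped.

```python
from typing import Callable


def retrieve_and_compress(
    query: str,
    retrieve: Callable[[str], list[str]],
    compress: Callable[[str, str], str],
) -> list[str]:
    """Stage 1: fetch candidate chunks. Stage 2: compress each and drop empties."""
    candidates = retrieve(query)
    compressed = (compress(query, chunk) for chunk in candidates)
    return [text for text in compressed if text.strip()]


corpus = [
    "Solar power cuts emissions. It is also getting cheaper.",
    "The committee met on Tuesday. Lunch was served.",
]


def toy_retrieve(query: str) -> list[str]:
    # Stand-in for the vector store lookup: return every chunk
    return corpus


def toy_compress(query: str, chunk: str) -> str:
    # Stand-in for the LLM: keep sentences that mention any query word
    words = set(query.lower().split())
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    kept = [s + "." for s in sentences if words & set(s.lower().split())]
    return " ".join(kept)


print(retrieve_and_compress("emissions", toy_retrieve, toy_compress))
# ['Solar power cuts emissions.']
```

Note that the second stage calls the compressor once per retrieved chunk, so with an LLM-based compressor each query costs several model calls before the final answer is generated.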

### Question-answering chain
Finally, we create a QA chain that integrates the contextual compression retriever to form a complete system capable of retrieving and answering queries with contextually compressed information. The QA chain takes a user's query, retrieves the relevant compressed context, and generates an answer based on the compressed content.

In [10]:
# Create a QA chain with the compressed retriever
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=compression_retriever,
    return_source_documents=True
)

The `RetrievalQA.from_chain_type` method is used to create the QA chain. This method brings together the LLM and the retriever (compression retriever) in a unified structure. The goal is to use the LLM to generate accurate answers from the retrieved and compressed context.
- `llm=llm`: The language model is initialized earlier in the code as a ChatOpenAI model. This model is responsible for answering the query based on the retrieved content. The model works by processing the compressed document chunks and generating an answer that is relevant and coherent.
- `retriever=compression_retriever`: The contextual compression retriever is used as the retriever in this QA chain. This retriever performs two main functions: first, it retrieves relevant chunks based on the query, and then it compresses these chunks to extract only the most relevant information. The compressed context is then passed to the language model to generate a focused answer.
- `return_source_documents=True`: This flag ensures that the source documents that contributed to generating the answer are returned alongside the answer itself. This adds transparency to the process by showing which parts of the document the system relied on to generate the response. This is useful for users to validate the information and understand where the answer came from.
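
When no `chain_type` is passed, `RetrievalQA.from_chain_type` uses the "stuff" strategy, which places all retrieved passages into a single prompt. The sketch below shows the idea in plain Python. The prompt wording is made up for illustration and is not LangChain's actual template, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question: str, contexts: list[str]) -> str:
    """Concatenate all context passages into one prompt ahead of the question."""
    joined = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{joined}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is the main topic of the document?",
    ["Compressed passage one.", "Compressed passage two."],
)
print(prompt)
```

This is where compression pays off: shorter passages mean a shorter prompt, which leaves more room in the context window and reduces the amount of irrelevant text the model has to read.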

### Test the system
Now that we have set up the QA chain with the contextual compression retriever, we can test the system by sending it a query.

In [11]:
# Example query
query = "What is the main topic of the document?"

# Invoke the QA chain with the query
result = qa_chain.invoke({"query": query})

The `qa_chain.invoke({"query": query})` call triggers the entire question-answering process, which involves the following steps:
   - Retrieve relevant chunks: The contextual compression retriever fetches the most relevant document chunks that could contain information related to the query.
   - Compression: The retriever then compresses these chunks to retain only the most critical content.
   - Answer generation: The language model processes the compressed context and generates an answer to the query.
   
The result of invoking this QA chain will be a dictionary containing two key pieces of information:
   - `"result"`: The answer to the query based on the compressed context.
   - `"source_documents"`: A list of the document chunks or sources that contributed to generating the answer.
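
Printing the raw list of source documents can be hard to read. Below is a sketch of a small helper that prints one line per source. `FakeDocument` is a stand-in that only mimics the `page_content` and `metadata` attributes seen in this notebook's output, and `summarize_sources` is a hypothetical helper. With a real `result`, the same function should work unchanged as long as those two attributes are present.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result: dict, preview_chars: int = 60) -> list[str]:
    """Return one 'source (page N): preview' line per source document."""
    lines = []
    for doc in result["source_documents"]:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        # Collapse newlines and repeated spaces before truncating
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source} (page {page}): {preview}")
    return lines


fake_result = {
    "result": "The main topic of the document is climate change.",
    "source_documents": [
        FakeDocument(
            page_content="Chapter 6: Global and Local Climate Action \nInternational Collaboration",
            metadata={"source": "Understanding_Climate_Change.pdf", "page": 9},
        )
    ],
}
for line in summarize_sources(fake_result):
    print(line)
```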

In [13]:
# Print the result and source documents
print(result["result"])
print("\nSource documents:", result["source_documents"])

The main topic of the document is climate change, focusing on its causes, effects, and potential solutions through global and local climate action, international collaboration, and national strategies. It discusses frameworks like the UNFCCC and the Paris Agreement, as well as various policies and practices aimed at reducing greenhouse gas emissions and promoting sustainability.

Source documents: [Document(metadata={'source': 'Understanding_Climate_Change.pdf', 'page': 9}, page_content='Chapter 6: Global and Local Climate Action \nInternational Collaboration \nUnited Nations Framework Convention on Climate Change (UNFCCC) \nThe UNFCCC is an international treaty aimed at addressing climate change. It provides a \nframework for negotiating specific protocols and agreements, such as the Kyoto Protocol and \nthe Paris Agreement. Global cooperation under the UNFCCC is crucial for coordinated \nclimate action. \nParis Agreement \nThe Paris Agreement, adopted in 2015, aims to limit global wa

The system successfully processed the query by retrieving, compressing, and generating a focused answer from the relevant document content. Because the source documents are returned with the answer, we can also validate the answer and trace the content that was used to generate it.

