## Setting Up Our Environment


Don't forget to run `export OPENAI_API_KEY=sk-...` to set your API key as an environment variable before starting Jupyter. LangChain can also work with API keys from other providers, such as Hugging Face, but that is beyond our scope.
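If you would rather set the key from inside the notebook, a minimal sketch along these lines may help. The helper name `ensure_api_key` is hypothetical, and the sketch only assumes that the key is read from the `OPENAI_API_KEY` environment variable, as the shell command above implies.

```python
import os
from getpass import getpass


def ensure_api_key(name="OPENAI_API_KEY", env=os.environ, ask=getpass):
    """Return the key stored under `name`, prompting for it only if it is missing."""
    if not env.get(name):
        # Hypothetical fallback: prompt without echoing the key to the screen.
        env[name] = ask(f"Enter {name}: ")
    return env[name]
```

Calling `ensure_api_key()` in the first cell leaves an existing key untouched and asks for one only when the variable is unset.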

# Why We Need Compression in Retrieval Systems
Compression in retrieval systems addresses several critical challenges:

1. <b>Relevance Filtering</b>:
   - When you ingest documents into a retrieval system, you typically don't know what specific queries users will make. This means retrieved documents often contain a mix of relevant and irrelevant information.
2. <b>Cost Efficiency</b>:
   - Passing entire documents through LLMs is expensive. Compression reduces the amount of text processed, lowering costs for API-based LLM services.
3. <b>Response Quality</b>:
   - Providing LLMs with concise, relevant context leads to better responses. Irrelevant information can confuse the model and result in lower quality outputs.
4. <b>Processing Speed</b>:
   - Shorter documents mean faster processing times, improving overall system performance.
5. <b>Context Window Optimization</b>:
   - LLMs have limited context windows. Compression helps maximize the use of available tokens by focusing only on the most relevant content.
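
To make the cost and context-window points concrete, here is a rough sketch that measures how much text a compression step removes. The function name `compression_ratio` and the sample strings are made up for illustration, and character counts are only a proxy for token counts.

```python
def compression_ratio(original_texts, compressed_texts):
    """Fraction of characters removed by compression (0.0 means nothing removed)."""
    original = sum(len(t) for t in original_texts)
    compressed = sum(len(t) for t in compressed_texts)
    if original == 0:
        return 0.0
    return 1 - compressed / original


# Made-up example: two retrieved chunks shrink to one short extract.
before = ["a" * 800, "b" * 1200]
after = ["a" * 200]
print(f"{compression_ratio(before, after):.0%} of characters removed")
```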

# How to do retrieval with contextual compression

One challenge with [retrieval](/docs/concepts/retrieval/) is that usually you don't know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

<b>Contextual compression</b> is meant to fix this. 

The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents.

To use the Contextual Compression Retriever, you'll need:

- a base [retriever](https://python.langchain.com/api_reference/core/retrievers/langchain_core.retrievers.BaseRetriever.html#langchain_core.retrievers.BaseRetriever)
- a [Document Compressor](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.compressor.BaseDocumentCompressor.html)

The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.
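
The flow described above can be sketched in plain Python without any LangChain imports. Everything here is a toy stand-in: `make_compression_retriever`, `toy_retriever`, and `toy_compressor` are hypothetical names, and the keyword matching only imitates what a real retriever and compressor would do.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Compressor = Callable[[List[str], str], List[str]]


def make_compression_retriever(base_retriever: Retriever, base_compressor: Compressor):
    """Compose the two pieces: retrieve first, then compress using the query."""

    def invoke(query: str) -> List[str]:
        initial_docs = base_retriever(query)
        return base_compressor(initial_docs, query)

    return invoke


corpus = [
    "Germany surrendered in November. The weather was mild.",
    "Taxes must rise. Farm prices need support.",
]


def toy_retriever(query: str) -> List[str]:
    # Stand-in retriever: returns every document regardless of the query.
    return list(corpus)


def toy_compressor(docs: List[str], query: str) -> List[str]:
    # Stand-in compressor: shortens documents and drops those with no match.
    out = []
    for doc in docs:
        hits = [s.strip() for s in doc.split(".") if query.lower() in s.lower()]
        if hits:
            out.append(". ".join(hits) + ".")
    return out


retrieve = make_compression_retriever(toy_retriever, toy_compressor)
print(retrieve("germany"))
```

The second document is dropped altogether and the first is reduced to its one matching sentence, which mirrors the two kinds of compression mentioned above.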

## Get started

In [1]:
# Helper function for printing docs


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

## Using a vanilla vector store retriever
Let's start by initializing a simple vector store retriever and storing Franklin D. Roosevelt's 1944 State of the Union address (in chunks). Given an example question, the retriever returns one or two relevant docs and a few irrelevant ones, and even the relevant docs contain a lot of irrelevant information.


In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = TextLoader("../session2/some_data/FDR_State_of_Union_1944.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()

docs = retriever.invoke("What did the president say about germany")
pretty_print_docs(docs)

Created a chunk of size 1535, which is longer than the specified 1000


Document 1:

The foreign policy that we have been following -- the policy that guided us at Moscow, Cairo, and Teheran -- is based on the common sense principle which was best expressed by Benjamin Franklin on July 4, 1776: "We must all hang together, or assuredly we shall all hang separately."

I have often said that there are no two fronts for America in this war. There is only one front. There is one line of unity which extends from the hearts of the people at home to the men of our attacking forces in our farthest outposts. When we speak of our total effort, we speak of the factory and the field, and the mine as well as of the battleground -- we speak of the soldier and the civilian, the citizen and his Government.

Each and every one of us has a solemn obligation under God to serve this Nation in its most critical hour -- to keep this Nation great -- to make this Nation greater in a better world.
----------------------------------------------------------------------------------------

## Adding contextual compression with an `LLMChainExtractor`
Now let's wrap our base retriever with a `ContextualCompressionRetriever`. 

We'll add an `LLMChainExtractor`, a Document compressor that uses an LLM chain to extract the relevant parts of documents.

This will iterate over the initially returned documents and extract from each only the content that is relevant to the query.
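
As a rough illustration of that per-document loop, the sketch below swaps the LLM for a keyword-based stand-in. The names `fake_llm_extract` and `extract_relevant`, and the `NO_OUTPUT` sentinel used here to signal "nothing relevant", are assumptions made for this example rather than a description of the library's internals.

```python
NO_OUTPUT = "NO_OUTPUT"  # illustrative sentinel meaning "nothing relevant here"


def fake_llm_extract(query, text):
    """Stand-in for an LLM call: returns the sentences that mention the query."""
    hits = [s.strip() for s in text.split(".") if query.lower() in s.lower()]
    return ". ".join(hits) + "." if hits else NO_OUTPUT


def extract_relevant(docs, query, llm=fake_llm_extract):
    """Call the extractor once per document; drop documents with no relevant text."""
    compressed = []
    for text in docs:
        output = llm(query, text)  # one (simulated) LLM call per document
        if output != NO_OUTPUT:
            compressed.append(output)
    return compressed


docs = [
    "In November Germany surrendered. The draft age limits were broadened.",
    "Congress should adopt a realistic tax law.",
]
print(extract_relevant(docs, "Germany"))
```

Because the extractor runs once per retrieved document, the cost of this approach grows with the number of documents the base retriever returns.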


In [3]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)

compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, 
    base_retriever=retriever                          # retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about germany"
)
pretty_print_docs(compressed_docs)

Document 1:

- The foreign policy that we have been following -- the policy that guided us at Moscow, Cairo, and Teheran -- is based on the common sense principle which was best expressed by Benjamin Franklin on July 4, 1776: "We must all hang together, or assuredly we shall all hang separately."
- I have often said that there are no two fronts for America in this war. There is only one front. There is one line of unity which extends from the hearts of the people at home to the men of our attacking forces in our farthest outposts.
- When we speak of our total effort, we speak of the factory and the field, and the mine as well as of the battleground -- we speak of the soldier and the civilian, the citizen and his Government.
- Each and every one of us has a solemn obligation under God to serve this Nation in its most critical hour -- to keep this Nation great -- to make this Nation greater in a better world.
----------------------------------------------------------------------------------

## More built-in compressors: filters
### `LLMChainFilter`
The `LLMChainFilter` is a filter that drops documents that aren't relevant to the query.

This slightly simpler but more robust compressor uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents.
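
The difference from extraction can be sketched as follows: the judge only answers yes or no, and surviving documents are returned unchanged. `fake_llm_is_relevant` is a hypothetical keyword check standing in for the LLM's judgment.

```python
def fake_llm_is_relevant(query, text):
    """Stand-in for a yes/no LLM judgment (hypothetical keyword check)."""
    return query.lower() in text.lower()


def filter_documents(docs, query, judge=fake_llm_is_relevant):
    """Keep whole documents the judge accepts; never edit their contents."""
    return [doc for doc in docs if judge(query, doc)]


docs = [
    "In November Germany surrendered. The draft age limits were broadened.",
    "Congress should adopt a realistic tax law.",
]
kept = filter_documents(docs, "Germany")
print(kept)
```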



In [4]:
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)     # The language model to use for filtering

compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, 
    base_retriever=retriever                       # retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about germany?"
)
pretty_print_docs(compressed_docs)

Document 1:

Let us remember the lessons of 1918. In the summer of that year the tide turned in favor of the allies. But this Government did not relax. In fact, our national effort was stepped up. In August, 1918, the draft age limits were broadened from 21-31 to 18-45. The President called for "force to the utmost," and his call was heeded. And in November, only three months later, Germany surrendered.

That is the way to fight and win a war -- all out -- and not with half-an-eye on the battlefronts abroad and the other eye-and-a-half on personal, selfish, or political interests here at home.

Therefore, in order to concentrate all our energies and resources on winning the war, and to maintain a fair and stable economy at home, I recommend that the Congress adopt:
----------------------------------------------------------------------------------------------------
Document 2:

It will give our people at home the assurance that they are standing four-square behind our soldiers and sailors

### `LLMListwiseRerank`

[LLMListwiseRerank](https://python.langchain.com/api_reference/langchain/retrievers/langchain.retrievers.document_compressors.listwise_rerank.LLMListwiseRerank.html) uses [zero-shot listwise document reranking](https://arxiv.org/pdf/2305.02156) and functions similarly to `LLMChainFilter` as a robust but more expensive option. It is recommended to use a more powerful LLM.

Note that `LLMListwiseRerank` requires a model with the [with_structured_output](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output) method implemented.

Listwise reranking can be framed as an optimization over permutations. With $n$ documents there are $n!$ possible orderings, and the LLM is asked to approximate the permutation $\pi^*$ that maximizes relevance to the query $q$:

$$
\pi^* = \arg\max_{\pi \in \Pi_n} \text{Relevance}(\pi, q)
$$

Where $\Pi_n$ is the set of all possible permutations of $n$ documents.
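
For small $n$ the arg max can be checked by brute force. The sketch below is only a worked example of the formula: the position-discounted `relevance` function and the per-document scores are assumptions, and an LLM reranker produces a ranking directly rather than enumerating orderings.

```python
from itertools import permutations
from math import log2


def relevance(order, scores):
    """Toy Relevance(pi, q): position-discounted sum of per-document scores."""
    return sum(scores[doc] / log2(rank + 2) for rank, doc in enumerate(order))


def best_permutation(scores):
    """Brute-force arg max over all n! orderings (only feasible for small n)."""
    return max(permutations(scores), key=lambda order: relevance(order, scores))


# Hypothetical per-document relevance scores for one query.
scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}
best = best_permutation(scores)
print(best)
```

With this particular scoring function the best ordering is simply the documents sorted by descending score, and keeping the first `top_n` entries gives the reranked result.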

In [5]:
from langchain.retrievers.document_compressors import LLMListwiseRerank
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

_filter = LLMListwiseRerank.from_llm(llm, top_n=1)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, 
    base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about germany"
)

pretty_print_docs(compressed_docs)

Document 1:

Let us remember the lessons of 1918. In the summer of that year the tide turned in favor of the allies. But this Government did not relax. In fact, our national effort was stepped up. In August, 1918, the draft age limits were broadened from 21-31 to 18-45. The President called for "force to the utmost," and his call was heeded. And in November, only three months later, Germany surrendered.

That is the way to fight and win a war -- all out -- and not with half-an-eye on the battlefronts abroad and the other eye-and-a-half on personal, selfish, or political interests here at home.

Therefore, in order to concentrate all our energies and resources on winning the war, and to maintain a fair and stable economy at home, I recommend that the Congress adopt:


### `EmbeddingsFilter` (threshold-based filtering)

Making an extra LLM call over each retrieved document is expensive and slow. 

The `EmbeddingsFilter` provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.

The `EmbeddingsFilter` used below applies the following rule:

$$
\text{Keep document } d_i \text{ if } \text{similarity}(q, d_i) \geq \theta
$$

Where:

- $q$ is the query embedding  
- $d_i$ is the document embedding  
- $\theta$ is the similarity threshold (0.76 in the example)
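
A minimal sketch of this rule, assuming cosine similarity as the similarity function and using tiny hand-written vectors in place of real embeddings:

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embeddings_filter(query_vec, doc_vecs, threshold=0.76):
    """Return the indices i of documents with similarity(q, d_i) >= threshold."""
    return [
        i
        for i, doc_vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, doc_vec) >= threshold
    ]


q = [1.0, 0.0]                                # made-up query embedding
docs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]   # made-up document embeddings
print(embeddings_filter(q, docs))
```

Only the first vector clears the 0.76 threshold here; the third sits at roughly 0.71 and is dropped, which shows how sensitive the result is to the choice of $\theta$.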


In [6]:
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=embeddings_filter, 
    base_retriever=retriever                         # retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about germany"
)

pretty_print_docs(compressed_docs)

Document 1:

Let us remember the lessons of 1918. In the summer of that year the tide turned in favor of the allies. But this Government did not relax. In fact, our national effort was stepped up. In August, 1918, the draft age limits were broadened from 21-31 to 18-45. The President called for "force to the utmost," and his call was heeded. And in November, only three months later, Germany surrendered.

That is the way to fight and win a war -- all out -- and not with half-an-eye on the battlefronts abroad and the other eye-and-a-half on personal, selfish, or political interests here at home.

Therefore, in order to concentrate all our energies and resources on winning the war, and to maintain a fair and stable economy at home, I recommend that the Congress adopt:
----------------------------------------------------------------------------------------------------
Document 2:

The foreign policy that we have been following -- the policy that guided us at Moscow, Cairo, and Teheran -- is ba

## Stringing compressors and document transformers together
Using the `DocumentCompressorPipeline` we can also easily combine multiple compressors in sequence. Along with compressors we can add `BaseDocumentTransformer`s to our pipeline, which don't perform any contextual compression but simply perform some transformation on a set of documents. For example `TextSplitter`s can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on embedding similarity between documents.

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents, and then filtering based on relevance to the query.
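
Conceptually, such a pipeline is just a list of stages applied in order. The sketch below uses hypothetical string-based stages (`split_sentences`, `drop_duplicates`, `keep_relevant`) in place of the real splitter and embedding filters, purely to show the chaining.

```python
def split_sentences(docs, query):
    """Transformer stage: split each document into sentence-sized pieces."""
    return [s.strip() for doc in docs for s in doc.split(". ") if s.strip()]


def drop_duplicates(docs, query):
    """Transformer stage: drop exact repeats (a crude stand-in for redundancy filtering)."""
    return list(dict.fromkeys(docs))


def keep_relevant(docs, query):
    """Compressor stage: keep pieces that mention the query term."""
    return [d for d in docs if query.lower() in d.lower()]


def run_pipeline(stages, docs, query):
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        docs = stage(docs, query)
    return docs


docs = [
    "Germany surrendered. The draft age was broadened",
    "Germany surrendered. Taxes must rise",
]
result = run_pipeline([split_sentences, drop_duplicates, keep_relevant], docs, "germany")
print(result)
```

Stage order matters: splitting first lets the later stages work on small pieces, and removing duplicates before the relevance check avoids scoring the same text twice.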


### Redundancy Elimination
The `EmbeddingsRedundantFilter` applies the same similarity-threshold idea as the `EmbeddingsFilter`, but compares documents to each other rather than to the query. It can be described as a greedy procedure for identifying redundant documents:

1. Initialize an empty set $R$ of documents to retain
2. For each document $d_i$ in the original set:
    - Check whether $d_i$ is too similar to any document already in $R$, that is, whether $\exists d_j \in R \text{ such that } S_{ij} \geq \theta$.
    - If no such $d_j$ exists, add $d_i$ to $R$; otherwise discard $d_i$ as redundant.
  
This can be expressed mathematically as:
$$R = \{ d_i \in D : \forall d_j \in R \text{ added before } d_i,\ S_{ij} < \theta \}$$

Where:
- $S_{ij} = \text{similarity}(d_i, d_j)$
- $D$ is the set of documents
- $d_i, d_j$ are document embeddings
- $\theta$ is the redundancy (similarity) threshold (typically between 0.8 and 0.95).
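
The greedy steps above can be sketched directly in Python. This follows the procedure as described in this section, with cosine similarity assumed for $S_{ij}$ and made-up two-dimensional vectors; the library's own implementation may differ in detail.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_redundancy_filter(doc_vecs, threshold=0.83):
    """Return indices of retained documents, scanning in the original order."""
    retained = []
    for i, vec in enumerate(doc_vecs):
        # Keep d_i only if S_ij < threshold for every already-retained d_j.
        if all(cosine_similarity(vec, doc_vecs[j]) < threshold for j in retained):
            retained.append(i)
    return retained


# Made-up embeddings: items 0 and 1 are near-duplicates, as are items 2 and 3.
vecs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]
print(greedy_redundancy_filter(vecs))
```

Because the scan is greedy, the first document of each near-duplicate group is the one that survives, so the input order influences which copy is kept.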

In [11]:
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")

redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings, similarity_threshold=0.83)

relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)

pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, 
                  redundant_filter, 
                  relevant_filter
                 ]
)

In [12]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, 
    base_retriever=retriever                            # retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about germany?"
)

pretty_print_docs(compressed_docs)

Created a chunk of size 352, which is longer than the specified 300
Created a chunk of size 329, which is longer than the specified 300


Document 1:

The President called for "force to the utmost," and his call was heeded
----------------------------------------------------------------------------------------------------
Document 2:

There had been no previous opportunities for man-to-man discussions which lead to meetings of minds. The result was a peace which was not a peace. That was a mistake which we are not repeating in this war.
----------------------------------------------------------------------------------------------------
Document 3:

Let us remember the lessons of 1918. In the summer of that year the tide turned in favor of the allies. But this Government did not relax. In fact, our national effort was stepped up. In August, 1918, the draft age limits were broadened from 21-31 to 18-45


# Comparison of Compression Retrievers in LangChain

| Retriever | Description | Use Cases | Advantages | Disadvantages | Performance Impact |
|-----------|-------------|-----------|------------|---------------|-------------------|
| **Standard Retriever** (baseline) | Returns documents as-is without compression | Simple retrieval tasks with short documents | • Simple implementation<br>• No additional processing overhead | • Returns irrelevant information<br>• Higher LLM costs | Lower quality for complex queries |
| **LLMChainExtractor** | Uses an LLM to extract only relevant content from each document | • Long documents with sparse relevant information<br>• When precision is critical | • Highly precise extraction<br>• Preserves context<br>• Can reformat content for better readability | • Expensive (requires LLM call per document)<br>• Slower than non-LLM methods<br>• May miss relevant information | High quality but highest cost |
| **LLMChainFilter** | Uses an LLM to decide which documents to keep/discard | • When you need to reduce number of documents<br>• Maintaining complete document context | • Preserves full document content<br>• Simpler and more robust than extraction<br>• Reduces number of documents | • Still requires LLM call per document<br>• Doesn't remove irrelevant content within documents | Medium quality improvement, high cost |
| **LLMListwiseRerank** | Uses zero-shot listwise reranking to prioritize documents | • When document order matters<br>• Prioritizing best documents<br>• High-stakes applications | • Better ranking than traditional methods<br>• Can use more powerful LLMs<br>• Considers documents collectively | • Most expensive option<br>• Requires models with structured output<br>• Complex implementation | Highest quality ranking, highest cost |
| **EmbeddingsFilter** | Filters documents based on embedding similarity to query | • Cost-sensitive applications<br>• Real-time systems<br>• First-pass filtering | • Much faster than LLM-based methods<br>• Significantly cheaper<br>• Scales well | • Less precise than LLM methods<br>• Requires quality embeddings<br>• May miss semantic connections | Good balance of quality and cost |
| **DocumentCompressorPipeline** | Combines multiple compressors and transformers | • Complex retrieval needs<br>• Multi-stage processing<br>• Custom workflows | • Highly customizable<br>• Can optimize for multiple objectives<br>• Combines strengths of different methods | • Complex setup<br>• Requires careful tuning<br>• Performance depends on component selection | Depends on components used |

## Additional Pipeline Components

| Component | Function | Best Used With |
|-----------|----------|---------------|
| **TextSplitter** | Splits documents into smaller chunks | • Long documents<br>• Before redundancy filtering<br>• When granularity matters |
| **EmbeddingsRedundantFilter** | Removes similar documents based on embedding similarity | • After text splitting<br>• Before relevance filtering<br>• When working with repetitive content |

## Selection Guide

1. **Limited Budget**: Start with EmbeddingsFilter for cost-effective filtering
2. **Highest Quality**: Use LLMListwiseRerank with a powerful model
3. **Balance**: Consider a pipeline with TextSplitter → EmbeddingsRedundantFilter → EmbeddingsFilter
4. **Flexible & Custom**: Build a DocumentCompressorPipeline tailored to your specific needs