This notebook is adapted from https://python.langchain.com/en/latest/_sources/modules/indexes/retrievers/examples/contextual-compression.ipynb for my own use case.

# Contextual Compression Retriever

This notebook introduces the concept of DocumentCompressors and the ContextualCompressionRetriever. 

The core idea is simple: given a specific query, we should be able to:
1) return only the documents relevant to that query, and
2) return only the parts of those documents that are relevant.

The ContextualCompressionRetriever is a wrapper around another retriever. It iterates over the initial output of the base retriever and filters and compresses those documents, so that only the most relevant information is returned.
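The following is a minimal pure-Python sketch of the wrapping idea. It does not use LangChain. Documents are plain strings, and `ToyCompressionRetriever`, `keyword_base_retriever` and `keep_matching_sentences` are hypothetical names made up for illustration. Only the `base_retriever`/`base_compressor` argument names echo the real constructor used later in this notebook.

```python
from typing import Callable, List

# Hypothetical minimal types: a document is a plain string, a retriever maps a
# query to documents, and a compressor maps (documents, query) to a possibly
# shorter list of possibly shorter documents.
Retriever = Callable[[str], List[str]]
Compressor = Callable[[List[str], str], List[str]]


class ToyCompressionRetriever:
    """Wraps a base retriever and post-processes its output with a compressor."""

    def __init__(self, base_retriever: Retriever, base_compressor: Compressor):
        self.base_retriever = base_retriever
        self.base_compressor = base_compressor

    def get_relevant_documents(self, query: str) -> List[str]:
        initial_docs = self.base_retriever(query)  # step 1: plain retrieval
        return self.base_compressor(initial_docs, query)  # step 2: filter/compress


def keyword_base_retriever(query: str) -> List[str]:
    # Stand-in for a vector store: always returns the same three chunks.
    return [
        "Flutter is a UI toolkit. It was created by Google.",
        "The weather today is sunny.",
        "Flutter apps compile to native code. Dart is the language.",
    ]


def keep_matching_sentences(docs: List[str], query: str) -> List[str]:
    # Toy compressor: keep only sentences sharing a word longer than three
    # characters with the query; drop documents left with no sentences.
    query_words = {w.lower().strip("?.,") for w in query.split() if len(w) > 3}
    compressed = []
    for doc in docs:
        kept = []
        for sentence in doc.split(". "):
            sentence_words = {t.lower().strip(".,") for t in sentence.split()}
            if query_words & sentence_words:
                kept.append(sentence)
        if kept:
            compressed.append(". ".join(kept))
    return compressed


toy_retriever = ToyCompressionRetriever(keyword_base_retriever, keep_matching_sentences)
print(toy_retriever.get_relevant_documents("What is Flutter?"))
# prints ['Flutter is a UI toolkit', 'Flutter apps compile to native code']
```

The irrelevant weather chunk is dropped entirely, and the two Flutter chunks are trimmed to their relevant sentences. These are the two goals listed above.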

In [1]:
# Helper function for printing docs

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

## Using a vanilla vector store retriever
Let's start by loading a FAISS vector store that was built earlier (here, from chunks of Flutter documentation) and pickled to `faiss_store.pkl`. We then use it as a simple retriever that returns the 4 most similar chunks (`k=4`). Given an example question, such a retriever can return relevant chunks alongside irrelevant ones. Even the relevant chunks contain a lot of text that is unrelated to the question.
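Irrelevant chunks show up because a top-k similarity search always returns `k` results, however weak the best matches are. The sketch below illustrates this with a hypothetical bag-of-words `embed` function standing in for a real embedding model. It is not how FAISS computes or indexes similarity.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return Counter(w.lower().strip("?.,!") for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 4) -> List[str]:
    # Rank every chunk by similarity and return the k best, however weak the match.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


toy_chunks = [
    "What is Flutter? Flutter is a UI toolkit.",
    "Dart has sound null safety.",
    "Hot reload keeps app state.",
    "Widgets are immutable.",
    "Skia renders the frames.",
]
for chunk in top_k("What is Flutter?", toy_chunks, k=4):
    print(round(cosine(embed("What is Flutter?"), embed(chunk)), 2), chunk)
```

Only the first chunk has any overlap with the question. The other three are still returned because the retriever was asked for four.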

In [9]:
import pickle

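# Load the FAISS vector store that was built earlier and pickled to disk.
with open("faiss_store.pkl", "rb") as f:
    vector_store = pickle.load(f)
# Retrieve the 4 chunks most similar to the query.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})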
prompt = "What is Flutter?"
docs = retriever.get_relevant_documents(prompt)
pretty_print_docs(docs)

Document 1:

Performance FAQ

What is Flutter?

Flutter is Google’s portable UI toolkit for crafting beautiful,
natively compiled applications for mobile, web,
and desktop from a single codebase.
Flutter works with existing code,
is used by developers and organizations around
the world, and is free and open source.

Who is Flutter for?

For users, Flutter makes beautiful apps come to life.

For developers, Flutter lowers the bar to entry for building apps.
It speeds app development and reduces the cost and complexity
of app production across platforms.

For designers, Flutter provides a canvas for
high-end user experiences. Fast Company described
Flutter as one of the top design ideas of the decade for
its ability to turn concepts into production code
without the compromises imposed by typical frameworks.
It also acts as a productive prototyping tool,
with CodePen support for sharing your ideas with others.
-------------------------------------------------------------------------------

## Adding contextual compression with an `LLMChainExtractor`
Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll add an `LLMChainExtractor`, which will iterate over the initially returned documents and extract from each only the content that is relevant to the query.
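Conceptually, the extractor makes one LLM call per retrieved document. Each call asks for only the relevant excerpt, and documents with nothing relevant are dropped. So the sketch can run without a model, it assumes a hypothetical prompt, a hypothetical `NO_OUTPUT` sentinel and a stub `fake_llm`. LangChain's actual prompt and output parsing may differ.

```python
from typing import Callable, List

NO_OUTPUT = "NO_OUTPUT"  # hypothetical sentinel meaning "nothing relevant here"

# Hypothetical prompt template (not LangChain's actual wording).
EXTRACT_PROMPT = (
    "Given the question and context below, copy out only the parts of the "
    "context that help answer the question. If nothing is relevant, reply {none}.\n"
    "Question: {question}\nContext: {context}\nExtracted:"
)


def extract_relevant(docs: List[str], query: str, llm: Callable[[str], str]) -> List[str]:
    # One LLM call per document: keep the returned excerpt, drop "nothing relevant".
    compressed = []
    for doc in docs:
        llm_input = EXTRACT_PROMPT.format(none=NO_OUTPUT, question=query, context=doc)
        answer = llm(llm_input).strip()
        if answer and answer != NO_OUTPUT:
            compressed.append(answer)
    return compressed


def fake_llm(prompt: str) -> str:
    # Stub "LLM" so the sketch runs offline: returns the first sentence of the
    # context if it mentions Flutter, otherwise the sentinel.
    context = prompt.split("Context: ", 1)[1].split("\nExtracted:", 1)[0]
    first_sentence = context.split(". ")[0]
    return first_sentence if "Flutter" in first_sentence else NO_OUTPUT


sample_docs = ["Flutter is a UI toolkit. It is free.", "Dart is a language. Flutter uses it."]
print(extract_relevant(sample_docs, "What is Flutter?", fake_llm))  # prints ['Flutter is a UI toolkit']
```

Because the excerpt is whatever the model writes, the quality of the compressed documents depends directly on the LLM used.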

In [10]:
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

def get_llm(
    min_length: int = 20,
    max_length: int = 200,
    temperature: float = 0.0,
    top_p: float = 1.0,
    top_k: int = 50
):
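    # Wrap flan-t5-base in a Hugging Face text2text-generation pipeline.
    # temperature, top_k and top_p only take effect when sampling is enabled
    # (do_sample=True); with the default greedy decoding they are ignored.
    hf_pipeline = pipeline(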
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        min_length=min_length,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p
    )
    llm = HuggingFacePipeline(pipeline=hf_pipeline)
    return llm

llm = get_llm()
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents(prompt)
pretty_print_docs(compressed_docs)



Document 1:

Flutter is Google’s portable UI toolkit for crafting beautiful, natively compiled applications for mobile, web, and desktop from a single codebase. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. Flutter works with existing code, is used by developers and organizations around the world, and is free and
-----

## More built-in compressors: filters
### `LLMChainFilter`
The `LLMChainFilter` is a slightly simpler but more robust compressor. It uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents.
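Here is a minimal sketch of the filtering logic, assuming a stub LLM and a hypothetical prompt. Each document gets one yes/no question, and kept documents are returned unchanged. `parse_yes_no` is a hypothetical strict parser. Like the parser behind the error shown below, it only accepts a reply that is exactly YES or NO. Treating the reply case-insensitively is an assumption of this sketch.

```python
from typing import Callable, List


def parse_yes_no(text: str) -> bool:
    # Hypothetical strict parser: the whole stripped reply must be YES or NO.
    cleaned = text.strip().upper()
    if cleaned not in ("YES", "NO"):
        raise ValueError(f"expected YES or NO, received {text!r}")
    return cleaned == "YES"


def filter_docs(docs: List[str], query: str, llm: Callable[[str], str]) -> List[str]:
    # One yes/no LLM call per document; kept documents are returned unmodified.
    kept = []
    for doc in docs:
        llm_input = f"Question: {query}\nContext: {doc}\nIs the context relevant? Answer YES or NO:"
        if parse_yes_no(llm(llm_input)):
            kept.append(doc)
    return kept


def fake_yes_no_llm(prompt: str) -> str:
    # Stub "LLM": answers YES when the context (not the question) mentions Flutter.
    context = prompt.split("Context: ", 1)[1]
    return "YES" if "Flutter" in context else "NO"


sample_docs = ["Flutter is a UI toolkit.", "The weather is sunny."]
print(filter_docs(sample_docs, "What is Flutter?", fake_yes_no_llm))  # prints ['Flutter is a UI toolkit.']
```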

In [11]:
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=_filter, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents(prompt)
pretty_print_docs(compressed_docs)



ValueError: BooleanOutputParser expected output value to either be YES or NO. Received Yes. > Yes. > Flutter is Google’s portable UI toolkit for crafting beautiful, natively compiled applications for mobile, web, and desktop from a single codebase. > Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. > Flutter is Google’s portable UI toolkit for crafting beautiful, natively compiled applications for mobile, web, and desktop from a single codebase. > Flutter works with existing code, is used by developers and organizations around the world, and is free and open source. > Flutter is Google’s portable UI toolkit for crafting beautiful, natively compiled applications for mobile, web, and desktop from a single codebase. > Flutter is Google’s portable UI toolkit for crafting beautiful, natively compiled applications for mobile, web, and desktop from a single codebase..
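The error occurs because the parser expects the model's entire reply to be `YES` or `NO`. Instead, `flan-t5-base` answered "Yes." and then kept generating repeated text. One possible workaround is a more tolerant parser that inspects only the first word of the reply. The helper below is a hypothetical sketch of that parsing step only. It is not a LangChain API.

```python
import re


def parse_yes_no_lenient(text: str) -> bool:
    # Hypothetical tolerant parser: look only at the first word of the reply,
    # ignoring case and trailing punctuation, so "Yes. > Yes. > ..." reads as YES.
    match = re.match(r"\s*(yes|no)\b", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected the reply to start with YES or NO, got {text[:40]!r}")
    return match.group(1).lower() == "yes"


print(parse_yes_no_lenient("Yes. > Yes. > Flutter is Google's portable UI toolkit"))  # prints True
print(parse_yes_no_lenient("no, not relevant"))  # prints False
```

The word-boundary `\b` keeps replies such as "Nothing relevant" from being misread as NO.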

### `EmbeddingsFilter`

Making an extra LLM call over each retrieved document is expensive and slow. The `EmbeddingsFilter` provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.
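The sketch below shows the idea with a hypothetical bag-of-words `embed` function in place of `HuggingFaceEmbeddings`. It embeds the query once, embeds each document, and keeps documents whose cosine similarity reaches the threshold. The toy scores are not comparable to those of a real embedding model. The `0.76` threshold is reused here only to match the cell below.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for HuggingFaceEmbeddings: bag-of-words counts.
    return Counter(w.lower().strip("?.,!") for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_embeddings_filter(docs: List[str], query: str, threshold: float) -> List[str]:
    # Embed the query once and each document once; no LLM calls are involved.
    query_vec = embed(query)
    return [d for d in docs if cosine(query_vec, embed(d)) >= threshold]


sample_docs = ["What is Flutter? Flutter is a UI toolkit.", "The weather is sunny."]
print(toy_embeddings_filter(sample_docs, "What is Flutter?", threshold=0.76))
# prints ['What is Flutter? Flutter is a UI toolkit.']
```

The cost is one embedding per document plus one for the query. That is typically far cheaper than an LLM call per document.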

In [13]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings = HuggingFaceEmbeddings()
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents(prompt)
pretty_print_docs(compressed_docs)

Document 1:

Performance FAQ

What is Flutter?

Flutter is Google’s portable UI toolkit for crafting beautiful,
natively compiled applications for mobile, web,
and desktop from a single codebase.
Flutter works with existing code,
is used by developers and organizations around
the world, and is free and open source.

Who is Flutter for?

For users, Flutter makes beautiful apps come to life.

For developers, Flutter lowers the bar to entry for building apps.
It speeds app development and reduces the cost and complexity
of app production across platforms.

For designers, Flutter provides a canvas for
high-end user experiences. Fast Company described
Flutter as one of the top design ideas of the decade for
its ability to turn concepts into production code
without the compromises imposed by typical frameworks.
It also acts as a productive prototyping tool,
with CodePen support for sharing your ideas with others.


## Stringing compressors and document transformers together
Using the `DocumentCompressorPipeline`, we can also easily combine multiple compressors in sequence. Along with compressors, we can add `BaseDocumentTransformer`s to our pipeline. These don't perform any contextual compression but simply apply some transformation to a set of documents. For example, `TextSplitter`s can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on embedding similarity between documents.
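Redundancy filtering can be illustrated with a greedy sketch, in which a document is kept only if its similarity to every previously kept document stays below a threshold. The bag-of-words `embed`, the greedy first-seen order and the `0.95` default are assumptions for this example. `EmbeddingsRedundantFilter`'s exact procedure may differ.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: bag-of-words counts.
    return Counter(w.lower().strip("?.,!") for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_redundant(docs: List[str], threshold: float = 0.95) -> List[str]:
    # Greedy pass: keep a document only if it is less similar than `threshold`
    # to every document kept so far (so the first of two near-duplicates wins).
    kept: List[str] = []
    kept_vecs: List[Counter] = []
    for doc in docs:
        vec = embed(doc)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(doc)
            kept_vecs.append(vec)
    return kept


sample_docs = [
    "Flutter is free and open source.",
    "Flutter is free and open source.",
    "Dart is the language.",
]
print(drop_redundant(sample_docs))  # prints ['Flutter is free and open source.', 'Dart is the language.']
```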

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents, and then filtering based on relevance to the query.
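The pipeline itself amounts to feeding each step's output into the next one. In this pure-Python sketch, all three steps are hypothetical stand-ins:

- The splitter only splits on `". "`. A real `CharacterTextSplitter` also merges pieces back into chunks of up to `chunk_size` characters.
- The redundancy step drops only exact repeats.
- The relevance step uses word overlap instead of embeddings.

```python
from typing import Callable, List

# Each step maps (documents, query) to a new list of documents.
Step = Callable[[List[str], str], List[str]]


def run_pipeline(steps: List[Step], docs: List[str], query: str) -> List[str]:
    # Apply each step to the output of the previous one, in order.
    for step in steps:
        docs = step(docs, query)
    return docs


def split_on_sentences(docs: List[str], query: str) -> List[str]:
    # Transformer step: ignores the query and just splits on ". ".
    return [piece.strip() for doc in docs for piece in doc.split(". ") if piece.strip()]


def drop_exact_duplicates(docs: List[str], query: str) -> List[str]:
    # Stand-in for an embedding-based redundancy filter: removes exact repeats only.
    seen = set()
    unique = []
    for doc in docs:
        key = doc.lower().rstrip(".")
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def keep_if_mentions_query(docs: List[str], query: str) -> List[str]:
    # Stand-in for a relevance filter: keep pieces sharing a longer word with the query.
    words = {w.lower().strip("?.,") for w in query.split() if len(w) > 3}
    return [d for d in docs if words & {t.lower().strip(".,") for t in d.split()}]


toy_steps = [split_on_sentences, drop_exact_duplicates, keep_if_mentions_query]
sample_docs = ["Flutter is free. Flutter is free. The sky is blue.", "Flutter works with existing code."]
print(run_pipeline(toy_steps, sample_docs, "What is Flutter?"))
# prints ['Flutter is free', 'Flutter works with existing code.']
```

Order matters. Splitting first lets the later steps judge small pieces rather than whole documents, which is why the real pipeline below runs the splitter before the two filters.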

In [20]:
from langchain.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

In [21]:
compression_retriever = ContextualCompressionRetriever(base_compressor=pipeline_compressor, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents(prompt)
pretty_print_docs(compressed_docs)




