# Contextual Compression Retriever

This notebook introduces the concept of `DocumentCompressor`s and the `ContextualCompressionRetriever`. The core idea is simple: given a specific query, return only the documents relevant to that query, and only the parts of those documents that are relevant. The `ContextualCompressionRetriever` wraps another retriever: it iterates over the base retriever's initial output, then filters and compresses those documents so that only the most relevant information is returned.
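
To make the wrapper idea concrete before touching any library code, here is a minimal sketch in plain Python. It is not the LangChain implementation: `Doc`, `base_retrieve`, `compress`, `compression_retrieve`, and the keyword-overlap scoring are all hypothetical stand-ins chosen for illustration. The base retriever returns whole documents, and the compression step then drops irrelevant sentences and documents that end up empty.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Hypothetical stopword list so that "what" and "is" do not count as matches.
STOPWORDS = {"what", "is", "a", "the", "of"}


@dataclass
class Doc:
    page_content: str


def terms(text: str) -> Set[str]:
    """Lowercased word set of a text, minus stopwords."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def base_retrieve(query: str, corpus: List[Doc], k: int = 3) -> List[Doc]:
    """Stand-in base retriever: rank whole documents by keyword overlap."""
    q = terms(query)
    ranked = sorted(corpus, key=lambda d: len(q & terms(d.page_content)), reverse=True)
    return ranked[:k]


def compress(docs: List[Doc], query: str) -> List[Doc]:
    """Stand-in compressor: keep only sentences that share a term with the query."""
    q = terms(query)
    compressed = []
    for doc in docs:
        sentences = re.split(r"(?<=[.!?])\s+", doc.page_content)
        kept = [s for s in sentences if q & terms(s)]
        if kept:  # documents with nothing relevant are dropped entirely
            compressed.append(Doc(" ".join(kept)))
    return compressed


def compression_retrieve(query: str, corpus: List[Doc], k: int = 3) -> List[Doc]:
    """The wrapper: retrieve first, then compress what came back."""
    return compress(base_retrieve(query, corpus, k), query)


corpus = [
    Doc("Flutter is a framework for cross-platform apps. See the Swift guide for details."),
    Doc("Dart is a programming language. Flutter uses Dart."),
    Doc("Bananas are yellow."),
]
results = compression_retrieve("What is Flutter?", corpus)
for doc in results:
    print(doc.page_content)
```

In this toy run the third document disappears and the first two shrink to the single sentence that mentions the query term, which is the behavior the rest of the notebook obtains with real compressors.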

In [2]:
# Helper function for printing docs

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" +
          d.page_content for i, d in enumerate(docs)]))

## Using a vanilla vector store retriever
Let's start by initializing a simple vector store retriever from a pickled vector store (`faiss_store.pkl`) whose chunks, judging by the output below, come from the Flutter documentation. Given an example question, the retriever returns its top `k` chunks whether or not each one answers the question, and even the relevant chunks carry a lot of irrelevant information.

In [3]:
from langchain.embeddings import HuggingFaceInstructEmbeddings
import pickle

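def get_retriever(k: int = 3):
    # Load the instructor embedding model (not referenced again in this function).
    embedding = HuggingFaceInstructEmbeddings(
        model_name="hkunlp/instructor-large")
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open("faiss_store.pkl", "rb") as f:
        vector_store = pickle.load(f)
    # Return the k most similar chunks for each query.
    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    return retriever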

retriever = get_retriever()
docs = retriever.get_relevant_documents("What is Flutter?")
pretty_print_docs(docs)

load INSTRUCTOR_Transformer
max_seq_length  512
Document 1:

Flutter is a framework for building cross-platform applications
that uses the Dart programming language.
To understand some differences between programming with Dart
and programming with Javascript, 
see Learning Dart as a JavaScript Developer.
----------------------------------------------------------------------------------------------------
Document 2:

Flutter is a framework for building cross-platform applications
that uses the Dart programming language.
To understand some differences between programming with Dart
and programming with Swift, see Learning Dart as a Swift Developer
and Flutter concurrency for Swift developers.
----------------------------------------------------------------------------------------------------
Document 3:

Flutter is a multi-paradigm programming environment.
Many programming techniques developed over the past few decades
are used in Flutter. We use each one where we believe
the strengths of

## Adding contextual compression with an `LLMChainExtractor`
Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll add an `LLMChainExtractor`, which will iterate over the initially returned documents and extract from each only the content that is relevant to the query.

In [11]:
from llms import GPT4AllJApi
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from dotenv import load_dotenv
load_dotenv()

llm = GPT4AllJApi()
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is Flutter?")
pretty_print_docs(compressed_docs)

data {"prompt": "Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: What is Flutter?\n> Context:\n>>>\nFlutter is a framework for building cross-platform applications\nthat uses the Dart programming language.\nTo understand some differences between programming with Dart\nand programming with Javascript, \nsee Learning Dart as a JavaScript Developer.\n>>>\nExtracted relevant parts:", "params": {"seed": -1, "n_threads": -1, "n_predict": 200, "top_k": 40, "top_p": 0.9, "temperature": 0.9, "repeat_penalty": 1, "repeat_last_n": 64, "n_batch": 8}}
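
The logged prompt above shows the contract the extractor relies on: the model is asked to copy out the relevant parts of each document, or to answer `NO_OUTPUT` when nothing is relevant. The following sketch shows how such replies could be turned into a compressed document list. It is an illustration only, not LangChain's code: `extract_relevant` and `fake_llm` are hypothetical names, and `fake_llm` is a trivial stand-in for a real model.

```python
from typing import Callable, List

NO_OUTPUT = "NO_OUTPUT"


def extract_relevant(docs: List[str], query: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the model about each document and keep only non-empty extractions."""
    compressed = []
    for doc in docs:
        prompt = (
            f"> Question: {query}\n> Context:\n>>>\n{doc}\n>>>\n"
            "Extracted relevant parts:"
        )
        answer = llm(prompt).strip()
        # Drop documents for which the model found nothing relevant.
        if answer and answer != NO_OUTPUT:
            compressed.append(answer)
    return compressed


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in model: returns context lines containing 'Flutter is'."""
    context = prompt.split(">>>")[1]
    hits = [line for line in context.splitlines() if "Flutter is" in line]
    return "\n".join(hits) if hits else NO_OUTPUT


docs = [
    "Flutter is a framework for building cross-platform applications.\n"
    "See Learning Dart as a JavaScript Developer.",
    "Bananas are yellow.",
]
extracted = extract_relevant(docs, "What is Flutter?", fake_llm)
print(extracted)
```

With a real model the quality of the result depends on the model following the "extract as is" instruction, which smaller local models may not do reliably.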


## More built-in compressors: filters
### `LLMChainFilter`
The `LLMChainFilter` is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which to return, without manipulating the document contents.

In [10]:
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=_filter, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is Flutter?")
pretty_print_docs(compressed_docs)

data {"prompt": "Given the following question and context, return YES if the context is relevant to the question and NO if it isn't.\n\n> Question: What is Flutter?\n> Context:\n>>>\nFlutter is a framework for building cross-platform applications\nthat uses the Dart programming language.\nTo understand some differences between programming with Dart\nand programming with Javascript, \nsee Learning Dart as a JavaScript Developer.\n>>>\n> Relevant (YES / NO):", "params": {"seed": -1, "n_threads": -1, "n_predict": 200, "top_k": 40, "top_p": 0.9, "temperature": 0.9, "repeat_penalty": 1, "repeat_last_n": 64, "n_batch": 8}}


ValueError: BooleanOutputParser expected output value to either be YES or NO. Received YES
Yes, the context provided in the conversation is relevant to the question about Fluter. Fluter is a framework for building cross-platform applications that uses the Dart programming language, which is a higher-level programming language that is often used for building mobile applications. Understanding differences between programming with Dart and JavaScript can provide insight into the advantages of using Dart for building cross-platform applications..
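
The error message suggests what went wrong in this run: the model answered `YES` and then kept generating an explanation, while the parser accepts only a bare `YES` or `NO`. The logged request also sets `n_predict` to 200, which leaves room for such a long reply. One possible workaround is to read only the leading word of the reply. The function below is a sketch of that idea; `parse_yes_no` is a hypothetical helper, not a LangChain API, and how to plug it into `LLMChainFilter` depends on the LangChain version and is not shown here.

```python
import re


def parse_yes_no(text: str) -> bool:
    """Return True for YES and False for NO, looking only at the first word."""
    match = re.match(r"\s*(yes|no)\b", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Expected the reply to start with YES or NO, got: {text!r}")
    return match.group(1).lower() == "yes"


# A reply shaped like the one in the traceback: the verdict, then an explanation.
reply = "YES\nYes, the context provided in the conversation is relevant."
is_relevant = parse_yes_no(reply)
print(is_relevant)
```

Replies that do not start with either word still raise a `ValueError`, so an off-format answer is reported rather than silently treated as a verdict.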

### `EmbeddingsFilter`

Making an extra LLM call over each retrieved document is expensive and slow. The `EmbeddingsFilter` provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.
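
The mechanism can be sketched without any model at all. The example below assumes cosine similarity as the measure and uses hand-written two-dimensional vectors in place of real embeddings; `cosine_similarity` and `filter_by_similarity` are illustrative names, not library functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_similarity(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    threshold: float,
) -> List[str]:
    """Keep the texts whose embedding is at least `threshold` similar to the query."""
    return [text for text, vec in docs if cosine_similarity(query_vec, vec) >= threshold]


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
docs = [
    ("about Flutter", [0.9, 0.1]),    # similarity is roughly 0.99
    ("loosely related", [0.5, 0.5]),  # similarity is roughly 0.71
    ("unrelated", [0.0, 1.0]),        # similarity is 0.0
]
kept = filter_by_similarity(query_vec, docs, threshold=0.76)
print(kept)
```

The threshold is the only tuning knob: set it too low and nothing is filtered, set it too high and relevant documents are discarded, so it is worth checking against a few known queries.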

In [6]:
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings = HuggingFaceInstructEmbeddings(
        model_name="hkunlp/instructor-large")
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is Flutter?")
pretty_print_docs(compressed_docs)

load INSTRUCTOR_Transformer
max_seq_length  512
Document 1:

Flutter is a framework for building cross-platform applications
that uses the Dart programming language.
To understand some differences between programming with Dart
and programming with Javascript, 
see Learning Dart as a JavaScript Developer.
----------------------------------------------------------------------------------------------------
Document 2:

Flutter is a framework for building cross-platform applications
that uses the Dart programming language.
To understand some differences between programming with Dart
and programming with Swift, see Learning Dart as a Swift Developer
and Flutter concurrency for Swift developers.
----------------------------------------------------------------------------------------------------
Document 3:

Flutter is a multi-paradigm programming environment.
Many programming techniques developed over the past few decades
are used in Flutter. We use each one where we believe
the strengths of

## Stringing compressors and document transformers together
Using the `DocumentCompressorPipeline`, we can also combine multiple compressors in sequence. Along with compressors, we can add `BaseDocumentTransformer`s to the pipeline; these don't perform any contextual compression but simply apply some transformation to a set of documents. For example, `TextSplitter`s can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on the embedding similarity between documents.

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents, and then filtering based on relevance to the query.
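
The three stages just described can be pictured as functions applied one after another, each receiving the output of the previous one. The sketch below is a plain-Python illustration under simplifying assumptions: exact-match deduplication stands in for embedding-based redundancy filtering, a keyword test stands in for embedding-based relevance filtering, and all function names are hypothetical.

```python
from typing import Callable, List

Transformer = Callable[[List[str], str], List[str]]


def split_sentences(docs: List[str], query: str) -> List[str]:
    """Stage 1: split each document into smaller pieces on '. '."""
    pieces = []
    for doc in docs:
        pieces.extend(p.strip() for p in doc.split(". ") if p.strip())
    return pieces


def drop_duplicates(docs: List[str], query: str) -> List[str]:
    """Stage 2: remove pieces already seen (ignoring case and a trailing period)."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.lower().rstrip(".")
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def keep_relevant(docs: List[str], query: str) -> List[str]:
    """Stage 3: keep pieces that mention the last word of the query."""
    keyword = query.lower().rstrip("?").split()[-1]
    return [d for d in docs if keyword in d.lower()]


def run_pipeline(transformers: List[Transformer], docs: List[str], query: str) -> List[str]:
    """Apply each stage in order, feeding its output to the next stage."""
    for transform in transformers:
        docs = transform(docs, query)
    return docs


docs = [
    "Flutter is a framework. Dart is a language.",
    "Flutter is a framework. Swift is used on iOS.",
]
result = run_pipeline([split_sentences, drop_duplicates, keep_relevant], docs, "What is Flutter?")
print(result)
```

Order matters in such a pipeline: splitting first lets the later stages judge small pieces instead of whole documents, and removing duplicates before the relevance check avoids scoring the same text twice.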

In [7]:
from langchain.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

In [8]:
compression_retriever = ContextualCompressionRetriever(base_compressor=pipeline_compressor, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is Flutter?")
pretty_print_docs(compressed_docs)

Document 1:

Flutter is a multi-paradigm programming environment.
Many programming techniques developed over the past few decades
are used in Flutter. We use each one where we believe
the strengths of the technique make it particularly well-suited.
In no particular order:
