### Contextual Compression Filtering Retriever

Contextual compression filtering is a technique used in natural language processing (NLP) and information retrieval systems to improve the relevance and efficiency of retrieval by filtering out content that is not relevant to the query. The goal is to reduce the amount of text that has to be processed downstream, keeping only the portions of the retrieved data that matter for the query.

How It Works:
- The system first establishes the context of the query or task. This context can come from the query itself, user preferences, or the broader application scenario. For example, if a user asks about "climate change impact on agriculture," the system recognizes that the context is the intersection of climate change and agriculture.

- The system then retrieves a broad set of documents or data chunks that are potentially relevant to the query, using a first-stage retriever such as lexical search with BM25 or vector similarity search over a FAISS index. At this stage, the results may include a lot of extraneous information that is not directly relevant to the specific context.

- Contextual compression filtering then applies a second filter based on the query context, analyzing the retrieved data to determine which parts are most relevant. Techniques such as embedding-based semantic similarity or transformer models (including LLMs) may be used to score and filter the content.

- After filtering, the system compresses the data by discarding or deprioritizing irrelevant information. The compression can be literal (reducing the size of the data) or conceptual (keeping only key concepts or sentences). The final output contains only the most contextually relevant data, which is retained for further processing or presentation.
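The two-stage flow described above can be sketched without any library. This is a toy illustration only: word overlap stands in for a real retriever and a real relevance model, and the function names (`retrieve`, `compress`) and the `min_overlap` parameter are hypothetical, not part of LangChain.

```python
import re

def tokens(text):
    """Lowercase word tokens; a crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Stage 1 (hypothetical): broad retrieval, ranking documents by shared query terms."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]

def compress(query, doc, min_overlap=2):
    """Stage 2 (hypothetical): keep only sentences sharing at least `min_overlap` query terms."""
    q = tokens(query)
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    kept = [s for s in sentences if len(q & tokens(s)) >= min_overlap]
    return ". ".join(kept)

corpus = [
    "Climate change reduces crop yields. The author also likes hiking.",
    "Farming depends on rainfall. Climate change shifts rainfall and harms agriculture.",
    "Football scores were high this week.",
]
query = "climate change impact on agriculture"

candidates = retrieve(query, corpus)
# Drop documents that end up empty after compression.
compressed = [c for c in (compress(query, d) for d in candidates) if c]
for text in compressed:
    print(text)
```

The first stage keeps whole documents, including off-topic sentences; the second stage trims each retrieved document down to the sentences that relate to the query.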

In [7]:
from langchain.vectorstores import FAISS

from langchain.schema import Document

## Text Splitting & Docloader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader

from langchain.embeddings import HuggingFaceBgeEmbeddings

In [2]:
model_name = "BAAI/bge-small-en-v1.5"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'},  # assumes a CUDA GPU is available; use 'cpu' otherwise
    encode_kwargs=encode_kwargs
)


In [3]:
loaders = [
    TextLoader('/home/heliya/Desktop/rag_approaches/src/rag_approaches/dataset/blog_post/blog.langchain.dev_announcing-langsmith_.txt'),
    TextLoader('/home/heliya/Desktop/rag_approaches/src/rag_approaches/dataset/blog_post/blog.langchain.dev_benchmarking-question-answering-over-csv-data_.txt'),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(docs)

In [5]:
# Helper function for printing docs

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

In [6]:
retriever = FAISS.from_documents(texts,
                                 bge_embeddings
                                #  OpenAIEmbeddings()
                                 ).as_retriever()

docs = retriever.get_relevant_documents("What is LangSmith?")
# let's look at the docs
pretty_print_docs(docs)

Document 1:

“Because we are building financial products, the bar for accuracy, personalization, and security is particularly high. LangSmith helps us build products we are confident putting in front of users.”

We can’t wait to bring these benefits to more teams. And we’ve got a long list of features on the roadmap like analytics, playgrounds, collaboration, in-context learning, prompt creation, and more.
----------------------------------------------------------------------------------------------------
Document 2:

URL: https://blog.langchain.dev/announcing-langsmith/
Title: Announcing LangSmith, a unified platform for debugging, testing, evaluating, and monitoring your LLM applications

LangChain exists to make it as easy as possible to develop LLM-powered applications.

We started with an open-source Python package when the main blocker for building LLM-powered applications was getting a simple prototype working. We remember seeing Nat Friedman tweet in late 2022 that there was “n


#### Adding contextual compression with an LLMChainExtractor

In [8]:
from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# making the compressor
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

# it needs a base retriever (we're using FAISS Retriever) and a compressor (Made above)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor,
                                                       base_retriever=retriever)


In [9]:
# compressor prompt for illustration
compressor.llm_chain.prompt

PromptTemplate(input_variables=['context', 'question'], output_parser=NoOutputParser(), template='Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: {question}\n> Context:\n>>>\n{context}\n>>>\nExtracted relevant parts:')
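To make the role of this prompt concrete, here is a minimal sketch of the extraction loop it implies: the template is filled once per retrieved document, and documents for which the model answers `NO_OUTPUT` are dropped. The template text is copied from the output above; `fake_llm` and `extract` are hypothetical stand-ins so the example runs without an API key, and they are not LangChain functions.

```python
# Template text copied from the compressor prompt printed above.
TEMPLATE = (
    "Given the following question and context, extract any part of the context "
    "*AS IS* that is relevant to answer the question. If none of the context is "
    "relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts "
    "of the context.\n\n> Question: {question}\n> Context:\n>>>\n{context}\n>>>\n"
    "Extracted relevant parts:"
)

def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call: returns context sentences that mention 'LangSmith'."""
    context = prompt.split(">>>\n")[1].strip()
    hits = [s.strip() for s in context.split(".") if "LangSmith" in s]
    return (". ".join(hits) + ".") if hits else "NO_OUTPUT"

def extract(question, docs, llm=fake_llm):
    """Hypothetical helper: run the prompt per document and drop NO_OUTPUT results."""
    compressed = []
    for doc in docs:
        output = llm(TEMPLATE.format(question=question, context=doc)).strip()
        if output and output != "NO_OUTPUT":
            compressed.append(output)
    return compressed

docs = [
    "LangSmith is a platform for debugging LLM apps. It is in closed beta.",
    "We benchmarked question answering over CSV data.",
]
result = extract("What is LangSmith?", docs)
print(result)
```

Because the model is called once per retrieved document, this style of compression adds one LLM call per document on top of the base retrieval.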

In [10]:
compressed_docs = compression_retriever.get_relevant_documents("What is LangSmith?")
pretty_print_docs(compressed_docs)

Document 1:

LangSmith helps us build products we are confident putting in front of users.
----------------------------------------------------------------------------------------------------
Document 2:

LangChain exists to make it as easy as possible to develop LLM-powered applications.
----------------------------------------------------------------------------------------------------
Document 3:

LangSmith gives you full visibility into model inputs and output of every step in the chain of events.
----------------------------------------------------------------------------------------------------
Document 4:

LangSmith is a platform to help developers close the gap between prototype and production. It's designed for building and iterating on products that can harness the power and wrangle the complexity of LLMs. LangSmith is now in closed beta.


#### EmbeddingsFilter

In [12]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings = OpenAIEmbeddings()
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.70)
compression_retriever = ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is LangSmith")
pretty_print_docs(compressed_docs)

Document 1:

URL: https://blog.langchain.dev/announcing-langsmith/
Title: Announcing LangSmith, a unified platform for debugging, testing, evaluating, and monitoring your LLM applications

LangChain exists to make it as easy as possible to develop LLM-powered applications.

We started with an open-source Python package when the main blocker for building LLM-powered applications was getting a simple prototype working. We remember seeing Nat Friedman tweet in late 2022 that there was “not enough tinkering happening.” The LangChain open-source packages are aimed at addressing this and we see lots of tinkering happening now (Nat agrees)–people are building everything from chatbots over internal company documents to an AI dungeon master for a Dungeons and Dragons game.
----------------------------------------------------------------------------------------------------
Document 2:

Boston Consulting Group also built a highly-customized, and highly performant, series of applications on top of

#### Pipeline Example

In [13]:
from langchain.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0, separator=".")

redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.70)

## making the pipeline
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

In [14]:
compression_retriever = ContextualCompressionRetriever(base_compressor=pipeline_compressor,
                                                       base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What is LangSmith")
pretty_print_docs(compressed_docs)

Document 1:

URL: https://blog.langchain.dev/announcing-langsmith/
Title: Announcing LangSmith, a unified platform for debugging, testing, evaluating, and monitoring your LLM applications

LangChain exists to make it as easy as possible to develop LLM-powered applications
----------------------------------------------------------------------------------------------------
Document 2:

“The use of LangSmith has been key to bringing production-ready LLM applications to our clients. LangSmith's ease of integration and intuitive UI enabled us to have an evaluation pipeline up and running very quickly
----------------------------------------------------------------------------------------------------
Document 3:

“Because we are building financial products, the bar for accuracy, personalization, and security is particularly high. LangSmith helps us build products we are confident putting in front of users.”

We can’t wait to bring these benefits to more teams. And we’ve got a long list of fe

Text Splitting: CharacterTextSplitter is used to split the retrieved documents into smaller chunks of at most 400 characters each, with no overlap, using a period (.) as the separator.

Why: This step breaks down large documents into more manageable, smaller pieces (chunks), which can be processed individually. This is particularly useful when dealing with long documents that need to be searched more effectively.
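As a rough illustration of separator-based splitting, the sketch below splits on a separator and greedily merges the pieces back into chunks that stay within a size limit. It is a simplified approximation written for this explanation, not the library's implementation; `split_on_separator` is a hypothetical name, and a single piece longer than the limit is kept whole.

```python
def split_on_separator(text, chunk_size=400, separator="."):
    """Split on `separator`, then greedily merge pieces into chunks of at most `chunk_size` characters."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + " " + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it starts a new chunk (even if oversized).
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# A tiny chunk_size makes the merging behaviour visible.
chunks = split_on_separator("aaa. bbb. ccc. ddd", chunk_size=9)
print(chunks)
```

Splitting on the separator removes it from the end of each chunk, which is consistent with the compressed documents above ending without a trailing period.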

Redundant Filtering: EmbeddingsRedundantFilter is applied to the chunks. It uses embeddings to identify and remove chunks that are very similar to each other (redundant).

Why: The goal is to eliminate duplicate or highly similar content to ensure diversity in the retrieved information. This helps in avoiding redundancy and improving the efficiency of the retrieval process.
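The idea behind redundancy filtering can be sketched as follows: embed each chunk, and drop any chunk that is too similar to one already kept. This is a simplified greedy version under stated assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `drop_redundant` and its `threshold` value are hypothetical, not the library's internals or defaults.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def drop_redundant(chunks, threshold=0.9):
    """Keep a chunk only if it is below `threshold` similarity to every chunk kept so far."""
    kept = []
    for chunk in chunks:
        vec = embed(chunk)
        if all(cosine(vec, embed(k)) < threshold for k in kept):
            kept.append(chunk)
    return kept

chunks = [
    "LangSmith helps teams debug LLM applications",
    "LangSmith helps teams debug LLM applications quickly",
    "BM25 is a lexical ranking function",
]
unique = drop_redundant(chunks)
print(unique)
```

Note that this step compares chunks with each other and does not look at the query at all.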

Relevance Filtering: EmbeddingsFilter is applied next with a similarity threshold of 0.70. It filters out chunks whose embedding similarity to the query falls below that threshold.

Why: This step ensures that only the chunks most relevant to the query are retained, based on their semantic similarity to the query. It refines the retrieval to focus on the most pertinent information.
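Threshold-based relevance filtering can be sketched in the same spirit: score each chunk against the query and keep those at or above the threshold. The toy `embed` function below is an assumption standing in for a real embedding model, `keep_relevant` is a hypothetical name, and the threshold of 0.5 is chosen to suit the toy vectors rather than to mirror the 0.70 used with real embeddings above.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def keep_relevant(query, chunks, threshold=0.5):
    """Keep chunks whose similarity to the query is at least `threshold`."""
    q = embed(query)
    return [c for c in chunks if cosine(q, embed(c)) >= threshold]

chunks = [
    "LangSmith is a platform",
    "Nat Friedman tweeted about tinkering",
]
relevant = keep_relevant("What is LangSmith", chunks)
print(relevant)
```

Raising the threshold returns fewer, more closely matching chunks; lowering it trades precision for recall.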

Pipeline Creation: DocumentCompressorPipeline is created by chaining together the splitter, redundant filter, and relevance filter, in that order.

Why: The pipeline allows these operations to be applied sequentially to each document, automating the process of chunking, filtering out redundancies, and ensuring relevance in one streamlined process.
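Sequential chaining of this kind can be sketched as a loop in which each step receives the previous step's output. The step functions here (`split_sentences`, `drop_duplicates`, `keep_mentions`) and `run_pipeline` are deliberately simple, hypothetical stand-ins for the splitter and the two embedding filters, not LangChain components.

```python
def run_pipeline(docs, query, steps):
    """Apply each step in order; every step receives the previous step's output."""
    for step in steps:
        docs = step(docs, query)
    return docs

def split_sentences(docs, query):
    """Stand-in for the splitter: one chunk per sentence."""
    return [s.strip() for d in docs for s in d.split(".") if s.strip()]

def drop_duplicates(docs, query):
    """Stand-in for the redundancy filter: exact-match de-duplication, order preserved."""
    return list(dict.fromkeys(docs))

def keep_mentions(docs, query):
    """Stand-in for the relevance filter: keep chunks that mention the query term."""
    return [d for d in docs if query.lower() in d.lower()]

docs = [
    "LangSmith is in beta. LangChain is a library.",
    "LangSmith is in beta. Nothing else.",
]
result = run_pipeline(docs, "langsmith", [split_sentences, drop_duplicates, keep_mentions])
print(result)
```

Order matters: splitting first lets the later filters act on small chunks, so a single relevant sentence can survive even when the rest of its source document is discarded.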

Contextual Compression Retriever: ContextualCompressionRetriever is created by combining the pipeline_compressor with a base_retriever.

Why: This retriever first fetches documents with the base retriever and then passes them through the compressor pipeline, so that only the most relevant and non-redundant content is returned for the query.