# 3.3 Contextual compression


## Setup

### Install dependencies

In [None]:
%pip install python-dotenv~=1.0 docarray~=0.40.0 pypdf~=5.1 --upgrade --quiet
%pip install chromadb~=0.5.18 sentence-transformers~=3.3 --upgrade --quiet 
%pip install langchain~=0.3.7 langchain_openai~=0.2.6 langchain_community~=0.3.5 langchain-chroma~=0.1.4 langchainhub~=0.1.21 --upgrade --quiet

# If running locally, you can do this instead:
#%pip install -r ../requirements.txt

### Load environment variables

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

# If running in Google Colab, you can use this code instead:
# from google.colab import userdata
# os.environ["AZURE_OPENAI_API_KEY"] = userdata.get("AZURE_OPENAI_API_KEY")
# os.environ["AZURE_OPENAI_ENDPOINT"] = userdata.get("AZURE_OPENAI_ENDPOINT")

### Set up models

In [None]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
api_version = "2024-10-01-preview"
llm = AzureChatOpenAI(deployment_name="gpt-4o", temperature=0.0, openai_api_version=api_version)
embedding_model = AzureOpenAIEmbeddings(model="text-embedding-3-large", openai_api_version=api_version)

### Set up path to data

In [None]:
data_path = "../data"

### Set up LangSmith tracing for this notebook

In [None]:
import os

# The API key etc. is read from the .env file; uncomment the lines below to enable tracing
# my_name = "Totoro"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = f"tokyo24-test-{my_name}"

### Let's set up our vector DB as before
Load the ML sample docs and set up the vector DB

In [None]:
# Load PDFs
from langchain_community.document_loaders import PyPDFLoader
loaders = [
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture01.pdf"),
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture01.pdf"),
    PyPDFLoader(f"{data_path}/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Split
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)

# Setup vector DB
from langchain_chroma import Chroma
persist_directory = './db/chroma-ML-docs/'
vectordb = Chroma.from_documents(
    collection_name="ml_docs",
    documents=splits,
    embedding=embedding_model,
    #persist_directory=persist_directory # Optionally persist the database
)

print(vectordb._collection.count())

## What is compression?

Another approach to improving the quality of retrieved docs is compression.

The information most relevant to a query may be buried in a document alongside a lot of irrelevant text.

Passing the full document through your application leads to more expensive LLM calls and often poorer responses.

Contextual compression addresses this: a base retriever fetches documents as usual, and a document compressor then shortens each one, or drops it entirely, using the query as context. We start with a compressor that uses an **LLM** to extract only the content relevant to the query: [**LLMChainExtractor**](https://python.langchain.com/docs/how_to/contextual_compression/#adding-contextual-compression-with-an-llmchainextractor)
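To make the idea concrete before calling a real LLM, here is a toy sketch of query-aware extraction. It is not how `LLMChainExtractor` works internally: a simple keyword match stands in for the LLM, and the helper names (`extract_relevant`, `compress`), the stopword list and the sample texts are all invented for illustration.

```python
import re

# Hypothetical stopword list, just large enough for this example
STOPWORDS = {"what", "did", "they", "say", "about", "the", "a", "is", "of"}

def extract_relevant(text: str, query: str) -> str:
    """Keep only the sentences that share a keyword with the query.

    A keyword match is a crude stand-in for the LLM call that a real
    extractor would make for each document.
    """
    keywords = {w for w in re.findall(r"\w+", query.lower()) if w not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if keywords & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)

def compress(docs: list[str], query: str) -> list[str]:
    """Compress every document and drop those with nothing relevant left."""
    compressed = (extract_relevant(d, query) for d in docs)
    return [c for c in compressed if c]

# Invented sample "documents"
docs = [
    "Welcome to the class. We will use MATLAB for the assignments. Office hours are on Friday.",
    "Linear regression fits a line to data. Gradient descent minimizes the cost.",
]

result = compress(docs, "what did they say about matlab?")
print(result)
```

The first document shrinks to its single relevant sentence and the second is dropped altogether, which is the behaviour to look for in the compressed output of the cells that follow.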

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [None]:
# Just making output a bit nicer
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1} ({len(d.page_content)}):\n\n" + d.page_content for i, d in enumerate(docs)]))

In [None]:
# Wrap our vectorstore
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

In [None]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)
print(f"No. of docs used: {len(compressed_docs)}")

### EXERCISE - try other compressors (see link below)
https://python.langchain.com/docs/how_to/contextual_compression/#more-built-in-compressors-filters

#### Try for instance:
 - [**LLMChainFilter**](https://python.langchain.com/docs/how_to/contextual_compression/#llmchainfilter) - a slightly simpler but more robust LLM-based solution: it decides which documents to keep without rewriting their content
 - [**LLMListwiseRerank**](https://python.langchain.com/docs/how_to/contextual_compression/#llmlistwisererank) - reranks documents using the zero-shot listwise approach proposed in this [paper](https://arxiv.org/pdf/2305.02156)
 - [**EmbeddingsFilter**](https://python.langchain.com/docs/how_to/contextual_compression/#embeddingsfilter) - compares query and document embeddings instead of calling an LLM, for faster/cheaper results
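As a hint for the embeddings-based option, the toy sketch below shows the underlying idea: keep a document only if its embedding is similar enough to the query embedding. The two-dimensional vectors, the function names and the threshold value are assumptions chosen for illustration; a real embedding model returns vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embeddings_filter(query_vec, doc_vecs, threshold):
    """Return the indices of documents whose similarity to the query meets the threshold."""
    return [
        i for i, vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, vec) >= threshold
    ]

# Hand-made toy "embeddings"
query_vec = [1.0, 0.0]
doc_vecs = [
    [0.9, 0.1],  # points almost the same way as the query
    [0.0, 1.0],  # orthogonal to the query
    [0.7, 0.7],  # 45 degrees away from the query
]

kept = embeddings_filter(query_vec, doc_vecs, threshold=0.76)
print(kept)
```

In the notebook itself, something like `EmbeddingsFilter(embeddings=embedding_model, similarity_threshold=0.76)` could be passed as `base_compressor` in place of the LLM-based compressor; the threshold is an illustrative value that would need tuning for your data.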

## Combining various techniques

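Combining compression with MMR (maximal marginal relevance) can lead to even better results: MMR diversifies the retrieved documents, so the compressor spends its LLM calls on less redundant text.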
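To see why MMR helps when the index contains near-duplicate chunks, here is a minimal sketch of the selection rule on toy vectors. It is a simplified illustration, not the vector store's actual implementation; the vectors are invented, and the parameter name `lambda_mult` is chosen to mirror the search parameter LangChain uses for the relevance/diversity trade-off.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hand-made toy "embeddings"
query_vec = [1.0, 0.5]
doc_vecs = [
    [1.0, 0.0],  # most relevant
    [1.0, 0.0],  # exact duplicate of document 0
    [0.0, 1.0],  # less relevant, but new information
]

# Plain similarity search returns the duplicate pair
top_k = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True
)[:2]
# MMR skips the duplicate in favour of the different document
picked = mmr(query_vec, doc_vecs, k=2)
print(top_k, picked)
```

Plain top-k would hand the compressor the same text twice, whereas MMR passes it two different documents, so each LLM call in the compression step works on distinct content.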

In [None]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr")
)

In [None]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)
print(f"No. of docs used: {len(compressed_docs)}")