<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Context Compression

"Compression" is a slightly odd term here. The idea is that, after a vector store lookup returns documents, a second LLM call extracts only the passages relevant to the query and discards the rest.

Let's revisit our last example:

In [1]:
# Build a sample vectorDB
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

openai_token = os.getenv("OPENAI_API_KEY")

### OpenAI Connection for Embeddings

In [3]:
# We need to have OPENAI_API_KEY in the environment
embedding_function = OpenAIEmbeddings()

In [4]:
# docs

### Connect to the Embedded Documents via ChromaDB

In [5]:
# Load/connect DB
db_connection = Chroma(
    persist_directory='./mk_ultra',
    embedding_function=embedding_function
)

### Contextual Compression

In [6]:
from langchain_openai import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [7]:
# Chat model that will do the extraction (temperature=0 for more deterministic output)
llm = ChatOpenAI(temperature=0)
# Build the compressor on top of the chat model
compressor = LLMChainExtractor.from_llm(llm)

In [8]:
# The compression retriever consists of
# - a base compressor (from an LLM/chat)
# - a retriever (from a DB)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, 
    base_retriever=db_connection.as_retriever()
)
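
As a rough mental model of what the retriever above does, the sketch below chains the two steps by hand: fetch documents, then pass each one through an extractor and drop any that come back empty. Everything here is hypothetical and for illustration only: `Doc`, `toy_extractor`, `compress_then_return`, and the sample text are made up, and a simple keyword match stands in for the LLM call that `LLMChainExtractor` would make.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def _words(text: str) -> set:
    """Lowercase the words in `text` and strip basic punctuation."""
    return {w.strip("?.,!").lower() for w in text.split()}


def toy_extractor(query: str, doc: Doc) -> str:
    """Stand-in for the LLM call: keep sentences that share a long word with the query."""
    keywords = {w for w in _words(query) if len(w) > 4}  # ignore short filler words
    kept = [s.strip() for s in doc.page_content.split(".") if keywords & _words(s)]
    return ". ".join(kept) + ("." if kept else "")


def compress_then_return(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    extract: Callable[[str, Doc], str],
) -> List[Doc]:
    """Retrieve documents, extract the relevant part of each, and drop empty results."""
    compressed = []
    for doc in retrieve(query):
        text = extract(query, doc)
        if text:
            compressed.append(Doc(page_content=text))
    return compressed


# Made-up sample text, not taken from the notebook's database
corpus = [
    Doc("The report was written in 1990. It was declassified in 2005. The archive holds many boxes."),
    Doc("The weather was mild that year."),
]

result = compress_then_return(
    "When was this declassified?",
    retrieve=lambda q: corpus,  # assumption: a fake retriever that returns everything
    extract=toy_extractor,
)
print(result)
```

The first document is reduced to its one relevant sentence, and the second is dropped because nothing in it matches the query. The real compressor makes one LLM call per retrieved document, so it adds latency and cost in exchange for a tighter context.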

In [11]:
# Similarity search in the DB
docs = db_connection.similarity_search('When was this declassified?')

In [None]:
docs[0] # Document(page_content='The United States President

In [13]:
# Here, we use the compression retriever to extract the relevant passages:
# - Documents are retrieved according to the query
# - Each result is sent to the LLM, which keeps only the parts relevant to the query
# - The extracted passages are returned as (shorter) documents
compressed_docs = compression_retriever.invoke("When was this declassified?")

In [None]:
compressed_docs[0].page_content
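
One way to see how much the compressor trimmed is to compare character counts before and after. The helpers below are a minimal sketch, not part of LangChain: `total_chars` and `compression_ratio` are hypothetical names, and the sample objects are made-up stand-ins for anything that has a `page_content` attribute.

```python
from types import SimpleNamespace


def total_chars(documents):
    """Sum the length of page_content across a list of document-like objects."""
    return sum(len(d.page_content) for d in documents)


def compression_ratio(original_docs, compressed_docs):
    """Fraction of the original characters that survive compression (0.0 if no input)."""
    original = total_chars(original_docs)
    return total_chars(compressed_docs) / original if original else 0.0


# Made-up stand-ins; in the notebook you would pass `docs` and `compressed_docs`
sample_docs = [
    SimpleNamespace(page_content="a" * 80),
    SimpleNamespace(page_content="b" * 20),
]
sample_compressed = [SimpleNamespace(page_content="a" * 25)]

ratio = compression_ratio(sample_docs, sample_compressed)
print(f"{ratio:.0%} of the original text was kept")
```

Calling `compression_ratio(docs, compressed_docs)` in the notebook gives a rough figure, assuming the plain similarity search and the compression retriever's base retriever returned the same documents.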