# Context Compression

Context compression retrieves documents for a query as usual, but instead of returning each document in full, it returns a shorter (compressed) version that keeps only the parts relevant to the query.
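
To make the idea concrete before bringing in an LLM, here is a minimal sketch of compression as sentence filtering. It is a toy stand-in, not how LangChain does it: the function `toy_compress`, its stopword list, and the `min_overlap` threshold are all assumptions made for illustration, and the sample text is shortened from the document used later in this notebook.

```python
import re

# Hypothetical helper: keeps sentences that share enough words with the query.
# A real compressor (used later in this notebook) asks an LLM instead.
STOPWORDS = {"the", "a", "an", "of", "who", "what", "was", "is", "in", "to", "by"}


def words(text: str) -> set:
    """Lowercased alphabetic tokens of `text`, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def toy_compress(document: str, query: str, min_overlap: int = 2) -> str:
    """Return only the sentences sharing at least `min_overlap` words with the query."""
    query_terms = words(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if len(query_terms & words(s)) >= min_overlap]
    return " ".join(kept)


document = (
    "The Presidential Commission was led by Vice President Nelson Rockefeller. "
    "The commission issued a single report in 1975. "
    "It also publicized Project MKUltra."
)
compressed = toy_compress(document, "Who led the presidential commission?")
print(compressed)
```

Keyword overlap is a crude relevance signal; the LLM-based extractor below plays the same role but judges relevance from meaning rather than shared words.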

In [2]:
from langchain.vectorstores import Chroma
from langchain.document_loaders import WikipediaLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

### OpenAI Connection for Embeddings

In [4]:
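import os
# Read the API key from the environment and pass it to the embedding client explicitly
openai_api_key = os.getenv("OPENAI_API_KEY")
embedding_function = OpenAIEmbeddings(openai_api_key=openai_api_key)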

In [23]:
# docs

### Connect to Embedded Documents via ChromaDB

In [17]:
db_connection = Chroma(persist_directory='./some_new_mk_ultra', embedding_function=embedding_function)

### Contextual Compression

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [18]:
llm = ChatOpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

In [19]:
compressor.llm_chain.prompt.template

'Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: {question}\n> Context:\n>>>\n{context}\n>>>\nExtracted relevant parts:'
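
The template above has two placeholders, `{question}` and `{context}`. As a rough sketch of what the LLM receives for one retrieved document, the block below fills the same template text with plain `str.format`. LangChain does its own prompt formatting internally; `build_prompt` is a hypothetical helper used only for illustration.

```python
# Template text copied from the cell output above.
template = (
    "Given the following question and context, extract any part of the context "
    "*AS IS* that is relevant to answer the question. If none of the context is "
    "relevant return NO_OUTPUT. \n\n"
    "Remember, *DO NOT* edit the extracted parts of the context.\n\n"
    "> Question: {question}\n"
    "> Context:\n>>>\n{context}\n>>>\n"
    "Extracted relevant parts:"
)


def build_prompt(question: str, context: str) -> str:
    """Hypothetical helper: substitute the two placeholders in the template."""
    return template.format(question=question, context=context)


prompt = build_prompt(
    question="Who led the presidential commission?",
    context="The Presidential Commission was led by Vice President Nelson Rockefeller.",
)
print(prompt)
```

Two details of the prompt matter downstream: the model is told to copy text verbatim rather than paraphrase, and to answer `NO_OUTPUT` when a document contains nothing relevant.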

In [20]:
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=db_connection.as_retriever())
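
Conceptually, the compression retriever chains two steps: fetch candidates with the base retriever, then run each candidate through the compressor and drop the ones with nothing relevant. The sketch below mimics that flow with plain Python stand-ins. `Doc`, `compress_then_filter`, `fake_retrieve`, and `fake_extract` are hypothetical names, the `source` metadata values are made up, and no LangChain or OpenAI call is made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

NO_OUTPUT = "NO_OUTPUT"  # sentinel named in the extraction prompt


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def compress_then_filter(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    extract: Callable[[str, str], str],
) -> List[Doc]:
    """Retrieve documents, extract the relevant part of each, drop empty results."""
    results = []
    for doc in retrieve(query):
        extracted = extract(query, doc.page_content).strip()
        if not extracted or extracted == NO_OUTPUT:
            continue  # nothing relevant in this document
        # Only the text is replaced; metadata is carried over unchanged.
        results.append(Doc(page_content=extracted, metadata=doc.metadata))
    return results


def fake_retrieve(query: str) -> List[Doc]:
    """Stand-in for a vector store lookup; ignores the query."""
    return [
        Doc(
            "The commission issued a single report in 1975. The Presidential "
            "Commission was led by Vice President Nelson Rockefeller.",
            {"source": "doc-a"},
        ),
        Doc("It also publicized Project MKUltra.", {"source": "doc-b"}),
    ]


def fake_extract(query: str, text: str) -> str:
    """Stand-in for the LLM: keep sentences containing 'led by'."""
    hits = [s for s in text.split(". ") if "led by" in s]
    return " ".join(hits) if hits else NO_OUTPUT


compressed_results = compress_then_filter(
    "Who led the presidential commission?", fake_retrieve, fake_extract
)
for d in compressed_results:
    print(d.metadata["source"], "->", d.page_content)
```

Because documents that yield `NO_OUTPUT` are dropped, a compression retriever can return fewer documents than the base retriever did.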

In [31]:
docs = db_connection.similarity_search('Who led the presidential commission?')

In [32]:
docs

[Document(page_content='The United States President\'s Commission on CIA Activities within the United States was ordained by President Gerald Ford in 1975 to investigate the activities of the Central Intelligence Agency and other intelligence agencies within the United States. The Presidential Commission was led by Vice President Nelson Rockefeller, from whom it gained the nickname the Rockefeller Commission.\nThe commission was created in response to a December 1974 report in The New York Times that the CIA had conducted illegal domestic activities, including experiments on US citizens, during the 1960s. The commission issued a single report in 1975, touching upon certain CIA abuses including mail opening and surveillance of domestic dissident groups. It also publicized Project MKUltra, a CIA mind control research program.\nSeveral weeks later, committees were established in the House and Senate for a similar purpose. White House Personnel, including future Vice President Dick Cheney,

In [33]:
compressed_docs = compression_retriever.get_relevant_documents("Who led the presidential commission?")

In [34]:
compressed_docs[0].page_content

'The Presidential Commission was led by Vice President Nelson Rockefeller, from whom it gained the nickname the Rockefeller Commission.'
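
A quick way to check what the compressor did is to compare the two strings directly. The sketch below uses only the first two sentences of the similarity-search result (the full output above is truncated, so this is an excerpt, not the whole document) and the compressed text, and tests whether the compressed text is a verbatim substring and how much shorter it is.

```python
# First two sentences of the similarity-search result shown above (an excerpt).
retrieved_excerpt = (
    "The United States President's Commission on CIA Activities within the "
    "United States was ordained by President Gerald Ford in 1975 to investigate "
    "the activities of the Central Intelligence Agency and other intelligence "
    "agencies within the United States. The Presidential Commission was led by "
    "Vice President Nelson Rockefeller, from whom it gained the nickname the "
    "Rockefeller Commission."
)

# Text returned by the compression retriever.
compressed_text = (
    "The Presidential Commission was led by Vice President Nelson Rockefeller, "
    "from whom it gained the nickname the Rockefeller Commission."
)

# The prompt asks for text "AS IS", so the result should be a substring.
is_verbatim = compressed_text in retrieved_excerpt

# Length relative to the excerpt only; the full document is longer still.
ratio = len(compressed_text) / len(retrieved_excerpt)

print(f"verbatim extract: {is_verbatim}")
print(f"compressed/excerpt length: {ratio:.2f}")
```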

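As the output shows, context compression returns only the passage relevant to the query, which is much shorter than the full document returned by a plain similarity search.

The compressor changes only `page_content`; the document's metadata is carried over, so fields such as `summary` are still available on the compressed document.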

In [38]:
compressed_docs[0].metadata["summary"]

'The United States President\'s Commission on CIA Activities within the United States was ordained by President Gerald Ford in 1975 to investigate the activities of the Central Intelligence Agency and other intelligence agencies within the United States. The Presidential Commission was led by Vice President Nelson Rockefeller, from whom it gained the nickname the Rockefeller Commission.\nThe commission was created in response to a December 1974 report in The New York Times that the CIA had conducted illegal domestic activities, including experiments on US citizens, during the 1960s. The commission issued a single report in 1975, touching upon certain CIA abuses including mail opening and surveillance of domestic dissident groups. It also publicized Project MKUltra, a CIA mind control research program.\nSeveral weeks later, committees were established in the House and Senate for a similar purpose. White House Personnel, including future Vice President Dick Cheney, edited the results, ex