# Contextual Retrieval With Llama Index

This notebook covers contextual retrieval with the llama_index `DocumentContextExtractor`.

Based on an Anthropic [blog post](https://www.anthropic.com/news/contextual-retrieval), the concept is to:
1. Use an LLM to generate a 'context' for each chunk based on the entire document
2. Embed the chunk + context together
3. Reap the benefits of higher RAG accuracy
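
The three steps above can be sketched in plain Python. This is only a minimal illustration of the idea, not the `DocumentContextExtractor` implementation: `generate_context` is a hypothetical stand-in for the LLM call, and the `context: ...` layout of the text passed to the embedding model is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def generate_context(document: str, chunk_text: str) -> str:
    """Hypothetical stand-in for an LLM call that situates a chunk in its document."""
    title = document.strip().splitlines()[0]
    return f"Excerpt from '{title}'."


def contextualize(document: str, chunk_texts: list[str]) -> list[Chunk]:
    """Step 1: attach a document-aware context string to every chunk."""
    chunks = []
    for text in chunk_texts:
        context = generate_context(document, text)
        chunks.append(Chunk(text=text, metadata={"context": context}))
    return chunks


def text_to_embed(chunk: Chunk) -> str:
    """Step 2: the embedding model sees the context and the chunk together."""
    return f"context: {chunk.metadata['context']}\n\n{chunk.text}"


# Illustrative two-line document, made up for this sketch
document = "Essay on Lisp\nMcCarthy designed Lisp. It was later run on the IBM 704."
chunks = contextualize(
    document,
    ["McCarthy designed Lisp.", "It was later run on the IBM 704."],
)
for chunk in chunks:
    print(text_to_embed(chunk))
    print("---")
```

The second chunk never mentions Lisp on its own, but the text that gets embedded does, which is where the retrieval benefit is expected to come from.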

While you can also do this manually, the DocumentContextExtractor offers a lot of convenience and error handling, plus you can integrate it into your llama_index pipelines! Let's get started.

NOTE: This notebook costs about $0.02 every time you run it.

# Install Packages

In [None]:
%pip install llama-index
%pip install llama-index-readers-file
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai

# Set up an LLM
You can use the MockLLM or a real LLM of your choice here. Gemini 2.0 Flash and gpt-4o-mini work well.

In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

OPENAI_API_KEY = ""
llm = OpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
Settings.llm = llm

# Set up a data pipeline

We'll need an embedding model, an index store, a vector store, and a way to split text into token-sized chunks.

### Build Pipeline & Index

In [None]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.storage.docstore.simple_docstore import (
    SimpleDocumentStore,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Initialize document store and embedding model
docstore = SimpleDocumentStore()
embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Create storage contexts
storage_context = StorageContext.from_defaults(docstore=docstore)
storage_context_no_extra_context = StorageContext.from_defaults(
    docstore=docstore
)
text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=128, chunk_overlap=10
)
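
To make `chunk_size` and `chunk_overlap` concrete, here is a rough sketch of overlapping chunking. It is a simplification under stated assumptions: it counts space-separated words rather than tokenizer tokens, and `split_with_overlap` is a hypothetical helper, not how `TokenTextSplitter` is implemented.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split(" ")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Ten placeholder words: w0 ... w9
sample = " ".join(f"w{i}" for i in range(10))
chunks = split_with_overlap(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, so a sentence that straddles a boundary is less likely to lose all of its surrounding text.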

#### DocumentContextExtractor

In [None]:
# This is the new part!

from llama_index.core.extractors import DocumentContextExtractor

context_extractor = DocumentContextExtractor(
    # these 2 are mandatory
    docstore=docstore,
    max_context_length=128000,
    # below are optional
    llm=llm,  # default to Settings.llm
    oversized_document_strategy="warn",
    max_output_tokens=100,
    key="context",
    prompt=DocumentContextExtractor.SUCCINCT_CONTEXT_PROMPT,
)
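
As a rough mental model for `max_context_length` and `oversized_document_strategy`, the sketch below shows one way an oversized-document check could behave. Everything here is an assumption for illustration: only `"warn"` appears in the cell above, the `"error"` and `"ignore"` names and the skip behaviour are guesses, `check_document_size` is a hypothetical helper, and word count is used as a crude proxy for tokens.

```python
import logging

logger = logging.getLogger("context_sketch")

# Assumed strategy names; only "warn" is taken from the cell above
STRATEGIES = {"warn", "error", "ignore"}


def check_document_size(
    doc_text: str, max_context_length: int, strategy: str = "warn"
) -> bool:
    """Return True if the document fits; otherwise apply the strategy and return False."""
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    token_estimate = len(doc_text.split())  # crude proxy for a real tokenizer
    if token_estimate <= max_context_length:
        return True
    message = (
        f"Document has ~{token_estimate} tokens, over the limit of "
        f"{max_context_length}; skipping context generation."
    )
    if strategy == "error":
        raise ValueError(message)
    if strategy == "warn":
        logger.warning(message)
    # "ignore" falls through silently
    return False


fits = check_document_size("one two three", max_context_length=5)
too_big = check_document_size(
    "one two three four five six", max_context_length=5, strategy="ignore"
)
print(fits, too_big)
```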

#### Build Index

In [None]:
import nest_asyncio

nest_asyncio.apply()

index = VectorStoreIndex.from_documents(
    documents=[],
    storage_context=storage_context,
    embed_model=embed_model,
    transformations=[text_splitter, context_extractor],
)

index_nocontext = VectorStoreIndex.from_documents(
    documents=[],
    storage_context=storage_context_no_extra_context,
    embed_model=embed_model,
    transformations=[text_splitter],
)

0it [00:00, ?it/s]


# Load Data

In [None]:
!wget "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay_ambiguated.txt" -O "paul_graham_essay_ambiguated.txt"

In [None]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_files=["paul_graham_essay_ambiguated.txt"]
)
documents = reader.load_data()

# Run the pipeline, then search

In [None]:
import nest_asyncio

nest_asyncio.apply()

# The docstore must be kept up to date for the DocumentContextExtractor to function.
# Every time we insert a doc, the entire pipeline runs and context is generated.
storage_context.docstore.add_documents(documents)
for doc in documents:
    index.insert(doc)

storage_context_no_extra_context.docstore.add_documents(documents)
for doc in documents:
    index_nocontext.insert(doc)

100%|██████████| 32/32 [00:10<00:00,  2.94it/s]


In [None]:
test_question = "Which chunks of text discuss the IBM 704?"
retriever = index.as_retriever(similarity_top_k=5)
nodes_fromcontext = retriever.retrieve(test_question)

retriever_nocontext = index_nocontext.as_retriever(similarity_top_k=5)
nodes_nocontext = retriever_nocontext.retrieve(test_question)

# Verify all jobs were successfully completed
assert all(node.metadata.get("context") for node in nodes_fromcontext)

In [None]:
# Print each node's content
print("==========")
print("NO CONTEXT")
for i, node in enumerate(nodes_nocontext, 1):
    print(f"\nChunk {i}:")
    print(f"Score: {node.score}")  # Similarity score
    print(f"Content: {node.node.text}")  # The actual text content

# Print each node's content
print("==========")
print("WITH CONTEXT")
for i, node in enumerate(nodes_fromcontext, 1):
    print(f"\nChunk {i}:")
    print(f"Score: {node.score}")  # Similarity score
    print(f"Content: {node.node.text}")  # The actual text content

NO CONTEXT

Chunk 1:
Score: 0.47188236572598097
Content: McCarthy's interpreter into IBM 704 machine language, and from then on Lisp also became a programming language in the conventional sense. But its origins as a model of computation gave it a power and elegance that other languages couldn't match. This quality was what attracted me in college, though I didn't understand why at the time.
McCarthy's 1960 version did nothing more than interpret Lisp expressions. It was missing many features you'd want in a programming language. So these had to be added, and when they were, they weren't defined using his original

Chunk 2:
Score: 0.3765542523138754
Content: is a general lesson here that our experience with Y Combinator also teaches: Customs continue to constrain you long after the restrictions that caused them have disappeared. Customary VC practice had once, like the customs about publishing essays, been based on real constraints. Startups had once been much more expensive to start, a

The no-context retriever does well at getting the exact match (IBM 704), but several chunks (2, 3) seem totally irrelevant. The context retriever manages to get multiple chunks discussing Graham programming on the IBM 704. Unfortunately, it totally misses the chunk that references the 704 directly by name, which shows a tradeoff and the need for hybrid search, something Anthropic also mentions in their blog post. You may get different results with different embedding models, prompts, and contextualization LLMs.
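
One common way to combine an exact-match (keyword) ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below illustrates that general technique and is not something this notebook runs: the chunk IDs and both rankings are made up, `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is just a conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by ID
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Made-up rankings: a keyword search and an embedding search over the same chunks
keyword_ranking = ["chunk_a", "chunk_b", "chunk_c"]
embedding_ranking = ["chunk_d", "chunk_a", "chunk_e"]

fused = reciprocal_rank_fusion([keyword_ranking, embedding_ranking])
print(fused)
```

A chunk that appears in both lists (here `chunk_a`) rises to the top, while chunks found by only one retriever are still kept, so an exact-name match is not lost when the embedding search misses it.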