# CONTEXTUAL RETRIEVAL WITH LLAMA_INDEX

This notebook covers contextual retrieval with the llama_index `DocumentContextExtractor`.

Based on an Anthropic [blog post](https://www.anthropic.com/news/contextual-retrieval), the concept is to:
1. Use an LLM to generate a 'context' for each chunk based on the entire document
2. Embed the chunk + context together
3. Reap the benefits of higher RAG accuracy
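
The three steps above can be sketched in plain Python. This is only an illustration of the idea, not how `DocumentContextExtractor` is implemented: the prompt wording, the `contextualize_chunks` helper, and the `fake_llm` stand-in are all assumptions made up for this example.

```python
from typing import Callable, List

# Hypothetical prompt template; the wording is illustrative, not the library default.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document."
)


def contextualize_chunks(
    document: str, chunks: List[str], generate: Callable[[str], str]
) -> List[str]:
    """Return one 'context + chunk' string per chunk, ready to be embedded."""
    texts = []
    for chunk in chunks:
        # Step 1: ask the LLM for a context, showing it the whole document.
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # Step 2: the string that gets embedded is context and chunk together.
        texts.append(f"{context}\n\n{chunk}")
    return texts


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "From an essay about learning to program."


document = "I started programming at 13. Later I studied painting."
chunks = ["I started programming at 13.", "Later I studied painting."]
for text in contextualize_chunks(document, chunks, fake_llm):
    print(text)
    print("---")
```

Note that the LLM is called once per chunk with the full document in the prompt, which is why context generation can be slow and costly on large documents.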

While you can also do this manually, the DocumentContextExtractor offers a lot of convenience and error handling, plus you can integrate it into your llama_index pipelines! Let's get started.

# INSTALL PACKAGES

In [None]:
%pip install llama-index-readers-file
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-huggingface-api

# SETUP AN LLM
You can use `MockLLM` or a real LLM of your choice here. Flash 2 and gpt-4o-mini work well.

In [None]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.core import Settings
import os

HF_TOKEN = os.getenv("HUGGING_FACE_TOKEN")
# HF_TOKEN = ''
# Call Hugging Face's Inference API (no explicit token is passed if HUGGING_FACE_TOKEN is unset)
llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HF_TOKEN
)

Settings.llm = llm

# Set up a data pipeline

We'll need an embedding model, an index store, a vector store, and a way to split tokens.

# Build Pipeline & Index

In [None]:
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.storage.index_store.simple_index_store import (
    SimpleIndexStore,
)
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.core.storage.docstore.simple_docstore import (
    SimpleDocumentStore,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Initialize document store and embedding model
docstore = SimpleDocumentStore()
embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=10
)
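
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking. It is an assumption-laden simplification: it counts whitespace-separated words rather than real tokenizer tokens, and `split_with_overlap` is a hypothetical helper, not part of llama_index.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


for chunk in split_with_overlap("a b c d e f g", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last `chunk_overlap` words of the previous one, so a sentence that straddles a boundary is not cut off from all of its surroundings.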

#### DocumentContextExtractor

In [None]:
# This is the new part!

from llama_index.core.extractors import DocumentContextExtractor

context_extractor = DocumentContextExtractor(
    # mandatory
    docstore=docstore,
    max_context_length=128000,
    # optional
    llm=llm,  # defaults to Settings.llm
    oversized_document_strategy="warn",
    max_output_tokens=100,
    key="context",
    prompt=DocumentContextExtractor.DEFAULT_CONTEXT_PROMPT,
)
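
The `key="context"` argument names the metadata field where the generated context is stored on each node. The sketch below shows, roughly, how a metadata entry can end up in the string that gets embedded while the chunk text itself stays unchanged. The `Chunk` class and `text_for_embedding` helper are hypothetical stand-ins, and the `key: value` layout is an assumption about the default formatting, so treat this as an illustration rather than the library's code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Chunk:
    """Hypothetical stand-in for a node: raw text plus a metadata dict."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def text_for_embedding(chunk: Chunk) -> str:
    """Prepend 'key: value' metadata lines to the chunk text (assumed layout)."""
    meta = "\n".join(f"{k}: {v}" for k, v in chunk.metadata.items())
    return f"{meta}\n\n{chunk.text}" if meta else chunk.text


chunk = Chunk(text="Later I studied painting.")
chunk.metadata["context"] = "The author describes leaving programming for art school."

print(chunk.text)  # the original chunk text is untouched
print("---")
print(text_for_embedding(chunk))  # the embedded string also carries the context
```

One consequence of this design is that printing a retrieved node's text shows only the original chunk; the context lives in the node's metadata.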

#### Build Index

In [None]:
import nest_asyncio

nest_asyncio.apply()

index = VectorStoreIndex.from_documents(
    documents=[],
    storage_context=storage_context,
    embed_model=embed_model,
    transformations=[text_splitter, context_extractor],
)

index_nocontext = VectorStoreIndex.from_documents(
    documents=[],
    storage_context=storage_context,
    embed_model=embed_model,
    transformations=[text_splitter],
)

0it [00:00, ?it/s]


# LOAD DATA

In [None]:
!wget "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt" -O "paul_graham_essay.txt"

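
On systems without `wget` (for example a default Windows shell), the same file can be fetched from Python using only the standard library. This is a minimal sketch; `download_if_missing` is a hypothetical helper name, and it skips the download when the file already exists.

```python
from pathlib import Path
from urllib.request import urlretrieve

ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)


def download_if_missing(url: str, dest: str) -> Path:
    """Download url to dest unless dest already exists; return the local path."""
    path = Path(dest)
    if not path.exists():
        urlretrieve(url, str(path))
    return path
```

Calling `download_if_missing(ESSAY_URL, "paul_graham_essay.txt")` in a cell places the essay next to the notebook, where the reader cell expects it.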


In [None]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"])
documents = reader.load_data()

# Run the pipeline, then search

In [None]:
# The docstore must be kept up to date for the DocumentContextExtractor to function.
# Every time we insert a doc, the entire pipeline runs and context is generated.
storage_context.docstore.add_documents(documents)
for doc in documents:
    index.insert(doc)
    index_nocontext.insert(doc)

In [None]:
# Verify all nodes have context
assert context_extractor.is_job_complete()


In [None]:
retriever = index.as_retriever(similarity_top_k=5)
nodes_fromcontext = retriever.retrieve("Who is Paul Graham.")

retriever_nocontext = index_nocontext.as_retriever(similarity_top_k=5)
nodes_nocontext = retriever_nocontext.retrieve("Who is Paul Graham.")
# Print each node's content
print("==========")
print("NO CONTEXT")
for i, node in enumerate(nodes_nocontext, 1):
    print(f"\nChunk {i}:")
    print(f"Score: {node.score}")  # Similarity score
    print(f"Content: {node.node.text}")  # The actual text content

# Print each node's content
print("==========")
print("WITH CONTEXT")
for i, node in enumerate(nodes_fromcontext, 1):
    print(f"\nChunk {i}:")
    print(f"Score: {node.score}")  # Similarity score
    print(f"Content: {node.node.text}")  # The actual text content
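
For intuition about what `similarity_top_k=5` and the printed scores mean, here is a small sketch of top-k retrieval. It assumes cosine similarity between a query embedding and each chunk embedding; the `cosine` and `top_k` helpers and the two-dimensional toy vectors are made up for illustration and are not the vector store's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], chunk_vecs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs with the highest similarity."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones come from the embedding model.
chunk_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for chunk_id, score in top_k([1.0, 0.0], chunk_vecs, k=2):
    print(chunk_id, round(score, 3))
```

Because the context is embedded together with the chunk, it shifts each chunk's vector, which is what can change the ranking between the two indexes.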



Paul Graham is the author of the excerpt provided in the given context. He is a well-known entrepreneur, essayist, and venture capitalist. He co-founded the startup incubator Y Combinator, which has helped launch companies such as Airbnb, Stripe, and Reddit. Graham is also the author of several books, including "Hackers & Painters" and "The Matter with Things."


In [None]:
# Save the index and vector store, because generating context can take time and money!
storage_context.persist(persist_dir="./")