# Contextual RAG

In this notebook, we'll explore Contextual Retrieval, a technique that improves the accuracy of vector search by attaching extra context to each chunk of a document. Both the full document and the chunk are passed to an LLM, which is asked to write a succinct description of where the chunk fits within the document.

This combats the lost-context problem that chunking introduces. For example, if a text is split into sentences, a later sentence loses whatever it relied on from the earlier ones.
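
To make the lost-context problem concrete, here is a minimal sketch using a made-up two-sentence text. The text and the naive regex splitter are illustrative assumptions and are not part of the dataset used later. After splitting, the second chunk no longer says what "It" refers to, so a query about the company is unlikely to match it.

```python
import re

# Hypothetical example text, invented for illustration only.
text = "ACME Corp released its quarterly report. It shows revenue grew by 3%."

# Naive sentence splitter: break on whitespace that follows '.', '!' or '?'.
chunks = re.split(r"(?<=[.!?])\s+", text)

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# The second chunk talks about revenue but never names the company.
second_chunk_names_company = "ACME" in chunks[1]
print("second chunk names the company:", second_chunk_names_company)
```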

The idea is to:
1. Split each document into chunks (nothing new, just like vanilla RAG).
2. For each chunk, ask an LLM to write a context for that chunk (this is the new part).
3. Prepend that context to the original chunk.
4. Build BM25 and vector indexes over the contextualized chunks for hybrid search (new to you? See LanceDB's blog post on hybrid search).
5. Search as usual!
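
Steps 2 and 3 can be sketched without any external service. In the snippet below, `add_context` and `fake_summarize` are hypothetical names; `fake_summarize` is only a stand-in for the LLM call that the notebook makes later, so the output is not a real context.

```python
from typing import Callable


def add_context(full_doc: str, chunk: str, summarize: Callable[[str, str], str]) -> str:
    """Return the chunk with a generated context placed in front of it."""
    context = summarize(full_doc, chunk).strip()
    return context + "\n" + chunk


def fake_summarize(full_doc: str, chunk: str) -> str:
    # Stand-in for the LLM call (assumption): a real implementation would
    # send both the document and the chunk to a model.
    return f"Part of a document of {len(full_doc)} characters."


doc = "fn main() { setup(); run(); }"
chunks = ["fn main() {", "setup(); run(); }"]

# The contextualized text is what gets embedded and indexed.
contextualized = [add_context(doc, c, fake_summarize) for c in chunks]
for text in contextualized:
    print(repr(text))
```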

**Switch the runtime to GPU before running this notebook.**

## Install Dependencies

In [1]:
# Install
!pip install -U openai lancedb einops sentence-transformers transformers datasets tantivy rerankers -qq


In [2]:
# Get the data
!wget -P ./data/ https://raw.githubusercontent.com/anthropics/anthropic-cookbook/refs/heads/main/skills/contextual-embeddings/data/codebase_chunks.json

--2024-10-07 09:03:31--  https://raw.githubusercontent.com/anthropics/anthropic-cookbook/refs/heads/main/skills/contextual-embeddings/data/codebase_chunks.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1126046 (1.1M) [text/plain]
Saving to: ‘./data/codebase_chunks.json’


2024-10-07 09:03:32 (41.2 MB/s) - ‘./data/codebase_chunks.json’ saved [1126046/1126046]



### Set the OpenAI API key as an environment variable

In [3]:
# IMPORT

import os, re, random, json
import pandas as pd
from datasets import load_dataset
import torch
import gc
import lancedb
import openai
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector
from tqdm.auto import tqdm
from openai import OpenAI

pd.set_option("max_colwidth", 400)

OAI_KEY = "sk-proj-...."  # Replace with your OpenAI Key
os.environ["OPENAI_API_KEY"] = OAI_KEY

gpt_client = OpenAI(api_key=OAI_KEY)  # For context text generation

model = (
    get_registry()
    .get("sentence-transformers")
    .create(name="BAAI/bge-small-en-v1.5", device="cuda")
)  # For embedding
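
Hardcoding a key in a notebook makes it easy to leak. As an alternative, here is a minimal sketch that reads the key from the environment and fails early if it is missing. The helper name `resolve_api_key` is hypothetical, and the demo passes an explicit mapping so that it runs without a real key.

```python
import os
from typing import Mapping


def resolve_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return key


# Demo with an explicit mapping (placeholder value, not a real key).
demo_key = resolve_api_key({"OPENAI_API_KEY": "sk-demo"})
print(demo_key)
```

With the variable exported, `OpenAI()` can be constructed without arguments, because the OpenAI Python client reads `OPENAI_API_KEY` from the environment by default.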

## Data Loading and Chunking

In [4]:
def load_raw_data(datapath="/content/data/codebase_chunks.json", debugging=False):
    with open(datapath, "r") as f:
        dataset = json.load(f)
    if debugging:
        print("Debugging Mode: Using few doc samples only ")
        dataset = dataset[:5]  # just use a sample only

    data = []
    num_docs = len(dataset)
    total_chunks = sum(len(doc["chunks"]) for doc in dataset)

    with tqdm(
        total=num_docs,
        desc=f"Processing {total_chunks} chunks from {len(dataset)} docs",
    ) as pbar:
        for doc in dataset:  # Full document
            for chunk in doc["chunks"]:  # Each document has multiple chunks
                data.append(
                    {
                        # For contextual retrieval this is not embedded directly;
                        # the embedded text is built later from an LLM-generated
                        # context plus this chunk.
                        "raw_chunk": chunk["content"],
                        # Needed to generate the context. Storing it in the DB
                        # inflates the DB size considerably.
                        "full_doc": doc["content"],
                        "doc_id": doc["doc_id"],
                        "original_uuid": doc["original_uuid"],
                        "chunk_id": chunk["chunk_id"],
                        "original_index": chunk["original_index"],
                    }
                )
            pbar.update(1)  # One tick per document, matching total=num_docs

    return data


raw_chunks = load_raw_data(
    debugging=True
)  # For debugging and tutorial purposes, use only the first few documents

Debugging Mode: Using few doc samples only 


Processing 29 chunks from 5 docs:   0%|          | 0/5 [00:00<?, ?it/s]
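
The JSON layout that `load_raw_data` expects can be read off the keys it accesses: each document has `doc_id`, `original_uuid`, `content` and a list of `chunks`, and each chunk has `chunk_id`, `original_index` and `content`. The miniature dataset below is made up to show that shape and the one-row-per-chunk flattening; its values are not taken from `codebase_chunks.json`.

```python
# Hypothetical miniature dataset with the same keys that load_raw_data reads.
dataset = [
    {
        "doc_id": "doc_1",
        "original_uuid": "uuid-1",
        "content": "line a\nline b",
        "chunks": [
            {"chunk_id": "doc_1_chunk_0", "original_index": 0, "content": "line a"},
            {"chunk_id": "doc_1_chunk_1", "original_index": 1, "content": "line b"},
        ],
    }
]

# Flatten to one row per chunk, carrying the parent document along.
rows = [
    {
        "raw_chunk": chunk["content"],
        "full_doc": doc["content"],
        "doc_id": doc["doc_id"],
        "original_uuid": doc["original_uuid"],
        "chunk_id": chunk["chunk_id"],
        "original_index": chunk["original_index"],
    }
    for doc in dataset
    for chunk in doc["chunks"]
]

print(len(rows), rows[1]["chunk_id"])
```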

## Vanilla RAG

In [5]:
class VanillaDocuments(LanceModel):
    vector: Vector(model.ndims()) = model.VectorField()  # Default field
    raw_chunk: str = (
        model.SourceField()
    )  # The column (field) in the DB that gets embedded
    doc_id: str  # The remaining fields are just metadata
    original_uuid: str
    chunk_id: str
    original_index: int
    full_doc: str


db = lancedb.connect("./db")
vanilla_table = db.create_table("vanilla_documents", schema=VanillaDocuments)

vanilla_table.add(raw_chunks)  # Ingest docs with auto-vectorization
vanilla_table.create_fts_index(
    "raw_chunk"
)  # Create a full-text search (FTS) index up front so BM25 can be used later


In [6]:
QUERY = "implement corpus management with event handling"

In [7]:
vanilla_table.search(QUERY, query_type="hybrid").limit(3).to_pandas().drop(
    ["vector", "original_uuid"], axis=1
)

index,raw_chunk,doc_id,chunk_id,original_index,full_doc,_relevance_score
0,"#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...",doc_2,doc_2_chunk_0,0,"#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...",0.032002
1,"use core::{ffi::c_void, fmt::Debug};\nuse std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};\n\nuse libafl::{\n events::EventFirer,\n executors::ExitKind,\n feedbacks::Feedback,\n inputs::UsesInput,\n observers::{Observer, ObserversTuple},\n state::State,\n Error,\n};\nuse libafl_bolts::Named;\nuse libc::SIGABRT;\nuse serde::{Deserialize, Serialize};\n\nextern ""C"" {\n...",doc_3,doc_3_chunk_0,0,"use core::{ffi::c_void, fmt::Debug};\nuse std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};\n\nuse libafl::{\n events::EventFirer,\n executors::ExitKind,\n feedbacks::Feedback,\n inputs::UsesInput,\n observers::{Observer, ObserversTuple},\n state::State,\n Error,\n};\nuse libafl_bolts::Named;\nuse libc::SIGABRT;\nuse serde::{Deserialize, Serialize};\n\nextern ""C"" {\n...",0.016393
2,"// The Monitor trait define how the fuzzer stats are displayed to the user\n #[cfg(not(feature = ""tui""))]\n let mon = SimpleMonitor::new(|s| println!(""{s}""));\n #[cfg(feature = ""tui"")]\n let ui = TuiUI::with_version(String::from(""Baby Fuzzer""), String::from(""0.0.1""), false);\n #[cfg(feature = ""tui"")]\n let mon = TuiMonitor::new(ui);\n\n // The event manager handle the ...",doc_2,doc_2_chunk_4,4,"#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...",0.016393
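
The `_relevance_score` values above are numerically consistent with reciprocal rank fusion (RRF) using a constant of 60: 1/61 ≈ 0.016393 and 1/62 + 1/63 ≈ 0.032002. Which reranker LanceDB applies by default depends on the installed version, so treat the sketch below as an illustration of how such scores can arise, not as a description of LanceDB internals. The two input rankings are hypothetical and were chosen only so that the fused scores match the table.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids; each hit adds 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical per-retriever rankings (assumptions, not taken from the notebook run).
fts_ranking = ["doc_3_chunk_0", "doc_2_chunk_0", "doc_2_chunk_1"]
vector_ranking = ["doc_2_chunk_4", "doc_2_chunk_2", "doc_2_chunk_0"]

fused = reciprocal_rank_fusion([fts_ranking, vector_ranking])
for item_id, score in fused[:3]:
    print(item_id, round(score, 6))
```

A chunk that appears in both lists, even at modest ranks, can outscore a chunk that is first in only one list, which is the pattern visible in the table.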


## Contextual Retrieval with Prompt Caching

In [9]:
def create_context_prompt(full_document_text, chunk_text):
    prompt = f"""
<document>
{full_document_text}
</document>

Here is the chunk we want to situate within the whole document
<chunk>
{chunk_text}
</chunk>

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
"""
    return (
        prompt,
        gpt_client.chat.completions.create(
            model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
        )
        .choices[0]
        .message.content.strip(),
    )


for chunk in raw_chunks:
    prompt, response = create_context_prompt(chunk["full_doc"], chunk["raw_chunk"])
    chunk["prompt"] = prompt
    chunk["chunk_context"] = response
    chunk["chunk_with_context"] = chunk["chunk_context"] + "\n" + chunk["raw_chunk"]
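
Prompt caching generally works on identical prompt prefixes, so it helps to keep the large, repeated part (the document) at the start of the prompt and to process all chunks of one document back to back. The notebook's prompt already puts the document first. The sketch below makes that arrangement explicit: `build_messages` and the sample rows are hypothetical, and whether a cache hit actually occurs depends on the provider and its minimum prompt length.

```python
from itertools import groupby
from operator import itemgetter


def build_messages(full_doc: str, chunk: str) -> list:
    """Put the document first so every chunk of the same document shares a prefix."""
    document_part = f"<document>\n{full_doc}\n</document>"
    chunk_part = (
        "Here is the chunk we want to situate within the whole document\n"
        f"<chunk>\n{chunk}\n</chunk>\n"
        "Answer only with the succinct context and nothing else."
    )
    return [{"role": "user", "content": document_part + "\n\n" + chunk_part}]


# Made-up rows, deliberately out of order.
rows = [
    {"doc_id": "doc_2", "full_doc": "DOC TWO", "raw_chunk": "b1"},
    {"doc_id": "doc_1", "full_doc": "DOC ONE", "raw_chunk": "a1"},
    {"doc_id": "doc_2", "full_doc": "DOC TWO", "raw_chunk": "b2"},
]

# Sort by document so that requests sharing a prefix are sent consecutively.
rows_sorted = sorted(rows, key=itemgetter("doc_id"))
batches = {
    doc_id: [build_messages(r["full_doc"], r["raw_chunk"]) for r in group]
    for doc_id, group in groupby(rows_sorted, key=itemgetter("doc_id"))
}

for doc_id, requests in batches.items():
    print(doc_id, len(requests))
```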

In [10]:
class Documents(LanceModel):
    vector: Vector(model.ndims()) = model.VectorField()  # Default field
    text: str = (
        model.SourceField()
    )  # The column (field) in the DB that gets embedded
    doc_id: str  # The remaining fields are just metadata
    raw_chunk: str
    full_doc: str
    original_uuid: str
    chunk_id: str
    original_index: int


KEYS = [
    "raw_chunk",
    "full_doc",
    "doc_id",
    "original_uuid",
    "chunk_id",
    "original_index",
]

context_documents = []
for chunk in raw_chunks:
    temp = {
        "text": chunk["chunk_with_context"]
    }  # The embedding is created from 'text', i.e. chunk context + raw chunk

    for key in KEYS:
        temp[key] = chunk[key]  # Copy the remaining metadata
    context_documents.append(temp)


context_table = db.create_table("added_context_table", schema=Documents)

context_table.add(context_documents)  # Ingest docs with auto-vectorization
context_table.create_fts_index(
    "text"
)  # Create a full-text search (FTS) index up front so BM25 can be used later
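
The comment in `load_raw_data` points out that storing `full_doc` inflates the database. A minimal sketch of building rows without it follows. `STORED_KEYS`, `to_row` and the sample chunk are hypothetical, and the table schema would need its `full_doc` field removed to match.

```python
# Metadata columns to keep; 'full_doc' is intentionally left out (assumption).
STORED_KEYS = ["raw_chunk", "doc_id", "original_uuid", "chunk_id", "original_index"]


def to_row(chunk: dict) -> dict:
    """Build a table row from a processed chunk, skipping the full document text."""
    row = {"text": chunk["chunk_with_context"]}
    row.update({key: chunk[key] for key in STORED_KEYS})
    return row


# Made-up chunk with the same keys the notebook attaches to each chunk.
sample_chunk = {
    "raw_chunk": "let x = 1;",
    "full_doc": "fn main() { let x = 1; }",
    "doc_id": "doc_1",
    "original_uuid": "uuid-1",
    "chunk_id": "doc_1_chunk_0",
    "original_index": 0,
    "chunk_context": "Declares x inside main.",
    "chunk_with_context": "Declares x inside main.\nlet x = 1;",
}

row = to_row(sample_chunk)
print(sorted(row.keys()))
```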

Now let's run the same query against the contextualized table and see the difference.

In [11]:
context_table.search(QUERY, query_type="hybrid").limit(3).to_pandas().drop(
    ["vector", "original_uuid"], axis=1
)

index,text,doc_id,raw_chunk,full_doc,chunk_id,original_index,_relevance_score
0,"This chunk is part of the main function in a fuzzing application, specifically focusing on the setup of the monitor for displaying fuzzer statistics and the event manager for handling events during the fuzzing loop. It follows the initialization of state, feedback mechanisms, and sets up the fuzzer with a scheduling policy for managing test cases from the corpus.\n // The Monitor trait defi...",doc_2,"// The Monitor trait define how the fuzzer stats are displayed to the user\n #[cfg(not(feature = ""tui""))]\n let mon = SimpleMonitor::new(|s| println!(""{s}""));\n #[cfg(feature = ""tui"")]\n let ui = TuiUI::with_version(String::from(""Baby Fuzzer""), String::from(""0.0.1""), false);\n #[cfg(feature = ""tui"")]\n let mon = TuiMonitor::new(ui);\n\n // The event manager handle the ...","#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...",doc_2_chunk_4,4,0.032787
1,"The chunk contains Rust code that includes the necessary imports and configurations for a fuzzing framework using the libafl library. It sets up the environment for the fuzzer, including the configuration for different operating systems and features, and imports various modules required for corpus management, executors, feedback mechanisms, and mutators, establishing the foundational component...",doc_2,"#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...","#[cfg(windows)]\nuse std::ptr::write_volatile;\nuse std::{path::PathBuf, ptr::write};\n\n#[cfg(feature = ""tui"")]\nuse libafl::monitors::tui::{ui::TuiUI, TuiMonitor};\n#[cfg(not(feature = ""tui""))]\nuse libafl::monitors::SimpleMonitor;\nuse libafl::{\n corpus::{InMemoryCorpus, OnDiskCorpus},\n events::SimpleEventManager,\n executors::{inprocess::InProcessExecutor, ExitKind},\n feedba...",doc_2_chunk_0,0,0.032258
2,"The chunk contains necessary imports, declarations of static atomic variables, and the definition of an external C function, which are foundational components for managing memory allocation tracking in the context of a memory monitoring system within a Rust program that utilizes the libafl library for fuzzing.\nuse core::{ffi::c_void, fmt::Debug};\nuse std::sync::atomic::{AtomicBool, AtomicUsi...",doc_3,"use core::{ffi::c_void, fmt::Debug};\nuse std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};\n\nuse libafl::{\n events::EventFirer,\n executors::ExitKind,\n feedbacks::Feedback,\n inputs::UsesInput,\n observers::{Observer, ObserversTuple},\n state::State,\n Error,\n};\nuse libafl_bolts::Named;\nuse libc::SIGABRT;\nuse serde::{Deserialize, Serialize};\n\nextern ""C"" {\n...","use core::{ffi::c_void, fmt::Debug};\nuse std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};\n\nuse libafl::{\n events::EventFirer,\n executors::ExitKind,\n feedbacks::Feedback,\n inputs::UsesInput,\n observers::{Observer, ObserversTuple},\n state::State,\n Error,\n};\nuse libafl_bolts::Named;\nuse libc::SIGABRT;\nuse serde::{Deserialize, Serialize};\n\nextern ""C"" {\n...",doc_3_chunk_0,0,0.015873


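Comparing the two result sets shows the effect of contextual retrieval. With hybrid search over the raw chunks, the top hit is `doc_2_chunk_0`, a block of imports. With the LLM-generated context prepended, the top hit is `doc_2_chunk_4`, the chunk that actually sets up the monitor and the event manager. Both searches use LanceDB's hybrid search (BM25 plus vector search, merged by LanceDB's reranker); only the indexed text differs.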
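
The rank changes can also be checked programmatically. The sketch below hardcodes the `chunk_id` order from the two result tables above; `rank_changes` is a hypothetical helper. In the notebook, the lists could instead be read from the result DataFrames, for example with `df["chunk_id"].tolist()`.

```python
# chunk_id order copied from the two result tables in this notebook.
vanilla_top3 = ["doc_2_chunk_0", "doc_3_chunk_0", "doc_2_chunk_4"]
contextual_top3 = ["doc_2_chunk_4", "doc_2_chunk_0", "doc_3_chunk_0"]


def rank_changes(before, after):
    """Map each id present in both lists to (rank_before, rank_after), 1-based."""
    after_rank = {chunk_id: i for i, chunk_id in enumerate(after, start=1)}
    return {
        chunk_id: (i, after_rank[chunk_id])
        for i, chunk_id in enumerate(before, start=1)
        if chunk_id in after_rank
    }


changes = rank_changes(vanilla_top3, contextual_top3)
for chunk_id, (old, new) in changes.items():
    print(f"{chunk_id}: rank {old} -> {new}")
```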