# Metadata Replacement + Node Sentence Window
In this notebook, we use the ```SentenceWindowNodeParser``` to parse documents into single sentences per node. Each node also contains a "window" with the sentences on either side of the node sentence.

Then, after retrieval and before the retrieved sentences are passed to the LLM, each single sentence is replaced with a window containing its surrounding sentences using the ```MetadataReplacementPostProcessor```.

This is most useful for large documents/indexes, as it helps to retrieve more fine-grained details.

In this notebook, the sentence window is 3 sentences on either side of the original sentence (```window_size=3```).

In this case, chunk size settings are not used; the window settings determine how much context each node carries.
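
As a rough illustration of what the parser produces, the sketch below builds one record per sentence and attaches a window of neighbouring sentences as metadata. It is a plain-Python approximation, not the library's implementation: the regex splitter and the helper name `build_sentence_windows` are assumptions made for this example, while the metadata keys mirror the ones used in this notebook.

```python
import re


def build_sentence_windows(text, window_size=3):
    """Split text into sentences and attach a window of neighbours to each.

    Hypothetical helper for illustration only; the regex splitter is naive.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        # Take up to `window_size` sentences on each side, clipped at the edges.
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,  # the part that would be embedded
                "metadata": {
                    "original_text": sentence,
                    "window": " ".join(sentences[start:end]),
                },
            }
        )
    return nodes


sample = "A one. B two. C three. D four. E five."
nodes = build_sentence_windows(sample, window_size=1)
print(nodes[2]["metadata"]["window"])
```

Each record keeps the single sentence as its text, so retrieval stays fine-grained, while the wider window travels along in the metadata.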

In [1]:
from dotenv import load_dotenv
load_dotenv()
import os

In [2]:
%pip install llama-index-embeddings-openai
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai




In [3]:
%load_ext autoreload
%autoreload 2

In [10]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceSplitter

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# base node parser is a sentence splitter
text_splitter = SentenceSplitter()

llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
# embed_model = HuggingFaceEmbedding(
#     model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
# )


# Settings must be imported before it is configured
from llama_index.core import Settings

# use OpenAI embeddings (OpenAIEmbedding is already imported above)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Settings.llm = llm
# Settings.embed_model = embed_model
Settings.text_splitter = text_splitter

# Load Data, Build the Index
In this section, we load data and build the vector index.

# Load Data
Here, we build an index using chapter 3 of the recent IPCC climate report.

In [5]:
!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf



In [11]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["./IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()

# Extract Nodes
We extract the set of nodes that will be stored in the ```VectorStoreIndex```. This includes the nodes produced by the sentence window parser as well as the "base" nodes extracted with the standard parser.

In [12]:
nodes = node_parser.get_nodes_from_documents(documents)

In [13]:
base_nodes = text_splitter.get_nodes_from_documents(documents)

# Build the Indexes
We build both the sentence index and the "base" index (with default chunk sizes).

📌 Load the embeddings from the local directory if they have already been built, to save cost!

In [None]:
from llama_index.core import VectorStoreIndex


In [None]:
sentence_index = VectorStoreIndex(nodes)

In [18]:
base_index = VectorStoreIndex(base_nodes)

# Persist the Embeddings:
After generating the embeddings, call ```persist``` on each index's storage context to write them to a local directory.

In [16]:
sentence_index.storage_context.persist(persist_dir="./my_text_embeddings")

In [19]:
base_index.storage_context.persist(persist_dir="./my_base_index_embeddings")

# Loading Embeddings from Local Storage
When you want to retrieve the stored embeddings, you can load them back into your application.

In [None]:
from llama_index.core import StorageContext, load_index_from_storage


In [None]:
# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./my_text_embeddings")

# Load the index
sentence_index = load_index_from_storage(storage_context)

In [None]:
# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./my_base_index_embeddings")

# Load the index
base_index = load_index_from_storage(storage_context)
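
The "load if already built, otherwise build and persist" pattern suggested above can be sketched as a small helper. This is only an assumption about how one might wire it up: `load_or_build`, `fake_build`, and `fake_load` are hypothetical names, and the stand-in callables write a dummy file so the sketch runs without any API calls. In this notebook, the build step would correspond to creating the `VectorStoreIndex` and calling `persist`, and the load step to `StorageContext.from_defaults` followed by `load_index_from_storage`.

```python
import os
import tempfile


def load_or_build(persist_dir, build_fn, load_fn):
    """Call load_fn if persist_dir already holds files, otherwise build_fn.

    Returns (result, source), where source is "loaded" or "built".
    """
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load_fn(persist_dir), "loaded"
    return build_fn(persist_dir), "built"


# Stand-in callables (hypothetical) that mimic an expensive build and a cheap load.
def fake_build(persist_dir):
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index.txt"), "w") as f:
        f.write("embeddings")
    return "index"


def fake_load(persist_dir):
    with open(os.path.join(persist_dir, "index.txt")) as f:
        return "index" if f.read() == "embeddings" else None


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_text_embeddings")
    first = load_or_build(target, fake_build, fake_load)   # directory missing: builds
    second = load_or_build(target, fake_build, fake_load)  # directory present: loads
    print(first, second)
```

Checking for a non-empty directory is a deliberately simple test; a stricter check could look for the specific files the storage context writes.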

# Querying
### With MetadataReplacementPostProcessor
Here, we use the ```MetadataReplacementPostProcessor``` to replace the sentence in each node with its surrounding context.

In [21]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
window_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(window_response)

Concerns surrounding the Atlantic meridional overturning circulation (AMOC) include low confidence in quantifying its changes during the 20th century due to a lack of agreement in reconstructed and simulated trends. Additionally, direct observational records since the mid-2000s are considered too short to accurately assess the contributions of internal variability, natural forcing, and anthropogenic forcing to changes in AMOC. Projections indicate that AMOC is very likely to decline over the 21st century across all Shared Socioeconomic Pathways (SSP) scenarios, although an abrupt collapse is not expected before 2100.
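
To make the replacement step concrete, here is a minimal plain-Python sketch of the idea: after retrieval, each node's text is swapped for the value stored under a metadata key. It is not the library's code; `replace_with_metadata` and the dictionary-based nodes are assumptions for illustration, and the fallback to the original text when the key is missing is a choice made in this sketch.

```python
def replace_with_metadata(nodes, target_metadata_key="window"):
    """Return copies of nodes whose text is swapped for a metadata value.

    Hypothetical stand-in for the post-processing step; nodes lacking the key
    keep their original text.
    """
    replaced = []
    for node in nodes:
        new_text = node["metadata"].get(target_metadata_key, node["text"])
        replaced.append({**node, "text": new_text})
    return replaced


# Toy retrieved nodes (made-up sentences, not taken from the report).
retrieved = [
    {
        "text": "Sentence two.",
        "metadata": {
            "original_text": "Sentence two.",
            "window": "Sentence one. Sentence two. Sentence three.",
        },
    },
    {"text": "No window here.", "metadata": {}},
]
for_llm = replace_with_metadata(retrieved)
print(for_llm[0]["text"])
```

The retriever matches on the short sentence, while the text handed to the LLM is the wider window.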


### We can also check the original sentence that was retrieved for each node, as well as the actual window of sentences that was sent to the LLM.

In [22]:
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

Window: WGI AR6 assessed that only the California Current system has 
undergone large-scale upwelling-favourable wind intensification since 
the 1980s (medium confidence) (WGI AR6 Section  9.2.1.5; García-
Reyes and Largier, 2010; Seo et al., 2012; Fox-Kemper et al., 2021).
 While no consistent pattern of contemporary changes in upwelling-
favourable winds emerges from observation-based studies, numerical 
and theoretical work projects that summertime winds near poleward 
boundaries of upwelling zones will intensify, while winds near 
equatorward boundaries will weaken (high confidence) (WGI AR6 
Section  9.2.3.5; García-Reyes et  al., 2015; Rykaczewski et  al., 2015; 
Wang et  al., 2015; Aguirre et  al., 2019; Fox-Kemper et  al., 2021).  Nevertheless, projected future annual cumulative upwelling wind 
changes at most locations and seasons remain within ±10–20% of 
present-day values (medium confidence) (WGI AR6 Section  9.2.3.5; 
Fox-Kemper et al., 2021).
 Continuous observation of th

### Contrast with normal VectorStoreIndex


In [23]:
query_engine = base_index.as_query_engine(similarity_top_k=2)
vector_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(vector_response)

Concerns surrounding the Atlantic Meridional Overturning Circulation (AMOC) include low confidence in reconstructed and modeled changes for the 20th century, which raises uncertainties about its historical behavior. Projections indicate that the AMOC is expected to decline over the 21st century, with high confidence in this trend, although there is low confidence regarding quantitative projections of the extent of this decline. This uncertainty poses challenges for understanding the potential impacts on climate and oceanic systems.


### Well, that didn't work. Let's bump up the top k! This will be slower and use more tokens compared to the sentence window index.

In [27]:
query_engine = base_index.as_query_engine(similarity_top_k=5)
vector_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(vector_response)

There is low confidence in reconstructed and modeled changes of the Atlantic Meridional Overturning Circulation (AMOC) for the 20th century. Projections indicate that the AMOC is expected to decline over the 21st century, although there is high confidence in this trend, the confidence in quantitative projections remains low. This uncertainty raises concerns about the potential impacts on climate patterns, ocean circulation, and associated ecosystems.


# Analysis

So the ```SentenceWindowNodeParser``` + ```MetadataReplacementPostProcessor``` combo is the clear winner here. But why?

Embeddings at a sentence level seem to capture more fine-grained details, like the word ```AMOC```.
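
One way to build intuition for this is a toy similarity calculation. The sketch below uses bag-of-words cosine similarity as a crude stand-in for an embedding model (real embeddings are dense vectors and behave differently), and the sentences are made up for the example. It only shows how a single relevant term carries less weight once it is surrounded by unrelated text in a larger chunk.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "concerns about AMOC"
sentence = "AMOC is projected to decline."
# The same sentence embedded in a larger chunk of unrelated (made-up) text.
chunk = (
    "Marine protected areas offer conservation benefits. "
    "Upwelling winds may intensify near poleward boundaries. "
    + sentence
    + " Coastal ecosystems face warming and heatwaves."
)

sentence_score = cosine(bag_of_words(query), bag_of_words(sentence))
chunk_score = cosine(bag_of_words(query), bag_of_words(chunk))
print(f"sentence: {sentence_score:.3f}, chunk: {chunk_score:.3f}")
```

In this toy setup the sentence alone scores higher than the chunk containing it, which is the dilution effect the comparison above hints at.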

We can also compare the retrieved chunks for each index!

In [28]:
for source_node in window_response.source_nodes:
    print(source_node.node.metadata["original_text"])
    print("--------")

Continuous observation of the Atlantic meridional overturning 
circulation (AMOC) has improved the understanding of its variability 
(Frajka-Williams et  al., 2019), but there is low confidence in the 
quantification of AMOC changes in the 20th century because of low 
agreement in quantitative reconstructed and simulated trends (WGI 
AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al., 2021). 

--------
Over the 21st century, AMOC will very likely decline for all SSP 
scenarios but will not involve an abrupt collapse before 2100 (WGI 
AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021).

--------


Here, we can see that the sentence window index easily retrieved two nodes that talk about AMOC. Remember, the embeddings are based purely on the original sentence here, but the LLM actually ends up reading the surrounding context as well!

Now, let's try to dissect why the naive vector index failed.

In [29]:
for node in vector_response.source_nodes:
    print("AMOC mentioned?", "AMOC" in node.node.text)
    print("--------")

AMOC mentioned? False
--------
AMOC mentioned? True
--------
AMOC mentioned? False
--------
AMOC mentioned? False
--------
AMOC mentioned? False
--------


So source node at index [2] mentions AMOC, but what did this text actually look like?



In [30]:
print(vector_response.source_nodes[2].node.text)

3
483Oceans and Coastal Ecosystems and Their Services  Chapter 3
Figure  3.26; Bates et  al., 2014; Lubchenco and Grorud-Colvert, 2015; 
Gattuso et  al., 2018) because they alleviate non-climate drivers and 
promote biodiversity (i.e., ‘managed resilience hypothesis’) (Bruno 
et  al., 2019; Maestro et  al., 2019; Cinner et  al., 2020). Current MPAs 
offer conservation benefits such as increases in biomass and diversity 
of habitats, populations and communities (high confidence) (Pendleton 
et al., 2018; Bates et al., 2019; Stevenson et al., 2020; Lenihan et al., 2021; 
Ohayon et al., 2021), and these benefits may last after some (possibly 
climate-enhanced) disturbances (e.g., tropical cyclones) (McClure et al., 
2020). But current MPAs do not provide resilience against observed 
warming and heatwaves in tropical-to-temperate ecosystems (medium 
confidence) (Bates et al., 2019; Bruno et al., 2019; Freedman et al., 2020; 
Graham et al., 2020; Rilov et al., 2020). There is robust evidenc

So AMOC is discussed, but sadly it is in the middle chunk. With LLMs, text in the middle of the retrieved context is often ignored or contributes less to the answer. The paper ["Lost in the Middle"](https://arxiv.org/abs/2307.03172) discusses this effect.

### [Optional] Evaluation
Here, we evaluate more rigorously how well the sentence window retriever works compared to the base retriever.

We define/load an eval benchmark dataset and then run different evaluations over it.

WARNING: This can be expensive, especially with GPT-4. Use caution and tune the sample size to fit your budget.

In [None]:
from llama_index.core.evaluation import DatasetGenerator, QueryResponseDataset

from llama_index.llms.openai import OpenAI
import nest_asyncio
import random

nest_asyncio.apply()

In [None]:
len(base_nodes)

In [None]:
num_nodes_eval = 30
# there are 428 nodes total. Take the first 200 to generate questions (the back half of the doc is all references)
sample_eval_nodes = random.sample(base_nodes[:200], num_nodes_eval)
# NOTE: run this if the dataset isn't already saved
# generate questions from the largest chunks (1024)
dataset_generator = DatasetGenerator(
    sample_eval_nodes,
    llm=OpenAI(model="gpt-4"),
    show_progress=True,
    num_questions_per_chunk=2,
)

In [None]:
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes()

In [None]:
eval_dataset.save_json("data/ipcc_eval_qr_dataset.json")

In [None]:
# optional
eval_dataset = QueryResponseDataset.from_json("data/ipcc_eval_qr_dataset.json")

# Compare Results


In [None]:
import asyncio
import nest_asyncio

nest_asyncio.apply()

In [None]:
from llama_index.core.evaluation import (
    CorrectnessEvaluator,
    SemanticSimilarityEvaluator,
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    PairwiseComparisonEvaluator,
)


from collections import defaultdict
import pandas as pd

# NOTE: can uncomment other evaluators
evaluator_c = CorrectnessEvaluator(llm=OpenAI(model="gpt-4"))
evaluator_s = SemanticSimilarityEvaluator()
evaluator_r = RelevancyEvaluator(llm=OpenAI(model="gpt-4"))
evaluator_f = FaithfulnessEvaluator(llm=OpenAI(model="gpt-4"))
# pairwise_evaluator = PairwiseComparisonEvaluator(llm=OpenAI(model="gpt-4"))

In [None]:
from llama_index.core.evaluation.eval_utils import (
    get_responses,
    get_results_df,
)
from llama_index.core.evaluation import BatchEvalRunner

max_samples = 30

eval_qs = eval_dataset.questions
ref_response_strs = [r for (_, r) in eval_dataset.qr_pairs]

# resetup base query engine and sentence window query engine
# base query engine
base_query_engine = base_index.as_query_engine(similarity_top_k=2)
# sentence window query engine
query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

In [None]:
import numpy as np

base_pred_responses = get_responses(
    eval_qs[:max_samples], base_query_engine, show_progress=True
)
pred_responses = get_responses(
    eval_qs[:max_samples], query_engine, show_progress=True
)

pred_response_strs = [str(p) for p in pred_responses]
base_pred_response_strs = [str(p) for p in base_pred_responses]

In [None]:
evaluator_dict = {
    "correctness": evaluator_c,
    "faithfulness": evaluator_f,
    "relevancy": evaluator_r,
    "semantic_similarity": evaluator_s,
}
batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)

Run evaluations over correctness, faithfulness, relevancy, and semantic similarity.



In [None]:
eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=pred_responses[:max_samples],
    reference=ref_response_strs[:max_samples],
)

In [None]:
base_eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=base_pred_responses[:max_samples],
    reference=ref_response_strs[:max_samples],
)

In [None]:
results_df = get_results_df(
    [eval_results, base_eval_results],
    ["Sentence Window Retriever", "Base Retriever"],
    ["correctness", "relevancy", "faithfulness", "semantic_similarity"],
)
display(results_df)
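
For readers who want a picture of what such a comparison table summarises, the sketch below averages per-question scores for each metric of each retriever. It assumes the table reports a mean score per metric; `summarize_scores` is a hypothetical helper, and the numbers are invented solely to exercise the function and are not results from this notebook.

```python
from statistics import mean


def summarize_scores(results_by_retriever):
    """Average per-question scores for each metric of each retriever.

    `results_by_retriever` maps retriever name -> metric name -> list of scores.
    """
    return {
        name: {metric: mean(scores) for metric, scores in metrics.items()}
        for name, metrics in results_by_retriever.items()
    }


# Made-up scores purely to exercise the function; not results from this notebook.
toy_results = {
    "A": {"correctness": [4.0, 5.0], "faithfulness": [1.0, 1.0]},
    "B": {"correctness": [3.0, 4.0], "faithfulness": [1.0, 0.0]},
}
summary = summarize_scores(toy_results)
for name, metrics in summary.items():
    print(name, metrics)
```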
