<a href="https://colab.research.google.com/github/sanjayakanungo/RAG/blob/main/Advanced_RAG_with_Sentence_Window_Retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/MetadataReplacementDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata Replacement + Node Sentence Window

In this notebook, we use the `SentenceWindowNodeParser` to parse documents into single sentences per node. Each node also contains a "window" with the sentences on either side of the node sentence.

Then, during retrieval, before the retrieved sentences are passed to the LLM, each single sentence is replaced with a window containing its surrounding sentences using the `MetadataReplacementPostProcessor`.

This is most useful for large documents/indexes, as it helps to retrieve more fine-grained details.

By default, the sentence window is 3 sentences on either side of the original sentence, which is also the `window_size=3` value passed to the parser in this notebook.

With this parser, chunk size settings are not used; node boundaries follow the window settings instead.
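To make the windowing idea concrete, here is a minimal pure-Python sketch of the parsing step described above. It is not the `SentenceWindowNodeParser` implementation: the regex splitter, the `split_sentences` and `build_window_nodes` helpers, and the dict-based node layout are all assumptions made for illustration. Only the metadata key names (`window`, `original_text`) mirror the ones used later in this notebook.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_window_nodes(text, window_size=3):
    """Return one dict per sentence, with its neighboring sentences stored as metadata."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        # Clamp the window at the document boundaries.
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,
                "metadata": {
                    "window": " ".join(sentences[start:end]),
                    "original_text": sentence,
                },
            }
        )
    return nodes


toy_nodes = build_window_nodes("A one. B two. C three. D four. E five.", window_size=1)
for node in toy_nodes:
    print(node["text"], "->", node["metadata"]["window"])
```

Each toy node keeps a single sentence as its text, while the wider context travels along in the metadata so it can be swapped in after retrieval.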

In [None]:
%pip install llama-index-embeddings-openai
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai

In [None]:
%load_ext autoreload
%autoreload 2

## Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
!pip install llama-index

In [None]:
import os
import openai

In [None]:
os.environ["OPENAI_API_KEY"] = " "  # replace the placeholder with your OpenAI API key

In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceSplitter

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# base node parser is a sentence splitter
text_splitter = SentenceSplitter()

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)

from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
Settings.text_splitter = text_splitter

## Load Data, Build the Index

In this section, we load data and build the vector index.

### Load Data

Here, we load a VMware Cloud Foundation (VCF) design document, which the indexes below are built from.

In [None]:
from llama_index.core import SimpleDirectoryReader

#documents = SimpleDirectoryReader(
#    input_files=["./IPCC_AR6_WGII_Chapter03.pdf"]
#).load_data()

documents = SimpleDirectoryReader(
    input_files=["/content/vcf-51-design.pdf"]
).load_data()

### Extract Nodes

We extract the sets of nodes that will be stored in each `VectorStoreIndex`: the nodes produced by the sentence window parser, and the "base" nodes produced by the standard sentence splitter.

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)

In [None]:
base_nodes = text_splitter.get_nodes_from_documents(documents)

### Build the Indexes

We build both the sentence index and the "base" index (with default chunk sizes).

In [None]:
from llama_index.core import VectorStoreIndex

sentence_index = VectorStoreIndex(nodes)

In [None]:
base_index = VectorStoreIndex(base_nodes)
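For some intuition about why a sentence-level index can match fine-grained details more sharply than a chunk-level one, here is a toy sketch. Everything in it is an assumption for illustration: `embed` is a bag-of-words stand-in for a real embedding model, `cosine` and `top_k` are hypothetical helpers, and the second and third sentences are invented filler text rather than content from the PDF.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding' (assumption): lowercase word counts, not a real model."""
    return Counter(text.lower().replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, texts, k=2):
    """Return the k texts most similar to the query, best first."""
    q = embed(query)
    return sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]


sentences = [
    "The cache tier must be at least 10% of the capacity tier.",
    "The witness host runs in a third location.",  # invented filler
    "Network traffic uses a dedicated VLAN.",  # invented filler
]
chunk = " ".join(sentences)  # one large "base" chunk holding all three sentences

query = "cache tier size"
best_sentence = top_k(query, sentences, k=1)[0]
sentence_score = cosine(embed(query), embed(best_sentence))
chunk_score = cosine(embed(query), embed(chunk))
print(best_sentence)
print(sentence_score > chunk_score)
```

In this toy setup the matching sentence scores higher on its own than when it is diluted inside the larger chunk. Real embedding models behave differently in detail, so treat this only as an illustration of the motivation.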

## Querying

### With MetadataReplacementPostProcessor

Here, we use the `MetadataReplacementPostProcessor` to replace the sentence in each node with its surrounding context.

In [None]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

window_response = query_engine.query(
    "How to design vSAN?"
)

print(window_response)

To design vSAN, you need to determine the size of the compute and storage resources required for vSAN storage, as well as configure the network for vSAN traffic. Additionally, for multiple availability zones, you should extend the resource size and configure the vSAN witness host. It is important to consider the logical design, hardware configuration, network design, witness design, and specific requirements and recommendations for vSAN when designing it for VMware Cloud Foundation.
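Conceptually, the replacement step swaps each retrieved node's text for the value stored under the target metadata key before the LLM sees it. The sketch below shows that idea with plain dicts; `replace_with_metadata` is a hypothetical helper and not the LlamaIndex implementation. Keeping the original text when the key is missing is an assumption of this sketch.

```python
def replace_with_metadata(nodes, target_metadata_key="window"):
    """Return copies of the nodes whose text is replaced by the stored metadata value.

    Nodes that lack the key keep their original text (assumed fallback).
    """
    replaced = []
    for node in nodes:
        new_text = node["metadata"].get(target_metadata_key, node["text"])
        # Build a new dict so the retrieved nodes themselves are left untouched.
        replaced.append({**node, "text": new_text})
    return replaced


retrieved = [
    {
        "text": "B two.",
        "metadata": {"window": "A one. B two. C three.", "original_text": "B two."},
    },
    {"text": "No window here.", "metadata": {}},
]

expanded = replace_with_metadata(retrieved)
for node in expanded:
    print(node["text"])
```

The similarity search still runs against the short sentence, while the LLM receives the wider window, which is the trade-off this notebook is exploring.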


We can also check the original sentence that was retrieved for each node, as well as the actual window of sentences that was sent to the LLM.

In [None]:
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

Window: Choosing a vSAN Architecture Model
Start choosing an architecture
 model for vSAN
Use vSAN OSA Use vSAN ESADo you need 
vSAN stretched 
clusters?
 No
No
NoYes
Yes
YesIs 25-GbE 
networking 
a constraint?
 Is a hybrid 
vSAN configuration 
required?
 vSAN Physical Requirements and Dependencies
vSAN has the following requirements and options:
nvSAN Original Storage Architecture (OSA) as hybrid storage or all-flash storage.
 nA vSAN hybrid storage configuration requires both magnetic devices and flash caching 
devices.  The cache tier must be at least 10% of the size of the capacity tier.
 nAn all-flash vSAN configuration requires flash devices for both the caching and capacity 
tiers.

------------------
Original Sentence: vSAN Physical Requirements and Dependencies
vSAN has the following requirements and options:
nvSAN Original Storage Architecture (OSA) as hybrid storage or all-flash storage.



### Contrast with normal VectorStoreIndex

In [None]:
query_engine = base_index.as_query_engine(similarity_top_k=2)
#vector_response = query_engine.query(
#    "What are the concerns surrounding the AMOC?"
#)
vector_response = query_engine.query(
    "How to design vSAN?"
)
print(vector_response)

To design vSAN, you need to consider the requirements and options available. You can choose between vSAN Original Storage Architecture (OSA) as hybrid storage or all-flash storage. For a hybrid storage configuration, you will need both magnetic devices and flash caching devices, with the cache tier being at least 10% of the size of the capacity tier. An all-flash vSAN configuration requires flash devices for both the caching and capacity tiers. You can utilize VMware vSAN ReadyNodes or hardware from the VMware Compatibility Guide to build your own vSAN setup. Additionally, for VMware Cloud Foundation, vSAN is recommended as the principal storage type for the management domain and VI workload domains, allowing for a cost-efficient storage solution with simple management and support for future storage expansion.
