# Sentence Window Retrieval Pack

This LlamaPack provides an example of our sentence window retriever.
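To make the idea concrete, here is a minimal, library-free sketch of what "sentence window" means: each sentence is stored as its own unit, together with a window of neighbouring sentences that can be swapped in before the text reaches the LLM. The function name `build_sentence_windows` and the regex-based splitter are assumptions made for illustration, not the pack's implementation. The `window_size=3` default and the `window` / `original_text` keys mirror the values shown in the module output later in this notebook.

```python
import re


def build_sentence_windows(text, window_size=3):
    """Illustrative only: pair each sentence with a window of its neighbours."""
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    records = []
    for i, sentence in enumerate(sentences):
        # Take up to `window_size` sentences on each side of the current one.
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        records.append(
            {
                "original_text": sentence,  # the single sentence used for matching
                "window": " ".join(sentences[start:end]),  # wider context for the LLM
            }
        )
    return records


if __name__ == "__main__":
    sample = "I wrote stories. I also programmed. Then I went to college. I studied art."
    for record in build_sentence_windows(sample, window_size=1):
        print(record)
```

Retrieval would match against `original_text`, while the answer is synthesized from `window`, which is the role the `MetadataReplacementPostProcessor` plays in the pack.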

In [1]:
import nest_asyncio

nest_asyncio.apply()

### Setup Data

In [2]:
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt

In [3]:
from llama_index import SimpleDirectoryReader

# load in some sample data
reader = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"])
documents = reader.load_data()

### Download and Initialize Pack

Note that this pack takes the `documents` loaded above directly; sentence-window parsing and indexing happen inside the pack.

In [4]:
from llama_index.llama_pack import download_llama_pack

SentenceWindowRetrieverPack = download_llama_pack(
    "SentenceWindowRetrieverPack",
    "./sentence_window_retriever_pack",
    # leave the below commented out (was for testing purposes)
    # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_llama_packs/llama_hub",
)

In [5]:
sentence_window_retriever_pack = SentenceWindowRetrieverPack(
    documents,
)


### Run Pack

In [6]:
# this will run the full pack
response = sentence_window_retriever_pack.run("What did the author do growing up?")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
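
The warning above can usually be silenced by following its second suggestion. A minimal sketch, assuming it runs in the first cell of the notebook, before any tokenizer is loaded:

```python
import os

# Set the variable named in the warning before any tokenizer is used.
# "false" disables tokenizer parallelism, which avoids the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Choosing `"false"` trades some tokenization speed for a quiet, deadlock-free run, which is typically fine for a small example like this one.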


In [7]:
print(str(response))

The author dropped out of college and taught himself to paint.


In [8]:
len(response.source_nodes)

2
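
To see what was actually retrieved, it can help to compare the matched sentence with the window that replaced it. The helper below is a hypothetical convenience function, not part of the pack. It assumes each entry in `response.source_nodes` exposes `.score` and `.node.metadata` (as LlamaIndex `NodeWithScore` objects do) and that the metadata uses the `window` and `original_text` keys shown in the module output further down.

```python
def summarize_source_nodes(source_nodes, window_key="window", original_key="original_text"):
    """Hypothetical helper: collect score, matched sentence, and window per source node."""
    rows = []
    for item in source_nodes:
        metadata = item.node.metadata
        rows.append(
            {
                "score": item.score,
                # .get() returns None if a key is absent, rather than raising.
                "original_text": metadata.get(original_key),
                "window": metadata.get(window_key),
            }
        )
    return rows
```

In this notebook it would be called as `summarize_source_nodes(response.source_nodes)`, and each returned row can be printed to check how much context the window adds around the matched sentence.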

### Inspect Modules

In [9]:
modules = sentence_window_retriever_pack.get_modules()
display(modules)

{'sentence_index': <llama_index.indices.vector_store.base.VectorStoreIndex at 0x2993fde10>,
 'node_parser': SentenceWindowNodeParser(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x287c981f0>, sentence_splitter=<function split_by_sentence_tokenizer.<locals>.split at 0x287bad750>, window_size=3, window_metadata_key='window', original_text_metadata_key='original_text'),
 'postprocessor': MetadataReplacementPostProcessor(callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x108e24070>, target_metadata_key='window'),
 'llm': OpenAI(callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x108e24070>, model='gpt-3.5-turbo', temperature=0.1, max_tokens=None, additional_kwargs={}, max_retries=3, timeout=60.0, default_headers=None, api_key='<redacted>', api_base='https://api.openai.com/v1', api_version=''),
 'embed_model': HuggingFaceEmbedding(m