In [None]:
%%capture
!pip install llama-index==0.10.37 llama-index-embeddings-openai==0.1.9 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 llama-index-llms-openai==0.1.19

In [None]:
import os
import sys
from getpass import getpass
import nest_asyncio

from IPython.display import Markdown, display

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv("")

sys.path.append('../helpers')

from utils import setup_llm, setup_embed_model, setup_vector_store

In [None]:
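# .get() returns None instead of raising KeyError when the variable is unset,
# so the getpass prompt can actually be reached.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or getpass("Enter your OpenAI API key: ")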

In [None]:
QDRANT_URL = ":memory:"

In [None]:
# Same pattern as above: fall back to a prompt when the variable is unset.
QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY") or getpass("Enter your Qdrant API Key: ")

In [None]:
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from utils import setup_llm, setup_embed_model

setup_llm(
    provider="openai", 
    model="gpt-4o", 
    api_key=OPENAI_API_KEY
    )

setup_embed_model(provider="openai", api_key=OPENAI_API_KEY)

In [None]:
import random
from utils import get_documents_from_docstore, group_documents_by_author, sample_documents

documents = get_documents_from_docstore("../data/words-of-the-senpais")

random.seed(42)

documents_by_author = group_documents_by_author(documents)

senpai_documents = sample_documents(documents_by_author, num_samples=10)

In [None]:
from llama_index.core import StorageContext
from llama_index.core.settings import Settings

from llama_index.core.node_parser import SentenceSplitter
from utils import create_index, create_query_engine, ingest, setup_vector_store

COLLECTION_NAME = "node-postprocessors"

vector_store = setup_vector_store(QDRANT_URL, QDRANT_API_KEY, COLLECTION_NAME)

sentence_splitter = SentenceSplitter(chunk_size=256, chunk_overlap=16)

nodes = ingest(
    documents=senpai_documents,
    transformations=[sentence_splitter, Settings.embed_model],
    vector_store=vector_store
)
index = create_index(
    from_where="vector_store",
    vector_store=vector_store,
    embed_model=Settings.embed_model,
    )

# Node Postprocessors 

A node postprocessor is a module that takes the list of nodes returned by a query, applies some kind of transformation or filtering to them, and returns the result.

In LlamaIndex, node postprocessors are commonly used within a query engine after the node retrieval step and before the response synthesis step.

Here, we'll discuss the following postprocessors (you can find the source code for several of these [here](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/node.py#L64C1-L140C17); each section below links to its own source):

 - 👥 `SimilarityPostprocessor`

 - 🔑 `KeywordNodePostprocessor`

 - 🔀 `MetadataReplacementPostProcessor`

 - ↸ `LongContextReorder`
 
 - 📝 `SentenceEmbeddingOptimizer`

The node postprocessors discussed here are all subclasses of the [`BaseNodePostprocessor`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/types.py) base class, which provides the structure for creating custom postprocessors in a query pipeline.

This means they all have a common API and usage pattern.

#### ⠷ Usage Pattern

Because node postprocessors sit between the retriever and the response synthesizer, you attach them when you build the query engine.

A really simple usage pattern looks like this:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.postprocessor import YourPostProcessOfChoice

... # prior steps to this point: index is defined

your_post_processor = YourPostProcessOfChoice(WhateverArgumentsYouNeedToPass)

query_engine = index.as_query_engine(
    ...,  # all your other query engine arguments
    node_postprocessors=[your_post_processor],  # must be a list, even for one postprocessor
)

# for just a single query
response = query_engine.query("your query")

# use the query pipeline and apply to your entire dataset if you'd like
```

##### 👉🏼  By this point, after seeing the pattern a dozen or more times, I hope you've picked up on it and can apply it yourself. I'll describe the post processors and how to instantiate them, but you can apply them to your query engine and use them on your own. I believe in you!

## 👥 [`SimilarityPostprocessor`](https://github.com/run-llama/llama_index/blob/f116d75557d6867ed2cc61811a1c2f0b0c4d4ddb/llama-index-core/llama_index/core/postprocessor/node.py#L64C1-L140C17)


This does what it says on the tin. 


It filters a list of nodes based on their similarity scores. You provide a `similarity_cutoff` value to control the filtering threshold. Nodes scoring at or above the cutoff are kept; nodes scoring below it, or with no score at all, are dropped. If no cutoff is provided, every node passes through.

That's all there is to it.

##### Arguments you need to know

- `similarity_cutoff`: The minimum similarity score required for a node to be included in the output. If not provided, no filtering is performed, which is kinda pointless, so if you use this postprocessor make sure you provide a value.

##### Under the hood

Not satisfied with that explanation from above? Ok, let's go more in depth...

1. The method checks if `similarity_cutoff` is set (not `None`).

2. It iterates over each node in the input list.

3. For each node, it compares the node's similarity score against the `similarity_cutoff` (if set).

   - If the similarity score is `None` or below the cutoff, the node is not included in the output list.

   - If the similarity score is at or above the cutoff, or if `similarity_cutoff` is not set, the node is included in the output list.

4. The method returns the filtered list of `NodeWithScore` objects that passed the similarity cutoff.
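
The steps above can be sketched in plain Python. This is a simplified illustration, not the library's code: `ScoredNode` and `filter_by_similarity` are hypothetical names standing in for `NodeWithScore` and the postprocessor's internal method, and the sketch assumes a node whose score equals the cutoff is kept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for LlamaIndex's NodeWithScore."""
    text: str
    score: Optional[float]


def filter_by_similarity(
    nodes: List[ScoredNode], similarity_cutoff: Optional[float] = None
) -> List[ScoredNode]:
    """Keep nodes whose score is at or above the cutoff (assumed behaviour)."""
    if similarity_cutoff is None:
        # No cutoff set: nothing is filtered out.
        return list(nodes)
    kept = []
    for node in nodes:
        # Nodes without a score cannot be compared, so they are dropped.
        if node.score is None or node.score < similarity_cutoff:
            continue
        kept.append(node)
    return kept


nodes = [
    ScoredNode("a", 0.91),
    ScoredNode("b", 0.70),
    ScoredNode("c", 0.42),
    ScoredNode("d", None),
]
kept = filter_by_similarity(nodes, similarity_cutoff=0.7)
print([n.text for n in kept])  # ['a', 'b']
```

Note that the unscored node `d` is dropped as soon as a cutoff is set, which is worth remembering if your retriever does not attach scores.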

In [None]:
from llama_index.core.postprocessor import SimilarityPostprocessor

similarity_postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

## 🔑 [`KeywordNodePostprocessor`](https://github.com/run-llama/llama_index/blob/f116d75557d6867ed2cc61811a1c2f0b0c4d4ddb/llama-index-core/llama_index/core/postprocessor/node.py#L20)

This filters nodes based on the presence or absence of specific keywords. You provide a list of required keywords and/or a list of exclude keywords. Nodes that contain at least one of the required keywords and none of the exclude keywords are included in the output list.

#### Arguments you need to know

- `required_keywords`: A list of keywords, at least one of which must be present in a node's content for it to be included in the output. If not provided, this criterion is not applied.

- `exclude_keywords`: A list of keywords that must not be present in a node's content for it to be included in the output. If not provided, this criterion is not applied.

- `lang`: The language of the text in the nodes. Default is "en" (English).

#### Under the hood

1. The method iterates over each `NodeWithScore` object in the input list.

2. For each node, it retrieves the node's content and processes it using a spaCy language pipeline (so the `spacy` package must be installed for this postprocessor to work).

3. If `required_keywords` are provided and the node's content does not match any of the required keywords, the node is skipped.

4. If `exclude_keywords` are provided and the node's content matches any of the exclude keywords, the node is skipped.

5. If a node passes both the required and exclude keyword checks (or if no keywords are provided), it is included in the output list.

6. The method returns the filtered list of `NodeWithScore` objects that passed the keyword criteria.
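
Here is a rough sketch of that filtering logic, under a few loud assumptions: it uses a regex tokenizer instead of spaCy, it lowercases everything (the library's matching may well be case-sensitive), and `tokenize` and `filter_by_keywords` are hypothetical helper names, not LlamaIndex APIs.

```python
import re
from typing import List, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased word tokens; a crude stand-in for spaCy tokenization."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def filter_by_keywords(
    texts: List[str],
    required_keywords: Optional[List[str]] = None,
    exclude_keywords: Optional[List[str]] = None,
) -> List[str]:
    required = {k.lower() for k in (required_keywords or [])}
    excluded = {k.lower() for k in (exclude_keywords or [])}
    kept = []
    for text in texts:
        tokens = tokenize(text)
        # Skip when required keywords are given and none of them appears.
        if required and not (required & tokens):
            continue
        # Skip when any excluded keyword appears.
        if excluded and (excluded & tokens):
            continue
        kept.append(text)
    return kept


texts = [
    "Good decisions come from a calm mind.",
    "Zillow changed how people pick homes; luck helps too.",
    "Ship early and iterate.",
]
result = filter_by_keywords(
    texts,
    required_keywords=["luck", "mind", "decision"],
    exclude_keywords=["Zillow"],
)
print(result)  # ['Good decisions come from a calm mind.']
```

The second text contains a required keyword but is still dropped because it also contains an excluded one; the third is dropped because it matches no required keyword at all.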



In [None]:
from llama_index.core.postprocessor import KeywordNodePostprocessor

required_keywords = ["luck", "destiny", "hardwork", "mind", "decision"]

exclude_keywords = ["Trulia", "Redfin", "Zillow"]

keyword_postprocessor = KeywordNodePostprocessor(
    required_keywords=required_keywords, exclude_keywords=exclude_keywords
)

In [None]:
from utils import create_query_engine
from utils import create_query_pipeline
from llama_index.core.query_pipeline import InputComponent

query_engine = create_query_engine(
    index=index, 
    mode="query",
    response_mode="compact",
    similarity_top_k=5,
    vector_store_query_mode="mmr", 
    vector_store_kwargs={"mmr_threshold": 0.42},
    post_processor=keyword_postprocessor,
    return_source=True
    )

input_component = InputComponent()

query_chain = [input_component, query_engine]

query_pipeline = create_query_pipeline(query_chain)

In [None]:
response = query_engine.query("How to make good decisions?")

In [None]:
response.source_nodes[0].__dict__

In [None]:
query_pipeline.run(input="How to make good decisions?")

## 🔀 [`MetadataReplacementPostProcessor`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/metadata_replacement.py)

It allows you to replace the content of each node in the input list with a specific metadata key value, instead of the original content.

If the specified field is not present in the metadata, then the original node content remains unchanged. This feature is most useful when combined with the `SentenceWindowNodeParser`. 

#### Arguments you need to know

- `target_metadata_key`: The metadata key whose value will replace the node's content.

#### Under the hood

1. It iterates over each `NodeWithScore` object in the input list.

2. For each node, it retrieves the value of the specified `target_metadata_key` from the node's metadata.
   - If the metadata key exists, its value is used to replace the node's content.
   - If the metadata key does not exist, the node's content remains unchanged.

3. The method returns the modified list of `NodeWithScore` objects.
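
A minimal sketch of the replacement step, assuming nodes are simple objects with `text` and `metadata` attributes. `SimpleNode` and `replace_with_metadata` are hypothetical names used only for illustration; the `"window"` key mirrors what a sentence-window parser would typically store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a LlamaIndex node."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def replace_with_metadata(
    nodes: List[SimpleNode], target_metadata_key: str
) -> List[SimpleNode]:
    for node in nodes:
        # Fall back to the existing text when the key is missing.
        node.text = node.metadata.get(target_metadata_key, node.text)
    return nodes


nodes = [
    SimpleNode(
        text="Trust your gut.",
        metadata={"window": "Gather the facts. Trust your gut. Then commit."},
    ),
    SimpleNode(text="No window stored for this one."),
]
for node in replace_with_metadata(nodes, target_metadata_key="window"):
    print(node.text)
# Gather the facts. Trust your gut. Then commit.
# No window stored for this one.
```

The retriever matches on the short sentence, but the LLM gets to see the wider window around it.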


In [None]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

postprocessor = MetadataReplacementPostProcessor(
    target_metadata_key="window",
)

## ↸ [`LongContextReorder`](https://github.com/run-llama/llama_index/blob/236fae0e8a30d6f8f9ac31777868d1d868933d79/llama-index-core/llama_index/core/postprocessor/node.py#L358)

[According to a study](https://arxiv.org/abs/2307.03172), language models often struggle to access significant details that are located in the middle of an extended context.

The study found that performance is best when important information sits at the beginning or end of the input context. Even models designed for long contexts experience a notable drop in performance as the input context lengthens.

To address this, the `LongContextReorder` class reorders the nodes based on their relevance scores, which is helpful in cases where a large top-k is needed. It places the highest-scoring nodes at the beginning and end of the list, alternating between the two, and leaves the lowest-scoring nodes in the middle.

#### Arguments you need to know

You don't need to pass any arguments.

#### Under the hood

1. It sorts the input `nodes` list by relevance score in ascending order (lowest score first). If a node's score is `None`, it is treated as 0.

2. It iterates over the sorted list.

3. For each node, it checks if the index is even (i.e., 0, 2, 4, ...) or odd (i.e., 1, 3, 5, ...).

   - If the index is even, the node is inserted at the beginning of the `reordered_nodes` list using the `insert` method with index 0.

   - If the index is odd, the node is appended to the end of the `reordered_nodes` list using the `append` method.

4. Finally, the method returns the `reordered_nodes` list. Because the highest-scoring nodes are processed last, they end up at the two ends of the list, while the lowest-scoring nodes are pushed toward the middle.
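
To see why this leaves the best nodes at the edges, here is a small sketch on `(name, score)` tuples. The function name `reorder_for_long_context` is hypothetical, and the sketch sorts lowest score first so that the alternating insert/append pass finishes with the highest-scoring items on the outside.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, Optional[float]]


def reorder_for_long_context(scored: List[Scored]) -> List[Scored]:
    # Sort lowest score first; a missing score counts as 0.
    ordered = sorted(scored, key=lambda p: p[1] if p[1] is not None else 0)
    reordered: List[Scored] = []
    for i, item in enumerate(ordered):
        if i % 2 == 0:
            reordered.insert(0, item)  # even index: push to the front
        else:
            reordered.append(item)     # odd index: push to the back
    return reordered


scored = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5)]
print([name for name, _ in reorder_for_long_context(scored)])
# ['a', 'c', 'e', 'd', 'b']
```

The two best nodes (`a` and `b`) land first and last, and the weakest node (`e`) sits in the middle, where the model is least likely to pay attention anyway.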


In [None]:
from llama_index.core.postprocessor import LongContextReorder

postprocessor = LongContextReorder()

## 📝 [`SentenceEmbeddingOptimizer`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/postprocessor/optimizer.py)

The `SentenceEmbeddingOptimizer` is a node postprocessor that shortens the text content of nodes, keeping only the sentences most relevant to a given query.

It uses embedding-based similarity scores to rank the sentences in each node and drops the ones that are not related to the query. The percentile cutoff determines the top percentage of relevant sentences to keep.

Alternatively, the threshold cutoff can be specified to select which sentences to keep based on a raw similarity cutoff.

#### Arguments you need to know

All of these arguments are optional and you can combine them as you see fit.

- `embed_model`: The embedding model to use for generating sentence embeddings. Defaults to `OpenAIEmbedding` if not provided.

- `percentile_cutoff`: The percentile cutoff for selecting the top sentences based on their similarity scores. The number of sentences selected is calculated as `int(len(sentences) * percentile_cutoff)`. For example, if `percentile_cutoff` is set to 0.5, the top 50% of sentences with the highest similarity scores will be selected. Use this when you want to keep a fixed percentage of the most relevant sentences.

- `threshold_cutoff`: The threshold cutoff for selecting sentences based on their similarity scores. For example, if `threshold_cutoff` is set to 0.7, only sentences with a similarity score greater than 0.7 will be selected. Use this when you want sentences to meet a minimum similarity score. You can, of course, combine this with `percentile_cutoff`.

- `tokenizer_fn`: A function to split the text into sentences. Defaults to the NLTK English tokenizer if not provided.

- `context_before`: The number of sentences to include before the selected relevant sentences for additional context.

- `context_after`: The number of sentences to include after the selected relevant sentences for additional context.

#### Under the hood
For each node:

1. Retrieves the text content of the node.

2. Splits the text into sentences using the tokenizer function.

3. Generates embeddings for the query (if not provided) and the sentences.

4. Calculates the similarity scores between the query embedding and sentence embeddings.

5. Selects the top sentences based on the percentile cutoff and/or threshold cutoff.

6. Retrieves the context sentences before and after the selected sentences.

7. Joins the selected sentences and their context to form the optimized text.

8. Sets the optimized text as the new content of the node.
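
The pipeline above can be sketched end to end with toy components. Everything here is an assumption made for illustration: `embed` is a bag-of-words counter rather than a real embedding model, `split_sentences` is a naive regex splitter rather than NLTK, the context-before/after step is left out, and restoring the original sentence order at the end is a choice made for readability.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter; stand-in for the NLTK tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real model returns dense vectors."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def optimize(
    text: str,
    query: str,
    percentile_cutoff: Optional[float] = None,
    threshold_cutoff: Optional[float] = None,
) -> str:
    sentences = split_sentences(text)
    query_vec = embed(query)
    scores = [cosine(query_vec, embed(s)) for s in sentences]

    # Percentile cutoff: keep only the top fraction of sentences.
    top_k = len(sentences)
    if percentile_cutoff is not None:
        top_k = int(len(sentences) * percentile_cutoff)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:top_k]

    # Threshold cutoff: additionally require a minimum raw similarity.
    if threshold_cutoff is not None:
        selected = [i for i in selected if scores[i] > threshold_cutoff]

    return " ".join(sentences[i] for i in sorted(selected))


text = (
    "Luck matters less than people think. I had pasta for lunch. "
    "Good decisions come from writing down your options. The weather was nice."
)
query = "How do I make good decisions?"

print(optimize(text, query, percentile_cutoff=0.5, threshold_cutoff=0.2))
# Good decisions come from writing down your options.
```

With `percentile_cutoff=0.5` alone, two of the four sentences survive; adding the threshold removes the one that only matched on a filler word. The toy scores are much lower than what a real embedding model produces, so the 0.2 threshold here is not a suggestion for real use.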

In [None]:
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer
from utils import setup_embed_model

postprocessor = SentenceEmbeddingOptimizer(
    embed_model=Settings.embed_model,
    percentile_cutoff=0.5,
    threshold_cutoff=0.7
)

## You cannot pass a `node_postprocessor` directly to `as_retriever`

In the LlamaIndex framework, node postprocessors are configured and used within the context of a `QueryEngine` rather than directly with the retriever. The retriever is responsible for fetching the most relevant nodes based on the query, and the node postprocessors are then applied within the `QueryEngine` to further refine, filter, or augment these nodes before the final response synthesis.

Here's how you typically set up a QueryEngine with node postprocessors:

In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor  # same import path used earlier in this notebook

# Assuming 'index' is already created and configured
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10
)

# Configure node postprocessors
node_postprocessors = [
    SimilarityPostprocessor(similarity_cutoff=0.7)
]

# Assemble the Query Engine with the retriever and node postprocessors
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=node_postprocessors
)

# Now you can use this query engine to process queries
response = query_engine.query("Your query here")
print(response)