In [None]:
%%capture
!pip install llama-index==0.10.37 llama-index-embeddings-openai==0.1.9 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 llama-index-llms-openai==0.1.19

In [None]:
import os
import sys
from getpass import getpass
import nest_asyncio

from IPython.display import Markdown, display

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv("")

sys.path.append('../helpers')

from utils import setup_llm, setup_embed_model, setup_vector_store

In [None]:
# .get() returns None instead of raising KeyError, so the getpass fallback can run
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

In [None]:
# .get() returns None instead of raising KeyError, so the getpass fallback can run
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")

In [None]:
QDRANT_URL = ":memory:"

In [None]:
# .get() returns None instead of raising KeyError, so the getpass fallback can run
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API key: ")

In [None]:
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from utils import setup_llm, setup_embed_model

setup_llm(
    provider="openai", 
    model="gpt-4o", 
    api_key=OPENAI_API_KEY
    )

setup_embed_model(
    provider="openai", 
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY
    )

In [None]:
import random
from utils import get_documents_from_docstore, group_documents_by_author, sample_documents

documents = get_documents_from_docstore("../data/words-of-the-senpais")

random.seed(42)

documents_by_author = group_documents_by_author(documents)

senpai_documents = sample_documents(documents_by_author, num_samples=10)

In [None]:
from llama_index.core import StorageContext
from llama_index.core.settings import Settings

from llama_index.core.node_parser import SentenceSplitter
from utils import create_index, create_query_engine, ingest, setup_vector_store

COLLECTION_NAME = "node-reranking"

vector_store = setup_vector_store(QDRANT_URL, QDRANT_API_KEY, COLLECTION_NAME)

sentence_splitter = SentenceSplitter(chunk_size=256, chunk_overlap=16)

nodes = ingest(
    documents=senpai_documents,
    transformations=[sentence_splitter, Settings.embed_model],
    vector_store=vector_store
)
index = create_index(
    from_where="vector_store",
    vector_store=vector_store,
    embed_model=Settings.embed_model,
    )

## Recap of the LlamaIndex Order of Operations

In LlamaIndex, the order of operations in the query pipeline typically follows these steps:

**🍽️ Data Ingestion:** This is where your existing data from various sources and formats (APIs, PDFs, SQL, etc.) is ingested into the system.

**🗂️ Data Indexing:** The ingested data is structured into intermediate representations that are easy and performant for Large Language Models (LLMs) to consume.

**🐕 Retrieval:** Information is retrieved from your data sources based on the question or prompt. This is the first step in the Retrieval-Augmented Generation (RAG) process.

**🎖️ Reranking:** The initially retrieved documents or nodes are reordered based on certain criteria to bring the most relevant or useful nodes to the top.

**䷾ Post-processing:** After retrieval and reranking, transformations or filters are applied to the set of nodes to further refine them before they are used to generate the final response.

**💬 Response Generation:** The LLM generates a response based on the enriched prompt, which now includes the context from the retrieved and reranked documents.

We've already covered post-processing, so now let's discuss reranking!

# Reranking

In LlamaIndex, reranking and post-processing are two different steps in the query pipeline.

Reranking is a process that takes the initial set of retrieved nodes (documents or pieces of information) and reorders them based on some criteria. This could be based on a model's prediction of relevance, a time-based factor, or any other custom criteria. 

The goal of reranking is to bring the most relevant or useful nodes to the top of the list.

On the other hand, post-processing is a step that happens after the retrieval and reranking steps. 

It involves applying transformations or filters to the set of nodes. This could include filtering out nodes below a certain similarity score, applying a time decay factor, or any other custom transformation. The goal of post-processing is to further refine the set of nodes before they are used to synthesize the final response.

Although both reranking and post-processing manipulate the set of retrieved nodes, they serve different purposes and occur at different stages in the query pipeline.

### Reranking is about ordering the nodes, while post-processing is about transforming or filtering the nodes.
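
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use LlamaIndex: the `(text, score)` tuples and the helper names `rerank` and `filter_by_score` are illustrative assumptions, and the relevance measure is a toy word-overlap count standing in for a real reranking model.

```python
# Toy stand-ins for retrieved nodes: (text, retrieval_score) pairs.
nodes = [
    ("Sleep eight hours a night", 0.81),
    ("Read widely and take notes", 0.78),
    ("Buy a faster laptop", 0.74),
]


def rerank(nodes, query):
    """Reorder nodes by a new relevance measure; no node is dropped."""
    query_words = set(query.lower().split())

    def overlap(item):
        text, _ = item
        return len(query_words & set(text.lower().split()))

    return sorted(nodes, key=overlap, reverse=True)


def filter_by_score(nodes, cutoff):
    """Post-process: keep only nodes whose score is at least the cutoff."""
    return [(text, score) for text, score in nodes if score >= cutoff]


reranked = rerank(nodes, "read and take notes daily")  # same 3 nodes, new order
filtered = filter_by_score(nodes, cutoff=0.75)         # same order, fewer nodes
```

Here `reranked` still contains all three nodes with the note-taking one moved to the front, while `filtered` keeps the original order but drops the lowest-scoring node.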

By far the most popular reranking technique is Cohere's Rerank model, and it is the only one we'll walk through in this section.

### **ℹ️ Note:** Rerankers have the same usage pattern as postprocessors:


```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.postprocessor import YourPostProcessOfChoice

... # prior steps to this point: index is defined

your_post_processor = YourPostProcessOfChoice(WhateverArgumentsYouNeedToPass)

query_engine = index.as_query_engine(
    ...,  # all your other query engine arguments
    node_postprocessors=[your_post_processor],  # expects a list, applied in order
)

# for just a single query
response = query_engine.query("your query")

# use the query pipeline and apply to your entire dataset if you'd like
```

Keep in mind that order does matter when you put a reranker and a postprocessor into a query engine. The order in which you list them determines the order in which they are applied.
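
A small sketch of why the order matters, again in plain Python rather than LlamaIndex. Everything here is hypothetical: the dict-based nodes, the helper names `toy_rerank`, `score_cutoff`, and `apply_in_order`, and the made-up relevance scores. The one assumption carried over from real rerankers is that the reranker overwrites each node's score with its own relevance score, so a score cutoff sees different numbers depending on whether it runs before or after the reranker.

```python
from functools import partial


def toy_rerank(nodes, relevance, top_n):
    """Overwrite each score with a (made-up) relevance score, keep the top_n."""
    rescored = [{"text": n["text"], "score": relevance[n["text"]]} for n in nodes]
    return sorted(rescored, key=lambda n: n["score"], reverse=True)[:top_n]


def score_cutoff(nodes, cutoff):
    """Drop nodes whose current score is below the cutoff."""
    return [n for n in nodes if n["score"] >= cutoff]


def apply_in_order(nodes, postprocessors):
    """Feed the output of each step into the next, following list order."""
    for step in postprocessors:
        nodes = step(nodes)
    return nodes


retrieved = [
    {"text": "A", "score": 0.80},
    {"text": "B", "score": 0.60},
    {"text": "C", "score": 0.40},
]
# Hypothetical reranker scores: C is actually the most relevant node.
relevance = {"A": 0.30, "B": 0.70, "C": 0.90}

rerank = partial(toy_rerank, relevance=relevance, top_n=2)
cutoff = partial(score_cutoff, cutoff=0.5)

rerank_then_cutoff = apply_in_order(retrieved, [rerank, cutoff])
cutoff_then_rerank = apply_in_order(retrieved, [cutoff, rerank])

print([n["text"] for n in rerank_then_cutoff])  # ['C', 'B']
print([n["text"] for n in cutoff_then_rerank])  # ['B', 'A']
```

Running the cutoff first discards `C` based on its retrieval score before the reranker ever sees it, even though the reranker would have ranked it highest.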

# [Cohere Rerank](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/postprocessor/llama-index-postprocessor-cohere-rerank/llama_index/postprocessor/cohere_rerank/base.py)

The `CohereRerank` postprocessor reranks a list of nodes by their relevance to a given query, using Cohere's reranking model through the Cohere API.

Be sure to install it first: `pip install llama-index-postprocessor-cohere-rerank`

#### Arguments you need to know

- `top_n`: The number of top nodes to return (default is 2).

- `model`: The name of the Cohere model to use. The default is `"rerank-english-v2.0"`; however, I suggest using a newer model such as `"rerank-english-v3.0"`.

- `api_key`: The Cohere API key. If not provided, it tries to retrieve the key from the `COHERE_API_KEY` environment variable.


#### Under the hood

The reranker is added as a postprocessor to the query engine. When a query is made, the retrieved nodes are reranked by the Cohere API according to their relevance to the query, and only the top `top_n` nodes are returned.

1. When the postprocessor is called with a list of nodes and a query bundle, it extracts the text content from each node.

2. It uses the Cohere API's reranking functionality to rerank the nodes based on their relevance to the query.

3. The Cohere API returns a list of reranked results, each containing the index of the original node and a relevance score.

4. The postprocessor creates a new list of `NodeWithScore` objects based on the reranked results, preserving the original node and updating the relevance score.

5. The reranked list of nodes is returned.
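
The five steps above can be sketched in plain Python. This is not the `CohereRerank` source: `ScoredNode` is a stand-in for `NodeWithScore`, `toy_rerank_api` is a hypothetical replacement for the Cohere call that scores by word overlap, and `rerank_nodes` is an illustrative name. The shape of the flow (texts in, index and relevance score out, nodes rebuilt with the new score) is what the sketch is meant to show.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    """Stand-in for LlamaIndex's NodeWithScore: some text plus a score."""
    text: str
    score: float


def toy_rerank_api(query, documents, top_n):
    """Hypothetical stand-in for the Cohere rerank call.

    Returns (index, relevance_score) pairs, best first, where index points
    back into `documents`. The score is a toy word-overlap ratio.
    """
    query_words = set(query.lower().split())
    scored = []
    for index, doc in enumerate(documents):
        overlap = len(query_words & set(doc.lower().split()))
        scored.append((index, overlap / max(len(query_words), 1)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def rerank_nodes(nodes, query, top_n=2):
    texts = [node.text for node in nodes]           # step 1: extract text
    results = toy_rerank_api(query, texts, top_n)   # steps 2-3: index + score
    return [                                        # step 4: rebuild with new score
        ScoredNode(text=nodes[index].text, score=relevance)
        for index, relevance in results
    ]                                               # step 5: return reranked list


nodes = [
    ScoredNode("exercise keeps the body strong", 0.9),
    ScoredNode("sleep keeps the mind sharp", 0.8),
    ScoredNode("rest well to keep a sharp mind", 0.7),
]
reranked = rerank_nodes(nodes, "how to keep my mind sharp", top_n=2)
for node in reranked:
    print(f"{node.score:.2f}  {node.text}")
```

The node that retrieval scored lowest ends up first, and its score now reflects the reranker's judgment instead of the original retrieval score.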

In [None]:
from llama_index.postprocessor.cohere_rerank import CohereRerank

postprocessor = CohereRerank(
    top_n=5, model="rerank-english-v3.0", api_key=CO_API_KEY
)

In [None]:
from utils import create_query_engine
from utils import create_query_pipeline
from llama_index.core.query_pipeline import InputComponent

query_engine = create_query_engine(
    index=index, 
    mode="query",
    response_mode="compact",
    similarity_top_k=15,
    vector_store_query_mode="mmr", 
    vector_store_kwargs={"mmr_threshold": 0.42},
    post_processor=postprocessor,
    return_source=True
    )

input_component = InputComponent()

query_chain = [input_component, query_engine]

query_pipeline = create_query_pipeline(query_chain)

In [None]:
response = query_engine.query("How can I ensure that my mind is always the sharpest it can be?")

In [None]:
response.__dict__
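
`response.__dict__` dumps everything at once. For a more readable view, you can loop over `response.source_nodes`, which holds the `NodeWithScore` objects that survived reranking. The helper below is a sketch: `summarize_sources` is a hypothetical name, and it assumes only that each source node exposes `score` and `get_content()`. The `SimpleNamespace` objects are stand-ins so the snippet runs without an index or API key.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=60):
    """Return (score, text preview) pairs for each source node, in order."""
    rows = []
    for source in response.source_nodes:
        preview = source.get_content()[:max_chars].replace("\n", " ")
        rows.append((source.score, preview))
    return rows


# Stand-in response so the sketch runs on its own.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(score=0.97, get_content=lambda: "Protect your sleep.\nIt compounds."),
        SimpleNamespace(score=0.41, get_content=lambda: "Read every day."),
    ]
)

for score, preview in summarize_sources(fake_response):
    print(f"{score:.2f}  {preview}")
```

In the notebook you would call `summarize_sources(response)` on the real response; with `top_n=5` you should see at most five rows, ordered by the reranker's relevance score.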

In [None]:
query_pipeline.run(input="How can I ensure that my mind is always the sharpest it can be?")

The two most popular alternatives to Cohere Rerank are:

- [Colbert Rerank](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/ColbertRerank/). You can install this as `pip install llama-index-postprocessor-colbert-rerank` and import as `from llama_index.postprocessor.colbert_rerank import ColbertRerank`.

- [Flag Embedding Reranker](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/FlagEmbeddingReranker/). You can install this as `pip install llama-index-postprocessor-flag-embedding-reranker` and import as `from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker`.


Both have the same usage pattern as above, and you can refer to the source code for the specific arguments.


