# Post-retrieval processing

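In the post-retrieval phase of RAG, the retrieved documents are processed to extract the most relevant information and to optimize the input for generation.

The retrieval phase returns a list of documents (chunks), each with a similarity score. Post-retrieval processing refines this list before it is passed to the LLM.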

This notebook demonstrates three different techniques for post-retrieval processing:

- Reranking
- Compression
- Fusion

In [None]:
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-postprocessor-rankgpt-rerank
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-postprocessor-longllmlingua
%pip install llmlingua
%pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv
from util.helpers import get_wiki_pages, create_and_save_wiki_md_files

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor

Add your Cohere API key to the `.env` file as `COHERE_API_KEY`, alongside `OPENAI_API_KEY`.
You can sign up for free and create a key here: [Cohere Dashboard](https://dashboard.cohere.com/api-keys)

In [None]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

In [None]:
pages = get_wiki_pages(["Vincent Van Gogh"])

In [None]:
create_and_save_wiki_md_files(pages=pages, path="./data/docs/wiki/")

In [None]:
documents = SimpleDirectoryReader("./data/docs/wiki/").load_data()

In [None]:
llm = OpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

In [None]:
index = VectorStoreIndex.from_documents(documents)
retriever = VectorIndexRetriever(index=index)

## Reranking

Since embedding vectors are essentially compressed representations of the meaning of a text, some information is lost. So what do we do if relevant information falls below the `top_k` cutoff of our retrieval? The simplest solution is to increase `top_k`, but this passes more text to the LLM and increases the computational cost. Another problem is the "Lost in the Middle" phenomenon: LLMs tend to focus on information at the beginning and end of the prompt and pay less attention to what is in the middle. It is therefore prudent to place the most relevant information at the top of the list.

A solution to both problems is **reranking**: retrieve a larger pool of candidate chunks, reorder them so the most pertinent results come first, and keep only the best few. Reranking therefore serves a dual purpose in information retrieval, acting as both an enhancer and a filter, and delivers a smaller, more precise input to the language model.
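The retrieve-wide-then-rerank idea can be sketched in plain Python. This is a minimal illustration, not how any library implements it: `toy_relevance` is a hypothetical stand-in (simple word overlap) for a real reranker such as RankGPT or Cohere Rerank, and the chunk texts and similarity scores are made up. What matters is the control flow: re-score every candidate, sort, and keep `top_n`.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float  # similarity score from retrieval, or relevance score after reranking


def toy_relevance(query: str, text: str) -> float:
    # Hypothetical stand-in for a real reranker: fraction of query words found in the chunk
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query: str, candidates: list[ScoredChunk], top_n: int) -> list[ScoredChunk]:
    # Enhancer: re-score every candidate with the (usually more expensive) reranker
    rescored = [ScoredChunk(c.text, toy_relevance(query, c.text)) for c in candidates]
    # Most relevant first; list.sort is stable, so ties keep the retrieval order
    rescored.sort(key=lambda c: c.score, reverse=True)
    # Filter: only the best top_n chunks reach the LLM
    return rescored[:top_n]


# Hypothetical retrieval output: vector similarity puts a less useful chunk first
candidates = [
    ScoredChunk("Van Gogh painted sunflowers", 0.91),
    ScoredChunk("Van Gogh lived in Arles", 0.88),
    ScoredChunk("Arles is a town in Provence", 0.87),
]
top = rerank("Which places did Van Gogh live in", candidates, top_n=2)
print([c.text for c in top])  # ['Van Gogh lived in Arles', 'Van Gogh painted sunflowers']
```

In the cells below, `similarity_top_k` plays the role of the wide candidate pool and the reranker's `top_n` is the cut applied after reranking.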

In this example we will look at two approaches to reranking:
- LLM reranking
    - a language model reranks the documents
    - specifically, we use RankGPT with ChatGPT (`gpt-3.5-turbo`) from OpenAI
- Cohere Rerank 3, a managed reranking model from Cohere

### LLM reranking (RankGPT)

The benefit of using a language model to rerank documents is that it reads the query and the passages together, so it can take their context into account and produce a more nuanced ranking than vector similarity alone.

RankGPT uses the following prompt (abridged) to rank the retrieved documents:
```
You are RankGPT, an intelligent assistant that can rank passages based on their relevancy to the query.

I will provide you with {num} passages, each indicated by number identifier []. 

Rank the passages based on their relevance to query: {query}.
```
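The excerpt above omits it, but RankGPT also instructs the model to answer only with passage identifiers in descending order of relevance, e.g. `[2] > [1] > [3]`. The sketch below shows one way such an answer can be turned back into a ranking. It follows the same idea as RankGPT's post-processing (drop duplicates and unknown identifiers, then append any passage the model left out in its original order), but it is an illustrative re-implementation, not the library's code; `parse_permutation` and `apply_permutation` are hypothetical names.

```python
import re


def parse_permutation(answer: str, num_passages: int) -> list[int]:
    """Turn a RankGPT-style answer like "[2] > [3] > [1]" into 0-based indices."""
    order: list[int] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        idx = int(match) - 1  # identifiers in the prompt are 1-based
        # Ignore out-of-range identifiers and repeats
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    # Passages the model forgot to mention keep their original relative order
    order += [i for i in range(num_passages) if i not in order]
    return order


def apply_permutation(passages: list[str], answer: str) -> list[str]:
    return [passages[i] for i in parse_permutation(answer, len(passages))]


passages = ["Born in Zundert", "Moved to Paris in 1886", "Lived in Arles"]
# A messy answer: passage 3 is repeated and passage 7 does not exist
print(apply_permutation(passages, "[3] > [2] > [3] > [7]"))
# ['Lived in Arles', 'Moved to Paris in 1886', 'Born in Zundert']
```

Being lenient when parsing matters because the LLM's output is free text and is not guaranteed to follow the requested format exactly.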

In [None]:
reranker = RankGPTRerank(llm=llm, top_n=3, verbose=True)

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)

In [None]:
response = query_engine.query("Which places did Van Gogh live?")
response

### Cohere

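Cohere Rerank is a managed reranking model served through the Cohere API. Instead of prompting a general-purpose LLM, we send the query together with the candidate documents, and the model returns a relevance score for each one; `CohereRerank` keeps the `top_n` highest-scoring nodes. Below we retrieve 20 candidates and keep the best 3.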
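Cohere's rerank endpoint refers to each returned document by its position (`index`) in the list that was sent, together with a `relevance_score`. The sketch below assumes results of that shape and shows how they map back to the candidate texts and are cut to `top_n`. It does not call the Cohere SDK: `RerankResult` and `select_documents` are hypothetical names, and the scores are invented for illustration.

```python
from typing import TypedDict


class RerankResult(TypedDict):
    # Assumed result shape: position of the document in the request, plus its score
    index: int
    relevance_score: float


def select_documents(
    texts: list[str], results: list[RerankResult], top_n: int
) -> list[tuple[str, float]]:
    """Map rerank results back to the original texts, most relevant first."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(texts[r["index"]], r["relevance_score"]) for r in ranked[:top_n]]


candidate_texts = ["Van Gogh's brother Theo", "Starry Night", "The Yellow House in Arles"]
# Invented scores, for illustration only (not real Cohere output)
results: list[RerankResult] = [
    {"index": 2, "relevance_score": 0.71},
    {"index": 0, "relevance_score": 0.93},
    {"index": 1, "relevance_score": 0.05},
]
print(select_documents(candidate_texts, results, top_n=2))
# [("Van Gogh's brother Theo", 0.93), ('The Yellow House in Arles', 0.71)]
```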

In [None]:
# Select Rerank 3 explicitly; CohereRerank does not take a `verbose` argument
reranker = CohereRerank(api_key=COHERE_API_KEY, model="rerank-english-v3.0", top_n=3)

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=20,
    node_postprocessors=[reranker],
)

In [None]:
response = query_engine.query("Who did Van Gogh live with?")
response

## Compression (LLMLingua)

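Instead of reordering the retrieved context, LLMLingua compresses it: a small language model estimates how informative each token is, and low-information tokens are dropped until the context fits a token budget. LongLLMLingua extends this to long, multi-document contexts by making the compression aware of the question, so passages relevant to the query are kept and compressed less aggressively. Below, `LongLLMLinguaPostprocessor` compresses the 10 retrieved nodes toward a target of about 300 tokens.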
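The core token-level step, keeping the most informative tokens within a budget, can be sketched as follows. This is a deliberately simplified illustration, not LLMLingua: the real method computes scores with a small language model, compresses coarse-to-fine and iteratively, and LongLLMLingua adds question-aware ranking and reordering (the `rank_method` and `reorder_context` options below). Here the per-token scores are hypothetical numbers, and `compress` is a hypothetical name.

```python
def compress(tokens: list[str], scores: list[float], target_tokens: int) -> list[str]:
    """Budget-constrained token pruning (simplified illustration, not LLMLingua).

    `scores` stands in for per-token informativeness (e.g. surprisal under a
    small language model); a higher score means more worth keeping.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if target_tokens >= len(tokens):
        return list(tokens)
    # Pick the indices of the target_tokens most informative tokens...
    keep = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:target_tokens]
    # ...and emit them in their original order so the text stays readable
    return [tokens[i] for i in sorted(keep)]


tokens = ["Van", "Gogh", "lived", "in", "the", "town", "of", "Arles"]
# Hypothetical informativeness scores, for illustration only
scores = [5.1, 6.3, 3.2, 0.4, 0.2, 1.5, 0.3, 7.8]
print(compress(tokens, scores, target_tokens=5))  # ['Van', 'Gogh', 'lived', 'town', 'Arles']
```

Function words like "in", "the" and "of" are dropped first because they carry little information, which is why compressed prompts remain understandable to an LLM even though they are no longer grammatical.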

In [None]:
llm_lingua_compressor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder
        "dynamic_context_compression_ratio": 0.4, # enable dynamic compression ratio
    },
)

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[llm_lingua_compressor],
)