# Re-Ranking

In Retrieval-Augmented Generation (RAG), reranking refines the initial set of retrieved documents by re-scoring each one against the input query with a slower but more accurate model, such as a cross-encoder. A cross-encoder reads the query and a document together and outputs a single relevance score, so it can capture interactions between the two that independently computed embeddings miss. The top of the reranked list is then passed to the generation model, so that the final output is based on the most relevant documents.

![Cross Encoder Image](https://raw.githubusercontent.com/UKPLab/sentence-transformers/master/docs/img/CrossEncoder.png)


Read more in the [Sentence Transformers Retrieve & Re-Rank documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html).

Here are the steps:
* [Loading the reranking model](#loading-the-reranking-model)
* [Loading retrieval results](#loading-retrieval-results)
* [Calculating the reranking scores](#calculating-the-re-ranking-scores)
* [Selecting the top 3 reranked documents](#selecting-top-3-reranked-documents)
* [Generating a reply on the reranked documents](#using-merged-results-to-generate-a-reply)

## Visual Improvements

In [1]:
from rich.console import Console
from rich_theme_manager import Theme, ThemeManager
import pathlib

theme_dir = pathlib.Path("themes")
theme_manager = ThemeManager(theme_dir=theme_dir)
dark = theme_manager.get("dark")

# Create a console with the dark theme
console = Console(theme=dark)

In [2]:
import warnings

# Suppress warnings
warnings.filterwarnings('ignore')

## Loading the reranking model

In [3]:
from sentence_transformers import CrossEncoder

# Load a cross-encoder trained on the MS MARCO passage ranking data
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
console.print(cross_encoder.model)

## Loading retrieval results

We load the retrieval results saved by the previous Hybrid-Search notebook instead of repeating the search. The scores from the dense and sparse indexes can be ignored, because the cross-encoder calculates a new ranking score from the query and the text of each document/chunk.

In [4]:
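import json

# Merge both result lists into one dict keyed by document id,
# so a document returned by both indexes appears only once
hybrid_search_results = {}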
with open('data/dense_results.json') as f:
    dense_results = json.load(f)
    for doc in dense_results:
        hybrid_search_results[doc['id']] = doc
with open('data/sparse_results.json') as f:
    sparse_results = json.load(f)
    for doc in sparse_results:
        hybrid_search_results[doc['id']] = doc
console.print(hybrid_search_results)
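Because both loops key the dictionary by `doc['id']`, a document returned by both indexes is stored only once, and the entry loaded last (the sparse one) replaces the earlier one. The following minimal sketch illustrates this with made-up records. The ids, texts, and the `score` key are assumptions for illustration and are not taken from the actual result files.

```python
# Hypothetical records, shaped roughly like the entries in the two JSON files
dense_results = [
    {"id": 1, "text": "chunk A", "score": 0.91},
    {"id": 2, "text": "chunk B", "score": 0.85},
]
sparse_results = [
    {"id": 2, "text": "chunk B", "score": 7.3},
    {"id": 3, "text": "chunk C", "score": 5.1},
]

merged = {}
for doc in dense_results + sparse_results:
    # A later entry with the same id replaces the earlier one
    merged[doc["id"]] = doc

print(list(merged))        # ids of the deduplicated documents
print(merged[2]["score"])  # score of the entry that was inserted last
```

Since the reranker only reads the `text` field, it does not matter here which of the two duplicate entries survives.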

In [5]:
# This is the query that we used for the retrieval of the above documents
query = "What is context size of Mixtral?"

## Calculating the re-ranking scores

We use the `cross_encoder` to score each (query, document) pair. A higher score means the document is a better match for the query.

In [6]:
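# Build one [query, document text] pair per retrieved document
pairs = [[query, doc['text']] for doc in hybrid_search_results.values()]
# predict returns one score per pair, in the same order as the pairs
scores = cross_encoder.predict(pairs)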

console.print(scores)
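Depending on the model configuration, `predict` may return unbounded raw scores rather than values between 0 and 1, so the numbers are mainly useful for comparing documents for the same query. If a bounded value is easier to read, a sigmoid can be applied; it is monotonic, so the ranking does not change. The sketch below uses made-up scores (not actual model output) and assumes the scores are raw logits.

```python
import math

def sigmoid(x: float) -> float:
    """Map a real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores for four documents, not actual model output
raw_scores = [8.2, -3.5, 0.4, -10.1]
squashed = [sigmoid(s) for s in raw_scores]

# Document indexes ordered from best to worst, before and after squashing
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
order_squashed = sorted(range(len(squashed)), key=lambda i: squashed[i], reverse=True)

print(order_raw == order_squashed)  # the ranking is unchanged
```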

## Selecting top 3 reranked documents

In [7]:
# Combine scores with corresponding document IDs
results_with_scores = [
    (doc_id, hybrid_search_results[doc_id]['text'], score)
    for doc_id, score in zip(hybrid_search_results.keys(), scores)
]

# Sort results by score in descending order and take the top 3
top_results = sorted(results_with_scores, key=lambda x: x[2], reverse=True)[:3]
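The scoring and selection steps can also be wrapped in one reusable function. The sketch below is one possible way to do this, not part of the original notebook: `rerank` and `overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in so that the example runs without downloading a model.

```python
def rerank(query, docs, score_fn, top_k=3):
    """Return the top_k (doc_id, text, score) tuples, best first.

    docs maps a document id to a dict with a 'text' field;
    score_fn takes a list of [query, text] pairs and returns one score per pair.
    """
    pairs = [[query, doc["text"]] for doc in docs.values()]
    scores = score_fn(pairs)
    ranked = sorted(zip(docs.keys(), scores), key=lambda item: item[1], reverse=True)
    return [(doc_id, docs[doc_id]["text"], float(score)) for doc_id, score in ranked[:top_k]]

def overlap_score(pairs):
    """Stand-in scorer: number of distinct query words found in the text."""
    return [
        len(set(q.lower().split()) & set(text.lower().split()))
        for q, text in pairs
    ]

# Made-up documents for illustration
toy_docs = {
    "a": {"text": "mixtral context size is discussed here"},
    "b": {"text": "unrelated text about cooking"},
    "c": {"text": "context windows in general"},
}

top = rerank("context size of mixtral", toy_docs, overlap_score, top_k=2)
print(top)
```

In this notebook, the same function could be called as `rerank(query, hybrid_search_results, cross_encoder.predict)`.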


In [8]:
from rich.table import Table

table = Table(title="Top 3 Documents after Reranking", show_lines=True)

table.add_column("ID", justify="right", style="cyan", no_wrap=True)
table.add_column("Score", justify="right", style="green", no_wrap=True)
table.add_column("Document", style="#e87d3e")

# Add rows to the table with top 3 results
for doc_id, text, score in top_results:
    table.add_row(str(doc_id), f"{score:.4f}", text)

console.print(table)

## Using merged results to generate a reply

We can now pass the top reranked documents to the LLM to generate the reply to the user's query.

In [9]:
# Keep only the text of the top reranked documents for the generation model
search_results = [doc[1] for doc in top_results]

In [10]:
from dotenv import load_dotenv

load_dotenv()

True

In [11]:
# Connect to the large language model and request a completion
from openai import OpenAI
from rich.text import Text

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a chatbot and a research expert. Your top priority is to help users understand research papers."},
        {"role": "user", "content": query},
        {"role": "assistant", "content": str(search_results)}
    ]
)

response_text = Text(completion.choices[0].message.content)

In [12]:
from rich.panel import Panel

panel = Panel(response_text, title=f"Hybrid Search with Reranking Reply to \"{query}\"")
console.print(panel)
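The completion call in this notebook passes the retrieved text as an `assistant` message. A common alternative is to place the retrieved documents in the user message, clearly separated from the question. The sketch below only builds the message list and does not call the API. `build_messages`, the prompt wording, and the example chunks are assumptions for illustration.

```python
def build_messages(query, documents):
    """Build chat messages with the retrieved documents as numbered context."""
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return [
        {
            "role": "system",
            "content": "Answer the question using only the provided context.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        },
    ]

# Placeholder chunks; in the notebook this would be search_results
messages = build_messages(
    "What is context size of Mixtral?",
    ["first chunk", "second chunk"],
)
print(messages[1]["content"])
```

The resulting list could be passed as the `messages` argument of `client.chat.completions.create`.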