# Chapter02 - Reranking

## Installation

In [1]:
!pip install -q -r requirements.txt

## Imports

In [5]:
import cohere
import numpy as np
import re
import pandas as pd
from tqdm import tqdm
from sklearn.metrics.pairwise import cosine_similarity
from annoy import AnnoyIndex

## Cohere API key
Create a `.env` file and store your Cohere API key in it under the variable name `COHERE_API_KEY`.

In [3]:
import os
from dotenv import load_dotenv
# Load env variables
load_dotenv(".env")

# Get the API key
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

# Fall back to Cohere's English rerank model if MODEL_NAME is not set
MODEL_NAME = os.getenv("MODEL_NAME") or "rerank-english-v2.0"
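If the `.env` file is missing or the variable name is misspelled, `os.getenv` silently returns `None` and the error only surfaces later, when the client is used. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of `python-dotenv` or the Cohere SDK.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper: not part of python-dotenv or the Cohere SDK.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value
```

After `load_dotenv(".env")`, it could be used as `COHERE_API_KEY = require_env("COHERE_API_KEY")`.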

In [6]:
# load texts
texts = np.load("texts.npy")
texts[:2]

array(['Interstellar is a 2014 epic science fiction film co-written, directed, and produced by Christopher Nolan',
       'It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen Burstyn, Matt Damon, and Michael Caine'],
      dtype='<U192')

## Create the Cohere client

In [7]:
cohere_client = cohere.Client(COHERE_API_KEY)

### Using Cohere for reranking

In [10]:
query = "film gross"
results = cohere_client.rerank(query=query, model=MODEL_NAME, documents=texts, top_n=3)

In [12]:
# top_n sets how many results are returned; if it is omitted, all documents are returned.
results = cohere_client.rerank(query=query, model=MODEL_NAME, documents=texts, top_n=3)
for index, result in enumerate(results):
    print(f"Document Rank: {index + 1}, Document Index: {result.index}")
    print(f"Document: {result.document['text']}")
    print(f"Relevance Score: {result.relevance_score:.2f}")
    print("\n")

Document Rank: 1, Document Index: 10
Document: The film had a worldwide gross over $677 million (and $773 million with subsequent re-releases), making it the tenth-highest grossing film of 2014
Relevance Score: 0.92


Document Rank: 2, Document Index: 12
Document: It has also received praise from many astronomers for its scientific accuracy and portrayal of theoretical astrophysics
Relevance Score: 0.11


Document Rank: 3, Document Index: 2
Document: Set in a dystopian future where humanity is struggling to survive, the film follows a group of astronauts who travel through a wormhole near Saturn in search of a new home for mankind
Relevance Score: 0.03
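The scores in this output drop sharply after the first hit (0.92, then 0.11 and 0.03), so a fixed `top_n` can return documents that are barely relevant to the query. One option is to keep only the hits above a score cutoff. The sketch below works on plain `(index, score)` pairs copied from the output; `filter_by_score` and the 0.5 threshold are illustrative assumptions, not part of the Cohere SDK.

```python
def filter_by_score(hits, min_score=0.5):
    """Keep only the (document_index, relevance_score) pairs at or above min_score.

    Hypothetical helper; the default cutoff of 0.5 is an arbitrary choice.
    """
    return [(idx, score) for idx, score in hits if score >= min_score]


# (document index, relevance score) pairs copied from the output above
hits = [(10, 0.92), (12, 0.11), (2, 0.03)]
relevant = filter_by_score(hits, min_score=0.5)
print(relevant)  # [(10, 0.92)]
```

A suitable cutoff depends on the model and the data, so it is worth checking against a few known queries before relying on it.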




Passing thousands of documents to Cohere on every query is not feasible, so we need another mechanism that narrows the corpus down to a short list of candidates before reranking. This filtering step is called the first stage of search.

A very popular approach is to classify each text according to whether it is relevant to the topic. For background, see:
1. Multi-Stage Document Ranking with BERT: https://arxiv.org/abs/1910.14424
2. Pretrained Transformers for Text Ranking: BERT and Beyond: https://arxiv.org/abs/2010.06467
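To make the two-stage idea concrete, here is a minimal sketch of a first stage that shortlists candidates before any reranker is called. It deliberately swaps in a much simpler technique than the classifiers in the papers above: bag-of-words cosine similarity using only the standard library. The function names (`tokenize`, `cosine`, `first_stage`), the value of `k`, and the abridged sample sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stage(query, documents, k=2):
    """Return the indices of the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scores = [cosine(query_vec, Counter(tokenize(doc))) for doc in documents]
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Abridged sample sentences in the style of the notebook's `texts` array
documents = [
    "Interstellar is a 2014 epic science fiction film",
    "It stars Matthew McConaughey and Anne Hathaway",
    "The film had a worldwide gross over $677 million",
]

candidate_ids = first_stage("film gross", documents, k=2)
candidates = [documents[i] for i in candidate_ids]
print(candidate_ids)  # [2, 0]
```

Only `candidates` would then be sent to the reranker instead of the full corpus. Because the reranker returns indices relative to the list it receives, `candidate_ids` is needed to map them back to positions in the original corpus.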

*_:)_*