### Gen AI Retrieval Optimization

#### Cohere Reranking Model
- Reranking takes a query and a candidate document, and outputs a relevance score for that pair
- Reranking is applied as the last stage of a retrieval process, after a first-stage lexical or semantic search has produced a list of candidates, and is especially useful in RAG systems.
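
To make the "last stage" idea concrete, here is a minimal sketch that rescores a small candidate list and keeps the best entries. Both names are assumptions made for illustration: `overlap_score` is a toy stand-in for a learned reranking model (it only measures word overlap), and `rerank_candidates` is a hypothetical helper, not part of the Cohere SDK.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase word sets.

    This is only a placeholder for a learned reranking model.
    """
    q_tokens = set(query.lower().split())
    d_tokens = set(document.lower().split())
    if not q_tokens or not d_tokens:
        return 0.0
    return len(q_tokens & d_tokens) / len(q_tokens | d_tokens)


def rerank_candidates(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[int, float]]:
    """Score every (query, candidate) pair and return the top_n as (index, score)."""
    scored = [(idx, score_fn(query, doc)) for idx, doc in enumerate(candidates)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Candidates as they might arrive from a first-stage search (made-up text).
candidates = [
    "travel expenses are reimbursed through the finance tool",
    "formal performance reviews happen twice a year",
    "gym memberships are part of the health benefits",
]
ranked = rerank_candidates(
    "how often are performance reviews", candidates, overlap_score, top_n=2
)
for index, score in ranked:
    print(f"{score:.3f}  {candidates[index]}")
```

Returning `(index, score)` pairs rather than the documents themselves mirrors the shape of a rerank response, where each hit points back into the original list.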


#### Reranking lexical/semantic search results
- Rerank requires just a single call to implement. We call the endpoint using co.rerank() and pass the following arguments:
    - query: The user query
    - documents: The list of documents
    - top_n: The number of top reranked documents to return
    - model: The model name; here we use rerank-english-v3.0

In [None]:
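import os

import cohere

# Read the API key from the environment rather than hard-coding it in the notebook.
# CO_API_KEY is the variable name assumed here; set it before running this cell.
# The v1 client is used because the calls below pass dict documents and rank_fields.
co = cohere.Client(api_key=os.environ["CO_API_KEY"])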



In [2]:
# Define the documents
faqs_short = [
    {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."},
    {"text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."},
    {"text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."},
    {"text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."},
    {"text": "Internal Performance Check: Quarterly Performance Appraisal meetings to be held as part of evaluation."}
]

# Add the user query
query = "Can you provide some information related to Performance review?"
# Rerank the documents
results = co.rerank(query=query,
                    documents=faqs_short,
                    top_n=3,
                    model='rerank-english-v3.0')
print(results)



In [5]:
# Display the reranking results
def return_results(results, documents):    
    for idx, result in enumerate(results.results):
        print(f"Rank: {idx+1}") 
        print(f"Score: {result.relevance_score}")
        print(f"Document: {documents[result.index]}\n")
    
return_results(results, faqs_short)

Rank: 1
Score: 0.44095063
Document: {'text': 'Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year.'}

Rank: 2
Score: 0.00109019
Document: {'text': 'Internal Performance Check: Quarterly Performance Appraisal meetings to be held as part of evaluation.'}

Rank: 3
Score: 3.4808123e-05
Document: {'text': 'Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.'}
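
The scores above fall off sharply after the first hit, so a caller may prefer a score cutoff over a fixed top_n. The sketch below is one way to do that. `keep_relevant` is a hypothetical helper and the 0.01 threshold is an arbitrary choice for illustration; the mock hits only mimic the `index` and `relevance_score` attributes used above so the example runs without an API call.

```python
from types import SimpleNamespace


def keep_relevant(hits, documents, min_score):
    """Return (score, document) pairs whose relevance score is at least min_score."""
    return [
        (hit.relevance_score, documents[hit.index])
        for hit in hits
        if hit.relevance_score >= min_score
    ]


# Stand-ins for results.results, using the scores printed above.
mock_hits = [
    SimpleNamespace(index=3, relevance_score=0.44095063),
    SimpleNamespace(index=4, relevance_score=0.00109019),
    SimpleNamespace(index=1, relevance_score=3.4808123e-05),
]
mock_docs = [f"doc {i}" for i in range(5)]

kept = keep_relevant(mock_hits, mock_docs, min_score=0.01)
print(kept)
```

A suitable threshold depends on the model and the data, so it would need to be tuned on real queries rather than copied from this sketch.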



#### Reranking Semi-structured search
- The Rerank model supports multi-aspect and semi-structured data such as emails, invoices, JSON documents, code, and tables. The rank_fields argument selects which fields of each document are considered for ranking.
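
As a rough mental model of field selection, the sketch below flattens chosen fields of a record into plain text that a text-only ranker could score. This is an illustration only: `flatten_fields` is a hypothetical helper, and it is not a description of how the Rerank endpoint handles rank_fields internally.

```python
def flatten_fields(document: dict, fields: list) -> str:
    """Join the chosen fields of a record into 'key: value' lines, skipping missing keys."""
    lines = []
    for field in fields:
        if field not in document:
            continue
        value = document[field]
        # Lists such as recipients are joined into one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


# A made-up record with the same kind of fields as an email.
sample_email = {
    "from": "sender@example.com",
    "to": ["a@example.com", "b@example.com"],
    "subject": "Pricing",
    "text": "Here is the quote.",
}
flat = flatten_fields(sample_email, ["from", "to", "subject", "cc"])
print(flat)
```

Fields left out of the selection (here `text`) never reach the ranker, which is why the choice of fields can change the ranking.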

In [None]:
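import os

# Read the API key from the environment rather than hard-coding it (CO_API_KEY is an assumed variable name)
cohere_key = os.environ["CO_API_KEY"]
co = cohere.Client(cohere_key)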

# JSON to represent emails
emails = [
    {
        "from": "Paul Doe <paul_fake_doe@oracle.com>",
        "to": ["Steve <steve@me.com>", "lisa@example.com"],
        "date": "2024-03-27",
        "subject": "Follow-up",
        "text": "We are happy to give you the following pricing for your project."
    },
    {
        "from": "John McGill <john_fake_mcgill@microsoft.com>",
        "to": ["Steve <steve@me.com>"],
        "date": "2024-03-28",
        "subject": "Missing Information",
        "text": "Sorry, but here is the pricing you asked for for the newest line of your models."
    },
    {
        "from": "John McGill <john_fake_mcgill@microsoft.com>",
        "to": ["Steve <steve@me.com>"],
        "date": "2024-02-15",
        "subject": "Commited Pricing Strategy",
        "text": "I know we went back and forth on this during the call but the pricing for now should follow the agreement at hand."
    },
    {
        "from": "Generic Airline Company<no_reply@generic_airline_email.com>",
        "to": ["Steve <steve@me.com>"],
        "date": "2023-07-25",
        "subject": "Your latest flight travel plans",
        "text": "Thank you for choose to fly Generic Airline Company. Your booking status is confirmed."
    },
    {
        "from": "Generic SaaS Company<marketing@generic_saas_email.com>",
        "to": ["Steve <steve@me.com>"],
        "date": "2024-01-26",
        "subject": "How to build generative AI applications using Generic Company Name",
        "text": "Hey Steve! Generative AI is growing so quickly and we know you want to build fast!"
    },
    {
        "from": "Paul Doe <paul_fake_doe@oracle.com>",
        "to": ["Steve <steve@me.com>", "lisa@example.com"],
        "date": "2024-04-09",
        "subject": "Price Adjustment",
        "text": "Re: our previous correspondence on 3/27 we'd like to make an amendment on our pricing proposal. We'll have to decrease the expected base price by 5%."
    },
]

# Define which fields we want to include for the ranking
rank_fields = ["from", "to", "date", "subject", "text"]

# Define a query. Here we ask for the pricing from Microsoft (MS).
# The model needs to combine information from the sender address (john_fake_mcgill@microsoft.com)
# and the email body
query = "What is the pricing that we received from MS?\n"

# Call rerank, passing in the query, documents, and rank_fields. Set the model to 'rerank-english-v3.0' or 'rerank-multilingual-v3.0'
results = co.rerank(query=query, documents=emails, top_n=2, model='rerank-english-v3.0', rank_fields=rank_fields)

print("Query:", query)
for hit in results.results:
  email = emails[hit.index]
  print(email)
print('\n')

print('Below are the results from Reranker:')
return_results(results,emails)


Query: What is the pricing that we received from MS?

{'from': 'John McGill <john_fake_mcgill@microsoft.com>', 'to': ['Steve <steve@me.com>'], 'date': '2024-03-28', 'subject': 'Missing Information', 'text': 'Sorry, but here is the pricing you asked for for the newest line of your models.'}
{'from': 'John McGill <john_fake_mcgill@microsoft.com>', 'to': ['Steve <steve@me.com>'], 'date': '2024-02-15', 'subject': 'Commited Pricing Strategy', 'text': 'I know we went back and forth on this during the call but the pricing for now should follow the agreement at hand.'}


Below are the results from Reranker:
Rank: 1
Score: 0.22084616
Document: {'from': 'John McGill <john_fake_mcgill@microsoft.com>', 'to': ['Steve <steve@me.com>'], 'date': '2024-03-28', 'subject': 'Missing Information', 'text': 'Sorry, but here is the pricing you asked for for the newest line of your models.'}

Rank: 2
Score: 0.040845715
Document: {'from': 'John McGill <john_fake_mcgill@microsoft.com>', 'to': ['Steve <steve@me.c

#### Hybrid Reranking Model



##### Bi-Encoder models
- Models that compute dense vector embeddings for data (which can later be used to answer search queries) are called Bi-Encoder models
- Data vectors and query vectors can be compared by computing the similarity (for example cosine similarity) between vectors
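
The comparison step can be sketched in plain Python. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions), and `cosine_similarity` is written out by hand here rather than taken from any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy query and document embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 1.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned
}

scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the document vectors do not depend on the query, they can be computed once and cached, which is what makes this stage cheap at query time.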
##### Cross-Encoder models
- Cross-Encoder models do not produce vector embeddings for data; instead they score a pair of inputs directly using a classification mechanism
- A Cross-Encoder always takes its inputs in pairs, i.e. a search query and a data item

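_Combining the two encoders works well: the Bi-Encoder retrieves a candidate list quickly, and the Cross-Encoder then re-ranks only those candidates, as illustrated in the example below._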
    



all-MiniLM-L6-v2 - _A sentence-transformers model that maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search._


In [2]:
"""
This script contains an example of how to perform re-ranking with a Cross-Encoder for semantic search.

First, we use an efficient Bi-Encoder to retrieve similar questions from the Quora Duplicate Questions dataset:
https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs

Then, we re-rank the hits from the Bi-Encoder using a Cross-Encoder.
"""

import csv
import os
import pickle
import time

from sentence_transformers import CrossEncoder, SentenceTransformer, util

# We use a BiEncoder (SentenceTransformer) that produces embeddings for questions.
# We then search for similar questions using cosine similarity and keep the top num_candidates most similar questions
model_name = "all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)
num_candidates = 500

# To refine the results, we use a CrossEncoder. A CrossEncoder gets both inputs (input_question, retrieved_question)
# and outputs a score 0...1 indicating the similarity.
cross_encoder_model = CrossEncoder("cross-encoder/stsb-roberta-base")  # Model - Sentence similarity pair regression

# Dataset we want to use
url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv"
dataset_path = "quora_duplicate_questions.tsv"
max_corpus_size = 20000

# Some local file to cache computed embeddings
embedding_cache_path = "quora-embeddings-{}-size-{}.pkl".format(model_name.replace("/", "_"), max_corpus_size)

# Check if embedding cache path exists
if not os.path.exists(embedding_cache_path):
    # Download the dataset if it is not already present
    if not os.path.exists(dataset_path):
        print("Download dataset")
        util.http_get(url, dataset_path)

    # Get all unique sentences from the file
    corpus_sentences = set()
    with open(dataset_path, encoding="utf8") as fIn:
        reader = csv.DictReader(fIn, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            corpus_sentences.add(row["question1"])
            if len(corpus_sentences) >= max_corpus_size:
                break

            corpus_sentences.add(row["question2"])
            if len(corpus_sentences) >= max_corpus_size:
                break

    corpus_sentences = list(corpus_sentences)
    print("Encode the corpus. This might take a while")
    corpus_embeddings = model.encode(corpus_sentences, show_progress_bar=True, convert_to_tensor=True)

    print("Store file on disc")
    with open(embedding_cache_path, "wb") as fOut:
        pickle.dump({"sentences": corpus_sentences, "embeddings": corpus_embeddings}, fOut)
else:
    print("Load pre-computed embeddings from disc")
    with open(embedding_cache_path, "rb") as fIn:
        cache_data = pickle.load(fIn)
        corpus_sentences = cache_data["sentences"][0:max_corpus_size]
        corpus_embeddings = cache_data["embeddings"][0:max_corpus_size]

###############################
print(f"Corpus loaded with {len(corpus_sentences)} sentences / embeddings")

while True:
    inp_question = input("Please enter a question: ")
    print("Input question:", inp_question)

    # First, retrieve candidates using cosine similarity search
    start_time = time.time()
    question_embedding = model.encode(inp_question, convert_to_tensor=True)
    hits = util.semantic_search(question_embedding, corpus_embeddings, top_k=num_candidates)
    hits = hits[0]  # Get the hits for the first query

    print(f"Cosine-Similarity search took {time.time() - start_time:.3f} seconds")
    print("Top 5 hits with cosine-similarity:")
    for hit in hits[0:5]:
        print("\t{:.3f}\t{}".format(hit["score"], corpus_sentences[hit["corpus_id"]]))

    # Now, do the re-ranking with the cross-encoder
    start_time = time.time()
    sentence_pairs = [[inp_question, corpus_sentences[hit["corpus_id"]]] for hit in hits]
    ce_scores = cross_encoder_model.predict(sentence_pairs)

    for idx in range(len(hits)):
        hits[idx]["cross-encoder_score"] = ce_scores[idx]

    # Sort list by CrossEncoder scores
    hits = sorted(hits, key=lambda x: x["cross-encoder_score"], reverse=True)
    print(f"\nRe-ranking with CrossEncoder took {time.time() - start_time:.3f} seconds")
    print("Top 5 hits with CrossEncoder:")
    for hit in hits[0:5]:
        print("\t{:.3f}\t{}".format(hit["cross-encoder_score"], corpus_sentences[hit["corpus_id"]]))

    print("\n\n========\n")

Load pre-computed embeddings from disc
Corpus loaded with 20000 sentences / embeddings
Input question: Can you brief about Operation Sindoor
Cosine-Similarity search took 0.118 seconds
Top 5 hits with cosine-similarity:
	0.415	If I hacked into the NSA and Pentagon and dumped all their files, (1) how big would the manhunt for me be; (2) would it finally open the eyes of the public, and; (3) how would I be seen by the hacker circle?
	0.389	What is operation quality?
	0.388	What is cyber security?
	0.385	What is the operation quality?
	0.385	How can Bitcoin be hacked?

Re-ranking with CrossEncoder took 24.780 seconds
Top 5 hits with CrossEncoder:
	0.236	We are maintaining our data as a flat file in server. What are the ways handled by hackers to open the flat file?
	0.216	If I hacked into the NSA and Pentagon and dumped all their files, (1) how big would the manhunt for me be; (2) would it finally open the eyes of the public, and; (3) how would I be seen by the hacker circle?
	0.209	Escal

KeyboardInterrupt: 
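
The timings printed above give a rough sense of why the two stages are combined. The back-of-the-envelope sketch below uses only the numbers from this run and assumes that Cross-Encoder time scales linearly with the number of pairs, which is a simplification.

```python
# Timings copied from the run above; everything else is derived arithmetic.
corpus_size = 20_000
num_candidates = 500
bi_encoder_search_s = 0.118      # cosine-similarity search over the whole corpus
cross_encoder_rerank_s = 24.780  # Cross-Encoder scoring of the 500 candidates

# Average Cross-Encoder cost per (query, candidate) pair.
seconds_per_pair = cross_encoder_rerank_s / num_candidates

# Estimated cost of scoring every corpus sentence with the Cross-Encoder
# (assumes linear scaling with the number of pairs).
full_corpus_estimate_s = seconds_per_pair * corpus_size

# Cost of the two-stage approach actually used in the run.
two_stage_total_s = bi_encoder_search_s + cross_encoder_rerank_s

print(f"Cross-Encoder cost per pair: {seconds_per_pair:.4f} s")
print(f"Estimated full-corpus Cross-Encoder pass: {full_corpus_estimate_s:.1f} s")
print(f"Two-stage total: {two_stage_total_s:.3f} s")
```

Under that assumption, lowering num_candidates is the most direct way to cut query latency, at the risk of the Bi-Encoder dropping a relevant item before the Cross-Encoder sees it.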

In [4]:
pip freeze > requirements.txt


Note: you may need to restart the kernel to use updated packages.
