# Semantic Search and Retrieval Augmented Generation

## Useful Resources

1. https://oreil.ly/5NRQi
2. https://oreil.ly/Bbnrd

## Overview of Semantic Search and RAG

Three broad categories of language models, grouped by how they are best used for search:
- Dense Retrieval
    - Relies on the concept of embeddings and turns the search problem into retrieving the nearest neighbors of the search query.
    - Dense retrieval takes a search query, consults its archive of texts, and outputs a set of relevant results.
    - Dense retrieval is one of the key types of semantic search, relying on the similarity of text embeddings to retrieve relevant results.
- Re-Ranking
    - Rerankers, the second key type of semantic search, take a search query and a collection of results, and reorder them by relevance, often resulting in vastly improved results.
- RAG
    - RAG systems are text generation systems that incorporate search capabilities to reduce hallucinations, increase factuality, and/or ground the generation model on a specific dataset.
    - A RAG system formulates an answer to a question and preferably cites its information sources.
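To make the dense-retrieval idea concrete, here is a minimal sketch of nearest-neighbor search by cosine similarity. The three-dimensional vectors are hand-picked toy values standing in for real text embeddings (which have hundreds of dimensions), and `cosine_top_k` is a hypothetical helper, not part of any library.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query, plus all similarities."""
    # Normalize rows so that a dot product equals cosine similarity
    docs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    sims = docs @ query
    # Sort by descending similarity and keep the first k indices
    return np.argsort(-sims)[:k], sims

# Toy "embeddings" (hand-picked, NOT produced by a real model)
doc_vecs = np.array([
    [0.9, 0.1, 0.0],   # doc 0: about astrophysics
    [0.1, 0.9, 0.1],   # doc 1: about box-office revenue
    [0.8, 0.2, 0.1],   # doc 2: also about science
])
query_vec = np.array([1.0, 0.0, 0.0])  # toy query vector for "how precise was the science"

top_ids, sims = cosine_top_k(query_vec, doc_vecs, k=2)
print(top_ids)  # → [0 2]
```

The two "science" documents are the nearest neighbors of the query; a real system does the same thing with model-generated embeddings and an index instead of a full scan.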

## Semantic Search with Language Models


### Dense Retrieval


In [None]:

import cohere
import numpy as np
import pandas as pd
from tqdm import tqdm

In [None]:
from dotenv import load_dotenv
load_dotenv()

In [None]:
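import os

# Read the key loaded from .env above (assumes the file defines COHERE_API_KEY);
# otherwise, replace the placeholder with your own key
api_key = os.getenv("COHERE_API_KEY", "YOUR_COHERE_API_KEY")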


In [None]:
# Create and retrieve a Cohere API key from os.cohere.ai
co = cohere.Client(api_key)

In [None]:
text = """
Interstellar is a 2014 epic science fiction film co-written, directed, and produced by Christopher Nolan.
It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen Burstyn, Matt Damon, and Michael Caine.
Set in a dystopian future where humanity is struggling to survive, the film follows a group of astronauts who travel through a wormhole near Saturn in search of a new home for mankind.

Brothers Christopher and Jonathan Nolan wrote the screenplay, which had its origins in a script Jonathan developed in 2007.
Caltech theoretical physicist and 2017 Nobel laureate in Physics Kip Thorne was an executive producer, acted as a scientific consultant, and wrote a tie-in book, The Science of Interstellar.
Cinematographer Hoyte van Hoytema shot it on 35 mm movie film in the Panavision anamorphic format and IMAX 70 mm.
Principal photography began in late 2013 and took place in Alberta, Iceland, and Los Angeles.

Interstellar uses extensive practical and miniature effects and the company Double Negative created additional digital effects.
Interstellar premiered on October 26, 2014, in Los Angeles.
In the United States, it was first released on film stock, expanding to venues using digital projectors.
The film had a worldwide gross over $677 million (and $773 million with subsequent re-releases), making it the tenth-highest grossing film of 2014.
It received acclaim for its performances, direction, screenplay, musical score, visual effects, ambition, themes, and emotional weight.
It has also received praise from many astronomers for its scientific accuracy and portrayal of theoretical astrophysics. Since its premiere, Interstellar gained a cult following, and now is regarded by many sci-fi experts as one of the best science-fiction films of all time.
Interstellar was nominated for five awards at the 87th Academy Awards, winning Best Visual Effects, and received numerous other accolades
"""

In [None]:
# Split into a list of sentences
texts = text.split('.')

In [None]:
# Clean up: strip surrounding spaces/newlines and drop empty chunks
texts = [t.strip() for t in texts if t.strip()]
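Splitting on every `.` is the simplest possible chunking strategy, but it also splits inside decimals such as `3.14` and throws away the sentence terminators. A slightly more careful sketch, assuming a sentence boundary is a `.`, `!` or `?` followed by whitespace, uses a regular expression. This is an illustrative alternative, not the method used in this notebook; `split_sentences` is a hypothetical helper, and it will still split after abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Split on ., ! or ? only when followed by whitespace (a simple heuristic)."""
    # The lookbehind keeps each terminator attached to its sentence
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty chunks and surrounding whitespace
    return [p.strip() for p in parts if p.strip()]

sample = "Pi is roughly 3.14. Is that precise? Precise enough!\nDone."

print(sample.split('.')[:2])     # → ['Pi is roughly 3', '14']   (naive split breaks the number)
print(split_sentences(sample))
# → ['Pi is roughly 3.14.', 'Is that precise?', 'Precise enough!', 'Done.']
```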

**Embedding the Text Chunks**

In [None]:
# Get the embeddings
response = co.embed(
    texts = texts,
    input_type = "search_document",
).embeddings

embeds = np.array(response)
print(embeds.shape)

**Building the Search Index**

An index stores the embeddings and is optimized to quickly retrieve the nearest neighbors even if we have a very large number of points.
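`faiss.IndexFlatL2` is an exact (brute-force) index: it compares the query with every stored vector and returns squared Euclidean distances. As a minimal sketch of that computation, assuming small random vectors in place of the Cohere embeddings, the same search can be written in plain NumPy. `flat_l2_search` is a hypothetical helper that mirrors the `(distances, ids)` return order of `index.search`; approximate FAISS index types trade a little accuracy for speed on very large collections.

```python
import numpy as np

def flat_l2_search(index_vecs, query_vecs, k):
    """Exhaustive k-NN search returning (squared L2 distances, ids), like IndexFlatL2."""
    # ||q - x||^2 for every query/stored-vector pair, shape (n_queries, n_vectors)
    diffs = query_vecs[:, None, :] - index_vecs[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Indices of the k smallest distances per query, in ascending order
    ids = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, ids, axis=1), ids

rng = np.random.default_rng(0)
stored = rng.normal(size=(15, 8)).astype(np.float32)  # stand-in for 15 sentence embeddings
query = stored[[4]] + 0.01                             # a query very close to vector 4

distances, ids = flat_l2_search(stored, query, k=3)
print(ids[0][0])  # → 4
```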

In [None]:
import faiss

dim = embeds.shape[1]
index = faiss.IndexFlatL2(dim)
print(index.is_trained)
index.add(np.float32(embeds))

**Search the Index**

We can now search the dataset using any query we want. We simply embed the query and present its embedding to the index, which retrieves the most similar sentences from the Wikipedia article.
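The index ranks results by (squared) L2 distance rather than cosine similarity, so it helps to know when the two agree. For unit-length vectors, `||q - x||^2 = 2 - 2 * cos(q, x)`, so sorting by ascending L2 distance gives the same order as sorting by descending cosine similarity. The sketch below checks this numerically on random vectors. Whether a particular embedding model returns normalized vectors is an assumption you should verify for that model; normalizing explicitly is the safe option.

```python
import numpy as np

rng = np.random.default_rng(42)
docs = rng.normal(size=(10, 16))   # random stand-ins for document embeddings
query = rng.normal(size=16)        # random stand-in for a query embedding

# Normalize everything to unit length
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

cos_sims = docs_unit @ query_unit                      # cosine similarity
sq_l2 = np.sum((docs_unit - query_unit) ** 2, axis=1)  # squared L2 distance

# For unit vectors: ||q - x||^2 == 2 - 2 * cos(q, x)
print(np.allclose(sq_l2, 2 - 2 * cos_sims))  # → True
# ...so both metrics produce the same ranking
print(np.array_equal(np.argsort(sq_l2), np.argsort(-cos_sims)))  # → True
```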

In [None]:
def search(query, number_of_results = 3):
    # 1. Get the query's embeddings
    query_embed = co.embed(
        texts = [query],
        input_type = "search_query"
    ).embeddings[0]

    # 2. Retrieve the nearest neighbours
    distances, similar_item_ids = index.search(np.float32([query_embed]), number_of_results)

    # 3. Format the results
    texts_np = np.array(texts)   # Convert texts list to numpy for easier indexing
    results = pd.DataFrame(data = {
        'tests': texts_np[similar_item_ids[0]],
        'distance' : distances[0]
    })

    # Print and return the results
    print(f"Query: '{query}' \nNearest neighbours:")
    return results

In [None]:
query = "how precise was the science"
results = search(query)
results

Next, we define a keyword search function so we can compare lexical search with dense retrieval. We'll use the BM25 algorithm, one of the leading lexical search methods.
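For intuition about what `BM25Okapi` computes, here is a minimal pure-Python sketch of the BM25 scoring function. It assumes the common parameter values k1 = 1.5 and b = 0.75 and uses the smoothed IDF `log(1 + (N - n + 0.5) / (n + 0.5))`. `rank_bm25` uses a slightly different IDF formulation, so absolute scores will not match it exactly, but the idea is the same: terms that are rare across the corpus and present in a document raise its score, with term-frequency saturation controlled by k1 and document-length normalization controlled by b. `bm25_score_sketch` is a hypothetical helper.

```python
import math
from collections import Counter

def bm25_score_sketch(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against query_tokens."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a high IDF, common terms a low one
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy, already-tokenized corpus (illustrative only)
corpus = [
    ["astronomers", "praised", "scientific", "accuracy"],
    ["film", "grossed", "677", "million"],
    ["kip", "thorne", "scientific", "consultant"],
]
example_scores = bm25_score_sketch(["scientific", "accuracy"], corpus)
# Doc 0 matches both query terms (including the rarer "accuracy"), so it scores highest
print(example_scores)
```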

In [None]:
from rank_bm25 import BM25Okapi
from sklearn.feature_extraction import _stop_words
import string

In [None]:
def bm25_tokenizer(text):
    tokenized_doc = []
    for token in text.lower().split():
        token = token.strip(string.punctuation)

        if len(token) > 0 and token not in _stop_words.ENGLISH_STOP_WORDS:
            tokenized_doc.append(token)
    return tokenized_doc



In [None]:
tokenized_corpus = []
for passage in tqdm(texts):
    tokenized_corpus.append(bm25_tokenizer(passage))

bm25 = BM25Okapi(tokenized_corpus)


In [None]:
def keyword_search(query, top_k = 3, num_candidates = 15):
    print("Input Question: ", query)

    ########## BM25 search (Lexical Search) ######################
    bm25_scores = bm25.get_scores(bm25_tokenizer(query))
    # argpartition fails if num_candidates exceeds the corpus size
    num_candidates = min(num_candidates, len(bm25_scores))
    top_n = np.argpartition(bm25_scores, -num_candidates)[-num_candidates:]
    bm25_hits = [{'corpus_id': idx, 'score': bm25_scores[idx]} for idx in top_n]
    bm25_hits = sorted(bm25_hits, key = lambda x: x['score'], reverse = True)

    print(f"Top-{top_k} lexical search (BM25) hits")
    for hit in bm25_hits[0:top_k]:
        print("\t{:.3f}\t{}".format(hit['score'], texts[hit['corpus_id']].replace("\n", " ")))

In [None]:
keyword_search(query = "how precise was the science")
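`keyword_search` uses `np.argpartition` to pull out the `num_candidates` highest scores without sorting the whole array, and only then sorts that small subset. A quick check with made-up scores shows that the result matches a full sort; the saving is negligible for 15 sentences but grows with the corpus.

```python
import numpy as np

scores = np.array([0.2, 3.1, 0.0, 1.7, 2.4, 0.9])  # made-up BM25 scores
n = 3

# argpartition places the n largest values in the last n slots, in no particular order
top_unsorted = np.argpartition(scores, -n)[-n:]
# Sort just those n candidates by score, highest first
top_sorted = top_unsorted[np.argsort(-scores[top_unsorted])]

print(top_sorted)               # → [1 4 3]
print(np.argsort(-scores)[:n])  # → [1 4 3]
```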


## Re-Ranking


In [None]:
query = "how precise was the science"
results = co.rerank(
    query = query,
    documents = texts,
    top_n = 3,
    return_documents = True
)
results.results

# Print the results
for idx, result in enumerate(results.results):
    print(idx, result.relevance_score, result.document.text)

In [None]:
def keyword_and_reranking_search(query, top_k = 3, num_candidates = 10):
    print("Input Question: ", query)

    ######### BM25 search (lexical search) ###################
    bm25_scores = bm25.get_scores(bm25_tokenizer(query))
    # argpartition fails if num_candidates exceeds the corpus size
    num_candidates = min(num_candidates, len(bm25_scores))
    top_n = np.argpartition(bm25_scores, -num_candidates)[-num_candidates:]
    bm25_hits = [{'corpus_id': idx, 'score': bm25_scores[idx]} for idx in top_n]
    bm25_hits = sorted(bm25_hits, key = lambda x: x['score'], reverse = True)

    print(f"Top-{top_k} lexical search (BM25) hits")
    for hit in bm25_hits[0:top_k]:
        print("\t{:.3f}\t{}".format(hit['score'], texts[hit['corpus_id']].replace("\n", " ")))

    # Rerank: send all BM25 candidates to the Rerank API and keep the top_k
    docs = [texts[hit['corpus_id']] for hit in bm25_hits]

    print(f"\nTop-{top_k} hits by Rerank API ({len(bm25_hits)} BM25 hits reranked)")
    results = co.rerank(query = query, documents = docs, top_n = top_k, return_documents = True)

    for hit in results.results:
        print("\t{:.3f}\t{}".format(hit.relevance_score, hit.document.text.replace("\n", " ")))

In [None]:
keyword_and_reranking_search(query = "how precise was the science")
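`keyword_and_reranking_search` follows a common two-stage pattern: a cheap scorer narrows the corpus to a few candidates, and a more expensive reranker reorders only those. The sketch below isolates that pattern with toy, hypothetical scorers: `term_count` plays the role of BM25 and `jaccard` stands in for `co.rerank`. Neither is a real semantic model.

```python
def retrieve_then_rerank(query, docs, first_stage, reranker, num_candidates=10, top_k=3):
    """Two-stage search: a cheap scorer picks candidates, a costlier one reorders them."""
    # Stage 1: score every document cheaply and keep the best num_candidates
    first = sorted(range(len(docs)), key=lambda i: first_stage(query, docs[i]), reverse=True)
    candidates = first[:num_candidates]
    # Stage 2: rescore only the candidates with the (expensive) reranker
    reranked = sorted(candidates, key=lambda i: reranker(query, docs[i]), reverse=True)
    return reranked[:top_k]

def term_count(query, doc):
    """Toy first-stage scorer: how often query words occur in the document."""
    doc_words = doc.lower().split()
    return sum(doc_words.count(w) for w in query.lower().split())

def jaccard(query, doc):
    """Toy stand-in for a real reranker: word-set overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

docs = [
    "science science science science",  # keyword-stuffed: wins stage 1
    "the precise science",
    "box office revenue",                # never reaches stage 2
    "precise timing",
]
ranked = retrieve_then_rerank("precise science", docs, term_count, jaccard,
                              num_candidates=3, top_k=2)
print(ranked)  # → [1, 0]
```

The keyword-stuffed document tops the first stage, and the reranker promotes the document that actually matches the query. This is the same effect the Rerank API has on the BM25 candidates above, only with a far better scoring model.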

## Grounded Generation with an LLM API


In [None]:
query = "income generated"

# 1. Retrieval
# We'll use embedding search. But ideally we'd do hybrid
results = search(query)

print(results)
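The comment above notes that hybrid search, combining keyword and embedding results, would ideally be used for retrieval. One common way to merge two ranked lists is reciprocal rank fusion (RRF), which scores each document by the sum of `1 / (k + rank)` over the lists it appears in. The notebook does not specify a fusion method, so RRF, the constant k = 60 (a value commonly used for RRF), the example ids, and the helper `reciprocal_rank_fusion` are all assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list (best first)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; documents near the top of any list get the largest boost
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sentence ids returned by each retriever, best first
bm25_ranking = [12, 4, 7]
dense_ranking = [4, 9, 12]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # → [4, 12, 9, 7]
```

Document 4 ranks highly in both lists, so it comes first. RRF only needs ranks, which sidesteps the problem that BM25 scores and embedding distances live on different scales.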

In [None]:
# 2. Grounded Generation
docs_dict = [{'text': text} for text in results['tests']]
response = co.chat(
    message = query,
    documents = docs_dict
)

print(response.text)

## Example: RAG with Local Models


In [None]:
from langchain_community.llms import LlamaCpp

# Make sure the model path is correct for your system
llm = LlamaCpp(
    model_path = "Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers = -1,
    max_tokens = 500,
    n_ctx = 2048,
    seed = 42,
    verbose = False
)

### Load the Embedding Model


In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings


In [None]:
# Embedding Model for converting text to numerical representations
embedding_model = HuggingFaceEmbeddings(
    model_name = 'thenlper/gte-small'
)

In [None]:
from langchain_community.vectorstores import FAISS

# Create a local vector database
db = FAISS.from_texts(texts, embedding_model)

### The RAG prompt

In [None]:
from langchain_core.prompts import PromptTemplate

In [None]:
# Create a prompt template
template = """<|user|>
Relevant Information:
{context}

Provide a concise answer to the following question using the relevant information provided above:
{question}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template = template,
    input_variables = ["context", "question"]
)

In [None]:
from langchain.chains import RetrievalQA


In [None]:
# RAG pipeline
rag = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = 'stuff',
    retriever = db.as_retriever(),
    chain_type_kwargs = {'prompt': prompt},
    verbose = True
)
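With `chain_type = 'stuff'`, the retrieved documents are concatenated ("stuffed") into the `{context}` slot of the prompt before it is sent to the LLM. The sketch below mimics that step in plain Python with a copy of the Phi-3 template, so you can see roughly what text the model receives. The blank-line separator, the example documents (taken from the Interstellar text), and the helper `stuff_prompt` are assumptions, and LangChain's own implementation may format the context slightly differently.

```python
# Copy of the Phi-3 RAG template used above
RAG_TEMPLATE = """<|user|>
Relevant Information:
{context}

Provide a concise answer to the following question using the relevant information provided above:
{question}<|end|>
<|assistant|>"""

def stuff_prompt(question, retrieved_docs, separator="\n\n"):
    """Fill the template by concatenating all retrieved documents into {context}."""
    context = separator.join(retrieved_docs)
    return RAG_TEMPLATE.format(context=context, question=question)

example_docs = [
    "The film had a worldwide gross over $677 million.",
    "It was the tenth-highest grossing film of 2014.",
]
prompt_text = stuff_prompt("Income Generated", example_docs)
print(prompt_text)
```

Because every retrieved document goes into a single prompt, the "stuff" strategy is limited by the model's context window (`n_ctx = 2048` here). This is why short chunks such as single sentences work well with it.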

In [None]:
rag.invoke('Income Generated')