### Retrieval-Augmented Generation
`Query -> Search a Database -> Relevant Documents -> Send to LLM -> Contextually Relevant Answer` <br/>

Most of the complexity comes from design decisions about:
- Chunking.
- Databases.
- Query preprocessing.
- Result postprocessing.
- Semantic vs. keyword search.
- Hypothetical-document search (HyDE).
- Multi-hop retrieval.
- Agentic retrieval.

#### Multi-Hop Retrieval
`Question -> LM <-> Hybrid Search from DB` <br/>
`Context -> LM <-> DB` <br/>
`Context -> LM -> Answer` <br/>
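
The loop above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: `toy_search` and `toy_query_lm` are hypothetical stand-ins for the hybrid search and the LM, and the three-sentence corpus is made up for the example. Each hop lets the "LM" form a query from the question plus the context gathered so far, then adds newly retrieved documents to the context.

```python
import re

# Toy corpus standing in for the document database (illustrative data only).
CORPUS = [
    "The Pointless Prize went to the porcupine joke.",
    "The porcupine joke was written by Alice.",
    "Velcro is a rip off.",
]

def tokens(text):
    """Lowercased word set, used by the toy keyword search."""
    return set(re.findall(r"\w+", text.lower()))

def toy_search(query, k):
    """Hypothetical retriever: rank documents by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), doc) for doc in CORPUS]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def toy_query_lm(question, context):
    """Hypothetical LM step: the first hop searches with the question,
    later hops follow up on the most recently retrieved document."""
    return question if not context else context[-1]

def multi_hop_retrieve(question, query_lm, search, hops=2, k=1):
    context = []
    for _ in range(hops):
        query = query_lm(question, context)
        # Over-fetch, then drop documents that are already in the context.
        hits = [d for d in search(query, k + len(context)) if d not in context]
        context.extend(hits[:k])
    return context

question = "Who wrote the joke that won the Pointless Prize?"
context = multi_hop_retrieve(question, toy_query_lm, toy_search)
print(context)
```

The first hop finds the prize sentence, and only the second hop, which searches with that sentence, reaches the document naming the author. In a real pipeline the collected context would then go to the LM for the final `Context -> LM -> Answer` step.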

#### Hybrid HyDE Search
`Question -> HyDE LM -> (Semantic Query -> Embedding Search) + (BM-25 Query -> BM-25 Search) -> Reciprocal Rank Fusion`
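
A minimal sketch of the fusion step, assuming the commonly used Reciprocal Rank Fusion score `sum(1 / (k + rank))` with `k = 60`. The wrapper `hybrid_hyde_search` and its callables (`hyde_lm`, `embedding_search`, `bm25_search`) are hypothetical names for illustration; the lambdas below are toy stand-ins, not real search backends.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: a document scores sum(1 / (k + rank))
    over every list in which it appears (ranks start at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_hyde_search(question, hyde_lm, embedding_search, bm25_search, top_k=10):
    # Assumption: the HyDE LM returns one query per search backend.
    semantic_query, keyword_query = hyde_lm(question)
    fused = reciprocal_rank_fusion([
        embedding_search(semantic_query),
        bm25_search(keyword_query),
    ])
    return fused[:top_k]

# Toy stand-ins: fixed rankings instead of real searches.
semantic_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = hybrid_hyde_search(
    "Tell me a porcupine joke",
    hyde_lm=lambda q: (f"A joke answering: {q}", "porcupine joke"),
    embedding_search=lambda q: semantic_hits,
    bm25_search=lambda q: bm25_hits,
)
print(fused)  # ['b', 'a', 'd', 'c']
```

Because RRF only looks at ranks, the cosine similarities and BM-25 scores never need to be put on a common scale. Documents that rank well in both lists (`b`, `a`) rise above those found by only one search.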

### Setup Jokes DB
<a href="https://www.kaggle.com/datasets/abhinavmoudgil95/short-jokes">Dataset link.</a>

In [None]:
import torch
import numpy as np
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

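def embed_texts(texts):
    # Pad to the longest text in the batch and truncate anything past the model's maximum length
    encoded_input = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Use the hidden state of the first ([CLS]) token as the text embedding
    embeddings = model_output.last_hidden_state[:, 0, :].numpy()
    # L2-normalize so that a dot product equals cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    return embeddings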

In [2]:
import pandas as pd
from tqdm import tqdm

data = pd.read_csv('shortjokes.csv')
jokes = data['Joke'].values
jokes = jokes[:5000]
batch_size = 512
all_embeddings = []
for i in tqdm(range(0, len(jokes), batch_size), desc='Generating embeddings'):
    batch_texts = jokes[i:i+batch_size].tolist()
    batch_embeddings = embed_texts(batch_texts)
    all_embeddings.append(batch_embeddings)

embeddings = np.concatenate(all_embeddings, axis=0)
print(f'Total embeddings: {len(embeddings)}')
np.save('embeddings.npy', embeddings)
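with open('jokes.txt', 'w') as f:
    for joke in jokes:
        # Flatten any internal line breaks so line i of the file stays aligned with embeddings[i]
        f.write(' '.join(joke.splitlines()) + '\n')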


Generating embeddings: 100%|██████████| 10/10 [01:45<00:00, 10.54s/it]

Total embeddings: 5000





In [8]:
class BasicEmbeddingsRAG:
    def __init__(self, texts, embeddings):
        self.texts = texts
        # Re-normalize defensively in case the stored embeddings are not unit length
        self.embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    def get_nearest(self, query: str, k: int = 10):
        query_emb = embed_texts([query])
        query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
        # Never request more results than there are texts
        k = min(k, len(self.texts))

        # Cosine similarity: a dot product suffices because both sides are normalized
        similarity = np.dot(query_emb, self.embeddings.T).flatten()

        # argpartition selects the k largest scores without sorting the whole array;
        # only those k indices are then sorted by descending similarity
        topk_idxs = np.argpartition(similarity, -k)[-k:]
        topk_idxs = sorted(topk_idxs, key=lambda x: similarity[x],
                           reverse=True)

        return [self.texts[i] for i in topk_idxs]

In [16]:
import time

query = 'Laugh'
with open('jokes.txt', 'r') as f:
    jokes = [l.strip() for l in f.readlines()]
embs = np.load('embeddings.npy')

basic_rag = BasicEmbeddingsRAG(jokes, embs)

# perf_counter is a monotonic, high-resolution clock suited to timing short calls
start = time.perf_counter()
nearest = basic_rag.get_nearest(query, k=10)
end = time.perf_counter()

print(f'Time: {end - start}')
print(nearest)

Time: 0.017000675201416016
["The best joke you'll never hear", 'Meet the parents', 'Hire The Pretty Blonde', 'Just one time I wanna see The Bachelor get a cold sore', 'What do you call a bald porcupine? Pointless!', 'pull my upvote', "My life That's the joke.", 'What do you call corn with a sense of humor? Laughing stalk', 'What do you call a bald porcupine? Pointless.', 'Velcro. What a rip off!']


### JokeGenerator Example
`Query -> (Idea LM <-> WebSearch) -> Joke Idea -> (Joke LM <-> Joke DB) -> Joke`
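
One way to wire this pipeline together is sketched below. It is a simplification under stated assumptions: `generate_joke` and its four callables are hypothetical names, each `<->` exchange is reduced to a single round trip, and the lambdas are toy stand-ins for the LMs, the web search, and the joke database.

```python
def generate_joke(query, idea_lm, web_search, joke_lm, joke_db, k=3):
    # Query -> (Idea LM <-> WebSearch) -> Joke Idea
    snippets = web_search(query)
    idea = idea_lm(query, snippets)
    # Joke Idea -> (Joke LM <-> Joke DB) -> Joke
    examples = joke_db(idea, k)
    return joke_lm(idea, examples)

# Toy stand-ins so the sketch runs without any model or network access.
joke = generate_joke(
    "porcupines",
    idea_lm=lambda q, snippets: f"{q} and {snippets[0]}",
    web_search=lambda q: ["hair loss"],
    joke_lm=lambda idea, examples: f"Joke about {idea}, styled after {len(examples)} examples",
    joke_db=lambda idea, k: ["What do you call a bald porcupine? Pointless!"][:k],
)
print(joke)
```

In this notebook, `basic_rag.get_nearest` has the same `(query, k)` shape as the `joke_db` callable, so it could plausibly serve as the joke database lookup.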