<a href="https://colab.research.google.com/github/nicolaiberk/llm_ws/blob/main/notebooks/06a_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Informed Prompting

THIS NOTEBOOK IS IN DEVELOPMENT. CURRENTLY CANNOT INSTALL FAISS, FINALIZE NOTEBOOK AT A LATER POINT

> ❗ ACTIVATE THE GPU BY SELECTING RUNTIME IN THE UPPER RIGHT > CONNECT TO RUNTIME > T4 GPU

In [None]:
!pip install sentence_transformers datasets faiss-cpu  transformers

> ❗ RESTART THE NOTEBOOK (DROPDOWN NEXT TO RUN ALL > RESTART SESSION)

The [sentence-transformers](https://sbert.net/) library provides an ecosystem of models designed specifically for efficient embedding generation. It works very similarly to transformers:

In [None]:
from sentence_transformers import SentenceTransformer, CrossEncoder

We load a pretrained model:

In [None]:
model = SentenceTransformer("all-MiniLM-L6-v2")

Then we define some sentences of interest:

In [None]:
sentences = [
    "The Great Wall of China was built over several dynasties, with most of the existing structure dating from the Ming Dynasty (1368-1644).",
    "The blue whale's heart alone can weigh as much as an automobile and is roughly the size of a small car.",
    "Studies show that the Dunning-Kruger effect causes people with low ability in a domain to overestimate their competence in that area.",
]

And encode them as embeddings:

In [None]:
# Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)

We can then calculate the cosine similarity of the sentences with each other:

In [None]:
# Calculate the pairwise embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
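
To make the similarity matrix above less of a black box, here is a minimal sketch of cosine similarity in plain Python. The helper names `cosine_similarity` and `similarity_matrix` and the toy vectors are illustrative assumptions, not part of the library; the real embeddings have many more dimensions and the library computes this with tensors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(rows_a, rows_b):
    """Pairwise cosine similarities: one row per vector in rows_a."""
    return [[cosine_similarity(a, b) for b in rows_b] for a in rows_a]

# Toy 2-dimensional "embeddings" (hypothetical values for illustration)
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy, toy)
for row in matrix:
    print([round(value, 3) for value in row])
```

The diagonal is 1 because every vector points in the same direction as itself, and orthogonal vectors score 0.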

## Similarity Search

This is particularly useful when we search for something using a query:

In [None]:
query = "How large is a blue whale's heart?"
query_embedding = model.encode([query])
similarities = model.similarity(query_embedding, embeddings)
print(similarities)

Looks good! Now we can select the most similar context to add to the prompt:

In [None]:
best_index = similarities.squeeze().argmax().item() # get the index of the highest similarity

In [None]:
prompt = "Answer the question.\nQuery: " + query + "\nContext: " + sentences[best_index]
print(prompt)
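
Selecting only the single best match can be brittle, so one possible extension is to keep the top few matches. The following is a minimal sketch in plain Python; `top_k_indices`, `build_prompt`, the toy sentences, and the scores are hypothetical and only illustrate the idea.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def build_prompt(query, contexts):
    """Assemble a prompt that lists each retrieved context on its own line."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer the question.\nQuery: {query}\nContext:\n{context_block}"

# Hypothetical stand-ins for the sentences and their similarity to the query
toy_sentences = ["about walls", "about whales", "about psychology"]
toy_scores = [0.05, 0.71, 0.12]

chosen = top_k_indices(toy_scores, k=2)
toy_prompt = build_prompt("How large is a blue whale's heart?",
                          [toy_sentences[i] for i in chosen])
print(toy_prompt)
```

With real similarity scores, the same ranking could be applied to `similarities.squeeze().tolist()`.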

We can try and apply the same approach to a larger corpus:

In [None]:
from datasets import load_dataset

dataset = load_dataset("rag-datasets/rag-mini-wikipedia", "text-corpus")

In [None]:
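import faiss
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Build a clean list of passage strings and use it consistently below.
# Assumes each row stores its text under the "passage" key.
corpus = []
for item in dataset["passages"]:
    text = str(item["passage"]).strip()
    if text:
        corpus.append(text)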

# Embedding model
print("Encoding corpus...")
embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True, device='cpu')
corpus_embeddings_np = corpus_embeddings.numpy()

# FAISS index
index = faiss.IndexFlatL2(corpus_embeddings_np.shape[1])
index.add(corpus_embeddings_np)

# Reranker model
# reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

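# Generator: a local Hugging Face model. It is stored as `llm` so that it does not
# overwrite the SentenceTransformer kept in `model` above.
# device_map="auto" places the weights on the GPU when one is available (requires accelerate).
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3", torch_dtype=torch.float16, device_map="auto")
generator = pipeline("text-generation", model=llm, tokenizer=tokenizer, max_new_tokens=150)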

# Retrieve relevant passages, build a prompt, and generate an answer.
def rag_pipeline(query):
    # Embed query
    query_embedding = embedder.encode([query], convert_to_tensor=True, device='cpu').numpy()

    # Retrieve top-k from FAISS
    D, I = index.search(query_embedding, k=5)
    retrieved_docs = [corpus[idx] for idx in I[0]]

    print("Retrieved indices:", I[0])
    print("Retrieved docs:")
    for doc in retrieved_docs:
        print("-", repr(doc))

    # # Rerank
    # rerank_pairs = [[str(query), str(doc)] for doc in retrieved_docs]
    # scores = reranker.predict(rerank_pairs)
    # reranked_docs = [doc for _, doc in sorted(zip(scores, retrieved_docs), reverse=True)]

    # Combine for context
    context = "\n\n".join(retrieved_docs[:2])
    prompt = f"""Answer the following question using the provided context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"""

    # Generate
    response = generator(prompt)[0]["generated_text"]
    return response.split("Answer:")[-1].strip()
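
The retrieval step in the pipeline relies on a flat L2 index, which compares the query against every stored vector. As a rough illustration of what such an exhaustive search computes, here is a minimal plain-Python sketch; `l2_search` and the toy vectors are hypothetical, and FAISS performs the same kind of comparison far more efficiently on float32 arrays.

```python
def l2_search(corpus_vectors, query_vector, k):
    """Return (distances, indices) of the k nearest vectors by squared L2 distance."""
    distances = [
        sum((c - q) ** 2 for c, q in zip(vector, query_vector))
        for vector in corpus_vectors
    ]
    order = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    return [distances[i] for i in order], order

# Toy 2-dimensional corpus (hypothetical values for illustration)
toy_corpus = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_distances, toy_indices = l2_search(toy_corpus, [1.0, 0.0], k=2)
print(toy_indices, toy_distances)
```

Smaller distances mean closer matches, which is the opposite direction from cosine similarity, where larger values are better.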