### **Very Simple RAG**  

We will use a very basic retrieval-augmented generation (RAG) approach. The idea is:  

1. The user performs a query.  
2. We take the query from (1) and perform an additional lookup to gather context.  
3. We combine the query from (1) with the context from (2) into a prompt and ask the LLM.  
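
The three steps above can be sketched in plain Python. This is a toy under stated assumptions. Retrieval is naive word-overlap scoring over an in-memory list with a tiny stop-word set, standing in for a real vector or keyword search. `ask_llm` is a hypothetical callable that you would replace with a real LLM client.

```python
import re
from typing import Callable, List, Set

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"a", "an", "the", "is", "what", "of", "to", "and"}


def _tokens(text: str) -> Set[str]:
    """Lowercase word tokens with punctuation and stop words removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Step 2: return up to k documents sharing the most words with the query."""
    q = _tokens(query)
    scored = [(len(q & _tokens(doc)), doc) for doc in documents]
    # Highest overlap first; Python's sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, context: List[str]) -> str:
    """Step 3: place the retrieved context and the question in one prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


def answer(query: str, documents: List[str], ask_llm: Callable[[str], str]) -> str:
    """Steps 1 to 3: take the user's query, retrieve context, and ask the LLM."""
    return ask_llm(build_prompt(query, retrieve(query, documents)))


if __name__ == "__main__":
    docs = [
        "Our refund policy allows returns within 30 days.",
        "The office is closed on public holidays.",
        "Refunds are issued to the original payment method.",
    ]
    # Echo the prompt instead of calling a real model
    print(answer("What is the refund policy?", docs, ask_llm=lambda prompt: prompt))
```

Swapping the toy `retrieve` for an embedding search or BM25 leaves the rest of the flow unchanged.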

This matters because, in most applications, customer data is confidential and the LLM has never seen it. Likewise, recent changes and factual details need a retrieved reference to ground the answer.  

In RAG, for step (2), we typically query a vector database, Elasticsearch, or other external services to retrieve context. Additionally, depending on the user's role and resource permissions, we refine the results further to prevent privileged information from being leaked.  

From 2023 to early 2024, many SaaS providers began incorporating RAG into their search and reporting tools to surface relevant information. Startups also built products that extract context from PDFs, for example to analyze construction bids.  

---

### Ranking  

- Ranking still matters: BM25 and similar techniques remain relevant.  
- Also, consider how you would validate performance. How would you measure CTR@1, CTR@5, and other key metrics?  
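
As a refresher on what BM25 computes, here is a from-scratch sketch. It assumes whitespace tokenization and uses k1 = 1.2 and b = 0.75 with the IDF variant log(1 + (N − n + 0.5) / (n + 0.5)), which match Lucene's defaults. Real engines add analyzers, stemming, and field handling on top.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document in docs against query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF variant that stays non-negative even for very common terms
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b)
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "mickey 17 is a science fiction film",
        "the weather is sunny today",
        "a film about a cat",
    ]
    print(bm25_scores("science fiction film", docs))
```

Rare terms ("science", "fiction") contribute more than the term shared by two documents ("film"). A document with no query terms scores zero.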

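Nowadays, ranking is less critical when you don't need an immediate response: with a relaxed latency budget, you can pass more candidates to the LLM and let it sort out relevance.  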
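
To make the CTR@k question concrete: one common definition is the fraction of search sessions where the user clicked at least one result ranked in the top k. The sketch below assumes a hypothetical click log. Each session records the 1-based rank of every clicked result, and an empty list means no click.

```python
from typing import Sequence


def ctr_at_k(clicked_ranks_per_session: Sequence[Sequence[int]], k: int) -> float:
    """Fraction of sessions with at least one click on a result ranked 1..k."""
    if not clicked_ranks_per_session:
        return 0.0
    hits = sum(
        1 for ranks in clicked_ranks_per_session if any(1 <= r <= k for r in ranks)
    )
    return hits / len(clicked_ranks_per_session)


if __name__ == "__main__":
    # Hypothetical log: each inner list holds the 1-based ranks a user clicked
    sessions = [[1], [3], [], [2, 7], [6]]
    for k in (1, 5):
        print(f"CTR@{k} = {ctr_at_k(sessions, k):.2f}")  # CTR@1 = 0.20, CTR@5 = 0.60
```

Comparing these numbers before and after a ranking change, for example in an A/B test, is one way to validate whether the change helped.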

---

### Authorization  

- How would you ensure that privileged information is not exposed?  

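
One common approach is to attach access-control metadata to every chunk at indexing time. Retrieved results are then filtered against the requesting user's roles before anything reaches the prompt. The `allowed_roles` field and the helper below are hypothetical. In a real system, this filter is better pushed down into the search query itself (Elasticsearch and many vector databases support metadata filters), so that top-k is computed only over documents the user may see.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    # Hypothetical ACL field: roles permitted to read this chunk
    allowed_roles: FrozenSet[str] = field(default_factory=frozenset)


def filter_by_permission(chunks: List[Chunk], user_roles: Set[str]) -> List[Chunk]:
    """Keep only chunks the user may see; chunks with no roles listed are denied (fail closed)."""
    return [c for c in chunks if c.allowed_roles & user_roles]


if __name__ == "__main__":
    retrieved = [
        Chunk("Q3 salary bands", "hr/salaries.txt", frozenset({"hr"})),
        Chunk("Office opening hours", "public/faq.txt", frozenset({"employee", "hr"})),
    ]
    visible = filter_by_permission(retrieved, {"employee"})
    print([c.source for c in visible])  # ['public/faq.txt']
```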
---

### **RAG Use Cases**  

A basic prototype of each is simple to build; you could have one working in a day or so.  

- How would you design a service that automatically identifies **cheaper deals** or **artsy alternatives** based on the webpage you're browsing?  
- How would you build a **search engine** for permits or a city-wide **311 bot**?  
- How could we help a city's 311 operators with training or improve their efficiency? Could we use live transcription services to surface relevant content?

### Quiz  

We will build a basic RAG system that can answer questions about movies.  

1. The user performs a query.  
2. We take the query from (1) and retrieve relevant context.  
3. We select a few results from (2) and create a prompt.  
4. We ask the LLM to answer the question.  

This is the simplest example, but notice how the request flow remains the same, whether you're building RAG for **Cineplex booking info**, a **legal firm**, or any other domain.  

The core idea—performing an internal search, ranking the results, and adding relevant context to our prompt—remains unchanged!

In [None]:
import anthropic
from utils import chunk_text, ANTHROPIC_API_KEY, visualize_citations

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
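
`chunk_text` comes from a local `utils` module that isn't shown here. If you need a stand-in, the sketch below is a hypothetical word-based chunker with overlap; the real helper may split differently. The chunk size and overlap values are arbitrary choices. Overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk.

```python
from typing import List


def simple_chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to chunk_size words; neighbours share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    print(simple_chunk_text(sample, chunk_size=4, overlap=2))
```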

In [None]:
from typing import List, Dict
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

class SimpleRag:
    def __init__(self):
        # Small, fast sentence-embedding model (384-dimensional vectors)
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        # Minimal in-memory "vector store": three parallel lists, one entry per chunk
        self.vector_store = {
            'embeddings': [],
            'sources': [],
            'chunks': []
        }

    def _add_to_store(self, file: str):
        """Read a text file, split it into chunks, embed them, and store them."""
        with open(file, 'r', encoding='utf-8') as f:
            content = f.read()
        chunks = chunk_text(content)
        embeddings = self.encoder.encode(chunks)
        self.vector_store['embeddings'].extend(embeddings)
        self.vector_store['sources'].extend([file] * len(chunks))
        self.vector_store['chunks'].extend(chunks)

    def retrieve_context(self, q: str, k: int):
        """Return the k stored chunks most similar to the query q."""
        question_embedding = self.encoder.encode([q])[0]
        similarities = cosine_similarity([question_embedding], self.vector_store['embeddings'])[0]
        # Indices of the k highest similarities, best match first
        top_k_indices = np.argsort(similarities)[-k:][::-1]
        results = []

        # example
        # source = self.vector_store['sources'][top_k_indices[0]]
        # content = ....

        raise NotImplementedError("fill me")
        return results

    def make_prompt(self, q: str, ctxs: List[str]):
        """Combine the question and the retrieved context into a single prompt."""
        raise NotImplementedError("fill me")

    def query(self, q: str):
        """Retrieve context for q, build a prompt, and ask the LLM."""
        raise NotImplementedError("fill me")
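
`retrieve_context` ranks results with `np.argsort`, which sorts every similarity score. That is fine for a small demo store. For larger stores, a common alternative is `np.argpartition`, which selects the top k in roughly linear time and then sorts only those k. The helper name below is our own. Apart from how ties are ordered, it returns the same indices as the `argsort` slice.

```python
import numpy as np


def top_k_indices(similarities: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest similarities, best first, without a full sort."""
    k = min(k, len(similarities))
    if k == 0:
        return np.array([], dtype=int)
    # argpartition moves the k largest values into the last k slots (unordered)
    candidates = np.argpartition(similarities, -k)[-k:]
    # Sort only those k candidates by descending similarity
    return candidates[np.argsort(similarities[candidates])[::-1]]


if __name__ == "__main__":
    sims = np.array([0.1, 0.9, 0.3, 0.7])
    print(top_k_indices(sims, 2))  # [1 3]
```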

In [None]:
rag = SimpleRag()
rag._add_to_store("docs/example.txt")

In [None]:
query = "Who directed the movie Mickey 17, and what is it about?"

In [None]:
rag.query(query)

## Using Citation Support

Claude (our LLM) can provide detailed citations when answering questions about documents.
Citations are a valuable affordance in many LLM-powered applications, helping users track and verify the sources of information in responses.

https://docs.anthropic.com/en/docs/build-with-claude/citations
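
Citations are not limited to PDFs. The documentation linked above also describes plain-text documents, which fit the text chunks a RAG store already holds. The sketch below only builds the `messages` payload and makes no API call. The helper name and the `{"title", "text"}` input shape are assumptions. It places the question after the documents, as in the PDF cell below.

```python
from typing import Dict, List


def build_cited_message(question: str, docs: List[Dict[str, str]]) -> List[dict]:
    """Build a messages list with one plain-text document block per retrieved chunk.

    Each item in docs is assumed to look like {"title": ..., "text": ...}.
    """
    content = [
        {
            "type": "document",
            "source": {"type": "text", "media_type": "text/plain", "data": d["text"]},
            "title": d["title"],
            "citations": {"enabled": True},
        }
        for d in docs
    ]
    # The question follows the documents
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The result can be passed as `messages=` to `client.messages.create(...)`, together with the same `model` and `max_tokens` arguments used in the PDF cell.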


In [None]:
import base64
import json

pdf_path = '' # fill me
with open(pdf_path, "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

pdf_response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    temperature=0.0,
    max_tokens=4000,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    },
                    "title": "", # fill me
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "" # fill me
                }
            ]
        }
    ]
)

print(pdf_response)

In [None]:
print(visualize_citations(pdf_response))
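
`visualize_citations` comes from the local `utils` module. To work with the raw data instead, you can walk the response's content blocks. Text blocks carry an optional `citations` list, and for PDF documents each citation includes `cited_text` plus page numbers. The sketch below works on a plain dict such as `pdf_response.model_dump()`, since the SDK's response objects are Pydantic models. The helper name and output keys are hypothetical.

```python
from typing import Any, Dict, List


def extract_citations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each cited span in a Messages API response with its source location."""
    rows = []
    for block in response.get("content", []):
        if block.get("type") != "text":
            continue
        # citations is missing or None when a text block cites nothing
        for cite in block.get("citations") or []:
            rows.append({
                "claim": block["text"],
                "cited_text": cite.get("cited_text"),
                "document_title": cite.get("document_title"),
                # Populated for PDF (page_location) citations; None otherwise
                "start_page": cite.get("start_page_number"),
                "end_page": cite.get("end_page_number"),
            })
    return rows
```

Usage: `for row in extract_citations(pdf_response.model_dump()): print(row)`.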