### Load documents with IDs

- Loads a JSON file (`documents-with-ids.json`) containing the FAQ documents, each with a unique ID, used for RAG, plus the ground-truth questions (`ground-truth-data.csv`).

In [43]:
import requests
import pandas as pd

url_prefix = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/03-evaluation/search_evaluation/'

docs_url = url_prefix + 'documents-with-ids.json'
documents = requests.get(docs_url).json()

ground_truth_url = url_prefix + 'ground-truth-data.csv'
df_ground_truth = pd.read_csv(ground_truth_url)
ground_truth = df_ground_truth.to_dict(orient='records')

In [44]:
# sample of the original FAQ

documents[10]

{'text': 'It depends on your background and previous experience with modules. It is expected to require about 5 - 15 hours per week. [source1] [source2]\nYou can also calculate it yourself using this data and then update this answer.',
 'section': 'General course-related questions',
 'question': 'Course - \u200b\u200bHow many hours per week am I expected to spend on this  course?',
 'course': 'data-engineering-zoomcamp',
 'id': 'ea739c65'}

In [45]:
# let's create a map (dict) from id to document
doc_idx = {d['id']: d for d in documents}

# here 'text' holds the answer to the question
sample_data = doc_idx['c02e79ef']['text']
print(f'At id \'c02e79ef\' we have\n{sample_data}')

At id 'c02e79ef' we have
The purpose of this document is to capture frequently asked technical questions
The exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1
Subscribe to course public Google Calendar (it works from Desktop only).
Register before the course starts using this link.
Join the course Telegram channel with announcements.
Don’t forget to register in DataTalks.Club's Slack and join the channel.
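
Each ground-truth record points at a document through its `document` field, so the same kind of lookup table can be used to check that every referenced ID really exists. A minimal sketch on made-up records rather than the downloaded data; `find_missing_ids` is a hypothetical helper name, not part of the course code:

```python
def find_missing_ids(ground_truth, doc_idx):
    """Return the ground-truth document IDs that are absent from doc_idx."""
    return sorted({q['document'] for q in ground_truth if q['document'] not in doc_idx})

# Toy data for illustration only (not the real FAQ records)
toy_documents = [
    {'id': 'a1', 'text': 'Yes, sessions are recorded.'},
    {'id': 'b2', 'text': 'Use the course Slack channel.'},
]
toy_ground_truth = [
    {'question': 'Are sessions recorded?', 'document': 'a1'},
    {'question': 'Where do I ask questions?', 'document': 'zz'},
]

toy_doc_idx = {d['id']: d for d in toy_documents}
print(find_missing_ids(toy_ground_truth, toy_doc_idx))  # ['zz']
```

With the real data, the call would presumably be `find_missing_ids(ground_truth, doc_idx)`; an empty list means every question maps to a known document.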


> Now that we have the data we want, it needs to be `indexed`.

### Index Data

#### Indexing

Indexing is the process of storing data in a structured way that allows for fast and efficient retrieval.

We'll use `elasticsearch` for this.


### Step 1: Pick a Transformer Model


In [46]:
from sentence_transformers import SentenceTransformer
model_name = 'multi-qa-MiniLM-L6-cos-v1'
model = SentenceTransformer(model_name)

### What did we do in this step?

#### Picked a Transformer Model

- This is a bi-encoder model trained for semantic similarity (e.g., question–answer retrieval).
- It turns input text into a 384-dimensional vector.
- We chose it so that, instead of matching exact words, we can compare meanings.
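
Comparing meanings comes down to comparing vectors, and the index mapping in this notebook uses cosine similarity for that. The sketch below computes it in plain Python on made-up 3-dimensional vectors; they only stand in for the 384-dimensional output of `model.encode(...)`:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors, for illustration only
query = [1.0, 0.0, 1.0]
similar = [2.0, 0.0, 2.0]    # same direction, different length
unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, similar))    # close to 1: same direction
print(cosine_similarity(query, unrelated))  # 0: nothing in common
```

Only the direction of the vectors matters, not their length, which is why `similar` scores as high as the query itself would.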

### Step 2: Connect to Elasticsearch

In [47]:
from elasticsearch import Elasticsearch
es_client = Elasticsearch('http://localhost:9200')

In [48]:
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"},
            "id": {"type": "keyword"},
            "question_text_vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine"
            }
        }
    }
}

# setting the name of the Elasticsearch index
index_name = "course-questions"

# Delete the existing index named "course-questions" if it already exists.
# This avoids the "index already exists" error on re-runs; there may be a better way of dealing with this.
es_client.indices.delete(index=index_name, ignore_unavailable=True)

# This creates a new index named "course-questions" using the settings and mappings defined in index_settings.
es_client.indices.create(index=index_name, body=index_settings)


ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'course-questions'})
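
One possible alternative to deleting with `ignore_unavailable=True` is to check for the index explicitly before recreating it. A sketch, assuming the 8.x Python client (where `indices.create` accepts `settings` and `mappings` as keyword arguments); `recreate_index` is a hypothetical helper name:

```python
def recreate_index(es_client, index_name, index_settings):
    """Delete the index if it exists, then create it from scratch."""
    if es_client.indices.exists(index=index_name):
        es_client.indices.delete(index=index_name)
    return es_client.indices.create(
        index=index_name,
        settings=index_settings["settings"],
        mappings=index_settings["mappings"],
    )
```

It would be called as `recreate_index(es_client, index_name, index_settings)`. Both versions throw away existing data, so they suit a notebook that is re-run often, not a production index.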

### Step 3: Define and Create the Index

#### Think of the index like this:

> An index in Elasticsearch is like a database table that's been created with a structure (`settings` + `mappings`) but no data yet.

- At creation, it’s empty, just a “container” with rules.
- You can then send documents into it, one by one or in bulk.
- As long as each document matches the mapping rules (e.g., `question_text_vector` must be a list of 384 floats), it will be accepted.
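
Since a vector of the wrong length is rejected by the mapping, it can help to check it before sending the document. A small sketch; `is_valid_vector` is a hypothetical helper, and `numbers.Real` is used so that NumPy floats (the kind `model.encode` returns) pass the check too:

```python
from numbers import Real

def is_valid_vector(vector, dims=384):
    """True if `vector` holds exactly `dims` real numbers."""
    values = list(vector)
    return len(values) == dims and all(
        isinstance(x, Real) and not isinstance(x, bool) for x in values
    )

print(is_valid_vector([0.1] * 384))  # True
print(is_valid_vector([0.1] * 10))   # False: wrong number of dimensions
```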

### Now let's prepare and send our data to the index (database)

- For the progress bar we will use `tqdm`
    - `tqdm.auto` automatically picks the best way to show the progress bar (notebook widget or plain text)

In [49]:
from tqdm.auto import tqdm

for doc in tqdm(documents):
    question = doc['question']
    text = doc['text']
    doc['question_text_vector'] = model.encode(question + ' ' + text)

    es_client.index(index=index_name, document=doc)

  0%|          | 0/948 [00:00<?, ?it/s]

- Take the `question` and `text` fields.

- Concatenate them.

- Encode the result with the transformer model, producing a vector.

- Store that vector in the doc under the `question_text_vector` key.

- Send the updated document to Elasticsearch.
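
The loop above sends one request per document. Documents can also be sent in bulk, and the Python client ships a `bulk` helper for that. A sketch under the assumption that each document already carries its `question_text_vector`; `generate_actions` and `bulk_index` are hypothetical names:

```python
def generate_actions(documents, index_name):
    """Yield one bulk action per document, in the shape the bulk helper expects."""
    for doc in documents:
        yield {"_index": index_name, "_source": doc}

def bulk_index(es_client, documents, index_name):
    """Send all documents in batched requests instead of one request each."""
    # Imported here so generate_actions stays usable without the client installed
    from elasticsearch.helpers import bulk
    return bulk(es_client, generate_actions(documents, index_name))

# The actions can be inspected without a running Elasticsearch
print(list(generate_actions([{"id": "x1", "text": "hello"}], "course-questions")))
```

With a live cluster, `bulk_index(es_client, documents, index_name)` would replace the `es_client.index(...)` call inside the loop, once the vectors have been computed.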

> Now that we have our knowledge DB, it's time to perform different actions on it.



### Retrieval
- Searching the knowledge DB

In [50]:
# field: the vector field to search in (e.g., 'question_text_vector')
# vector: the query vector you’re searching for
# course: filters results only to this course (e.g., "data-engineering-zoomcamp")

def elastic_search_knn(field, vector, course):

    knn = {
        "field": field,
        "query_vector": vector,
        "k": 5, # top 5 
        # num_candidates: how many candidates each shard considers before picking the best k;
        # higher values improve accuracy at the cost of speed
        "num_candidates": 10000,
        # filter: only search documents that belong to the given course
        "filter": {
            "term": {
                "course": course
            }
        }
    }

    # _source: only return these fields in the result (not everything), to keep it clean.
    search_query = {
        "knn": knn,
        "_source": ["text", "section", "question", "course", "id"]
    }

    # Runs the search in Elasticsearch using the query we just built.
    es_results = es_client.search(
        index=index_name,
        body=search_query
    )
    
    result_docs = []
    
    
    # ['hits']['hits'] is the list of individual matched documents.
    for hit in es_results['hits']['hits']:
        result_docs.append(hit['_source'])

    return result_docs

# prepare your query
def question_text_vector_knn(q):
    
    # get the question text and course from the input q
    question = q['question']
    course = q['course']

    
    # use your transformer model to turn the question into a vector
    v_q = model.encode(question)

    return elastic_search_knn('question_text_vector', v_q, course)

### knn (k-nearest neighbors)

`knn`: Elasticsearch 8.x has built-in support for approximate k-NN (k-nearest neighbors) search on dense vectors.

```json
"knn": {
    "field": "question_text_vector",
    "query_vector": [...],
    "k": 5,
    "num_candidates": 10000,
    "filter": {
        "term": {"course": "search101"}
    }
}

```
This is Elasticsearch-specific syntax for dense vector search. Under the hood, Elasticsearch uses the HNSW algorithm for approximate k-NN search.

So:

- `field` tells Elasticsearch which vector field to compare.

- `query_vector` is the encoded input vector.

- `k` is how many similar results you want.

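- `num_candidates` is the number of candidates considered on each shard before the top `k` are chosen; higher values tend to improve accuracy but slow the search down.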

- `filter` lets you limit the search (e.g., by course).
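
To make these parameters concrete, here is what the search is approximating: an exact, brute-force k-NN that filters by course, scores every remaining document, and keeps the top `k`. This is only an illustration in plain Python on made-up 2-dimensional vectors, not how Elasticsearch does it internally (there is no `num_candidates` here because nothing is approximated):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def exact_knn(docs, field, query_vector, k, course):
    """Exact k-NN: filter by course, score every remaining doc, keep the top k."""
    candidates = [d for d in docs if d["course"] == course]
    ranked = sorted(candidates, key=lambda d: cosine(d[field], query_vector), reverse=True)
    return ranked[:k]

# Toy documents, for illustration only
toy_docs = [
    {"id": "a", "course": "c1", "vec": [1.0, 0.0]},
    {"id": "b", "course": "c1", "vec": [0.0, 1.0]},
    {"id": "c", "course": "c2", "vec": [1.0, 0.1]},  # removed by the course filter
    {"id": "d", "course": "c1", "vec": [0.9, 0.1]},
]

top = exact_knn(toy_docs, "vec", [1.0, 0.0], k=2, course="c1")
print([d["id"] for d in top])  # ['a', 'd']
```

Scoring every document gets slow on large collections, which is the reason for an approximate search with a tunable `num_candidates`.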


### What is `['hits']['hits']`?

When you run a search in Elasticsearch, the response is a nested JSON object. It looks something like this:

```json
{
  "hits": {
    "total": {"value": 123, "relation": "eq"},
    "hits": [
      {"_source": {...}},  // 1st result
      {"_source": {...}},  // 2nd result
      ...
    ]
  }
}

```
So:

- `es_results['hits']` → gives you the whole section of search results.

- `es_results['hits']['hits']` → gives you just the list of individual matched documents.

Then inside each hit, the actual document is found under `['_source']`.
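
Each hit also carries metadata next to `_source`, such as `_score`. A sketch of an extraction that keeps the score alongside the document, run on a hand-written response of the same shape (all values are made up); `extract_hits` is a hypothetical helper name:

```python
def extract_hits(es_results):
    """Flatten a search response into a list of documents with their scores."""
    docs = []
    for hit in es_results["hits"]["hits"]:
        doc = dict(hit["_source"])         # copy so the response is not mutated
        doc["_score"] = hit.get("_score")  # relevance score assigned by Elasticsearch
        docs.append(doc)
    return docs

# Hand-written response, for illustration only
fake_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 0.91, "_source": {"id": "aaa", "question": "Q1"}},
            {"_id": "2", "_score": 0.78, "_source": {"id": "bbb", "question": "Q2"}},
        ],
    }
}

for doc in extract_hits(fake_response):
    print(doc["id"], doc["_score"])
```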


### 🧠 Important point

- The `model` is not doing the search.

- `knn` is not doing the embedding.

They’re separate, but used together:

   -  `Model` = turns input → vector.

   -  `k-NN` = finds vectors most similar to it



##### Now that we have our knowledge DB and the way we do search on it (`knn`), let's test it.

In [51]:
question_text_vector_knn(dict(
    question='Are sessions recorded if I miss one?',
    course='machine-learning-zoomcamp'
))

[{'text': 'Everything is recorded, so you won’t miss anything. You will be able to ask your questions for office hours in advance and we will cover them during the live stream. Also, you can always ask questions in Slack.',
  'section': 'General course-related questions',
  'question': 'What if I miss a session?',
  'course': 'machine-learning-zoomcamp',
  'id': '5170565b'},
 {'text': 'The course videos are pre-recorded, you can start watching the course right now.\nWe will also occasionally have office hours - live sessions where we will answer your questions. The office hours sessions are recorded too.\nYou can see the office hours as well as the pre-recorded course videos in the course playlist on YouTube.',
  'section': 'General course-related questions',
  'question': 'Is it going to be live? When?',
  'course': 'machine-learning-zoomcamp',
  'id': '39fda9f0'},
 {'text': '(Hrithik Kumar Advani)',
  'section': '2. Machine Learning for Regression',
  'question': 'Useful Resource for

#### Now that we have retrieval working, let's go to the next step: using the LLM to give a smarter answer.

### The RAG Flow

> Take a question → find related answers (knn search) → feed both to GPT (llm) → get a smart, grounded response.

#### Let's prepare the `prompt` for our LLM. This will shape the answer we get as the final result.

In [52]:
# query: the user’s question (a string)
# search_results: a list of documents returned from Elasticsearch
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()  # remove whitespace at the start and end of the string

    context = ""
    
    # from the list of documents we get from Elasticsearch (after knn), take the section, question and text fields
    # so we can use them as context
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    # format the prompt; .format inserts the variables into the {} placeholders
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
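
To see what the LLM will actually receive, the context-building step can be tried on hand-written results, with no Elasticsearch involved. A minimal sketch: `format_context` is a hypothetical helper that mirrors the loop in `build_prompt`, and the two documents are invented for illustration:

```python
def format_context(search_results):
    """Render each document as a section/question/answer block."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    # Join with a blank line between entries (no trailing blank lines to strip)
    return "\n\n".join(entries)

# Made-up search results, for illustration only
fake_results = [
    {"section": "General", "question": "Is it recorded?", "text": "Yes."},
    {"section": "Homework", "question": "Is there a deadline?", "text": "Check the course page."},
]

print(format_context(fake_results))
```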

We built the prompt function, which gives us a nicely formatted query for the LLM. Now let's communicate with the LLM (ChatGPT).

In [53]:
from openai import OpenAI

client = OpenAI()

def llm(prompt, model='gpt-3.5-turbo-1106'):
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

#### Now let's investigate the code.

In [54]:
# example
response = client.chat.completions.create(model='gpt-3.5-turbo-1106', messages=[{"role": "user", "content": "hi"}])

print("\nThe type of response:\n")
print(type(response))

print("\nOur response object looks like this:\n")
print(response.model_dump())

print("\nWalk down the object tree to get what we need, which is the chat response:\n")
print(response.choices[0].message.content)


The type of response:

<class 'openai.types.chat.chat_completion.ChatCompletion'>

Our response object looks like this:

{'id': 'chatcmpl-BtPw0tpCrk0uk9sO6OoIqB7YD9eD6', 'choices': [{'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'Hello! How can I assist you today?', 'refusal': None, 'role': 'assistant', 'annotations': [], 'audio': None, 'function_call': None, 'tool_calls': None}}], 'created': 1752547432, 'model': 'gpt-3.5-turbo-1106', 'object': 'chat.completion', 'service_tier': 'default', 'system_fingerprint': 'fp_982035f36f', 'usage': {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}}

Walk down the object tree to get what we need, which is the chat response:

Hello! How can I assist you today?


### How we accessed the response in the `ChatCompletion` object
```
response
├── choices (list)
│   └── [0]
│       └── message (object)
│           └── content → 🟩 "Hello! How can I assist you today?"
```

### role field
- `system` and `user`

`"system"`: Tells the assistant how to answer (the behavior or style). In `build_prompt` we put these instructions in the user prompt itself, but they could also be set with the system role.

`"user"`: Gives the assistant what to answer (the question or input).

- The assistant replies based on both the system’s instructions and the user’s input.
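
Applied to the RAG prompt, that split could look like the sketch below: the instructions go into the system message, while the question and context go into the user message. `SYSTEM_PROMPT` and `build_messages` are hypothetical names, and the wording is adapted from the prompt template used earlier:

```python
SYSTEM_PROMPT = (
    "You're a course teaching assistant. "
    "Answer the QUESTION using only the facts from the CONTEXT."
)

def build_messages(question, context):
    """Split instructions (system) from the question and context (user)."""
    user_content = f"QUESTION: {question}\n\nCONTEXT:\n{context}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]

# Made-up question and context, for illustration only
messages = build_messages("Is it recorded?", "answer: Yes, everything is recorded.")
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list would be passed as `messages=` to `client.chat.completions.create(...)`.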

In [55]:
# example
client = OpenAI()

messages = [
    {"role": "system", "content": "You are a friendly assistant who replies with short answers."},
    {"role": "user", "content": "What's the capital of France?"}
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=messages
)
print(response.choices[0].message.content)

Paris.


### Now we have the `search using knn` and `llm response` functions; the only missing piece is a function that combines the two and provides a final result.

In [61]:
def rag(query: dict, model="gpt-3.5-turbo-1106") -> str:
    
    # get top k answers
    search_results = question_text_vector_knn(query)
    
    # build prompt, using question and search result
    prompt = build_prompt(query['question'], search_results)
    
    # feed llm with prompt, and choose the model
    answer = llm(prompt, model=model)
    return answer

Now let's check our `rag` function (rag search):

In [62]:
ground_truth[10] 

{'question': 'Can I enroll in the course after it starts?',
 'course': 'data-engineering-zoomcamp',
 'document': '7842b56a'}

In [63]:
rag(ground_truth[10])

"Yes, you can still enroll in the course after it starts. There will be deadlines for turning in the final projects, so it's recommended not to leave everything for the last minute. Additionally, all the materials will be kept after the course finishes, so you can follow the course at your own pace after it finishes."

### Now that we have a working RAG system, it's time to evaluate it using evaluation metrics

#### Offline evaluation 1: Cosine Similarity Metric