## Q1. Running Elasticsearch
***

In [1]:
!curl localhost:9200

{
  "name" : "13df3b358b29",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "RO09YFAzTHSrScv13Pt7NQ",
  "version" : {
    "number" : "9.0.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "73f7594ea00db50aa7e941e151a5b3985f01e364",
    "build_date" : "2025-04-30T10:07:41.393025990Z",
    "build_snapshot" : false,
    "lucene_version" : "10.1.0",
    "minimum_wire_compatibility_version" : "8.18.0",
    "minimum_index_compatibility_version" : "8.0.0"
  },
  "tagline" : "You Know, for Search"
}


What is the `version.build_hash` value? 73f7594ea00db50aa7e941e151a5b3985f01e364

## Getting the data
***

In [2]:
import requests 

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

In [3]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}

## Q2. Indexing the data
***
Index the data in the same way as was shown in the course videos. Make the `course` field a keyword; the rest should be text.<br><br>
Which function do you use for adding your data to Elastic?<br>
* `insert`
* **`index`**
* `put`
* `add`

In [5]:
from elasticsearch import Elasticsearch
es_client = Elasticsearch('http://localhost:9200')
es_client.info()

ObjectApiResponse({'name': '13df3b358b29', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'RO09YFAzTHSrScv13Pt7NQ', 'version': {'number': '9.0.1', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '73f7594ea00db50aa7e941e151a5b3985f01e364', 'build_date': '2025-04-30T10:07:41.393025990Z', 'build_snapshot': False, 'lucene_version': '10.1.0', 'minimum_wire_compatibility_version': '8.18.0', 'minimum_index_compatibility_version': '8.0.0'}, 'tagline': 'You Know, for Search'})

In [6]:
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"} 
        }
    }
}

index_name = "course-questions"

es_client.indices.create(index = index_name, body = index_settings)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'course-questions'})

In [7]:
from tqdm.auto import tqdm


In [8]:
for doc in tqdm(documents):  # tqdm provides progress bar
    es_client.index(index = index_name, document = doc)

100%|█████████████████████████████████████████████████████████████████████████████████████████████| 948/948 [00:04<00:00, 219.96it/s]


## Q3. Searching
***
Now let's search in our index.

Execute the query "How do I execute a command on Kubernetes pod?".

Use only the `question` and `text` fields, give `question` a boost of 4, and use `"type": "best_fields"`.

What is the score for the top ranking result?

* 84.50
* 64.50
* **44.50**
* 24.50

Look at the `_score` field.

In [9]:
query = "How do I execute a command on Kubernetes pod?"

In [12]:
search_query = {
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    # The task names only `question` and `text`; `section` is searched here as well.
                    "fields": ["question^4", "text", "section"],
                    "type": "best_fields"
                }
            }
        }
    }
}

response = es_client.search(index = index_name, body = search_query)

result_docs = []

for hit in response['hits']['hits']:
    result_docs.append(hit['_score'])

result_docs

[44.50556, 39.406174, 37.250755, 35.433445, 33.70974]
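As a minimal sketch, the request body can be produced by a small helper so the same structure is reused across queries. The function name `build_search_query` and its parameters are hypothetical, not part of the course code. This version searches only `question` and `text`, as the task describes, and can optionally restrict results to one course through a `term` filter on the keyword `course` field.

```python
def build_search_query(query, size=5, question_boost=4, course=None):
    """Build a multi_match request body (hypothetical helper, not from the course).

    `course` is optional; when given, a term filter on the keyword field is added.
    """
    bool_query = {
        "must": {
            "multi_match": {
                "query": query,
                # "question^4" multiplies the score of matches in `question` by 4
                "fields": [f"question^{question_boost}", "text"],
                "type": "best_fields",
            }
        }
    }
    if course is not None:
        # Filters restrict the result set without contributing to the score
        bool_query["filter"] = {"term": {"course": course}}
    return {"size": size, "query": {"bool": bool_query}}


body = build_search_query("How do I execute a command on Kubernetes pod?")
print(body["query"]["bool"]["must"]["multi_match"]["fields"])
```

The returned dictionary can be passed to `es_client.search(index = index_name, body = ...)` in the same way as the hand-written query.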

## Q4. Filtering
***
Now ask a different question: "How do copy a file to a Docker container?".

This time we are only interested in questions from `machine-learning-zoomcamp`.

Return 3 results. What's the 3rd question returned by the search engine?

* How do I debug a docker container?
* **How do I copy files from a different folder into docker container’s working directory?**
* How do Lambda container images work?
* How can I annotate a graph?

In [13]:
query = "How do copy a file to a Docker container?"

In [25]:
search_query = {
    "size": 3,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["question^4", "text", "section"],
                    "type": "best_fields"
                }
            },
            "filter": {
                "term": {
                    "course": "machine-learning-zoomcamp"
                }
            }
        }
    }
}

response = es_client.search(index = index_name, body = search_query)

result_docs = [doc['_source'] for doc in response['hits']['hits']]

result_docs

[{'text': 'Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.\ndocker run -it --entrypoint bash <image>\nIf the container is already running, execute a command in the specific container:\ndocker ps (find the container-id)\ndocker exec -it <container-id> bash\n(Marcos MJD)',
  'section': '5. Deploying Machine Learning Models',
  'question': 'How do I debug a docker container?',
  'course': 'machine-learning-zoomcamp'},
 {'text': "You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:\nTo copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:\ndocker cp /path/to/local/file_or_directory container_id:/path/in/container\nHrithik Kumar Advani",
  'section': '5. Deploying Machine Learning Models',
  'question': 'How do I copy files from my local machine to docker container?',
 

## Q5. Building a prompt
***
Now we're ready to build a prompt to send to an LLM.

Take the records returned from Elasticsearch in Q4 and use this template to build the context. Separate context entries with two line breaks (`\n\n`).

```python
context_template = """
Q: {question}
A: {text}
""".strip()
```

Now use the context you just created along with the "How do copy a file to a Docker container?" question to construct a prompt using the template below:

```python
prompt_template = """
You're a course teaching assistant . Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()
```

What's the length of the resulting prompt? (use the `len` function)

* 946
* 1446
* 1946
* 2446
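One way to assemble the prompt exactly as the two templates describe is sketched below. The templates are copied from the text above; the helper name `build_prompt` and the `sample_docs` records are hypothetical stand-ins, so the printed length is not the homework answer. With the real Q4 results, pass the list of `_source` dictionaries instead.

```python
context_template = """
Q: {question}
A: {text}
""".strip()

prompt_template = """
You're a course teaching assistant . Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def build_prompt(question, docs):
    """Format each record with context_template and join entries with a blank line."""
    entries = [
        context_template.format(question=doc["question"], text=doc["text"])
        for doc in docs
    ]
    context = "\n\n".join(entries)
    return prompt_template.format(question=question, context=context)


# Hypothetical records used only to make the sketch runnable on its own
sample_docs = [
    {"question": "Q one?", "text": "Answer one."},
    {"question": "Q two?", "text": "Answer two."},
]

prompt = build_prompt("How do copy a file to a Docker container?", sample_docs)
print(len(prompt))
```

Joining with `"\n\n".join(...)` avoids the trailing separator that a running string concatenation leaves behind, which matters when the answer is a character count.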

In [29]:
from openai import OpenAI

In [30]:
client = OpenAI()  # make sure that the OPENAI_API_KEY environment variable is set

In [31]:
query = "How do copy a file to a Docker container?"

In [32]:
response = client.chat.completions.create(
    model = 'gpt-4o',
    messages = [{"role": "user", "content": query}]
)

In [33]:
response.choices[0].message.content

"To copy a file to a Docker container, you can use the `docker cp` command. This command allows you to copy files or directories between your host machine and a running Docker container. \n\nHere's how you can use it:\n\n### Copying from Host to Container\n\nLet's say you want to copy a file from your host machine to a Docker container. The syntax is:\n\n```\ndocker cp <source-path> <container-id>:<destination-path>\n```\n\n- `<source-path>` is the path to the file or directory on your host that you want to copy.\n- `<container-id>` is the ID or name of the container to which you want to copy the file.\n- `<destination-path>` is the location inside the container where you want to copy the file.\n\n**Example**: Copy a file named `example.txt` from your current directory on the host to the `/tmp` directory of a container named `my_container`.\n\n```bash\ndocker cp example.txt my_container:/tmp/\n```\n\n### Copying from Container to Host\n\nYou can also copy files from a Docker container 

In [35]:
prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database. 
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: {context}
""".strip()

context = ""

for doc in result_docs:
    context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

print(context)

section: 5. Deploying Machine Learning Models
question: How do I debug a docker container?
answer: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

section: 5. Deploying Machine Learning Models
question: How do I copy files from my local machine to docker container?
answer: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/file_or_directory container_id:/path/in/container
Hrithik Kumar Advani

section: 5. Deploying Machine Learning Models
question: How do I copy files from a diff

In [36]:
prompt = prompt_template.format(question = query, context = context).strip()

print(prompt)

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database. 
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do copy a file to a Docker container?

CONTEXT: section: 5. Deploying Machine Learning Models
question: How do I debug a docker container?
answer: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

section: 5. Deploying Machine Learning Models
question: How do I copy files from my local machine to docker container?
answer: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker 

#### Length of prompt

In [37]:
len(prompt)

1621

## Q6. Tokens
***
When we use the OpenAI Platform, we're charged by the number of tokens we send in our prompt and receive in the response.

The OpenAI python package uses `tiktoken` for tokenization.

Let's calculate the number of tokens in our query:
```python
encoding = tiktoken.encoding_for_model("gpt-4o")
```
Use the encode function. How many tokens does our prompt have?

* 120
* 220
* **320**
* 420

Note: to decode a token back into a word, you can use the `decode_single_token_bytes` function:
```python
encoding.decode_single_token_bytes(63842)
```

In [40]:
import tiktoken

In [42]:
encoding = tiktoken.encoding_for_model("gpt-4o")

In [44]:
len(encoding.encode(prompt))

354

## Bonus: generating the answer (ungraded)
***
Let's send the prompt to OpenAI. What's the response?

In [38]:
response = client.chat.completions.create(
    model = 'gpt-4o',
    messages = [{"role": "user", "content": prompt}]
)

response.choices[0].message.content

'To copy a file to a Docker container, you can use the `docker cp` command. The basic syntax is:\n\n```bash\ndocker cp /path/to/local/file_or_directory container_id:/path/in/container\n```\n\nThis command enables you to transfer files from your local machine into a running Docker container.'

## Bonus: calculating the costs (ungraded)
***
Suppose that on average per request we send 150 tokens and receive back 250 tokens.

How much will it cost to run 1000 requests?

You can see the prices __[here](https://openai.com/api/pricing/)__

Here is the API pricing for GPT-4o at the time of writing:

* **Input tokens**: \$2.50 per 1 million
* **Output tokens**: \$10.00 per 1 million

***
### Cost per request
Given:

* ~150 input tokens
* ~250 output tokens

Cost of an individual request:

* Input cost = 150 tokens × (\$2.50 / 1,000,000) = \$0.000375
* Output cost = 250 tokens × (\$10.00 / 1,000,000) = \$0.002500

Total = \$0.002875 per request
***
### Total cost for 1,000 requests
1,000 × \$0.002875 = **\$2.875**
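
The same arithmetic can be written as a short function, which makes it easy to rerun with other token counts or prices. This is a sketch: the name `estimate_cost` and its parameters are hypothetical, and the default prices are the per-million figures quoted above, which may change.

```python
def estimate_cost(n_requests, input_tokens, output_tokens,
                  input_price_per_m=2.50, output_price_per_m=10.00):
    """Return the total cost in dollars for n_requests average-sized requests.

    Prices are dollars per 1 million tokens (assumed defaults from the list above).
    """
    cost_per_request_x1m = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    )
    return n_requests * cost_per_request_x1m / 1_000_000


total = estimate_cost(1000, 150, 250)
print(f"${total:.3f}")  # → $2.875
```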