# Homework Module 1

## Imports

In [33]:
import requests
from elasticsearch import Elasticsearch
from tqdm.auto import tqdm
import tiktoken

## Q1. Running ElasticSearch

After starting Elasticsearch with the Docker command (I ran it in the terminal), we can check the cluster information by querying port 9200:

In [6]:
!curl localhost:9200

{
  "name" : "5afa863f05f5",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "MpbXXBSNRMmtkl8bAa-3Pw",
  "version" : {
    "number" : "8.4.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73",
    "build_date" : "2022-10-04T07:17:24.662462378Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}


**Answer:** "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73"
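The same check can be scripted from Python. This is a minimal sketch using only the standard library. It assumes the container listens on `http://localhost:9200` without authentication, as in the `curl` call above. `parse_cluster_info` and `fetch_cluster_info` are hypothetical helper names, not part of the homework.

```python
import json
from urllib.request import urlopen


def parse_cluster_info(raw: str) -> dict:
    """Extract the version number and build hash from the JSON returned by GET /."""
    info = json.loads(raw)
    version = info["version"]
    return {"number": version["number"], "build_hash": version["build_hash"]}


def fetch_cluster_info(url: str = "http://localhost:9200") -> dict:
    """Python equivalent of `!curl localhost:9200` (assumes no authentication)."""
    with urlopen(url, timeout=5) as resp:
        return parse_cluster_info(resp.read().decode("utf-8"))
```

With the container running, `fetch_cluster_info()["build_hash"]` should return the same hash as the `curl` output.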

## Q2. Indexing the data

In [7]:
# Get the data
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early on HTTP errors instead of inside .json()
documents_raw = docs_response.json()

In [8]:
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)
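The loop above writes `course` into the dictionaries of `documents_raw` in place. If you would rather leave the raw data untouched, a non-mutating variant could look like the sketch below. `flatten_documents` is a hypothetical helper name.

```python
from typing import List


def flatten_documents(documents_raw: List[dict]) -> List[dict]:
    """Turn [{'course': ..., 'documents': [...]}, ...] into one flat list of FAQ entries.

    Each entry is a shallow copy with 'course' added, so documents_raw is not modified.
    """
    return [
        {**doc, "course": course["course"]}
        for course in documents_raw
        for doc in course["documents"]
    ]
```

`flatten_documents(documents_raw)` should yield the same 948 entries as `documents`.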

In [9]:
documents[40]

{'text': 'When the troubleshooting guide above does not help resolve it and you need another pair of eyeballs to spot mistakes. When asking a question, include as much information as possible:\nWhat are you coding on? What OS?\nWhat command did you run, which video did you follow? Etc etc\nWhat error did you get? Does it have a line number to the “offending” code and have you check it for typos?\nWhat have you tried that did not work? This answer is crucial as without it, helpers would ask you to do the suggestions in the error log first. Or just read this FAQ document.',
 'section': 'General course-related questions',
 'question': 'How to ask questions',
 'course': 'data-engineering-zoomcamp'}

In [10]:
es_client = Elasticsearch("http://localhost:9200")

In [11]:
# Same cluster information as in Q1, obtained through the Python client instead of curl
es_client.info()

ObjectApiResponse({'name': '5afa863f05f5', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'MpbXXBSNRMmtkl8bAa-3Pw', 'version': {'number': '8.4.3', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '42f05b9372a9a4a470db3b52817899b99a76ee73', 'build_date': '2022-10-04T07:17:24.662462378Z', 'build_snapshot': False, 'lucene_version': '9.3.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

In [12]:
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"} 
        }
    }
}

index_name = "course-questions"

es_client.indices.create(index=index_name, body=index_settings)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'course-questions'})
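Running the `create` cell a second time fails because the index already exists. Below is a minimal sketch of an idempotent version, assuming the elasticsearch-py 8.x client used above. `recreate_index` is a hypothetical helper. It deletes any existing index with that name, so use it only on throwaway data.

```python
def recreate_index(es_client, index_name: str, index_settings: dict):
    """Drop `index_name` if it exists, then create it from a settings + mappings dict."""
    # indices.exists() returns a truthy response when the index is present
    if es_client.indices.exists(index=index_name):
        es_client.indices.delete(index=index_name)
    return es_client.indices.create(
        index=index_name,
        settings=index_settings["settings"],
        mappings=index_settings["mappings"],
    )
```

Usage: `recreate_index(es_client, index_name, index_settings)`.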

In [13]:
# Add each document to Elasticsearch, using the index created in the previous cell
for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 948/948 [00:19<00:00, 48.46it/s]


**Answer:** the function used to add the data to Elasticsearch is **index()**
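Calling `index()` once per document sends 948 separate requests. For larger datasets, the client's bulk helper is usually faster. This is a sketch assuming elasticsearch-py 8.x (`elasticsearch.helpers.bulk`). `make_bulk_actions` and `bulk_index` are hypothetical names.

```python
def make_bulk_actions(docs, index_name: str):
    """Yield one bulk 'index' action per document (no _id, so Elasticsearch assigns one)."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def bulk_index(es_client, docs, index_name: str) -> int:
    """Index all docs through batched bulk requests; returns the number of successful actions."""
    # Imported here so make_bulk_actions can be used without the client installed
    from elasticsearch import helpers

    success, _errors = helpers.bulk(es_client, make_bulk_actions(docs, index_name))
    return success
```

Usage: `bulk_index(es_client, documents, index_name)` should report 948 indexed documents.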

## Q3. Searching

In [14]:
query = "How do I execute a command in a running docker container?"

In [15]:
# Query using only the "question" and "text" fields (question with a boost of 4) and "type": "best_fields"
# The filtering has been removed
es_query = {
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["question^4", "text"],
                    "type": "best_fields"
                }
            }
        }
    }
}

In [16]:
# Perform search
response = es_client.search(index=index_name, body=es_query)

In [17]:
# Max score
response["hits"]["max_score"]

83.55175

In [18]:
# Text of the top-ranked hit (the one with the max score)
print(response["hits"]["hits"][0]["_source"]["text"])

Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)


**Answer:** 84.05 (the closest available option to the max score of 83.55 obtained above)
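To see how the top score compares with the other results, the hits can be listed alongside their scores. This is a small sketch. `summarize_hits` is a hypothetical helper that relies only on the structure returned by `search()`: `response["hits"]["hits"]`, where each hit carries `_score` and `_source`.

```python
from typing import List, Tuple


def summarize_hits(response, field: str = "question") -> List[Tuple[float, str]]:
    """Return (score, field value) pairs in the order Elasticsearch ranked the hits."""
    return [(hit["_score"], hit["_source"][field]) for hit in response["hits"]["hits"]]
```

Usage: `for score, question in summarize_hits(response): print(f"{score:.2f}  {question}")`. Hits are sorted by relevance, so the first score equals `response["hits"]["max_score"]`.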

## Q4. Filtering

In [19]:
# Add filtering to the query
es_query = {
    "size": 3,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["question^4", "text"],
                    "type": "best_fields"
                }
            },
            "filter": {
                "term": {
                    "course": "machine-learning-zoomcamp"
                }
            }
        }
    }
}

# Perform search
response = es_client.search(index=index_name, body=es_query)

In [20]:
# Look at the third question returned by the search
answer = response["hits"]["hits"][2]["_source"]["question"]

**Answer:**

In [21]:
print(answer)

How do I copy files from a different folder into docker container’s working directory?
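The Q3 and Q4 queries differ only in the `size` and the optional course filter, so they can come from one function. This is a sketch. `build_query` is a hypothetical helper that reproduces the dictionaries written out above.

```python
from typing import Optional


def build_query(query: str, course: Optional[str] = None, size: int = 5) -> dict:
    """Build the multi_match query used in Q3, optionally restricted to one course (Q4)."""
    bool_query = {
        "must": {
            "multi_match": {
                "query": query,
                "fields": ["question^4", "text"],  # matches in 'question' weigh 4x
                "type": "best_fields",
            }
        }
    }
    if course is not None:
        # 'course' is mapped as keyword, so a term filter matches the exact value.
        # Filters do not change the score; they only drop non-matching documents.
        bool_query["filter"] = {"term": {"course": course}}
    return {"size": size, "query": {"bool": bool_query}}
```

Usage: `es_client.search(index=index_name, body=build_query(query, course="machine-learning-zoomcamp", size=3))`.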


## Q5. Building a prompt

In [22]:
# Keep only the original documents (the _source of each hit)
result_docs = []

for res in response["hits"]["hits"]:
    result_docs.append(res["_source"])

In [25]:
# Build context
context_template = """
Q: {question}
A: {text}
""".strip()

context = ""

for i, doc in enumerate(result_docs):
    context = context + context_template.format(question=doc["question"], text=doc["text"])
    if i < len(result_docs)-1:
        context = context + "\n\n"

In [31]:
# Build prompt
prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

prompt = prompt_template.format(question=query, context=context)

# Show complete prompt
print(prompt)

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do I execute a command in a running docker container?

CONTEXT:
Q: How do I debug a docker container?
A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/file_or_directory container_id:

**Answer:**

In [32]:
len(prompt)

1462
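The context loop and the prompt template can be folded into one function. This is a sketch. `build_prompt` is a hypothetical name, and it uses `str.join` instead of the index check so the blank line only appears between entries. With the same `query` and `result_docs`, it should return the same string as `prompt`.

```python
from typing import List

CONTEXT_TEMPLATE = "Q: {question}\nA: {text}"

PROMPT_TEMPLATE = """You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}"""


def build_prompt(query: str, docs: List[dict]) -> str:
    """Join the retrieved FAQ entries with blank lines and fill the prompt template."""
    context = "\n\n".join(
        CONTEXT_TEMPLATE.format(question=doc["question"], text=doc["text"]) for doc in docs
    )
    return PROMPT_TEMPLATE.format(question=query, context=context)
```

Usage: `len(build_prompt(query, result_docs))`.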

## Q6. Tokens

In [35]:
# Tokenize our prompt
encoding = tiktoken.encoding_for_model("gpt-4o")
tokenized_prompt = encoding.encode(prompt)

# Show number of tokens in prompt
len(tokenized_prompt)

322

In [40]:
# Decode back to our prompt
print(encoding.decode(tokenized_prompt))

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do I execute a command in a running docker container?

CONTEXT:
Q: How do I debug a docker container?
A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/file_or_directory container_id:

**Answer:**

In [41]:
len(tokenized_prompt)

322
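The decode cell above shows that the tokens map back to the original prompt. This round-trip check and the token count can be wrapped in one helper. This is a sketch. `token_count_with_round_trip` is a hypothetical name, and it works with any object that has `encode(str)` and `decode(list)` methods, such as the tiktoken encoding used above.

```python
def token_count_with_round_trip(encoding, text: str) -> int:
    """Encode `text`, verify that decoding reproduces it, and return the number of tokens."""
    tokens = encoding.encode(text)
    if encoding.decode(tokens) != text:
        raise ValueError("decode(encode(text)) did not reproduce the original text")
    return len(tokens)
```

Usage: `token_count_with_round_trip(encoding, prompt)` performs the same count as `len(tokenized_prompt)`. To look at individual tokens, tiktoken also provides `encoding.decode_single_token_bytes(token)`.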