# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

Run the cell below and enter your API key when prompted.

In [1]:
import openai
import getpass


openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = getpass.getpass("Enter your API key: ").strip()

Enter your API key: ········


The code below loads the data you created previously, already sorted by cosine distance. Run it as-is.

In [2]:
import pandas as pd

df = pd.read_csv("distances.csv", index_col=0)
df

| Unnamed: 0 | text | embeddings | distances |
| --- | --- | --- | --- |
| 96 | Aftermath of the 2023 Turkey–Syria earthquakes | [-0.00593362 -0.01557161 -0.00971076 ... 0.00... | 0.075461 |
| 98 | Reactions to the 2023 Turkey–Syria earthquakes | [-0.00968734 -0.02823052 -0.00874092 ... 0.01... | 0.078994 |
| 97 | Humanitarian response to the 2023 Turkey–Syria... | [-0.01003118 -0.02768046 -0.01551542 ... 0.00... | 0.089677 |
| 3 | The confirmed death toll in Turkey was 53,537;... | [ 0.00022446 -0.02535487 -0.01128911 ... 0.00... | 0.092695 |
| 39 | By 23 February 2023, the Ministry of Environme... | [ 0.00526485 -0.01567967 -0.00376447 ... 0.00... | 0.101008 |
| ... | ... | ... | ... |
| 86 | The United Nations (UN) criticized Russia's at... | [-0.01369679 -0.00047191 -0.01180407 ... -0.01... | 0.227743 |
| 87 | Raed al-Saleh, chief of the White Helmets, a v... | [-0.02008515 -0.02022861 -0.01374659 ... -0.01... | 0.235111 |
| 46 | The 13-storey apartment block consists of thre... | [-0.00040081 0.01254989 0.00743298 ... -0.00... | 0.238896 |
| 88 | "Let me be clear: The White Helmets received n... | [-0.01511107 -0.01264502 0.00213483 ... -0.01... | 0.254376 |

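As a reminder of what the `distances` column represents, here is a minimal, dependency-free sketch of ranking texts by cosine distance. It assumes the distances were measured against the embedding of the question, which this section does not show. The three-dimensional vectors and the names `cosine_distance`, `question_embedding`, `rows`, and `ranked` are made up for illustration and are not part of the notebook.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (illustrative values, not real model output)
question_embedding = [1.0, 0.0, 0.0]
rows = [
    ("unrelated text", [0.0, 1.0, 0.0]),
    ("closely related text", [0.9, 0.1, 0.0]),
    ("somewhat related text", [0.5, 0.5, 0.0]),
]

# Smaller distance means the text is closer to the question, so sort ascending
ranked = sorted(
    ((text, cosine_distance(question_embedding, emb)) for text, emb in rows),
    key=lambda pair: pair[1],
)
for text, distance in ranked:
    print(f"{distance:.3f}  {text}")
```

Because the rows are sorted in ascending order, reading the dataframe from the top yields the most relevant text first, which is what the prompt-building step relies on.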

## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

In [3]:
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

In the place marked `context`, provide as much information from `df['text']` as possible without exceeding `token_limit`. In the place marked `question`, add `USER_QUESTION`.

Your overall goal is to create a string called `prompt` that contains all of the relevant information.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
                        len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>

In [4]:
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
                        len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)


Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

Aftermath of the 2023 Turkey–Syria earthquakes

###

Reactions to the 2023 Turkey–Syria earthquakes

###

Humanitarian response to the 2023 Turkey–Syria earthquakes

###

The confirmed death toll in Turkey was 53,537; estimates of the number of dead in Syria were between 5,951 and 8,476. It is the deadliest earthquake in what is now present-day Turkey since the 526 Antioch earthquake and the deadliest natural disaster in its modern history. It is also the deadliest in present-day Syria since the 1822 Aleppo earthquake; the deadliest earthquake or natural disaster in general since the 2010 Haiti earthquake; and the fifth-deadliest earthquake of the 21st century. Damages were estimated at US$148.8 billion in Turkey, or nine-percent of the country's GDP, and US$9 billion in Syria.

###

By 23 February 2023, the Ministry of Environment, Urbanization 
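
The budgeting loop above can also be written as a reusable function. The sketch below is one possible variation, not the course's reference solution: it uses named placeholders and also charges for the `###` separators, which the loop above does not count. To stay runnable without `tiktoken`, it uses a hypothetical `count_tokens` helper that counts whitespace-separated words; in the notebook you would pass something like `lambda s: len(tokenizer.encode(s))` instead. The names `PROMPT_TEMPLATE`, `SEPARATOR`, `build_prompt`, and `demo` are likewise illustrative.

```python
PROMPT_TEMPLATE = """Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't know"

Context:

{context}

---

Question: {question}
Answer:"""

SEPARATOR = "\n\n###\n\n"


def count_tokens(text):
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(texts, question, token_limit, count=count_tokens):
    """Add texts in order until the next one would exceed token_limit."""
    # Fixed cost: the template (placeholders included) plus the question
    used = count(PROMPT_TEMPLATE) + count(question)
    separator_cost = count(SEPARATOR)
    chosen = []
    for text in texts:
        # Every chunk after the first also pays for one separator
        cost = count(text) + (separator_cost if chosen else 0)
        if used + cost > token_limit:
            break
        used += cost
        chosen.append(text)
    return PROMPT_TEMPLATE.format(
        context=SEPARATOR.join(chosen), question=question
    )


demo = build_prompt(
    texts=["one two three", "four five six", "seven eight nine"],
    question="How many?",
    token_limit=35,
)
print(demo)
```

Checking the cost before adding it to the running total keeps `used` equal to the size of what was actually included, which is handy if you want to log how much of the budget the final prompt consumed.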

## TODO 2: Send Custom Text Prompt to Completion Model

Using the `prompt` string you created, query an OpenAI `Completion` model to get an answer. Specify a `max_tokens` of 150.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

</details>

In [5]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)

Damages were estimated at US$148.8 billion in Turkey, or nine-percent of the country's GDP, and US$9 billion in Syria.
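
Because `max_tokens` caps the length of the reply, it can be useful to check whether the answer was cut off. The sketch below assumes the response behaves like a dictionary with the same `choices` / `text` layout used above, and relies on the `finish_reason` field, which the API sets to `"length"` when generation stops at the token cap. The helper name `extract_answer` and the `fake_response` dictionary are hypothetical stand-ins so the example runs without calling the API.

```python
def extract_answer(response):
    """Return (answer_text, was_truncated) from a Completion-style response."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    # finish_reason is "length" when generation stopped at max_tokens
    was_truncated = first.get("finish_reason") == "length"
    return first["text"].strip(), was_truncated


# Hand-written stand-in for an API response (not real model output)
fake_response = {
    "choices": [
        {"text": "\n I don't know ", "finish_reason": "stop"},
    ]
}
answer_text, was_truncated = extract_answer(fake_response)
print(repr(answer_text), was_truncated)
```

If `was_truncated` comes back `True`, raising `max_tokens` or shortening the context are the usual remedies.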


## 🎉 Congratulations 🎉

You have now completed the prompt engineering process using unsupervised ML to get a custom answer from an OpenAI model!