# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

Make sure your API key is available as `OPENAI_KEY` (for example, in a `.env` file), then run the cell below.

In [1]:
from dotenv import load_dotenv
load_dotenv()  # load variables from a .env file into the environment

import os
OPENAI_KEY = os.getenv('OPENAI_KEY')
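If `OPENAI_KEY` is missing from `.env`, `os.getenv` silently returns `None`, and the problem only surfaces later in a less obvious place. A minimal sketch of a fail-fast check, using a hypothetical helper `require_env` that is not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return environment variable `name`, raising if it is unset or empty (hypothetical helper)."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file")
    return value
```

In the cell above this would replace the `os.getenv` call: `OPENAI_KEY = require_env('OPENAI_KEY')`.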

In [37]:
import openai

from openai import OpenAI
client = OpenAI(
    base_url = "https://openai.vocareum.com/v1",
    api_key = OPENAI_KEY
)
# Legacy (openai<1.0) configuration, superseded by the client above:
# openai.api_base = "https://openai.vocareum.com/v1"
# openai.api_key = OPENAI_KEY

The code below loads the data you previously created and sorts it by cosine distance, smallest (most similar) first. Run it as-is.

In [32]:
import pandas as pd

df = pd.read_csv("distances_my_exercise.csv", index_col=0)
df.sort_values(by='distances', ascending=True, inplace=True)
df

| Unnamed: 0 | text | embeddings | distances |
|---|---|---|---|
| 12 | Ground acceleration values recorded in some ar... | [ 0.00045228 -0.00060161 0.00135084 ... -0.01... | 0.087880 |
| 15 | Despite an epicenter 90 km (56 mi) inland, a t... | [ 0.00297256 -0.02091205 0.00918703 ... -0.00... | 0.102367 |
| 89 | Utkucu, Murat; Uzunca, Fatih; Durmuş, Hatice; ... | [-0.00888022 -0.02886584 0.00233474 ... 0.01... | 0.116470 |
| 46 | The United Nations Development Programme estim... | [ 0.01216849 -0.00288201 -0.00684733 ... -0.00... | 0.116809 |
| 28 | `\t\t\t` | [-0.01989811 -0.02768373 -0.02324662 ... -0.01... | 0.122092 |
| ... | ... | ... | ... |
| 38 | `\t\t` | [-1.56773869e-02 -2.09031831e-02 2.77037161e-... | 0.296023 |
| 54 | On the morning of 7 February, Turkey accused t... | [-0.01837238 -0.00404908 -0.00291975 ... 0.01... | 0.296023 |
| 94 | Zhe Jia et al., The complex dynamics of the 20... | [-0.01198426 -0.00932718 0.00403356 ... 0.00... | 0.296138 |
| 3 | Central southern Turkey and northwestern Syria... | [-0.00597619 -0.01147591 0.01190473 ... 0.02... | 0.296138 |

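Cosine distance is 1 minus cosine similarity: 0 means two embeddings point in the same direction, and smaller values mean more similar text, which is why the rows are sorted in ascending order. A minimal NumPy sketch of the formula; the function name is illustrative and not necessarily what was used to build `distances_my_exercise.csv`:

```python
import numpy as np


def cosine_distance(a, b) -> float:
    """Cosine distance between two vectors: 1 - (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Same-direction vectors give 0, orthogonal vectors give 1, and opposite vectors give 2.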

## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

In [5]:
!pip install tiktoken -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [6]:
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""
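The trailing backslash inside the triple-quoted `USER_QUESTION` string is a line continuation: Python drops the backslash together with the newline, so the question ends up on a single line. A quick check on a standalone copy of the string:

```python
question = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""

# The backslash-newline pair is removed; the space before the backslash is kept
assert "\n" not in question
print(question)
```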

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

In the place marked `context`, include as many rows of `df['text']` as fit, starting with the most relevant (smallest distance), without letting the whole prompt exceed `token_limit` tokens. In the place marked `question`, add `USER_QUESTION`.

Your overall goal is to create a string called `prompt` that contains all of the relevant information.
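Filling the context is a greedy packing step: walk the rows from most to least relevant and keep each one while the running token total stays within the budget. A minimal sketch that also counts the tokens of the `###` separators and skips whitespace-only rows (the table above includes rows whose `text` is only tabs). `build_context` is a hypothetical helper, and `count_tokens` stands in for `lambda s: len(tokenizer.encode(s))`:

```python
from typing import Callable, Iterable, List


def build_context(texts: Iterable[str], count_tokens: Callable[[str], int],
                  budget: int, separator: str = "\n\n###\n\n") -> List[str]:
    """Greedily keep texts, in order, while their tokens plus separators fit in `budget`."""
    kept: List[str] = []
    used = 0
    sep_tokens = count_tokens(separator)
    for text in texts:
        if not text.strip():
            continue  # whitespace-only rows cost tokens but add no information
        # A separator is only inserted between snippets, not before the first one
        cost = count_tokens(text) + (sep_tokens if kept else 0)
        if used + cost > budget:
            break  # stop at the first row that does not fit
        kept.append(text)
        used += cost
    return kept
```

In the notebook, the budget would be `token_limit` minus the tokens of `prompt_template` and `USER_QUESTION`, and the result would be joined with the same separator before being passed to `prompt_template.format`.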

If you're getting stuck, you can click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
                        len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>

In [33]:
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    print("text={}".format(text))
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
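# Also counts the row that crossed the limit (added before the check), so it can exceed token_limit
token_count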

text=Ground acceleration values recorded in some areas near the fault rupture were in excess of 1 g. Three USGS seismic installations, two at Antakya and one at Hassa, recorded large ground accelerations and velocities. The town of Hassa recorded 0.9082 g in ground acceleration (pga) and 215.34 cm/s (84.78 in/s) in ground velocity. The station data corresponded to a Modified Mercalli intensity of X (Extreme). A peak ground acceleration of 1.62 g was recorded by a station at Fevzipaşa. The peak ground acceleration generally exceeded 0.5 g in a large area around the epicenter, near Adiyaman and a large part of Hatay. High pga values of 2 g were recorded in Hatay. The maximum recorded pga was 2.212 g at a station in Pazarcık Belediyesi Parkı, Pazarcık; a government health facility nearby had limited damage but the surrounding town was devastated. The maximum recorded pga during the Mw  7.7 earthquake was 0.59 g at Göksun.According to Kandilli Observatory, the maximum Mercalli intensity (M

1239

## TODO 2: Send Custom Text Prompt to Completion Model

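Using the `prompt` string you created, query an OpenAI completion model to get an answer. Because `gpt-3.5-turbo-instruct` is a completion model rather than a chat model, send the prompt to the Completions endpoint, not the Chat Completions endpoint. Set `max_tokens` to 150.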
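With the v1 `openai` Python client created at the top of the notebook, a completion request goes through `client.completions.create`, and the generated text is at `response.choices[0].text`. A minimal sketch wrapping that call in a hypothetical `complete` helper; because it only relies on those two attributes, it can be exercised with a stand-in object instead of a live client:

```python
from types import SimpleNamespace


def complete(client, prompt: str, model: str = "gpt-3.5-turbo-instruct",
             max_tokens: int = 150) -> str:
    """Send `prompt` to a completion model and return the stripped answer text."""
    response = client.completions.create(
        model=model, prompt=prompt, max_tokens=max_tokens
    )
    return response.choices[0].text.strip()


class FakeCompletions:
    """Stand-in for client.completions that echoes the request instead of calling the API."""

    def create(self, model, prompt, max_tokens):
        text = f"  [{model}, max_tokens={max_tokens}] {prompt}  "
        return SimpleNamespace(choices=[SimpleNamespace(text=text)])


fake_client = SimpleNamespace(completions=FakeCompletions())
print(complete(fake_client, "Question: ...\nAnswer:"))
```

With the real client, the call would be `answer = complete(client, prompt)`.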

If you're getting stuck, you can click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
# Use the openai>=1.0 client created earlier;
# openai.Completion.create was removed in openai 1.0
response = client.completions.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response.choices[0].text.strip()
print(answer)
```

</details>

In [44]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

# gpt-3.5-turbo-instruct is a completion (not chat) model, so use the
# Completions endpoint and send the full custom prompt built above
# (not just USER_QUESTION, which would discard the retrieved context)
response = client.completions.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response.choices[0].text.strip()
print(answer)

BadRequestError: Error code: 400 - {'error': {'code': None, 'message': 'Service budget exceeded. Please contact administration.', 'param': None, 'type': 'invalid_request_error'}}

## 🎉 Congratulations 🎉

You have now completed the prompt engineering process using unsupervised ML to get a custom answer from an OpenAI model!