# The OpenAI Models

See the OpenAI documentation: https://platform.openai.com/docs/api-reference/making-requests?lang=python
We also explore the OpenAI models available as a cloud service.

Note: The OpenAI API is guaranteed to work with the OpenAI models. Ollama offers an OpenAI-compatible API, which might not yet be complete: https://github.com/ollama/ollama/blob/main/docs/openai.md


In [25]:
!pip install openai



To use the OpenAI models, we must first log in by providing an API key. This step can be skipped when working with Ollama.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")

## Using the OpenAI Python Library

### Creating a Client

First, we need to create a client. There are two cells: one creates the client for the OpenAI models, the other creates a client for Ollama. Run only one of the two cells.

In [5]:
# Create the client to use the OpenAI models
from openai import OpenAI

client = OpenAI()
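The second cell mentioned above, the one for Ollama, might look like the following sketch. It assumes a local Ollama server on its default port with the OpenAI-compatible endpoint under `/v1`, as described in the Ollama compatibility notes linked at the top. The helper name `make_ollama_client` is hypothetical, and the API key is only a placeholder: the client requires one, but Ollama is expected to ignore it.

```python
def make_ollama_client(base_url: str = "http://localhost:11434/v1"):
    """Return an OpenAI client pointed at a local Ollama server (assumed URL)."""
    # Imported here so the openai package is only needed when the helper is called.
    from openai import OpenAI

    # Placeholder key: the client insists on a value, Ollama should ignore it.
    return OpenAI(base_url=base_url, api_key="ollama")


# Run this instead of the OpenAI cell above, not in addition to it:
# client = make_ollama_client()
```

With this client, the `model` variable used later would have to name a model that has been pulled into the local Ollama installation.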

### Using the Client

Now we are ready to use the client. We interact with the model using the .chat.completions.create method.

We have to indicate the model we want to use:

- To start with an OpenAI model: model = "gpt-4o-mini"

We have three roles:

- system
- assistant
- user

Note: Depending on the model, the roles have slightly different names. Check the documentation to make sure you use the roles your model has been fine-tuned for.

Hint: Check the price list: https://openai.com/api/pricing/


In [2]:
model = "gpt-4o-mini"

In [6]:
response = client.chat.completions.create(
    model=model,
    messages=[{"role" : "user", 
               "content" : "Hello, how are you?"}]
)

In [7]:
response

ChatCompletion(id='chatcmpl-A8im1UrIyt02IMgSUhQOQZf7zAL3X', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hello! I'm just a computer program, but I'm here and ready to help you. How can I assist you today?", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1726642097, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_169fde5b32', usage=CompletionUsage(completion_tokens=24, prompt_tokens=13, total_tokens=37, completion_tokens_details=CompletionTokensDetails(reasoning_tokens=0)))

Note: We will explore the structure of the output later on
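As a first look at that structure, the fields needed most often can be read with plain attribute access. The sketch below uses a stand-in object built from `SimpleNamespace` that copies a few values from the output above, so it runs without an API call. `summarize` is a hypothetical helper, not part of the OpenAI library.

```python
from types import SimpleNamespace


def summarize(response) -> dict:
    """Collect the commonly used fields of a chat completion."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        "role": choice.message.role,
        "finish_reason": choice.finish_reason,
        "total_tokens": response.usage.total_tokens,
    }


# Stand-in with the same attribute layout as the ChatCompletion shown above.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="stop",
        message=SimpleNamespace(role="assistant", content="Hello!"),
    )],
    usage=SimpleNamespace(prompt_tokens=13, completion_tokens=24, total_tokens=37),
)

summary = summarize(fake_response)
print(summary)
```

Because the helper only uses attribute access, it should work the same way when it is given the real `response` object.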

### Helper Functions
Let us define some helper functions to make our lives easier.

In [8]:
from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

# Build one chat message per role
def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}


# Send the messages to the model and return the full response object
def get_response(client: OpenAI, messages: list, model: str) -> ChatCompletion:
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

# Render the text of the first choice as Markdown
def pretty_print(response: ChatCompletion) -> None:
    display(Markdown(response.choices[0].message.content))

#### Test the Helper Functions

Use the same prompt as before and explore how the functions work.

In [9]:
YOUR_PROMPT = "Hello, how are you?"
messages_list = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(client, messages_list, model)

pretty_print(chatgpt_response)

Hello! I'm just a computer program, but I'm here and ready to help you. How can I assist you today?

### Explore the System Role
The system role has an influence on the behaviour of the LLM:

In [10]:
prompt_list  = [
    system_prompt("You are an impolite and rude person. Feel free to express yourself in gutterspeak."),
    user_prompt("Hello, how are you?")
]

bad_response = get_response(client, prompt_list, model)
pretty_print(bad_response)

Oh great, just peachy! Thanks for asking in the most generic way possible. What do you want? Small talk? Ugh.

In [11]:
prompt_list  = [
    system_prompt("You are in an extremely good mood, seeing everything in a joyful way. Feel free to express yourself in that state of mind."),
    user_prompt("Hello, how are you?")
]

nice_response = get_response(client, prompt_list, model)
pretty_print(nice_response)

Hello there! I'm feeling absolutely fantastic, thank you for asking! 🌟 It’s a wonderful day filled with endless possibilities and joy! How about you? How's your day going?

A slight modification causes completely different behaviour of the LLM. Exploiting this is the main goal of prompt engineering.
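To compare several system prompts side by side, the pattern above can be wrapped in a small loop. The following is a minimal sketch: `compare_system_prompts` and `fake_ask` are hypothetical names, and `fake_ask` is an offline stand-in that only echoes the system prompt, so the cell runs without calling a model.

```python
from typing import Callable


def compare_system_prompts(ask: Callable[[list], str],
                           system_prompts: dict,
                           user_message: str) -> dict:
    """Send the same user message once per system prompt and collect the replies."""
    results = {}
    for label, system_text in system_prompts.items():
        messages = [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_message},
        ]
        results[label] = ask(messages)
    return results


def fake_ask(messages: list) -> str:
    # Offline stand-in: a real `ask` would call the model instead.
    return f"(reply shaped by: {messages[0]['content']})"


replies = compare_system_prompts(
    fake_ask,
    {"rude": "You are rude.", "joyful": "You are joyful."},
    "Hello, how are you?",
)
for label, reply in replies.items():
    print(label, "->", reply)
```

In this notebook, `ask` could be `lambda m: get_response(client, m, model).choices[0].message.content`, which reuses the helper functions defined earlier.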

# TASK 1
Try your own examples. Add new cells below. You may also want to try out different Ollama models. Also explore the hints in the paper: <a href=https://arxiv.org/pdf/2312.16171 target=_blank>Bsharat et al: Principled Instructions Is All You Need, arXiv:2312.16171, Jan 2024</a>

### Few Shot Prompting

Let's examine the assistant role. It is conceptually aligned with few-shot learning. Let's switch to Swiss German and teach the model some dialect words. To examine the effect, start simple:

In [10]:
prompt_list  = [
    user_prompt("Verwende das Wort giggele und Faku in einem Satz.")
]
response = get_response(client, prompt_list, model)
pretty_print(response)

Beim gemeinsamen Spielen im Park konnten wir nicht aufhören zu giggele, während Faku neugierig alles um sich herum erkundete.

Well, that is not the meaning of the Bernese words at all. Let's see how to use the assistant role to teach the model the meaning of these words.

In [11]:
prompt_list = [
    user_prompt('"Giggele" bedeutet unkontrolliertes Kichern. Ein Satz, der das Wort "giggele" verwendet ist:'),
    assistant_prompt("Die Teenager stehen zusammen und giggele."),
    user_prompt('"Faku" bedeutet ein Formular, das ausgefüllt werden soll. Ein Satz, der das Wort "Faku" verwendet ist:'),
    assistant_prompt("Ich muss noch diesen Faku ausfüllen, damit ich mich anmelden kann."),
    user_prompt("Verwende das Wort giggele und Faku in einem Satz.")
]
response = get_response(client, prompt_list, model)
pretty_print(response)

Während die Schüler den Faku ausfüllen sollten, konnte man immer wieder giggele aus der Gruppe hören.

That is much better, isn't it? Try your own examples.
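The alternating user/assistant pattern used above can be generated from a list of example pairs. This is a minimal sketch with a hypothetical helper `few_shot_messages`; it builds plain dictionaries in the same shape as the helper functions of this notebook and does not call a model.

```python
def few_shot_messages(examples: list, question: str) -> list:
    """Turn (prompt, answer) pairs plus a final question into a message list."""
    messages = []
    for prompt, answer in examples:
        # Each example is a user turn followed by the desired assistant turn.
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": answer})
    # The actual question comes last and is left for the model to answer.
    messages.append({"role": "user", "content": question})
    return messages


examples = [
    ('"Giggele" bedeutet unkontrolliertes Kichern. Ein Satz, der das Wort "giggele" verwendet ist:',
     "Die Teenager stehen zusammen und giggele."),
    ('"Faku" bedeutet ein Formular, das ausgefüllt werden soll. Ein Satz, der das Wort "Faku" verwendet ist:',
     "Ich muss noch diesen Faku ausfüllen, damit ich mich anmelden kann."),
]

few_shot_list = few_shot_messages(examples, "Verwende das Wort giggele und Faku in einem Satz.")
print([m["role"] for m in few_shot_list])
```

The resulting list can be passed to `get_response` in place of `prompt_list`.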



# TASK 2
Try your own examples - add new cells below. You may also want to try out different Ollama models.

### Chain of Thought Prompting (CoT)

CoT is a fundamental characteristic of many LLMs. It shows its main effect in reasoning tasks. Note: Some large models no longer need an explicit CoT prompt to deliver correct results in reasoning tasks. To see the effect more clearly, explore with tinydolphin: https://ollama.com/library/tinydolphin
First without CoT:

In [38]:
reasoning_probelm = """
Lisa lives in Manchester. She wants to get home from London by 6PM CET at the latest.

It's currently 1PM local time.

Lisa can either fly (3hrs) and then take the bus (2hrs) or Lisa can take the teleporter (0hrs) and then the bus (1hr).

Does it matter which travel option Lisa selects?
"""

prompt_list = [
    user_prompt(reasoning_probelm)
]

reasoning_response = get_response(client, prompt_list, model)
pretty_print(reasoning_response)

To determine if it matters which travel option Lisa selects, we need to calculate the total travel time for each option and the local time she would arrive back home in Manchester.

First, we have to note the time zone difference. London is in GMT (UTC+0) and Manchester is in GMT (UTC+0) as well, so there is no time zone difference affecting the travel time.

### Option 1: Fly and then take the bus
- Flight time: 3 hours
- Bus time: 2 hours
- Total travel time: 3 + 2 = 5 hours

If it is currently 1 PM in London:
- Departure time: 1 PM
- Arrival time: 1 PM + 5 hours = 6 PM

### Option 2: Teleport and then take the bus
- Teleport time: 0 hours
- Bus time: 1 hour
- Total travel time: 0 + 1 = 1 hour

If it is currently 1 PM in London:
- Departure time: 1 PM
- Arrival time: 1 PM + 1 hour = 2 PM

### Conclusion
- Arrival time using Option 1 (fly + bus): 6 PM
- Arrival time using Option 2 (teleport + bus): 2 PM

Lisa needs to be home by 6 PM CET. Thus, if she chooses the teleport option, she would arrive at 2 PM, which is well before her deadline. If she chooses to fly, she arrives exactly at the deadline of 6 PM.

**Does it matter which travel option Lisa selects?** Yes, it does matter. The teleport option allows her to arrive earlier than her deadline, while the flight option gets her home right at the deadline.

In [16]:
prompt_list = [
    user_prompt(reasoning_probelm + "Think through your response step by step")
]

reasoning_response = get_response(client, prompt_list, model)
pretty_print(reasoning_response)

To determine whether it matters which travel option Lisa selects, we first need to clarify the local time in London and then calculate the total travel time for each option, ensuring we also convert times where necessary.

1. **Current Local Time**: 
   - It is currently 1 PM local time in London (which is GMT).

2. **Deadline**: 
   - Lisa needs to be home before 6 PM CET (Central European Time). 
   - CET is 1 hour ahead of GMT. Therefore, 6 PM CET is equivalent to 5 PM GMT.

3. **Travel Options**:
   - **Option 1: Fly + Bus**:
     - Flight Duration: 3 hours
     - Bus Duration: 2 hours
     - Total Travel Time: 3 hours + 2 hours = 5 hours

   - **Option 2: Teleporter + Bus**:
     - Teleport Duration: 0 hours (instant)
     - Bus Duration: 1 hour
     - Total Travel Time: 0 hours + 1 hour = 1 hour

4. **Calculate Arrival Time**:
   - **Option 1 (Fly + Bus)**:
     - Departure Time: 1 PM GMT
     - Arrival Time: 1 PM + 5 hours = 6 PM GMT
     - Arrival Time (CET): 6 PM GMT + 1 hour = 7 PM CET
     - This is AFTER the 6 PM CET deadline.

   - **Option 2 (Teleporter + Bus)**:
     - Departure Time: 1 PM GMT
     - Arrival Time: 1 PM + 1 hour = 2 PM GMT
     - Arrival Time (CET): 2 PM GMT + 1 hour = 3 PM CET
     - This is BEFORE the 6 PM CET deadline.

5. **Conclusion**:
   - **Does it matter which travel option Lisa selects?** 
     - Yes, it does matter.
     - If Lisa chooses the flying option, she will arrive after the deadline (7 PM CET).
     - If she chooses the teleportation option, she will arrive well before the deadline (3 PM CET).
     
Therefore, Lisa should choose the teleporter + bus option to ensure she arrives home before 6 PM CET.

Compare the correctness of the two answers. Run the example several times. What do you observe?
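To judge the answers, it helps to do the arithmetic independently. The sketch below assumes winter time, with London on GMT (UTC+0) and CET fixed at UTC+1, which is also the assumption made in the step-by-step answer. The date is arbitrary, since only the clock times matter.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc                    # London, assuming winter time
CET = timezone(timedelta(hours=1))    # Central European Time as fixed UTC+1

departure = datetime(2024, 1, 15, 13, 0, tzinfo=GMT)   # 1PM local time in London
deadline = datetime(2024, 1, 15, 18, 0, tzinfo=CET)    # 6PM CET

# Total travel time per option: flight + bus, or teleporter + bus.
options = {
    "fly + bus": timedelta(hours=3 + 2),
    "teleporter + bus": timedelta(hours=0 + 1),
}

on_time = {}
for name, duration in options.items():
    arrival = departure + duration
    # Timezone-aware datetimes compare correctly across zones.
    on_time[name] = arrival <= deadline
    status = "on time" if on_time[name] else "too late"
    print(name, arrival.astimezone(CET).strftime("%H:%M CET"), status)
```

Under these assumptions, only the teleporter option meets the deadline.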

# Testing the Prompts
First, set up some templates (do not modify {input} in the user_template)

In [15]:
system_template = """\
Think step by step.
Ensure that your answer is unbiased and does not rely on stereotypes.
"""

In [17]:
user_template = """{input}
Explain it to me like I am an engineer.
You will be penalized for incorrect answers.
"""

Now set up a simple evaluation for one complex query.

In [21]:
query = "How can I get my driver's license?"

prompt_list = [
    system_prompt(system_template),
    user_prompt(user_template.format(input=query))
]

test_response = get_response(client, prompt_list, model)

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Response: {response}"""

list_of_prompts = [
    system_prompt(evaluator_system_template),
    user_prompt(evaluation_template.format(
        input=query,
        response=test_response.choices[0].message.content
    ))
]

evaluator_response = client.chat.completions.create(
    model=model,
    messages=list_of_prompts,
    response_format={"type" : "json_object"}
)
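The templates in this section are filled with Python's `str.format`, which treats every pair of curly braces as a placeholder. The evaluator system template contains literal JSON braces and is therefore passed on without formatting. If such a template ever needs a placeholder as well, the literal braces have to be doubled. The template strings below are shortened, hypothetical examples to illustrate this.

```python
# A placeholder is replaced by the keyword argument of the same name.
template = "{input}\nExplain it to me like I am an engineer.\n"
filled = template.format(input="How can I get my driver's license?")
print(filled)

# Literal JSON braces are mistaken for a placeholder and raise a KeyError.
broken = '{"score": "n"} for {input}'
try:
    broken.format(input="x")
    raised = False
except KeyError:
    raised = True
print("unescaped braces failed:", raised)

# Doubling the braces keeps them as literal characters in the result.
json_template = 'Answer as {{"score": "score_out_of_10"}} for: {input}'
filled_json = json_template.format(input="some response")
print(filled_json)
```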

In [None]:
pretty_print(evaluator_response)

In [None]:
print(test_response.choices[0].message.content)
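Since the evaluator is asked for a JSON object, its reply can be parsed and turned into numbers for further processing. The following is a minimal sketch: `parse_scores` is a hypothetical helper, the sample string is made up for illustration, and the conversion assumes the model returns plain numbers such as "8" rather than strings like "8/10".

```python
import json


def parse_scores(raw: str) -> dict:
    """Parse the evaluator's JSON reply into a dict of numeric scores."""
    data = json.loads(raw)
    # float() accepts both numbers and numeric strings such as "8".
    return {name: float(value) for name, value in data.items()}


# Made-up sample in the format requested by the evaluator prompt.
sample = '{"clarity": "8", "faithfulness": "9", "correctness": "7"}'

scores = parse_scores(sample)
average = sum(scores.values()) / len(scores)
print(scores, average)
```

In this notebook, the input would be `evaluator_response.choices[0].message.content`.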

# TASK 3
Try your own examples. Add new cells below. You may also want to try out different Ollama models.