<a href="https://colab.research.google.com/github/novacellus/workshop_llms_25/blob/main/notebooks/Choosing_the_Right_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setting up the environment / Python Primer / Colab Interface

In [1]:
from openai import OpenAI
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY') # read in the secret API key to a variable

client = OpenAI(api_key=api_key)

# contact the API by specifying the model you want to work with and your input
# the response will be stored in a variable called... "response"
response = client.responses.create(
    model="gpt-4.1-nano",
    input="Write one sentence about the use of hateful language in ancient Roman literature."
)

# print the response by accessing the output_text key in the response
# print() is a built-in function and takes as argument the THING we wish to print
# response is a complex object storing a bunch of data, we can access it by using dot-notation:
# VARIABLE.SOME_KEY.ANOTHER_KEY ...
print(response.output_text)

Ancient Roman literature occasionally employed hateful language to mock or condemn enemies, exemplifying the era's use of rhetoric as a means of social and political expression.


In [2]:
## YOUR TURN: Print the entire response below and analyze its components
print(response)

Response(id='resp_68332ec4eff08191b66d578a315e86ee048928b11d19f549', created_at=1748184772.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-nano-2025-04-14', object='response', output=[ResponseOutputMessage(id='msg_68332ec551d48191b80287c69b953f02048928b11d19f549', content=[ResponseOutputText(annotations=[], text="Ancient Roman literature occasionally employed hateful language to mock or condemn enemies, exemplifying the era's use of rhetoric as a means of social and political expression.", type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=21, input_tokens_

In [3]:
## YOUR TURN: Print the usage information about this call. It's stored under "usage" key
print(response.usage)

ResponseUsage(input_tokens=21, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=32, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=53)


## Defining API query function
During this hands on, we will contact OpenAI API numerous times sending it a prompt, example, and waiting for response. Since we don't want to repeat our code every time, we will define a function `query_models` which will be responsible for sending the call and other simple functionalities (e.g. computing the cost of our calls).

In [4]:
import time
def query_models(models, prompt, system_prompt="You are a helpful assistant.", max_tokens=250):
    """
    Query multiple models with the same prompt and track token usage and costs

    Args:
        models (list): List of model names to use
        prompt (str): The user prompt
        system_prompt (str): The system prompt
        max_tokens (int): Maximum response length

    Returns:
        dict: Results with model names as keys and responses as values
    """
    # Print the prompt
    print(f"PROMPT:\n{prompt}\n")
    print(f"SYSTEM:\n{system_prompt}\n")
    print("-" * 50)

    results = {}

    for model in models:
        print(f"\nQuerying {model}...")
        start_time = time.time()

        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=max_tokens,
                temperature=0.2  # Lower temperature for more consistent results
            )

            elapsed_time = time.time() - start_time
            response_text = response.choices[0].message.content

            # Get token usage
            prompt_tokens = response.usage.prompt_tokens
            completion_tokens = response.usage.completion_tokens
            total_tokens = response.usage.total_tokens

            # Calculate costs
            if model in MODEL_PRICING:
                input_cost = (prompt_tokens / 1000) * MODEL_PRICING[model]["input"]
                output_cost = (completion_tokens / 1000) * MODEL_PRICING[model]["output"]
                total_cost = input_cost + output_cost
                cost_info = f"Cost: ${total_cost:.6f} (Input: ${input_cost:.6f}, Output: ${output_cost:.6f})"
            else:
                cost_info = "Cost: Unknown (pricing not available for this model)"

            # Print model name, token usage, cost, and response
            print(f"\nMODEL: {model} (took {elapsed_time:.2f} seconds)")
            print(f"RESPONSE:\n{response_text}\n\n")
            print(f"TOKENS: {total_tokens} total (Prompt: {prompt_tokens}, Completion: {completion_tokens})")
            print(f"{cost_info}")
            print("-" * 50)

            # Store results including token usage and cost
            results[model] = {
                "response": response_text,
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": total_tokens,
                "time_seconds": elapsed_time,
                "estimated_cost": total_cost if model in MODEL_PRICING else None
            }

        except Exception as e:
            print(f"\nMODEL: {model}")
            print(f"ERROR: {str(e)}")
            print("-" * 50)
            results[model] = {"error": str(e)}

    # Print a summary of token usage and costs
    print("\nSUMMARY:")
    for model, result in results.items():
        if "error" not in result:
            cost = result.get("estimated_cost")
            cost_str = f"${cost:.6f}" if cost is not None else "Unknown"
            print(f"{model}: {result['total_tokens']} tokens, Cost: {cost_str}")

    return results


# Practice. Sentiment analysis with OpenAI models

Let's start by defining the models we'll be using in this exercise and their current pricing.

In [5]:
# Selected models
MODELS = [
    "gpt-4.1",      # "Flagship GPT model for complex tasks": $2.00 / $8.00
    "gpt-4.1-nano",    # "Fastest, most cost-effective GPT-4.1 model": $0.10 / $0.40
]
MODEL_PRICING = {
    "gpt-4.1": {"input": 0.02, "output": 0.08},
    "gpt-4.1-nano": {"input": 0.0001, "output": 0.0008}
}

Now it's time to define variables which store our prompts and data we wish to classify.

In [6]:
# Texts for analysis
sentence_la = "Quo usque tandem abutere, Catilina, patientia nostra?"
sentence_en = "How long, Catiline, will you abuse our patience?"

# Task prompts
sentiment_prompt = f"Classify the sentiment of this Latin text as positive, negative, or neutral.\n\n{sentence_la}"
sentiment_system = "You are an expert in classical Latin literature, rhetoric, and linguistics."

# Query the models
results = query_models(MODELS, sentiment_prompt, sentiment_system)

PROMPT:
Classify the sentiment of this Latin text as positive, negative, or neutral.

Quo usque tandem abutere, Catilina, patientia nostra?

SYSTEM:
You are an expert in classical Latin literature, rhetoric, and linguistics.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1 (took 2.16 seconds)
RESPONSE:
The sentiment of the Latin text **"Quo usque tandem abutere, Catilina, patientia nostra?"** is **negative**.

**Reasoning:**  
This famous line, spoken by Cicero in his First Catilinarian Oration, is an accusatory and exasperated question directed at Catiline. Cicero is expressing frustration and indignation at Catiline's continued abuse of the Senate's patience. The tone is reproachful and critical, indicating a negative sentiment.


TOKENS: 160 total (Prompt: 59, Completion: 101)
Cost: $0.009260 (Input: $0.001180, Output: $0.008080)
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano (took 0.89 second

In [7]:
# YOUR TURN: Modify the code to evaluate the English translation of the Cicero's phrase. Compare results.
# Texts for analysis
sentence_la = "Quo usque tandem abutere, Catilina, patientia nostra?"
sentence_en = "How long, Catiline, will you abuse our patience?"

# Sample task prompts
sentiment_prompt = f"Classify the sentiment of this Latin text as positive, negative, or neutral.\n\n{sentence_en}"
sentiment_system = "You are an expert in classical Latin literature, rhetoric, and linguistics."

# Example call
results = query_models(MODELS, sentiment_prompt, sentiment_system)

PROMPT:
Classify the sentiment of this Latin text as positive, negative, or neutral.

How long, Catiline, will you abuse our patience?

SYSTEM:
You are an expert in classical Latin literature, rhetoric, and linguistics.

--------------------------------------------------

Querying gpt-4.1...

MODEL: gpt-4.1 (took 1.14 seconds)
RESPONSE:
The sentiment of the Latin text "How long, Catiline, will you abuse our patience?" (Latin: *Quousque tandem abutere, Catilina, patientia nostra?*) is **negative**.

This sentence expresses frustration, accusation, and exasperation toward Catiline, indicating a negative sentiment.


TOKENS: 119 total (Prompt: 54, Completion: 65)
Cost: $0.006280 (Input: $0.001080, Output: $0.005200)
--------------------------------------------------

Querying gpt-4.1-nano...

MODEL: gpt-4.1-nano (took 0.57 seconds)
RESPONSE:
The sentiment of the Latin text "How long, Catiline, will you abuse our patience?" is negative.


TOKENS: 76 total (Prompt: 54, Completion: 22)
Cost: