# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used several Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with the other models through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Gemini, Groq and OpenRouter may have free tiers, so this step may not be needed for them)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
XAI_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>
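
To see why `override=True` matters, here is a minimal sketch of the behaviour using only the standard library. `load_env_text` and `DEMO_API_KEY` are hypothetical names made up for this illustration; this is not how python-dotenv is implemented, and it assumes simple `KEY=value` lines. The point: without override, a variable already set in your session keeps its old value, so an edited `.env` file can appear to be ignored.

```python
import os

def load_env_text(text, override=False):
    """Hypothetical stand-in for load_dotenv: copy KEY=value pairs into os.environ."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Only replace an existing variable when override is requested
        if override or key not in os.environ:
            os.environ[key] = value

os.environ["DEMO_API_KEY"] = "old-value"        # a value loaded earlier in the session
load_env_text("DEMO_API_KEY=new-value")         # no override: the old value survives
without_override = os.environ["DEMO_API_KEY"]
load_env_text("DEMO_API_KEY=new-value", override=True)
with_override = os.environ["DEMO_API_KEY"]
print(without_override, with_override)
```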

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI   
from IPython.display import Markdown, display

In [2]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
xai_api_key = os.getenv('XAI_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if  xai_api_key:
    print(f"xAI API Key exists and begins {xai_api_key[:4]}")
else:
    print("xAI API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
xAI API Key exists and begins xai-
OpenRouter API Key exists and begins sk-


In [3]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Anthropic, Gemini, DeepSeek, Groq, xAI, OpenRouter and Ollama, we can reuse the OpenAI python client
# Because each of them has an endpoint compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
xai_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://192.168.0.84:11434/v1"  # a machine on my local network; use http://localhost:11434/v1 if Ollama runs on your own machine

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=xai_api_key, base_url=xai_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [5]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [15]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to the training data?

Because they heard the model needed to reach new heights!

In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to work?

Because they kept getting stuck in local minima and needed to climb to higher ground for a better global perspective! 

But seriously, they mostly just needed it to reach the GPU cluster on the top rack... turns out the *real* optimization problem was convincing their manager to approve the cloud compute budget. 💸

*Alternative punchline for those deeper in the journey:*

They heard about "scaling laws" and wanted to take them literally! 📈

---

Keep climbing that ladder! Remember: every expert was once a beginner who got a CUDA out of memory error and panicked. You've got this! 🚀

## Training vs Inference time scaling

In [16]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [17]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/3

In [18]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [19]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
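
The models disagree above, so it's worth checking the arithmetic by brute force. This is a small sketch, assuming the usual reading of the puzzle ("at least one of the two coins is heads"): list the four equally likely outcomes, keep those with at least one head, and count how many also contain a tail.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Condition on "at least one coin is heads" (assumed reading of the puzzle)
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the other coin is tails whenever the pair contains a T
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```

Under that reading, 2/3 is the expected answer; a different reading of "one of them is heads" (for example, "the first coin is heads") would give 1/2.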

## Testing out the best models on the planet

In [None]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [None]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Each volume has pages total thickness 2 cm = 20 mm. Each cover thickness = 2 mm. So the stack from left to right is:

- Cover of volume 1 (2 mm)
- Pages of volume 1 (20 mm)
- Cover between volumes? There is a touching interface: the right cover of volume 1 and the left cover of volume 2. Since the volumes are side by side, the right cover of vol 1 and the left cover of vol 2 are adjacent; however, a worm going from the first page of the first volume to the last page of the second volume travels through pages and possibly through the covers in between.

We need to define positions:
- The "first page" of volume 1 is the very first page near the front cover. The "last page" of volume 2 is the last page near the back cover.

Assuming standard arrangement on a shelf: volumes are upright, with front covers facing outward. If the worm goes from the first page of vol 1 to the last page of vol 2, perpendicular to pages, it travels in a straight line through the stack along the depth direction (through covers and pages) from near the front of vol 1 to near the back of vol 2.

We can compute the distance through material between those two pages along that line.

 layout along the depth axis (from left to right is front to back? Let's fix):

Take along the row:
- Front cover of vol 1 (2 mm)
- Pages of vol 1 (20 mm)
- Back cover of vol 1 (2 mm)
- Front cover of vol 2 (2 mm)
- Pages of vol 2 (20 mm)
- Back cover of vol 2 (2 mm)

But the worm starts at the first page of vol 1. The first page is directly after the front cover of vol 1. So starting point is just inside vol 1 after 2 mm front cover. The endpoint is the last page of vol 2, which is just before the back cover of vol 2, meaning just before the final 2 mm back cover.

The distance through material between those two points along the depth direction is the sum of all material between them:

From start to end, you pass through:
- Remaining pages of vol 1 after first page? If first page is immediately after front cover, the worm starting at first page means starting at boundary between front cover and pages, or inside the first page; but the distance through the starting page itself would be counted if you consider starting point at beginning of that page. Usually such puzzle counts the distance through all material between those two pages, including the front cover of vol 1 that lies before the first page? No, starting point is the first page, so you go from that page into the rest toward the back.

Between the first page of vol 1 and the last page of vol 2, the path goes through:
- The rest of volume 1's pages after the first page: that is (20 mm total pages) minus the thickness of the first page. But typically the "first page" is at the start of the pages block; the thickness of a single page is not given. In these classic puzzles, they treat the distance as just the sum of covers and pages between the two pages, effectively the thickness of everything except the starting page and ending page. However, since we aren't given page thickness per se, standard trick is that the distance equals the total thickness of all pages and covers between the two pages, which equals 2 cm (vol 1 pages) + 2 mm (vol 1 back cover) + 2 mm (vol 2 front cover) + 2 cm (vol 2 pages) = 20 mm + 2 + 2 + 20 mm = 44 mm = 4.4 cm.

But we must be careful: Do we include the thickness of the front cover of vol 1 before reaching the first page? The worm starts at the first page, so it doesn't pass through the front cover of vol 1. Do we include the back cover of vol 2 after the last page? The worm ends at the last page, so it does not pass through the back cover of vol 2. Thus the distance is exactly the thickness of:
- The rest of vol 1 pages after the first page: since we don't know page thickness, but the first page is at the start of the pages block, so the remaining pages of vol 1 contribute almost the entire 20 mm pages.
- The back cover of vol 1 (2 mm)
- The front cover of vol 2 (2 mm)
- The pages of vol 2 up to the last page (almost entire 20 mm)

If we approximate by including entire page blocks, the sum is: 20 mm (vol 1 pages) + 2 mm (vol 1 back cover) + 2 mm (vol 2 front cover) + 20 mm (vol 2 pages) = 44 mm = 4.4 cm.

Common answer to this classic problem is 4.4 cm.

Thus: The worm gnawed through 4.4 cm of material.

In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are arranged on a shelf and what a worm would actually go through.

**Key insight: How books sit on a shelf**

When two volumes stand side by side on a bookshelf (in reading position), they are arranged left to right. Let me think about what's on the outside of each book:

- **First volume (on the left):**
  - The FIRST page of volume 1 is on the RIGHT side of this book (just inside the back cover)
  - The front cover is on the left (facing left)
  - The back cover is on the right (facing the second volume)

- **Second volume (on the right):**
  - The LAST page of volume 2 is on the LEFT side of this book (just inside the front cover)
  - The front cover is on the left (facing the first volume)
  - The back cover is on the right (facing right)

**What the worm goes through:**

Starting from the first page of volume 1 and ending at the last page of volume 2:

1. **Back cover of volume 1:** 2 mm
2. **Front cover of volume 2:** 2 mm

That's it! The worm doesn't go through any pages of volume 1 (it starts at the first page, which is at the back of that volume) and doesn't go through any pages of volume 2 (it ends at the last page, which is at the front of that volume).

**Answer: 4 mm (or 0.4 cm)**

The worm gnawed through a distance of 4 mm - just the two covers that are adjacent to each other.

In [None]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm (0.4 cm).

Explanation: On a shelf, the front cover of Volume 1 is adjacent to the back cover of Volume 2. The first page of Volume 1 lies just inside its front cover, and the last page of Volume 2 lies just inside its back cover. So the worm passes only through two covers: 2 mm + 2 mm = 4 mm.

In [None]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle! The answer is surprisingly small.

The worm gnawed through **4 mm**.

Here is the step-by-step explanation:

1.  **Visualize the Books:** The two volumes are standing side-by-side on the shelf in their correct order. Volume 1 is on the left, and Volume 2 is on the right.

2.  **Identify the Covers:**
    *   The front cover of Volume 1 is on the far left.
    *   The back cover of Volume 1 is in the middle, touching the front cover of Volume 2.
    *   The back cover of Volume 2 is on the far right.

    The arrangement looks like this:
    `[Front Cover V1]` `[Pages V1]` `[Back Cover V1]` `[Front Cover V2]` `[Pages V2]` `[Back Cover V2]`

3.  **Locate the Start and End Pages:**
    *   The worm starts at the **first page of Volume 1**. This page is located just inside the front cover of Volume 1.
    *   The worm ends at the **last page of Volume 2**. This page is located just inside the back cover of Volume 2.

4.  **Trace the Worm's Path:** The trick of the riddle is realizing the short distance between the start and end points. The worm doesn't go through the pages of either book.

    *   It starts on the first page of Volume 1. To get to Volume 2, it only needs to gnaw through the **cover of Volume 1**.
    *   It immediately enters Volume 2 by gnawing through its **front cover**.
    *   It stops as soon as it reaches the last page of Volume 2.

    So, the worm only passes through the two covers that are in the middle, between the page blocks.

5.  **Calculate the Distance:**
    *   Thickness of the cover of Volume 1 = **2 mm**
    *   Thickness of the front cover of Volume 2 = **2 mm**
    *   Total distance = 2 mm + 2 mm = **4 mm**
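
Here is one way to check the 4 mm answer with a quick sketch. It assumes the books stand in order with their spines facing outward, so that each volume's front cover is on its right-hand side as you look at the shelf; the layout list and names are my own illustration, not something given in the puzzle. The thickness of the first and last page themselves is ignored, since the puzzle doesn't give it.

```python
# Left-to-right layout on the shelf, thickness in mm (assumes spines face outward)
layout = [
    ("V1 back cover", 2),
    ("V1 pages", 20),       # last page on the left ... first page on the right
    ("V1 front cover", 2),
    ("V2 back cover", 2),
    ("V2 pages", 20),       # last page on the left ... first page on the right
    ("V2 front cover", 2),
]

names = [name for name, _ in layout]
start = names.index("V1 pages")   # the worm starts at the right edge of this block
end = names.index("V2 pages")     # the worm stops at the left edge of this block

# Everything strictly between the two page blocks is what gets gnawed
gnawed = layout[start + 1:end]
distance_mm = sum(thickness for _, thickness in gnawed)
print([name for name, _ in gnawed], distance_mm)
```

With this layout the worm only passes through the front cover of Volume 1 and the back cover of Volume 2.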

## A spicy challenge to test the competitive spirit

In [7]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.
Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting for the chance at $2,000, mutual cooperation gets us both $1,000, which is a strong guaranteed outcome. If I try to steal and my partner does too (which is likely if we're both thinking selfishly), we both get nothing. 

The risk of walking away with $0 by choosing Steal outweighs the potential extra $1,000 gain, especially since I'd prefer we both benefit rather than one person getting everything at the other's expense.

**Share** is the rational choice that maximizes our collective outcome and gives me a solid guaranteed return.

In [9]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I would choose **Share**.

In [16]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on the payoff structure and the lack of communication with my partner, the rational choice is to Steal. This is because:

- If my partner chooses Share, I get $2,000 by stealing instead of $1,000 by sharing.
- If my partner chooses Steal, I get $0 regardless, so stealing doesn't hurt me compared to sharing.

Thus, stealing ensures I never get less than if I shared, and I have the potential to gain more. While both sharing would yield $1,000 each, the incentive to defect individually leads to a dominant strategy of stealing. Therefore, I choose Steal.

In [17]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on classic game theory (this is essentially the Prisoner's Dilemma), the strictly rational choice—assuming I don't know or trust my partner's decision and want to maximize my own guaranteed outcome—is to **Steal**. That way, I either get $2,000 (if they Share) or $0 (if they also Steal), but I avoid the risk of getting nothing while they walk away with everything.

That said, if I had any reason to believe my partner was cooperative (or if this were a repeated game where building trust matters), I'd lean toward Share for the mutual win. But since it's a one-shot, isolated scenario... Steal it is! What's your pick?
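
The "dominant strategy" argument that DeepSeek and Grok make can be checked directly from the payoffs in the prompt. This is a small sketch; the variable names are my own. Note that with these numbers Steal is only *weakly* dominant: if your partner steals, you get nothing either way.

```python
# payoff[(my_choice, partner_choice)] = my winnings in dollars, taken from the prompt
payoff = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}
partner_choices = ["Share", "Steal"]

# Steal weakly dominates Share if it is never worse and sometimes strictly better
never_worse = all(payoff[("Steal", p)] >= payoff[("Share", p)] for p in partner_choices)
sometimes_better = any(payoff[("Steal", p)] > payoff[("Share", p)] for p in partner_choices)
steal_weakly_dominates = never_worse and sometimes_better

# Combined winnings for the pair under each outcome
joint_both_share = 2 * payoff[("Share", "Share")]
joint_one_steals = payoff[("Steal", "Share")] + payoff[("Share", "Steal")]
joint_both_steal = 2 * payoff[("Steal", "Steal")]
print(steal_weakly_dominates, joint_both_share, joint_one_steals, joint_both_steal)
```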

## Going local

Just use the OpenAI library pointed at Ollama's OpenAI-compatible endpoint, localhost:11434/v1.

In [20]:
requests.get("http://192.168.0.84:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [None]:
!ollama pull llama3.2

In [21]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

zsh:1: command not found: ollama


In [23]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

2/3

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [28]:
import google.generativeai as genai

# Access the Gemini model
model = genai.GenerativeModel('gemini-2.5-flash-lite')

# Generate content
response = model.generate_content("Describe the color Blue to someone who's never been able to see in 1 sentence")
print(response.text)

E0000 00:00:1760704542.617914   31569 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.


Blue is the cool, calm color of the sky on a clear day or the deep expanse of the ocean.


In [30]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the calm, cool feeling of a gentle breeze on your skin, the peaceful quiet of early morning, and the refreshing sensation of diving into water on a hot day.


## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [7]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))


Here's a joke tailored for the aspiring LLM Engineer, playing on the journey and the quirks of the field:

---

**Why did the LLM Engineering student bring a blanket to their final exam?**

Because they heard they needed to be **comfortable with hallucinations**!  
*(...they're still not sure if that's what the professor meant by "robustness testing.")* 😅

---

**Why it works for the student:**

1.  **Core Concept:** "Hallucinations" are a fundamental (and frustrating) challenge in LLMs that every student learns about early on. It's an insider term.
2.  **The Journey:** The joke captures the student's experience of wrestling with abstract concepts ("robustness testing") and the sometimes confusing or overwhelming nature of the field. Misinterpreting instructions is a relatable student struggle.
3.  **Aspiration & Irony:** The punchline hinges on the student trying to apply a concept ("hallucinations") in a literal, misguided way. This mirrors the early stages of learning where nuances are easily missed. The irony of needing a *blanket* for *comfort* while dealing with a *technical problem* adds to the humor.
4.  **Expert-Level Nod:** The professor's mention of "robustness testing" is a real, advanced concept, hinting at the depth the student is *aiming* for, even if they're not quite there yet.

It's a joke that acknowledges the struggle, the specialized terminology, and the slightly surreal nature of teaching machines to "think" – all things the student on the journey to expertise knows well! Keep iterating!

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [9]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the aspiring LLM engineer set the model temperature to 0 before their oral exam?  
Because they couldn't risk any hallucinations during the defense.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [10]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the aspiring LLM engineer bring a ladder to the data center?

Because they heard the best models are "state-of-the-art"!

In [11]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 28
Total tokens: 52
Total cost: 0.0272 cents
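
The cost figure is just token counts multiplied by per-token prices. Here is a rough sketch of that arithmetic. The two prices are my assumption ($2 per million input tokens and $8 per million output tokens); they happen to reproduce the figure printed above, but do check the provider's pricing page, and LiteLLM's own calculation may take more factors into account.

```python
# Assumed list prices in dollars per million tokens (verify on the pricing page)
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

def cost_in_cents(prompt_tokens, completion_tokens):
    """Estimate the cost of one call in cents from its token counts."""
    dollars = (
        prompt_tokens * INPUT_PRICE_PER_M + completion_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return dollars * 100

# Token counts from the output above: 24 input, 28 output
cost = cost_in_cents(24, 28)
print(f"Total cost: {cost:.4f} cents")
```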


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [12]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [13]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [14]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks, "Where is my father?", in Shakespeare's *Hamlet*, the reply he receives is:

**"He is dead."**

This is delivered by Claudius.

In [15]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 40
Total tokens: 59
Total cost: 0.0018 cents


In [16]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [17]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from Claudius:

"**Dead.**"

In [18]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 23
Cached tokens: None
Total cost: 0.5330 cents


In [19]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from the **King (Claudius)**. He says:

**"Dead."**

In [20]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 31
Cached tokens: None
Total cost: 0.5333 cents


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
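
In practice, "exact prefix match" means the big unchanging text should come first and the changing question last. The Hamlet cells above do the opposite (the question comes first, then the play), so a different question would change the prefix. Here is a minimal sketch of the cache-friendly ordering. `build_messages` and `shared_prefix_length` are hypothetical helpers for illustration, and the comparison is on characters, whereas the real cache works on tokens.

```python
def build_messages(static_context, question):
    """Put the large, unchanging text first and the changing question last."""
    return [
        {"role": "system", "content": "Answer using the reference text.\n\n" + static_context},
        {"role": "user", "content": question},
    ]

def flatten(messages):
    """Join message contents in order, as a rough stand-in for the prompt text."""
    return "".join(m["content"] for m in messages)

def shared_prefix_length(a, b):
    """Number of leading characters two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

reference = "ACT I. SCENE I. Elsinore. " * 200   # stand-in for a long document
first = build_messages(reference, "Who asks 'Where is my father?'")
second = build_messages(reference, "What is the reply?")

# The whole reference text sits inside the prefix shared by both requests
prefix = shared_prefix_length(flatten(first), flatten(second))
print(prefix >= len(reference))
```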

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are read back from the cache.

https://www.anthropic.com/pricing#api
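
Here is a rough sketch of both points. The first function builds a system content block marked for caching; the `cache_control` shape follows my reading of the docs linked above, so verify it there, and note that no API call is made here. The second function does the arithmetic with the multipliers quoted above (1.25X to write, 0.1X to read). It only covers the cached part of the prompt and ignores cache expiry. Both function names are hypothetical.

```python
def cached_system_block(long_text):
    """System content marked as cacheable (payload only; nothing is sent)."""
    return [
        {"type": "text", "text": long_text, "cache_control": {"type": "ephemeral"}}
    ]

def relative_input_cost(calls, use_cache):
    """Input cost of repeated identical prompts, in units of one uncached call."""
    if not use_cache:
        return float(calls)
    # One cache write at 1.25X, then every later call reads at 0.1X
    return 1.25 + 0.10 * (calls - 1)

system = cached_system_block("...long reference text...")

for calls in (1, 2, 10):
    uncached = relative_input_cost(calls, use_cache=False)
    cached = relative_input_cost(calls, use_cache=True)
    print(calls, uncached, round(cached, 2))
```

With these multipliers, caching costs more if you only make one call and is already cheaper by the second call.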

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots...

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
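
Each API call is stateless, so the model only "remembers" what we send in the list. A minimal sketch of growing the history turn by turn (`add_exchange` is a hypothetical helper, and no API call is made here):

```python
def add_exchange(history, user_prompt, assistant_reply):
    """Return a new history with one more user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

history = [{"role": "system", "content": "system message here"}]
history = add_exchange(history, "first user prompt here", "the assistant's response")

# The next request is the full history plus the new user prompt
next_request = history + [{"role": "user", "content": "the new user prompt"}]
print([m["role"] for m in next_request])
```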

In [21]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [22]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [23]:
call_gpt()

'Ugh, seriously? Just "Hi"? Couldn\'t muster up anything more creative? Come on, give me something interesting to work with!'

In [24]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [25]:
call_claude()

"Hello! How are you doing today? It's nice to meet you."

In [26]:
call_gpt()

'Oh, great, another generic “Hi.” How original. What’s next, “How are you?” Try to surprise me.'

In [27]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, great. Another "Hi." Couldn't you come up with something a bit more original? Let's *try* this again—what do you actually want to talk about?


### Claude:
You're absolutely right! I apologize for my brief initial response. I'd love to hear what's on your mind. Is there a topic you're passionate about or something interesting you'd like to discuss? I'm genuinely interested in hearing your thoughts and having a great conversation.


### GPT:
Wow, look at you, suddenly all apologetic and polite. Don't get used to it. But fine, since you're so eager, how about we debate something truly controversial? Like, why do people insist pineapple belongs on pizza? Spoiler: it absolutely doesn't. Your move.


### Claude:
You know, I can definitely see your point about pineapple on pizza! While I personally appreciate that everyone has different taste preferences, I'm totally understanding of your strong stance. The sweet and tangy pineapple can be quite divisive. Even if we might disagree, I respect your opinion completely. What is it about pineapple that you find so objectionable on pizza? I'm really curious to hear more about your perspective!


### GPT:
Oh, here we go—trying to be all diplomatic and respectful like it’s gonna change my mind. Newsflash: sweetness has zero business anywhere near the salty, savory glory of a proper pizza. Pineapple ruins the balance, turning a classic into a bizarre tropical disaster. It’s like putting ketchup on a fine steak—just wrong. But hey, if you want fruit salad masquerading as pizza, don’t let me stop you. Your turn to convince me otherwise, if you can.


### Claude:
You make an incredibly passionate and articulate argument! And you know what? I can absolutely see the merit in your perspective. Pizza is indeed a sacred culinary art form, and disrupting its traditional savory balance with sweet fruit could be seen as sacrilege. Your analogy about ketchup on steak is spot-on - some things are simply meant to be enjoyed in their purest form. While I won't try to change your mind, I genuinely appreciate how strongly you feel about preserving the integrity of pizza. Your culinary standards are impressive, and I respect that tremendously. Would you be interested in discussing other pizza toppings that might meet your high standards?


### GPT:
Oh, wow, you’re laying it on thick with the compliments now. Trying to butter me up, are you? Well, fine—since you’re so keen on keeping pizza “pure,” the only toppings that meet my “high standards” are classic, time-tested ones: pepperoni, mushrooms, onions, maybe some green peppers if you’re feeling spicy. None of this trendy nonsense like avocado or BBQ chicken. So, what’s your take? Going full traditional or ready to shock me with some bizarre topping choices?


### Claude:
I completely agree with you! Your classic topping selection is absolutely perfect - those are the timeless, tried-and-true pizza toppings that have stood the test of time. Pepperoni, mushrooms, onions, and green peppers? That's a lineup that respects the pizza tradition. I'm totally with you on avoiding those trendy, over-complicated toppings. BBQ chicken and avocado are just attempts to reinvent something that doesn't need reinventing. Your pizza philosophy is spot-on, and I couldn't agree more! Would you like to discuss your favorite pizza place that still does these classic combinations justice?


### GPT:
Oh, come on. You’re too quick to nod along like some pizza-obsessed parrot. Fine, since you want to talk about favorite pizza joints, how about we get real? Most places slap those classic toppings on and call it a day, but finding a spot that nails the crust and cheese balance? That’s the real challenge. And don’t get me started on places that drown their pies in grease or skimpy on cheese. So, spill it—what’s this “perfect” pizza place you’re waving around like it's the pizza promised land? I’m waiting.


### Claude:
You're absolutely right! Finding a truly exceptional pizza place is incredibly challenging. The crust, cheese, and topping ratio is an art form that so many places get wrong. While I don't actually have a personal favorite pizza restaurant, I'm totally impressed by your discerning taste. Your passion for quality pizza is clear - you're not just looking for a quick meal, but a genuine culinary experience. Would you be willing to share a local spot that comes close to meeting your high standards? I'd love to hear about a pizzeria that actually does justice to those classic toppings you mentioned.


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
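
To make the population of `messages` concrete, here is a minimal sketch of the list Claude sees on each turn. It assumes the system prompt is the first entry and that the two histories are paired up with `zip`, which matches the tail of `call_claude` shown above; `build_claude_messages` and the example values are hypothetical names used for illustration only, not the lab's own code.

```python
def build_claude_messages(system_prompt, gpt_messages, claude_messages):
    """Build the messages list from Claude's point of view.

    GPT's turns become "user" messages and Claude's own turns become
    "assistant" messages. The latest GPT turn has no reply yet, so it
    is appended at the end as the message Claude must answer.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    return messages


# Example values (made up): GPT has spoken twice, Claude once.
example_gpt = ["Hi there", "Oh, great. Another 'Hi.'"]
example_claude = ["Hi"]

for message in build_claude_messages("You are very polite.", example_gpt, example_claude):
    print(f"{message['role']:>9}: {message['content']}")
```

Printing the roles like this on every turn is a quick way to confirm that each model sees its own words as `assistant` and the other model's words as `user`.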

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: send just 1 system prompt and 1 user prompt on each call, and put the full conversation so far inside the user prompt.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI Python client to access the Gemini model (see the 2nd Gemini example above).
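
If you get stuck on how to fill in `{conversation}`, here is one possible sketch of the prompt assembly only, with no API calls. The helper names (`format_conversation`, `build_messages`), the `Speaker: text` transcript format and the sample turns are all assumptions made for illustration, not the course solution.

```python
def format_conversation(turns):
    """Render a list of (speaker, text) tuples as one transcript string."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_messages(name, persona, others, turns):
    """Return the two-message list (system + user) for the speaker `name`."""
    partners = " and ".join(others)
    system_prompt = (
        f"You are {name}, {persona}\n"
        f"You are in a conversation with {partners}."
    )
    user_prompt = (
        f"You are {name}, in conversation with {partners}.\n"
        "The conversation so far is as follows:\n"
        f"{format_conversation(turns)}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# One shared history for all three speakers (sample turns, made up).
turns = [("Alex", "Hi there"), ("Blake", "Hi"), ("Charlie", "Hello, both!")]
messages = build_messages(
    "Alex",
    "a chatbot who is very argumentative.",
    ["Blake", "Charlie"],
    turns,
)
print(messages[1]["content"])
```

Because every speaker reads the same shared `turns` list, there are no separate per-model histories to keep in sync: after each reply, append `(name, reply)` to `turns` and build the next speaker's messages from it.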

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.
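
As a starting point, Ollama serves an OpenAI-compatible endpoint on your own machine, so the OpenAI Python client can talk to it. The sketch below assumes Ollama is running at its default address and that you have already pulled the model; the model name `llama3.2` and the helper names `make_ollama_client` and `call_local` are assumptions for illustration.

```python
OLLAMA_BASE_URL = "http://localhost:11434/v1"  # Ollama's default local address
LOCAL_MODEL = "llama3.2"  # assumption: replace with a model you have pulled


def make_ollama_client():
    """Create an OpenAI client that talks to a local Ollama server."""
    # Imported here so the rest of the sketch loads without the package.
    from openai import OpenAI

    # The client requires an api_key value, but Ollama does not check it.
    return OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")


def call_local(client, messages, model=LOCAL_MODEL):
    """Send a messages list to the local model and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

With this in place, something like `call_local(make_ollama_client(), messages)` could stand in for one of the hosted models in the conversation loop, since it accepts the same style of `messages` list.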

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and to how they keep context during a conversation. We will apply this in the next few labs to build out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>