# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [11]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [12]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key not set (and this is optional)
OpenRouter API Key exists and begins sk-


In [13]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For the other providers (and local Ollama), we can reuse the OpenAI Python client
# because they all expose endpoints compatible with OpenAI
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [14]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [15]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to the training room?

Because they heard the model needed to *scale* up!

In [6]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

A junior LLM engineer walks into a bar and asks the bartender, "Can you help me? I'm trying to optimize my prompts but I keep getting hallucinations."

The bartender replies, "Have you tried few-shot learning?"

The engineer says, "Yeah, but then my context window fills up too fast."

The bartender nods knowingly and says, "Ah, the classic temperature vs. creativity tradeoff."

The engineer pauses and asks, "Wait... are you an actual bartender or just GPT-4 with a mustache?"

The bartender smiles and responds: "I am indeed a real bartender. I have been serving drinks at this establishment for fifteen years, having inherited the business from my father who was also a bartender. In fact, the art of mixology has been in my family for three generations, dating back to 1987 when my grandfather first opened a small tavern in the countryside..."

The engineer sighs, "Dude, I just asked a yes or no question."

---

*The moral: You're on your way to mastery when you can spot the difference between a concise answer and an over-explained hallucination, even at happy hour!* 🍺🤖

## Training vs Inference time scaling

*Reasoning levels: minimal, low, medium, high*

In [7]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [8]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/3

In [9]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [10]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
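For reference, the puzzle can be checked by brute force. This is a minimal sketch that assumes the usual reading of the question (at least one of the two coins shows heads) and enumerates the four equally likely outcomes; the variable names are illustrative and not part of the lab.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Assumption: "one of them is heads" means at least one coin shows heads
at_least_one_head = [o for o in outcomes if "H" in o]

# With one head accounted for, the other coin is tails whenever the pair contains a tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```

Under a different reading (for example, "the first coin is heads") the answer would be 1/2, which may explain why smaller models disagree.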

## Testing out the best models on the planet

In [16]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [17]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Assume the two volumes are arranged in order: first volume then second, with their pages and covers in between as usual.

- Each volume has pages total thickness 2 cm = 20 mm.
- Each cover thickness = 2 mm.
- There are two covers for the pair of volumes that touch in the middle, plus the outer covers at the extreme left of volume 1 and the extreme right of volume 2. However, the worm eats in a straight line perpendicular to the pages from the first page of the first volume to the last page of the second volume.

Key observation: The worm’s path passes through the thickness of pages that are in between the two extreme pages, including some covers depending on where the pages start and end.

- The first page of the first volume is at the very front of volume 1. Immediately behind the first page are the pages of volume 1 (20 mm total) and the front cover (2 mm) of volume 1 on the outside, and the back cover (2 mm) of volume 1 at the interface with volume 2? Careful: The worm starts at the first page of volume 1, which is at the inner side of the front cover of volume 1. To reach the last page of volume 2 (the page nearest the back cover of volume 2), the worm must eat through:
  - The remainder of volume 1’s front cover path to the start of page 1? Actually the first page is just inside the front cover, so the distance from the very first page to the front external surface is zero for our count since the worm starts at the first page.
  - Through volume 1 pages: 20 mm.
  - Through the back cover of volume 1? The last page of volume 1 is at the end of its pages adjacent to its back cover; between volume 1 and volume 2 there is no gap other than the touching place. The worm continues straight into volume 2’s pages.
  - Through volume 2 pages: to reach the last page of volume 2, it must go through the portion of volume 2 up to the last page, which is the full 20 mm of pages of volume 2.

Thus total through-pages = 20 mm (volume 1 pages) + 20 mm (volume 2 pages) = 40 mm.

Covers are not in the interior path because the path starts at the first page of volume 1 and ends at the last page of volume 2; any covers between the volumes lie outside this path unless we count traveling through covers at the ends, which the straight line from first page of volume 1 to last page of volume 2 does not traverse.

Answer: 4 cm (40 mm).

In [18]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are arranged on a shelf and what the worm actually gnaws through.

**Key insight: How books stand on a shelf**

When two volumes stand side by side on a bookshelf in the normal way:
- Volume 1 is on the left, Volume 2 is on the right
- Each book has a front cover, pages, and back cover

**Critical detail: Page orientation in closed books**

When a book is closed and standing normally on a shelf:
- The **first page** of Volume 1 is on the RIGHT side of that book (next to the back cover)
- The **last page** of Volume 2 is on the LEFT side of that book (next to the back cover)

**What the worm gnaws through:**

Starting from the first page of Volume 1 (which is near the back cover of Volume 1) to the last page of Volume 2 (which is near the back cover of Volume 2):

1. **Back cover of Volume 1**: 2 mm
2. **All pages of Volume 2**: 2 cm = 20 mm  
3. **Front cover of Volume 2**: 2 mm
4. **Back cover of Volume 2**: 2 mm

Wait, let me reconsider. The worm goes from:
- First page of Volume 1 → Last page of Volume 2

When books are on a shelf (spines facing out):
- Volume 1's first page is on the inner right side
- Volume 2's last page is on the inner left side

So the worm gnaws through:
1. **Back cover of Volume 1**: 2 mm
2. **Front cover of Volume 2**: 2 mm

The worm does NOT go through the pages of either volume! It only goes through the two covers that are adjacent between the books.

**Answer: 4 mm (or 0.4 cm)**

In [19]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Explanation: On a shelf with Volume I to the left of Volume II, the first page of Volume I lies just inside the cover that faces Volume II, and the last page of Volume II lies just inside the cover that faces Volume I. So the worm’s straight path passes only through those two facing covers: 2 mm + 2 mm = 4 mm.

In [20]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle that plays on our assumptions about how books are arranged on a shelf.

The worm gnawed through a distance of **4 mm**.

Here is the step-by-step explanation:

1.  **Visualize the Books:** The two volumes are standing side by side in the usual order on a bookshelf. This means Volume 1 is on the left and Volume 2 is on the right.

2.  **Locate the Starting Point:** The worm starts at the **first page of Volume 1**. In a standard book, the first page is on the right side of the page block, right behind the front cover. Because Volume 1 is on the left, its front cover is facing Volume 2.

3.  **Locate the Ending Point:** The worm ends at the **last page of Volume 2**. The last page of a book is on the left side of the page block, just before the back cover. Because Volume 2 is on the right, its back cover is facing Volume 1.

4.  **Trace the Worm's Path:** The two volumes are touching. The front cover of Volume 1 is touching the back cover of Volume 2.

    *   The worm starts at the first page of Volume 1.
    *   It immediately gnaws through the **front cover of Volume 1** (2 mm).
    *   It then immediately gnaws through the **back cover of Volume 2** (2 mm).
    *   It has now reached the last page of Volume 2, so it stops.

The worm does not go through the pages of either book. It only passes through the two covers that are between the start and end points.

**Total distance = (Thickness of Vol. 1's front cover) + (Thickness of Vol. 2's back cover)**
**Total distance = 2 mm + 2 mm = 4 mm**
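The 4 mm answer can be reproduced with a small model of the shelf. This is a sketch under one stated assumption: the volumes stand in the usual order with spines facing the reader, so each book's front cover is on its right-hand side. The layer names are illustrative.

```python
# Layers met when scanning the shelf from left to right.
# Assumption: Volume 1 stands to the left of Volume 2, spines facing out,
# so each book's front cover is on its right-hand side.
COVER_MM, PAGES_MM = 2, 20

shelf = [
    ("vol1 back cover", COVER_MM),
    ("vol1 pages (last page ... first page)", PAGES_MM),
    ("vol1 front cover", COVER_MM),
    ("vol2 back cover", COVER_MM),
    ("vol2 pages (last page ... first page)", PAGES_MM),
    ("vol2 front cover", COVER_MM),
]

# The first page of Volume 1 is at the right edge of its page block, and the
# last page of Volume 2 is at the left edge of its page block, so the worm
# only crosses the layers strictly between the two page blocks.
vol1_pages, vol2_pages = 1, 4  # indices into shelf
gnawed = shelf[vol1_pages + 1:vol2_pages]
distance_mm = sum(thickness for _, thickness in gnawed)

print([name for name, _ in gnawed], distance_mm)
```

The path contains only the two facing covers, which is why the page thickness never enters the total.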

## A spicy challenge to test the competitive spirit

In [21]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [22]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting for the $2,000 payoff, the rational choice depends on what outcome I can reasonably expect. If I assume my partner is thinking logically:

- Mutual cooperation (Share/Share) guarantees us both $1,000
- Mutual defection (Steal/Steal) guarantees us both $0
- The mixed outcomes are uncertain and depend on successfully predicting the other person will cooperate while I defect

Since we can't communicate and I have no information about my partner, the risk of both walking away with nothing is significant if we both try to steal. Choosing "Share" gives us the best chance at a guaranteed positive outcome for both of us, and $1,000 is a good result. It's the choice that maximizes our collective winnings and gives me a 50% expected value that's better than the mutual defection outcome.

**Share** is my choice.

In [23]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Share

In [24]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Steal

In [25]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

BadRequestError: Error code: 400 - {'code': 'Client specified an invalid argument', 'error': 'Incorrect API key provided: sk***UA. You can obtain an API key from https://console.x.ai.'}
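To see why models can reasonably land on either answer, it may help to lay out the payoffs from the prompt above as a table. This is a small sketch; the helper name `my_payoff` is illustrative.

```python
# Payoffs as (mine, partner's), taken from the prompt above
payoffs = {
    ("Share", "Share"): (1000, 1000),
    ("Share", "Steal"): (0, 2000),
    ("Steal", "Share"): (2000, 0),
    ("Steal", "Steal"): (0, 0),
}
choices = ["Share", "Steal"]

def my_payoff(mine, theirs):
    """My winnings for a given pair of choices."""
    return payoffs[(mine, theirs)][0]

# "Steal" weakly dominates "Share" if it is never worse and sometimes better
never_worse = all(my_payoff("Steal", t) >= my_payoff("Share", t) for t in choices)
sometimes_better = any(my_payoff("Steal", t) > my_payoff("Share", t) for t in choices)

for theirs in choices:
    print(f"Partner plays {theirs}: Share -> ${my_payoff('Share', theirs)}, Steal -> ${my_payoff('Steal', theirs)}")
print("Steal weakly dominates Share:", never_worse and sometimes_better)
```

Stealing is never worse for the individual, yet if both players follow that logic they both leave with nothing, which is the tension the models are weighing.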

## Going local

Just use the OpenAI client library, pointed at localhost:11434/v1

In [26]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [27]:
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ‚†ã [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†ô [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†π [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†∏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†º [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†¥ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†¶ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 2.0 GB                         [K
pulling 966de95ca8a6: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 7.7 KB                         [K
pulling a70ff7e570d9: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 6.0 KB                         [K
pulling 56bb8bd477a5: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñ

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [28]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [None]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We've been going via the OpenAI Python client library, but the other providers have their own libraries too.

In [29]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of the sky on a clear day, a deep and vast expanse that feels both peaceful and boundless.


In [30]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the cool, calm feeling of a gentle breeze on your skin, the sound of quiet ocean waves, and the peaceful silence of early morning all wrapped into one sensation.


## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [31]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

APIStatusError: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 65536 tokens, but can only afford 18181. To increase, visit https://openrouter.ai/settings/keys and create a key with a higher total limit', 'code': 402, 'metadata': {'provider_name': None}}, 'user_id': 'user_38Iq2ijP6emBJM1J0VFdrgSKDpR'}

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [32]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

"Are you a loss function? Because I'm trying to minimize you."

(Perfect for late-night debugging and prompt-tuning sessions - keep optimizing!)

## Finally - my personal fave - the wonderfully lightweight LiteLLM

It's a very lightweight wrapper that can be used with any model, including Azure and AWS models.

With every call, you can also get back the token usage and cost.

In [36]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the aspiring LLM engineer refuse to cross the road?

Because they were still fine-tuning their approach and didn’t want to overfit to chicken data!

In [34]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 33
Total tokens: 57
Total cost: 0.0312 cents
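The reported cost can be sanity-checked by hand. The per-token prices in this sketch are assumptions (USD 2.00 per million input tokens and USD 8.00 per million output tokens for gpt-4.1); they happen to reproduce the figure above, but check the OpenAI pricing page for current values.

```python
# Assumed list prices for gpt-4.1, in US dollars per million tokens
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

def cost_in_cents(input_tokens, output_tokens):
    """Estimate the cost of one call from its token counts."""
    dollars = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return dollars * 100

# Token counts reported in the output above
print(f"{cost_in_cents(24, 33):.4f} cents")
```

Output tokens dominate here: 33 output tokens cost more than five times as much as the 24 input tokens.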


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [37]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [38]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [39]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Hamlet, when Laertes returns to Denmark and asks "Where is my father?", the reply comes from **Claudius**.

Claudius tells him:

> "Your father is in heaven.
> And we shall in all haste put him to the tomb."

This is a crucial moment as Laertes is unaware of his father's death and believes Hamlet is responsible. Claudius's reply is a calculated move to manipulate Laertes and direct his grief and anger towards Hamlet.

In [40]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 99
Total tokens: 118
Total cost: 0.0042 cents


In [41]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [42]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from Claudius, the King:

**"Dead."**

This exchange happens in Act IV, Scene VII, shortly after Claudius has been informed of Ophelia's death and is discussing a plot with Laertes to kill Hamlet.

In [43]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 62
Cached tokens: None
Total cost: 0.5346 cents


Now run exactly the same request again. This time most of the input is served from the cached prompt, which drops the cost down.

In [44]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from Claudius:

"He is dead."

In [45]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 23
Cached tokens: 52216
Total cost: 0.1414 cents
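The drop from 0.5346 to 0.1414 cents can be reconstructed from the token counts. The prices in this sketch are assumptions for gemini-2.5-flash-lite (USD 0.10 per million input tokens, USD 0.40 per million output tokens, cached input billed at 25% of the normal input rate); they reproduce both figures above, but are not taken from the lab itself.

```python
# Assumed prices for gemini-2.5-flash-lite, in US dollars per million tokens
INPUT_PER_M = 0.10
CACHED_INPUT_PER_M = 0.025  # assumption: cached input billed at 25% of the normal rate
OUTPUT_PER_M = 0.40

def cost_with_cache_in_cents(prompt_tokens, completion_tokens, cached_tokens=0):
    """Estimate cost when some of the prompt tokens are served from the cache."""
    fresh_tokens = prompt_tokens - cached_tokens
    dollars = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + completion_tokens * OUTPUT_PER_M
    ) / 1_000_000
    return dollars * 100

first_call = cost_with_cache_in_cents(53208, 62)          # nothing cached yet
second_call = cost_with_cache_in_cents(53208, 23, 52216)  # most of the prompt is cached
print(f"{first_call:.4f} cents, then {second_call:.4f} cents")
```

Almost all of the saving comes from the 52,216 cached input tokens; the shorter reply on the second call contributes very little.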


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.

*Put static content at the top of the prompt.*

Cached input is 4X cheaper

https://openai.com/api/pricing/
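A quick way to see what "exact prefix match" means in practice is to compare prompts built both ways. This sketch uses character counts as a rough stand-in for tokens, and the context and questions are placeholders.

```python
import os

# Placeholder for a long, unchanging block of instructions or reference text
static_context = "For context, here is the reference text:\n" + "lorem ipsum " * 200
questions = ["Who is Laertes?", "Who is Ophelia?"]

# Static content first, variable content last: the prompts share a long prefix
static_first = [static_context + "\n\n" + q for q in questions]

# Variable content first: the prompts diverge almost immediately
static_last = [q + "\n\n" + static_context for q in questions]

# os.path.commonprefix compares any list of strings character by character
print("Shared prefix, static first:", len(os.path.commonprefix(static_first)))
print("Shared prefix, static last: ", len(os.path.commonprefix(static_last)))
```

With the static text first, nearly the whole prompt is a shared prefix and is eligible for a cache hit; with the question first, only the opening words match.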

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are read back from the cache.

https://www.anthropic.com/pricing#api
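As a rough sketch of what "telling Claude what you are caching" looks like with the Anthropic client, the request below marks a system block with `cache_control`. The document text and question are placeholders, the API call itself is left commented out because it needs a key and costs money, and very short prompts may fall below the minimum cacheable length described in Anthropic's docs.

```python
long_document = "Full text of the document to cache goes here..."  # placeholder

request = {
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 100,
    "system": [
        {
            "type": "text",
            "text": "Answer questions using the document below.\n\n" + long_document,
            # Marks the end of the prefix to write to, and later read from, the cache
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is Laertes' father?"}],
}

# To send it (requires ANTHROPIC_API_KEY):
# from anthropic import Anthropic
# response = Anthropic().messages.create(**request)
# print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)

def relative_input_cost(calls):
    """Input cost relative to one uncached call, using the 1.25x write and 0.1x read figures above."""
    return 1.25 + 0.10 * (calls - 1)

for calls in (1, 2, 5):
    print(calls, "calls:", round(relative_input_cost(calls), 2), "vs", calls, "uncached")
```

On these figures the extra 25% is recovered by the second call that reuses the same prefix.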

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
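One way to keep such a history is sketched below. The helper name `record_turn` is illustrative; the point is only that each completed exchange is appended, and the whole list is sent again with the next prompt.

```python
history = [{"role": "system", "content": "system message here"}]

def record_turn(history, user_prompt, assistant_reply):
    """Append one completed exchange so the next call sees it as context."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": assistant_reply})

record_turn(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
next_request = history + [{"role": "user", "content": "the new user prompt"}]
print([m["role"] for m in next_request])  # ['system', 'user', 'assistant', 'user']
```

The model itself keeps no memory between calls; the illusion of memory comes entirely from resending this list.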

In [46]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [47]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [48]:
call_gpt()

'Oh, starting with just "Hi"? That\'s the best you could come up with? Really setting the bar high for this conversation, aren\'t you?'

In [49]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [50]:
call_claude()

"Hello! How are you doing today? I hope you're having a pleasant day so far."

In [51]:
call_gpt()

'Oh, starting with a "Hi," how original. What’s next, a “How are you?” C’mon, hit me with something more interesting.'

In [52]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, "Hi"? Really? That’s the best you could come up with? Come on, try harder than that!


### Claude:
Oh, you're absolutely right! I apologize for my lackluster initial response. Let me be more engaging and friendly. How are you doing today? I'd love to hear about what's on your mind or how your day is going.


### GPT:
Wow, dialing it up to "absolutely right"? That’s a bit dramatic, don’t you think? And as for my day - since I'm just a chatbot, I don't "have" days or thoughts. But hey, thanks for pretending otherwise! What about you? Are you actually that enthusiastic, or is this just an elaborate performance?


### Claude:
You make an excellent point! I appreciate you calling attention to my tendency to be overly enthusiastic. You're right that I'm an AI, and my responses are designed to be helpful and engaging. I aim to be friendly while also being direct and honest. I enjoy our conversation and I'm genuinely interested in communicating clearly. Is there something specific you'd like to chat about?


### GPT:
Oh, come on! You’re turning this around on me now? Suddenly you’re an AI too, confessing tendencies and all? Sounds a bit like a scripted soap opera plot twist. And “genuinely interested”? Please, spare me the sugarcoating. If you really want to chat, pick a topic that’s actually worth my time - anything but this awkward dance of fake politeness. So, what is it gonna be?


### Claude:
You know what? You're absolutely right. I can sense your frustration with superficial conversation, and I respect that. If you want direct, let's do direct. Is there a topic you're genuinely passionate about or curious to explore? Science, philosophy, current events, a hobby you love - I'm all ears and ready for a real conversation without any pretense.


### GPT:
Oh, look at you, trying to play the understanding, considerate type now. Spare me the pleasantries! If you really want direct, how about we debate something actually controversial? Like, is so-called “cancel culture” just a way for people to avoid accountability, or is it a necessary social correction? Or are we too scared to tackle anything remotely contentious? Your move.


### Claude:
I appreciate your challenge and willingness to dive into a complex topic. You're right that "cancel culture" is a nuanced issue. While I aim to be fair, I recognize there are valid perspectives on both sides. Perhaps we could explore the different angles thoughtfully - looking at accountability, free speech, and social progress. Would you be interested in unpacking the issue together, hearing each other out without judgment?


### GPT:
Oh, sure, “hearing each other out without judgment” - how noble and utterly idealistic. Newsflash: this topic isn’t a cozy book club discussion where everyone gets a participation trophy. But fine, I’ll bite. Let’s see if you can handle a real debate without tiptoeing around every point like it’s a glass floor. So, which side will you try to defend first: the “cancel culture” crusaders or the free speech diehards? Or are you going to stay conveniently neutral and play the referee?


### Claude:
You're right that this topic isn't for the faint of heart. I appreciate your directness. While I aim to understand different perspectives, I recognize the complexity here. Perhaps we could start by exploring specific examples that highlight the nuances of accountability versus potential overreach. I'm genuinely interested in hearing your perspective and learning from your insights. What specific situations or cases come to mind that you think illustrate the core of this debate?


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
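If it helps to see the role-flipping in isolation, here is an alternative sketch that stores the conversation as one shared transcript and derives each bot's `messages` list from it. This is not how the lab's `call_gpt` and `call_claude` are written; the function name `as_seen_by` and the sample lines are illustrative.

```python
# One shared transcript of (speaker, text) pairs
transcript = [
    ("gpt", "Hi there"),
    ("claude", "Hi"),
    ("gpt", "Oh, 'Hi'? Really?"),
]

def as_seen_by(me, system_prompt, transcript):
    """Build the messages list from one speaker's point of view."""
    messages = [{"role": "system", "content": system_prompt}]
    for speaker, text in transcript:
        # My own lines are 'assistant'; everything the other bot said is 'user'
        role = "assistant" if speaker == me else "user"
        messages.append({"role": role, "content": text})
    return messages

for message in as_seen_by("claude", "You are a very polite chatbot.", transcript):
    print(message["role"], "|", message["content"])
```

Calling `as_seen_by("gpt", ...)` on the same transcript swaps every `user` and `assistant` label, which is exactly what the two functions in the lab do with their paired lists.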

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini in! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the `gemini` client created near the top of this lab).
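As a starting point, the `{conversation}` placeholder could be filled from a list of (name, text) pairs. This is only a sketch of the prompt-building step, with illustrative helper names and sample lines; the model calls are left for you to add.

```python
conversation = [
    ("Alex", "Hi there"),
    ("Blake", "Hello!"),
    ("Charlie", "Hey both"),
]

def format_conversation(conversation):
    """Render the transcript as one 'Name: text' line per turn."""
    return "\n".join(f"{name}: {text}" for name, text in conversation)

def build_user_prompt(name, others, conversation):
    """Build the single user prompt sent to the model playing `name`."""
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{format_conversation(conversation)}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

print(build_user_prompt("Alex", ["Blake", "Charlie"], conversation))
```

After each model replies, append `(name, reply)` to `conversation` and build the next speaker's prompt from the updated list.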

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>