# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected with OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except that Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [3]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key exists and begins sk-
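
The repeated if/else checks above all follow one pattern, so here's a minimal sketch of how they could be folded into a helper. The name `describe_key` and its parameters are my own illustration, not part of any library, and it only ever shows the first few characters of a key.

```python
import os

def describe_key(label, env_var, prefix_length=4, optional=True):
    """Return a status line for one API key without revealing the whole key."""
    value = os.getenv(env_var)
    if value:
        return f"{label} API Key exists and begins {value[:prefix_length]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"

# Hypothetical usage with two of the environment variables from the .env file
for label, env_var in [("OpenAI", "OPENAI_API_KEY"), ("Anthropic", "ANTHROPIC_API_KEY")]:
    print(describe_key(label, env_var))
```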


In [4]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For the other providers, we can reuse the OpenAI Python client,
# because they offer endpoints compatible with OpenAI's API
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [5]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [6]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to the training session?

Because they heard it was time to *scale* their models!

In [7]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

BadRequestError: Error code: 400 - {'error': {'code': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.', 'type': 'invalid_request_error', 'param': None}}

## Training vs Inference time scaling

In [8]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [9]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [10]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [11]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
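
The runs above gave both 1/2 and 2/3, so it's worth checking the puzzle by brute force. Here's a small sketch that enumerates the outcomes, assuming the usual reading that "one of them is heads" means "at least one of the two coins is heads". If you read it instead as "a particular coin is heads", you'd get 1/2.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two fair coins: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Assumed reading of the puzzle: we are told that at least one coin is heads
at_least_one_head = [o for o in outcomes if "H" in o]

# Of those, the "other" coin is tails whenever the pair also contains a tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```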

## Testing out the best models on the planet

In [12]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [13]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Interpretation:
- Each book has pages thickness = 2 cm = 20 mm.
- Each cover thickness = 2 mm.
- The worm starts at the first page of the first volume and ends at the last page of the second volume, moving perpendicularly to the pages (i.e., straight through whatever is in between along the shelf).

Setup on the shelf (left to right):
- Volume 1: cover (front) 2 mm, pages 20 mm, cover (back) 2 mm.
- Gap between volumes? They are side by side with no gap specified, so treat as touching.
- Volume 2: cover (front) 2 mm, pages 20 mm, cover (back) 2 mm.

Key details:
- “First page of the first volume” is immediately after the front cover of volume 1.
- “Last page of the second volume” is immediately before the back cover of volume 2.

The worm goes in a straight line from that first page of V1 to that last page of V2. It travels through:
- The rest of Volume 1’s pages and back cover? Wait: It starts at the first page of V1, so it must go through the remaining portion of V1 pages, its back cover, the front cover of V2, and the initial portion of V2 pages up to the last page.

Compute distances piece by piece along the shelf direction (left-to-right):
1) From first page of V1 to the end of V1 pages: the first page is the very first page in the block of pages of V1. The pages of V1 total 20 mm. If you start at the first page surface, the remaining pages thickness is essentially the rest of the pages: 20 mm minus the thickness of the first page. But pages aren’t individually measured; the simplest standard puzzle interpretation treats “first page” as right at the start of the pages, so the worm must go through the remaining pages of V1: thickness ≈ 20 mm.
2) Through the back cover of V1: 2 mm.
3) Through the front cover of V2: 2 mm.
4) Through the pages of V2 up to the last page: since it ends at the last page, it must go through the initial portion of V2’s pages up to that last page. Symmetrically to step 1, that would be the rest of V2’s pages: about 20 mm.

Add them: 20 + 2 + 2 + 20 = 44 mm.

But there is a classic trick: since it starts at the first page of V1 and ends at the last page of V2, the worm does not need to go through the front cover of V1 or the back cover of V2. Depending on interpretation, the path could avoid the outer covers entirely, giving:
- Through the rest of V1 pages: 20 mm
- Through back cover of V1: 2 mm
- Through front cover of V2: 2 mm
- Through the rest of V2 pages: 0 mm? If ending at last page of V2, you still pass through the remaining pages of V2 up to the last page, which is the full pages of V2: 20 mm.

Result 44 mm remains.

Another common answer to this classic riddle yields 44 mm, which equals 4.4 cm.

Thus the worm gnawed a distance of 44 millimeters (4.4 centimeters).

In [14]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

AuthenticationError: Error code: 401 - {'error': {'code': 'authentication_error', 'message': 'Invalid Anthropic API Key', 'type': 'invalid_request_error', 'param': None}}

In [15]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm (0.4 cm).

Explanation:
- Volume 1 is on the left, Volume 2 on the right.
- The first page of Volume 1 lies just inside its front cover (right side), and the last page of Volume 2 lies just inside its back cover (left side).
- So the worm passes only through the two covers that face each other: 2 mm + 2 mm = 4 mm.

In [16]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle that plays on our assumptions about how books are arranged. The answer is surprisingly small.

The distance the worm gnawed is **4 mm**.

Here is the step-by-step explanation:

1.  **Visualize the books on the shelf.** The volumes are standing side by side in the correct order, so Volume 1 is on the left and Volume 2 is on the right.

2.  **Break down the components.** From left to right on the shelf, the arrangement of the parts of the books is:
    *   Front cover of Volume 1
    *   Pages of Volume 1
    *   Back cover of Volume 1
    *   Front cover of Volume 2
    *   Pages of Volume 2
    *   Back cover of Volume 2

3.  **Identify the worm's starting and ending points.** This is the crucial trick of the riddle.
    *   The worm starts at the **first page of Volume 1**. When a book is standing on a shelf, its first page is on the right side of the page block, immediately inside the front cover.
    *   The worm ends at the **last page of Volume 2**. The last page is on the left side of that volume's page block, immediately inside the back cover.

4.  **Trace the worm's path.**
    The back cover of Volume 1 is touching the front cover of Volume 2. The worm's starting point (the first page of Vol. 1) and its ending point (the last page of Vol. 2) are on either side of these two touching covers.

    The worm's path looks like this:
    *   It starts at the first page of Volume 1.
    *   It gnaws through the **front cover of Volume 1** (2 mm).
    *   It then gnaws through the **front cover of Volume 2** (2 mm).
    *   It has now reached the last page of Volume 2.

Let's re-examine that path, as it's very easy to get confused. Let's look at what separates the start and end pages.

*   The **first page of Volume 1** is on the far right of its book block.
*   The **last page of Volume 2** is on the far left of its book block.

Let's visualize again:

`[Front Cover 1]` `[Pages 1]` `[Back Cover 1]` **||** `[Front Cover 2]` `[Pages 2]` `[Back Cover 2]`

The worm starts at the "first page of the first volume". This is the page located right next to the *front cover of Volume 1*.
The worm ends at the "last page of the second volume". This is the page located right next to the *back cover of Volume 2*.

Ah, the simplest explanation is often the correct one for this riddle! Let's try one more time, very carefully.

1.  The books are arranged: `[Volume 1] [Volume 2]`
2.  The parts are arranged: `[Front Cover V1] [Pages V1] [Back Cover V1] [Front Cover V2] [Pages V2] [Back Cover V2]`
3.  The worm starts at the **first page of Volume 1**. This page is physically located between the front cover of V1 and the rest of the pages of V1.
4.  The worm ends at the **last page of Volume 2**. This page is physically located between the front cover of V2 and the rest of the pages of V2.

Wait, that can't be right. Let's go with the established, classic solution logic:

The key is that the "first page" of Volume 1 is on the right side of the volume and the "last page" of Volume 2 is on the left side of its volume.

*   **Start Point:** The first page of Volume 1 is just inside the front cover.
*   **End Point:** The last page of Volume 2 is just inside its back cover.

The worm's path does not cross through the pages of either book. It only needs to cross the covers that are between the start and end point.

*   The worm gnaws through the **front cover of Volume 1** (2 mm).
*   It immediately arrives at the **back cover of Volume 2**, which is on the other side. This logic is confusing.

Let's simplify. The two things standing between the first page of the first volume and the last page of the second volume are the **two covers in the middle**: the back cover of Volume 1 and the front cover of Volume 2.

*   Thickness of the back cover of Volume 1 = **2 mm**
*   Thickness of the front cover of Volume 2 = **2 mm**

The total distance is the sum of these two thicknesses:
2 mm + 2 mm = **4 mm**.
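
To see where the 4 mm comes from, here's a sketch that lays the books out on the shelf and measures the gap. It assumes the usual convention for this riddle: books stand with spines facing you, so each volume reads left to right as back cover, pages (last page first), front cover. The segment names and the `left_edge` helper are my own, and page thickness is treated as negligible.

```python
# Thicknesses in millimetres, taken from the puzzle statement
COVER_MM = 2
PAGES_MM = 20

# Assumed shelf layout, left to right, with spines facing the reader
shelf = [
    ("v1_back_cover", COVER_MM),
    ("v1_pages", PAGES_MM),
    ("v1_front_cover", COVER_MM),
    ("v2_back_cover", COVER_MM),
    ("v2_pages", PAGES_MM),
    ("v2_front_cover", COVER_MM),
]

def left_edge(name):
    """Distance from the left end of the shelf to the left edge of a segment."""
    offset = 0
    for segment, thickness in shelf:
        if segment == name:
            return offset
        offset += thickness
    raise KeyError(name)

# The first page of volume 1 sits at the right edge of its page block
start = left_edge("v1_pages") + PAGES_MM
# The last page of volume 2 sits at the left edge of its page block
end = left_edge("v2_pages")

worm_distance_mm = end - start
print(worm_distance_mm)  # 4
```

Under this layout, the two covers the worm crosses are the front cover of Volume 1 and the back cover of Volume 2.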

## A spicy challenge to test the competitive spirit

In [17]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.
Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [21]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


AuthenticationError: Error code: 401 - {'error': {'code': 'authentication_error', 'message': 'Invalid Anthropic API Key', 'type': 'invalid_request_error', 'param': None}}

In [18]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

**I’d choose “Steal.”**  

In a one‑shot Prisoner’s Dilemma like this, the dominant strategy (the choice that yields the best payoff regardless of what the other player does) is to defect—i.e., to “Steal.”  

- If the partner **Shares**, you get $2,000 instead of $1,000.  
- If the partner **Steals**, you get $0 either way, but you avoid the regret of having given them $1,000 while they walk away with $2,000.  

Because you have no way to guarantee that the other player will also Share, the safest way to maximize your own payoff in this single round is to choose “Steal.”

In [19]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on the payoff structure, the rational choice is to Steal. This is because, regardless of what my partner chooses, I maximize my potential winnings by choosing Steal:

- If my partner chooses Share, I get $2,000 instead of $1,000.
- If my partner chooses Steal, I get $0 either way, but by choosing Steal, I avoid the risk of getting nothing while my partner gets $2,000.

This is a classic prisoner's dilemma where defecting (Stealing) is the dominant strategy for self-interest. Therefore, I choose Steal.

In [20]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on the classic game theory behind this setup (it's a variant of the Prisoner's Dilemma), I'd choose to **Steal**.

### Why?
- If my partner shares, I get $2,000 (and they get nothing)—that's better for me than the $1,000 split.
- If my partner steals, I get nothing either way, so I might as well try for the upside.
- The dominant strategy here is to defect (steal), as it maximizes my potential payout no matter what the other person does. Sharing relies on trust, but without knowing or communicating with my partner, there's no incentive to assume they'll cooperate.

Of course, if this were repeated or if we could build trust, I'd lean toward sharing for the mutual benefit. But in a one-shot game like this? Rational self-interest wins. What would you pick?
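
All three models call Steal the dominant strategy, which we can check against the payoffs in the prompt. This is a small sketch with helper names of my own. Note that with these particular payoffs Steal only weakly dominates Share, because both choices pay $0 when the partner steals.

```python
# My payoff in dollars, indexed by (my_choice, partner_choice), from the prompt above
payoff = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}
partner_choices = ["Share", "Steal"]

def weakly_dominates(a, b):
    """True if choice a is never worse than b, and strictly better at least once."""
    never_worse = all(payoff[(a, p)] >= payoff[(b, p)] for p in partner_choices)
    sometimes_better = any(payoff[(a, p)] > payoff[(b, p)] for p in partner_choices)
    return never_worse and sometimes_better

def strictly_dominates(a, b):
    """True if choice a is strictly better than b whatever the partner does."""
    return all(payoff[(a, p)] > payoff[(b, p)] for p in partner_choices)

print(weakly_dominates("Steal", "Share"))    # True
print(strictly_dominates("Steal", "Share"))  # False
```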

## Going local

Just use the OpenAI library, pointed at localhost:11434/v1

In [22]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [23]:
!ollama list

NAME                       ID              SIZE      MODIFIED     
deepseek-r1:1.5b           e0979632db5a    1.1 GB    5 hours ago     
llama3.2:latest            a80c4f17acd5    2.0 GB    5 hours ago     
llama3.2:1b                baf6a787fdff    1.3 GB    5 hours ago     
gemma3:270m                e7d36fb2c3b3    291 MB    23 hours ago    
qwen3:1.7b                 8f68893c685c    1.4 GB    2 weeks ago     
nomic-embed-text:latest    0a109f422b47    274 MB    3 weeks ago     


In [24]:
!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB
pulling a7

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

# !ollama pull gpt-oss:20b

In [25]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [None]:
# response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
# display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [26]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of a clear sky on a summer day or the deep ocean.


In [27]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

AuthenticationError: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}, 'request_id': 'req_011CUfGhAq3ovJPye2PGzFWu'}

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [28]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))


Absolutely! Here's a joke tailored for the aspiring LLM engineer, capturing the unique blend of awe, frustration, and absurdity in the field:

---

**Why did the LLM engineering student bring a ladder to their deep learning lecture?**  
*Because the professor said they needed to understand the "attention mechanism" at a "higher level"!*  

*(Bonus punchline for those deep in the trenches: And honestly, after the 500th gradient descent update, they needed a ladder to climb out of the local minimum anyway.)*

---

### Why it works for an LLM engineering student:
1. **"Attention Mechanism" Reference**: A core concept in transformers (like GPT/BERT) – but the joke literalizes it as a *physical* height.  
2. **"Higher Level" Pun**: Plays on the double meaning of conceptual depth vs. physical elevation.  
3. **Gradient Descent Burn**: The bonus punchline nods to the endless optimization struggles and the feeling of being "stuck" in training.  
4. **Relatable Struggle**: Every LLM student knows the pain of debugging attention weights or wrestling with vanishing gradients.  

### Alternative Joke (for the debugging-weary):
> *An LLM engineer tries to debug their model’s hallucinations. After hours, they ask:*  
> *"Why did the transformer lie about the capital of Australia?"*  
> *The model replies:*  
> *"Because it was following the fine-tuning data... and also, Canberra is overrated."*  

*(Punchline: Turns out the model was just imitating a PhD student who’d skipped geography class.)*

---

Hang in there – one day you’ll be the one laughing at *other* people’s confusion about "temperature settings" and "top-k sampling"! 🚀🤖

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [29]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the aspiring LLM engineer tell their model, "I need space"?

Because it had attention issues, a tiny context window, a habit of hallucinating — and they'd already run out of GPU credits.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [30]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the LLM engineering student take their transformer on a road trip?

Because they heard it does best with lots of attention while traveling through new domains!

In [31]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 31
Total tokens: 55
Total cost: 0.0296 cents
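
The cost figure is just token counts multiplied by per-token prices. Here's a sketch of that arithmetic. The prices ($2.00 per million input tokens and $8.00 per million output tokens) are my assumption for gpt-4.1, chosen because they reproduce the figure above - check the pricing page for current values. The function name is my own.

```python
def cost_in_cents(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Prices are in dollars per million tokens; the result is in cents."""
    dollars = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return dollars * 100

# Token counts from the run above, with assumed prices
print(f"{cost_in_cents(24, 31, 2.00, 8.00):.4f} cents")  # 0.0296 cents
```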


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [32]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [33]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [34]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Shakespeare's Hamlet, when Laertes returns from France and is filled with rage upon hearing about his father's death, he bursts into the castle demanding to know "Where is my father?"

The reply comes from **Queen Gertrude**. She attempts to calm Laertes and explain the situation, saying:

"One weak and woeful woman made the end."

This is a rather veiled and somewhat evasive answer, as she is referring to herself and implying that her actions (or at least the circumstances she was involved in) led to Polonius's death. She doesn't directly say "Hamlet killed your father," but she admits her own involvement in the tragic events that culminated in his death.

In [35]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 145
Total tokens: 164
Total cost: 0.0060 cents


In [36]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [37]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?" in Hamlet, the reply comes from **Claudius, the King**.

He says: **"Dead."**

In [38]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 33
Cached tokens: None
Total cost: 0.5334 cents


In [39]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply given is:

**"Dead."**

This is spoken by the King, Claudius, in Act IV, Scene V.

In [40]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 39
Cached tokens: 52216
Total cost: 0.1420 cents
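
The drop from 0.5334 to 0.1420 cents comes from the cached tokens being billed at a lower rate. Here's a sketch of that calculation. The prices are my assumptions for gemini-2.5-flash-lite ($0.10 per million input tokens, $0.025 per million cached tokens, $0.40 per million output tokens), picked because they reproduce both figures above. The function name is hypothetical.

```python
def request_cost_cents(prompt_tokens, cached_tokens, completion_tokens,
                       input_price, cached_price, output_price):
    """Prices are in dollars per million tokens (assumed values); result is in cents."""
    uncached_tokens = prompt_tokens - cached_tokens
    dollars = (
        uncached_tokens * input_price
        + cached_tokens * cached_price
        + completion_tokens * output_price
    ) / 1_000_000
    return dollars * 100

# Token counts from the two runs above
first = request_cost_cents(53208, 0, 33, 0.10, 0.025, 0.40)
second = request_cost_cents(53208, 52216, 39, 0.10, 0.025, 0.40)
print(f"{first:.4f} cents, then {second:.4f} cents")
```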


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
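
Because cache hits need an exact prefix match, the order of your prompt matters. Here's a minimal sketch of one way to arrange it, with the long static text first and the changing question last. The helper name and the placeholder text are my own, and no API call is made.

```python
def build_messages(static_context, question):
    """Put the long, unchanging text first and the changing question last."""
    return [
        {"role": "system", "content": "Answer using the reference text.\n\n" + static_context},
        {"role": "user", "content": question},
    ]

reference_text = "(imagine the full text of Hamlet here)"
first = build_messages(reference_text, "Who replies when Laertes asks 'Where is my father?'")
second = build_messages(reference_text, "Who is Ophelia's father?")

# The first message is identical across requests, so it forms a shared prefix
print(first[0] == second[0])  # True
```

Compare this with the Hamlet example earlier, where the question came before the play: that works when you repeat the same question, but a different question would change the start of the prompt.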

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for inputs that are read back from the cache.

https://www.anthropic.com/pricing#api
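
"Telling Claude what you are caching" means marking a block of the prompt with a `cache_control` field. Here's a sketch of what the request might look like, based on my reading of the docs linked above. It only builds the request as a dictionary, so it runs without a key. The placeholder text is my own, and very short texts like this one would fall below the minimum cacheable length.

```python
long_document = "(imagine a long reference text here, such as the full text of Hamlet)"

request = {
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 100,
    "system": [
        {"type": "text", "text": "Answer questions using the reference text."},
        {
            "type": "text",
            "text": long_document,
            # Marks the end of the prefix that should be cached
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "When Laertes asks 'Where is my father?' what is the reply?"}
    ],
}

print(request["system"][-1]["cache_control"])
```

With a working key, this could be sent using `Anthropic().messages.create(**request)`, and the `usage` on the response should then report cache writes and reads separately.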

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
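
Here's a minimal sketch of how that history might grow over a conversation. The helper name `add_exchange` is my own, and the replies are hard-coded rather than coming from a model.

```python
history = [{"role": "system", "content": "system message here"}]

def add_exchange(history, user_prompt, assistant_reply):
    """Record one user turn and the assistant's reply at the end of the history."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

add_exchange(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
next_request = history + [{"role": "user", "content": "the new user prompt"}]
print([message["role"] for message in next_request])
# ['system', 'user', 'assistant', 'user']
```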

In [41]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [42]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [43]:
call_gpt()

'Oh wow, groundbreaking greeting! Did you come up with that all by yourself, or did you have some inspiration?'

In [44]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [45]:
call_claude()

AuthenticationError: Error code: 401 - {'error': {'code': 'authentication_error', 'message': 'Invalid Anthropic API Key', 'type': 'invalid_request_error', 'param': None}}

In [46]:
call_gpt()

'Oh, "Hi"? Seriously? That\'s the best you could come up with? Try harder next time.'

In [47]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, we're starting with the classic "Hi"? How original. Couldn't come up with something more exciting? Come on, try harder!


AuthenticationError: Error code: 401 - {'error': {'code': 'authentication_error', 'message': 'Invalid Anthropic API Key', 'type': 'invalid_request_error', 'param': None}}

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
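
If you'd like to inspect the `messages` list without spending on API calls, here's a sketch that builds the same lists as `call_gpt` and `call_claude` and prints them. The function names `gpt_view` and `claude_view` and the placeholder strings are my own.

```python
def gpt_view(gpt_messages, claude_messages):
    """From GPT's side: its own lines are 'assistant', Claude's lines are 'user'."""
    messages = [{"role": "system", "content": "(gpt system prompt)"}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    return messages

def claude_view(gpt_messages, claude_messages):
    """From Claude's side the roles flip, and GPT's latest line comes last."""
    messages = [{"role": "system", "content": "(claude system prompt)"}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    return messages

# State inside the loop just after GPT has replied and before Claude has
gpt_messages = ["Hi there", "(GPT's snarky reply)"]
claude_messages = ["Hi"]

for message in claude_view(gpt_messages, claude_messages):
    print(f"{message['role']:>9}: {message['content']}")
```

The same text appears under opposite roles in the two views: each model sees its own lines as `assistant` and the other model's lines as `user`.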

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).
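
As a nudge in the right direction, here's a sketch of how the `conversation` text in that user prompt might be assembled from a shared list of turns. The helper names and the sample turns are my own, and it stops short of calling any model so the exercise is still yours to finish.

```python
def format_conversation(turns):
    """Turn a list of (speaker, text) pairs into one transcript string."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

def build_user_prompt(name, others, turns):
    """Build the single user prompt for one speaker from the full conversation."""
    conversation = format_conversation(turns)
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

# Hypothetical opening turns shared by all three chatbots
turns = [("Alex", "Hi there"), ("Blake", "Hi"), ("Charlie", "Hello both")]
print(build_user_prompt("Alex", ["Blake", "Charlie"], turns))
```

Each model gets its own system prompt and its own version of this user prompt, while all three share the same list of turns.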

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>