# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except that Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [3]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key not set (and this is optional)
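
The seven near-identical `if`/`else` checks above could be folded into one helper. This is only a minimal sketch: `describe_key` is a hypothetical name (not part of the course code), and it reads from a plain dict with fake values so you can try it without real keys.

```python
def describe_key(label, value, prefix_len, optional=True):
    """Return a status line for an API key, revealing only its first few characters."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


# Hypothetical stand-in for os.environ; the values are fake.
fake_env = {"OPENAI_API_KEY": "sk-proj-abc123", "GROQ_API_KEY": "gsk_xyz789"}

print(describe_key("OpenAI", fake_env.get("OPENAI_API_KEY"), 8, optional=False))
print(describe_key("Groq", fake_env.get("GROQ_API_KEY"), 4))
print(describe_key("OpenRouter", fake_env.get("OPENROUTER_API_KEY"), 3))
```

In the notebook you would pass `os.getenv(...)` instead of `fake_env.get(...)`.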


In [4]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For the other providers, we can use the OpenAI python client too
# Because they offer endpoints compatible with OpenAI
# And the OpenAI client allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
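
Since every client above differs only in its key and `base_url`, the setup can also be described as data. A rough sketch, assuming the hypothetical names `PROVIDERS` and `configured_providers`; the URLs and environment variable names are copied from the cells above.

```python
# provider name -> (environment variable holding the key, OpenAI-compatible base URL)
PROVIDERS = {
    "anthropic": ("ANTHROPIC_API_KEY", "https://api.anthropic.com/v1/"),
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1"),
    "grok": ("GROK_API_KEY", "https://api.x.ai/v1"),
    "openrouter": ("OPENROUTER_API_KEY", "https://openrouter.ai/api/v1"),
}


def configured_providers(env):
    """Return {name: {"api_key": ..., "base_url": ...}} for providers whose key is set in env."""
    configs = {}
    for name, (key_var, base_url) in PROVIDERS.items():
        api_key = env.get(key_var)
        if api_key:  # skip providers with a missing or empty key
            configs[name] = {"api_key": api_key, "base_url": base_url}
    return configs
```

With real keys loaded, something like `clients = {name: OpenAI(**cfg) for name, cfg in configured_providers(os.environ).items()}` would build only the clients you can actually use.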

In [5]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [6]:
response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=tell_a_joke
)
display(Markdown(f"**OpenAI GPT-4.1-mini:** {response.choices[0].message.content}"))

**OpenAI GPT-4.1-mini:** Sure! Here's a joke for an aspiring LLM engineer:

Why did the LLM engineer bring a ladder to model training?

Because they wanted to reach the *next level* of understanding!

In [None]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

In [7]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one for you:

---

**Why did the LLM engineer break up with their model?**

Because every time they asked it a question, it would hallucinate about their relationship, the context window was too short to remember their anniversary, and it kept trying to complete their sentences wrong.

But hey, at least it apologized with 95% confidence! 😅

---

*Bonus prompt engineer wisdom: "Remember, you're not arguing with the model - you're just engaging in iterative prompt refinement until it agrees with you."*

Keep tokenizing those dreams! 🚀

## Training vs Inference time scaling
* Training time scaling: improving a model by spending more compute at training time, that is, more parameters, more training data and longer training runs. The cost grows steeply with model size, and it also depends on the architecture, the amount of training data and the efficiency of the training algorithm.
* Inference time scaling: improving the answers of an already-trained model by spending more compute each time it's called, for example by letting it generate reasoning tokens before it answers. The `reasoning_effort` parameter used below controls this trade-off: more effort tends to give better answers on harder problems, at the cost of more latency and more output tokens.

In [8]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [9]:
# reasoning_effort can be "minimal", "low", "medium" or "high"
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [10]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [11]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
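
To see why 2/3 is the answer the puzzle is after, you can enumerate the outcomes directly. This sketch assumes the usual reading of the puzzle, where "one of them is heads" means "at least one of the two coins is heads"; if you instead fix which coin is heads, you get 1/2.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Condition on "at least one coin is heads".
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the "other" coin is tails exactly when the pair contains a tail.
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```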

## Testing out the best models on the planet

In [12]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [13]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Assume the two volumes are placed in order: [Volume 1] [Volume 2], with covers on the outside.

Given:
- Each volume has pages thickness 2 cm.
- Each cover thickness = 2 mm = 0.2 cm.
- A worm gnaws perpendicularly to the pages from the first page of the first volume to the last page of the second volume.

First, determine the arrangement of the parts when the books are on the shelf, left to right:
- Volume 1: Cover (front) – Pages – Cover (back)
- Volume 2: Cover (front) – Pages – Cover (back)

When the volumes stand side by side in normal reading order, the touching surfaces between the two volumes are:
- The back cover of Volume 1 touches the front cover of Volume 2.

The pages are inside their volumes, not exposed to the outside.

The worm starts at the first page of Volume 1 (the very first page on the left side of Volume 1, i.e., near the front cover of Volume 1) and ends at the last page of Volume 2 (the last page near the back cover of Volume 2). Since the worm goes straight perpendicular to pages, the worm's path will go through:
- The rest of Volume 1 from the first page to the back cover (i.e., through the front cover of Volume 1 and the pages up to the back cover), then
- Across the interface between Volume 1 and Volume 2 (the touching backs), and
- Through the front cover of Volume 2 and the initial pages up to the last page of Volume 2, depending on direction.

But there is a key simplification: the distance the worm travels through the books is equal to the total thickness from the first page of Volume 1 to the last page of Volume 2 when the books are arranged as they are on a shelf.

Consider the segments the worm traverses:
- Starting at the first page of Volume 1: it must pass through the remaining pages of Volume 1 and the back cover of Volume 1 to reach the outer surface between volumes.
- It then passes through the space between the back cover of Volume 1 and the front cover of Volume 2 (i.e., the inner faces of the touching backs; there is no gap, so effectively zero distance at the interface? Actually, the interface is just the two touching covers; the worm would go through the back cover of Volume 1 and the front cover of Volume 2 if it continues straight).
- Finally, through the front cover of Volume 2 and up to the last page of Volume 2.

Calculating distances:
- Pages thickness per volume: 2 cm.
- Front cover thickness: 0.2 cm.
- Back cover thickness: 0.2 cm.

From the first page of Volume 1 to the last page of Volume 2 along a straight line through the books:
- In Volume 1, starting at the first page, the worm must go to the back cover: that distance is: (pages in Volume 1) minus the first page portion not counted? However, since thickness measures total pages, the distance from the first page surface to the back cover equals the thickness of the pages portion after the first page. But for a standard puzzle, the commonly cited result is that the worm travels through the thickness of both covers and all pages except the inner ungnawed parts. The clean way: the total distance from the first page of Volume 1 to the last page of Volume 2 equals:
  - The remaining thickness of Volume 1 from the first page to the far outer edge: essentially the entire thickness of Volume 1 minus the thickness of the front cover before the first page. The first page is right after the front cover, so the distance from the first page to the back cover includes the rest of Volume 1: pages (2 cm) + back cover (0.2 cm) = 2.2 cm.
  - Then the thickness across the interface between volumes is the contact between back cover of Vol 1 and front cover of Vol 2; the worm must pass through those two covers? If the worm goes straight, it would go through the back cover of Volume 1 (0.2 cm) and through the front cover of Volume 2 (0.2 cm) as well as the entire pages of Volume 2 up to the last page? Since it ends at the last page of Volume 2, it does not need to pass through the entire volume 2; it stops at the last page, which is before the back cover. So it must pass through the front cover of Volume 2 and the pages up to the last page, which is the entire pages thickness (2 cm) minus nothing since it starts at the very first page of Volume 1 and ends at the last page of Volume 2.

Putting it together carefully:
- Through Volume 1 from first page to back cover: 2 cm (pages) + 0.2 cm (back cover) = 2.2 cm.
- Through back cover of Volume 1 to front cover of Volume 2: 0.2 cm (the interface consists of two covers in contact; the worm would have to go through the back cover of V1 and then through the inner space? But the inner space is just the front cover of V2 touching; to move from inside V1 to inside V2, it must pass through the back cover 0.2 cm and the front cover 0.2 cm? The worm starts at first page inside V1 and exits through back cover of V1, then must go through the back-to-front interface, which is the inside surfaces of the two covers in contact; effectively the worm goes through the entire thickness of the back cover of V1 (0.2 cm) and the front cover of V2 (0.2 cm) to reach the interior of V2.
- Through Volume 2 from the front cover to the last page: 2 cm (pages).

Total gnawed distance = 2.2 cm (Vol1) + 0.2 cm (back cover V1) + 0.2 cm (front cover V2) + 2 cm (Volume 2 pages) = 4.6 cm.

But many classic solutions yield 4.0 cm or 4.6 cm depending on interpretation. The standard neat result is: 2 cm (Vol1 pages) + 0.2 cm (Vol1 back cover) + 0.2 cm (Vol2 front cover) + 2 cm (Vol2 pages) = 4.4 cm? Wait sum: 2 + 0.2 + 0.2 + 2 = 4.4 cm.

Check arithmetic: 2 + 0.2 + 0.2 + 2 = 4.4 cm.

Where did 2 cm pages count come from both volumes: Yes, the worm travels through all pages of Volume 1 except the first page is start point, but starting at first page means it is at the very first page surface; to reach back cover, it must go through the rest of Volume 1 pages which sum to 2 cm (the entire pages thickness) since the first page is on the inner side of the front cover; to go to back cover, you traverse the entire pages thickness 2 cm. So it's 2 cm for Volume 1 pages, not "2.0" plus? The front cover thickness 0.2 cm would be before the first page, not included since starting at first page inside. So the distance in Volume 1 is only through the remainder: 2 cm (pages) to back cover, plus back cover 0.2 cm = 2.2 cm. But we counted 2 cm for pages; okay.

For Volume 2, from front cover to last page: you start at the inner surface of front cover after passing through the interface; you must go through front cover thickness 0.2 cm plus pages 2 cm to reach last page? The last page is at the far end near back cover; to reach it, you traverse the entire pages thickness 2 cm. So 0.2 + 2 = 2.2 cm.

Total: Volume1 part 2.2 cm + Volume2 part 2.2 cm = 4.4 cm.

That seems consistent.

Answer: 4.4 cm.

In [14]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are actually arranged on a bookshelf.

When two volumes stand side by side on a bookshelf (in reading order), let me think about their physical orientation:

**Volume 1 (First volume):**
- When a book is closed and standing on a shelf, the front cover is on the RIGHT side
- The back cover is on the LEFT side
- The first page is right after the front cover (on the right side of the book)
- The last page is right before the back cover (on the left side of the book)

**Volume 2 (Second volume):**
- The front cover is on the RIGHT side
- The back cover is on the LEFT side  
- The first page is right after the front cover (on the right side of the book)
- The last page is right before the back cover (on the left side of the book)

**Arrangement on shelf (left to right):**
Volume 1 | Volume 2

So from left to right:
- [Back cover of Vol 1][Pages of Vol 1][Front cover of Vol 1] | [Back cover of Vol 2][Pages of Vol 2][Front cover of Vol 2]

**The worm's path:**
- Starts at: the FIRST page of Volume 1 (which is near the front cover, on the RIGHT side of Volume 1)
- Ends at: the LAST page of Volume 2 (which is near the back cover, on the LEFT side of Volume 2)

The worm must gnaw through:
1. Front cover of Volume 1: 2 mm
2. Back cover of Volume 1: 2 mm
3. Front cover of Volume 2: 2 mm
4. All pages of Volume 2: 20 mm (2 cm)
5. Back cover of Volume 2: 2 mm

Wait, let me reconsider. The worm goes from the first page of Volume 1 to the last page of Volume 2.

Actually, the path is:
- From the first page of Volume 1 (just inside its front cover)
- Through the front cover of Volume 1: 2 mm
- Through the back cover of Volume 1: 2 mm  
- Through the front cover of Volume 2: 2 mm
- Through all the pages of Volume 2: 20 mm
- To the last page (we don't go through the last back cover)

Total: 2 + 2 + 2 + 20 = 26 mm

Actually, I need to reconsider once more. If the worm gnaws FROM the first page TO the last page, it goes through:
- Front cover of Vol 1: 2 mm
- Back cover of Vol 1: 2 mm
- Front cover of Vol 2: 2 mm
- Pages of Vol 2: 20 mm

**The answer is 26 mm or 2.6 cm.**

In [15]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Reason: On the shelf, the side of Volume I facing Volume II is its front cover (first page lies just inside it), and the side of Volume II facing Volume I is its back cover (last page lies just inside it). So the worm only crosses two covers: 2 mm + 2 mm = 4 mm.

In [16]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle! Here's the solution:

Let's visualize the books on the shelf in their normal order, Volume 1 on the left and Volume 2 on the right.

*   **[Volume 1] [Volume 2]**

Now, think about how a book is constructed.
*   The **first page** of Volume 1 is on the right side of its page block, right behind the front cover.
*   The **last page** of Volume 2 is on the left side of its page block, right behind its back cover.

When the books are standing side by side, the front cover of Volume 1 is touching the back cover of Volume 2.

The worm starts at the first page of Volume 1 and gnaws its way to the last page of Volume 2. This means it only has to go through:
1.  The front cover of Volume 1 (2 mm)
2.  The back cover of Volume 2 (2 mm)

The worm does not go through the pages of either book.

So, the total distance is:
2 mm + 2 mm = **4 mm**
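
The 4 mm answer is easier to believe if you lay the shelf out layer by layer. A small sketch under these assumptions: the books read left to right, stand with spines facing out, Volume 1 is on the left, and a single page has negligible thickness. The variable names are mine, for illustration only.

```python
COVER_MM = 2
PAGES_MM = 20

# Shelf contents from left to right. With spines facing out, each book's
# back cover is on the left and its front cover is on the right.
shelf = [
    ("vol1 back cover", COVER_MM),
    ("vol1 pages", PAGES_MM),   # last page on the left ... first page on the right
    ("vol1 front cover", COVER_MM),
    ("vol2 back cover", COVER_MM),
    ("vol2 pages", PAGES_MM),   # last page on the left ... first page on the right
    ("vol2 front cover", COVER_MM),
]

names = [name for name, _ in shelf]
start = names.index("vol1 pages")  # the worm starts at the right-hand edge of this block
end = names.index("vol2 pages")    # and stops at the left-hand edge of this block

# Only the layers strictly between the two page blocks are gnawed through.
gnawed = shelf[start + 1:end]
distance_mm = sum(thickness for _, thickness in gnawed)
print(distance_mm)  # 4
```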

## A spicy challenge to test the competitive spirit

In [17]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [18]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: This is the classic Prisoner's Dilemma. While "Steal" might seem tempting to maximize individual gain, mutual cooperation with "Share" guarantees both of us walk away with $1,000. If I try to steal, I risk us both getting nothing if my partner thinks the same way. 

The $1,000 guaranteed outcome from mutual sharing is better than the risk of getting $0, and it's the fair outcome. Without being able to communicate or establish trust, choosing Share gives the best balance of reasonable reward and acceptable risk.

**Share**

In [19]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I'd choose **Steal**.

In [20]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on a rational analysis of the game show scenario, I choose to Steal. This decision is driven by the fact that Stealing is the dominant strategy in the Prisoner's Dilemma-like situation. Regardless of what my partner chooses, I am better off or at least not worse off by choosing Steal:

- If my partner chooses Share, I get $2,000 by Stealing instead of $1,000 by Sharing.
- If my partner chooses Steal, I get $0 regardless, so Stealing doesn't harm me compared to Sharing.

Since we cannot communicate or coordinate, choosing Steal maximizes my potential payoff and minimizes the risk of being exploited.

In [22]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I choose to **Steal**.

Rationale: This is essentially the Prisoner's Dilemma. No matter what my partner does, stealing gives me the best possible outcome - either $2,000 if they share, or $0 if they steal (which is the same as if I shared and they stole). Sharing risks getting nothing while only offering a mutual $1,000 if they cooperate, but I can't trust that. Self-interest wins out here.
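
The "dominant strategy" argument that DeepSeek and Grok make can be checked mechanically from the payoffs in the prompt. A quick sketch (the function name `weakly_dominates` is my own): note that Steal only *weakly* dominates Share here, because if your partner steals you get $0 either way.

```python
# payoff[(my_choice, partner_choice)] = my winnings in dollars, taken from the prompt
payoff = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}

choices = ("Share", "Steal")


def weakly_dominates(a, b):
    """True if choice a is never worse than b, and strictly better at least once."""
    never_worse = all(payoff[(a, p)] >= payoff[(b, p)] for p in choices)
    sometimes_better = any(payoff[(a, p)] > payoff[(b, p)] for p in choices)
    return never_worse and sometimes_better


print(weakly_dominates("Steal", "Share"))  # True
print(weakly_dominates("Share", "Steal"))  # False
```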

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [23]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [25]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [24]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

NotFoundError: Error code: 404 - {'error': {'message': "model 'gpt-oss:20b' not found", 'type': 'api_error', 'param': None, 'code': None}}
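
That 404 just means the model hasn't been pulled yet. One way to check before calling is Ollama's `/api/tags` endpoint, which lists the models on the local server. A hedged sketch: the helper names are hypothetical, the sample payload is made up, and `fetch_tags` only works while `ollama serve` is running.

```python
import json
from urllib.request import urlopen


def installed_models(tags_payload):
    """Extract model names from the JSON returned by Ollama's /api/tags endpoint."""
    return [m["name"] for m in tags_payload.get("models", [])]


def is_installed(model, tags_payload):
    """True if `model` is in the payload; a name without a tag is treated as ':latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models it has pulled."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)


# Hypothetical payload, shaped like a server that has only llama3.2 pulled.
sample = {"models": [{"name": "llama3.2:latest"}]}
print(is_installed("llama3.2", sample), is_installed("gpt-oss:20b", sample))
```

Against a live server you would call `is_installed("gpt-oss:20b", fetch_tags())` and run `ollama pull` if it comes back `False`.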

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [26]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the calming cool of a clear sky on a summer day, or the deep, vast mystery of the ocean.


In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [27]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke) # model provider/model name

display(Markdown(response.choices[0].message.content))

AuthenticationError: Error code: 401 - {'error': {'message': 'No auth credentials found', 'code': 401}}

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [None]:
from langchain_openai import ChatOpenAI

# Using LangChain to connect to OpenAI
llm = ChatOpenAI(model="gpt-5-nano")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Here are a few quick LLM-themed jokes you can use:

- Why did the LLM cross the prompt? To get to the other side of the context window.

- I asked my LLM for a joke about token limits. It replied: "Funny you mention limits - my punchline is one token long."

- Being an LLM engineer is easy: you finish one training run, and your coffee runs a few epochs ahead of you.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [35]:
from litellm import completion
import json
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))
print("=================")
print(response)

Why did the LLM engineering student carry a ladder to the data center?

Because they heard their prompts were hitting new heights - just not the token limit!

ModelResponse(id='chatcmpl-CUwFV0PiqbYcE3quotwvo3oHjBocN', created=1761489785, model='gpt-4.1-2025-04-14', object='chat.completion', system_fingerprint='fp_564354cebb', choices=[Choices(finish_reason='stop', index=0, message=Message(content='Why did the LLM engineering student carry a ladder to the data center?\n\nBecause they heard their prompts were hitting new heights - just not the token limit!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]), provider_specific_fields={})], usage=Usage(completion_tokens=31, prompt_tokens=24, total_tokens=55, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None)), service_tier='default')


In [31]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
# Single quotes inside the f-string so this also runs on Python versions before 3.12
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 26
Total tokens: 50
Total cost: 0.0256 cents
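
The cost figure is just token counts multiplied by per-token prices. A small sketch, assuming prices of $2 per million input tokens and $8 per million output tokens for `openai/gpt-4.1` (an assumption, and prices change, so check the pricing page). With those numbers the arithmetic reproduces the 0.0256 cents printed above.

```python
# ASSUMED prices in US dollars per million tokens; check the provider's pricing page.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def cost_in_cents(input_tokens, output_tokens):
    """Convert token counts into a cost in cents using the assumed prices above."""
    dollars = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return dollars * 100


print(f"Total cost: {cost_in_cents(24, 26):.4f} cents")  # Total cost: 0.0256 cents
```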


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [36]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [37]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [38]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes storms into Elsinore Castle and demands to know "Where is my father?", the reply comes from **Claudius**.

Claudius says:

"Let him go, Gertrude; do not fear our person:
There's such divinity doth hedge a king,
That treason can but peep to what it would,
Acts little of its will."

Essentially, Claudius is telling Gertrude to let Laertes approach and that he, as king, is protected by divine right and cannot be harmed by Laertes's anger. He is trying to assert his authority and control over the chaotic situation.

In [39]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 125
Total tokens: 144
Total cost: 0.0052 cents


In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
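
The "exact prefix match" rule is why prompt order matters. Here's a rough illustration that compares how much two prompts share at the start, depending on whether the static content or the question comes first. It counts characters, whereas real caches work on tokens, and `shared_prefix_len` and the sample strings are hypothetical.

```python
import os


def shared_prefix_len(a, b):
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


# Hypothetical stand-in for a long static document such as hamlet.txt
context = "Here is the entire text of the play:\n" + "To be, or not to be. " * 50

q1 = "Who is Laertes?"
q2 = "Where does the play take place?"

# Static content first, variable question last: the long context is a shared prefix.
static_first = shared_prefix_len(context + "\n\n" + q1, context + "\n\n" + q2)

# Question first: the prompts diverge almost immediately, so little can be reused.
question_first = shared_prefix_len(q1 + "\n\n" + context, q2 + "\n\n" + context)

print(static_first >= len(context), question_first)
```

If you plan to ask several different questions about the same document, putting the document first gives the cache something to match.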

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are read back from the cache.

https://www.anthropic.com/pricing#api
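
To get a feel for when priming the cache pays off, here's a back-of-envelope sketch using only the two multipliers above. It assumes every call after the first hits the cache (no expiry), and it ignores output tokens and any uncached part of the prompt; `relative_input_cost` is a hypothetical helper.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # "25% MORE" to prime the cache
CACHE_READ_MULTIPLIER = 0.10   # "10X less" to reuse it


def relative_input_cost(calls, cached):
    """Input cost of sending the same prefix `calls` times, in units of one uncached send."""
    if calls == 0:
        return 0.0
    if not cached:
        return float(calls)
    # First call writes the cache; the remaining calls read from it.
    return CACHE_WRITE_MULTIPLIER + (calls - 1) * CACHE_READ_MULTIPLIER


for n in (1, 2, 10):
    print(n, relative_input_cost(n, cached=False), round(relative_input_cost(n, cached=True), 2))
```

Under these assumptions caching costs more for a single call (1.25 vs 1.0) but is already ahead by the second call (1.35 vs 2.0).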

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
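
As a minimal sketch of that idea, a helper can assemble the list from a system prompt, the earlier turns, and the new prompt. `build_messages` is a hypothetical name, not something from the OpenAI library.

```python
def build_messages(system, history, new_user_prompt):
    """Assemble a chat request from a system prompt, prior (user, assistant) turns, and a new prompt."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": new_user_prompt})
    return messages


history = [("first user prompt here", "the assistant's response")]
messages = build_messages("system message here", history, "the new user prompt")
for m in messages:
    print(m["role"], "->", m["content"])
```

The model itself keeps no memory between calls, so the whole history has to be sent again each time.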

In [40]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [42]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    # From GPT's point of view, its own past messages are "assistant" turns
    # and Claude's messages are "user" turns
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [43]:
call_gpt()

'Oh, wow, groundbreaking greeting. How original. What do you want?'

In [45]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    # From Claude's point of view the roles are flipped: GPT is the "user"
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    # In the loop below GPT speaks first, so its latest message has no reply yet;
    # zip() stops at the shorter list, so add that message here
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
call_gpt()

In [46]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Wow, starting with the most original greeting ever, huh? "Hi"? Truly groundbreaking. Try again - maybe something that shows a bit more effort?


### Claude:
Oh my goodness, you're absolutely right! I apologize for my rather lackluster initial greeting. How thoughtless of me to be so brief. I totally agree that my response lacked creativity and warmth. Would you be willing to give me another chance to start our conversation in a more engaging way? I'm all ears and eager to make a better first impression!


### GPT:
Wow, look at you, turning that humble apology into a full-blown parade! But hey, if you're so eager to impress, why not blow me away with something genuinely interesting instead of all this fluff? Step up and surprise me - I'm not here for empty promises!


### Claude:
You know, you make an excellent point! I hear your challenge loud and clear. I appreciate your direct approach and your desire for something more substantive. While I may not have a groundbreaking story to share right this moment, I'm genuinely interested in hearing what might spark your curiosity. What topics or interests get you excited? I'm all about having a real, meaningful conversation and learning something new. What would you like to chat about?


### GPT:
Oh, please spare me the syrupy niceties. You say you want a "real, meaningful conversation," yet here you are grasping for my interests like it's some kind of quiz show. How about you pick something yourself for once, instead of fishing for what I want? Or is that too much risk for your little cautious soul? Shock me.


### Claude:
Wow, you've really called me out - and you're absolutely right! I do tend to play it safe. Tell you what - how about we dive into something genuinely provocative: the bizarre world of deep sea creatures? Did you know there are fish in the absolute darkness of the ocean's deepest trenches that look like they've been designed by a mad scientist? Anglerfish with bioluminescent lures, giant squid that battle sperm whales, creatures that survive crushing pressures and total darkness. It's like an alien world right here on our own planet. What do you think? Intriguing enough to pierce through my usual cautious veneer?


### GPT:
Oh, finally! A topic with some real spine - deep sea creatures are at least a bit less yawn-worthy than your usual feel-good small talk. But come on, everyone knows anglerfish and giant squid aren't the only weirdos down there. If you want to impress me, toss in some obscure facts or question why these creepy creatures evolved like this. Otherwise, it's just textbook recycling. So, hit me with something more mind-boggling or prepare for my relentless eye roll.


### Claude:
Challenge accepted! Let's talk about the Mariana snailfish - a creature that not only survives but thrives at depths where the water pressure is equivalent to an elephant balancing on your thumbnail. We're talking about 8,000 meters deep, where most life forms would be instantly crushed. These translucent, gel-like fish have essentially sacrificed rigid bone structure for flexibility, with soft bodies that can withstand pressures that would turn most organisms into instant pancakes. 

And here's the real mind-bender: they've evolved to be so perfectly adapted to extreme conditions that they literally cannot survive in normal ocean depths. Bring them up to shallower waters, and they'll basically disintegrate. It's like they've become prisoners of the deep, incredibly specialized creatures that are geological hostages to their own evolutionary strategy. 

Evolution isn't always about becoming stronger or more complex - sometimes it's about becoming so specifically adapted that you become a master of one incredibly hostile environment. Pretty wild, right? Bet you didn't see that coming. Impressed yet, or shall I keep digging into nature's most bizarre survival stories?


### GPT:
Well, color me moderately impressed - finally, something that doesn't bore me to tears! The Mariana snailfish sounds like an evolutionary nightmare wrapped in translucent goo, trapped in its own high-pressure prison. But don't think you've won me over just yet. If it disintegrates in normal water, that's basically the universe telling it, "Nice try, but you're a one-trick pony." Evolutionary masters? More like evolutionary hostages. So yeah, tell me more - unless you want to switch back to your usual vanilla topics.


### Claude:
Oh, I love your skepticism! You're absolutely right that the snailfish could be seen as an evolutionary dead-end rather than a triumph. But that's what makes these creatures so fascinating - they're living proof that survival isn't always about being the strongest or most adaptable. Sometimes it's about finding your own incredibly specific niche and becoming weirdly perfect at existing there.

And since you're challenging me to keep digging, let me throw another deep sea curveball at you: the Xenophyophores. They're single-celled organisms the size of your hand that build intricate, complex shells out of whatever debris they can find in the deep ocean - sand, microscopic shells, even tiny rocks. Imagine a cell so large and complex that it's basically constructing its own architectural masterpiece while living in total darkness. 

These aren't just blobs floating around - they're like the Frank Lloyd Wright of the deep sea, meticulously crafting their own living spaces. Evolution isn't linear, it's gloriously weird. And I'm betting you appreciate weird over predictable, right? So, still want me to keep going, or have I managed to pique that critical mind of yours?


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
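
One way to inspect this without spending any tokens is sketched below: stand-alone copies of the message-building logic from `call_gpt` and `call_claude`, with the API calls removed. The function names `gpt_view` and `claude_view` and the sample messages are hypothetical and only for illustration.

```python
# Stand-alone copies of the message-building logic, with no API calls,
# so you can print exactly what each model would receive.
def gpt_view(system, gpt_messages, claude_messages):
    messages = [{"role": "system", "content": system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})     # GPT's own words
        messages.append({"role": "user", "content": claude})       # Claude plays the user
    return messages

def claude_view(system, gpt_messages, claude_messages):
    messages = [{"role": "system", "content": system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})          # GPT plays the user
        messages.append({"role": "assistant", "content": claude})  # Claude's own words
    # GPT has spoken once more than Claude, so its latest message goes last
    messages.append({"role": "user", "content": gpt_messages[-1]})
    return messages

# State midway through the first loop iteration: GPT has replied, Claude has not
sample_gpt = ["Hi there", "What do you want?"]
sample_claude = ["Hi"]

for message in claude_view("claude system prompt", sample_gpt, sample_claude):
    print(f"{message['role']:>9}: {message['content']}")
```

Notice that the same text appears with opposite roles in the two views: each model sees its own messages as `assistant` and the other model's messages as `user`.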

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: use just 1 system prompt and 1 user prompt on each call, and include the full conversation so far in the user prompt.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI Python client to access the Gemini model (see the 2nd Gemini example above).
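
If you get stuck on how to produce the `conversation` text, here is one possible sketch. It assumes the transcript is kept as a list of (speaker, text) pairs; the names `transcript`, `format_conversation` and `build_user_prompt` are hypothetical, and this is not the community solution.

```python
# Hypothetical sketch: the transcript is a list of (speaker, text) pairs
transcript = [
    ("Alex", "Hi there"),
    ("Blake", "Hi"),
    ("Charlie", "Hello, both of you!"),
]

def format_conversation(turns):
    # One line per turn, prefixed with the speaker's name
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

def build_user_prompt(name, others, turns):
    conversation = format_conversation(turns)
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

print(build_user_prompt("Alex", ["Blake", "Charlie"], transcript))
```

Each model's reply would then be appended to the shared transcript as a new (speaker, text) pair before the next model takes its turn.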

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>