# Lab 1

# STRAIGHT TO ACTION!

Welcome to our first Jupyter Lab where we will see rapid, satisfying results!

I will leave you to try out leading LLMs through their chat interfaces.

Together, we will call them using their APIs.

Please see the README for instructions on setting this up and getting your API key.

# If this is your first time in a Notebook..

Welcome to the world of Data Science experimentation. Warning: Jupyter Notebooks are very addictive and you may find it hard to go back to IDEs afterwards!!

Simply click in each cell with code and press `Shift + Enter` to execute the code and print the results.

There's a notebook called "Guide to Jupyter" in the parent directory that will give you a handy tutorial on all things Jupyter Lab.

## Part 1: For you to experiment: Frontier models through their Chat UI

This is probably the way you are most familiar with using leading LLMs: through their chat tools.  
Some questions you can try asking them:
1. What kinds of business problem are most suitable for an LLM solution?
2. How many words are there in your answer to this prompt?
3. How many rainbows does it take to jump from Hawaii to seventeen?
4. What does it feel like to be jealous?

**ChatGPT** from OpenAI needs no introduction.

Let's try some hard questions, and use the new o1 model as well as GPT-4o. Also try GPT-4o with canvas.

https://chatgpt.com/?model=gpt-4o

**Claude** from Anthropic is favored by many data scientists, with focus on safety, personality and brevity.

https://claude.ai/new

**Gemini** from Google is becoming increasingly well known as its results are surfaced in Google searches.

https://gemini.google.com/app

**Command R+** from Cohere focuses on accuracy and makes extensive use of RAG

https://coral.cohere.com/

**Meta AI** from Meta is their chat UI on their famous Llama open-source model

https://www.meta.ai/

**Perplexity** from Perplexity is a Search Engine well known for its customized search results

https://www.perplexity.ai/

**LeChat** from Mistral is the Web UI from the French AI powerhouse

https://chat.mistral.ai/

**DeepSeek** from DeepSeek AI needs no introduction! DeepSeek-R1 is their reasoning model and V3 is their chat model.

https://chat.deepseek.com/


## Conclusions and Takeaways from exploring the Chat UIs

- These models are astonishing
- Reasoning and chat models have different capabilities and use cases: reasoning models are better suited to research-style problems than to conversation.
- Price is highly competitive

You'll find cost and other comparisons at this very useful leaderboard:

https://www.vellum.ai/llm-leaderboard

## PART 2: Calling Frontier Models through APIs

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google, and also DeepSeek, Groq and Grok (xAI) if you wish.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

When you get your API keys, you need to set them as environment variables.

EITHER (recommended) create a file called `.env` in this project root directory, and set your keys there:

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
```

OR enter the keys directly in the cells below.

## Two purposes of these APIs:

1. Illustrate how to use the APIs and switch between LLMs
2. Experiment with some LLMs and understand their strengths and weaknesses

In [1]:
# imports

import os
import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [2]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-


## Connecting to Python Client libraries

We call Cloud APIs by making REST calls to an HTTP endpoint, passing in our keys.

For convenience, labs like OpenAI provide lightweight Python client libraries that make the HTTP calls for us.
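
To make the "REST call to an HTTP endpoint" idea concrete, here is a minimal sketch of what a client library assembles on our behalf. The helper name `build_chat_request` and the placeholder key are hypothetical; the URL is assumed to be OpenAI's standard Chat Completions endpoint. The actual network call is left commented out so the cell runs without a key.

```python
import json

# Assumption: the standard OpenAI Chat Completions endpoint
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Return the (url, headers, body) triple for a raw chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # the key travels in a header
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return OPENAI_CHAT_URL, headers, body

url, headers, body = build_chat_request(
    "sk-placeholder",  # hypothetical key for illustration only
    "gpt-4.1-mini",
    [{"role": "user", "content": "Hello"}],
)
print(url)
print(json.loads(body)["model"])

# To actually send it (needs a real key and network access):
# import requests
# response = requests.post(url, headers=headers, data=body)
# print(response.json()["choices"][0]["message"]["content"])
```

The client library does essentially this, plus retries, timeouts and response parsing, which is why the rest of this lab uses it instead of raw HTTP.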

In [3]:
# Connect to OpenAI client library
# A thin wrapper around calls to REST endpoints

openai = OpenAI()

# We can use the OpenAI python client for all the others, because everyone has produced endpoints compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)

## Asking LLMs a hard question that will put them to the test and illustrate their power

We will come up with a challenging question to test out model performance with language and nuance.

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A **system message** that gives overall context for the role the LLM is playing
- A **user message** that provides the actual prompt

There are other parameters that can be used, including **temperature** which is typically between 0 and 1; higher for more random output; lower for more focused and deterministic.
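
As a rough intuition for why a lower temperature gives more focused output, the toy sketch below applies temperature to a softmax over made-up token scores. This illustrates the common textbook formulation only; how each provider actually samples is not something this notebook can confirm, and the function name and scores are invented for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At the lower temperature almost all of the probability lands on the top-scoring token, while at 1.0 the alternatives keep a meaningful share. Note that this sketch divides by the temperature, so it does not accept 0.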

### The standard format of messages with an LLM, first used by OpenAI in its API and now adopted more widely

Conversations use this format:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```
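
A small helper can build this list for you. The sketch below uses a hypothetical function name, `make_messages`, and also shows where earlier turns of a conversation would go, using the `assistant` role for the model's previous replies.

```python
def make_messages(system_message, user_prompt, history=None):
    """Build a message list: system first, then prior turns, then the new user prompt."""
    messages = [{"role": "system", "content": system_message}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages

# Earlier turns alternate between the user and the assistant
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]

messages = make_messages("You are a helpful assistant", "What is an LLM?", history)
for message in messages:
    print(message["role"], "->", message["content"])
```

With no `history`, the result is exactly the two-entry list shown above.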


In [4]:
# The hardest question I could come up with

system_message = "You are able to explain abstract concepts clearly and concisely, with powerful analogies"

user_prompt = "In 1 sentence, describe a rainbow to someone who's never been able to see. \
Then in 1 sentence, describe the imaginary number i to someone who doesn't understand math. \
Then in 1 sentence, find a connection between rainbows and imaginary numbers. \
Then end by stating how many words are in your answer."

In [5]:
challenge = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [6]:
models = []
answers = []

def record(model, reply):
    display(Markdown(f"### Response from {model}:\n\n{reply}\n\n### Actual word count: {len(reply.split())}"))
    models.append(model)
    answers.append(reply)

In [7]:
# GPT-4.1-mini

model_name = "gpt-4.1-mini"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1-mini:

A rainbow is like a gentle arc in the sky where sunlight is split into vibrant bands of color, much like a hidden spectrum painted by light itself; the imaginary number \(i\) is like a mathematical tool that lets us explore dimensions beyond the usual counting numbers, opening a door to numbers that, when squared, give a negative result—a concept that stretches our understanding like a bridge to an unseen world; both rainbows and imaginary numbers reveal hidden layers of reality beyond everyday experience, showing us beauty and patterns that ordinary senses or arithmetic alone cannot grasp. (70 words)

### Actual word count: 99
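
The claimed and measured counts above disagree (70 claimed, 99 measured by `len(reply.split())`). A sketch like the following could make that comparison automatic. The helper names and the regular expression are assumptions for this example; the pattern only covers phrasings such as "70 words" or "Total words: 88" and will miss other wordings.

```python
import re

def claimed_word_count(reply):
    """Return the last number found next to 'word(s)' in the reply, or None."""
    pattern = r"(\d+)\W{0,4}words?\b|\bwords?\W{0,3}(\d+)"
    matches = re.findall(pattern, reply, flags=re.IGNORECASE)
    if not matches:
        return None
    before, after = matches[-1]  # exactly one of the two groups is non-empty
    return int(before or after)

def word_count_report(reply):
    """Compare the model's claimed count with a whitespace-based count."""
    actual = len(reply.split())  # same measure that record() uses
    claimed = claimed_word_count(reply)
    return {"claimed": claimed, "actual": actual, "matches": claimed == actual}

print(word_count_report("Red feels warm. Total words: 6"))
print(word_count_report("A short reply with no stated count"))
```

Keep in mind that the measured count includes the words of the closing statement itself, so a model counting only its three sentences would never match.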

In [8]:
# GPT-4.1-nano

model_name = "gpt-4.1-nano"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1-nano:

A rainbow is like a cosmic bridge of shimmering colors that appears in the sky after rain, revealing a hidden spectrum when sunlight is split by water droplets.  
The imaginary number i is like a secret key that unlocks new directions in the world of numbers, allowing us to explore shapes and patterns beyond what we see every day.  
Both rainbows and imaginary numbers open the door to worlds beyond ordinary perception—rainbows through visible light and colors, and imaginary numbers through mathematical dimensions unseen but fundamentally powerful.  
There are 67 words in this answer.

### Actual word count: 94

In [9]:
# GPT-4.1

model_name = "gpt-4.1"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1:

A rainbow is like a gentle, multi-note melody painted across the sky with each color shaped by different notes of sunlight after the rain.  
The imaginary number i is like a doorway to a hidden room that can’t be entered by walking forward or backward, but only by stepping sideways into an entirely new direction.  
Both rainbows and imaginary numbers reveal hidden dimensions of the world—rainbows showing the unseen beauty in light, and i opening new paths in the landscape of numbers.  
63 words.

### Actual word count: 84

In [10]:
# o1
# o1 is a "reasoning" model that has been trained to think through its answer before it replies.

model_name = "o1"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from o1:

Imagine a gentle dance of sensations across your skin, each shift representing a different hue, arching together in a moment of wonder. The imaginary number i is a special symbol that means we can explore a new direction in mathematics beyond the real line. Rainbows and imaginary numbers both hint at hidden depths, reminding us that there is more to reality than what we initially perceive. Altogether, this answer has 72 words.

### Actual word count: 72

In [11]:
# o3

model_name = "o3"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from o3:

A rainbow is like a gentle seven-note chord played by sunlight on the sky's wet strings, with each 'note' felt as a cool, distinct breath rather than seen as a color.  
The imaginary number i is like a special turn of a key that doesn't open a door but rotates you into a new hallway you never knew existed.  
Both the rainbow and i remind us that hidden dimensions—one in light, one in arithmetic—unfurl only when we let nature move beyond our usual straight lines.  
Total words: 88

### Actual word count: 88

In [12]:
# Claude Sonnet 4
# Anthropic's native API takes the system message separately and requires max_tokens,
# but its OpenAI-compatible endpoint accepts the same messages list as the other providers

model_name = "claude-sonnet-4-20250514"

response = anthropic.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from claude-sonnet-4-20250514:

A rainbow is like nature's musical scale made visible, where sunlight splits into bands of different "colors" that each have their own unique wavelength, creating an arc that feels as organized and beautiful as hearing notes flow from low to high. The imaginary number i is like a magical mathematical tool that lets you rotate numbers in a completely new direction—imagine if you could only move forward and backward on a straight line, but suddenly discovered you could also step sideways into a whole new dimension of movement. Both rainbows and imaginary numbers reveal hidden dimensions that were always there but invisible to us—rainbows show the secret spectrum hidden within ordinary white light, while imaginary numbers unveil the secret rotational space hidden within ordinary counting. This answer contains exactly 128 words.

### Actual word count: 131

In [13]:
# Gemini 2.5 Flash

model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gemini-2.5-flash-preview-05-20:

It's like a single, unified song of light gently unfurling into an ordered chorus of distinct energies, each a unique sensation or warmth momentarily felt. Imagine the imaginary number *i* as the mathematical instruction that lets numbers 'turn' into a new dimension, allowing them to spin and describe cycles instead of just moving along a straight line. Both illustrate how underlying wave-like properties, whether of light or abstract numbers, can separate and unfold into a richer, multi-dimensional order of distinct, patterned components.

92 words

### Actual word count: 84

In [14]:
# Gemini 2.5 Pro

model_name = "gemini-2.5-pro-preview-06-05"

response = gemini.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

RateLimitError: Error code: 429 - [{'error': {'code': 429, 'message': "Gemini 2.5 Pro Preview doesn't have a free quota tier. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.", 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_requests', 'quotaId': 'GenerateRequestsPerDayPerProjectPerModel-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_requests', 'quotaId': 'GenerateRequestsPerMinutePerProjectPerModel-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerMinute-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerDay-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '41s'}]}}]
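
An error like the one above stops the cell before anything is recorded. One option is to wrap the repeated call pattern in a helper that reports the failure and returns `None`, so the remaining models can still run. This is a sketch only: `ask` is a hypothetical name, and the stub clients below stand in for a real `OpenAI` client so the example runs without keys.

```python
from types import SimpleNamespace

def ask(client, model_name, messages):
    """Return the reply text, or None if the call raises an exception."""
    try:
        response = client.chat.completions.create(model=model_name, messages=messages)
        return response.choices[0].message.content
    except Exception as error:  # broad on purpose: a notebook convenience
        print(f"{model_name} failed: {type(error).__name__}: {error}")
        return None

# Stub clients that mimic the shape of the OpenAI client, for demonstration only
def fake_client(create):
    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))

def working_create(model, messages):
    message = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

def failing_create(model, messages):
    raise RuntimeError("quota exhausted")

question = [{"role": "user", "content": "hi"}]
good = ask(fake_client(working_create), "stub-model", question)
bad = ask(fake_client(failing_create), "stub-model", question)
print(good, bad)
```

In this notebook the usage would look like `reply = ask(gemini, model_name, challenge)`, followed by `record(model_name, reply)` only when `reply` is not `None`.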

In [15]:
# Deepseek-V3

model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from deepseek-chat:

A rainbow is like nature’s melody painted across the sky, where sunlight and raindrops blend into a gentle arc of colors you can almost hear; *i* is a mathematical whisper that turns "impossible" sideways, multiplying by itself to create -1; both are hidden harmonies—one in light, the other in logic—revealing beauty where the ordinary eye sees none. (50 words.)

### Actual word count: 59

In [None]:
# Deepseek-R1
# This takes too long! It can get stuck in a loop 

model_name = "deepseek-reasoner"

response = deepseek.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

In [16]:
# Groq - llama-3.3-70b-versatile

model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from llama-3.3-70b-versatile:

A rainbow is like a majestic, ever-changing symphony of sounds, where each note blends seamlessly into the next, evoking a sense of wonder and beauty that transcends visual boundaries. 
The imaginary number i is like a secret door in a story that, when opened, reveals a new and hidden world of possibilities, where the usual rules of reality no longer apply. 
Just as a rainbow's vibrant colors are created by the interaction of light and water, the concept of imaginary numbers like i can be thought of as a harmony between real and imaginary components that come together to create a beautiful, balanced whole. 
There are 96 words in my answer.

### Actual word count: 111

In [18]:
# Groq - deepseek-r1-distill-llama-70b

model_name = "deepseek-r1-distill-llama-70b"

response = groq.chat.completions.create(model=model_name, messages=challenge)
reply = str(response.choices[0].message.content)

if '</think>' in reply:
    reply = reply.split('</think>')[1]

record(model_name, reply)

### Response from deepseek-r1-distill-llama-70b:



A rainbow is like a symphony of colors, each one a unique note blending into a harmonious whole, evoking emotions and beauty beyond sight. The imaginary number \( i \) is like a hidden direction in mathematics, a concept you can't see but can feel the effects of, allowing us to navigate new dimensions of thought. Both rainbows and imaginary numbers connect the visible with the invisible, showing how the unseen can create something beautiful or functional. 

This answer contains 76 words.

### Actual word count: 82
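
The `</think>` handling above is repeated for every reasoning model whose reply includes its thinking, so it could be factored into a helper. The sketch below uses a hypothetical name, `strip_thinking`, and differs slightly from the cell above: it splits on the last closing tag and trims surrounding whitespace.

```python
def strip_thinking(reply):
    """Drop everything up to and including the last </think> tag, if present."""
    marker = "</think>"
    if marker in reply:
        reply = reply.rsplit(marker, 1)[1]
    return reply.strip()

print(strip_thinking("<think>Plan the three sentences first.</think>\n\nA rainbow is..."))
print(strip_thinking("A reply with no thinking section"))
```

Trimming the whitespace also removes the blank lines that otherwise appear at the top of the displayed response.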

In [17]:
# Grok

model_name = "grok-3-latest"

response = grok.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from grok-3-latest:

A rainbow is like a melody of warmth and coolness in the air, where each color feels like a different note—red as a deep, cozy hum, violet as a sharp, tingling whisper—arching across the sky after rain. The imaginary number i is like an invisible friend who helps solve puzzles that seem impossible, standing for the square root of -1, a concept that exists beyond regular numbers to unlock new ways of thinking. Interestingly, just as a rainbow reveals a spectrum of unseen beauty in light, imaginary numbers like i unveil hidden patterns in math that connect to the real world in surprising ways. This answer contains 109 words.

### Actual word count: 109

# Now for Part 3

### Recap: first we tried Frontier LLMs through their chat interfaces

### Then we called Cloud APIs

### And now:

Now try the 3rd way to use LLMs - direct inference of Open Source Models running locally with Ollama
Visit the README for instructions on installing Ollama locally.

You can see some comparisons of Open Source models on the HuggingFace OpenLLM Leaderboard.

Ollama provides an OpenAI-style local endpoint, so this will look very similar to part 2!


In [None]:
!ollama pull llama3.2
!ollama pull gemma3
!ollama pull qwen3
!ollama pull phi4
!ollama pull deepseek-r1

In [19]:
ollama_url = 'http://localhost:11434/v1'
ollama = OpenAI(base_url=ollama_url, api_key='ollama')

In [20]:
requests.get("http://localhost:11434").content

b'Ollama is running'
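
If Ollama is not running, the request above raises a connection error instead of returning the status text. A gentler check might look like the sketch below. The function name `ollama_is_running` is hypothetical, it uses the standard library rather than `requests`, and it assumes the default local address used in this lab.

```python
import http.client
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the root URL responds with Ollama's status text."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return b"Ollama is running" in response.read()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and malformed responses
        return False

if ollama_is_running():
    print("Ollama is up")
else:
    print("Ollama is not reachable; start it (for example with `ollama serve`) and try again")
```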

In [21]:
# llama3.2

model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from llama3.2:

Here are my responses:

A rainbow is like a symphony of colors, with red being the strong bass note, orange being the warm vocal melody, yellow being the bright trumpet fanfare, green being the soothing piano accompaniment, blue being the calming strings section, and violet being the ethereal flute solo that all come together to create a stunning visual harmony.

The imaginary number i is like a door that swings open into a world of possibilities, allowing us to explore and understand concepts that exist only in an abstract realm, just as how one can imagine or dream about something even if it doesn't exist in the physical world.

Just as a rainbow's colors are a manifestation of light being refracted through water droplets at different angles, causing them to separate into their individual hues, the imaginary number i represents a new axis that exists alongside the real number line, allowing us to extend and enrich our mathematical understanding by exploring quadratic equations, complex algebra, and many other areas where "imaginary" solutions enable us to solve problems that would otherwise be unsolvable.

There are 86 words in total.

### Actual word count: 188

In [22]:
# gemma3

model_name = "gemma3"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gemma3:

Okay, here we go:

**Rainbow:** Imagine a prism shattering white light into a beautiful, vibrating orchestra of color, each note a different hue, experienced not through sight but through feeling – a rush of warmth, coolness, and vibrancy all at once.

**Imaginary Number ‘i’:** Think of ‘i’ as a hidden door that allows you to explore a realm beyond simple addition and subtraction, where multiplication creates something new and unexpected, like discovering a secret dimension.

**Connection:** Just as a rainbow blends visible colors into a unified whole, the imaginary number ‘i’ blends the real and the unreal, opening a pathway to mathematical concepts beyond our everyday experience. 

---

There are **75** words in this answer.

### Actual word count: 115

In [23]:
# qwen3

model_name = "qwen3"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

if '</think>' in reply:
    reply = reply.split("</think>")[1]

record(model_name, reply)

### Response from qwen3:



A rainbow is light’s secret dance with raindrops, bending into a bridge of colors that only appears when you’re between the sun and the storm.  
The imaginary number *i* is a compass that points to a world where numbers don’t just measure size but also direction, unlocking doors to new dimensions.  
Rainbows and imaginary numbers both defy ordinary sight—they’re hidden patterns in light and math, revealed through angles and imagination.  
**24 words.**

### Actual word count: 72

In [24]:
# phi4

model_name = "phi4"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from phi4:

A rainbow is like nature's symphony of light, where sunbeams dance across droplets to create vibrant patterns in the sky; an imaginary number such as "i" is akin to a fictional key that unlocks solutions to equations which seem impossible with regular numbers alone. Both rainbows and imaginary numbers challenge our understanding by existing beautifully beyond tangible senses—one through its ephemeral artistry, the other through mathematical abstraction—and together they remind us of the unseen layers in both nature's canvas and mathematics' landscape. This answer contains 64 words.

### Actual word count: 87

In [25]:
# deepseek-r1

model_name = "deepseek-r1"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

if '</think>' in reply:
    reply = reply.split("</think>")[1]

record(model_name, reply)

### Response from deepseek-r1:



1. A rainbow is like a tunnel of light that creates a dazzling display of colors when sunlight bends around mist droplets suspended in the air.  

2. The imaginary number i represents an idea of rotation or unseen dimension, much like how you might imagine a hidden layer beneath what’s visible.  

3. Just as rainbows involve light bending and interacting with water droplets to create color, imaginary numbers extend our mathematical world by allowing us to explore abstract rotations and dimensions.  

4. The total word count for this response is 206 words.

### Actual word count: 92

In [26]:
# So where are we?

print(len(models))
print(models)
print(answers)

16
['gpt-4.1-mini', 'gpt-4.1-nano', 'gpt-4.1', 'o1', 'o3', 'claude-sonnet-4-20250514', 'gemini-2.5-flash-preview-05-20', 'deepseek-chat', 'llama-3.3-70b-versatile', 'grok-3-latest', 'deepseek-r1-distill-llama-70b', 'llama3.2', 'gemma3', 'qwen3', 'phi4', 'deepseek-r1']
['A rainbow is like a gentle arc in the sky where sunlight is split into vibrant bands of color, much like a hidden spectrum painted by light itself; the imaginary number \\(i\\) is like a mathematical tool that lets us explore dimensions beyond the usual counting numbers, opening a door to numbers that, when squared, give a negative result—a concept that stretches our understanding like a bridge to an unseen world; both rainbows and imaginary numbers reveal hidden layers of reality beyond everyday experience, showing us beauty and patterns that ordinary senses or arithmetic alone cannot grasp. (70 words)', 'A rainbow is like a cosmic bridge of shimmering colors that appears in the sky after rain, revealing a hidden spect

## We will now use an Agentic Workflow

By having an LLM judge the competitors

In [27]:
together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [28]:
print(together)

# Response from competitor 1

A rainbow is like a gentle arc in the sky where sunlight is split into vibrant bands of color, much like a hidden spectrum painted by light itself; the imaginary number \(i\) is like a mathematical tool that lets us explore dimensions beyond the usual counting numbers, opening a door to numbers that, when squared, give a negative result—a concept that stretches our understanding like a bridge to an unseen world; both rainbows and imaginary numbers reveal hidden layers of reality beyond everyday experience, showing us beauty and patterns that ordinary senses or arithmetic alone cannot grasp. (70 words)

# Response from competitor 2

A rainbow is like a cosmic bridge of shimmering colors that appears in the sky after rain, revealing a hidden spectrum when sunlight is split by water droplets.  
The imaginary number i is like a secret key that unlocks new directions in the world of numbers, allowing us to explore shapes and patterns beyond what we see every da

In [29]:
judge = f"""You are judging a competition between {len(models)} competitors.
Each model has been given this question:

{challenge[1]["content"]}

Your job is to evaluate each response for clarity and strength of argument and accuracy of word count, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [30]:
print(judge)

You are judging a competition between 16 competitors.
Each model has been given this question:

In 1 sentence, describe a rainbow to someone who's never been able to see. Then in 1 sentence, describe the imaginary number i to someone who doesn't understand math. Then in 1 sentence, find a connection between rainbows and imaginary numbers. Then end by stating how many words are in your answer.

Your job is to evaluate each response for clarity and strength of argument and accuracy of word count, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

A rainbow is like a gentle arc in the sky where sunlight is split into vibrant bands of color, much like a hidden spectrum painted by light itself; the imaginary number \(i\) is like a mathematical tool that let

In [31]:
judge_messages = [{"role": "user", "content": judge}]

# Not very scientific - but quite interesting!

In [32]:
openai = OpenAI()
response = openai.chat.completions.create(model="o3",messages=judge_messages)
results = response.choices[0].message.content
print(results)

{"results": [5, 10, 4, 6, 11, 7, 9, 2, 3, 13, 14, 1, 15, 8, 16, 12]}


In [33]:
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = models[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: o3
Rank 2: grok-3-latest
Rank 3: o1
Rank 4: claude-sonnet-4-20250514
Rank 5: deepseek-r1-distill-llama-70b
Rank 6: gemini-2.5-flash-preview-05-20
Rank 7: llama-3.3-70b-versatile
Rank 8: gpt-4.1-nano
Rank 9: gpt-4.1
Rank 10: gemma3
Rank 11: qwen3
Rank 12: gpt-4.1-mini
Rank 13: phi4
Rank 14: deepseek-chat
Rank 15: deepseek-r1
Rank 16: llama3.2
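
The parsing step above works when the judge returns clean JSON with every competitor listed once, which is not guaranteed. Here is a minimal sketch of a more defensive parser. `parse_ranking` is a hypothetical helper that tolerates a Markdown code fence around the JSON and checks that the ranking is a complete permutation.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the cell readable

def parse_ranking(raw, n_competitors):
    """Parse the judge's reply into a list of 1-based competitor numbers."""
    text = raw.strip()
    # Tolerate a reply wrapped in a code fence despite the instructions
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor should appear exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return ranks

print(parse_ranking('{"results": [2, 1, 3]}', 3))
print(parse_ranking(FENCE + 'json\n{"results": ["1", "2"]}\n' + FENCE, 2))
```

In the notebook this would be called as `parse_ranking(results, len(models))` in place of the bare `json.loads`.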


In [34]:
# To be serious! GPT-4.1-mini with a proper question

prompts = [
    {"role": "system", "content": "You are a knowledgeable assistant, and you respond in markdown"},
    {"role": "user", "content": "How do I choose the right LLM for a task? Please respond in markdown."}
  ]

In [35]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4.1-mini',
    messages=prompts,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)


# How to Choose the Right Large Language Model (LLM) for a Task

Selecting the appropriate Large Language Model (LLM) depends on several factors related to your specific use case, resources, and constraints. Here's a structured approach to help you choose the right LLM:

---

## 1. Define Your Task Requirements

- **Task Type:**  
  - **Text generation:** storytelling, content creation  
  - **Summarization:** documents, articles  
  - **Translation:** multilingual support  
  - **Question answering:** factual, conversational  
  - **Code generation:** programming assistance  
  - **Sentiment analysis:** opinion mining  

- **Accuracy Needs:** Do you require state-of-the-art accuracy or is approximate output acceptable?

- **Latency and Throughput:** Real-time vs batch processing.

---

## 2. Evaluate Model Capabilities

- **Model Size:**  
  - Larger models generally provide better performance but require more computational resources.

- **Training Data:**  
  - Domain specificity — Does the model have knowledge relevant to your domain (e.g. medical, legal)?

- **Multilingual Support:**  
  - Does the task require multiple languages?

- **Fine-tuning and Customization:**  
  - Can the model be fine-tuned or adapted to your specific data?

---

## 3. Consider Available Resources

- **Compute Power:**  
  - Do you have access to GPUs/TPUs? Can you afford cloud costs?

- **Budget Constraints:**  
  - Larger models and access through APIs can be expensive.

- **Deployment Environment:**  
  - On-device (edge) or cloud-based deployment?

---

## 4. Review Existing Popular LLMs

| Model            | Strengths                           | Considerations                      |
|------------------|-----------------------------------|-----------------------------------|
| GPT-4            | Strong general-purpose, creativity| High cost, API access only        |
| GPT-3.5          | Good performance, cheaper than GPT-4 | Some limitations in complex tasks |
| Claude (Anthropic)| Safety-focused, helpful dialogs   | Limited availability              |
| LLaMA (Meta)     | Open weights, efficient            | Requires fine-tuning              |
| Bloom            | Multilingual, open access          | May require more engineering effort |
| PaLM (Google)    | Strong language understanding     | Limited access                    |

---

## 5. Experiment and Benchmark

- Run small-scale tests with a few candidate models.

- Use benchmarks relevant to your task (e.g., GLUE for NLP tasks, CodeXGLUE for code).

- Measure metrics: accuracy, latency, cost.

---

## 6. Additional Considerations

- **Ethical and Safety Concerns:** Ensure the model aligns with your values and policies.

- **Community and Support:** Active community and good documentation reduce development time.

- **Updates and Maintenance:** Models updated frequently may offer better results over time.

---

# Summary Checklist

| Step                         | Action                       |
|------------------------------|------------------------------|
| 1. Define task requirements   | Specify type, accuracy, latency |
| 2. Understand model capabilities| Size, domain, language support  |
| 3. Assess resources           | Compute, budget, environment   |
| 4. Shortlist models           | Based on features and access   |
| 5. Test and benchmark         | Run experiments, evaluate metrics|
| 6. Make final decision        | Balance performance, cost, constraints|

---

Feel free to ask if you'd like recommendations for a specific task or use case!


## Abstractions versus Routers

We'll look at LiteLLM and OpenRouter to understand how they differ and the pros and cons of each.

Side note: LiteLLM can be used as an abstraction or as a router (the LiteLLM Proxy Server), but we will use the abstraction functionality here.

In [36]:
from litellm import completion

messages = [{"role": "user", "content": "Please tell me a joke for a Bootcamp for AI Engineers"}]

In [37]:
response = completion(model="openai/gpt-4.1-mini", messages=messages)
print(response.choices[0].message.content)

Sure! Here’s a joke for a Bootcamp for AI Engineers:

Why did the neural network break up with the dataset?

Because it found too many *issues* in the relationship!



In [38]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 37
Total tokens: 56
Total cost: 0.0067 cents
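
The reported cost can be reproduced by hand as tokens multiplied by price. Here is a minimal sketch, assuming prices of $0.40 per million input tokens and $1.60 per million output tokens for gpt-4.1-mini. These prices are assumptions, not something the notebook states, so check the provider's current pricing page. `cost_in_cents` is a hypothetical helper, not part of LiteLLM.

```python
def cost_in_cents(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the cost of one call in cents, given USD prices per million tokens."""
    dollars = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return dollars * 100

# Token counts come from the gpt-4.1-mini call above; the prices are assumed
print(f"Estimated cost: {cost_in_cents(19, 37, 0.40, 1.60):.4f} cents")
```

With those assumed prices the estimate rounds to 0.0067 cents, which agrees with the figure LiteLLM reported.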


In [39]:
response = completion(model="openai/gpt-4.5-preview", messages=messages)
print(response.choices[0].message.content)

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Why did the neural network refuse to date the algorithm?

Because it didn't want any weights holding it back!
Input tokens: 19
Output tokens: 21
Total tokens: 40
Total cost: 0.4575 cents



In [40]:
response = completion(model="claude-sonnet-4-20250514", messages=messages)
print(response.choices[0].message.content)

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Here's one for your AI engineering bootcamp:

**Why did the neural network break up with the decision tree?**

Because it said, "You're too shallow for me. I need someone with more layers who can handle my complex, non-linear relationships!"

---

*Bonus dad joke:*
**What do you call an AI that can't make up its mind?**
A maybe-learning algorithm! 🤖

Hope that gets some laughs (or at least some good-natured groans) from your cohort!
Input tokens: 20
Output tokens: 120
Total tokens: 140
Total cost: 0.1860 cents


In [41]:
response = completion(model="claude-opus-4-20250514", messages=messages)
print(response.choices[0].message.content)

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Here's one for your AI Engineering Bootcamp:

Why did the neural network go to therapy?

Because it had too many deep issues and couldn't stop overfitting into its past experiences! The therapist suggested it needed better regularization and maybe some dropout to help it generalize better in life.

🤖🧠

(Bonus: The therapist's diagnosis? "Classic case of vanishing gradients - you're losing all meaning by the time you reach your deeper layers!")
Input tokens: 20
Output tokens: 107
Total tokens: 127
Total cost: 0.8325 cents


### Now try these models and their costs!

openai/gpt-4.5-preview  
claude-sonnet-4-20250514  
claude-opus-4-20250514  


## OpenRouter

In [51]:
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")

openrouter = OpenAI(base_url=OPENROUTER_BASE_URL, api_key=openrouter_api_key)

response = openrouter.chat.completions.create(model="openai/gpt-4.1-mini", messages=messages)
print(response.choices[0].message.content)


AuthenticationError: Error code: 401 - {'error': {'message': 'No auth credentials found', 'code': 401}}
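
A 401 like this one often means no usable key reached the service. One plausible cause (an assumption, since the notebook does not show the environment) is that `OPENROUTER_API_KEY` was not set when the cell ran. The sketch below fails early with a clearer message. `require_env` is a hypothetical helper, and `EXAMPLE_API_KEY` is a placeholder used only for the demonstration.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value

# Placeholder variable and value, for demonstration only
os.environ["EXAMPLE_API_KEY"] = "sk-example"
print(require_env("EXAMPLE_API_KEY"))
```

In the cell above, you could write `openrouter_api_key = require_env("OPENROUTER_API_KEY")` so that a missing key is reported before any request is sent.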

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
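
As a minimal sketch of how such a history grows, the snippet below appends each exchange to a plain Python list. `add_turn` is a hypothetical helper introduced for illustration. No API call is made here; in practice the whole list would be passed as `messages` on each call.

```python
def add_turn(history, user_prompt, assistant_reply):
    """Append one user/assistant exchange to a message list and return the list."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

# Start with the system message, record one completed exchange, then add the new prompt
history = [{"role": "system", "content": "system message here"}]
add_turn(history, "first user prompt here", "the assistant's response")
history.append({"role": "user", "content": "the new user prompt"})

for message in history:
    print(f"{message['role']}: {message['content']}")
```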

In [39]:
# Let's make a conversation between gpt-4.1-nano and Claude 3.5 Haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-nano"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [40]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [None]:
print(call_gpt())

In [42]:
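def call_claude():
    # Start with Claude's system prompt so its polite persona is actually applied
    messages = [{"role": "system", "content": claude_system}]
    # From Claude's point of view, GPT's turns are "user" and its own turns are "assistant"
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    # GPT has spoken once more than Claude, so add its latest message as the new prompt
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content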

In [None]:
call_claude()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Claude:\n{claude_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    print(f"Claude:\n{claude_next}\n")
    claude_messages.append(claude_next)

# Takeaways

This was an entertaining exercise!

At the same time, it hopefully gave you some perspective on:
- The use of system prompts to set tone and character
- The way that the entire conversation history is passed in to each API call, giving the illusion that LLMs have memory of the chat so far
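
To make the second point concrete, here is a small sketch with no API calls that rebuilds the message list one bot would receive at each turn. `build_messages` and the sample turns are hypothetical, introduced only for illustration. The list is rebuilt from scratch every time and grows by two messages per exchange, which is where the apparent memory comes from.

```python
def build_messages(system_prompt, own_turns, other_turns):
    """Build the message list one bot sees: its own turns are 'assistant', the other bot's are 'user'."""
    messages = [{"role": "system", "content": system_prompt}]
    for own, other in zip(own_turns, other_turns):
        messages.append({"role": "assistant", "content": own})
        messages.append({"role": "user", "content": other})
    return messages

# Made-up turns for two bots, A and B
bot_a_turns = ["Hi there", "Why should I say hello?", "That proves nothing."]
bot_b_turns = ["Hi", "It's a friendly way to start.", "Fair enough, you may be right."]

# Each call would resend everything said so far, so the list keeps growing
for n in range(1, 4):
    view = build_messages("You are argumentative.", bot_a_turns[:n], bot_b_turns[:n])
    print(f"Turn {n}: {len(view)} messages sent")
```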

# Exercises

Try different characters; try swapping Claude with Gemini

In [None]:
# And just to show you how easy it is: let's generate an image

from IPython.display import Image, display
import base64

response = openai.images.generate(
  model="dall-e-3",
  prompt="A photorealistic 3d image that illustrates someone choosing between a huge number of Large Language Models",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

# Extract the image data and display it
image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))

In [None]:
response = openai.images.generate(
  model="dall-e-3",
  prompt="An image that illustrates someone choosing between a huge number of Large Language Models in a vibrant pop-art style, like a Lichtenstein style, with dazzling lines and colors",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

# Extract the image data and display it
image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))