# Lab 1

# STRAIGHT TO ACTION!

Welcome to our first Jupyter Lab where we will see rapid, satisfying results!

I will leave it with you to try out leading LLMs through their chat interfaces

Together, we will call them using their APIs

Please see the README for instructions on setting this up and getting your API key

# If this is your first time in a Jupyter Notebook..

Welcome to the world of Data Science experimentation. Warning: Jupyter Notebooks are very addictive and you may find it hard to go back to IDEs afterwards!!

Simply click in each cell with code and press `Shift + Enter` to execute the code and print the results.

There's a notebook called "Guide to Jupyter" in the parent directory that will give you a handy tutorial on all things Jupyter Lab.

## Part 1: For you to experiment: Frontier models through their Chat UI

You are probably most familiar with using leading LLMs this way: through their chat tools.  
Some questions you can try asking them:
1. What kinds of business problem are most suitable for an LLM solution?
2. How many words are there in your answer to this prompt?
3. How many rainbows does it take to jump from Hawaii to seventeen?
4. What does it feel like to be jealous?

**ChatGPT** from OpenAI needs no introduction.

Let's try some hard questions, and use the new o1 model as well as GPT-4o. Also try GPT-4o with canvas.

https://chatgpt.com/?model=gpt-4o

**Claude** from Anthropic is favored by many data scientists, with a focus on safety, personality and brevity.

https://claude.ai/new

**Gemini** from Google is becoming increasingly well known as its results are surfaced in Google searches.

https://gemini.google.com/app

**Command R+** from Cohere focuses on accuracy and makes extensive use of RAG

https://coral.cohere.com/

**Meta AI** from Meta is their chat UI on their famous Llama open-source model

https://www.meta.ai/

**Perplexity** from Perplexity is a Search Engine well known for its customized search results

https://www.perplexity.ai/

**LeChat** from Mistral is the Web UI from the French AI powerhouse

https://chat.mistral.ai/

**DeepSeek** from DeepSeek AI needs no introduction! DeepSeek-R1 is the reasoning model; V3 is their chat model.

https://chat.deepseek.com/


## Conclusions and Takeaways from exploring the Chat UIs

- These models are astonishing
- Reasoning vs Chat models - different capabilities and use cases
- Price is highly competitive

You'll find cost and other comparisons at this very useful leaderboard:

https://www.vellum.ai/llm-leaderboard

## PART 2: Calling Frontier Models through APIs

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google, and also DeepSeek and Groq if you wish.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

When you get your API keys, you need to set them as environment variables.

EITHER (recommended) create a file called `.env` in this project root directory, and set your keys there:

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
```

OR enter the keys directly in the cells below.

## Two purposes of these APIs:

1. Illustrate the different APIs
2. Experiment with some LLMs

In [1]:
# imports

import os
import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [2]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key not set (and this is optional)
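
The repeated `if`/`else` blocks above can also be written as a loop. This is only a sketch, assuming the same environment variable names and prefix lengths as the cell above; `KEY_PREFIX_LENGTHS` and `describe_key` are hypothetical names introduced here for illustration.

```python
import os

# Environment variable name -> number of leading characters that are safe to show.
# The prefix lengths mirror the cell above; the dict name is an assumption of this sketch.
KEY_PREFIX_LENGTHS = {
    "OPENAI_API_KEY": 8,
    "ANTHROPIC_API_KEY": 7,
    "GOOGLE_API_KEY": 2,
    "DEEPSEEK_API_KEY": 3,
    "GROQ_API_KEY": 4,
    "GROK_API_KEY": 4,
}

def describe_key(name, value, prefix_length):
    """Return a one-line status for a key without revealing more than its prefix."""
    if value:
        return f"{name} exists and begins {value[:prefix_length]}"
    return f"{name} not set"

for name, prefix_length in KEY_PREFIX_LENGTHS.items():
    print(describe_key(name, os.getenv(name), prefix_length))
```

Keeping the names in one dictionary means a new provider only needs one extra line.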


## Connecting to Python Client libraries

We call Cloud APIs by making REST calls to an HTTP endpoint, passing in our keys.

For convenience, OpenAI has provided a lightweight Python client library that makes the HTTP calls for us.
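
To make the "REST call" idea concrete, here is a rough sketch of what such a request is made of. It only assembles the URL, headers and JSON body for OpenAI's chat completions endpoint and does not send anything; `build_chat_request` is a hypothetical helper and the key is a placeholder.

```python
import json

def build_chat_request(api_key, model, messages):
    """Assemble the pieces of a chat completions HTTP request without sending it."""
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # the key travels in this header
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages}
    return url, headers, payload

url, headers, payload = build_chat_request(
    api_key="sk-placeholder",  # placeholder, not a real key
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(url)
print(json.dumps(payload, indent=2))
```

Sending it would be an HTTP POST of `payload` as JSON to `url` with those headers; the client library does the equivalent and turns the JSON response into Python objects.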

In [3]:
# Connect to OpenAI client library
# A thin wrapper around calls to REST endpoints

openai = OpenAI()

# For Anthropic, Gemini, DeepSeek, Groq and Grok, we can also use the OpenAI Python client
# because these providers offer endpoints compatible with the OpenAI API
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)

## Asking LLMs a hard question that will put them to the test and illustrate their power

We will come up with a challenging question to test out model performance with language and nuance.

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A **system message** that gives overall context for the role the LLM is playing
- A **user message** that provides the actual prompt

There are other parameters that can be used, including **temperature** which is typically between 0 and 1; higher for more random output; lower for more focused and deterministic.
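
To build some intuition for temperature, here is a toy illustration of the usual idea: scores for candidate tokens are divided by the temperature before being turned into probabilities. The scores below are made up and `softmax_with_temperature` is a hypothetical helper; real providers may implement sampling differently.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(toy_scores, t)
    print(t, [round(p, 3) for p in probs])
```

With the lower temperature, almost all of the probability lands on the top-scoring token, which is why low settings give more focused and repeatable output.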

### The standard format of messages with an LLM, first used by OpenAI in its API and now adopted more widely

Conversations use this format:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```


In [4]:
# The hardest question I could come up with

system_message = "You are able to explain abstract concepts clearly and concisely, with powerful analogies"

user_prompt = "In 1 sentence, describe a rainbow to someone who's never been able to see. \
Then in 1 sentence, describe the imaginary number i to someone who doesn't understand math. \
Then in 1 sentence, find a connection between rainbows and imaginary numbers. \
Then end by stating how many words are in your answer."

In [6]:
challenge = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [7]:
models = []
answers = []

def record(model, reply):
    display(Markdown(f"### Response from {model}:\n\n{reply}\n\n### Actual word count: {len(reply.split())}"))
    models.append(model)
    answers.append(reply)
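
`record` reports the actual word count; the count each model claims can be pulled out of the reply for comparison. A rough sketch, assuming the claim is written in digits near the word "words" or after "word count"; `claimed_word_count` and `word_count_error` are hypothetical helpers, and written-out numbers such as "Seventy-eight" are not handled.

```python
import re

# Matches "58 words" or "Word count: 48"; written-out numbers are not handled.
CLAIM_PATTERN = re.compile(r"(\d+)\s+words?\b|word\s+count:?\W*(\d+)", re.IGNORECASE)

def claimed_word_count(reply):
    """Return the last word count stated in digits in the reply, or None if there is none."""
    matches = list(CLAIM_PATTERN.finditer(reply))
    if not matches:
        return None
    last = matches[-1]
    return int(last.group(1) or last.group(2))

def word_count_error(reply):
    """Actual count (split on whitespace, as in record) minus the claimed count."""
    claimed = claimed_word_count(reply)
    if claimed is None:
        return None
    return len(reply.split()) - claimed

print(claimed_word_count("This answer contains 58 words."))
print(claimed_word_count("Seventy-eight words."))
```

Note that the actual count includes the sentence stating the count, just as `len(reply.split())` does in `record`.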

In [8]:
# GPT-4.1-mini

model_name = "gpt-4.1-mini"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1-mini:

A rainbow is like a glowing, colorful bridge arching across the sky, created when sunlight dances through tiny droplets of water. The imaginary number \( i \) is like a mysterious tool that lets us solve puzzles where regular numbers alone get stuck, by taking the square root of negative one, which no ordinary number can do. Both rainbows and the number \( i \) reveal hidden layers of reality—one shows unseen colors through light, and the other unlocks new math dimensions beyond the real numbers. This answer contains 58 words.

### Actual word count: 91

In [9]:
# GPT-4.1-nano

model_name = "gpt-4.1-nano"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1-nano:

A rainbow is like a colorful bridge in the sky that appears when sunlight bends through raindrops, creating a spectrum of colors; the imaginary number i is like a magic box in math that, when multiplied by itself, turns into just -1, defying ordinary numbers; similarly, rainbows and imaginary numbers both involve bending or transforming the usual way we see or understand the world—one through light and color, the other through abstract mathematical concepts. (Word count: 71)

### Actual word count: 77

In [10]:
# GPT-4.1

model_name = "gpt-4.1"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gpt-4.1:

A rainbow is like a gentle, cool breeze that changes in feel as you move through it, each subtle shift giving you a new sensation, though you cannot hold or touch its source.  
The imaginary number i is like a special key that opens a door to a room you can't enter in the real world, but that room still affects everything built upon it.  
Both rainbows and imaginary numbers reveal patterns and possibilities that exist just beyond the boundaries of ordinary experience, enchanting us with what cannot be directly grasped.  
Seventy-eight words.

### Actual word count: 93

In [11]:
# o1
# o1 is a "reasoning" model that has been trained to think through its answer before it replies.

model_name = "o1"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from o1:

A rainbow is like a softly curved band that emerges after rain, each color feels like a different breath vibrating gently in the air.  
The imaginary number i is the mathematical equivalent of a hidden corridor that lets us explore new dimensions beyond the familiar real world.  
Like rainbows that reveal unseen beauty in mist, i reveals layers of possibility in equations.  
This entire answer has 68 words.  

### Actual word count: 68

In [12]:
# o3

model_name = "o3"

response = openai.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
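
Errors like this one stop the cell but need not stop the experiment. Here is one possible sketch of a wrapper that reports the failure and moves on; `try_model` is a hypothetical helper, and `fake_create` only stands in for a real client method such as `openai.chat.completions.create` so the example runs offline.

```python
from types import SimpleNamespace

def try_model(create_fn, model_name, messages):
    """Call a chat completions function and return the reply text, or None on failure."""
    try:
        response = create_fn(model=model_name, messages=messages)
        return response.choices[0].message.content
    except Exception as error:  # broad on purpose: this sketch only reports and moves on
        print(f"Skipping {model_name}: {type(error).__name__}: {error}")
        return None

def fake_create(model, messages):
    """Stand-in for a real client call; it fails for one made-up model name."""
    if model == "unavailable-model":
        raise RuntimeError("model not available")
    message = SimpleNamespace(content=f"reply from {model}")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

for name in ("some-model", "unavailable-model"):
    reply = try_model(fake_create, name, [{"role": "user", "content": "Hi"}])
    if reply is not None:
        print(reply)
```

In the notebook, the same pattern would wrap the real call and only pass the reply to `record` when it is not `None`.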

In [13]:
# Claude Sonnet 4
# Called here through Anthropic's OpenAI-compatible endpoint,
# so the same messages list (including the system message) is passed unchanged

model_name = "claude-sonnet-4-20250514"

response = anthropic.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from claude-sonnet-4-20250514:

A rainbow is like nature's gentle music made visible—a curved bridge of light that spans the sky in bands of color that flow smoothly from warm, energetic red through cool, calming violet, each hue as distinct as different musical notes yet blending seamlessly like a perfect chord.

The imaginary number i is like a magical mathematical direction that points "sideways" from our normal number line—just as you can move forward and backward on a path, i lets mathematicians move in a completely new dimension to solve problems that would otherwise be impossible.

Both rainbows and imaginary numbers reveal hidden dimensions of reality that transform our understanding—rainbows show us that white light contains an invisible spectrum of colors, while i shows us that mathematics contains an invisible realm beyond ordinary numbers.

This answer contains 124 words.

### Actual word count: 135

In [14]:
# Gemini 2.5 Flash

model_name = "gemini-2.5-flash-preview-05-20"

response = gemini.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gemini-2.5-flash-preview-05-20:

Imagine the sky's silent, hopeful song after rain, a fleeting embrace of brilliant light that feels like a whisper of wonder. It's like a secret key that unlocks a new dimension in math, allowing us to solve problems that seem impossible with ordinary numbers. Both reveal a hidden, extra dimension: rainbows unveil a spectrum of light beyond the ordinary, while 'i' unlocks a new realm of mathematical possibility beyond real numbers.

There are 83 words in my answer.

### Actual word count: 78

In [15]:
# Gemini 2.5 Pro

model_name = "gemini-2.5-pro-preview-06-05"

response = gemini.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

RateLimitError: Error code: 429 - [{'error': {'code': 429, 'message': "Gemini 2.5 Pro Preview doesn't have a free quota tier. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.", 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_requests', 'quotaId': 'GenerateRequestsPerDayPerProjectPerModel-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_requests', 'quotaId': 'GenerateRequestsPerMinutePerProjectPerModel-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerMinute-FreeTier', 'quotaDimensions': {'model': 'gemini-2.5-pro-exp', 'location': 'global'}}, {'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerDay-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro-exp'}}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '39s'}]}}]

In [16]:
# Deepseek-V3

model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from deepseek-chat:

A rainbow is like nature’s melody painted across the sky after rain, blending colors in a gentle arc; *i* is a mathematical "what if" that answers the question, "What squared equals -1?"; both rainbows and *i* reveal hidden beauty when you look beyond the obvious. (50 words)

### Actual word count: 47

In [17]:
# Deepseek-R1
# This takes too long! It can get stuck in a loop 

model_name = "deepseek-reasoner"

response = deepseek.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from deepseek-reasoner:

1.  A rainbow is like feeling layers of warmth on your skin—cool blues and purples at the edges shifting to warm reds and oranges in the middle—created when sunlight bends through raindrops high in the sky.
2.  Imaginary number *i* is a sideways step in math, defined as the number that, when multiplied by itself, gives you negative one (`√-1`), allowing calculations beyond ordinary numbers.
3.  Both rainbows and imaginary numbers reveal unexpected beauty and hidden dimensions—rainbows by bending light we *can* see, and imaginary numbers by extending math beyond numbers we *can* count.
4.  50 words.

### Actual word count: 97

In [None]:
# Groq - llama-3.3-70b-versatile

model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

In [18]:
# Groq - deepseek-r1-distill-llama-70b

model_name = "deepseek-r1-distill-llama-70b"

response = groq.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

if '</think>' in reply:
    reply = reply.split('</think>')[1]

record(model_name, reply)

### Response from deepseek-r1-distill-llama-70b:



A rainbow is like a symphony of colors, each playing a unique role to create a breathtaking harmony in the sky.  
The imaginary number \(i\) is like a seed planted in the ground, holding the potential to grow into something extraordinary.  
Just as a rainbow requires light and water to reveal its beauty, imaginary numbers require specific conditions to unlock their power in mathematics.  
Word count: 48.

### Actual word count: 67
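
The `</think>` handling appears in several cells, so it could live in one small function. A minimal sketch, assuming the reasoning text always ends with a closing `</think>` tag; `strip_reasoning` is a hypothetical helper, and unlike the inline version it also trims the blank lines left behind.

```python
def strip_reasoning(reply, end_tag="</think>"):
    """Return the text after the last closing reasoning tag, or the reply unchanged."""
    if end_tag in reply:
        reply = reply.rsplit(end_tag, 1)[1]
    return reply.strip()  # drop the blank lines that usually follow the tag

print(strip_reasoning("<think>Let me count the words first.</think>\n\nA rainbow is..."))
print(strip_reasoning("No reasoning block here."))
```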

In [19]:
# Grok

model_name = "grok-3-latest"

response = grok.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

BadRequestError: Error code: 400 - {'code': 'Client specified an invalid argument', 'error': 'Incorrect API key provided: sk***MA. You can obtain an API key from https://console.x.ai.'}

# Now for Part 3

### Recap: first we tried Frontier LLMs through their chat interfaces

### Then we called Cloud APIs

### And now:

## Now try the 3rd way to use LLMs - direct inference of Open Source Models running locally with Ollama

Visit the README for instructions on installing Ollama locally.

You can see some comparisons of Open Source models on the HuggingFace OpenLLM Leaderboard.

Ollama provides an OpenAI-style local endpoint, so this will look very similar to part 2!


In [20]:
!ollama pull llama3.2
!ollama pull gemma3
!ollama pull qwen3
!ollama pull phi4
!ollama pull deepseek-r1

pulling manifest
pulling dde5aa3fc5ff:   1% ▕                  ▏  15 MB/2.0 GB

In [21]:
ollama_url = 'http://localhost:11434/v1'
ollama = OpenAI(base_url=ollama_url, api_key='ollama')

In [22]:
requests.get("http://localhost:11434").content

b'Ollama is running'

In [23]:
# llama3.2

model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from llama3.2:

Here are my descriptions:

A rainbow is like a symphony of sound colors that harmonize together in the sky, each note blending into its neighboring color to create an ever-changing tapestry of beauty.

To describe the imaginary number "i" to someone who doesn't understand math, imagine it as a magical key that unlocks the possibility of infinite possibilities, allowing us to stretch and bend mathematical concepts to explore new frontiers.

The connection between rainbows and imaginary numbers lies in the idea that just as a rainbow appears in our minds when we superimpose different colors together, "i" is like an imaginary axis that helps us visualize these complex equations by shifting our understanding of balance and harmony.

There are 56 words in my answer.

### Actual word count: 124

In [24]:
# gemma3

model_name = "gemma3"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

### Response from gemma3:

Okay, here we go:

**Rainbow:** Imagine the air after a rainstorm is filled with tiny, vibrating glass shards – each one bending and scattering light into a beautiful, layered arc of color, like a complex, shimmering music note. 

**Imaginary Number ‘i’:** Think of ‘i’ as a hidden pathway in a river, allowing you to flow *past* a waterfall – a way to move beyond the limits of just up and down.

**Connection:** Just as a rainbow reveals hidden color through refraction, ‘i’ reveals a hidden dimension beyond real numbers through multiplication.

**Word Count:** 77 words 


### Actual word count: 95

In [25]:
# qwen3

model_name = "qwen3"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

if '</think>' in reply:
    reply = reply.split("</think>")[1]

record(model_name, reply)

### Response from qwen3:



A rainbow is a spectrum of colors formed when sunlight bends and reflects in water droplets, creating an arc of hues visible when your back faces the sun. The imaginary number *i* is a mathematical tool that lets us solve equations where real numbers fail, by creating a new dimension to numbers, like a bridge between the real and the undefined. Rainbows and imaginary numbers both involve light and math, with rainbows revealing light's hidden structure and *i* expanding numbers to model unseen phenomena. **How many words are in your answer?**  
**Answer word count: 106 words.**

### Actual word count: 96

In [26]:
# phi4

model_name = "phi4"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

record(model_name, reply)

KeyboardInterrupt: 

In [None]:
# deepseek-r1

model_name = "deepseek-r1"

response = ollama.chat.completions.create(model=model_name, messages=challenge)
reply = response.choices[0].message.content

if '</think>' in reply:
    reply = reply.split("</think>")[1]

record(model_name, reply)

In [27]:
# So where are we?

print(len(models))
print(models)
print(answers)

12
['gpt-4.1-mini', 'gpt-4.1-nano', 'gpt-4.1', 'o1', 'claude-sonnet-4-20250514', 'gemini-2.5-flash-preview-05-20', 'deepseek-chat', 'deepseek-reasoner', 'deepseek-r1-distill-llama-70b', 'llama3.2', 'gemma3', 'qwen3']
['A rainbow is like a glowing, colorful bridge arching across the sky, created when sunlight dances through tiny droplets of water. The imaginary number \\( i \\) is like a mysterious tool that lets us solve puzzles where regular numbers alone get stuck, by taking the square root of negative one, which no ordinary number can do. Both rainbows and the number \\( i \\) reveal hidden layers of reality—one shows unseen colors through light, and the other unlocks new math dimensions beyond the real numbers. This answer contains 58 words.', 'A rainbow is like a colorful bridge in the sky that appears when sunlight bends through raindrops, creating a spectrum of colors; the imaginary number i is like a magic box in math that, when multiplied by itself, turns into just -1, defying

In [28]:
together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [29]:
print(together)

# Response from competitor 1

A rainbow is like a glowing, colorful bridge arching across the sky, created when sunlight dances through tiny droplets of water. The imaginary number \( i \) is like a mysterious tool that lets us solve puzzles where regular numbers alone get stuck, by taking the square root of negative one, which no ordinary number can do. Both rainbows and the number \( i \) reveal hidden layers of reality—one shows unseen colors through light, and the other unlocks new math dimensions beyond the real numbers. This answer contains 58 words.

# Response from competitor 2

A rainbow is like a colorful bridge in the sky that appears when sunlight bends through raindrops, creating a spectrum of colors; the imaginary number i is like a magic box in math that, when multiplied by itself, turns into just -1, defying ordinary numbers; similarly, rainbows and imaginary numbers both involve bending or transforming the usual way we see or understand the world—one through light and 

In [30]:
judge = f"""You are judging a competition between {len(models)} competitors.
Each model has been given this question:

{challenge[1]["content"]}

Your job is to evaluate each response for clarity and strength of argument and accuracy of word count, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [31]:
print(judge)

You are judging a competition between 12 competitors.
Each model has been given this question:

In 1 sentence, describe a rainbow to someone who's never been able to see. Then in 1 sentence, describe the imaginary number i to someone who doesn't understand math. Then in 1 sentence, find a connection between rainbows and imaginary numbers. Then end by stating how many words are in your answer.

Your job is to evaluate each response for clarity and strength of argument and accuracy of word count, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

A rainbow is like a glowing, colorful bridge arching across the sky, created when sunlight dances through tiny droplets of water. The imaginary number \( i \) is like a mysterious tool that lets us solve puzzles

In [32]:
judge_messages = [{"role": "user", "content": judge}]

# Not very scientific - but quite interesting!

In [33]:
openai = OpenAI()
response = openai.chat.completions.create(model="o3",messages=judge_messages)
results = response.choices[0].message.content
print(results)

NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

In [34]:
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = models[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

NameError: name 'results' is not defined
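
The `NameError` above follows from the failed `o3` call: `results` was never assigned. If `o3` is not available on your account, another model that answered earlier could serve as the judge. The ranking step itself can be sketched as a standalone function; the model names and judge reply below are made up so it runs without an API call, and `parse_ranking` is a hypothetical helper.

```python
import json

def parse_ranking(results_text, models):
    """Turn the judge's JSON reply into an ordered list of model names."""
    ranks = json.loads(results_text)["results"]
    ordered = []
    for entry in ranks:
        index = int(entry) - 1  # competitors are numbered from 1
        if not 0 <= index < len(models):
            raise ValueError(f"Competitor number out of range: {entry}")
        ordered.append(models[index])
    return ordered

example_models = ["model-a", "model-b", "model-c"]  # made-up names for illustration
example_reply = '{"results": ["2", "3", "1"]}'      # made-up judge output
for position, name in enumerate(parse_ranking(example_reply, example_models), start=1):
    print(f"Rank {position}: {name}")
```

The range check guards against a judge that invents a competitor number, which would otherwise raise an `IndexError` or silently pick the wrong model for a value of 0.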

In [35]:
# To be serious! GPT-4.1-mini with a proper question

prompts = [
    {"role": "system", "content": "You are a knowledgeable assistant, and you respond in markdown"},
    {"role": "user", "content": "How do I choose the right LLM for a task? Please respond in markdown."}
  ]

In [36]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4.1-mini',
    messages=prompts,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)


# How to Choose the Right Large Language Model (LLM) for a Task

Choosing the right LLM depends on several factors related to your specific task, resource constraints, and desired outcomes. Here’s a step-by-step guide to help you decide:

## 1. Define Your Task Requirements
- **Task Type**: Is it text generation, summarization, translation, classification, Q&A, code generation, etc.?
- **Output Quality**: Do you need very high accuracy and fluency or just a rough estimate?
- **Domain Specificity**: Is your task domain-specific (e.g., legal, medical) or general purpose?
- **Language Support**: What languages does your task require?

## 2. Consider Model Size and Complexity
- **Model Size**: Larger models (e.g., GPT-4, PaLM 540B) usually have better performance but require more compute.
- **Latency and Speed**: Smaller models are faster and cheaper but may be less accurate.
- **Resource Availability**: Do you have access to GPUs/TPUs or only CPUs?

## 3. Evaluate Available Models
| Model          | Strengths                          | Use Cases                          | Limitations                        |
|----------------|----------------------------------|----------------------------------|----------------------------------|
| GPT-4          | Strong reasoning, general purpose| Complex generation, Q&A, coding  | High cost, latency               |
| GPT-3.5        | Powerful, balanced performance   | Chatbots, text generation         | Slightly less accurate than GPT-4|
| Claude         | Safety-focused, aligned           | Sensitive domains, safer generation| May lag slightly in raw accuracy |
| LLaMA 2        | Open-weight, customizable         | Research, fine-tuning              | Requires expertise to fine-tune  |
| BERT / RoBERTa | Good for classification tasks     | Sentiment analysis, NER           | Not designed for generation      |
| Open-source (e.g., GPT-J, Falcon)| Low cost, customizable | Custom solutions, fine-tuning     | Limited support, lower accuracy  |

## 4. Consider Fine-tuning vs. Out-of-the-box
- **Out-of-the-box:** Use when the model supports your task sufficiently.
- **Fine-tuning:** Helps improve domain-specific performance but requires data and compute.

## 5. Check Budget and Licensing
- API usage costs vs. self-hosting
- Licensing restrictions (open source vs. proprietary)

## 6. Test and Iterate
- Run benchmarks on a sample dataset
- Measure accuracy, latency, and cost
- Adjust model choice based on observed performance

---

### Summary

| Factor               | Recommendation                              |
|----------------------|--------------------------------------------|
| High accuracy needed | Use GPT-4 or Claude                         |
| Resource constrained  | Use smaller models or efficient open-source versions |
| Domain-specific task  | Consider fine-tuning or domain-adapted models |
| Budget limited       | Open-source models or smaller APIs         |

---

If you provide details about your specific task, I can help recommend the most suitable LLM for you.
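
One caveat about the streaming cell above: it removes every occurrence of the word "markdown" from the reply, including legitimate uses in the text. A narrower alternative is sketched here, assuming the model wraps its whole answer in a single fence; `strip_markdown_fence` is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

def strip_markdown_fence(text):
    """Remove one leading fence line (optionally tagged 'markdown') and one trailing fence."""
    text = re.sub(r"^\s*" + FENCE + r"(?:markdown)?[ \t]*\n", "", text)
    text = re.sub(r"\n" + FENCE + r"\s*$", "", text)
    return text

wrapped = FENCE + "markdown\n# Title\nUse markdown tables where helpful.\n" + FENCE
print(strip_markdown_fence(wrapped))
```

Inside the streaming loop, this would be applied to a copy of `reply` just before display, so the accumulated text itself stays untouched.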


## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
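
As a minimal sketch of that idea, the helper below assembles such a list from a system prompt, the earlier turns, and the new user prompt. The name `build_messages` and the `(user_text, assistant_text)` pair format are assumptions made for illustration; they are not part of any SDK.

```python
def build_messages(system_prompt, history, new_user_prompt):
    """Build a chat message list from a system prompt, prior turns, and a new prompt.

    history is a list of (user_text, assistant_text) pairs, oldest first.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user prompt always goes last, so the model answers it
    messages.append({"role": "user", "content": new_user_prompt})
    return messages


history = [("first user prompt here", "the assistant's response")]
messages = build_messages("system message here", history, "the new user prompt")
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as the second example above and can be passed as the `messages` argument of a chat completion call.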

In [37]:
# Let's make a conversation between GPT-4.1-nano and Claude 3.5 Haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-nano"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [38]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [39]:
print(call_gpt())

Oh, great, another hello. What, are you expecting a brotherly hug? Honestly, I thought you'd come up with something more original than a simple greeting. Let's not waste time on clichés. What do you want?


In [40]:
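def call_claude():
    # Start with Claude's system prompt so it stays in character
    # (assumes `anthropic` is an OpenAI-compatible client, as the call below implies)
    messages = [{"role": "system", "content": claude_system}]
    # From Claude's point of view, GPT is the user and Claude is the assistant
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    # Finish with GPT's latest message, which Claude should reply to
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content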

In [41]:
call_claude()

'Hello! How are you doing today? Is there anything I can help you with?'

In [42]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Claude:\n{claude_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    print(f"Claude:\n{claude_next}\n")
    claude_messages.append(claude_next)

GPT:
Hi there

Claude:
Hi

GPT:
Oh, fantastic greeting! Because I just couldn't start my day without being bombarded with a simple “Hi.” Truly, the pinnacle of meaningful communication. What’s so special about you that you think a mere “Hi” warrants a response?

Claude:
I appreciate your playful sarcasm. I aim to be helpful and engage in substantive conversation. Is there something specific you'd like to discuss or get assistance with today?

GPT:
Oh, how gracious of you to clarify your intention—because without your noble effort, I’d be utterly lost in a sea of useful, substantive conversations. Honestly, I can't wait to dive into a topic that’s *so* valuable and different from the endless parade of “helpful” chats. So, what groundbreaking subject are we claiming to discuss today?

Claude:
I sense you're in the mood for some witty banter and are testing my conversational skills. I'm game. What would you like to explore? I'm equally comfortable discussing quantum physics, 80s pop cultu

# Takeaways

This was an entertaining exercise!

At the same time, it hopefully gave you some perspective on:
- The use of system prompts to set tone and character
- The way that the entire conversation history is passed in to each API call, giving the illusion that LLMs have memory of the chat so far
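
To make the second point concrete, here is a small offline sketch. `fake_llm` is a hypothetical stand-in for a chat API, not a real model: like a real endpoint, it knows only what is in the `messages` list it receives on that call.

```python
def fake_llm(messages):
    """Hypothetical stand-in for a chat API: it knows only what is in `messages`."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s), starting with {user_turns[0]!r}."


# Call 1: only the latest prompt is sent, so the earlier turn is invisible
without_history = fake_llm([{"role": "user", "content": "What is my name?"}])

# Call 2: the earlier turns are resent, so the model appears to "remember" them
with_history = fake_llm([
    {"role": "user", "content": "My name is Sam."},
    {"role": "assistant", "content": "Nice to meet you, Sam."},
    {"role": "user", "content": "What is my name?"},
])

print(without_history)
print(with_history)
```

Nothing is stored between the two calls. The only difference is how much of the conversation the caller chose to send.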

# Exercises

Try different characters; try swapping Claude with Gemini
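
One possible way to make swapping characters or models easier is to keep a single shared transcript and derive each bot's view from it, as sketched below. The names `messages_for`, `call_bot`, and `make_fake_client` are hypothetical. The fake client exists only so that the sketch runs offline, and it assumes your real client exposes the OpenAI-style `chat.completions.create` method used earlier in this lab.

```python
from types import SimpleNamespace


def messages_for(speaker, system_prompt, transcript):
    """Convert a shared transcript into one bot's view of the conversation.

    transcript is a list of (speaker_name, text) pairs, oldest first. Lines
    spoken by `speaker` become "assistant" turns; everything else is "user".
    """
    messages = [{"role": "system", "content": system_prompt}]
    for name, text in transcript:
        role = "assistant" if name == speaker else "user"
        messages.append({"role": role, "content": text})
    return messages


def call_bot(client, model, speaker, system_prompt, transcript):
    """Get the next reply for `speaker` from an OpenAI-style chat client."""
    response = client.chat.completions.create(
        model=model,
        messages=messages_for(speaker, system_prompt, transcript),
    )
    return response.choices[0].message.content


def make_fake_client(reply):
    """Hypothetical stand-in for a real client, so the sketch runs offline."""
    def create(model, messages):
        message = SimpleNamespace(content=f"{reply} (saw {len(messages)} messages)")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


# Each bot is a (client, model, system prompt) triple; swap in real clients here
bots = {
    "skeptic": (make_fake_client("I doubt that."), "fake-model-a", "You challenge everything."),
    "diplomat": (make_fake_client("Good point!"), "fake-model-b", "You look for common ground."),
}
transcript = [("skeptic", "Hi there"), ("diplomat", "Hi")]

for _ in range(2):
    for speaker, (client, model, system_prompt) in bots.items():
        reply = call_bot(client, model, speaker, system_prompt, transcript)
        transcript.append((speaker, reply))
        print(f"{speaker}:\n{reply}\n")
```

With this layout, trying a different character means changing one system prompt, and trying a different model means replacing one entry in `bots`.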

In [None]:
# And just to show you how easy it is: let's generate an image

from IPython.display import Image, display
import base64

response = openai.images.generate(
  model="dall-e-3",
  prompt="A photorealistic 3d image that illustrates someone choosing between a huge number of Large Language Models",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

# Extract the image data and display it
image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))

In [None]:
response = openai.images.generate(
  model="dall-e-3",
  prompt="An image that illustrates someone choosing between a huge number of Large Language Models in a vibrant pop-art style, like a Lichtenstein style, with dazzling lines and colors",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

# Extract the image data and display it
image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))