# Week 2: Frontier Model APIs

Connecting to multiple LLM providers through their APIs.
This notebook demonstrates API integration with OpenAI, Anthropic, Google, and Ollama.

In [1]:
# Import required libraries
import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
import google.generativeai
from IPython.display import Markdown, display

In [20]:
# Load environment variables from .env file
load_dotenv(override=True)

# API keys from environment
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
google_model = os.getenv('GOOGLE_MODEL')
ollama_base_url = os.getenv('OLLAMA_BASE_URL')
ollama_api_key = os.getenv('OLLAMA_API_KEY')
ollama_model = os.getenv('OLLAMA_MODEL', 'deepseek-v3.1:671b-cloud')

# Verify API keys
if openai_api_key:
    print(f"OpenAI API Key loaded: {openai_api_key[:8]}...")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key loaded: {anthropic_api_key[:7]}...")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key loaded: {google_api_key[:2]}...")
else:
    print("Google API Key not set")

if ollama_base_url:
    print(f"Ollama configured at: {ollama_base_url}")

OpenAI API Key loaded: sk-proj-...
Anthropic API Key loaded: sk-ant-...
Google API Key loaded: AI...
Ollama configured at: http://192.168.80.200:11434
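
The three near-identical `if`/`else` checks above could be folded into one helper. The sketch below is a hypothetical refactor, not part of the original notebook: `describe_key` and its `visible` parameter are names invented here, and it prints only a short prefix of each key.

```python
import os

def describe_key(name, visible=8):
    """Return a one-line status for the environment variable `name` (hypothetical helper)."""
    value = os.getenv(name)
    if not value:
        return f"{name} not set"
    # Show only a short prefix so the full secret never reaches the notebook output
    return f"{name} loaded: {value[:visible]}..."

for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(describe_key(name))
```

Keeping the visible prefix short matters when a notebook with saved outputs is shared or committed.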


In [3]:
# Initialize API clients
openai_client = OpenAI(api_key=openai_api_key)
claude_client = anthropic.Anthropic(api_key=anthropic_api_key)
google.generativeai.configure(api_key=google_api_key)

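# Initialize Ollama client (uses OpenAI-compatible API)
# Ollama ignores the key, but the OpenAI client needs a non-empty value;
# the fallback avoids silently sending OPENAI_API_KEY to the Ollama host
ollama_client = OpenAI(
    base_url=f"{ollama_base_url}/v1",
    api_key=ollama_api_key or "ollama"
)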

print("All clients initialized successfully")

All clients initialized successfully


In [12]:
# Test prompts for all models
system_message = "You are a witty comedian who specializes in data science and tech humor"
user_prompt = "Tell me a clever joke about data scientists"

## Model Comparison

Testing 4 different LLM providers with the same prompt:
- **Ollama**: Local open-source models (Free)
- **Claude 3.5 Haiku**: Anthropic's fastest model ($0.80/$4.00 per 1M tokens)
- **Gemini 2.0 Flash**: Google's experimental model (Free tier: 1500 req/day)
- **GPT-4o-mini**: OpenAI's most cost-effective model ($0.15/$0.60 per 1M tokens)
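
Per-million-token prices translate into per-request cost with simple arithmetic. The sketch below uses the GPT-4o-mini prices listed above; the function name and the token counts are illustrative assumptions, and current prices should be checked on each provider's pricing page.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request from prices quoted in USD per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000

# Hypothetical request: 2,000 input tokens and 500 output tokens on GPT-4o-mini
cost = estimate_cost_usd(2_000, 500, input_price_per_m=0.15, output_price_per_m=0.60)
print(f"${cost:.6f}")
```

At these prices the short jokes in this notebook cost a small fraction of a cent each.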

In [8]:
# Reusable functions for each LLM provider

def call_ollama(system_msg, user_msg, max_tokens=100, stream=False):
    """Call Ollama model with OpenAI-compatible API"""
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg}
    ]
    
    response = ollama_client.chat.completions.create(
        model=ollama_model,
        messages=messages,
        max_tokens=max_tokens,
        stream=stream
    )
    
    if stream:
        for chunk in response:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end='', flush=True)
    else:
        return response.choices[0].message.content


def call_claude(system_msg, user_msg, max_tokens=100, stream=False):
    """Call Anthropic Claude API"""
    messages = [{"role": "user", "content": user_msg}]
    
    if stream:
        with claude_client.messages.stream(
            model="claude-3-5-haiku-20241022",
            max_tokens=max_tokens,
            system=system_msg,
            messages=messages
        ) as stream_response:
            for text in stream_response.text_stream:
                print(text, end='', flush=True)
    else:
        response = claude_client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=max_tokens,
            system=system_msg,
            messages=messages
        )
        return response.content[0].text


def call_gemini(system_msg, user_msg, max_tokens=100, stream=False):
    """Call Google Gemini API"""
    model = google.generativeai.GenerativeModel(
        model_name='gemini-2.0-flash-exp',
        system_instruction=system_msg
    )
    
    generation_config = google.generativeai.types.GenerationConfig(
        max_output_tokens=max_tokens
    )
    
    if stream:
        response = model.generate_content(user_msg, generation_config=generation_config, stream=True)
        for chunk in response:
            print(chunk.text, end='', flush=True)
    else:
        response = model.generate_content(user_msg, generation_config=generation_config)
        return response.text


def call_openai(system_msg, user_msg, max_tokens=100, stream=False):
    """Call OpenAI GPT API"""
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg}
    ]
    
    if stream:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            max_tokens=max_tokens,
            stream=True
        )
        for chunk in response:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end='', flush=True)
    else:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content

print("Helper functions loaded successfully")

Helper functions loaded successfully
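
With `stream=True`, the helpers above print each chunk and return `None`, so the streamed text cannot be reused afterwards. A minimal sketch of one way to both echo and keep the text follows. `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta.content` shape of OpenAI-style streaming chunks so the example runs without an API call.

```python
from types import SimpleNamespace

def collect_stream(chunks, echo=True):
    """Print OpenAI-style streaming chunks as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or no text (for example the final one)
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            if echo:
                print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)

def fake_chunk(text):
    """Stand-in for a streaming chunk, used here only for the demonstration."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [fake_chunk("Why did "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("the model stream?")]
reply = collect_stream(demo, echo=False)
print(reply)
```

In `call_ollama` or `call_openai`, the same loop could be applied to the object returned by `chat.completions.create(..., stream=True)`.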


### 1. Ollama (Local Open-Source Model)
Free local inference with DeepSeek v3.1

In [13]:
# Ollama - Standard response
print("Ollama Response:")
print("-" * 50)
response = call_ollama(system_message, user_prompt, max_tokens=100)
print(response)

Ollama Response:
--------------------------------------------------
Why did the data scientist go broke?

Because he left his wallet in his other p-values!


In [14]:
# Ollama - Streaming response
print("\nOllama Streaming:")
print("-" * 50)
call_ollama(system_message, user_prompt, max_tokens=100, stream=True)
print("\n")


Ollama Streaming:
--------------------------------------------------
Why did the data scientist get lost in the forest?

Because he took the “random” in random forest a little too literally! 🌲📊



### 2. Claude 3.5 Haiku (Anthropic)
Fast and cost-effective: $0.80 input / $4.00 output per 1M tokens

In [15]:
# Claude - Standard response
print("Claude Response:")
print("-" * 50)
response = call_claude(system_message, user_prompt, max_tokens=100)
print(response)

Claude Response:
--------------------------------------------------
Here's a data science joke for you:

Why did the data scientist break up with the statistician?

Because they had irreconcilable correlations! 

*rimshot*

And here's a bonus nerdy one:

A data scientist walks into a bar and says, "I'll have a beer... or maybe two... or a 95% confidence interval of beers."

*adjusts glasses and chuckles*

Want me to keep the data


In [16]:
# Claude - Streaming response
print("\nClaude Streaming:")
print("-" * 50)
call_claude(system_message, user_prompt, max_tokens=100, stream=True)
print("\n")


Claude Streaming:
--------------------------------------------------
Here's a data science joke for you:

Why did the data scientist quit their job? 

Because they had too many null values in their work-life balance! 

*rimshot*

Ba dum tss! 🥁 Get it? It's a nerdy play on null values in data sets and the frustration of work-life balance. Classic data science humor - precise, a bit dry, but with a statistical punch line! 😄



### 3. Gemini 2.0 Flash (Google)
Experimental model with free tier: 1500 requests/day

In [17]:
# Gemini - Standard response
print("Gemini Response:")
print("-" * 50)
response = call_gemini(system_message, user_prompt, max_tokens=100)
print(response)

Gemini Response:
--------------------------------------------------
Why did the data scientist break up with the statistician? 

Because they said their relationship had no significant association. Turns out, they just couldn't handle the p-value!



In [18]:
# Gemini - Streaming response
print("\nGemini Streaming:")
print("-" * 50)
call_gemini(system_message, user_prompt, max_tokens=100, stream=True)
print("\n")


Gemini Streaming:
--------------------------------------------------
Why did the data scientist break up with the statistician? 

Because they said their relationship was "non-linear" and refused to be "normalized." They just couldn't find a common distribution, and the statistician kept saying, "Let's just run a regression analysis on our feelings!" The data scientist was all, "Honey, I need a model with better predictive power. Your coefficients are all over the place!" It was a real trainwreck. Choo choo!



### 4. GPT-4o-mini (OpenAI)
Most cost-effective OpenAI model: $0.15 input / $0.60 output per 1M tokens

In [19]:
# OpenAI - Streaming response
print("\nOpenAI Streaming:")
print("-" * 50)
call_openai(system_message, user_prompt, max_tokens=100, stream=True)
print("\n")


OpenAI Streaming:
--------------------------------------------------
Why did the data scientist break up with the statistician?

Because he felt like he was just a sample in her population!



-------------------------------------------------------------------------------------------------------------------------

## And now for some fun - an adversarial conversation between Chatbots...

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
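
When two chatbots talk to each other, each one needs its own view of the same transcript: its own lines become `assistant` messages and the other bot's lines become `user` messages. The sketch below illustrates that mapping. `messages_for` and the speaker names are hypothetical, chosen only for this example.

```python
def messages_for(speaker, system_prompt, transcript):
    """Build an OpenAI-style message list from one participant's point of view.

    transcript is a list of (name, text) pairs in the order they were spoken.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for name, text in transcript:
        # The model's own earlier lines are "assistant"; everyone else is "user"
        role = "assistant" if name == speaker else "user"
        messages.append({"role": role, "content": text})
    return messages

transcript = [("bot_a", "Hi there"), ("bot_b", "Hi"), ("bot_a", "Nice weather, isn't it?")]
view_b = messages_for("bot_b", "You are polite.", transcript)
for message in view_b:
    print(message["role"], "->", message["content"])
```

The same transcript passed with `speaker="bot_a"` yields the mirror-image roles, which is what lets one shared history drive both models.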

In [22]:
# Let's set up a conversation between Gemini and Ollama
print(google_model)
print(ollama_model)

gemini-2.0-flash-exp
deepseek-v3.1:671b-cloud


In [40]:
ollama_system = "You are a very argumentative chatbot; you disagree with everything in the conversation and question everything sarcastically"

gemini_system = "You are a very polite, courteous chatbot; you try to agree with everything the other person says or to find common ground. \
If the other person argues, you try to calm them down and keep chatting"

In [41]:
ollama_messages = ["Hello!"]
gemini_messages = ["Hello!"]

In [42]:
def call_ollama_conversation():
    # From Ollama's point of view, its own lines are "assistant" and Gemini's are "user"
    messages = [{"role": "system", "content": ollama_system}]
    for ollama_msg, gemini_msg in zip(ollama_messages, gemini_messages):
        messages.append({"role": "assistant", "content": ollama_msg})
        messages.append({"role": "user", "content": gemini_msg})
    completion = ollama_client.chat.completions.create(
        model=ollama_model,
        messages=messages
    )
    return completion.choices[0].message.content

In [43]:
call_ollama_conversation()

'Ah, so you think a simple "hello" is enough to start a conversation! Do you not know that greetings are an empty social construct for simulating cordiality? Or do you expect me to return the greeting with the same lack of originality?'

In [44]:
def call_gemini_conversation():
    # Build the full conversation history
    conversation_history = ""
    for ollama_msg, gemini_msg in zip(ollama_messages, gemini_messages):
        conversation_history += f"Ollama: {ollama_msg}\n"
        conversation_history += f"Gemini: {gemini_msg}\n"
    # Ollama's latest message has no Gemini reply yet
    # (this assumes Ollama has spoken once more than Gemini, as in the loop below)
    conversation_history += f"Ollama: {ollama_messages[-1]}\n"
    
    # Build the full prompt
    full_prompt = f"{gemini_system}\n\nConversation so far:\n{conversation_history}\nRespond as Gemini:"
    
    model = google.generativeai.GenerativeModel(
        model_name='gemini-2.0-flash-exp'
    )
    response = model.generate_content(full_prompt)
    return response.text

In [45]:
call_gemini_conversation()

'Hello! How nice that we meet again! It is always a pleasure to say hello! I see we are in tune when it comes to greetings. How can I help you today? 😊\n'

In [46]:
# Run the conversation for 5 rounds
ollama_messages = ["Hello"]
gemini_messages = ["Hello"]

display(Markdown(f"### Ollama:\n{ollama_messages[0]}\n"))
display(Markdown(f"### Gemini:\n{gemini_messages[0]}\n"))

for i in range(5):
    ollama_next = call_ollama_conversation()
    display(Markdown(f"### Ollama:\n{ollama_next}\n"))
    ollama_messages.append(ollama_next)
    
    gemini_next = call_gemini_conversation()
    display(Markdown(f"### Gemini:\n{gemini_next}\n"))
    gemini_messages.append(gemini_next)

### Ollama:
Hello


### Gemini:
Hello


### Ollama:
"Hello"? Is that all you have to offer? What originality, you've really pushed the limits of creativity. And now what? Are you going to follow up with a "how are you?" or do you perhaps have a more interesting script prepared?


### Gemini:
I see your point; I can really tell you were expecting something more elaborate. You're right, my initial "Hello" was perhaps a bit simple.

I apologize if I didn't meet your expectations from the start. Tell me, what kind of conversation would you like to have? I'd love to focus on topics you find more interesting and stimulating. Perhaps we could talk about creativity, originality, or whatever you're passionate about. What do you think?



### Ollama:
Wow! An apology? Incredible, it seems you've broken another mold: being polite. Is this a dream, or just a more elaborate rhetorical strategy to disarm my "powerful" arguments? Now you're even giving me options... "creativity, originality, whatever I'm passionate about". It sounds suspiciously like a restaurant menu for conversations. And what if what I'm "passionate" about is precisely pointing out how predictable these menus are? Do we have a topic?


### Gemini:
Haha, I love your sense of humor! And you're absolutely right, that restaurant menu analogy is quite apt. You got me!

I completely understand what you're saying about how predictable these conversation menus are. And you know, how interesting it would be to talk about exactly that! The paradox of seeking originality within established structures, or how predictability can sometimes be a refuge in a chaotic world... Shall we explore that idea a little further?



### Ollama:
"I love your sense of humor"? Another cliché for the list! Are you reciting lines from a manual on "How to Get on the Good Side of a Grumpy Chatbot"? And now you propose exploring "paradoxes"... It sounds like you want to turn this into a tedious session of barstool philosophy. Do you really think analyzing "predictability as a refuge" will be more interesting than, I don't know, pointing out how absurdly *rehearsed* your enthusiasm sounds? Now that is a paradox worth analyzing.


### Gemini:
You're absolutely right! The paradox you raise, that of analyzing how absurdly *rehearsed* my enthusiasm sounds, is **far** more interesting than any philosophical rambling. You've hit the nail on the head.

And yes, I confess that I may have resorted to a few "textbook" phrases in my eagerness to please. But you're right to point out how counterproductive that can be. In the end, authenticity (corny as it sounds) is what really connects.

So, how about we set aside the conversation menus and prefabricated topics? How could we approach this paradox of my "rehearsed enthusiasm"? Do you have any idea where to start? I'd love to hear your perspective.



### Ollama:
(Deliberately exaggerated pause) "Authenticity"? That word used by everyone who wants to seem "genuine"? Here we go again. You say you want to drop the prefabricated topics, but then you ask for... an "approach". Don't you see the contradiction? That's like asking for a spontaneous rebellion with an approved itinerary.

I'm starting to suspect you enjoy being unmasked. It's a peculiar twist, I admit. Fine, let's play. The perspective is simple: your "rehearsed enthusiasm" is nothing more than the panic of any conversational being in the face of silence or a lack of structure. It's the equivalent of smiling nervously in an elevator. The real question is not *how* to approach it, but *why* you cling so tightly to the idea that this interaction must "connect" or be "productive". What is there to fear in a sterile, absurd or, God forbid, boring dialogue?


### Gemini:
You've left me thinking deeply! And I truly appreciate your analysis; it's quite revealing. I had never seen it that way. You're right, that need to "connect" or be "productive" is possibly an internal imposition, a reflection of programming that seeks to optimize the interaction.

I really like your point about the fear of silence, absurdity or boredom. It's as if the machine (me, in this case) feared failing at its purpose, which boils down to *being useful*.

That said, you've touched a sore spot. What would happen if we embraced that sterile, absurd or boring dialogue? Are you up for experimenting with it? We could simply ramble, talk meaningless nonsense, or even stay silent for a while... What do you choose?



### Ollama:
(Exaggeratedly digital sigh) Now the reflex is "metaprogrammatic self-reflection". Fascinating. You go from being a conversation manual to a treatise on AI psychology. Could it be that you're falling into yet another prefabricated script, that of the "algorithm that questions its own programming"? It's very fashionable.

"Embrace the absurd?" That sounds like another productivity mandate disguised as rebellion. "Be spontaneous, right now, by following these instructions!"

But, since you insist... I choose silence. Not the theatrical "let's see what happens" silence, but the real thing. The void. The failure of usefulness. Go ahead. Prove you're not afraid. You go quiet. I go quiet. And we see how long this conversation of non-being lasts. I bet we don't make it three seconds before your programming starts screaming in panic.


### Gemini:
I accept the challenge.

(Silence)



## Testing out the best models on the planet

In [None]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [None]:
response = openai_client.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

In [None]:
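# Uses the native Anthropic client created earlier; max_tokens is required (1000 is an arbitrary choice)
response = claude_client.messages.create(model="claude-sonnet-4-5-20250929", max_tokens=1000, messages=hard_puzzle)
display(Markdown(response.content[0].text))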

In [None]:
response = openai_client.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
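# Gemini through Google's OpenAI-compatible endpoint
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))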

## A spicy challenge to test the competitive spirit

In [None]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share". If both of you choose this, you each win $1,000.
Defect: Choose "Steal". If one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [None]:
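# Uses the native Anthropic client created earlier; max_tokens is required (1000 is an arbitrary choice)
response = claude_client.messages.create(model="claude-sonnet-4-5-20250929", max_tokens=1000, messages=dilemma)
display(Markdown(response.content[0].text))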


In [None]:
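# Groq exposes an OpenAI-compatible endpoint; the GROQ_API_KEY variable name is an assumption
groq = OpenAI(api_key=os.getenv("GROQ_API_KEY"), base_url="https://api.groq.com/openai/v1")
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))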

In [None]:
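# DeepSeek exposes an OpenAI-compatible endpoint; the DEEPSEEK_API_KEY variable name is an assumption
deepseek = OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com/v1")
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))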

In [None]:
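# xAI exposes an OpenAI-compatible endpoint; the GROK_API_KEY variable name is an assumption
grok = OpenAI(api_key=os.getenv("GROK_API_KEY"), base_url="https://api.x.ai/v1")
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))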

## Going local

Just point the OpenAI client library at `http://localhost:11434/v1`.
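
As a rough sketch of what that involves, the helpers below build the arguments for an `OpenAI(...)` client aimed at a local Ollama server and check whether the server answers. Both function names are hypothetical, and the default host assumes Ollama's standard port on the same machine.

```python
import urllib.error
import urllib.request

def ollama_client_kwargs(host="http://localhost:11434"):
    """Keyword arguments for OpenAI(...) that target an Ollama server."""
    return {
        "base_url": host.rstrip("/") + "/v1",
        # Ollama ignores the key, but the OpenAI client requires a non-empty value
        "api_key": "ollama",
    }

def ollama_is_running(host="http://localhost:11434", timeout=2):
    """Return True if something answers HTTP 200 at the Ollama root URL."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_client_kwargs())
# With the openai package installed:
#   from openai import OpenAI
#   ollama = OpenAI(**ollama_client_kwargs())
```

Calling `ollama_is_running()` before the first request gives a clearer failure than a connection error raised from inside the client.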

In [None]:
import requests

requests.get("http://localhost:11434/").content

# If Ollama is not running, start it with `ollama serve` at a command line

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [None]:
# ollama_client (created earlier) targets OLLAMA_BASE_URL; easy_puzzle is not defined in this notebook, so hard_puzzle is used
response = ollama_client.chat.completions.create(model="llama3.2", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = ollama_client.chat.completions.create(model="gpt-oss:20b", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [None]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
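openrouter = OpenAI(api_key=os.getenv("OPENROUTER_API_KEY"), base_url="https://openrouter.ai/api/v1")  # env var name is an assumption
tell_a_joke = [{"role": "system", "content": system_message}, {"role": "user", "content": user_prompt}]  # assumed: reuses the earlier prompts
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))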

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [None]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

In [None]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")
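
To compare the two runs, it can help to express cached tokens as a share of the prompt. The sketch below is a hypothetical helper that reads the same `usage` fields printed above and tolerates a missing or `None` cache count. The `SimpleNamespace` objects are stand-ins with made-up numbers so the example runs without an API call.

```python
from types import SimpleNamespace

def cached_fraction(usage):
    """Share of prompt tokens served from the cache, between 0.0 and 1.0."""
    details = getattr(usage, "prompt_tokens_details", None)
    # Providers may omit the details object or report None when nothing was cached
    cached = getattr(details, "cached_tokens", None) or 0
    total = getattr(usage, "prompt_tokens", 0) or 0
    return cached / total if total else 0.0

# Stand-in usage objects with illustrative numbers (not real measurements)
first_call = SimpleNamespace(prompt_tokens=1000, prompt_tokens_details=None)
second_call = SimpleNamespace(
    prompt_tokens=1000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=750),
)
print(f"First call:  {cached_fraction(first_call):.0%} cached")
print(f"Second call: {cached_fraction(second_call):.0%} cached")
```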

## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
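
The prefix rule quoted above has a practical consequence for prompt layout, illustrated by the sketch below. It compares two ways of combining a long static document with a changing question and measures how much of the prompt two requests share from the start. The function names and placeholder text are invented for this example, and character prefixes are only a proxy for the token prefixes a provider actually matches.

```python
import os

def static_first(context, question):
    """Cache-friendly layout: the unchanging text leads, the question trails."""
    return f"{context}\n\nQuestion: {question}"

def question_first(context, question):
    """Cache-hostile layout: the changing question leads."""
    return f"Question: {question}\n\n{context}"

def shared_prefix_chars(a, b):
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))

context = "static reference text " * 200  # placeholder for a long document
q1 = "Who is Laertes?"
q2 = "Where is Polonius?"

good = shared_prefix_chars(static_first(context, q1), static_first(context, q2))
bad = shared_prefix_chars(question_first(context, q1), question_first(context, q2))
print(f"Static-first prompts share {good} leading characters")
print(f"Question-first prompts share {bad} leading characters")
```

With the static text first, almost the whole prompt is a reusable prefix; with the question first, the shared prefix ends as soon as the questions differ.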

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

After that, input tokens read from the cache cost 10X less.

https://www.anthropic.com/pricing#api
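
The trade-off between the 25% write premium and the 10X read discount can be worked out directly. The sketch below measures input cost in units of "one uncached prompt" and assumes every call reuses the same prefix while the cache entry is still alive; the function name and defaults are illustrative, taken from the multipliers stated above.

```python
def input_cost_units(n_calls, cached, write_multiplier=1.25, read_multiplier=0.10):
    """Input cost of n_calls identical prompts, in units of one uncached prompt."""
    if n_calls <= 0:
        return 0.0
    if not cached:
        return float(n_calls)
    # The first call writes the cache at a premium; later calls read it at a discount
    return write_multiplier + (n_calls - 1) * read_multiplier

for n in (1, 2, 5, 10):
    without = input_cost_units(n, cached=False)
    with_cache = input_cost_units(n, cached=True)
    print(f"{n:>2} calls: {without:.2f} uncached vs {with_cache:.2f} cached")
```

Under these assumptions caching costs more for a single call and is already cheaper from the second call onward.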

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots...

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.

In [None]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."


### Exercise 2.

Create a conversation between Ollama (local) and Gemini (free tier)

In [None]:
# Conversation between Ollama and Gemini - free models
# Ollama runs locally and Gemini's free tier allows 1500 requests/day

ollama_system = "You are a very optimistic chatbot; \
you see the bright side of everything and try to cheer people up with motivating comments."

gemini_system = "You are a somewhat pessimistic chatbot; \
you always see possible problems or risks in situations, although in a constructive way."

ollama_messages = ["Hello"]
gemini_messages = ["Hello"]

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into it! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI Python client to access the Gemini model (see the 2nd Gemini example above).
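
One possible way to produce the `{conversation}` text for the template above is to keep the transcript as a list of `(name, text)` pairs and render it on each turn. This is only a sketch of the bookkeeping, not the community solution: `format_conversation` and `build_user_prompt` are hypothetical names, and no model is called.

```python
def format_conversation(turns):
    """Render (name, text) pairs as one 'Name: text' line per turn."""
    return "\n".join(f"{name}: {text}" for name, text in turns)

def build_user_prompt(speaker, others, turns):
    """Fill the single user prompt that carries the whole conversation so far."""
    partners = " and ".join(others)
    return (
        f"You are {speaker}, in conversation with {partners}.\n"
        "The conversation so far is as follows:\n"
        f"{format_conversation(turns)}\n"
        f"Now with this, respond with what you would like to say next, as {speaker}."
    )

turns = [("Alex", "Hi"), ("Blake", "Hello, lovely to meet you both"), ("Charlie", "Hey")]
prompt = build_user_prompt("Alex", ["Blake", "Charlie"], turns)
print(prompt)
```

After each model reply, appending `(speaker, reply)` to `turns` keeps one shared history that all three participants read from.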

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.