In [None]:
## Text input

In [12]:
from dotenv import load_dotenv

load_dotenv()

True
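The `True` printed above only tells you that `load_dotenv()` set at least one variable from a `.env` file. It does not confirm that the key the model provider needs is present. The check below is a minimal sketch. It assumes the OpenAI-backed models read `OPENAI_API_KEY`, and the helper name `missing_env_vars` is hypothetical:

```python
import os

def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not os.environ.get(name)]

# Assumption: the OpenAI models used in this notebook read OPENAI_API_KEY.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```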

In [3]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-5-nano',
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [4]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

In this science-fiction setting, the Moon has a capital called Lunagrad.

- Name: Lunagrad (officially Lunagrad City; nicknamed The Crescent)
- Location: Rim of Shackleton Crater, near the south polar region. Built to harvest perpetual daylight near the terminator and to shelter communities from the deepest lunar night.
- Government: Seat of the Lunar Confederation. The Assembly meets in the Dome of Dawn, while the Primarch and a council of ministers run the executive. A regional guard—the Pole Guard—ensures security in the harsh environment.
- City layout: A multi-ring arcology built into and around Shackleton’s rim.
  - Outer belt: Ice mines, cryo-storage, hydroponic farms, and industrial hubs.
  - Middle ring: Administrative core with government halls, academies, and cultural institutions.
  - Inner circle: Residential districts, markets, and public spaces; connected by gravity ramps and a high-speed lunar elevator.
- Architecture: Transparent basalt and photonic-glass domes, with r
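The cell above prints `.content` directly. That works here because the reply is a plain string. Message content can also be a list of content blocks, though, and in that case `print` shows the raw list. The helper below flattens either shape to text. It is a sketch: it assumes text blocks look like `{"type": "text", "text": ...}`, the same shape as the messages built in this notebook, and it skips every other block type. `message_text` is a hypothetical name:

```python
from typing import Any

def message_text(content: Any) -> str:
    """Flatten message content (a str, or a list of str / dict blocks) to plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Non-text blocks (images, audio, etc.) are ignored.
    return "".join(parts)
```

In the notebook you would call it as `print(message_text(response['messages'][-1].content))`.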

In [None]:
## Image input

In [5]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.png', multiple=False)
display(uploader)


In [6]:
print(uploader.value)

({'name': 'Screenshot_2.png', 'type': 'image/png', 'size': 72133, 'content': <memory at 0x7f71f44caa40>, 'last_modified': datetime.datetime(2025, 12, 11, 19, 40, 0, 961000, tzinfo=datetime.timezone.utc)},)
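`uploader.value` is a tuple of dicts, one per uploaded file, and the raw bytes sit under `content` as a `memoryview`. The tuple stays empty until a file is picked, so `uploader.value[0]` raises `IndexError` if the next cell runs too early. The sketch below checks for that case and for the MIME type before converting to bytes. The helper `first_upload` is hypothetical:

```python
def first_upload(value: tuple, expected_type: str = "image/png") -> tuple[str, bytes]:
    """Return (name, bytes) for the first file in a FileUpload value tuple."""
    if not value:
        raise ValueError("No file uploaded yet; choose a file in the widget first.")
    file_info = value[0]
    # The accept filter is only a hint to the browser's file picker, so re-check the type.
    if file_info["type"] != expected_type:
        raise ValueError(f"Expected {expected_type}, got {file_info['type']!r}")
    data = bytes(file_info["content"])  # memoryview -> bytes
    return file_info["name"], data
```

Usage: `name, img_bytes = first_upload(uploader.value)`.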


In [7]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# The file content is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)

# Base64-encode, then decode to a str for the message payload
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
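Base64 turns every 3 input bytes into 4 ASCII characters and pads the last group. The encoded string is therefore 4·⌈n/3⌉ characters long, roughly a third larger than the file. For the 72,133-byte screenshot above, that comes to 96,180 characters sent in the request. The snippet below checks the formula against `base64.b64encode`. `b64_encoded_length` is a hypothetical helper:

```python
import base64

def b64_encoded_length(n_bytes: int) -> int:
    """Length of padded standard Base64 output for n_bytes of input: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)

# Size reported by the upload widget for Screenshot_2.png
print(b64_encoded_length(72133))  # 96180

# Cross-check the formula against the real encoder on zero-filled data of a few sizes
for n in (0, 1, 2, 3, 72133):
    assert len(base64.b64encode(bytes(n))) == b64_encoded_length(n)
```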

In [8]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this image"},
    {"type": "image", "base64": img_b64, "mime_type": "image/png"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

Here’s what the image shows:

- It’s a hazard/precautions label in Portuguese, likely from seed treatment packaging.
- The heading “PRECAUÇÕES” means “Precautions.”
- The text includes safety instructions:
  - Store treated seeds in an appropriate place, out of reach of children and animals.
  - Avoid direct contact or inhalation.
  - Do not reuse this sack.
  - Wear protective clothing (PPE) while handling.
  - Do not leave treated seeds exposed on the soil.
  - Not for consumption.
- There is a section labeled “Tratamento Sintomático” (Symptomatic Treatment) and emergency contact numbers for medical information, including SYNGENTA and ADAMA (two agrochemical companies).
- There’s a date: “DATA DO TRATAMENTO: 05/2025” (Treatment date: May 2025).

Context you can infer:
- This looks like a typical seed-treatment or pesticide/wet-chemical label, common in agricultural packaging.
- The presence of company names (Syngenta, Adama) suggests real-world pesticide/product branding.

If you’d l
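The image block hardcodes `"mime_type": "image/png"`. That is safe only because the uploader accepts `.png` files. If the `accept` filter is widened, the declared type should match the actual bytes. The sketch below sniffs a few common formats from their magic-byte signatures and builds a block with the same keys as the cell above. `sniff_image_mime` and `image_block` are hypothetical helpers, and the signature list is not exhaustive:

```python
import base64

# Magic-byte prefixes for a few common image formats (not exhaustive)
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

def sniff_image_mime(data: bytes) -> str:
    """Guess an image MIME type from the leading bytes, or raise ValueError."""
    for prefix, mime in _SIGNATURES:
        if data.startswith(prefix):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unrecognized image format")

def image_block(data: bytes) -> dict:
    """Build an image content block with the same keys used in the cell above."""
    return {
        "type": "image",
        "base64": base64.b64encode(data).decode("utf-8"),
        "mime_type": sniff_image_mime(data),
    }
```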

In [None]:
## Audio input

In [27]:
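from pydub import AudioSegment
import base64
import io

# Load the MP3 (pydub relies on ffmpeg to decode it)
audio = AudioSegment.from_file("audio.mp3")
# Downmix to mono and resample to 16 kHz
audio = audio.set_channels(1).set_frame_rate(16000)

# Export to WAV in memory instead of writing a file to disk
buf = io.BytesIO()
audio.export(buf, format="wav")

# Base64-encode the WAV bytes for the message payload
wav_bytes = buf.getvalue()
aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")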

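The pydub step decodes the MP3 through ffmpeg and resamples it. The final WAV container, by contrast, is simple enough to write with the standard-library `wave` module once you have 16-bit PCM samples. The sketch below uses a synthetic 440 Hz tone instead of `audio.mp3` and writes a mono, 16 kHz layout, assuming 16-bit samples. `pcm16_mono_to_wav_b64` is a hypothetical helper. Every WAV file starts with the bytes `RIFF`, so its Base64 encoding always begins with `UklGR`.

```python
import base64
import io
import math
import struct
import wave

def pcm16_mono_to_wav_b64(samples: list[int], frame_rate: int = 16000) -> str:
    """Wrap signed 16-bit mono PCM samples in a WAV container and Base64-encode it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 2 bytes per sample = 16-bit
        wf.setframerate(frame_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))  # little-endian int16
    return base64.b64encode(buf.getvalue()).decode("utf-8")

# One second of a 440 Hz test tone at half amplitude
rate = 16000
tone = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate)]
wav_b64 = pcm16_mono_to_wav_b64(tone, rate)
print(wav_b64[:5])  # UklGR
```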

In [29]:
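# Audio input needs an audio-capable model, so build a new agent
# (no system prompt this time; it replaces the science fiction agent above)
agent = create_agent(
    model='gpt-4o-audio-preview',
)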

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

This audio file contains a popular meme soundbite that includes the phrase "And his name is John Cena!" followed by an energetic introduction with music. It references the famous professional wrestler and entertainer John Cena, often used in humorous and surprising contexts online.
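Before sending a clip, it can help to confirm what the model will actually receive: channel count, sample rate, duration, and payload size. The sketch below does this with the standard-library `wave` module. `describe_wav_b64` is a hypothetical helper. In the notebook you would call it on `aud_b64`. The demo uses half a second of synthetic silence so it runs on its own:

```python
import base64
import io
import wave

def describe_wav_b64(b64: str) -> dict:
    """Decode a Base64 WAV string and report its format, duration, and payload size."""
    raw = base64.b64decode(b64)
    with wave.open(io.BytesIO(raw), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "frame_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
            "payload_chars": len(b64),
        }

# Demo: 0.5 s of silence, mono, 16 kHz, 16-bit (8000 frames x 2 bytes)
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(bytes(16000))

info = describe_wav_b64(base64.b64encode(buf.getvalue()).decode("ascii"))
print(info["duration_s"])  # 0.5
```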
