## Text input

Available OpenAI models: https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-5-nano',
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

In this sci‑fi setting, the Moon’s capital is Lunaris Prime.

- Location: Built along the rim of Shackleton Crater in the south polar region, where long sunlight cycles feed solar farms and shielded habitats sit near water-ice deposits.
- Government: The Lunar Directorate, a technocratic federation elected by district blocs, with a Council of Craters guiding policy and a Prime Administrator as head of state.
- Architecture: A ring city carved into lunar regolith and reinforced with glassy alloys. Towers rise from a central spine, connected by solar-lattice domes that track the Sun. Interiors feature gravity-managed atria, hydroponic gardens, and subterranean reservoirs.
- Daily life: A bustling, energy-rich metropolis where engineers, miners, and scientists mingle. Transit is mostly via heavy-rail lunar trains and vertical elevators; daylight planning schedules labor, recreation, and education around extended sunrise blocks.
- Economy: Power export via solar-thermal networks, water-ice
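The cell above prints `content` directly, which works when it is a plain string. Depending on the model and provider, `content` may instead come back as a list of content blocks. The helper below is a minimal sketch that handles both shapes; `message_text` is a hypothetical name, not part of LangChain, and it assumes text blocks are dicts with `"type": "text"` and a `"text"` key, matching the blocks used in the request.

```python
def message_text(content):
    """Return the plain text of a message's content.

    Accepts either a bare string or a list of content blocks
    (strings, or dicts whose "type" is "text"). Non-text blocks
    such as images are skipped.
    """
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)
```

With this in place, `print(message_text(response['messages'][-1].content))` behaves the same for either shape.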

## Image input

In [5]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.jpg', multiple=False)
display(uploader)

FileUpload(value=(), accept='.jpg', description='Upload')

In [6]:
print(uploader.value)

({'name': 'moon_city.jpg', 'type': 'image/jpeg', 'size': 185147, 'content': <memory at 0x12072ea40>, 'last_modified': datetime.datetime(2025, 12, 21, 10, 48, 16, 448000, tzinfo=datetime.timezone.utc)},)


In [7]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
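The conversion steps above (memoryview to bytes to base64) can be wrapped in one function that also reuses the MIME type reported by the upload widget. This is only a sketch: `image_block_from_upload` is a hypothetical helper, and it assumes the upload entry is a dict with `content` and `type` keys, as in the printed `uploader.value`.

```python
import base64


def image_block_from_upload(uploaded_file):
    """Build an image content block from one FileUpload entry.

    Assumes `uploaded_file` is a dict with a "content" memoryview
    (or bytes) and a "type" MIME string, as shown in uploader.value.
    """
    raw = bytes(uploaded_file["content"])  # memoryview -> bytes
    return {
        "type": "image",
        "base64": base64.b64encode(raw).decode("utf-8"),
        # Fall back to JPEG only because the uploader accepts '.jpg' files.
        "mime_type": uploaded_file.get("type") or "image/jpeg",
    }
```

For example, `image_block_from_upload(uploader.value[0])` yields a dict that can be placed alongside a text block in `HumanMessage(content=[...])`.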

In [8]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": "image/jpeg"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

The city in your image is Lunaspire, the capital of the Crescent Dominion. It sits on a vast, ice-veined continent at the southern edge of Nyxhaven, a world of perpetual twilight where the skies glow with auroras and two moons drift slowly overhead. Lunaspire is a political and cultural beacon, a place where ancient ritual meets cutting-edge technology.

What Lunaspire feels like
- A city of light and ice: spires carved from crystal-permeable ice and layered with bronze-gold filigree. The towers catch the auroras and toss the light outward in stained-glass rain.
- A metropolis of mirrors: polished stones and glassy surfaces reflect moonlight in a thousand tiny suns, giving residents a sense of ever-present daylight even in the long polar nights.
- A capital with a heartbeat: every quarter cycles its purpose—administrative, ceremonial, market, and scholarly—so the city feels like a living organism more than a static seat of power.

Key features
- Geography and layout: Lunaspire straddle

## Audio input

In [11]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

print("Recording...")
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Progress bar for the duration
for _ in tqdm(range(duration * 10)):   # update 10× per second
    time.sleep(0.1)
sd.wait()
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|███████████████████████████████████████████| 50/50 [00:05<00:00,  9.49it/s]


Done.
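The recording cell needs a microphone and the `sounddevice` package. To exercise the rest of the pipeline without either, the following sketch synthesizes a short tone as a WAV file in memory using only the standard library. `synth_wav_bytes` is a hypothetical helper, and the mono 16-bit PCM format is an assumption chosen for simplicity, not something the notebook requires.

```python
import base64
import io
import math
import struct
import wave


def synth_wav_bytes(duration_s=1.0, sample_rate=44100, freq_hz=440.0):
    """Return a mono 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_frames = int(duration_s * sample_rate)
    # Scale the sine wave to 30% of the int16 range to keep it quiet.
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return buf.getvalue()


# Base64-encode the same way as the recorded audio above.
demo_b64 = base64.b64encode(synth_wav_bytes()).decode("utf-8")
```

`demo_b64` can stand in for `aud_b64` in an audio content block when testing the request shape; the model will of course hear a tone rather than speech.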


In [12]:
# Audio input needs an audio-capable model, so rebind `agent`
# (this agent has no system prompt).
agent = create_agent(
    model='gpt-4o-audio-preview',
)

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

A furry friend with eyes so bright,  
Who prowls the night in soft moonlight.  
A gentle purr, a quiet song,  
In whiskered dreams she strolls along.

Upon the windowsill she lays,  
And bathes in golden, warming rays.  
Her tail a dancer, swaying free,  
As graceful as the willow tree.

With padded paws, she treads the floor,  
A silent hunter, evermore.  
Each playful leap, each bounding stride,  
Reveals the joy she cannot hide.

A curious heart in feline form,  
She finds her comfort soft and warm.  
A cozy curl, a velvet nap,  
Beneath the sunlit day’s soft wrap.

A loyal friend, both shy and bold,  
Her secrets in her eyes are told.  
With every meow or gentle stare,  
She shows her love beyond compare.
