## Text input


https://platform.openai.com/docs/models


In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-5-nano',
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

Real-world answer: There is no capital of the Moon—it's not a country and has no government.

Fictional capital for a sci‑fi setting:
- Name: Lunaris Prime
- Role: Capital of the Lunar Commonwealth (a political entity that governs the Moon)
- Location: perched on the rim of Shackleton Crater at the lunar south pole, where perpetual sunlight peaks meet shadowed ice—ideal for solar power and water ice harvesting
- Government hub: the Grand Orbital—home to the Lunar Assembly, the High Chancellor’s residence, and the Crown of Glass (a sky-lit beacon that doubles as the city’s solar collector)
- Notable landmarks:
  - The Earthview Spire: a crystalline tower offering panoramic views of Earth
  - The Helium-3 Promenade: a ring-lined plaza built around a repurposed mining shaft
  - The Lumen Gate: a shimmering entrance to the city’s domed districts
- Economy and culture: powered by helium-3 mining, water ice, and orbital trade; a culture of engineers, scientists, miners, and artists who craft
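
The `content` printed above happens to be a plain string, but message content can also arrive as a list of content blocks, the same shape used for the question in this cell. The sketch below flattens either form into text. `content_to_text` is a hypothetical helper written for this notebook, not a LangChain function, and it assumes text blocks look like `{"type": "text", "text": ...}`.

```python
def content_to_text(content):
    """Flatten message content (a str or a list of blocks) into one string."""
    if isinstance(content, str):
        return content

    parts = []
    for block in content:
        if isinstance(block, str):
            # Bare strings inside the list are kept as-is
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Only text blocks contribute; image/audio blocks are skipped
            parts.append(block.get("text", ""))
    return "".join(parts)


print(content_to_text([{"type": "text", "text": "What is the capital of The Moon?"}]))
```

In the cell above this would be called as `content_to_text(response['messages'][-1].content)`.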

## Image input


In [9]:
from ipywidgets import FileUpload
from IPython.display import display

# `accept` takes a comma-separated list of extensions or MIME types
uploader = FileUpload(accept='.png,.jpg', multiple=False)
display(uploader)

FileUpload(value=(), accept='.png,.jpg', description='Upload')

In [10]:
print(uploader.value)

({'name': 'moon_city.jpg', 'type': 'image/jpeg', 'size': 734360, 'content': <memory at 0x119284640>, 'last_modified': datetime.datetime(2026, 2, 9, 22, 46, 39, 623000, tzinfo=datetime.timezone.utc)},)


In [11]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
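
The upload dict already carries a `type` field, but when only a file name and raw bytes are available the MIME type has to be worked out separately. The sketch below guesses it from the extension with the standard library and builds an image content block in the same shape used in this notebook. `image_block` is a hypothetical helper, and guessing from the file name is an assumption that fails for misnamed files.

```python
import base64
import mimetypes


def image_block(name, data):
    """Build an image content block from a file name and its raw bytes."""
    # Guess the MIME type from the file extension (e.g. .jpg -> image/jpeg)
    mime_type, _ = mimetypes.guess_type(name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {name}")

    return {
        "type": "image",
        # bytes() accepts a memoryview, which is what FileUpload provides
        "base64": base64.b64encode(bytes(data)).decode("utf-8"),
        "mime_type": mime_type,
    }


# Three bytes that start every JPEG file, wrapped like an upload's content
block = image_block("moon_city.jpg", memoryview(b"\xff\xd8\xff"))
print(block["mime_type"])
```

With the uploader above, the call would be `image_block(uploaded_file["name"], uploaded_file["content"])`.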

In [12]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": uploaded_file["type"]}  # 'image/jpeg' for this upload
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

Here’s the capital that the image hints at. I call it Lunaris Prime, the gleaming heart of the Moon-Tide Union on the world of Asteria. It’s a city built to live with the Moon rather than against it, a place where night and day blend into a single, electric lifetime.

What Lunaris Prime feels like
- The Moon as a constant companion: Asteria’s moon is enormous in the sky, its light soft and crystalline. The city is designed to glow with that light—streets, parks, and façades bathed in a cool, blue luminescence. At dusk, the city doesn’t darken so much as shift into a deeper, more reflective mood.
- Architecture that breathes: Skyscrapers are tall but not purely vertical; they twist, arc, and hinge at terraces. Buildings are wrapped in glass and lattice skin that harvests lunar energy and rainwater. Bridges of light connect districts high above street level, letting pedestrians glide between neighborhoods without touching ground.
- An arcology-centered metropolis: Much of Lunaris Prime i

## Audio input


In [13]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

print("Recording...")
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Progress bar for the duration
for _ in tqdm(range(duration * 10)):   # update 10× per second
    time.sleep(0.1)
sd.wait()
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|██████████| 50/50 [00:05<00:00,  9.53it/s]


Done.
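
If no microphone is available, the same in-memory WAV and base64 steps can be tried with a generated tone. This is a minimal sketch using only the standard library `wave` module in place of `sounddevice` and `scipy`. `tone_wav_bytes` and `tone_b64` are hypothetical names, and the 16-bit mono format is an assumption rather than what the recording cell above produces.

```python
import base64
import io
import math
import struct
import wave


def tone_wav_bytes(freq_hz=440.0, duration_s=1.0, sample_rate=44100):
    """Return a mono 16-bit PCM WAV file containing a sine tone, as bytes."""
    n_samples = int(duration_s * sample_rate)
    # Pack each sample as a little-endian signed 16-bit integer at half volume
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_samples)
    )

    # Write the WAV container into an in-memory buffer instead of a file
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return buf.getvalue()


tone_b64 = base64.b64encode(tone_wav_bytes()).decode("utf-8")
```

`tone_b64` can then stand in for `aud_b64` in an audio content block with `"mime_type": "audio/wav"`.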


In [14]:
agent = create_agent(
    model='gpt-4o-audio-preview',
)

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

It sounds like the person in the audio is making a playful or imaginative statement about being on the moon. Without more context, it’s likely that they’re either joking or using it as a metaphor. There’s no way to confirm whether they’re actually on the moon just from the audio.

Let me know if you’d like more analysis or transcription.
