In [1]:
from dotenv import load_dotenv

load_dotenv()

True

## Image input

In [3]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.jpg', multiple=False)
display(uploader)

FileUpload(value=(), accept='.jpg', description='Upload')

In [4]:
print(uploader.value)

({'name': 'IMG_6514.jpg', 'type': 'image/jpeg', 'size': 3100403, 'content': <memory at 0x10891a440>, 'last_modified': datetime.datetime(2024, 9, 14, 0, 7, 32, 968000, tzinfo=datetime.timezone.utc)},)


In [6]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-5-nano',
    system_prompt="You are a writer. Create a capital city at the user's request.",
)

In [7]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
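
The conversion above can also be wrapped in a small helper. This is only a sketch: `first_upload_to_image_block` is a hypothetical name, not part of ipywidgets or LangChain. It takes the tuple from `uploader.value`, raises a clear error when nothing has been uploaded, and reuses the MIME type the widget reports (`image/jpeg` in the output above) instead of hard-coding one. A plain dict stands in for the uploaded file so the cell runs without a widget.

```python
import base64


def first_upload_to_image_block(uploads) -> dict:
    """Build an image content block from the first uploaded file (hypothetical helper)."""
    if not uploads:
        raise ValueError("No file uploaded yet; use the upload widget first.")
    uploaded_file = uploads[0]
    img_bytes = bytes(uploaded_file["content"])  # memoryview -> bytes
    return {
        "type": "image",
        "base64": base64.b64encode(img_bytes).decode("utf-8"),
        "mime_type": uploaded_file["type"],  # reuse the type reported by the widget
    }


# Stand-in for uploader.value: one entry holding the first four bytes of a JPEG
fake_uploads = (
    {"name": "example.jpg", "type": "image/jpeg", "content": memoryview(b"\xff\xd8\xff\xe0")},
)
block = first_upload_to_image_block(fake_uploads)
print(block["mime_type"])
```

In the notebook, `first_upload_to_image_block(uploader.value)` would produce the same kind of dict that is passed to `HumanMessage` by hand.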

In [10]:
from langchain.messages import HumanMessage
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this picture"},
    {"type": "image", "base64": img_b64, "mime_type": "image/jpeg"}  # "image/jpeg" is the registered MIME type, as reported by the uploader
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

This photo shows a calm, modern urban space—likely part of a campus or civic complex. Key details:

- In the foreground, ornamental grasses and low landscaping frame a wide, paved area.
- A white, sloped wall with benches along its base and a row of bicycles parked near a bike rack suggests a bike-friendly, public setting.
- A few people in high-visibility vests are standing near the bikes, perhaps staff or security.
- Behind them is a sleek, glass-and-metal building, giving the scene a contemporary, institutional feel.
- Tall streetlamps and lush trees surround the scene, and a car peeks in from the left, adding a touch of everyday city life.
- The sky is clear and bright, implying daytime with a tranquil, sunny mood.

Overall vibe: orderly, eco-conscious, and modern, with an emphasis on outdoor space, accessibility, and sustainable transit.

Would you like me to turn this into a fictional capital city concept inspired by the scene (name, neighborhoods, landmarks, and a short backstor

## Audio input

In [11]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

print("Recording...")
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Progress bar for the duration
for _ in tqdm(range(duration * 10)):   # update 10× per second
    time.sleep(0.1)
sd.wait()
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|██████████| 50/50 [00:05<00:00,  9.51it/s]


Done.
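
Before sending the recording, it can help to check what was actually written to the buffer. With default settings, `sd.rec` returns float32 samples, and `scipy.io.wavfile.write` stores a float array as an IEEE-float WAV rather than 16-bit PCM, which some audio consumers may not expect. The sketch below reads the `fmt ` chunk of a WAV byte string using only the standard library. `describe_wav` is a hypothetical helper, and a short silent clip built with the `wave` module stands in for `wav_bytes` so the cell runs without a microphone.

```python
import io
import struct
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic format fields from the 'fmt ' chunk of a RIFF/WAVE payload."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(wav_bytes):
        chunk_id, chunk_size = struct.unpack_from("<4sI", wav_bytes, pos)
        if chunk_id == b"fmt ":
            fmt_tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", wav_bytes, pos + 8
            )
            return {
                "format_tag": fmt_tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to even length
    raise ValueError("no 'fmt ' chunk found")


# Stand-in for wav_bytes: 10 ms of 16-bit mono silence at 44.1 kHz
demo_buf = io.BytesIO()
with wave.open(demo_buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 441)

info = describe_wav(demo_buf.getvalue())
print(info)
```

Calling `describe_wav(wav_bytes)` on the recording above would show which format tag and bit depth were written.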


In [None]:
agent = create_agent(
    model='gpt-4o-audio-preview',
)

# A poem about dogs

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

In fields of green where wild winds play,  
A loyal friend awaits the day.  
With eyes so kind and hearts so true,  
They bound through life with joy anew.

With wagging tails and playful barks,  
They light the darkest nights with sparks.  
In every pat, in every glance,  
They dance with love, they take a chance.

From rugged paws on mountain steep,  
To quiet cuddles as we sleep,  
No truer friend, no kinder soul,  
They heal our hearts, they make us whole.

A dog’s soft gaze, a life profound,  
In simplest joys, their love is found.  
Through laughter, tears, and all above,  
Dogs teach us all about pure love.
