## Text input

Model reference: https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
import os
from langchain.agents import create_agent
from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)


agent = create_agent(
    model=model,
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)
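
The cell above reads the deployment name and API version from the environment, and an unset variable only shows up later as a failed request. A minimal sketch of a pre-flight check, assuming the `.env` file also supplies `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` for the client to pick up; `missing_env` and `REQUIRED_VARS` are hypothetical names, not part of LangChain:

```python
import os

# The first two names appear in the cell above; the last two are assumptions
# about what the .env file provides for the Azure client.
REQUIRED_VARS = [
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",    # assumption: supplied via .env
    "AZURE_OPENAI_ENDPOINT",   # assumption: supplied via .env
]

def missing_env(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# An empty list means every expected variable is present.
print(missing_env(REQUIRED_VARS))
```

Running this right after `load_dotenv()` makes a missing setting visible before the model is constructed.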

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

The capital of the Moon is **Lunaris Prime**, a sprawling domed metropolis nestled in the Sea of Tranquility. Lunaris Prime is a breathtaking blend of advanced technology and sustainable design, symbolizing humanity's ambition and ingenuity. The city is protected beneath a massive, transparent graphene dome that shields its inhabitants from harsh cosmic radiation while offering an unparalleled view of the stars and Earth.

Lunaris Prime is constructed in layers: 

1. **The Core Ring:** The bustling administrative and political heart of the city where the Lunar Assembly—an interplanetary coalition governing the Moon—meets. This area features the **Celestial Spire**, an impossibly sleek tower that houses the Hall of Unity and an observatory.

2. **The Habitat Rings:** These concentric residential districts are designed with zero-gravity architecture in mind and boast vertical gardens, artificial skies, and oxygen-generating bio-domes. They host a diverse population of scientists, enginee
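
The `print` call above works because the last message's `content` is a plain string here. Message content can also arrive as a list of content blocks, so a small helper can normalise both cases. A sketch, where `content_to_text` is a hypothetical name and the `{"type": "text", "text": ...}` block shape is the one used for the input message in this notebook:

```python
def content_to_text(content):
    """Flatten message content (a string or a list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Non-text blocks (images, audio, ...) are skipped.
    return "".join(parts)
```

It would be used as `print(content_to_text(response["messages"][-1].content))`.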

## Image input

In [4]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.png', multiple=False)
display(uploader)

FileUpload(value=(), accept='.png', description='Upload')

In [5]:
print(uploader.value)

({'name': 'paris.png', 'type': 'image/png', 'size': 1061566, 'content': <memory at 0x115ca2800>, 'last_modified': datetime.datetime(2025, 12, 26, 13, 49, 58, 128000, tzinfo=datetime.timezone.utc)},)


In [6]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
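
The conversion steps above (memoryview to bytes to base64 text) can be folded into one helper that returns a ready-made content block. A sketch only: `image_block` is a hypothetical name, and the block shape is copied from the image message used in this notebook:

```python
import base64

def image_block(data, mime_type="image/png"):
    """Build an image content block from bytes or a memoryview."""
    # bytes() accepts a memoryview, so uploads can be passed in directly.
    encoded = base64.b64encode(bytes(data)).decode("utf-8")
    return {"type": "image", "base64": encoded, "mime_type": mime_type}

block = image_block(memoryview(b"abc"))
print(block["base64"])  # YWJj
```

With the uploader from this section, the call would be `image_block(uploader.value[0]["content"])`.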

In [7]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": "image/png"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

This image shows Paris, the capital city of France. Known as the "City of Light" (Ville Lumière), Paris is one of the most famous and beautiful capitals in the world. Its iconic landmarks, rich history, and vibrant cultural scene make it a major global hub.

### Key Features of Paris:
1. **Landmarks:**
   - **Eiffel Tower:** The most recognizable symbol of Paris, it was originally criticized when constructed in 1889 but has since become a global icon.
   - **Notre-Dame Cathedral:** A masterpiece of Gothic architecture.
   - **Arc de Triomphe:** Commemorates those who fought for France, located at the end of the Champs-Élysées.
   - **Louvre Museum:** Houses some of the most famous artworks, including the Mona Lisa.
   - **Sacre-Coeur Basilica:** A stunning basilica overlooking the city from Montmartre.

2. **Culture and Arts:**
   - Paris is a global center for art, fashion, and gastronomy.
   - It is home to countless museums, galleries, theaters, and performance spaces.
   - The Pari

## Audio input

In [8]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

print("Recording...")
# sd.rec() returns immediately and records in the background
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Show a progress bar while the recording runs (10 updates per second)
for _ in tqdm(range(duration * 10)):
    time.sleep(0.1)
sd.wait()  # block until the recording has finished
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:05<00:00,  9.56it/s]


Done.
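
The recording cell needs a microphone, which is not always available (for example on a remote kernel). A sketch of a stand-in that produces the same kind of payload, a base64-encoded WAV file built in memory, using only the standard library. It swaps the recording for a synthetic sine tone; `tone_wav_bytes` is a hypothetical helper, and the 16-bit mono format is an assumption chosen for simplicity:

```python
import base64
import io
import math
import struct
import wave

def tone_wav_bytes(freq_hz=440.0, duration_s=1.0, sample_rate=44100):
    """Return a mono 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_frames = int(duration_s * sample_rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_frames)
    )
    # Pack each sample as a little-endian signed 16-bit integer.
    pcm = b"".join(struct.pack("<h", s) for s in samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

# Same encoding step as for the recorded audio.
test_b64 = base64.b64encode(tone_wav_bytes()).decode("utf-8")
```

The resulting `test_b64` string can take the place of `aud_b64` when only the request path is being tested.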


In [9]:
# Recreate the agent without the science-fiction system prompt
agent = create_agent(
    model=model,
)

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'messages[0]'. Content blocks are expected to be either text or image_url type.", 'type': 'invalid_request_error', 'param': 'messages[0]', 'code': 'invalid_value'}}
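
The 400 response above says the endpoint accepts only text and `image_url` content for this deployment, so the audio block is rejected on the server side. A sketch of a local pre-check that reports unsupported block types before `agent.invoke` is called. `unsupported_block_types` and `SUPPORTED_TYPES` are hypothetical; the set is inferred from the error message and from the image request that succeeded earlier, not from a documented capability list:

```python
# Assumption: inferred from the error message for this particular deployment.
SUPPORTED_TYPES = {"text", "image"}

def unsupported_block_types(content, supported=SUPPORTED_TYPES):
    """Return the sorted block types in `content` that are not in `supported`."""
    found = {block.get("type") for block in content if isinstance(block, dict)}
    return sorted(t for t in found - set(supported) if t is not None)

blocks = [
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": "...", "mime_type": "audio/wav"},
]
print(unsupported_block_types(blocks))  # ['audio']
```

Sending audio would presumably require a deployment of a model that accepts audio input; the model reference linked at the top of this notebook is the place to check.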