## Text input

https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain_ollama import ChatOllama
from langchain.agents import create_agent

model = ChatOllama(model="gpt-oss:20b", temperature=0.8)

agent = create_agent(
    model=model,
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

**Luna Nova – The Capital of the Moon**

*In the year 2149, after the first permanent lunar settlement was established, the interplanetary government of the Lunar Assembly elected a single, shining metropolis to serve as the heart of the Moon’s new civilization. That city is Luna Nova.*

---

### 1. Location & Layout
- **Geographical Setting**: Luna Nova sits on the far-side, nestled within the serene basin of **Mare Serenitatis**. The basin’s basaltic plain provides a stable foundation for the city’s massive orbital ring, while the far-side’s lack of Earth’s glare offers a clear, uninterrupted sky for research stations and observatories.
- **Structural Core**: The city’s core is a **giant orbital ring** - a 5 km-wide, 1 km-tall structure that orbits the Moon at a 1 km altitude. Inside the ring are three concentric layers of habitation, governance, and industrial zones.
- **Dome Tiers**: Each layer is protected by a transparent, self-repairing nanogel 
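
The cells above send `content` as a list of typed blocks and print the `.content` of the last message. Message content can be either a plain string or a list of blocks, so a small helper that flattens both shapes into text can be handy when printing. The sketch below is an illustration only: `content_to_text` is a hypothetical helper name, not part of LangChain, and it assumes text blocks look like `{"type": "text", "text": ...}` as in the question above.

```python
from typing import Union


def content_to_text(content: Union[str, list]) -> str:
    """Flatten message content (a string or a list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            # Bare strings inside the list are kept as-is
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Only text blocks contribute; image or audio blocks are skipped
            parts.append(block.get("text", ""))
    return "".join(parts)


blocks = [{"type": "text", "text": "What is the capital of The Moon?"}]
print(content_to_text(blocks))
print(content_to_text("Luna Nova"))
```

With the agent from the cells above, this could be called as `content_to_text(response["messages"][-1].content)`.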

## Image input

In [4]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.jpg', multiple=False)
display(uploader)

FileUpload(value=(), accept='.jpg', description='Upload')

In [5]:
print(uploader.value)

({'name': 'langchain-picture.jpg', 'type': 'image/jpeg', 'size': 46865, 'content': <memory at 0x707210fcd540>, 'last_modified': datetime.datetime(2026, 2, 6, 18, 13, 50, 575000, tzinfo=datetime.timezone.utc)},)


In [6]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
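
The conversion steps above (memoryview to bytes to base64) can be wrapped together with the MIME type lookup, so the image block always carries a registered media type such as `image/jpeg`. This is a minimal sketch under stated assumptions: `upload_to_image_block` is a hypothetical helper, the upload dict is assumed to have the `name`, `type`, and `content` keys shown in the printed `uploader.value`, and the block shape `{"type": "image", "base64": ..., "mime_type": ...}` is assumed to be what the model integration expects.

```python
import base64
import mimetypes


def upload_to_image_block(upload: dict) -> dict:
    """Build an image content block from one FileUpload entry (hypothetical helper)."""
    # The widget exposes the file as a memoryview; bytes() copies it out
    raw = bytes(upload["content"])

    # Prefer the browser-reported type, fall back to guessing from the file name
    mime_type = upload.get("type") or mimetypes.guess_type(upload["name"])[0]
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError(f"Not an image upload: {upload.get('name')!r}")

    return {
        "type": "image",
        "base64": base64.b64encode(raw).decode("utf-8"),
        "mime_type": mime_type,
    }


# Stand-in for uploader.value[0]; the three bytes are the JPEG magic number
fake_upload = {
    "name": "langchain-picture.jpg",
    "type": "image/jpeg",
    "content": memoryview(b"\xff\xd8\xff"),
}
block = upload_to_image_block(fake_upload)
print(block["mime_type"], block["base64"])
```

In the notebook, `upload_to_image_block(uploader.value[0])` would take the place of the manual steps.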

In [7]:
model = ChatOllama(model="qwen3-vl:8b", temperature=0.8)

agent = create_agent(
    model=model,
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [8]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": "image/jpeg"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

**Luna Prime: The Capital of the Lunar Concord**  

Nestled in the Sea of Tranquility (Mare Tranquillitatis), **Luna Prime** isn’t just a settlement - it’s the beating heart of humanity’s first extraterrestrial civilization. Rising from the Moon’s ancient lava plains, it’s the capital of the **Lunar Concord**, a federation of Earth-based colonies and independent lunar settlements. Here, the lunar regolith isn’t just dust - it’s the foundation of a society that has redefined humanity’s relationship with the cosmos.  

---

### **The Heart of the City: The Concord Dome**  
At the center of Luna Prime stands the **Concord Dome**, a colossal geodesic structure 300 meters across, its transparent shell housing the city’s political, administrative, and cultural core. Unlike Earth’s cities, Luna Prime’s architecture is *functional poetry*: the dome’s inner surface is lined with **TerraSculpt™ biomes** - self-sustaining ecosystems that grow crops under artificial sunli

## Audio input

In [None]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.mp3', multiple=False)
display(uploader)

In [None]:
print(uploader.value)

In [None]:
import base64
import io
from faster_whisper import WhisperModel
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage

# 1. Set up the transcription model and the LLM
stt_model = WhisperModel("base", device="cpu", compute_type="int8")
model = ChatOllama(model="gpt-oss:20b", temperature=0.8)

# 2. Extract and Encode Audio from Uploader
if uploader.value:
    uploaded_file = uploader.value[0]
    content_mv = uploaded_file["content"]
    
    # Convert memoryview -> bytes, then base64-encode (same pattern as the image example)
    aud_bytes = bytes(content_mv)
    aud_b64 = base64.b64encode(aud_bytes).decode("utf-8")
    
    # 3. Transcribe Audio
    print("Transcribing voice input...")
    # The transcriber reads raw audio, so wrap the bytes directly in a file-like object
    audio_file = io.BytesIO(aud_bytes)
    
    segments, _ = stt_model.transcribe(audio_file, beam_size=5)
    transcribed_text = " ".join(segment.text.strip() for segment in segments)
    print(f"User said: {transcribed_text}")
    
    # 4. Agent Response
    response = model.invoke([
        HumanMessage(content=f"The user said: '{transcribed_text}'. Provide a helpful response.")
    ])
    
    print("\n--- AI Response ---")
    print(response.content)
else:
    print("Please upload an audio file (.mp3) first.")
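
The transcription step above joins segment texts into a single string. Whisper-style segments often begin with a space, so joining them naively can leave doubled or leading whitespace in the prompt. The sketch below shows one way to normalize that; `FakeSegment` is a hypothetical stand-in for the objects yielded by `stt_model.transcribe` (assumed only to expose a `.text` attribute), and `join_segments` is an illustrative helper, not part of faster-whisper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Stand-in for a transcription segment; only .text is assumed."""
    text: str


def join_segments(segments: Iterable) -> str:
    """Join segment texts with single spaces, dropping empty segments."""
    cleaned = (segment.text.strip() for segment in segments)
    return " ".join(text for text in cleaned if text)


segments = [
    FakeSegment(" What is the capital"),
    FakeSegment(" of the Moon?"),
    FakeSegment("  "),
]
print(join_segments(segments))
```

Because the helper only iterates once, it also works when `segments` is a generator.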