Summary
The Custom Gateway currently only supports text messages. Add support for sending and receiving images/audio on webhook-based platforms (LINE, Telegram).
| Direction | Text | Images | Audio/Voice |
|---|---|---|---|
| Inbound (user → bot) | ✅ | ❌ | ❌ |
| Outbound (bot → user) | ✅ | ❌ | ❌ |
Use Case
Users on LINE and Telegram want to send photos/screenshots to the AI agent for analysis (e.g. "what's in this image?", "review this architecture diagram", "debug this error screenshot"). Currently the gateway silently drops non-text messages — the agent never sees them. This blocks image-understanding workflows that already work on Discord.
Recommended Approach: Gateway Media Proxy
The gateway downloads media from the platform APIs (which require auth) and serves it at a local HTTP endpoint. OAB core then fetches from that URL, the same pattern it already uses for Discord CDN images, so no core changes are needed.
LINE/Telegram webhook (image message)
→ gateway downloads via platform API (auth required)
→ gateway stores in memory with UUID key + 2-min TTL
→ gateway serves at: GET http://gateway:8080/media/<uuid>
→ WS message to OAB: { attachments: [{ type: "image", url: "http://gateway:8080/media/<uuid>" }] }
→ OAB core downloads from gateway URL (no auth, internal network)
→ same flow as Discord CDN images through media.rs
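As a rough illustration of the last steps of this flow, the gateway-side URL and WS payload could be built as below. This is a minimal sketch: the function names `media_url` and `attachment_message` are hypothetical, the JSON shape is copied from the flow above, and the identifier is assumed to be URL-safe (a UUID). A real gateway would more likely serialize a struct with a JSON library than hand-format the string.

```rust
/// Builds the URL the gateway serves a stored media object at.
/// `base` (e.g. "http://gateway:8080") and `id` are assumed to be URL-safe.
fn media_url(base: &str, id: &str) -> String {
    format!("{}/media/{}", base.trim_end_matches('/'), id)
}

/// Hand-formats the attachment payload shown in the flow above.
/// Hypothetical helper; no escaping is done because `id` is assumed to be a UUID.
fn attachment_message(base: &str, id: &str) -> String {
    format!(
        r#"{{"attachments":[{{"type":"image","url":"{}"}}]}}"#,
        media_url(base, id)
    )
}
```

Calling `attachment_message("http://gateway:8080", "<uuid>")` yields the same message shape OAB core already receives for Discord CDN images, only with a gateway-internal URL.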
Why this approach
- Gateway already listens on :8080, so adding a /media/<uuid> route is trivial
- OAB core media.rs already downloads images from URLs, so zero code change
- No shared volumes, no S3, no external dependencies
- Works for all gateway platforms (LINE, Telegram, Teams)
Implementation sketch (~50 lines in gateway)
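// Assumes std's RwLock; tokio::sync::RwLock would fit an async gateway equally well.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::Instant;

// One downloaded media object held by the gateway.
struct MediaEntry {
    data: Vec<u8>,
    content_type: String,
    created_at: Instant,
}

// Shared in-memory store keyed by UUID string.
type MediaStore = Arc<RwLock<HashMap<String, MediaEntry>>>;

// On inbound image:
//   1. Download from platform API (LINE Content API, Telegram getFile)
//   2. Store: media_store.insert(uuid, MediaEntry { data, content_type, created_at: Instant::now() })
//   3. Send WS message with url: "http://gateway:8080/media/<uuid>"

// GET /media/<uuid> handler:
//   → return bytes + content-type
//   → 404 if expired/not found

// Background eviction: every 30s, remove entries older than 2 minutes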
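The sketch above leaves the store operations as comments. One way they might look, using only the standard library, is shown below. This is an illustrative version under stated assumptions, not the gateway's actual code: the `MediaStore` wrapper struct and its method names (`insert`, `get`, `evict_expired`) are hypothetical, the caller is assumed to supply the UUID, and a blocking `std::sync::RwLock` is assumed rather than an async lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

struct MediaEntry {
    data: Vec<u8>,
    content_type: String,
    created_at: Instant,
}

/// Cloneable handle to the shared in-memory store (hypothetical wrapper).
#[derive(Clone)]
struct MediaStore {
    entries: Arc<RwLock<HashMap<String, MediaEntry>>>,
    ttl: Duration,
}

impl MediaStore {
    fn new(ttl: Duration) -> Self {
        Self { entries: Arc::new(RwLock::new(HashMap::new())), ttl }
    }

    /// Stores downloaded bytes under `id` (a UUID generated by the caller).
    fn insert(&self, id: String, data: Vec<u8>, content_type: String) {
        let entry = MediaEntry { data, content_type, created_at: Instant::now() };
        self.entries.write().unwrap().insert(id, entry);
    }

    /// Returns (bytes, content type), or None if missing or past the TTL.
    /// The GET /media/<uuid> handler would map None to a 404.
    fn get(&self, id: &str) -> Option<(Vec<u8>, String)> {
        let map = self.entries.read().unwrap();
        let entry = map.get(id)?;
        if entry.created_at.elapsed() >= self.ttl {
            return None;
        }
        Some((entry.data.clone(), entry.content_type.clone()))
    }

    /// Drops expired entries and returns how many were removed.
    fn evict_expired(&self) -> usize {
        let ttl = self.ttl;
        let mut map = self.entries.write().unwrap();
        let before = map.len();
        map.retain(|_, e| e.created_at.elapsed() < ttl);
        before - map.len()
    }
}
```

With a store created as `MediaStore::new(Duration::from_secs(120))`, a background task would call `evict_expired` roughly every 30 seconds. Checking the TTL inside `get` as well means an expired entry is never served, even between eviction passes.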
Why not base64 over WebSocket?
- 33% size overhead (3MB photo → 4MB payload)
- Large WS frames cause backpressure issues
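The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. A small helper (the name `base64_encoded_len` is hypothetical, and standard padded base64 is assumed) makes the arithmetic concrete:

```rust
/// Length of standard padded base64 output for `n` input bytes:
/// every group of 3 input bytes (rounded up) becomes 4 output characters.
fn base64_encoded_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}
```

For a 3,000,000-byte photo this gives 4,000,000 characters, matching the 3MB → 4MB estimate above, before any JSON framing is added.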
Why not pass platform URL directly to core?
- LINE: no public URL; requires an Authorization: Bearer header with the channel access token
- Telegram: download URL contains bot token (secret leak concern)
- Discord CDN works because URLs are public (no auth needed)
Prior art: OpenClaw media store
OpenClaw uses a local filesystem store (~/.openclaw/media/inbound/<uuid>) with 2-min TTL and media:// URI scheme. Their approach assumes co-located processes (same pod/shared volume). Our separated gateway architecture makes the HTTP proxy more appropriate, but the TTL cleanup and size-limit patterns are worth adopting.
Platform-Specific Details
Inbound
| Platform | Image source | Auth required |
|---|---|---|
| LINE | GET https://api-data.line.me/v2/bot/message/{id}/content | Authorization: Bearer {token} |
| Telegram | GET https://api.telegram.org/file/bot<token>/<path> | Token in URL |
| Teams | Attachment URL in activity | Bearer token |
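The per-platform differences in the table above could be captured in one place, so the rest of the gateway only sees "a URL plus an optional Authorization header". The following is a sketch under assumptions: `MediaSource`, `DownloadRequest`, and `download_request` are hypothetical names, and Teams is left out because its URL comes from the activity payload rather than a fixed pattern.

```rust
/// Inbound media sources for the two platforms with a fixed URL pattern.
enum MediaSource<'a> {
    Line { message_id: &'a str, channel_token: &'a str },
    Telegram { file_path: &'a str, bot_token: &'a str },
}

/// What the gateway needs to issue the download.
#[derive(Debug, PartialEq)]
struct DownloadRequest {
    url: String,
    /// Value for the Authorization header, if the platform uses one.
    authorization: Option<String>,
}

fn download_request(source: &MediaSource) -> DownloadRequest {
    match source {
        MediaSource::Line { message_id, channel_token } => DownloadRequest {
            url: format!("https://api-data.line.me/v2/bot/message/{}/content", message_id),
            authorization: Some(format!("Bearer {}", channel_token)),
        },
        MediaSource::Telegram { file_path, bot_token } => DownloadRequest {
            // The bot token is embedded in the URL, so this URL must stay inside the gateway.
            url: format!("https://api.telegram.org/file/bot{}/{}", bot_token, file_path),
            authorization: None,
        },
    }
}
```

Neither result is safe to forward to OAB core as-is, which is the reason the gateway downloads the bytes itself and hands out a /media/<uuid> URL instead.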
Outbound (future)
| Platform | Method | Requirement |
|---|---|---|
| LINE | Image message type | Public HTTPS URL for image |
| Telegram | sendPhoto API | Supports direct file upload |
| Teams | Adaptive Card with image | Public URL or hosted attachment |
Scaling Path
Start with in-memory HashMap (sufficient for low-volume usage). If memory pressure becomes an issue, swap to temp files — same API surface, no protocol change.
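One way to keep the "same API surface" promise is to put storage behind a small trait from the start. The sketch below is only an illustration: the trait `MediaBackend` and both implementations are hypothetical names, TTL handling is omitted for brevity, and `id` is assumed to be a gateway-generated UUID (never user input), since it is used directly as a file name.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Storage interface shared by both backends (hypothetical).
trait MediaBackend {
    fn put(&mut self, id: &str, data: &[u8]) -> io::Result<()>;
    fn get(&self, id: &str) -> io::Result<Option<Vec<u8>>>;
}

/// Starting point: everything lives in a HashMap.
struct MemoryBackend {
    entries: HashMap<String, Vec<u8>>,
}

impl MediaBackend for MemoryBackend {
    fn put(&mut self, id: &str, data: &[u8]) -> io::Result<()> {
        self.entries.insert(id.to_string(), data.to_vec());
        Ok(())
    }
    fn get(&self, id: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.entries.get(id).cloned())
    }
}

/// Later swap: one file per entry under `dir`.
struct TempFileBackend {
    dir: PathBuf,
}

impl MediaBackend for TempFileBackend {
    fn put(&mut self, id: &str, data: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.dir)?;
        fs::write(self.dir.join(id), data)
    }
    fn get(&self, id: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.dir.join(id)) {
            Ok(bytes) => Ok(Some(bytes)),
            // A missing file is the on-disk equivalent of a missing map key.
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}
```

Because the GET /media/<uuid> handler would depend only on the trait, switching backends changes neither the URL scheme nor the WS message format.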
Suggested Implementation Order
1. Inbound images: highest value (user sends photo → agent sees it)
2. Inbound audio/voice: STT pipeline (similar to Discord voice messages)
3. Outbound images: agent sends image back to user