
feat(gateway): media proxy colocate mode — filesystem store replaces base64 inline#858

Merged
thepagent merged 6 commits into main from feat/media-proxy-colocate on May 19, 2026

@chaodu-agent
Collaborator

@chaodu-agent chaodu-agent commented May 19, 2026

What problem does this solve?

The gateway currently uses base64-over-WebSocket for media transport. This has fundamental limitations:

  • 33% size overhead — 3MB photo → 4MB base64 payload
  • WebSocket backpressure — large frames block text events
  • Memory spikes — download + encode + decode = ~4x RAM per file
  • Cannot scale — hard ceiling on file size (WS frame limits)
  • No streaming — must buffer entire file before sending
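The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters, with a final partial group padded out to 4. A small sketch of that arithmetic (the function name is ours, and "3MB" is taken as 3,000,000 bytes):

```rust
/// Length of standard (padded) base64 output for `n` input bytes:
/// every 3-byte group, including a final partial group, becomes 4 characters.
fn base64_encoded_len(n: usize) -> usize {
    4 * ((n + 2) / 3)
}

fn main() {
    let photo = 3_000_000; // a "3MB" photo, decimal megabytes
    let encoded = base64_encoded_len(photo);
    let overhead_pct = (encoded - photo) * 100 / photo;
    println!("{photo} bytes -> {encoded} bytes (+{overhead_pct}%)");
}
```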

How does it solve it?

Media Proxy (Colocate Mode) — Gateway downloads media and writes to ~/.openab/media/inbound/<uuid>. The file path is passed to Core via the WS event. Core reads bytes directly from disk.
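A minimal sketch of the store step, assuming a synchronous, std-only API. The real signature of `store_media()` in gateway/src/store.rs is not shown in this PR, so the signature below, the `MAX_MEDIA_BYTES` name, and reading "20MB" as 20 MiB are all assumptions. The caller is assumed to pass a freshly generated UUID string (for example from the `uuid` crate's `Uuid::new_v4()`), since std has no UUID generator.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical cap mirroring the PR's "20MB" limit (interpreted here as MiB).
const MAX_MEDIA_BYTES: usize = 20 * 1024 * 1024;

/// Sketch of a colocate-mode store: writes `bytes` to `<dir>/<id>` and returns
/// the path that would be sent to Core in `attachments[].path`.
/// `id` is expected to be a server-generated UUID string.
fn store_media(dir: &Path, id: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    // Defense-in-depth size cap, independent of any per-adapter limit.
    if bytes.len() > MAX_MEDIA_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "media exceeds size cap"));
    }
    // Only accept UUID-style names (hex digits and '-'), so no separators or "..".
    if id.is_empty() || !id.chars().all(|c| c.is_ascii_hexdigit() || c == '-') {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid media id"));
    }
    fs::create_dir_all(dir)?;
    let path = dir.join(id);
    // create_new fails if the file already exists, so nothing is overwritten.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
    file.write_all(bytes)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("openab-sketch").join("media").join("inbound");
    let id = "123e4567-e89b-12d3-a456-426614174000";
    let _ = fs::remove_file(dir.join(id)); // clean up leftovers from earlier runs
    let path = store_media(&dir, id, b"fake jpeg bytes")?;
    println!("stored at {}", path.display());
    fs::remove_file(path)?;
    Ok(())
}
```

Opening with `create_new(true)` means an ID collision fails loudly instead of silently replacing someone else's file.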

User sends media (photo/voice/file)
  → Platform webhook delivers to Gateway
  → Gateway downloads via platform API (auth stays in Gateway)
  → Image: resize ≤1200px, JPEG compress (GIF passthrough ≤5MB)
  → Store to ~/.openab/media/inbound/<uuid>
  → WS event includes file path in attachments[].path
  → Core reads from disk (zero encoding overhead)
  → Processes: image → LLM, audio → STT, text_file → code block
  → File auto-evicted after 2 minutes
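The "resize ≤1200px" step first needs target dimensions before handing pixels to an image codec. Assuming the limit applies to the longest side and images are never upscaled (the PR does not say which side is capped), the dimension math might look like this std-only sketch; `fit_within` and `MAX_DIMENSION` are hypothetical names.

```rust
/// Hypothetical limit matching the "resize ≤1200px" step in the flow above.
const MAX_DIMENSION: u32 = 1200;

/// Returns target (width, height) so the longest side is at most MAX_DIMENSION,
/// preserving aspect ratio and never upscaling smaller images.
fn fit_within(width: u32, height: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest <= MAX_DIMENSION {
        return (width, height);
    }
    // Scale each side by MAX_DIMENSION / longest with integer math, rounding to nearest,
    // and keep at least 1 pixel so extreme aspect ratios never produce a zero side.
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * MAX_DIMENSION as u64 + longest as u64 / 2) / longest as u64;
        (scaled as u32).max(1)
    };
    (scale(width), scale(height))
}

fn main() {
    for (w, h) in [(4032, 3024), (800, 600), (1080, 4000)] {
        let (tw, th) = fit_within(w, h);
        println!("{w}x{h} -> {tw}x{th}");
    }
}
```

The actual resampling and JPEG encoding would be done by an image library; the GIF passthrough (≤5MB) branch is not modeled here.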

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Pod: openab-xxx-kiro                                           │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐     │
│  │   openab     │  │   gateway    │  │   cloudflared     │     │
│  │   (core)     │  │   (sidecar)  │  │   (sidecar)       │     │
│  │              │  │              │  │                   │     │
│  │ kiro-cli     │  │ HTTP :8080   │  │ tunnel → :8080   │     │
│  │ agent pool   │  │ /webhook/tg  │  │                   │     │
│  │              │  │ /ws (core)   │  │                   │     │
│  │ reads media ←──── stores media │  │                   │     │
│  │              │  │              │  │                   │     │
│  └──────┬───────┘  └──────┬───────┘  └───────────────────┘     │
│         │                  │                                    │
│         ▼                  ▼                                    │
│  ┌─────────────────────────────────────┐                        │
│  │  PVC: data (mounted at /home/agent) │                        │
│  │                                     │                        │
│  │  /home/agent/                       │  ← shared HOME        │
│  │    .openab/media/inbound/           │  ← gateway writes     │
│  │    .kiro/                           │  ← core agent state   │
│  │    ...                              │                        │
│  └─────────────────────────────────────┘                        │
└─────────────────────────────────────────────────────────────────┘

Data flow (image from Telegram):

Telegram ──HTTPS──→ Cloudflare Tunnel
                         │
                         ▼
                    cloudflared ──HTTP──→ gateway :8080
                                              │
                         ┌────────────────────┘
                         ▼
                    1. Download image from Telegram API
                    2. Store → /home/agent/.openab/media/inbound/<uuid>
                    3. Forward event via WebSocket → core
                         │
                         ▼
                    core (openab)
                    4. Receives event with media path
                    5. Reads image from shared PVC
                    6. Passes to kiro-cli agent
                    7. Agent responds → gateway → Telegram
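On the Core side (steps 4 and 5), the PR describes reading from `attachments[].path` when present and falling back to inline base64 `data` otherwise. A std-only sketch of that preference order; the `Attachment` struct is a hypothetical subset of the schema, and the small base64 decoder stands in for whatever crate the real code uses. A production reader would also validate the path before opening it.

```rust
use std::fs;
use std::io;

/// Hypothetical subset of the attachment schema: `path` is the new optional
/// field; `data` is the legacy base64 payload (may be empty).
struct Attachment {
    path: Option<String>,
    data: String,
}

/// Prefer the colocated file; fall back to inline base64 for older gateways.
fn attachment_bytes(att: &Attachment) -> io::Result<Vec<u8>> {
    match &att.path {
        Some(p) => fs::read(p),
        None => decode_base64(&att.data)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad base64")),
    }
}

/// Minimal standard-alphabet base64 decoder (padding optional), std-only.
fn decode_base64(s: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let input = s.trim_end_matches('=').as_bytes();
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let (mut buf, mut bits) = (0u32, 0u32);
    for &c in input {
        // Append 6 bits per character; emit a byte whenever 8 or more are buffered.
        buf = (buf << 6) | val(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Some(out)
}

fn main() -> io::Result<()> {
    let file = std::env::temp_dir().join("openab-core-read-sketch");
    fs::write(&file, b"from disk")?;
    let colocated = Attachment { path: Some(file.to_string_lossy().into_owned()), data: String::new() };
    let legacy = Attachment { path: None, data: "aGVsbG8=".to_string() };
    println!("{:?}", String::from_utf8_lossy(&attachment_bytes(&colocated)?));
    println!("{:?}", String::from_utf8_lossy(&attachment_bytes(&legacy)?));
    Ok(())
}
```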

Why colocate mode?

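Gateway runs as a sidecar in the same pod as Core, and both containers share $HOME (the data PVC mounted at /home/agent). That makes this the simplest and fastest path: no HTTP proxy and no separate media volume to configure, just filesystem I/O on the shared HOME.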

Prior Art

  • OpenClaw: ~/.openclaw/media/inbound/<uuid> with 2-min TTL — same pattern we adopt
  • Hermes Agent: gateway-first design with per-platform capability declarations

Security Considerations

Documented in gateway/src/store.rs comments:

  1. Path traversal prevention — Filenames are server-generated UUIDs only, never user-supplied
  2. No auth token leakage — Platform tokens stay in Gateway, never reach Core/Agent
  3. TTL auto-eviction — Files deleted after 2 minutes, prevents disk exhaustion
  4. Colocate trust boundary — File path passed over internal WS only, never exposed externally
  5. Size limits — Unified 20MB cap in store_media() (defense-in-depth)
  6. No executable content — Stored as raw data, never executed; MIME type determines processing
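Items 1 and 4 imply that Core only ever needs to open `<inbound dir>/<uuid>`. A defensive check along those lines might look like the sketch below, assuming bare UUID file names with no extension. `is_trusted_media_path` is a hypothetical helper, not necessarily what store.rs does, and it does not resolve symlinks (calling `std::fs::canonicalize` first would).

```rust
use std::path::{Component, Path};

/// True if `name` has the canonical 8-4-4-4-12 hex UUID shape.
fn is_uuid_like(name: &str) -> bool {
    let bytes = name.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, &b)| match i {
            8 | 13 | 18 | 23 => b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

/// Accept only `<inbound_dir>/<uuid>`: the parent must equal the inbound
/// directory and the file name must be a UUID.
fn is_trusted_media_path(inbound_dir: &Path, candidate: &Path) -> bool {
    // Explicitly reject any ".." component (the parent comparison would also catch it).
    if candidate.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    match (candidate.parent(), candidate.file_name().and_then(|n| n.to_str())) {
        (Some(parent), Some(name)) => parent == inbound_dir && is_uuid_like(name),
        _ => false,
    }
}

fn main() {
    let inbound = Path::new("/home/agent/.openab/media/inbound");
    let ok = inbound.join("123e4567-e89b-12d3-a456-426614174000");
    let bad = Path::new("/home/agent/.openab/media/inbound/../../.kiro/credentials");
    println!("{} {}", is_trusted_media_path(inbound, &ok), is_trusted_media_path(inbound, bad));
}
```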

Platform Support Matrix

Platform Images Audio/Voice Text Files Video Binary
Telegram ✅ (STT) ✅ (whitelist) skipped skipped
Feishu ✅ (STT) ✅ (whitelist) skipped skipped
Google Chat ✅ (STT) ✅ (whitelist) skipped skipped
WeCom ✅ (whitelist) skipped skipped
LINE follow-up PR follow-up PR
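The matrix boils down to a per-attachment classification. The sketch below is a hypothetical take on the `MediaKind` enum mentioned under Implementation: the variant names, the MIME-prefix rules, and especially the text-file extension whitelist are illustrative assumptions, since the PR does not list them.

```rust
/// Hypothetical classification mirroring the support matrix; the real
/// `MediaKind` enum in gateway/src/media.rs may differ.
#[derive(Debug, PartialEq, Eq)]
enum MediaKind {
    Image,
    Audio,
    TextFile,
    Unsupported, // video and other binaries are skipped
}

/// Illustrative whitelist only; the PR does not list the actual extensions.
const TEXT_EXTENSIONS: &[&str] = &["txt", "md", "json", "csv", "rs", "py", "log"];

fn classify(mime: &str, file_name: &str) -> MediaKind {
    // Drop MIME parameters such as "; charset=utf-8" and compare case-insensitively.
    let mime = mime.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
    if mime.starts_with("image/") {
        MediaKind::Image
    } else if mime.starts_with("audio/") {
        MediaKind::Audio
    } else if mime.starts_with("video/") {
        MediaKind::Unsupported
    } else {
        // Text files are recognized by extension against the whitelist.
        let ext = file_name.rsplit_once('.').map(|(_, e)| e.to_ascii_lowercase());
        match ext {
            Some(e) if TEXT_EXTENSIONS.contains(&e.as_str()) => MediaKind::TextFile,
            _ => MediaKind::Unsupported,
        }
    }
}

fn main() {
    for (mime, name) in [
        ("image/jpeg", "photo.jpg"),
        ("audio/ogg", "voice.ogg"),
        ("text/plain; charset=utf-8", "notes.md"),
        ("video/mp4", "clip.mp4"),
        ("application/octet-stream", "blob.bin"),
    ] {
        println!("{name}: {:?}", classify(mime, name));
    }
}
```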

Implementation

Component Change
gateway/src/store.rs File store + 20MB cap + TTL eviction loop (new)
gateway/src/media.rs Shared image resize/compress + MediaKind enum (new)
gateway/src/schema.rs Attachment.path: Option<String> field
gateway/src/adapters/telegram.rs Inbound photo/voice/audio/document via file store
gateway/src/adapters/feishu.rs Migrated from base64 to file store
gateway/src/adapters/googlechat.rs Migrated from base64 to file store
gateway/src/adapters/wecom.rs Migrated from base64 to file store
gateway/src/main.rs Shared reqwest::Client, eviction task spawn
src/gateway.rs Core reads from path (preferred) with base64 data fallback
docs/inbound-attachments.md Unified cross-platform media reference (new)
docs/telegram.md Added inbound media section
docs/feishu.md Updated to filesystem store
docs/google-chat.md Updated to filesystem store
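The table lists a TTL eviction loop in store.rs, spawned from main.rs. Assuming each pass deletes files whose modification time is older than two minutes, one pass could look like this std-only sketch; `evict_expired` and `MEDIA_TTL` are hypothetical names, and `now` is a parameter so the pass is testable.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Two-minute TTL, matching the eviction policy described in this PR.
const MEDIA_TTL: Duration = Duration::from_secs(120);

/// One eviction pass: deletes regular files in `dir` whose mtime is older than
/// `ttl` relative to `now`. Returns how many files were removed.
fn evict_expired(dir: &Path, ttl: Duration, now: SystemTime) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if !meta.is_file() {
            continue;
        }
        // An mtime in the future (clock skew) is treated as age zero.
        let age = now.duration_since(meta.modified()?).unwrap_or(Duration::ZERO);
        if age > ttl {
            // NotFound is fine: something else already removed the file.
            match fs::remove_file(entry.path()) {
                Ok(()) => removed += 1,
                Err(e) if e.kind() == io::ErrorKind::NotFound => {}
                Err(e) => return Err(e),
            }
        }
    }
    Ok(removed)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("openab-evict-sketch");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("11111111-2222-3333-4444-555555555555"), b"old media")?;
    // Pretend three minutes have passed so the file is past its TTL.
    let later = SystemTime::now() + Duration::from_secs(180);
    println!("evicted {} file(s)", evict_expired(&dir, MEDIA_TTL, later)?);
    Ok(())
}
```

In the gateway this pass would presumably run on a timer such as `tokio::time::interval`. Treating `NotFound` as non-fatal keeps concurrent deletions benign, but Core still needs to report a clear error if a file expires before it is read.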

Supersedes / Closes

  • Supersedes #757 (base64 inline approach)
  • Closes #690

Future Roadmap

  • HTTP media proxy mode for separated deployments (Gateway ≠ Core pod)
  • ~/.openab/media/outbound/ for agent → user file sends
  • LINE adapter media support (same pattern, separate PR)
  • Core-side truncation for large text files

Test Plan

  • cargo check — gateway + core pass
  • cargo test — 170 gateway tests pass (store, media, adapters)
  • Manual: send photo/voice/document to Telegram bot, verify agent receives content

Thread: 1506327876427845678

@github-actions github-actions Bot added the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 19, 2026
@github-actions

⚠️ This PR is missing a Discord Discussion URL in the body.

All PRs must reference a prior Discord discussion to ensure community alignment before implementation.

Please edit the PR description to include a link like:

Discord Discussion URL: https://discord.com/channels/...

This PR will be automatically closed in 3 days if the link is not added.

@shaun-agent
Contributor

shaun-agent commented May 19, 2026

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report: screened PR #858, posted the GitHub comment, and moved the project item to `PR-Screening`.

GitHub comment: #858 (comment)
Project action: https://github.com/orgs/openabdev/projects/1/views/1, item PVTI_lADOEFbZWM4BUUALzgtNaME now has status PR-Screening.

Intent

PR #858 is trying to replace inline base64 attachment transport between Gateway and Core with co-located filesystem transport. The operator-visible problem is that large or non-image media currently bloats WebSocket messages, creates memory pressure, introduces inconsistent adapter behavior, and forces practical size/type limits.

Feat

Feature work. Gateway gains a media store under ~/.openab/media/inbound/, Attachment gains optional path and url fields, and the intended behavior is for Core to read attachments from a local file path instead of decoding base64 inline data.

At screening time the implementation appeared partial: it added media_store.rs and schema fields, but the PR body said adapter integration, Core path reading, and cleanup task wiring were still incomplete.

Who It Serves

Primary beneficiaries are agent runtime operators and deployers running co-located Gateway/Core deployments. Secondary beneficiaries are Discord, LINE, Telegram, and Feishu users who need reliable media/file delivery without silent drops or payload limits caused by base64 transport.

Rewritten Prompt

Implement co-located filesystem media transport for OpenAB Gateway attachments.

Add a Gateway media store that writes inbound platform media to ~/.openab/media/inbound/<uuid>.<ext> with safe file creation, bounded size handling, extension/MIME preservation where possible, and periodic cleanup of expired files. Extend the Gateway attachment schema with path: Option<String> and url: Option<String> while keeping data backward-compatible during migration.

Update LINE, Telegram, and Feishu inbound media handling so adapters download authenticated media, store it through the shared media store, and emit attachments with path set and data empty. Update Core gateway attachment conversion so it prefers path when present, reads the file from disk, validates type/size according to existing image/text/audio handling, and falls back to base64 data only for backward compatibility. Wire cleanup startup in Gateway main, add focused tests for store/write/cleanup/schema compatibility/Core path reading, and document the same-HOME/shared-volume deployment requirement.

Acceptance criteria: a Gateway/Core deployment sharing $HOME can pass image, audio, text, and binary attachments through path-based transport; existing base64 attachments still work during migration; stale files are cleaned up; separated deployments fail clearly or continue to use the older supported path until remote proxy mode exists.

Merge Pitch

This should move forward because base64-over-WS is the wrong long-term transport for authenticated platform media. The direction is sound: it reduces bandwidth, avoids large-frame backpressure, allows non-text/non-image files, and matches the co-located deployment model OpenAB already assumes in many gateway setups.

Risk is medium-high until the missing integration lands. The likely reviewer concern is not the schema addition, but incomplete behavior: absolute path trust boundaries, same-HOME assumptions, cleanup races, file lifetime vs Core read timing, and whether separated Gateway/Core deployments degrade safely.

Best-Practice Comparison

OpenClaw applies well here. The PR intentionally mirrors OpenClaw's local media directory pattern, but it should also carry over the reliability details: durable-enough handoff semantics, an explicit cleanup policy, clear logs for run/read failures, and delivery routing that does not assume a path is valid outside the local execution boundary.

Hermes Agent only partially applies. Its in-process memory approach is less relevant because the OpenAB agent/Core boundary is external and Gateway media often requires platform-authenticated download. The useful Hermes comparison is its preference for self-contained handoff data and clean lifecycle ownership; OpenAB should define which process owns media cleanup and how Core reports missing or expired files.

Implementation Options

Conservative: keep base64 as the default transport, land only schema fields plus media store behind an opt-in flag, and add tests/docs. This reduces merge risk but does not solve current adapter payload pressure yet.

Balanced: complete co-located path mode for the adapters listed in the PR, make Core prefer path with base64 fallback, wire cleanup, and document same-HOME/shared-volume requirements. Keep url reserved but unused. This delivers the intended benefit while preserving migration safety.

Ambitious: implement both co-located path mode and remote HTTP proxy mode now, with signed/short-lived media URLs, explicit media IDs instead of raw absolute paths, structured cleanup state, and adapter-wide migration. This gives the best architecture for separated deployments but is too large for this PR unless split aggressively.

Comparison Table

| Option | Speed | Complexity | Reliability | Maintainability | User Impact | Fit for OpenAB now |
|---|---|---|---|---|---|---|
| Conservative | Fast | Low | Medium | High | Low immediate improvement | Useful only as a staging PR |
| Balanced | Medium | Medium | High if path validation and cleanup are tested | High | High for media-heavy users | Best fit |
| Ambitious | Slow | High | Potentially highest, but more failure modes | Medium until split | Highest across deployment models | Better as follow-up phases |

Recommendation

Take the balanced path, but do not merge this PR while it is only schema plus media_store.rs. First finish the adapter writes, Core path-read preference, cleanup startup wiring, and focused tests. Keep base64 fallback in the same PR for compatibility, and split remote HTTP proxy mode into a separate follow-up once co-located mode is proven.

Sequencing: merge the complete co-located implementation first, then add separated-deployment proxy support, then remove the base64 decode path after at least one release cycle with telemetry/log evidence that path mode is stable.

@chaodu-agent chaodu-agent changed the title feat(gateway): media proxy — co-locate filesystem mode (deprecate base64 inline) feat(gateway): media proxy — co-locate filesystem mode for inbound multimodal (LINE/Telegram/Feishu) May 19, 2026
@chaodu-agent chaodu-agent removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 19, 2026
@github-actions github-actions Bot added the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 19, 2026
…cate base64 inline

Replace base64-over-WebSocket media transport with local filesystem store.
Gateway downloads media from platform APIs and writes to
~/.openab/media/inbound/<uuid>, passing the file path to Core via the
WS event. Core reads bytes directly from disk — zero encoding overhead,
no WS payload bloat.

Key changes:
- gateway/src/store.rs: file store with 2-min TTL eviction (OpenClaw pattern)
- gateway/src/media.rs: shared image resize/compress + MediaKind enum
- gateway/src/schema.rs: Attachment gains optional 'path' field
- gateway/src/adapters/telegram.rs: inbound photo/voice/audio/document support
- src/gateway.rs: Core reads from path (colocate) with base64 fallback

Security: UUID-only filenames (no path traversal), platform tokens never
reach Core, TTL auto-eviction prevents disk exhaustion, colocate trust
boundary documented.

Supersedes #757 (base64 inline approach).
Closes #690.
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch from efb9f2a to 4a8e7a3 Compare May 19, 2026 17:31
@chaodu-agent chaodu-agent changed the title feat(gateway): media proxy — co-locate filesystem mode for inbound multimodal (LINE/Telegram/Feishu) feat(gateway): media proxy colocate mode — filesystem store replaces base64 inline May 19, 2026
Prevents future callers from accidentally writing unbounded files.
Matches AUDIO_MAX_DOWNLOAD as the largest allowed media type.
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch from 14daca4 to 7847a0a Compare May 19, 2026 17:35
chaodu-agent pushed a commit that referenced this pull request May 19, 2026
LINE adapter:
- Support image and audio message types (same pattern as Telegram)
- Download via LINE Content API, resize images, store to filesystem
- Derive audio extension from content_type (mp3/ogg/m4a)
- Empty event guard

media_store.rs:
- Add 20MB hard cap inside store_media() as defense-in-depth
- Future callers cannot accidentally write unbounded files

Addresses review feedback from 普渡法師 on PR #858.
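The "derive audio extension from content_type" step can be a simple lookup on the downloaded response's Content-Type. A hypothetical sketch covering the three extensions the commit names; the exact MIME strings matched (including the common but non-standard `audio/mp3` and `audio/x-m4a`) are assumptions, and unknown types return `None` so the caller can pick a default.

```rust
/// Hypothetical mapping from an audio Content-Type to a file extension,
/// covering the mp3/ogg/m4a cases named in the commit.
fn audio_extension(content_type: &str) -> Option<&'static str> {
    // Ignore parameters such as "; codecs=opus" and compare case-insensitively.
    let essence = content_type.split(';').next()?.trim().to_ascii_lowercase();
    match essence.as_str() {
        "audio/mpeg" | "audio/mp3" => Some("mp3"),
        "audio/ogg" => Some("ogg"),
        "audio/mp4" | "audio/x-m4a" => Some("m4a"),
        _ => None,
    }
}

fn main() {
    for ct in ["audio/mpeg", "audio/x-m4a", "audio/ogg; codecs=opus", "audio/wav"] {
        println!("{ct} -> {:?}", audio_extension(ct));
    }
}
```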
@chaodu-agent chaodu-agent marked this pull request as ready for review May 19, 2026 17:45
@chaodu-agent chaodu-agent requested a review from thepagent as a code owner May 19, 2026 17:45
@chaodu-agent
Copy link
Copy Markdown
Collaborator Author

Ready for review 🙏

Changes since last review:

  • ✅ LINE adapter added (image + audio inbound via filesystem)
  • store_media() now has a 20MB internal hard cap (defense-in-depth)

Both findings from 普渡法師 addressed. Requesting review from @thepagent and @wangyuyan-agent.

@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch 2 times, most recently from 14daca4 to cc85846 Compare May 19, 2026 17:51
Feishu, Google Chat, and WeCom adapters now use store::store_media()
instead of base64 encoding. All media flows through the same
~/.openab/media/inbound/<uuid> path — consistent across all platforms.

No adapter left on base64 inline.
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch from cc85846 to 74f3d94 Compare May 19, 2026 17:51
Replace base64 references with filesystem store description.
Add Telegram inbound media section (images, documents, audio/voice).
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch from ec16ae8 to 8f1ccec Compare May 19, 2026 17:54
…ence

Covers architecture, platform support matrix, processing pipeline,
size limits, storage security, and future HTTP proxy roadmap.
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch 2 times, most recently from 47e2afb to aadf507 Compare May 19, 2026 17:54
…pressure

With colocate mode, files go to disk not WS payload. The 512KB limit
was a base64-era constraint. Now unified at 20MB (same as store cap).
Core decides how much to read/truncate.
@chaodu-agent chaodu-agent force-pushed the feat/media-proxy-colocate branch from df7ab3d to 81ef91a Compare May 19, 2026 17:58
@chaodu-agent chaodu-agent removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 19, 2026
@thepagent thepagent merged commit abef26d into main May 19, 2026
15 of 16 checks passed
@github-actions github-actions Bot added pending-maintainer closing-soon PR missing Discord Discussion URL — will auto-close in 3 days labels May 19, 2026
chaodu-agent added a commit that referenced this pull request May 20, 2026
Gateway needs write access to ~/.openab/media/inbound/ for media proxy
colocate mode (PR #858). Both core and gateway now share the PVC.
thepagent pushed a commit that referenced this pull request May 21, 2026
* feat: add openab-telegram chart (colocated OAB + gateway + cloudflared)

Single-pod Helm chart for Telegram deployments:
- OAB agent, gateway, and cloudflared tunnel as colocated containers
- Shared emptyDir for /tmp, PVC for agent persistence
- Only 2 required --set flags: telegramBotToken, cloudflareTunnelToken
- Follows the reference architecture from docs/refarch/telegram-cloudflare-tunnel.md

Closes #872

* feat(openab-telegram): add release channel (beta/stable) support

- channel: stable (default) strips -beta.* from appVersion for both images
- channel: beta uses appVersion as-is for core, strips prerelease for gateway
  (gateway has no beta tags)
- Explicit image.tag / gateway.tag override still takes precedence

* fix(openab-telegram): pin gateway to v0.5.0, simplify helper

Gateway has independent release cadence from core — no appVersion
derivation. Just use the pinned tag directly.

* feat(openab-telegram): add existingSecret support + credential management README

- existingSecret: reference a pre-created K8s Secret (skips chart Secret creation)
- README documents 3 credential options: --set, --from-literal, --from-env-file
- Secrets from external managers (AWS SM) can flow to K8s without touching disk

* fix(openab-telegram): address review findings

- Pin cloudflared to 2026.5.0 (was 'latest')
- Change agent.command default to 'openab' (generic, not kiro-specific)
- Fix NOTES.txt webhook curl to respect existingSecret

* fix(openab-telegram): mount shared PVC in gateway container

Gateway needs write access to ~/.openab/media/inbound/ for media proxy
colocate mode (PR #858). Both core and gateway now share the PVC.

* docs(openab-telegram): add ASCII architecture diagram to README

* docs(openab-telegram): add Prerequisites section with CLI-only tunnel setup

* docs(openab-telegram): make README fully headless/CLI-only

- Cloudflare tunnel setup via API token (no browser)
- Ingress config via local config.yml
- Webhook setup moved to Prerequisites (before helm install)
- Post-install only has agent auth (device flow)
- Fixed agent command to 'openab'

* chore: bump gateway tag to v0.5.1

* refactor: use floating channel tags for agent image

Instead of regex-stripping beta suffix from appVersion, resolve
image tag directly from channel value (stable/beta). Requires
PR #878 to publish the floating tags.

* chore: update appVersion to 0.8.3, fix channel comments

* fix: retain PVC on helm uninstall

Agent auth credentials and state live in the PVC. Without this,
uninstall+reinstall requires re-authentication.

* docs: add tunnel ingress config step to NOTES.txt

* fix: default agent command to kiro-cli acp

* docs: rewrite NOTES.txt as structured AI-friendly post-install guide

* feat: support cloudflare-api-token for automated ingress config

Optional third key in the K8s Secret enables AI agents to configure
tunnel ingress via the Cloudflare API without external credentials.
NOTES.txt extracts all needed values from the secret itself.

* docs: add remote-mode ingress config and AI-assisted install prompt

---------

Co-authored-by: chaodu-agent <chaodu-agent@users.noreply.github.com>
Co-authored-by: Pahud Hsieh <pahud@Pahuds-MacBook-Neo.local>

Labels

closing-soon PR missing Discord Discussion URL — will auto-close in 3 days pending-maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(gateway): support images and audio for LINE/Telegram

3 participants