feat(channels): introduce Zalo Personal channel integration#32

Merged
viettranx merged 25 commits into nextlevelbuilder:main from vanducng:feat/28-zalo-personal-channel
Mar 3, 2026

Conversation

@vanducng vanducng commented Mar 1, 2026

Summary

Introduce Zalo Personal channel — full messaging integration via Zalo's personal chat protocol.

  • Protocol layer: auth, encrypted API calls, WebSocket listener, message send/receive, crypto, contacts
  • Channel integration: Start/Stop lifecycle, DM/group policy, @mention gating, pairing, typing indicator, image attachments
  • QR login flow: browser-free onboarding via WS-pushed QR codes, credential encryption + persistence
  • Contacts picker: parallel friend/group fetch, searchable checklist with manual ID entry for allow_from
  • Web UI: channel wizard, QR dialog, status badges, has_credentials in API responses
  • Reliability: auto-restart with exponential backoff, silent disconnect detection, duplicate session recovery

Closes #28

Test plan

  • go build ./... compiles
  • go vet ./... passes
  • go test ./... passes
  • pnpm build in ui/web/ compiles
  • Manual: create/edit zalo_personal instance, QR login, contacts picker, message round-trip

Onboarding

onboarding_compressed.mp4

Chat

ScreenRecording_03-03-2026_compressed.mp4

vanducng added 8 commits March 1, 2026 16:54
Implement complete Zalo Personal Chat integration including:
- Message protocol layer (request/response/event types)
- Connection management with auth flow
- Message sending/receiving with text and media support
- User/group management and sync
- Telegram-style contact and conversation handling
- Comprehensive unit tests with 85%+ coverage

Architecture follows existing channel patterns (Telegram, Feishu) with
raw API calls for session management and message delivery. Includes
error handling, rate limiting awareness, and logging.
Wire protocol package to GoClaw's channel system:
- channel.go: Channel struct, Start/Stop/Send, listenLoop, message handlers
- auth.go: credential resolution (preloaded > file > QR), persistence
- policy.go: DM/group policy, @mention gating, pairing with debounce
- factory.go: managed mode factory (requires credentials, no QR)
- cmd/gateway.go: register standalone + managed factory
Add zalo_personal to channel type dropdown, credential fields
(IMEI, cookie, userAgent), and config schema (DM/group policy,
require_mention, allow_from).
Add real-time QR code login flow for zalo_personal channel instances
in managed mode. Users create an instance without credentials, then
trigger QR login from the web dashboard.

Backend:
- New RPC method zalo.personal.qr.start with per-instance mutex
- QR PNG pushed via client-scoped WS events (not broadcast)
- Credentials encrypted and saved to DB on successful scan
- Cache invalidation triggers automatic channel reload/start
- Factory returns nil,nil for missing credentials (skip, not error)
- Instance loader handles nil-channel gracefully

Frontend:
- ZaloPersonalQRDialog with auto-start, retry, and auto-close
- QR button in channel instances table for zalo_personal type
- Credential fields no longer required (auto-populated via QR)
QR flow already validates session via qrCheckSession + qrGetUserInfo.
Calling LoginWithCredentials again conflicts with the active QR session
state, causing "empty response" errors. Credentials are validated when
the channel starts instead. Also rename log prefix from "zca" to
"Zalo Personal".
BuildCookieJar only set cookies for chat.zalo.me but the login API
uses wpa.chat.zalo.me. Cookies weren't sent to the subdomain, causing
"empty response" on channel startup. Now sets cookies for both hosts.
…ener

The UTF-8 validity check in decryptAESGCMPayload ran on raw decrypted
bytes before gzip decompression, causing all encType=2 (AES-GCM+gzip)
messages to fail with "decrypted payload is not valid UTF-8".

Move the check to decryptEventData so it runs after all processing
(decryption + decompression) is complete.
…ersonal

- Remove credential text fields for zalo_personal, show QR auth info banner
- Add has_credentials boolean to HTTP and WS mask functions
- Implement FetchFriends/FetchGroups protocol (encrypted Zalo API)
- Add zalo.personal.contacts WS RPC method with parallel fetch
- Create ZaloContactsPicker component with search, selection, manual entry
- Integrate picker in channel instance edit dialog for allow_from config
@vanducng vanducng force-pushed the feat/28-zalo-personal-channel branch from 0e0131c to 78e2f63 March 1, 2026 11:57
vanducng and others added 12 commits March 1, 2026 18:59
…tion

The Zalo API returns double-wrapped responses: outer envelope contains
encrypted base64 data, which when decrypted yields another Response
envelope with error_code and data fields. The decryptDataField helper
was returning the raw decrypted bytes without unwrapping the inner
envelope, causing json unmarshal failures when parsing friends/groups.
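
A rough sketch of the unwrapping step, assuming the inner envelope is plain JSON with the `error_code` and `data` fields named above. The type and function names are hypothetical, and treating `error_code` 0 as success is an assumption rather than something the commit states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors the inner response shape described in the commit message.
type envelope struct {
	ErrorCode int             `json:"error_code"`
	Data      json.RawMessage `json:"data"`
}

// unwrapInner (hypothetical) parses decrypted bytes as a second envelope and
// returns only its data field, so callers can unmarshal friends or groups.
func unwrapInner(decrypted []byte) (json.RawMessage, error) {
	var inner envelope
	if err := json.Unmarshal(decrypted, &inner); err != nil {
		return nil, fmt.Errorf("parse inner envelope: %w", err)
	}
	// Assumption: a non-zero error_code signals failure.
	if inner.ErrorCode != 0 {
		return nil, fmt.Errorf("inner envelope error_code=%d", inner.ErrorCode)
	}
	return inner.Data, nil
}

func main() {
	decrypted := []byte(`{"error_code":0,"data":[{"id":"1"}]}`)
	data, err := unwrapInner(decrypted)
	fmt.Println(string(data), err)
}
```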
The Zalo group info endpoint uses a version-based caching mechanism.
Passing the actual version from step 1 causes the server to return
the group in "unchangedsGroup" with empty "gridInfoMap". By passing
version 0 for all groups, we force the server to return full group
info including name, avatar, and member count.
When the edit modal is reopened with already-selected contact IDs,
contacts are now auto-fetched so badges show display names instead
of raw numeric IDs.
SendMessage used io.ReadAll + json.Unmarshal directly but the response
is gzip-compressed (Accept-Encoding: gzip header). Use readJSON() which
handles gzip decompression, fixing "invalid character '\x1f'" errors.
The Zalo send message API response is encrypted like all other endpoints.
Parse outer envelope, decrypt the data field, then extract msgId from
the decrypted inner response.
- Migrate WebSocket client from gorilla to coder/websocket, eliminating
  unsafe/reflect hacks for RSV1 decompression and buffer inspection
- Add channel-level restart with exponential backoff (2s→60s cap, max 10)
  so channels auto-recover instead of stopping permanently
- Reset listener retry counters after 60s stable connection to prevent
  long-lived connections from exhausting retry budget
- Add code 3000 (duplicate session) recovery with 60s initial delay
- Detect silent disconnects via read deadline (2.5x ping interval)
- Fix Stop() to always cancel context, preventing reconnect timer leaks
- Refactor UI channel form into wizard-based flow with registry pattern
- Auto-refresh channel status after create/update dialog closes
Move Zalo personal channel RPC handlers from internal/gateway/methods to
internal/channels/zalo/personal/zalomethods, improving code organization
and removing prefix redundancy. Rename types: ZaloPersonalQRMethods →
QRMethods, ZaloPersonalContactsMethods → ContactsMethods.

- Move zalo_personal_qr.go → zalomethods/qr.go
- Move zalo_personal_contacts.go → zalomethods/contacts.go
- Update imports in cmd/gateway.go (2 call sites)
- Update internal/channels/zalo/personal imports
Show "typing..." in Zalo while the LLM processes messages, matching
the Telegram/Discord pattern. Uses the shared typing.Controller with
4s keepalive (Zalo typing expires ~5s) and 60s TTL safety net.
- Add Raw field to Content struct to preserve non-string JSON payloads
- Add Attachment struct with IsImage() detection (ext + Zalo CDN paths)
- Add AttachmentText() for human-readable placeholders (image/file/other)
- Download image attachments to temp files for agent vision pipeline
- Non-image files get text placeholder only (no download)
- Fix URL query param stripping in file extension detection
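
The query-stripping fix in the last bullet can be illustrated with a small sketch. The extension set and function names are assumptions, and the Zalo CDN path heuristics mentioned above are deliberately omitted because the commit does not spell them out.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// imageExts is an assumed set of image extensions for illustration.
var imageExts = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".gif": true, ".webp": true,
}

// fileExt (hypothetical) returns the lowercased extension of the URL path,
// ignoring any query string or fragment.
func fileExt(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return strings.ToLower(path.Ext(u.Path))
}

// isImageURL (hypothetical) reports whether the URL path ends in a known
// image extension.
func isImageURL(rawURL string) bool {
	return imageExts[fileExt(rawURL)]
}

func main() {
	fmt.Println(isImageURL("https://example.com/a/photo.JPG?size=large&v=2"))
	fmt.Println(isImageURL("https://example.com/doc.pdf?preview=.png"))
}
```

Parsing the URL first means a query value that happens to end in `.png` cannot make a PDF look like an image.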
… jar fix

coder/websocket did not propagate session cookies for wss:// URLs,
causing Zalo backend to reject connections with "zpw_sek not found".
Switch to gorilla/websocket which handles wss→https scheme conversion
natively. Add wsJar safety wrapper and fix Close() mutex consistency.

Also update Makefile `up` target to use --no-cache builds.
Replace wsJar wrapper with direct cookie injection from chat.zalo.me
base domain. Fixes host-only cookies (zpw_sek) not matching WS
subdomains (ws*-msg.chat.zalo.me) due to Go cookiejar limitations.
@vanducng vanducng changed the title from "feat(channels): Zalo Personal QR-only onboarding + contacts picker" to "feat(channels): introduce Zalo Personal channel integration" Mar 2, 2026
@vanducng vanducng marked this pull request as ready for review March 2, 2026 01:13
vanducng added 5 commits March 2, 2026 08:17
- Add SSRF protection to downloadFile using CheckSSRF (URL validation,
  private IP blocking, DNS pinning) with context and 30s timeout
- Protect c.sess/c.listener with sync.RWMutex to eliminate data races
  during restart; add thread-safe session()/getListener() accessors
- Add stopped flag + reconnTimer to Listener to prevent zombie reconnects
  after Stop(); timer cancelled on Stop(), checked before Start()
- Fix QR flow using context.Background() detached from WS client; now
  derives from parent ctx so flow cancels on client disconnect
- Set initial 30s read deadline for cipher key handshake to prevent
  indefinite blocking before ping loop starts
- Use defer in WSClient.Close() to prevent connection leak on panic
- Document ReadMessage ctx limitation and two-layer reconnect design
gobwas/ws was a leftover from the previous coder/websocket usage,
no longer imported by any Go source files.
Policy defaults were inconsistent across three layers causing group/DM
allowlist enforcement to silently fail. New() applied "allowlist" default
to local vars but never wrote back to config; checkGroupPolicy() then
read empty string and defaulted to "open", bypassing the allowlist.
UI Select components displayed schema defaults visually without
persisting them to configValues, so DB config never stored the policy.
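
The backend half of this bug can be sketched as below, assuming a simplified config struct. `Config`, `applyDefaults`, and `groupAllowed` are hypothetical names; failing closed on an unrecognized policy is a design choice of this sketch, not something the commit states.

```go
package main

import "fmt"

// Config is a simplified stand-in for the channel config.
type Config struct {
	DMPolicy    string
	GroupPolicy string
}

const defaultPolicy = "allowlist"

// applyDefaults writes the defaults back into cfg, so later readers never
// see an empty policy string.
func applyDefaults(cfg *Config) {
	if cfg.DMPolicy == "" {
		cfg.DMPolicy = defaultPolicy
	}
	if cfg.GroupPolicy == "" {
		cfg.GroupPolicy = defaultPolicy
	}
}

// groupAllowed reads the policy from cfg. Assumption: "open" admits every
// group, "allowlist" requires membership, anything else is denied.
func groupAllowed(cfg *Config, groupID string, allow map[string]bool) bool {
	switch cfg.GroupPolicy {
	case "open":
		return true
	case "allowlist":
		return allow[groupID]
	default:
		return false
	}
}

func main() {
	cfg := &Config{}
	applyDefaults(cfg)
	allow := map[string]bool{"g2": true}
	fmt.Println(cfg.GroupPolicy, groupAllowed(cfg, "g1", allow), groupAllowed(cfg, "g2", allow))
}
```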
Resolve conflict in channel-instance-form-dialog.tsx by keeping
the wizard flow from the feature branch while incorporating
TelegramGroupOverrides from upstream. Add zalo_personal to the
shared CHANNEL_TYPES constant.
Resolve conflict in channels-page.tsx: keep wizard auth/edit buttons
from feature branch, integrate row-click navigation from upstream
with stopPropagation on action buttons. Restore editInstance state
and updateInstance hook that were lost in auto-merge.

@viettranx viettranx left a comment

Summary

Introduces a full Zalo Personal channel integration — reverse-engineered personal Zalo chat protocol. Includes: protocol layer (auth, crypto, WebSocket listener, message send/receive), channel lifecycle with reconnect/backoff, QR login flow via WS-pushed QR codes, contacts picker, and Web UI wizard. Well-structured, well-tested (~1200 lines of tests), follows existing channel patterns closely.

Risk level: Medium — Large surface area (5700+ lines, 45 files), unofficial/reverse-engineered protocol, but well-isolated in its own package with no modifications to core logic.

Findings

Critical

None found.

Important

1. Credentials directory created world-readable (auth.go:90)
os.MkdirAll(filepath.Dir(path), 0755) — directory should be 0700 to prevent other users from discovering the credentials file.

2. Temp files from image downloads never cleaned up (channel.go:628-682)
downloadFile() creates temp files via os.CreateTemp but there's no cleanup. These accumulate indefinitely.

3. Data race: sendPairingReply accesses c.sess without lock (policy.go:47)
if c.pairingService == nil || c.sess == nil reads c.sess directly, but it's protected by c.mu. Should use c.session().

4. Data race: checkGroupPolicy accesses c.sess.UID without lock (policy.go:99)
isBotMentioned(c.sess.UID, mentions) — should capture via sess := c.session() first.

5. Makefile --no-cache is likely a debugging leftover (Makefile:22)
Changed $(COMPOSE) up -d --build → $(COMPOSE) build --no-cache + $(COMPOSE) up -d. Disables Docker layer caching, significantly slowing builds.

Suggestions

6. io.ReadAll without size limit in loadLoginPage() (auth.go:226) — consider io.LimitReader

7. decompressGzip uses unbounded io.ReadAll (listener_handlers.go:159) — malicious payload could cause OOM

8. Zero IV in AES-CBC is correctly documented as Zalo protocol requirement, but adding a brief note about why deterministic encryption is acceptable here would help future maintainers (crypto.go:41)

9. Inconsistent alignment in ChannelsConfig struct creates unnecessary diff noise for unchanged fields (config_channels.go:9-11)

10. Factory nil, nil return is a novel pattern — correctly guarded in instance_loader.go:121-124, but worth a comment in the factory docstring noting the intentional deviation from other factories

11. context.Background() in runQRFlow credential save (qr.go:134) — likely intentional (don't lose creds if client disconnects), but worth a comment

Verdict

Request changes — The 2 data races in policy.go, the Makefile --no-cache regression, and the directory permission issue should be addressed. The data races are particularly concerning since message handling is concurrent.

Quick fixes:

  1. policy.go:47 → use c.session() instead of c.sess
  2. policy.go:99 → capture sess := c.session() before sess.UID
  3. auth.go:90 → change 0755 to 0700
  4. Makefile → revert --no-cache
  5. Add temp file cleanup for downloaded images

@viettranx viettranx merged commit 0f5dd08 into nextlevelbuilder:main Mar 3, 2026
itsddvn pushed a commit to itsddvn/goclaw that referenced this pull request Mar 4, 2026
…lbuilder#32)

* feat(channels): implement Zalo Personal Chat (ZCA) protocol layer

Implement complete Zalo Personal Chat integration including:
- Message protocol layer (request/response/event types)
- Connection management with auth flow
- Message sending/receiving with text and media support
- User/group management and sync
- Telegram-style contact and conversation handling
- Comprehensive unit tests with 85%+ coverage

Architecture follows existing channel patterns (Telegram, Feishu) with
raw API calls for session management and message delivery. Includes
error handling, rate limiting awareness, and logging.

* feat(channels): add Zalo Personal channel integration layer

Wire protocol package to GoClaw's channel system:
- channel.go: Channel struct, Start/Stop/Send, listenLoop, message handlers
- auth.go: credential resolution (preloaded > file > QR), persistence
- policy.go: DM/group policy, @mention gating, pairing with debounce
- factory.go: managed mode factory (requires credentials, no QR)
- cmd/gateway.go: register standalone + managed factory

* feat(ui): add Zalo Personal channel type to web dashboard

Add zalo_personal to channel type dropdown, credential fields
(IMEI, cookie, userAgent), and config schema (DM/group policy,
require_mention, allow_from).

* feat(channels): add WebSocket QR login for Zalo Personal channel

Add real-time QR code login flow for zalo_personal channel instances
in managed mode. Users create an instance without credentials, then
trigger QR login from the web dashboard.

Backend:
- New RPC method zalo.personal.qr.start with per-instance mutex
- QR PNG pushed via client-scoped WS events (not broadcast)
- Credentials encrypted and saved to DB on successful scan
- Cache invalidation triggers automatic channel reload/start
- Factory returns nil,nil for missing credentials (skip, not error)
- Instance loader handles nil-channel gracefully

Frontend:
- ZaloPersonalQRDialog with auto-start, retry, and auto-close
- QR button in channel instances table for zalo_personal type
- Credential fields no longer required (auto-populated via QR)

* fix(channels): skip redundant LoginWithCredentials after QR login

QR flow already validates session via qrCheckSession + qrGetUserInfo.
Calling LoginWithCredentials again conflicts with the active QR session
state, causing "empty response" errors. Credentials are validated when
the channel starts instead. Also rename log prefix from "zca" to
"Zalo Personal".

* fix(channels): fix Zalo Personal cookie domain for login API

BuildCookieJar only set cookies for chat.zalo.me but the login API
uses wpa.chat.zalo.me. Cookies weren't sent to the subdomain, causing
"empty response" on channel startup. Now sets cookies for both hosts.

* fix(channels): move UTF-8 check after gzip decompression in Zalo listener

The UTF-8 validity check in decryptAESGCMPayload ran on raw decrypted
bytes before gzip decompression, causing all encType=2 (AES-GCM+gzip)
messages to fail with "decrypted payload is not valid UTF-8".

Move the check to decryptEventData so it runs after all processing
(decryption + decompression) is complete.

* feat(channels): add QR-only onboarding and contacts picker for Zalo Personal

- Remove credential text fields for zalo_personal, show QR auth info banner
- Add has_credentials boolean to HTTP and WS mask functions
- Implement FetchFriends/FetchGroups protocol (encrypted Zalo API)
- Add zalo.personal.contacts WS RPC method with parallel fetch
- Create ZaloContactsPicker component with search, selection, manual entry
- Integrate picker in channel instance edit dialog for allow_from config

* refactor(channels): rename zca error prefix to zalo_personal across protocol package

* fix(channels): unwrap inner response envelope in Zalo contacts decryption

The Zalo API returns double-wrapped responses: outer envelope contains
encrypted base64 data, which when decrypted yields another Response
envelope with error_code and data fields. The decryptDataField helper
was returning the raw decrypted bytes without unwrapping the inner
envelope, causing json unmarshal failures when parsing friends/groups.

* fix(channels): pass version 0 for group details to get full data

The Zalo group info endpoint uses a version-based caching mechanism.
Passing the actual version from step 1 causes the server to return
the group in "unchangedsGroup" with empty "gridInfoMap". By passing
version 0 for all groups, we force the server to return full group
info including name, avatar, and member count.

* fix(ui): auto-load contacts on modal reopen to resolve display names

When the edit modal is reopened with already-selected contact IDs,
contacts are now auto-fetched so badges show display names instead
of raw numeric IDs.

* fix(channels): handle gzip-compressed response in Zalo SendMessage

SendMessage used io.ReadAll + json.Unmarshal directly but the response
is gzip-compressed (Accept-Encoding: gzip header). Use readJSON() which
handles gzip decompression, fixing "invalid character '\x1f'" errors.

* fix(channels): decrypt encrypted send response in Zalo SendMessage

The Zalo send message API response is encrypted like all other endpoints.
Parse outer envelope, decrypt the data field, then extract msgId from
the decrypted inner response.

* feat(channels): improve Zalo listener reliability and UI channel wizard

- Migrate WebSocket client from gorilla to coder/websocket, eliminating
  unsafe/reflect hacks for RSV1 decompression and buffer inspection
- Add channel-level restart with exponential backoff (2s→60s cap, max 10)
  so channels auto-recover instead of stopping permanently
- Reset listener retry counters after 60s stable connection to prevent
  long-lived connections from exhausting retry budget
- Add code 3000 (duplicate session) recovery with 60s initial delay
- Detect silent disconnects via read deadline (2.5x ping interval)
- Fix Stop() to always cancel context, preventing reconnect timer leaks
- Refactor UI channel form into wizard-based flow with registry pattern
- Auto-refresh channel status after create/update dialog closes

* refactor(channels): move Zalo RPC methods to zalomethods package

Move Zalo personal channel RPC handlers from internal/gateway/methods to
internal/channels/zalo/personal/zalomethods, improving code organization
and removing prefix redundancy. Rename types: ZaloPersonalQRMethods →
QRMethods, ZaloPersonalContactsMethods → ContactsMethods.

- Move zalo_personal_qr.go → zalomethods/qr.go
- Move zalo_personal_contacts.go → zalomethods/contacts.go
- Update imports in cmd/gateway.go (2 call sites)
- Update internal/channels/zalo/personal imports

* feat(channels): add typing indicator to Zalo Personal channel

Show "typing..." in Zalo while the LLM processes messages, matching
the Telegram/Discord pattern. Uses the shared typing.Controller with
4s keepalive (Zalo typing expires ~5s) and 60s TTL safety net.

* feat(channels): handle image attachments in Zalo Personal channel

- Add Raw field to Content struct to preserve non-string JSON payloads
- Add Attachment struct with IsImage() detection (ext + Zalo CDN paths)
- Add AttachmentText() for human-readable placeholders (image/file/other)
- Download image attachments to temp files for agent vision pipeline
- Non-image files get text placeholder only (no download)
- Fix URL query param stripping in file extension detection

* fix(channels): switch Zalo WS client to gorilla/websocket with cookie jar fix

coder/websocket did not propagate session cookies for wss:// URLs,
causing Zalo backend to reject connections with "zpw_sek not found".
Switch to gorilla/websocket which handles wss→https scheme conversion
natively. Add wsJar safety wrapper and fix Close() mutex consistency.

Also update Makefile `up` target to use --no-cache builds.

* fix(channels): inject cookies manually for Zalo WS connection

Replace wsJar wrapper with direct cookie injection from chat.zalo.me
base domain. Fixes host-only cookies (zpw_sek) not matching WS
subdomains (ws*-msg.chat.zalo.me) due to Go cookiejar limitations.

* fix(channels): harden Zalo Personal channel security and concurrency

- Add SSRF protection to downloadFile using CheckSSRF (URL validation,
  private IP blocking, DNS pinning) with context and 30s timeout
- Protect c.sess/c.listener with sync.RWMutex to eliminate data races
  during restart; add thread-safe session()/getListener() accessors
- Add stopped flag + reconnTimer to Listener to prevent zombie reconnects
  after Stop(); timer cancelled on Stop(), checked before Start()
- Fix QR flow using context.Background() detached from WS client; now
  derives from parent ctx so flow cancels on client disconnect
- Set initial 30s read deadline for cipher key handshake to prevent
  indefinite blocking before ping loop starts
- Use defer in WSClient.Close() to prevent connection leak on panic
- Document ReadMessage ctx limitation and two-layer reconnect design

* chore: remove unused gobwas/ws dependency from go.mod

gobwas/ws was a leftover from the previous coder/websocket usage,
no longer imported by any Go source files.

* fix(channels): align Zalo Personal policy defaults across UI and backend

Policy defaults were inconsistent across three layers causing group/DM
allowlist enforcement to silently fail. New() applied "allowlist" default
to local vars but never wrote back to config; checkGroupPolicy() then
read empty string and defaulted to "open", bypassing the allowlist.
UI Select components displayed schema defaults visually without
persisting them to configValues, so DB config never stored the policy.
viettranx added a commit that referenced this pull request Mar 16, 2026
* fix(channels): start outbound dispatcher before channel check

StartAll() returned early when no channels existed at boot,
skipping the dispatchOutbound goroutine. Channels loaded later
via Reload() assumed the dispatcher was running, causing outbound
messages (agent responses) to never reach Telegram.

Move dispatcher startup before the empty-channel early return so
dynamically loaded channels always have a running consumer.

* feat(ui): add LLM provider warning on overview page and ignore plans dir

Show alert when no providers configured or all disabled, linking to provider settings. Add plans/ to .gitignore.

* feat(onboard): add provider connectivity verification and placeholder seeding

- Add onboard_verify.go: verify API keys via POST to chat/completions
  endpoint (401/403 = fatal, 400/422 = key valid, 5xx = warn)
- Verify all configured providers before seeding in auto-onboard
- Seed disabled placeholder providers (OpenRouter, Synthetic, AliCloud
  API/Sub) for UI discoverability after managed data seeding

* fix: use model ID as display name in OpenAI-compatible provider list

The `owned_by` field (e.g. "system") was incorrectly used as the model
display name, causing all models to show as "system" in the UI dropdown
for providers like AliCloud DashScope.

* fix(chat): show all active agents in chat dropdown

Chat agent selector showed "No agents available" because:
- WS agents.list only returned in-memory router cache (empty in managed mode)
- useEffect had stale [ws] dep that never re-fired after connect

Frontend: switch agent-selector from WS to HTTP /v1/agents API with
proper access control (ListAccessible). Backend: add store-backed
agents.list for WS consumers + Router.IsRunning() helper.

* feat: channel-isolated workspace, resolvePath fix, create_image workspace, summoner Expertise section, bus Topic constants

- Fix resolvePath for nested non-existent dirs (use resolveThroughExistingAncestors)
- Channel-isolated workspace: user_agent_profiles.workspace stores channel prefix,
  used as source of truth with backward compat for existing users
- Loop caches workspace per-user with CacheKindUserWorkspace invalidation via pubsub
- ContractHome/ExpandHome for portable ~-based paths in DB
- create_image saves to workspace/generated/YYYY-MM-DD/ instead of OS temp dir
- SOUL.md template: add ## Expertise section for domain knowledge
- Summoner buildEditPrompt: section guide, complete file output, frontmatter update
- Bus: Topic* constants for Subscribe/Broadcast keys, CacheKind* for payload kinds
- Teams, delegates, sessions, agent links: various enhancements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(telegram): port forum topic features from TS — per-topic config, DM threads, thread fallback, createForumTopic tool, Web UI

Port 4 missing Telegram forum/topic features from TypeScript OpenClaw:

1. Thread-not-found fallback: retry sends without message_thread_id when
   a topic is deleted (sendHTML, sendPhoto, sendVideo, sendAudio,
   sendDocument, stream flush).

2. Per-topic config: hierarchical config resolution (global → wildcard
   group "*" → specific group → specific topic) for groupPolicy,
   requireMention, allowFrom, enabled, skills, systemPrompt.
   New TelegramGroupConfig/TelegramTopicConfig structs, resolveTopicConfig()
   with 10 unit tests.

3. DM topic support: preserve message_thread_id in private chats for
   session isolation. New BuildDMThreadSessionKey, parseRawChatID handles
   🧵 suffix.

4. createForumTopic agent tool: ForumTopicCreator interface decoupled
   from telego, lazy bot resolution via channel manager.

5. Web UI: structured group/topic config form with tri-state booleans
   (Inherit/Yes/No), nested collapsible group and topic entries.

Also fix: forum group pairing reply and approval notification now
correctly set MessageThreadID so messages land in the right topic.
Send() extracts threadID from localKey suffix as fallback for cases
where metadata is absent (e.g. pairing approval via SendToChannel).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Propagate local key for subagent, delegation, and team messages to enable topic/thread-specific routing and context.

* feat: add a hint to bot reply bodies indicating full content is in session history for LLM context.

* fix(store): handle NULL JSONB columns in MCP server scan (#40)

* fix(store): handle NULL JSONB columns in MCP server scan

Scan JSONB nullable columns (args, headers, env, settings) into *[]byte
instead of directly into json.RawMessage to prevent silent scan failures
when database values are SQL NULL. Also initialize result slices with
make() to return empty JSON arrays instead of null.

* fix(store): keep settings scanning direct since column is NOT NULL

* feat(security): enforce group file writer restrictions + harden exec against env/config leaks

Group writer enforcement (managed mode):
- GroupWriterCache with 5min TTL wrapping AgentStore.ListGroupFileWriters
- Tool-level blocking: write_file, edit, read_file (SOUL.md/AGENTS.md), cron mutations
- System prompt injection: non-writers get refusal instructions + filtered context files
- Cache invalidation via bus events on add/remove writer
- Wired through resolver, loop, gateway_managed, gateway_callbacks

Exec security hardening:
- Block /proc/PID/environ and /proc/self/environ reads (env var exfiltration)
- Block strings on /proc files (binary env dump)
- DenyPaths() on ExecTool: block data dir, .goclaw/, config file from exec commands
- Scrub VIRTUAL_* env vars from tool output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Refine tool policies with updated groups and aliases, and enhance credential scrubbing by dynamically detecting and redacting server IPs.

* feat: Add tool allow list configuration and enforcement for Telegram channels, allowing per-group/topic tool restrictions.

* feat: Add support for sending and receiving media attachments in the Feishu channel.

* feat: Add Feishu channel configuration options for topic session mode, message limits, and group allowlist, refine existing field descriptions, and create a staging tarball.

* feat(skills): per-agent skill filtering with grant-based access control (#45)

* fix(store): expand tilde in skills storage directory path

The default skillsDir (~/.goclaw/skills-store) was not expanded,
causing os.MkdirAll to fail when creating skill upload directories.

* feat(skills): per-agent skill filtering with grant-based access control (#42)

Wire skill_agent_grants into the agent resolver so each agent only sees
skills explicitly granted to it. Add Skills tab to the web UI for
managing per-agent skill grants with toggle switches.

- Add SkillAccessStore interface to avoid import cycles
- Filter skills in resolver via ListAccessible + filesystem union
- Add GET /v1/agents/:id/skills endpoint with grant status
- Invoke onGrantChange callback to invalidate agent caches on grant/revoke
- Add agent-skills-tab React component with Switch toggles
- Allow read_file access to managed skills-store directory
- Fix rows.Err() propagation in ListAccessible/ListWithGrantStatus

Closes #42

* feat: centralize agent skill access filtering within the skill search tool and implement optimistic UI updates for skill grants

* feat: Mount channel webhook handlers directly on the main gateway.

* feat: restrict /reset command to file writers in Telegram group chats.

* fix: improve spawn tool team_task_id validation and orphan detection

When LLMs call team_tasks create + spawn in parallel, the spawn
tool receives a hallucinated task_id that fails uuid.Parse, causing
a misleading error and bypassing orphan detection.

- Include pending task IDs in spawn error message so LLM can retry
  with the correct UUID
- Move spawn counting to post-execution so failed spawns don't
  increment teamTaskSpawns, allowing orphan detection to fire

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(feishu): group pairing uses group-level ID instead of per-user

Changed Lark group pairing to use "group:<chatID>" as sender_id
(matching Telegram pattern) so one approval covers the entire group.
Added approvedGroups in-memory cache to avoid DB queries per message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(feishu): mention detection fallback when botOpenID is empty

If probeBotInfo fails (missing bot:read permission), botOpenID is empty
and mention detection always returns false — causing all group messages
to be silently recorded to history instead of being processed.

Now treats any mention as a bot mention when botOpenID is unknown.
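
A minimal sketch of the fallback described above, not the actual Feishu handler. The `mention` type and the `isBotMentioned` name are hypothetical; only `botOpenID` comes from the commit message.

```go
package main

import "fmt"

// mention is a hypothetical, simplified view of one @mention in a group message.
type mention struct {
	OpenID string
}

// isBotMentioned reports whether a group message should be treated as
// addressed to the bot. When botOpenID is unknown (empty), any mention
// counts as a bot mention so the message is not silently dropped.
func isBotMentioned(botOpenID string, mentions []mention) bool {
	if botOpenID == "" {
		return len(mentions) > 0
	}
	for _, m := range mentions {
		if m.OpenID == botOpenID {
			return true
		}
	}
	return false
}

func main() {
	ms := []mention{{OpenID: "ou_user"}}
	fmt.Println(isBotMentioned("", ms))       // true: bot ID unknown, fallback applies
	fmt.Println(isBotMentioned("ou_bot", ms)) // false: bot ID known and not mentioned
}
```

The trade-off is that a group with an unknown bot ID may answer messages that mention only other users, which seems preferable to ignoring every message.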

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(channels): introduce Zalo Personal channel integration (#32)

* feat(channels): implement Zalo Personal Chat (ZCA) protocol layer

Implement complete Zalo Personal Chat integration including:
- Message protocol layer (request/response/event types)
- Connection management with auth flow
- Message sending/receiving with text and media support
- User/group management and sync
- Telegram-style contact and conversation handling
- Comprehensive unit tests with 85%+ coverage

Architecture follows existing channel patterns (Telegram, Feishu) with
raw API calls for session management and message delivery. Includes
error handling, rate limiting awareness, and logging.

* feat(channels): add Zalo Personal channel integration layer

Wire protocol package to GoClaw's channel system:
- channel.go: Channel struct, Start/Stop/Send, listenLoop, message handlers
- auth.go: credential resolution (preloaded > file > QR), persistence
- policy.go: DM/group policy, @mention gating, pairing with debounce
- factory.go: managed mode factory (requires credentials, no QR)
- cmd/gateway.go: register standalone + managed factory

* feat(ui): add Zalo Personal channel type to web dashboard

Add zalo_personal to channel type dropdown, credential fields
(IMEI, cookie, userAgent), and config schema (DM/group policy,
require_mention, allow_from).

* feat(channels): add WebSocket QR login for Zalo Personal channel

Add real-time QR code login flow for zalo_personal channel instances
in managed mode. Users create an instance without credentials, then
trigger QR login from the web dashboard.

Backend:
- New RPC method zalo.personal.qr.start with per-instance mutex
- QR PNG pushed via client-scoped WS events (not broadcast)
- Credentials encrypted and saved to DB on successful scan
- Cache invalidation triggers automatic channel reload/start
- Factory returns nil,nil for missing credentials (skip, not error)
- Instance loader handles nil-channel gracefully

Frontend:
- ZaloPersonalQRDialog with auto-start, retry, and auto-close
- QR button in channel instances table for zalo_personal type
- Credential fields no longer required (auto-populated via QR)

* fix(channels): skip redundant LoginWithCredentials after QR login

QR flow already validates session via qrCheckSession + qrGetUserInfo.
Calling LoginWithCredentials again conflicts with the active QR session
state, causing "empty response" errors. Credentials are validated when
the channel starts instead. Also rename log prefix from "zca" to
"Zalo Personal".

* fix(channels): fix Zalo Personal cookie domain for login API

BuildCookieJar only set cookies for chat.zalo.me but the login API
uses wpa.chat.zalo.me. Cookies weren't sent to the subdomain, causing
"empty response" on channel startup. Now sets cookies for both hosts.

* fix(channels): move UTF-8 check after gzip decompression in Zalo listener

The UTF-8 validity check in decryptAESGCMPayload ran on raw decrypted
bytes before gzip decompression, causing all encType=2 (AES-GCM+gzip)
messages to fail with "decrypted payload is not valid UTF-8".

Move the check to decryptEventData so it runs after all processing
(decryption + decompression) is complete.
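
The ordering issue can be illustrated with a small sketch. `decodeEventData` and `gzipBytes` are hypothetical stand-ins that skip the AES-GCM step entirely; they only show why the UTF-8 check has to run on the decompressed bytes.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"unicode/utf8"
)

// decodeEventData takes already-decrypted bytes, optionally gunzips them,
// and only then validates UTF-8 (hypothetical helper, decryption omitted).
func decodeEventData(decrypted []byte, gzipped bool) (string, error) {
	out := decrypted
	if gzipped {
		zr, err := gzip.NewReader(bytes.NewReader(decrypted))
		if err != nil {
			return "", err
		}
		defer zr.Close()
		if out, err = io.ReadAll(zr); err != nil {
			return "", err
		}
	}
	// The check runs after all processing is complete.
	if !utf8.Valid(out) {
		return "", errors.New("decrypted payload is not valid UTF-8")
	}
	return string(out), nil
}

// gzipBytes compresses s so the example has a gzip payload to work with.
func gzipBytes(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return buf.Bytes()
}

func main() {
	raw := gzipBytes(`{"msg":"xin chào"}`)
	// The gzip magic bytes 0x1f 0x8b are not valid UTF-8, so a check on
	// the compressed bytes rejects every gzip payload.
	fmt.Println(utf8.Valid(raw)) // false
	s, err := decodeEventData(raw, true)
	fmt.Println(s, err)
}
```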

* feat(channels): add QR-only onboarding and contacts picker for Zalo Personal

- Remove credential text fields for zalo_personal, show QR auth info banner
- Add has_credentials boolean to HTTP and WS mask functions
- Implement FetchFriends/FetchGroups protocol (encrypted Zalo API)
- Add zalo.personal.contacts WS RPC method with parallel fetch
- Create ZaloContactsPicker component with search, selection, manual entry
- Integrate picker in channel instance edit dialog for allow_from config

* refactor(channels): rename zca error prefix to zalo_personal across protocol package

* fix(channels): unwrap inner response envelope in Zalo contacts decryption

The Zalo API returns double-wrapped responses: outer envelope contains
encrypted base64 data, which when decrypted yields another Response
envelope with error_code and data fields. The decryptDataField helper
was returning the raw decrypted bytes without unwrapping the inner
envelope, causing json unmarshal failures when parsing friends/groups.
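
A rough sketch of the unwrapping step, assuming the decrypted bytes are JSON with `error_code` and `data` fields as described. The `envelope` type, the `unwrapInner` name, and the convention that `error_code` 0 means success are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors the error_code/data shape described above.
// Field types are assumed; data is kept raw so callers decide how to parse it.
type envelope struct {
	ErrorCode int             `json:"error_code"`
	Data      json.RawMessage `json:"data"`
}

// unwrapInner parses decrypted bytes as the inner envelope and returns
// only its data payload, ready to unmarshal into friends or groups.
func unwrapInner(decrypted []byte) (json.RawMessage, error) {
	var inner envelope
	if err := json.Unmarshal(decrypted, &inner); err != nil {
		return nil, fmt.Errorf("parse inner envelope: %w", err)
	}
	// Assumption: a non-zero error_code signals a failed call.
	if inner.ErrorCode != 0 {
		return nil, fmt.Errorf("inner envelope error_code %d", inner.ErrorCode)
	}
	return inner.Data, nil
}

func main() {
	decrypted := []byte(`{"error_code":0,"data":[{"name":"An"}]}`)
	data, err := unwrapInner(decrypted)
	fmt.Println(string(data), err)
}
```

Unmarshalling the friends list from the returned payload, rather than from the raw decrypted bytes, avoids the type mismatch between an object envelope and the expected array.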

* fix(channels): pass version 0 for group details to get full data

The Zalo group info endpoint uses a version-based caching mechanism.
Passing the actual version from step 1 causes the server to return
the group in "unchangedsGroup" with empty "gridInfoMap". By passing
version 0 for all groups, we force the server to return full group
info including name, avatar, and member count.

* fix(ui): auto-load contacts on modal reopen to resolve display names

When the edit modal is reopened with already-selected contact IDs,
contacts are now auto-fetched so badges show display names instead
of raw numeric IDs.

* fix(channels): handle gzip-compressed response in Zalo SendMessage

SendMessage used io.ReadAll + json.Unmarshal directly but the response
is gzip-compressed (Accept-Encoding: gzip header). Use readJSON() which
handles gzip decompression, fixing "invalid character '\x1f'" errors.

* fix(channels): decrypt encrypted send response in Zalo SendMessage

The Zalo send message API response is encrypted like all other endpoints.
Parse outer envelope, decrypt the data field, then extract msgId from
the decrypted inner response.

* feat(channels): improve Zalo listener reliability and UI channel wizard

- Migrate WebSocket client from gorilla to coder/websocket, eliminating
  unsafe/reflect hacks for RSV1 decompression and buffer inspection
- Add channel-level restart with exponential backoff (2s→60s cap, max 10)
  so channels auto-recover instead of stopping permanently
- Reset listener retry counters after 60s stable connection to prevent
  long-lived connections from exhausting retry budget
- Add code 3000 (duplicate session) recovery with 60s initial delay
- Detect silent disconnects via read deadline (2.5x ping interval)
- Fix Stop() to always cancel context, preventing reconnect timer leaks
- Refactor UI channel form into wizard-based flow with registry pattern
- Auto-refresh channel status after create/update dialog closes
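
The restart schedule (2s→60s cap, max 10) could look roughly like the sketch below. The doubling factor and the `restartDelay` name are assumptions; the commit only states the start value, the cap, and the attempt limit.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay   = 2 * time.Second  // first restart delay
	maxDelay    = 60 * time.Second // cap on any single delay
	maxAttempts = 10               // restart budget
)

// restartDelay returns the wait before restart attempt n (0-based).
// ok is false once the budget is exhausted and the channel should stop.
func restartDelay(attempt int) (d time.Duration, ok bool) {
	if attempt < 0 || attempt >= maxAttempts {
		return 0, false
	}
	d = baseDelay
	// Double per attempt (assumed factor), stopping once the cap is passed.
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d, true
}

func main() {
	for attempt := 0; ; attempt++ {
		d, ok := restartDelay(attempt)
		if !ok {
			fmt.Println("retry budget exhausted after", attempt, "attempts")
			return
		}
		fmt.Println("attempt", attempt, "wait", d)
	}
}
```

With these constants the delays are 2s, 4s, 8s, 16s, 32s, and then 60s for the remaining attempts. Resetting the attempt counter after a stable connection, as the listener change above describes, would be done by the caller.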

* refactor(channels): move Zalo RPC methods to zalomethods package

Move Zalo personal channel RPC handlers from internal/gateway/methods to
internal/channels/zalo/personal/zalomethods, improving code organization
and removing prefix redundancy. Rename types: ZaloPersonalQRMethods →
QRMethods, ZaloPersonalContactsMethods → ContactsMethods.

- Move zalo_personal_qr.go → zalomethods/qr.go
- Move zalo_personal_contacts.go → zalomethods/contacts.go
- Update imports in cmd/gateway.go (2 call sites)
- Update internal/channels/zalo/personal imports

* feat(channels): add typing indicator to Zalo Personal channel

Show "typing..." in Zalo while the LLM processes messages, matching
the Telegram/Discord pattern. Uses the shared typing.Controller with
4s keepalive (Zalo typing expires ~5s) and 60s TTL safety net.

* feat(channels): handle image attachments in Zalo Personal channel

- Add Raw field to Content struct to preserve non-string JSON payloads
- Add Attachment struct with IsImage() detection (ext + Zalo CDN paths)
- Add AttachmentText() for human-readable placeholders (image/file/other)
- Download image attachments to temp files for agent vision pipeline
- Non-image files get text placeholder only (no download)
- Fix URL query param stripping in file extension detection

* fix(channels): switch Zalo WS client to gorilla/websocket with cookie jar fix

coder/websocket did not propagate session cookies for wss:// URLs,
causing Zalo backend to reject connections with "zpw_sek not found".
Switch to gorilla/websocket which handles wss→https scheme conversion
natively. Add wsJar safety wrapper and fix Close() mutex consistency.

Also update Makefile `up` target to use --no-cache builds.

* fix(channels): inject cookies manually for Zalo WS connection

Replace wsJar wrapper with direct cookie injection from chat.zalo.me
base domain. Fixes host-only cookies (zpw_sek) not matching WS
subdomains (ws*-msg.chat.zalo.me) due to Go cookiejar limitations.

* fix(channels): harden Zalo Personal channel security and concurrency

- Add SSRF protection to downloadFile using CheckSSRF (URL validation,
  private IP blocking, DNS pinning) with context and 30s timeout
- Protect c.sess/c.listener with sync.RWMutex to eliminate data races
  during restart; add thread-safe session()/getListener() accessors
- Add stopped flag + reconnTimer to Listener to prevent zombie reconnects
  after Stop(); timer cancelled on Stop(), checked before Start()
- Fix QR flow using context.Background() detached from WS client; now
  derives from parent ctx so flow cancels on client disconnect
- Set initial 30s read deadline for cipher key handshake to prevent
  indefinite blocking before ping loop starts
- Use defer in WSClient.Close() to prevent connection leak on panic
- Document ReadMessage ctx limitation and two-layer reconnect design

* chore: remove unused gobwas/ws dependency from go.mod

gobwas/ws was a leftover from the previous coder/websocket usage and is
no longer imported by any Go source file.

* fix(channels): align Zalo Personal policy defaults across UI and backend

Policy defaults were inconsistent across three layers causing group/DM
allowlist enforcement to silently fail. New() applied "allowlist" default
to local vars but never wrote back to config; checkGroupPolicy() then
read empty string and defaulted to "open", bypassing the allowlist.
UI Select components displayed schema defaults visually without
persisting them to configValues, so DB config never stored the policy.

* feat: Resolve Feishu message mentions by stripping bot mentions and replacing user mentions with names.

* fix(zalo_personal): data races in policy, directory perms, Makefile --no-cache

- Fix 2 data races in policy.go: sendPairingReply and checkGroupPolicy
  accessed c.sess without the read lock — use c.session() accessor
- Fix credentials directory permissions: 0755 → 0700 to prevent other
  users from listing contents
- Revert Makefile --no-cache (debugging leftover that disables Docker
  layer caching)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: mid-loop context compaction + team task user scoping

Add mid-loop compaction to prevent context overflow during long-running
delegated agent runs (e.g. 225K+ tokens causing DashScope timeouts).
Uses same threshold as maybeSummarize (contextWindow * historyShare)
with actual PromptTokens from LLM response. Only compacts the in-memory
messages slice; pendingMsgs preserves full history for session flush.
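
The threshold check might be expressed as below. `shouldCompact` is a hypothetical name, the strict `>` comparison is an assumption, and the numbers in `main` are illustrative rather than taken from the real configuration.

```go
package main

import "fmt"

// shouldCompact reports whether the prompt size reported by the LLM
// exceeds the history budget, contextWindow * historyShare.
func shouldCompact(promptTokens, contextWindow int, historyShare float64) bool {
	threshold := float64(contextWindow) * historyShare
	return float64(promptTokens) > threshold
}

func main() {
	// Illustrative values: a 200K window with 75% reserved for history.
	fmt.Println(shouldCompact(225000, 200000, 0.75)) // true: over budget
	fmt.Println(shouldCompact(40000, 200000, 0.75))  // false: within budget
}
```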

Add user_id/channel columns to team_tasks so end users only see their
own tasks. Delegate/system channels bypass the filter to see all tasks.
Group chats use the group-scoped UserID (group:channel:chatID) so all
members share visibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump RequiredSchemaVersion to 8 for team_tasks user_id migration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Handle bot commands before enriching content to prevent parsing issues with reply/forward context.

* fix(web_fetch): replace regex HTML parsing with DOM-based extraction

Regex-based htmlToMarkdown/htmlToText leaked CSS, JS, and non-content
elements. Replaced with golang.org/x/net/html DOM parser that extracts
<body> only and skips 16 non-content element types (script, style,
noscript, svg, template, iframe, form, nav, footer, etc.).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(delegation): add pending tasks hint when team_task_id not found

LLMs often hallucinate UUIDs when delegating, passing a wrong
team_task_id that doesn't exist. Previously the error was bare
("task not found") with no guidance, causing the model to get stuck.

Now the error includes a list of pending tasks so the model can
self-correct. Also refactored prepareDelegation to resolve team once
instead of 3 separate GetTeamForAgent calls, and extracted
pendingTasksHint() to deduplicate hint-building logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(web_fetch): increase read limit and add empty content detection

- Read limit: maxChars*4 → max(maxChars*10, 512KB) to handle pages with
  large <head> sections (WordPress sites often have 30-50KB+ heads)
- Add warning message when HTML extraction returns empty despite non-empty
  response body (bot protection, JS-only pages)
- Enable HTTP/2 via ForceAttemptHTTP2 on custom Transport
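
The new read limit from the first bullet works out as in this sketch. `readLimit` is a hypothetical name, and 512KB is taken to mean 512 * 1024 bytes.

```go
package main

import "fmt"

// readLimit returns how many bytes of the response body to read:
// ten times maxChars, but never less than 512 KiB.
func readLimit(maxChars int) int {
	const floor = 512 * 1024
	limit := maxChars * 10
	if limit < floor {
		limit = floor
	}
	return limit
}

func main() {
	fmt.Println(readLimit(10000))  // 524288: the floor wins
	fmt.Println(readLimit(100000)) // 1000000: maxChars*10 wins
}
```

Under the old rule (maxChars*4), a maxChars of 10000 allowed only 40000 bytes, which a 30-50KB `<head>` could consume almost entirely before any body content was read.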

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(delegation): notify user on failure before retry + pending tasks hint

Two improvements to delegation UX:

1. When a delegation fails, the announce now instructs the coordinator
   to send a brief friendly message to the user before retrying, so
   users aren't left waiting in silence for minutes.

2. When spawn is called with a wrong team_task_id (LLM hallucinated
   UUID), the error now includes a list of pending tasks so the model
   can self-correct. Also refactored prepareDelegation to resolve team
   once instead of 3 separate GetTeamForAgent calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(delegation): clear senderID in sync delegation context

Sync delegations inherited the caller's senderID, causing the delegate
agent to check group writer permissions against its own (empty) writer
list instead of bypassing like async delegations do. This resulted in
"permission denied: only file writers can modify files" errors when
delegate agents tried to write files.

Fix: clear senderID from the sync delegation context so it behaves
consistently with async delegations (context.Background has no
senderID). All 4 downstream usages of SenderIDFromContext are
group-writer-related and correctly bypass when senderID is empty.
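
One way to clear a context value while keeping the parent's cancellation is to shadow it, sketched below. The key type and all helper names are hypothetical; only the idea of an empty senderID in the delegation context comes from the commit message.

```go
package main

import (
	"context"
	"fmt"
)

// senderKey is a hypothetical unexported context key type.
type senderKey struct{}

// withSenderID returns a child context carrying id as the sender.
func withSenderID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, senderKey{}, id)
}

// senderIDFromContext returns "" when no sender is set.
func senderIDFromContext(ctx context.Context) string {
	id, _ := ctx.Value(senderKey{}).(string)
	return id
}

// syncDelegationContext keeps the parent's deadline and cancellation but
// shadows the sender with an empty value, so writer checks are bypassed.
func syncDelegationContext(parent context.Context) context.Context {
	return withSenderID(parent, "")
}

// demo returns the sender seen by the caller and by the delegate.
func demo() (before, after string) {
	caller := withSenderID(context.Background(), "user-123")
	return senderIDFromContext(caller), senderIDFromContext(syncDelegationContext(caller))
}

func main() {
	before, after := demo()
	fmt.Printf("caller=%q delegate=%q\n", before, after) // caller="user-123" delegate=""
}
```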

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: overhaul documentation for v0.2–v0.5 features

Add new docs for agent teams (11) and extended thinking (12).
Major rewrite of channels/messaging (05) with Telegram forum topics,
Feishu streaming cards, Zalo Personal. Update providers (02), tools (03),
bootstrap/skills (07), security (09), architecture (00), scheduling (08),
and tracing (10) with current implementation details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Update README

* fix(teams): scope ListTasks by userID to prevent cross-group task leaking

Tasks from one Telegram group were being injected into another group's
session because the pending-task hint and /tasks command queried all
tasks team-wide without filtering by the group-scoped userID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(telegram): add draft streaming infrastructure + split dm/group stream config

- Add sendMessageDraft transport (disabled pending Telegram client fix for
  "reply to deleted message" artifact — tdesktop#10315, bugs.telegram.org/c/561)
- Split stream_mode into dm_stream/group_stream boolean flags (both default false)
- DM messages no longer set reply_to_message_id (cleaner UX, matching TS)
- Progressive placeholder editing for DMs: "Thinking..." → stream chunks → final
- Update web UI with separate DM/Group streaming toggles

fix(agent): prevent false MEDIA: detection in tool output

parseMediaResult() used strings.Index to find "MEDIA:" anywhere in tool output,
causing false positives when external content (e.g. GitHub releases page)
contained commit messages like "return MEDIA: path from screenshot".
Changed to strings.HasPrefix to only match at start of output.
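
A small sketch of the prefix-only match. `parseMediaPath` is a hypothetical simplification of `parseMediaResult()`, whose real signature is not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMediaPath returns the path after a leading "MEDIA:" marker.
// The marker is only honoured at the very start of the output.
func parseMediaPath(output string) (string, bool) {
	const prefix = "MEDIA:"
	if !strings.HasPrefix(output, prefix) {
		return "", false
	}
	return strings.TrimSpace(strings.TrimPrefix(output, prefix)), true
}

func main() {
	fmt.Println(parseMediaPath("MEDIA: /tmp/shot.png"))
	// Fetched page text that merely contains the marker is ignored.
	fmt.Println(parseMediaPath(`commit: "return MEDIA: path from screenshot"`))
}
```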

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(delegation): inject dependency results + guard completed task reuse

- Add injectDependencyResults() to auto-inject blocked_by task results
  into delegation context, so delegatees receive prior results without
  needing to search for them (orchestrator-worker pattern)
- Guard against spawning with completed/cancelled team_task_id to
  enforce one-task-per-delegation rule
- Add cross-user scope guard in prepareDelegation() to prevent
  cross-group task leak (delegate/system channels bypass by design)
- Track CompletedTaskIDs in DelegateArtifacts and include them in
  announce messages so lead agent knows not to reuse completed IDs
- UI: reduce trace detail preview heights for better readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: Prioritize timeout error handling and renumber error classification.

* fix(chat): show all active agents in chat dropdown (#48)

* fix(channels): start outbound dispatcher before channel check

StartAll() returned early when no channels existed at boot,
skipping the dispatchOutbound goroutine. Channels loaded later
via Reload() assumed the dispatcher was running, causing outbound
messages (agent responses) to never reach Telegram.

Move dispatcher startup before the empty-channel early return so
dynamically loaded channels always have a running consumer.

* feat(ui): add LLM provider warning on overview page and ignore plans dir

Show alert when no providers configured or all disabled, linking to provider settings. Add plans/ to .gitignore.

* feat(onboard): add provider connectivity verification and placeholder seeding

- Add onboard_verify.go: verify API keys via POST to chat/completions
  endpoint (401/403 = fatal, 400/422 = key valid, 5xx = warn)
- Verify all configured providers before seeding in auto-onboard
- Seed disabled placeholder providers (OpenRouter, Synthetic, AliCloud
  API/Sub) for UI discoverability after managed data seeding
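
The status-code mapping from the first bullet can be sketched like this. The `verdict` type and `classifyStatus` name are hypothetical, and the handling of 2xx and of any other code is an assumption, since the commit lists only 401/403, 400/422, and 5xx.

```go
package main

import "fmt"

// verdict is a hypothetical result of probing a provider endpoint.
type verdict int

const (
	keyValid verdict = iota // key accepted, onboarding can continue
	warn                    // inconclusive, continue with a warning
	fatal                   // key rejected, abort
)

// classifyStatus maps the HTTP status of the chat/completions probe
// to a verdict.
func classifyStatus(status int) verdict {
	switch {
	case status == 401 || status == 403:
		return fatal // authentication or authorization failed
	case status == 400 || status == 422:
		return keyValid // request body rejected, so the key got through
	case status >= 200 && status < 300:
		return keyValid // assumption: a successful call also proves the key
	case status >= 500 && status < 600:
		return warn // provider-side problem, not a verdict on the key
	default:
		return warn // assumption: anything else is treated as inconclusive
	}
}

func main() {
	for _, s := range []int{401, 400, 503} {
		fmt.Println(s, classifyStatus(s))
	}
}
```

Treating 400/422 as success relies on the provider checking credentials before validating the request body, so a deliberately minimal probe request can verify a key without consuming tokens.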

* fix: use model ID as display name in OpenAI-compatible provider list

The `owned_by` field (e.g. "system") was incorrectly used as the model
display name, causing all models to show as "system" in the UI dropdown
for providers like AliCloud DashScope.

* fix(chat): show all active agents in chat dropdown

Chat agent selector showed "No agents available" because:
- WS agents.list only returned in-memory router cache (empty in managed mode)
- useEffect had stale [ws] dep that never re-fired after connect

Frontend: switch agent-selector from WS to HTTP /v1/agents API with
proper access control (ListAccessible). Backend: add store-backed
agents.list for WS consumers + Router.IsRunning() helper.

* fix(security): prevent agent list leaking on empty userID or store error

- Return error instead of falling through to unfiltered router cache
  when userID is empty or DB query fails in managed mode
- Add empty-string guard to isOwnerUser to prevent false owner match

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: ntduc <ntduc@cpp.ai.vn>
Co-authored-by: viettranx <viettranx@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct cron job delivery to Discord channels (#47)

- Override LLM-provided channel ID with context value to prevent
  misrouted deliveries (LLM was confusing guild ID with channel ID)
- Send cron reminder message directly instead of agent response
  so reminders appear as bot notifications in Discord

* fix(teams): filter empty chat_id scopes to prevent Select crash

Radix UI Select.Item requires non-empty value prop. Scope entries
with empty chat_id caused uncaught error on team detail page.

---------

Co-authored-by: ntduc <ntduc@cpp.ai.vn>
Co-authored-by: viettranx <viettranx@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Thieu Nguyen <79964592+thieung@users.noreply.github.com>
Co-authored-by: Duc Nguyen <me@vanducng.dev>
Co-authored-by: Winter279 <103654924+Winter279@users.noreply.github.com>
mrgoonie added a commit that referenced this pull request May 20, 2026
* fix(security): harden upstream critical surfaces

Refs #30

* fix(security): close pre-landing review gaps

Refs #30

* fix(security): close official release blockers