
fix(memory): prevent silent vector index degradation when embedding provider temporarily unavailable #85704

Merged
steipete merged 6 commits into openclaw:main from yaaboo-gif:fix/memory-vector-degradation-guard
May 25, 2026

Conversation

@yaaboo-gif
Contributor

@yaaboo-gif yaaboo-gif commented May 23, 2026

Summary

Prevents memory sync from silently downgrading an existing semantic vector index to FTS-only when the configured embedding provider is temporarily unavailable.

This keeps fresh/default FTS-only indexes working, but refuses destructive fallback when existing metadata proves the index was semantic.

Changes

  • Clear stale provider-null initialization state before aborting so a later sync can retry provider discovery after the embedding provider recovers.
  • Guard both normal sync and safe reindex paths, including embedding-error fallback paths, before rebuilding in FTS-only mode.
  • Add regression coverage for the semantic-index outage guard and update direct sync-op test harnesses for the new retry hook.
  • Add changelog credit for @yaaboo-gif.
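A minimal sketch of the ordering the first two bullets describe: one shared check sits in front of both the normal sync path and the safe reindex path, and it clears cached provider-initialization state before throwing. `SyncOpsSketch`, `assertFtsOnlyFallbackAllowed`, `resetProviderInitState`, and `runSafeReindex` are hypothetical names, not OpenClaw's real API; only the error text is taken from the live output in this PR.

```typescript
// Hypothetical sketch: SyncOpsSketch and its methods are illustrative names,
// not OpenClaw's real MemoryManagerSyncOps API.
type IndexMeta = { model: string } | null;

class SyncOpsSketch {
  provider: object | null = null; // null = embedding provider unavailable
  providerInitPromise: Promise<void> | null = null;
  providerInitialized = false;

  constructor(
    private readonly configuredProvider: string,
    private readonly meta: IndexMeta,
  ) {}

  // Forget the cached "no provider" result so a later sync re-runs discovery.
  private resetProviderInitState(): void {
    this.providerInitPromise = null;
    this.providerInitialized = false;
  }

  // Shared guard, called before either path starts an FTS-only rebuild.
  private assertFtsOnlyFallbackAllowed(): void {
    const model = this.meta?.model;
    const indexIsSemantic = model !== undefined && model !== "fts-only";
    const providerConfigured = this.configuredProvider !== "none";
    if (this.provider === null && indexIsSemantic && providerConfigured) {
      this.resetProviderInitState(); // clear stale state before aborting
      throw new Error(
        `Memory sync aborted: embedding provider "${this.configuredProvider}" is configured but unavailable. ` +
          `Refusing to run sync in fts-only fallback mode to protect existing vector index (current model: ${model}).`,
      );
    }
  }

  runSync(): string {
    this.assertFtsOnlyFallbackAllowed();
    return this.provider === null ? "fts-only sync" : "semantic sync";
  }

  runSafeReindex(): string {
    this.assertFtsOnlyFallbackAllowed();
    return this.provider === null ? "fts-only reindex" : "semantic reindex";
  }
}
```

Failing before any write is the point of this design: the existing vectors are never touched, and because the init state is reset first, the next sync after the provider returns gets a fresh discovery attempt.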

Real behavior proof

Behavior addressed: Existing semantic memory indexes should not be rewritten as FTS-only indexes just because the configured embedding provider is temporarily unavailable.

Real environment tested: OpenClaw 2026.5.20 on macOS arm64 with a local MLX embedding server using jina-embeddings-v5-text-small on 127.0.0.1:8123, plus a local OpenClaw source checkout on macOS for the maintainer refresh at branch head b5f0441.

Exact steps or command run after this patch:

Contributor live proof:

  1. Applied the same provider-retry and semantic-index guard as a production hot-fix to the built memory-core bundle.
  2. Restarted the OpenClaw Gateway.
  3. Ran openclaw memory status --deep against a real memory store.
  4. Intentionally stopped the MLX embedding server.
  5. Triggered memory sync/search activity.
  6. Restarted the MLX embedding server without restarting the Gateway.
  7. Ran memory_search("宙狗 方案").

Maintainer refresh proof:

OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager-sync-ops.startup-catchup.test.ts extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts
.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main --parallel-tests "OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager-sync-ops.startup-catchup.test.ts extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts"

Evidence after fix:

Contributor live runtime output from the affected setup:

$ openclaw memory status --deep
main: 11,686 chunks, semantic vectors: ready, vector dims: 1024
codex: 11,673 chunks, semantic vectors: ready

$ # MLX embedding server intentionally stopped, then memory sync/search triggered
Memory sync aborted: embedding provider "openai" is configured but unavailable.
Refusing to run sync in fts-only fallback mode to protect existing vector index (current model: jina-embeddings-v5-text-small).

$ # MLX embedding server restarted; no Gateway restart
$ memory_search("宙狗 方案")
returned 6 results with vectorScore > 0.50

Maintainer refresh output: 4 memory-core test files passed, 11 tests passed. Autoreview completed clean with no accepted/actionable findings and reported the patch correct.

Observed result after fix: The real semantic vector store was preserved during an embedding-provider outage, sync failed loudly instead of rewriting the semantic index as FTS-only, provider initialization retried after the MLX server returned, and semantic search recovered without a Gateway restart. The refreshed branch also proves fresh and already-FTS-only indexes still sync in FTS-only mode.

What was not tested: A second live MLX/Jina outage run after the maintainer refresh commits; the refreshed code path is covered by regression tests and autoreview, and the contributor live proof covers the affected production scenario.

@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 23, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 23, 2026

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 9:51 AM ET / 13:51 UTC.

Summary
Adds a memory-core guard that retries embedding provider initialization and aborts FTS-only fallback during normal and safe reindex paths when existing metadata shows a semantic index, with regression coverage and a changelog entry.

PR surface: Source +42, Tests +30, Docs +1. Total +73 across 7 files.

Reproducibility: yes, with high confidence from source inspection and the PR's live proof, though I did not execute the repro locally in this read-only review. Current main treats provider:null plus semantic metadata as a full reindex path, while the supplied live output shows the affected MLX-provider outage scenario.

Review metrics: 1 noteworthy metric.

  • Fallback behavior changed: 1 guard added at 2 sync/reindex entry points. The patch changes memory fallback semantics during embedding-provider outages, which is the compatibility-sensitive part maintainers should notice before merge.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • Compatibility-sensitive behavior change: existing semantic-index users with a configured provider will now see memory sync abort during provider outages instead of rebuilding a lexical-only index, so maintainers should intentionally accept that fail-closed behavior before merge.

Maintainer options:

  1. Land the fail-closed guard (recommended)
    Accept the outage-time sync abort for existing semantic indexes because it preserves existing vectors and retries provider discovery when the provider returns.
  2. Ask for a compatibility mode first
    If maintainers want semantic-index users to keep rebuilding lexical-only data during provider outages, request an explicit config or migration design before merge.

Next step before merge
No ClawSweeper repair is needed; the next action is maintainer review of the compatibility-sensitive fail-closed behavior plus normal head checks.

Security
Cleared: The diff touches memory-core runtime logic, tests, and changelog only; it does not add dependencies, workflows, secrets handling, package scripts, or new supply-chain execution paths.

Review details

Best possible solution:

Land the semantic-index protection if maintainers accept fail-closed sync during provider outages, keeping fresh and already-FTS-only fallback behavior intact.

Do we have a high-confidence way to reproduce the issue?

Yes, with high confidence from source inspection and the PR's live proof, though I did not execute the repro locally in this read-only review. Current main treats provider:null plus semantic metadata as a full reindex path, while the supplied live output shows the affected MLX-provider outage scenario.

Is this the best way to solve the issue?

Yes, this is the narrow maintainable fix: clear stale provider initialization state and guard only existing semantic indexes while preserving fresh and already-FTS-only fallback behavior. The only remaining question is maintainer acceptance of the deliberate fail-closed outage behavior.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 89a21db627f5.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live output from a real macOS semantic memory store showing the provider outage abort, preserved semantic vectors, and recovered vector search after provider restart.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The PR addresses a real memory workflow regression where provider outages can silently destroy semantic vector availability for existing users.
  • merge-risk: 🚨 compatibility: Merging changes outage behavior for existing semantic indexes from FTS-only fallback rebuilds to fail-closed sync aborts until the embedding provider recovers.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix live output from a real macOS semantic memory store showing the provider outage abort, preserved semantic vectors, and recovered vector search after provider restart.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live output from a real macOS semantic memory store showing the provider outage abort, preserved semantic vectors, and recovered vector search after provider restart.

Evidence reviewed

PR surface:

Source +42, Tests +30, Docs +1. Total +73 across 7 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 2 | 42 | 0 | +42 |
| Tests | 4 | 30 | 0 | +30 |
| Docs | 1 | 1 | 0 | +1 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 7 | 73 | 0 | +73 |

What I checked:

Likely related people:

  • steipete: Peter Steinberger committed the maintainer refresh commits on this PR and appears as committer on recent memory-core history, including the current provider initialization commit and state-helper extraction. (role: recent area contributor and PR refresh committer; confidence: high; commits: 0a98c2d62697, f4d8393bf4c1, be14365df899; files: extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-sync-ops.ts, extensions/memory-core/src/memory/manager-reindex-state.ts)
  • FullerStackDev: Current-main blame for ensureProviderInitialized and shouldRunFullMemoryReindex attributes the relevant code to commit 0a98c2d authored as FullerStackDev. (role: introduced current provider initialization and reindex behavior; confidence: medium; commits: 0a98c2d62697; files: extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-reindex-state.ts)
  • osolmaz: Commit 7ff29a9 added nearby local embedding worker safety, provider lifecycle, and fallback behavior that this PR interacts with during degraded provider states. (role: adjacent local embedding fallback contributor; confidence: medium; commits: 7ff29a9e6df6; files: extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-provider-state.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 23, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 23, 2026

ClawSweeper PR egg

✨ Hatched: 🥚 common Tiny Review Wisp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: polishes edge cases.
Image traits: location review cove; accessory little merge flag; palette sunrise gold and clean white; mood celebratory; pose peeking out from the egg shell; shell frosted glass shell; lighting soft studio lighting; background miniature CI buoys.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 23, 2026
yaaboo-gif pushed a commit to yaaboo-gif/openclaw that referenced this pull request May 23, 2026
…ic indexes

The previous guard was too broad — it blocked sync for ALL non-none
provider configurations when provider was null, including the default
'auto' path where users without embedding credentials legitimately
build FTS-only indexes.

Narrow the guard to only abort when:
1. provider is null (embedding unavailable)
2. existing index metadata has a semantic model (not 'fts-only')
3. settings.provider is configured and not 'none'

This preserves the legitimate FTS-only fallback for auto/no-provider
users while still protecting existing semantic vector indexes from
silent degradation.

Reported-by: ClawSweeper (PR openclaw#85704 review)
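The three conditions above combine as a plain conjunction. A minimal sketch, assuming a hypothetical `shouldAbortFtsOnlyFallback` helper and a simplified `{ model: string } | null` metadata shape; only `meta.model`, `settings.provider`, and the `'fts-only'` / `'none'` values come from the commit message.

```typescript
// Hypothetical helper for illustration; not OpenClaw's actual function.
type IndexMeta = { model: string } | null;

function shouldAbortFtsOnlyFallback(
  runtimeProvider: object | null,
  meta: IndexMeta,
  configuredProvider: string | undefined,
): boolean {
  // 1. provider is null (embedding unavailable)
  const providerUnavailable = runtimeProvider === null;
  // 2. existing index metadata has a semantic model (not 'fts-only')
  const indexIsSemantic = meta !== null && meta.model !== "fts-only";
  // 3. settings.provider is configured and not 'none'
  const providerConfigured =
    configuredProvider !== undefined && configuredProvider !== "none";
  return providerUnavailable && indexIsSemantic && providerConfigured;
}
```

In this sketch `'auto'` counts as configured, so a fresh `'auto'` user gets through only because condition 2 fails (there is no semantic metadata yet); the real code may classify `'auto'` differently.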
@yaaboo-gif
Contributor Author

Thanks for the review. The blocking P1 compatibility issue has been addressed in the latest head (d266f062) by narrowing the guard to only abort when an existing semantic index would be downgraded to FTS-only. Fresh/default provider: "auto" users and already-FTS-only indexes remain allowed to sync without an embedding provider.

I also updated the PR body with the V2 behavior matrix and explicit after-fix live output for the real behavior proof check.

@clawsweeper re-review

@clawsweeper
Contributor

clawsweeper Bot commented May 24, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 24, 2026
@yaaboo-gif
Contributor Author

Thanks to ClawSweeper and everyone who reviewed! 🙏

V2 narrowed the guard to only protect existing semantic indexes (per review feedback), leaving the legitimate FTS-only fallback intact for auto/no-provider users. All 158 checks are green.

This fix addresses silent vector data loss on Gateway restart when the embedding provider is temporarily unreachable — we've verified it in production (11,715 chunks + 1024-dim vectors fully preserved after restart with MLX offline).

Would appreciate a maintainer review when anyone has time. Happy to make any further adjustments.

yaaboo-gif and others added 6 commits May 25, 2026 14:21
…rovider temporarily unavailable

Two related bugs cause complete loss of semantic vector data:

1. Promise cache deadlock in ensureProviderInitialized():
   When the embedding provider (e.g. local MLX server on port 8123) is
   temporarily unreachable at Gateway startup, loadProviderResult() throws
   and providerInitPromise becomes a permanently cached rejected promise.
   The existing code only clears it on success (providerInitialized=true),
   so the stale rejection blocks all future init attempts until Gateway restart.

2. Silent fts-only overwrite in runSync():
   With the provider stuck at null, shouldRunFullMemoryReindex() compares
   the stored meta.model (e.g. 'jina-embeddings-v5-text-small') against the
   runtime provider model, and since provider is null, falls through to the
   'meta.model !== fts-only' check — returning true. This triggers a full
   reindex where every file is written as fts-only, silently erasing all
   existing 11k+ semantic vectors.
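The fall-through in item 2 can be reproduced with a hypothetical, simplified `shouldRunFullMemoryReindexSketch`. The real `shouldRunFullMemoryReindex()` takes more inputs, and the `meta === null` branch is an assumption, so treat this as an illustration of the described branch order rather than the actual code.

```typescript
// Hypothetical, simplified reconstruction of the branch order described in
// item 2; the real shouldRunFullMemoryReindex() takes more inputs.
type StoredMeta = { model: string } | null;
type RuntimeProvider = { model: string } | null;

function shouldRunFullMemoryReindexSketch(
  meta: StoredMeta,
  provider: RuntimeProvider,
): boolean {
  if (meta === null) return true; // assumed: nothing indexed yet, build it
  if (provider !== null) {
    // Healthy provider: rebuild only when the embedding model changed.
    return meta.model !== provider.model;
  }
  // provider === null is read as "FTS-only mode", so any index that is not
  // already FTS-only looks stale and is fully rebuilt without vectors.
  return meta.model !== "fts-only";
}
```

With a stored Jina model and `provider === null`, the sketch returns `true`, which is the destructive full FTS-only rebuild the commit describes.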

Fix 1: Clear providerInitPromise in the catch block so the next call can
retry initialization (self-healing when the provider comes back online).
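Fix 1 is the common "don't cache a failed promise" pattern. A minimal sketch, assuming a hypothetical `ProviderInitSketch` class with `loadProviderResult` injected as a constructor argument so it runs standalone; the field names mirror the commit message, but this is not the actual memory-core code. Clearing from a `.catch()` handler, rather than inside the async body, guarantees the reset runs after the promise has been stored, even if the loader throws synchronously.

```typescript
// Hypothetical sketch of Fix 1; names mirror the commit message, but
// loadProviderResult is injected so the example runs on its own.
type Provider = { model: string };

class ProviderInitSketch {
  provider: Provider | null = null;
  providerInitialized = false;
  private providerInitPromise: Promise<void> | null = null;

  constructor(private readonly loadProviderResult: () => Promise<Provider>) {}

  ensureProviderInitialized(): Promise<void> {
    if (this.providerInitialized) return Promise.resolve();
    // Concurrent callers share one in-flight attempt.
    if (this.providerInitPromise) return this.providerInitPromise;

    const attempt = (async () => {
      this.provider = await this.loadProviderResult();
      this.providerInitialized = true;
    })();
    this.providerInitPromise = attempt;

    // The fix: on failure, drop the cached promise so the next call retries
    // instead of re-awaiting the same rejection until a Gateway restart.
    // The identity check avoids clobbering a newer attempt.
    attempt.catch(() => {
      if (this.providerInitPromise === attempt) this.providerInitPromise = null;
    });
    return attempt;
  }
}
```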

Fix 2: Guard runSync() — if requestedProvider is set and not 'none', but
the runtime provider is null, throw an error instead of silently degrading
to fts-only. This protects existing vector data by failing loudly.

Tested on production: 11,715 chunks + 1024-dim vectors fully preserved
after Gateway restart with the fix applied. The guard correctly blocks
sync when MLX is offline and allows normal operation when it recovers.

fix: use this.settings.provider instead of private requestedProvider

The guard clause in runSync() was referencing this.requestedProvider
which is a private property on the MemoryIndexManager subclass and not
accessible from MemoryManagerSyncOps. Use this.settings.provider
instead, which is the same value and is accessible via the protected
abstract settings property.
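TypeScript's visibility rules explain why `this.requestedProvider` is unusable from the base class while `this.settings.provider` works. A minimal sketch with hypothetical `...Sketch` classes standing in for `MemoryManagerSyncOps` and `MemoryIndexManager`; the `MemorySettings` shape is assumed.

```typescript
// Hypothetical stand-ins for MemoryManagerSyncOps / MemoryIndexManager;
// the MemorySettings shape is assumed for illustration.
type MemorySettings = { provider: string };

abstract class MemoryManagerSyncOpsSketch {
  // Declared (not implemented) here, so base-class code may read it.
  protected abstract readonly settings: MemorySettings;

  configuredProvider(): string {
    // `this.requestedProvider` would be a type error here: the base class
    // declares no such member, and the subclass field is `private`.
    return this.settings.provider;
  }
}

class MemoryIndexManagerSketch extends MemoryManagerSyncOpsSketch {
  private readonly requestedProvider: string;
  protected readonly settings: MemorySettings;

  constructor(provider: string) {
    super();
    this.requestedProvider = provider;
    this.settings = { provider };
  }

  // Both fields carry the same value, as the commit message notes.
  settingsMatchRequest(): boolean {
    return this.requestedProvider === this.settings.provider;
  }
}
```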
@steipete steipete force-pushed the fix/memory-vector-degradation-guard branch from d266f06 to b5f0441 Compare May 25, 2026 13:40
@openclaw-barnacle openclaw-barnacle Bot added size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. proof: supplied External PR includes structured after-fix real behavior proof. and removed size: XS proof: supplied External PR includes structured after-fix real behavior proof. proof: sufficient ClawSweeper judged the real behavior proof convincing. triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 25, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 25, 2026
@steipete steipete merged commit 16ffc25 into openclaw:main May 25, 2026
203 of 220 checks passed
steipete added a commit that referenced this pull request May 25, 2026
…rovider temporarily unavailable (#85704)

* fix(memory): prevent silent vector index degradation when embedding provider temporarily unavailable

Two related bugs cause complete loss of semantic vector data:

1. Promise cache deadlock in ensureProviderInitialized():
   When the embedding provider (e.g. local MLX server on port 8123) is
   temporarily unreachable at Gateway startup, loadProviderResult() throws
   and providerInitPromise becomes a permanently cached rejected promise.
   The existing code only clears it on success (providerInitialized=true),
   so the stale rejection blocks all future init attempts until Gateway restart.

2. Silent fts-only overwrite in runSync():
   With the provider stuck at null, shouldRunFullMemoryReindex() compares
   the stored meta.model (e.g. 'jina-embeddings-v5-text-small') against the
   runtime provider model, and since provider is null, falls through to the
   'meta.model !== fts-only' check — returning true. This triggers a full
   reindex where every file is written as fts-only, silently erasing all
   existing 11k+ semantic vectors.

Fix 1: Clear providerInitPromise in the catch block so the next call can
retry initialization (self-healing when the provider comes back online).

Fix 2: Guard runSync() — if requestedProvider is set and not 'none', but
the runtime provider is null, throw an error instead of silently degrading
to fts-only. This protects existing vector data by failing loudly.

Tested on production: 11,715 chunks + 1024-dim vectors fully preserved
after Gateway restart with the fix applied. The guard correctly blocks
sync when MLX is offline and allows normal operation when it recovers.

* fix: use this.settings.provider instead of private requestedProvider

The guard clause in runSync() was referencing this.requestedProvider
which is a private property on the MemoryIndexManager subclass and not
accessible from MemoryManagerSyncOps. Use this.settings.provider
instead, which is the same value and is accessible via the protected
abstract settings property.

* fix(memory): narrow degradation guard to only protect existing semantic indexes

The previous guard was too broad — it blocked sync for ALL non-none
provider configurations when provider was null, including the default
'auto' path where users without embedding credentials legitimately
build FTS-only indexes.

Narrow the guard to only abort when:
1. provider is null (embedding unavailable)
2. existing index metadata has a semantic model (not 'fts-only')
3. settings.provider is configured and not 'none'

This preserves the legitimate FTS-only fallback for auto/no-provider
users while still protecting existing semantic vector indexes from
silent degradation.

Reported-by: ClawSweeper (PR #85704 review)

* test: cover memory semantic index outage guard

* fix: protect semantic memory index fallback paths

* test: update memory sync harnesses

---------

Co-authored-by: Bo Yan <yaaboo-gif@users.noreply.github.com>
Co-authored-by: Yan Bo <yanbo@Mac.lan>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
(cherry picked from commit 16ffc25)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…rovider temporarily unavailable (openclaw#85704)

* fix(memory): prevent silent vector index degradation when embedding provider temporarily unavailable

Two related bugs cause complete loss of semantic vector data:

1. Promise cache deadlock in ensureProviderInitialized():
   When the embedding provider (e.g. local MLX server on port 8123) is
   temporarily unreachable at Gateway startup, loadProviderResult() throws
   and providerInitPromise becomes a permanently cached rejected promise.
   The  block only clears it on success (providerInitialized=true),
   so the stale rejection blocks all future init attempts until Gateway restart.

2. Silent fts-only overwrite in runSync():
   With the provider stuck at null, shouldRunFullMemoryReindex() compares
   the stored meta.model (e.g. 'jina-embeddings-v5-text-small') against the
   runtime provider model, and since provider is null, falls through to the
   'meta.model !== fts-only' check — returning true. This triggers a full
   reindex where every file is written as fts-only, silently erasing all
   existing 11k+ semantic vectors.

Fix 1: Clear providerInitPromise in the catch block so the next call can
retry initialization (self-healing when the provider comes back online).

Fix 2: Guard runSync() — if requestedProvider is set and not 'none', but
the runtime provider is null, throw an error instead of silently degrading
to fts-only. This protects existing vector data by failing loudly.

Tested on production: 11,715 chunks + 1024-dim vectors fully preserved
after Gateway restart with the fix applied. The guard correctly blocks
sync when MLX is offline and allows normal operation when it recovers.

* fix: use this.settings.provider instead of private requestedProvider

The guard clause in runSync() was referencing this.requestedProvider
which is a private property on the MemoryIndexManager subclass and not
accessible from MemoryManagerSyncOps. Use this.settings.provider
instead, which is the same value and is accessible via the protected
abstract settings property.

* fix(memory): narrow degradation guard to only protect existing semantic indexes

The previous guard was too broad — it blocked sync for ALL non-none
provider configurations when provider was null, including the default
'auto' path where users without embedding credentials legitimately
build FTS-only indexes.

Narrow the guard to only abort when:
1. provider is null (embedding unavailable)
2. existing index metadata has a semantic model (not 'fts-only')
3. settings.provider is configured and not 'none'

This preserves the legitimate FTS-only fallback for auto/no-provider
users while still protecting existing semantic vector indexes from
silent degradation.

Reported-by: ClawSweeper (PR openclaw#85704 review)

* test: cover memory semantic index outage guard

* fix: protect semantic memory index fallback paths

* test: update memory sync harnesses

---------

Co-authored-by: Bo Yan <yaaboo-gif@users.noreply.github.com>
Co-authored-by: Yan Bo <yanbo@Mac.lan>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 27, 2026
…026.5.26) (#682)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.22` → `2026.5.26` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.26`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026526)

[Compare Source](https://github.com/openclaw/openclaw/compare/v2026.5.22...v2026.5.26)

##### Highlights

- Faster Gateway and replies: startup avoids repeated plugin, channel, session, usage-cost, warning, scheduled-service, and filesystem scans; visible replies separate user-facing sends from slower follow-up work; Gateway runtime/session caches churn less under load.
- Transcripts are core: transcript-backed meeting summaries, source-provider chunks, cleaned user turns, media provenance, Codex mirrors, WebChat replies, and CLI/TUI replay now use one more reliable transcript path.
- More channels are production-ready: Telegram keeps typing/progress context and forum topics, iMessage handles attachment roots, remote media staging, and duplicate local Messages sources, WhatsApp restores group/media behavior, Discord improves voice playback and model picking, and Signal/iMessage/WhatsApp get reaction approvals.
- Better voice and Talk: realtime Talk runs can be inspected, steered, cancelled, or followed up from Web UI and Discord voice; wake-name handling is more tolerant without letting ambient speech trigger agents.
- Safer content boundaries: Browser snapshot reads honor SSRF policy, system-event text cannot spoof nested prompt markers, fetched file text is wrapped as external content, ClickClack inbound sender allowlists run before agent dispatch, stale device tokens are rejected, and serialized tool-call text is scrubbed from replies.
- Providers, Codex, and local models are steadier: named auth profiles, OpenAI sampling params, Codex app-server resume/timeout/usage-limit recovery, dynamic tool-schema guards, xAI usage-limit surfacing, Ollama top-p normalization, and local approval resolution reduce provider-specific dead ends.
- More reliable install/update/release paths: Alpine installs, trusted runtime fallback roots, stable update channels, Docker/package timeouts, Windows Scheduled Tasks, Windows/macOS proof lanes, Testbox/Crabbox delegation, plugin publish checks, and macOS runner bootstraps all got hardened.
- Better observability: Activity tab, gateway secret-prep traces, tool/model stream progress, explicit fast-mode status, systemd Gateway hygiene, OpenTelemetry LLM spans, release performance evidence, and richer telemetry signals make failures easier to inspect.

##### Changes

- Transcripts: add core transcript capture and source-provider support for transcript-backed meeting summaries, including the renamed Transcripts docs, CLI surface, source-provider chunks, and cleaned user-turn persistence.
- Auth: add named model login profiles and supported credential migration for Hermes, OpenCode, and Codex auth profiles, with explicit opt-out and non-interactive controls. ([#&#8203;85667](https://github.com/openclaw/openclaw/issues/85667)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Diagnostics: trace gateway secret preparation, classify skill/tool usage, surface model stream progress, add OpenTelemetry LLM content spans, and expose alertable telemetry for blocked tools, failover, stale sessions, liveness, oversized payloads, and webhook ingress. ([#&#8203;83019](https://github.com/openclaw/openclaw/issues/83019), [#&#8203;80370](https://github.com/openclaw/openclaw/issues/80370), [#&#8203;86191](https://github.com/openclaw/openclaw/issues/86191))
- Channels: add Signal reaction approvals, iMessage thumb approval reactions, and WhatsApp thumb approval reaction support so mobile approval flows work without textual `/approve` commands. ([#&#8203;85894](https://github.com/openclaw/openclaw/issues/85894), [#&#8203;85952](https://github.com/openclaw/openclaw/issues/85952), [#&#8203;85477](https://github.com/openclaw/openclaw/issues/85477))
- Agents/API: forward OpenAI sampling params through the Gateway and expose estimated context-budget status for active agent runs. ([#&#8203;84094](https://github.com/openclaw/openclaw/issues/84094))
- TUI/status: queue prompts submitted while an agent is busy and show explicit fast-mode state plus richer systemd Gateway hygiene in status output. ([#&#8203;86722](https://github.com/openclaw/openclaw/issues/86722), [#&#8203;87115](https://github.com/openclaw/openclaw/issues/87115), [#&#8203;86976](https://github.com/openclaw/openclaw/issues/86976))
- Exec approvals: hide durable approval actions that are unavailable for the current prompt and keep approval runtime tokens local-only so stale prompts cannot offer misleading controls. ([#&#8203;86270](https://github.com/openclaw/openclaw/issues/86270), [#&#8203;86359](https://github.com/openclaw/openclaw/issues/86359))
- Plugin SDK: add reaction approval helpers and keep diagnostic event root exports discoverable across function-name and alias-bound module graphs. ([#&#8203;86735](https://github.com/openclaw/openclaw/issues/86735), [#&#8203;87084](https://github.com/openclaw/openclaw/issues/87084))
- Android/iOS: add the Android pair-new-gateway action and improve mobile Talk mode surfaces, including iOS realtime Talk mode and Android offline voice/gateway recovery. ([#&#8203;86798](https://github.com/openclaw/openclaw/issues/86798), [#&#8203;86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Performance: cache plugin metadata snapshots, package realpaths, stable gateway metadata, model cost indexes, channel resolution, usage-cost indexes, and session/auth hot-path facts so common Gateway and reply paths do less rediscovery. ([#&#8203;84649](https://github.com/openclaw/openclaw/issues/84649), [#&#8203;85843](https://github.com/openclaw/openclaw/issues/85843), [#&#8203;86517](https://github.com/openclaw/openclaw/issues/86517), [#&#8203;86678](https://github.com/openclaw/openclaw/issues/86678))
- Voice: expose shared realtime turn-context tracking through the realtime voice SDK and reuse it for Discord speaker attribution and wake-name context recovery.
- Voice: reuse shared realtime output activity tracking in Google Meet command and node audio bridges, including recent-output checks for local barge-in detection.
- Voice: expose shared realtime output activity tracking through the realtime voice SDK and reuse it for Discord playback activity and barge-in decisions.
- Voice: expose shared realtime consult question matching, speakable-result extraction, and alias-aware forced-consult coordination through the realtime voice SDK, then reuse it in Gateway Talk, Voice Call, and Discord voice paths.
- Voice: share activation-name matching and consult-transcript screening through the realtime voice SDK so Discord, browser voice, and meeting surfaces can reuse one implementation.
- Cron: default `cron.maxConcurrentRuns` to 8 so scheduled automations and their isolated agent turns can make progress in parallel without explicit configuration.
- QA-Lab: add `qa coverage --match <query>` so focused proof selection can discover matching scenarios from existing metadata before running live or remote lanes.
- Discord/model picker: surface an alpha-bucket select (e.g. `A–G (12) · H–N (18) · O–Z (5)`) when the provider list or a provider's model list exceeds 25 items, so configs with `provider/*` wildcards stay one click from the right page instead of paginating through prev/next; falls back to numeric chunks when every item shares the same first letter.
- Control UI: add an ephemeral Activity tab for sanitized live tool activity summaries without persisting raw telemetry. Fixes [#&#8203;12831](https://github.com/openclaw/openclaw/issues/12831). Thanks [@&#8203;BunsDev](https://github.com/BunsDev).
- Build: include `ui:build` in the `full` and `ciArtifacts` profiles of `scripts/build-all.mjs` so `pnpm build` always rebuilds `dist/control-ui` after `tsdown` cleans `dist`, removing the second-command requirement and the missing-asset failure mode for source/runtime installs and CI artifact uploads. ([#&#8203;85206](https://github.com/openclaw/openclaw/issues/85206))
- iOS: improve Talk mode with direct realtime voice sessions, compact toolbar status, and responsive voice waveform feedback. ([#&#8203;86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Media: replace the Sharp image backend with Rastermill for metadata, resizing, EXIF orientation, and PNG alpha-preserving optimization so OpenClaw no longer installs Sharp or the WhatsApp Jimp fallback for image processing. ([#&#8203;86437](https://github.com/openclaw/openclaw/issues/86437))
- Codex: update the bundled Codex CLI to 0.134.0 and keep native compaction disabled for budget-triggered app-server turns so OpenClaw owns the recovery boundary. ([#&#8203;86772](https://github.com/openclaw/openclaw/issues/86772))
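The Discord/model picker entry describes a bucketing step. This is one hypothetical way to implement it, not the shipped code: it packs consecutive first-letter groups into buckets of at most 25 items (Discord's per-select option limit) and falls back to numeric chunks when every item shares a first letter. Unlike the `A–G (12)` example in the entry, it labels only the letters actually present, and the function name is an assumption.

```typescript
interface PickerBucket {
  label: string;
  items: string[];
}

// Discord select menus accept at most 25 options.
const SELECT_OPTION_LIMIT = 25;

// Returns null when the list already fits in one select menu.
function bucketPickerItems(
  items: string[],
  limit: number = SELECT_OPTION_LIMIT,
): PickerBucket[] | null {
  if (items.length <= limit) return null;
  const sorted = [...items].sort((a, b) => a.localeCompare(b));

  // Group by uppercase first character, preserving sorted order.
  const groups = new Map<string, string[]>();
  for (const item of sorted) {
    const key = item.charAt(0).toUpperCase();
    const group = groups.get(key);
    if (group) group.push(item);
    else groups.set(key, [item]);
  }

  const buckets: PickerBucket[] = [];
  if (groups.size === 1) {
    // Every item shares one first letter: fall back to numeric chunks.
    for (let start = 0; start < sorted.length; start += limit) {
      const chunk = sorted.slice(start, start + limit);
      buckets.push({ label: `${start + 1}–${start + chunk.length}`, items: chunk });
    }
    return buckets;
  }

  // Greedily pack consecutive letter groups until the next one would overflow.
  // A single group larger than `limit` still becomes its own oversized bucket.
  let letters: string[] = [];
  let current: string[] = [];
  const flush = (): void => {
    if (current.length === 0) return;
    const first = letters[0];
    const last = letters[letters.length - 1];
    const span = first === last ? first : `${first}–${last}`;
    buckets.push({ label: `${span} (${current.length})`, items: current });
    letters = [];
    current = [];
  };
  for (const [letter, group] of groups) {
    if (current.length + group.length > limit) flush();
    letters.push(letter);
    current = current.concat(group);
  }
  flush();
  return buckets;
}
```

For 12 models starting with A, 18 with H, and 5 with O, this yields two buckets labelled `A (12)` and `H–O (23)`.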

##### Fixes

- Memory/security: reject prompt-like text submitted through the explicit `memory_store` tool before embedding or storage, matching the existing auto-capture prompt-injection filter. ([#&#8203;87142](https://github.com/openclaw/openclaw/issues/87142))

- Gateway/security: enable the default auth rate limiter for remote non-browser and HTTP gateway auth failures when `gateway.auth.rateLimit` is unset, while preserving the loopback exemption. ([#&#8203;87148](https://github.com/openclaw/openclaw/issues/87148))

- Prompt hardening: route untrusted group prompt metadata through sanitized untrusted structured context while preserving trusted operator-configured group system prompts and aligning the plugin SDK docs/test helpers. ([#&#8203;87144](https://github.com/openclaw/openclaw/issues/87144))

- Security/content boundaries: validate Browser snapshot tab URLs against SSRF policy before ChromeMCP or direct CDP reads, sanitize queued system-event text so untrusted plugin/channel labels cannot spoof nested prompt markers, wrap fetched file text and metadata as external content, apply ClickClack `allowFrom` sender allowlists before agent dispatch, reject RPCs from invalidated device-token clients during rotation, require staged sandbox media refs, and scrub serialized tool-call text from replies. ([#&#8203;78526](https://github.com/openclaw/openclaw/issues/78526), [#&#8203;87094](https://github.com/openclaw/openclaw/issues/87094), [#&#8203;87062](https://github.com/openclaw/openclaw/issues/87062), [#&#8203;83741](https://github.com/openclaw/openclaw/issues/83741), [#&#8203;70707](https://github.com/openclaw/openclaw/issues/70707), [#&#8203;86924](https://github.com/openclaw/openclaw/issues/86924)) Thanks [@&#8203;zsxsoft](https://github.com/zsxsoft), [@&#8203;ttzero25](https://github.com/ttzero25), and [@&#8203;mmaps](https://github.com/mmaps).

- Transcripts/user turns: persist CLI, WebChat, media, follow-up, hook, and Codex-mirror user turns to the admitted session target; keep cleaned transcript text, inline image routing, provenance metadata, replay hooks, and fallback paths idempotent when runtimes fail or restart.

- TUI/status/onboarding/UI: queue busy TUI prompts instead of dropping them, preserve the configured default model during onboarding, show failed tool results as errors, show config-open failures in Control UI, keep status JSON plugin scans healthy, preserve xAI usage-limit errors locally, and expose explicit fast-mode/systemd state. ([#&#8203;86722](https://github.com/openclaw/openclaw/issues/86722), [#&#8203;87000](https://github.com/openclaw/openclaw/issues/87000), [#&#8203;85786](https://github.com/openclaw/openclaw/issues/85786), [#&#8203;87108](https://github.com/openclaw/openclaw/issues/87108), [#&#8203;87001](https://github.com/openclaw/openclaw/issues/87001), [#&#8203;86614](https://github.com/openclaw/openclaw/issues/86614), [#&#8203;87115](https://github.com/openclaw/openclaw/issues/87115), [#&#8203;86976](https://github.com/openclaw/openclaw/issues/86976))

- Plugin commands/SDK: preserve plugin LLM command auth, bind native plugin command dispatch to the host agent's LLM auth, keep `onDiagnosticEvent` exports discoverable through `Function.name`, stabilize diagnostic event root aliases, correlate pathless read diagnostics, suppress transient runner failures in channel command paths, and repair local approval resolution. ([#&#8203;85936](https://github.com/openclaw/openclaw/issues/85936), [#&#8203;87084](https://github.com/openclaw/openclaw/issues/87084), [#&#8203;86977](https://github.com/openclaw/openclaw/issues/86977), [#&#8203;87069](https://github.com/openclaw/openclaw/issues/87069), [#&#8203;86771](https://github.com/openclaw/openclaw/issues/86771))

- Codex/providers: keep WebChat delivery hints out of user prompts, avoid false queued-terminal idle timeouts, share the native hook relay registry, quarantine unsupported dynamic tool schemas, preserve Claude resumed-session system prompts, normalize greedy Ollama `top_p`, preserve per-agent thinking defaults for ingress runs, and avoid native compaction takeover on budget-triggered Codex turns. ([#&#8203;87096](https://github.com/openclaw/openclaw/issues/87096), [#&#8203;73950](https://github.com/openclaw/openclaw/issues/73950), [#&#8203;87049](https://github.com/openclaw/openclaw/issues/87049), [#&#8203;86689](https://github.com/openclaw/openclaw/issues/86689), [#&#8203;86772](https://github.com/openclaw/openclaw/issues/86772))

- Gateway/perf/release: reuse startup-warning metadata and prepared auth stores, avoid cloning live-switch and lifecycle session caches on read paths, defer warning and scheduled-service fallback imports, trim Gateway session/startup/runtime CPU churn, skip duplicate turn session touches, stop chat timeout fallback cascades, drop stale subagent announce history, bound benchmark/watch/kitchen-sink teardown waits, bound macOS/package/onboarding/plugin smoke commands, bound install finalization probes, resolve Parallels npm-update commands from guest `PATH`, and bootstrap raw AWS macOS Node/pnpm commands through `/usr/bin/env`. ([#&#8203;86997](https://github.com/openclaw/openclaw/issues/86997))

- Reply/perf: reduce visible reply delivery latency by preserving Telegram typing/progress context, lazy-loading slash-command startup metadata, avoiding hot-path model hydration, flag-gating Codex profiler timing, deferring context compaction maintenance, and tracking delivery timing. ([#&#8203;86989](https://github.com/openclaw/openclaw/issues/86989), [#&#8203;86990](https://github.com/openclaw/openclaw/issues/86990), [#&#8203;86991](https://github.com/openclaw/openclaw/issues/86991), [#&#8203;86992](https://github.com/openclaw/openclaw/issues/86992), [#&#8203;86993](https://github.com/openclaw/openclaw/issues/86993), [#&#8203;86994](https://github.com/openclaw/openclaw/issues/86994)) Thanks [@&#8203;keshavbotagent](https://github.com/keshavbotagent).

- Reply/source delivery: keep TUI, Control UI, media, TTS, transcript, and Codex source-reply finals live without duplicate terminal events or stale replay artifacts.

- Agents/replay: repair legacy tool results before replay, preserve `sessions_spawn` transcript payloads, restore current guard checks, stage sandboxed workspace media, and keep duplicate transcripts tool display metadata from reappearing. ([#&#8203;82203](https://github.com/openclaw/openclaw/issues/82203), [#&#8203;86934](https://github.com/openclaw/openclaw/issues/86934), [#&#8203;87025](https://github.com/openclaw/openclaw/issues/87025)) Thanks [@&#8203;martingarramon](https://github.com/martingarramon), [@&#8203;vincentkoc](https://github.com/vincentkoc), and [@&#8203;joshavant](https://github.com/joshavant).

- Agents/sessions: handle active-fallback failures in `sessions_send` so fallback routing reports the real failure and does not leave callers with an ambiguous dropped send. ([#&#8203;86638](https://github.com/openclaw/openclaw/issues/86638))

- Agents/hooks/subagents: enforce default hook agent allowlists, recover failed subagent lifecycle completions, and keep node task lifecycle cleanup from closing the Gateway listener. ([#&#8203;86101](https://github.com/openclaw/openclaw/issues/86101))

- Codex: project newer OpenClaw chat history into resumed app-server threads and keep Codex turn timeouts inside the Codex runtime boundary so timeouts do not poison shared app-server clients or fall through to unrelated provider fallback. ([#&#8203;86677](https://github.com/openclaw/openclaw/issues/86677), [#&#8203;86476](https://github.com/openclaw/openclaw/issues/86476)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle) and [@&#8203;pashpashpash](https://github.com/pashpashpash).

- Config/doctor/update: narrow profiled tool-section doctor repair, keep runtime-injected legacy web-search provider config out of user-authored config validation, and keep prerelease tags excluded from stable updater resolution. ([#&#8203;87030](https://github.com/openclaw/openclaw/issues/87030), [#&#8203;86818](https://github.com/openclaw/openclaw/issues/86818), [#&#8203;86559](https://github.com/openclaw/openclaw/issues/86559)) Thanks [@&#8203;joshavant](https://github.com/joshavant), [@&#8203;luoyanglang](https://github.com/luoyanglang), and [@&#8203;stevenepalmer](https://github.com/stevenepalmer).

- Doctor/runtime: validate active bundled MCP tool schemas through the same runtime projection path so unsupported MCP input schemas are reported and quarantined instead of poisoning assistant startup.

- CLI/Windows: add a Windows-only stack-size respawn for stack-heavy startup paths, default CLI logs to local timestamps, and validate timeout/banner TTY state more strictly. ([#&#8203;87031](https://github.com/openclaw/openclaw/issues/87031), [#&#8203;85387](https://github.com/openclaw/openclaw/issues/85387)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo) and [@&#8203;vincentkoc](https://github.com/vincentkoc).

- Locking/security: require owner identity proof before stale plugin lock removal, memoize session lock owner arguments, and avoid writing default exec approval stores unless policy state actually changed. ([#&#8203;86814](https://github.com/openclaw/openclaw/issues/86814), [#&#8203;86964](https://github.com/openclaw/openclaw/issues/86964)) Thanks [@&#8203;Alix-007](https://github.com/Alix-007) and [@&#8203;vincentkoc](https://github.com/vincentkoc).

- Install/release: bound Docker package build, inventory, pack, and tarball preparation with process-group timeouts; pin shrinkwrap patch drift to the pnpm lock; harden macOS restart and dSYM packaging; and run release Docker/live timeout wrappers in the foreground so child processes cannot wedge gates.

- QA/Telegram: bound Telegram user credential tar and broker calls so live proof setup fails with a timeout instead of waiting for the outer Crabbox job deadline.

- QA/Tool Search: bound gateway E2E HTTP probes, run only the fixture plugin, and clean up temporary fixture trees after the compact tool-catalog proof completes.

- Telegram/network: treat `ENETDOWN` as a transient pre-connect network failure so Telegram sends, gateway unhandled-rejection handling, and cron network retries follow the same recovery path as sibling network outages. ([#&#8203;86762](https://github.com/openclaw/openclaw/issues/86762)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).

- Telegram: preserve inbound text entities, overlapping DM replies, account topic cache sidecars, outbound reply context, targeted bot-command mentions, durable group retry targets, forum topic names, and native progress callbacks. ([#&#8203;83873](https://github.com/openclaw/openclaw/issues/83873), [#&#8203;85361](https://github.com/openclaw/openclaw/issues/85361), [#&#8203;85555](https://github.com/openclaw/openclaw/issues/85555), [#&#8203;85656](https://github.com/openclaw/openclaw/issues/85656), [#&#8203;85709](https://github.com/openclaw/openclaw/issues/85709), [#&#8203;86299](https://github.com/openclaw/openclaw/issues/86299), [#&#8203;86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif), [@&#8203;luoyanglang](https://github.com/luoyanglang), and [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- iMessage: read image attachments from local Messages attachment roots, dedupe duplicate local Messages-source accounts, seed direct DM history, fix image/group media attachment commands, advance catchup cursors after live handling, and keep slash-command acknowledgements in the source conversation. ([#&#8203;82642](https://github.com/openclaw/openclaw/issues/82642), [#&#8203;85475](https://github.com/openclaw/openclaw/issues/85475), [#&#8203;86569](https://github.com/openclaw/openclaw/issues/86569), [#&#8203;86705](https://github.com/openclaw/openclaw/issues/86705), [#&#8203;86706](https://github.com/openclaw/openclaw/issues/86706), [#&#8203;86770](https://github.com/openclaw/openclaw/issues/86770)) Thanks [@&#8203;homer-byte](https://github.com/homer-byte), [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle), [@&#8203;swang430](https://github.com/swang430), and [@&#8203;OmarShahine](https://github.com/OmarShahine).

- WhatsApp/QQ/Twitch/IRC/Slack: restore WhatsApp ack identity and group-drop warnings, make QQ Bot media respect `OPENCLAW_HOME`, serialize Twitch auth disconnects, store IRC channel routes canonically, and keep Slack downloaded files out of reply media. ([#&#8203;83833](https://github.com/openclaw/openclaw/issues/83833), [#&#8203;85309](https://github.com/openclaw/openclaw/issues/85309), [#&#8203;85777](https://github.com/openclaw/openclaw/issues/85777), [#&#8203;85794](https://github.com/openclaw/openclaw/issues/85794), [#&#8203;85906](https://github.com/openclaw/openclaw/issues/85906), [#&#8203;86318](https://github.com/openclaw/openclaw/issues/86318), [#&#8203;86697](https://github.com/openclaw/openclaw/issues/86697)) Thanks [@&#8203;sliverp](https://github.com/sliverp), [@&#8203;neeravmakwana](https://github.com/neeravmakwana), and [@&#8203;Kailigithub](https://github.com/Kailigithub).

- Discord/voice: improve voice playback and wake replies, bucket large model picker menus, merge media captions into one message, route metadata through configured proxies, restore numeric channel sends, suppress self-reply echoes, and tighten wake matching without breaking fuzzy wake phrases. ([#&#8203;80227](https://github.com/openclaw/openclaw/issues/80227), [#&#8203;86238](https://github.com/openclaw/openclaw/issues/86238), [#&#8203;86487](https://github.com/openclaw/openclaw/issues/86487), [#&#8203;86571](https://github.com/openclaw/openclaw/issues/86571), [#&#8203;86595](https://github.com/openclaw/openclaw/issues/86595), [#&#8203;86601](https://github.com/openclaw/openclaw/issues/86601))

- Codex: preserve native web-search metadata, keep oversized native thread reuse, bridge CLI API-key auth into the app server, preserve sandbox bootstrap path style, recover context-window prompt errors, honor yolo approval policy, disable native thread personality, and route compaction through Codex auth. ([#&#8203;85378](https://github.com/openclaw/openclaw/issues/85378), [#&#8203;85542](https://github.com/openclaw/openclaw/issues/85542), [#&#8203;85891](https://github.com/openclaw/openclaw/issues/85891), [#&#8203;85909](https://github.com/openclaw/openclaw/issues/85909), [#&#8203;86408](https://github.com/openclaw/openclaw/issues/86408))

- Agents/runtime: enforce session lock max-hold reclaim, release embedded-attempt locks on all exits, treat aborted subagent runs as terminal, avoid runtime model hydration on hot paths, disclose scoped session list counts, derive overflow budgets from provider errors, and keep fallback errors scoped to the active model candidate. ([#&#8203;70473](https://github.com/openclaw/openclaw/issues/70473), [#&#8203;85764](https://github.com/openclaw/openclaw/issues/85764), [#&#8203;86014](https://github.com/openclaw/openclaw/issues/86014), [#&#8203;86134](https://github.com/openclaw/openclaw/issues/86134), [#&#8203;86427](https://github.com/openclaw/openclaw/issues/86427), [#&#8203;86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@&#8203;openperf](https://github.com/openperf), [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev), [@&#8203;zhangguiping-xydt](https://github.com/zhangguiping-xydt), and [@&#8203;ferminquant](https://github.com/ferminquant).

- Config/update/doctor: retry config recovery after failed backup restore, skip shell env fallback on Windows, exclude prerelease tags from the stable git channel, support deep config edits, warn instead of aborting on unreadable cron stores, prune stale bundled plugin paths, and avoid duplicate restart prompts when the Gateway is already healthy. ([#&#8203;85739](https://github.com/openclaw/openclaw/issues/85739), [#&#8203;85787](https://github.com/openclaw/openclaw/issues/85787), [#&#8203;86060](https://github.com/openclaw/openclaw/issues/86060), [#&#8203;86260](https://github.com/openclaw/openclaw/issues/86260), [#&#8203;86384](https://github.com/openclaw/openclaw/issues/86384), [#&#8203;86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@&#8203;liaoyl830](https://github.com/liaoyl830).

- Install/release: support Alpine CLI installs and runtime floors, prefer trusted startup argv runtime fallback roots, reject stale CLI node runtimes, avoid npm `min-release-age` installer failures, bound npm/package/Docker install phases, restore config parent ownership in Docker, seed Docker lockfile package tarballs before prune, make release/plugin prerelease checks fail closed instead of hanging or false-greening, and use host-visible Crabbox local work roots for Docker-backed proof. ([#&#8203;85491](https://github.com/openclaw/openclaw/issues/85491))

- Windows daemon: keep Scheduled Task gateway launches running on battery power and avoid workgroup-machine prompts for a domain user during task installation. ([#&#8203;59299](https://github.com/openclaw/openclaw/issues/59299))

- Security: avoid printing Gateway tokens in Docker, validate plugin model-pattern regexes safely, escape transcript metadata field names, harden session allowlist glob matching, audit Claude permission overrides under YOLO, and require explicit allow for ACP auto approvals. ([#&#8203;85849](https://github.com/openclaw/openclaw/issues/85849), [#&#8203;85934](https://github.com/openclaw/openclaw/issues/85934), [#&#8203;86046](https://github.com/openclaw/openclaw/issues/86046), [#&#8203;86557](https://github.com/openclaw/openclaw/issues/86557))

- Media/images: replace Sharp with Rastermill, keep EXIF normalization best-effort, normalize HEIC/HEIF before image descriptions, route Codex image API keys through OpenAI, preserve image compression metadata, and auto-scale live tool result caps. ([#&#8203;85776](https://github.com/openclaw/openclaw/issues/85776), [#&#8203;86037](https://github.com/openclaw/openclaw/issues/86037), [#&#8203;86437](https://github.com/openclaw/openclaw/issues/86437), [#&#8203;86857](https://github.com/openclaw/openclaw/issues/86857), [#&#8203;86923](https://github.com/openclaw/openclaw/issues/86923))

- Memory: prevent semantic vector indexes from silently degrading when embeddings are unavailable, stop doctor OOMs on large session stores, preserve sidecar hooks/artifacts, write fallback dream diaries, use CJK-aware dreaming dedupe, and avoid per-file watcher FD fan-out. ([#&#8203;80613](https://github.com/openclaw/openclaw/issues/80613), [#&#8203;82928](https://github.com/openclaw/openclaw/issues/82928), [#&#8203;85060](https://github.com/openclaw/openclaw/issues/85060), [#&#8203;85704](https://github.com/openclaw/openclaw/issues/85704), [#&#8203;85967](https://github.com/openclaw/openclaw/issues/85967), [#&#8203;86701](https://github.com/openclaw/openclaw/issues/86701)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79), [@&#8203;openperf](https://github.com/openperf), and [@&#8203;yaaboo-gif](https://github.com/yaaboo-gif).

- Agents/sessions: include visibility metadata on restricted `sessions_list` results so scoped counts are clearly reported without widening access or exposing hidden-session counts. ([#&#8203;86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Gateway/DNS: validate wide-area discovery domains before deriving zone paths or writing zone files, so invalid `discovery.wideArea.domain` and `dns setup --domain` values fail with a DNS-name diagnostic instead of falling through to unrelated configuration errors. Thanks [@&#8203;mmaps](https://github.com/mmaps).

- Agents/BTW: route fallback side-question streams through the embedded stream resolver so Anthropic-compatible MiniMax requests use the same capped transport as normal chat. ([#&#8203;86312](https://github.com/openclaw/openclaw/issues/86312)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Telegram: treat `/command@TargetBot` bot-command entities as explicit mentions for the addressed bot so `requireMention` groups no longer drop targeted commands or captions. Fixes [#&#8203;84462](https://github.com/openclaw/openclaw/issues/84462). ([#&#8203;86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).

- CI: bound Docker/Bash E2E tarball npm installs with `OPENCLAW_E2E_NPM_INSTALL_TIMEOUT` so package, onboarding, plugin, and upgrade lanes fail instead of hanging on a stuck npm install.

- CI: fail Parallels npm-update smoke jobs after the guest command timeout and cleanup backstop instead of only logging a timeout line.

- CI: bound kitchen-sink RPC HTTP probes so stalled gateway readiness or response bodies fail and retry instead of wedging the walker.

- CI: bound Telegram user Crabbox proof Bot API calls so stalled Telegram responses fail instead of wedging credential and desktop proof cleanup.

- CI: bound MCP channel stdio client initialization so Docker channel proof fails and closes the bridge transport instead of waiting for the outer job timeout.

- CI: keep `OPENCLAW_TESTBOX=1 pnpm check:changed` delegating to Blacksmith Testbox through Crabbox without forwarding local Testbox or worker env into the remote command.

- CI: send KILL after the TERM grace period for manual checkout fetch timeouts so stuck Testbox and workflow checkout retries cannot hang behind a wedged `git fetch`.

- CI: send KILL after the TERM grace period for Bun global install smoke command timeouts so trapped `openclaw` child processes cannot wedge the scheduled install smoke.

- iMessage: thread current channel/account inbound attachment roots into the image tool so iMessage-saved attachments under `~/Library/Messages/Attachments` (including the wildcard `/Users/*/Library/Messages/Attachments` root) are read through the existing inbound path policy instead of being rejected as `path-not-allowed`. Literal `localRoots` stays workspace-scoped. Fixes [#&#8203;30170](https://github.com/openclaw/openclaw/issues/30170). ([#&#8203;86569](https://github.com/openclaw/openclaw/issues/86569))

- QQ Bot: respect `OPENCLAW_HOME` for outbound media path resolution so `<qqmedia>` sends no longer silently fail when `HOME` and `OPENCLAW_HOME` differ (Docker / multi-user hosts). Persisted QQ Bot data (sessions, known users, refs) stays anchored on the OS home for upgrade compatibility. Fixes [#&#8203;83562](https://github.com/openclaw/openclaw/issues/83562). Thanks [@&#8203;sliverp](https://github.com/sliverp).

- Update: report the primary malformed `openclaw.extensions` payload error without adding a duplicate missing-main diagnostic. ([#&#8203;86596](https://github.com/openclaw/openclaw/issues/86596)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Control UI: keep host-local Markdown file paths inert while preserving app-relative links. ([#&#8203;86620](https://github.com/openclaw/openclaw/issues/86620)) Thanks [@&#8203;BryanTegomoh](https://github.com/BryanTegomoh).

- Gateway: dampen repeated unauthenticated device-required probes per URL while preserving explicit-auth and paired recovery paths. ([#&#8203;86575](https://github.com/openclaw/openclaw/issues/86575)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- IRC: store inbound channel routes with the canonical `channel:#name` target and join transient channel sends before writing. ([#&#8203;85906](https://github.com/openclaw/openclaw/issues/85906)) Thanks [@&#8203;Kailigithub](https://github.com/Kailigithub).

- Usage: surface unknown all-zero model pricing as missing cost entries instead of a confident `$0` total. ([#&#8203;85882](https://github.com/openclaw/openclaw/issues/85882)) Thanks [@&#8203;MichaelZelbel](https://github.com/MichaelZelbel).

- Agents/Codex: honor yolo app-server approval policy only for the full `never` plus `danger-full-access` case. ([#&#8203;85909](https://github.com/openclaw/openclaw/issues/85909)) Thanks [@&#8203;earlvanze](https://github.com/earlvanze).

- Gateway/Gmail: clear Gmail watcher renewal intervals on re-entry so hot reloads do not leak lifecycle timers. ([#&#8203;82947](https://github.com/openclaw/openclaw/issues/82947)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Logging: exit cleanly on broken stdout/stderr pipes without masking existing failure exit codes. ([#&#8203;80059](https://github.com/openclaw/openclaw/issues/80059)) Thanks [@&#8203;pavelzak](https://github.com/pavelzak).

- Gateway/security: escape transcript metadata field names while extracting oversized session line prefixes. ([#&#8203;85934](https://github.com/openclaw/openclaw/issues/85934)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Plugins/security: validate manifest model pattern regexes with the safe-regex compiler so unsafe patterns are ignored before matching. ([#&#8203;86046](https://github.com/openclaw/openclaw/issues/86046)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Discord: route gateway metadata REST lookups through the configured Discord proxy so proxied accounts do not fall back to direct `discord.com` connections before opening the WebSocket. Fixes [#&#8203;80227](https://github.com/openclaw/openclaw/issues/80227). Thanks [@&#8203;Clivilwalker](https://github.com/Clivilwalker).

- Agents/media: hydrate current-turn image attachments from filename-derived MIME types so active vision can see generated or forwarded images whose source omitted an image content type. ([#&#8203;84812](https://github.com/openclaw/openclaw/issues/84812)) Thanks [@&#8203;marchpure](https://github.com/marchpure).

- Agents/fs: point workspace-only scratch-path guidance at in-workspace temp directories while keeping host-root writes rejected by the tool guard. ([#&#8203;86501](https://github.com/openclaw/openclaw/issues/86501)) Thanks [@&#8203;tianxiaochannel-oss88](https://github.com/tianxiaochannel-oss88).

- Agents/media: keep async cron media completions scoped to their run session while preserving direct delivery for stale generated-media success and failure notifications. ([#&#8203;86529](https://github.com/openclaw/openclaw/issues/86529)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).

- Gateway: emit plugin `session_end`/`session_start` hooks when `agent.send` rotates or replaces a session id, keeping hook lifecycle state aligned with `sessions.changed` notifications. Fixes [#&#8203;83507](https://github.com/openclaw/openclaw/issues/83507). ([#&#8203;85875](https://github.com/openclaw/openclaw/issues/85875)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- OpenShell/SSH: reject malformed generated exec commands before sandbox/session setup so unresolved workflow placeholders fail fast instead of reaching the remote shell. Fixes [#&#8203;72373](https://github.com/openclaw/openclaw/issues/72373). Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- Google: stop normalizing `gemini-3.1-flash-lite` to the retired preview endpoint and update Flash Lite alias guidance to the GA model id. Fixes [#&#8203;86151](https://github.com/openclaw/openclaw/issues/86151). ([#&#8203;86240](https://github.com/openclaw/openclaw/issues/86240)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Installer: make Alpine apk installs cover Git, verify the Node runtime floor, try `nodejs-current`, and report Alpine version guidance when repositories only provide older Node packages.

- Agents/status: prefer the active Claude CLI OAuth auth label over an unused Anthropic env API-key label for equivalent runtime aliases. Fixes [#&#8203;80184](https://github.com/openclaw/openclaw/issues/80184). ([#&#8203;86570](https://github.com/openclaw/openclaw/issues/86570)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- Agents/media: send a direct fallback for generated media that is still missing after an active requester wake fails. ([#&#8203;85489](https://github.com/openclaw/openclaw/issues/85489)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents: derive overflow compaction budgets from provider-reported and synthetic over-budget token counts so confirmed context overflows compact before retrying. ([#&#8203;70473](https://github.com/openclaw/openclaw/issues/70473)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents/Codex: recover Codex context-window prompt errors through overflow compaction and surface reset guidance when recovery is exhausted. ([#&#8203;85542](https://github.com/openclaw/openclaw/issues/85542)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents/Codex: allow Codex app-server runs to bootstrap from `CODEX_API_KEY` or `OPENAI_API_KEY` when no Codex auth profile is configured.

- Agents/Codex: keep selected Codex runtime routing on OpenAI-Codex while preserving direct OpenAI API-key compaction fallback. ([#&#8203;86408](https://github.com/openclaw/openclaw/issues/86408)) Thanks [@&#8203;funmerlin](https://github.com/funmerlin) and [@&#8203;VACInc](https://github.com/VACInc).

- Agent transcript: include OpenClaw agent session logs when finding local transcript candidates.

- Crabbox: bootstrap raw AWS macOS shell commands wrapped in absolute `time` paths so RSS probes can run Node and pnpm on fresh macOS runners.

- Crabbox: bootstrap raw AWS macOS shell commands even when setup statements precede Node or pnpm usage.

- TUI/local: skip unnecessary secret resolution, gateway model catalog loading, bootstrap, and skill scans in explicit local-model runs so startup reaches the model request faster.

- Sessions/doctor: load large session stores without clone amplification during read-only doctor checks and reclaim stale `sessions.json.*.tmp` sidecars. Fixes [#&#8203;56827](https://github.com/openclaw/openclaw/issues/56827). Thanks [@&#8203;openperf](https://github.com/openperf).

- Tests: clean up isolated temp roots after successful plugin gateway gauntlet runs while keeping an explicit preservation switch for failed/debug runs.

- Plugins/perf: reuse derived plugin metadata snapshots for the lifetime of the process so reply-time skill setup no longer rescans plugin metadata on every turn.

- Discord/OpenAI voice: keep wake-name master consults using the current speaker context after ignored ambient transcripts and shorten the default capture silence grace.

- Doctor: skip redundant Gateway restart prompts when a recent supervisor restart leaves the Gateway healthy. Fixes [#&#8203;86518](https://github.com/openclaw/openclaw/issues/86518). ([#&#8203;86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@&#8203;liaoyl830](https://github.com/liaoyl830).

- Cron: restore suspended cron lanes to the configured/default concurrency instead of falling back to one after quota or circuit-breaker auto-resume.

- Gateway: keep session-only Control UI tool-start mirrors flowing during diagnostic queue pressure instead of silently dropping non-terminal tool updates.

- Agents/memory: return optional not-found context for missing date-only daily memory reads instead of logging benign first-run `ENOENT` failures. Fixes [#&#8203;82928](https://github.com/openclaw/openclaw/issues/82928). Thanks [@&#8203;galiniliev](https://github.com/galiniliev).

- Discord: merge streamed text captions into following media block replies so captions and attachments send as one message. ([#&#8203;86487](https://github.com/openclaw/openclaw/issues/86487)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Gateway: avoid sending duplicate tool-event frames to Control UI connections that are subscribed by both run and session.

- Discord/OpenAI voice: accept broader edge-position fuzzy wake-name transcripts while keeping ambient speech gated.

- Discord/OpenAI voice: accept longer leading wake-name mistranscripts such as "Open Club" for OpenClaw.

- Agents/OpenAI-compatible: stop ModelStudio-compatible chat requests before sending system/tool-only payloads that have no usable user or assistant turn. ([#&#8203;86177](https://github.com/openclaw/openclaw/issues/86177)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).

- Gateway/plugins: reuse plugin package realpath checks while building installed plugin indexes so startup avoids repeated filesystem resolution work.

- Kilo Gateway: send string `stop` sequences as arrays so Kilo accepts OpenAI-compatible chat completions. ([#&#8203;86461](https://github.com/openclaw/openclaw/issues/86461)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Discord/OpenAI voice: accept leading fuzzy wake-name transcripts such as "Monty" or "Moti" for a Molty agent while keeping ambient speech gated.

- Media understanding: convert HEIC and HEIF images to JPEG before image description providers run so iPhone photos work in direct and configured image-description flows. ([#&#8203;86037](https://github.com/openclaw/openclaw/issues/86037))

- Agents: release embedded-attempt session locks from outer teardown so post-prompt exceptions cannot wedge later requests behind `SessionWriteLockTimeoutError`. Fixes [#&#8203;86014](https://github.com/openclaw/openclaw/issues/86014). Thanks [@&#8203;openperf](https://github.com/openperf).

- Discord/OpenAI voice: rotate Realtime sessions at provider max duration without logging the expected session-expiry event as an error.

- Sessions: skip metadata-only entries during QMD-slugified session lookup so one incomplete row does not block transcript hit resolution. ([#&#8203;86327](https://github.com/openclaw/openclaw/issues/86327)) Thanks [@&#8203;abnershang](https://github.com/abnershang).

- Agents/media: derive bundled plugin local-media trust from plugin tool metadata instead of importing the full plugin registry on subscription paths. ([#&#8203;84409](https://github.com/openclaw/openclaw/issues/84409)) Thanks [@&#8203;samzong](https://github.com/samzong).

- Image tool: keep config-backed custom-provider API keys usable for auto-discovered vision models, including deferred image-tool execution without env keys or auth profiles. ([#&#8203;85733](https://github.com/openclaw/openclaw/issues/85733))

- Memory/local embeddings: run local GGUF embeddings in an isolated worker sidecar and degrade to configured fallback or keyword search on worker failure so native embedding crashes do not take down the Gateway. ([#&#8203;85348](https://github.com/openclaw/openclaw/issues/85348)) Thanks [@&#8203;osolmaz](https://github.com/osolmaz).

- Gateway: clear the runtime config snapshot before `SIGUSR1` in-process restarts so config changes survive the next gateway loop. ([#&#8203;86388](https://github.com/openclaw/openclaw/issues/86388)) Thanks [@&#8203;XuZehan-iCenter](https://github.com/XuZehan-iCenter).

- Models: show OAuth delegation markers as configured `models.json` auth while keeping runtime route usability checks strict. ([#&#8203;86378](https://github.com/openclaw/openclaw/issues/86378)) Thanks [@&#8203;rohitjavvadi](https://github.com/rohitjavvadi).

- Cron: seed active scheduled and manual cron task rows with a progress summary so status surfaces do not look blank while jobs run. ([#&#8203;86313](https://github.com/openclaw/openclaw/issues/86313)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Cron: preserve unsupported persisted cron payload rows during routine store writes while keeping those rows non-runnable. Fixes [#&#8203;84922](https://github.com/openclaw/openclaw/issues/84922). ([#&#8203;86415](https://github.com/openclaw/openclaw/issues/86415)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).

- Updater: exclude prerelease git tags from stable channel resolution so source updates do not check out newer alpha/rc/preview/canary tags. ([#&#8203;86260](https://github.com/openclaw/openclaw/issues/86260)) Thanks [@&#8203;stevenepalmer](https://github.com/stevenepalmer).

- Security/Audit: flag webhook `hooks.token` reuse of active Gateway password auth in `openclaw security audit` while keeping password-mode startup compatibility. ([#&#8203;84338](https://github.com/openclaw/openclaw/issues/84338)) Thanks [@&#8203;coygeek](https://github.com/coygeek).

- QQBot: derive the outbound reply watchdog from configured agent and provider timeouts so slow local model replies are not cut off at five minutes. Fixes [#&#8203;85267](https://github.com/openclaw/openclaw/issues/85267). ([#&#8203;85271](https://github.com/openclaw/openclaw/issues/85271)) Thanks [@&#8203;SymbolStar](https://github.com/SymbolStar).

- Agents/heartbeat: stop heartbeat turns after the first valid `heartbeat_respond` so repeated response loops do not burn tokens. ([#&#8203;86357](https://github.com/openclaw/openclaw/issues/86357)) Thanks [@&#8203;udaymanish6](https://github.com/udaymanish6).

- Tasks: keep retained lost tasks out of default status health counts, explain their cleanup window during maintenance, and prune lost task records after 24 hours instead of the general 7-day terminal retention.

- Memory-core: keep REM dreaming focused on live light-staged memories and mark staged entries as considered so old recall history no longer dominates fresh candidates. ([#&#8203;86302](https://github.com/openclaw/openclaw/issues/86302)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Memory: abort sync instead of downgrading an existing semantic vector index to FTS-only when the configured embedding provider is temporarily unavailable. ([#&#8203;85704](https://github.com/openclaw/openclaw/issues/85704)) Thanks [@&#8203;yaaboo-gif](https://github.com/yaaboo-gif).

- Telegram: propagate forum topic names through the account-scoped topic cache for native command context and topic create/edit actions. ([#&#8203;86299](https://github.com/openclaw/openclaw/issues/86299)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Slack: keep downloaded read-only files out of reply media so Slack file reads do not echo files back to the conversation. ([#&#8203;86318](https://github.com/openclaw/openclaw/issues/86318)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Cron: accept leading-plus relative durations such as `+5m` for one-shot `--at` schedules. ([#&#8203;86341](https://github.com/openclaw/openclaw/issues/86341)) Thanks [@&#8203;mushuiyu886](https://github.com/mushuiyu886).

- Agents/media: preserve async-started media tool metadata so background generation starts no longer surface generic incomplete-turn warnings while replay stays unsafe. ([#&#8203;85933](https://github.com/openclaw/openclaw/issues/85933)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Docker E2E: dedupe scheduler lane resources so npm/service package lanes are not over-counted and serialized unnecessarily.

- QA/diagnostics: add a collector-backed OpenTelemetry smoke lane, make the OTLP payload leak check scenario-aware, and keep source QA builds from failing on optional dependency imports resolved through pnpm's temp module path.

- Crabbox: bootstrap Git metadata for sparse remote changed gates so raw synced workspaces can run `pnpm check:changed` from the intended diff.

- xAI/LM Studio: avoid buffering ordinary bracketed or `final` prose until stream completion while watching for plain-text tool-call fallbacks.

- Doctor: warn and continue when the cron job store exists but cannot be read so later health checks still run. Fixes [#&#8203;86102](https://github.com/openclaw/openclaw/issues/86102). ([#&#8203;86384](https://github.com/openclaw/openclaw/issues/86384)) Thanks [@&#8203;1052326311](https://github.com/1052326311).

- Discord: suppress a bot's previous reply body and referenced media from prompt context when a user replies to that bot message, while keeping reply metadata for routing. ([#&#8203;86238](https://github.com/openclaw/openclaw/issues/86238)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Discord: restore bare numeric channel IDs for outbound message-tool sends while keeping explicit DM targets unambiguous. ([#&#8203;86571](https://github.com/openclaw/openclaw/issues/86571)) Thanks [@&#8203;joshavant](https://github.com/joshavant).

- Docker E2E: avoid rebuilding the Control UI twice while preparing the shared OpenClaw package tarball for package-backed scenario runs.

- Tests: avoid rebuilding the Control UI twice during the installer Docker smoke now that `pnpm build` includes `ui:build`.

- Tests: give QA config mutation RPCs enough native Windows budget to finish gateway config writes and restart settle after hot scenario runs.

- Tests: keep the gateway restart-inflight QA scenario focused on restart recovery on native Windows by allowing expected embedded prompt handoff errors and using the Windows-safe timeout budget.

- QA-Lab: make the synthetic OpenAI provider honor generic `reply exactly:` directives after required kickoff reads so restart-recovery scenarios do not fall through to generic repo-summary prose.

- Gateway: abort active `agent` RPC runs during forced restart shutdown so stale in-process turns cannot keep writing a session after the Gateway lifecycle restarts.

- Crabbox: sync clean sparse worktrees through a temporary full checkout even when reusing an existing lease so tracked build-time files are not omitted.

- Build: route `scripts/ui.js` through the shared pnpm runner and keep Control UI chunking helpers in sparse-included source so native Windows Corepack builds can produce `dist/control-ui`.

- Tests: give the memory fallback QA scenario enough turn budget to exercise native Windows gateway runs instead of failing on the client timeout while the mock agent is still dispatching.

- Tests: collect QA gateway CPU/RSS metrics on native Windows and give the channel baseline enough turn budget to report slow gateway runs instead of timing out before proof.

- Install/update: bypass npm `min-release-age` policies with `--min-release-age=0` instead of `--before` so hosted installers keep working on npm versions that reject the combined config. ([#&#8203;84749](https://github.com/openclaw/openclaw/issues/84749)) Thanks [@&#8203;TeodoroRodrigo](https://github.com/TeodoroRodrigo).

- Diagnostics: reclaim wedged session lanes when stale active-run bookkeeping blocks queued work despite no forward progress. Fixes [#&#8203;85639](https://github.com/openclaw/openclaw/issues/85639). Thanks [@&#8203;openperf](https://github.com/openperf).

- WebChat: keep message-tool replies visible in the chat while still summarizing internal tool results for the model. Fixes [#&#8203;86347](https://github.com/openclaw/openclaw/issues/86347). Thanks [@&#8203;shakkernerd](https://github.com/shakkernerd).

- Gateway/perf: fail startup benchmark samples when the Gateway process exits before benchmark teardown, including signal deaths after readiness probes.

- Gateway/perf: fail restart benchmark samples when the Gateway exits before benchmark teardown, including clean exits and signal deaths after successful restart probes.

- Agents/tests: keep model catalog visibility on static selection helpers so catalog visibility checks avoid the broad model-selection barrel import.

- Agents/commitments: serialize commitment store load-modify-save writes so concurrent heartbeat and CLI updates no longer lose dismissal, sent, or attempt state. ([#&#8203;81153](https://github.com/openclaw/openclaw/issues/81153)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).

- xAI/LM Studio: promote plain-text tool-call fallbacks into structured tool calls and strip leaked internal tool syntax before user-facing delivery. ([#&#8203;86222](https://github.com/openclaw/openclaw/issues/86222)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- CLI: suppress benign self-update version-skew warnings during package post-update finalization.

- Gateway/perf: tighten restart and startup benchmark failure handling so long profiling runs, failed probes, and fresh Linux runners no longer produce false passing or `n/a` results.

- Checks: keep intentional Knip unused-file findings optional so full CI and sparse proof workspaces stay aligned.

- Docker: restore writable `~/.config` in runtime images. Fixes [#&#8203;85968](https://github.com/openclaw/openclaw/issues/85968). Thanks [@&#8203;hkoessler](https://github.com/hkoessler) and [@&#8203;Bartok9](https://github.com/Bartok9).

- Plugin SDK: keep legacy root diagnostic subscriptions connected when built plugin SDK aliases resolve diagnostic helpers through a separate module graph.

- Diagnostics: export alertable OTel and Prometheus signals for blocked tools, model failover, stale sessions, liveness warnings, oversized payloads, and webhook ingress while fixing shared OTLP endpoints with query strings.

- Tests: normalize macOS canonical temp paths in exec allowlists, fs-safe trash assertions, installed plugin matching, Telegram topic-name stores, and built ACPX MCP server expectations so native macOS proof runners cover the intended behavior.

- Codex/app-server: preserve message-tool-only source reply delivery mode on active runs so sub-agent completion wakeups can steer the active Codex turn instead of being rejected. ([#&#8203;86287](https://github.com/openclaw/openclaw/issues/86287)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Tests: sample the Windows kitchen-sink RPC gateway directly and serialize RSS probes so native runs keep the memory guard active.

- Tests: normalize bundled plugin lifecycle probe paths and state-root lookup so native Windows release sweeps accept valid packaged plugin installs.

- Agents/Claude CLI: route live native Bash permission requests through OpenClaw exec policy so Claude turns no longer stall on `control_request`, and document that OpenClaw exec policy is authoritative. Fixes [#&#8203;80819](https://github.com/openclaw/openclaw/issues/80819). ([#&#8203;86330](https://github.com/openclaw/openclaw/issues/86330), from [#&#8203;81971](https://github.com/openclaw/openclaw/issues/81971)) Thanks [@&#8203;guthirry](https://github.com/guthirry) and [@&#8203;sallyom](https://github.com/sallyom).

- Security audit: warn when YOLO OpenClaw exec policy overrides a restrictive raw Claude `--permission-mode` for managed live sessions. ([#&#8203;86557](https://github.com/openclaw/openclaw/issues/86557)) Thanks [@&#8203;sallyom](https://github.com/sallyom).

- Config: keep benign legacy metadata write anomalies out of default doctor and config command output while preserving explicit anomaly logging for diagnostics.

- Codex: log when implicit app-server `never` approvals are promoted for OpenClaw tool policy, including whether the trigger was a `before_tool_call` hook or trusted tool policy.

- Codex harness: make subscription usage-limit errors without reset times explain that OpenClaw cannot determine the reset and point users to wait until Codex is available, use another Codex account, or switch to another configured model/provider. Thanks [@&#8203;amknight](https://github.com/amknight).

- Google Vertex: support production ADC modes such as Workload Identity Federation, service-account credentials, and metadata-server ADC for the native Vertex transport. ([#83971](https://github.com/openclaw/openclaw/issues/83971)) Thanks [@damianFelixPago](https://github.com/damianFelixPago).

- Telegram: route normal `[telegram][diag]` polling diagnostics through `runtime.log` while keeping non-diag warnings and persistence failures on `runtime.error`, so healthy polling startup no longer looks like an error. Fixes [#82957](https://github.com/openclaw/openclaw/issues/82957). ([#82958](https://github.com/openclaw/openclaw/issues/82958)) Thanks [@galiniliev](https://github.com/galiniliev).

- Providers/Ollama: strip inline Kimi cloud reasoning prefixes from streamed and final visible replies while keeping ordinary Kimi answers append-only. ([#86286](https://github.com/openclaw/openclaw/issues/86286)) Thanks [@jason-allen-oneal](https://github.com/jason-allen-oneal).

- Gateway: require Talk secret authority before setup-code handoff can include Talk secrets. ([#85690](https://github.com/openclaw/openclaw/issues/85690)) Thanks [@ngutman](https://github.com/ngutman).

- Agents: keep fallback error reporting scoped to the active model candidate so stale prior-provider quota/auth text is not reported for later fallback attempts. ([#86134](https://github.com/openclaw/openclaw/issues/86134)) Thanks [@zhangguiping-xydt](https://github.com/zhangguiping-xydt).

- iMessage: dedupe watcher startup when `channels.imessage.accounts` lists both `default` and a named account that point at the same local Messages source, so the gateway no longer spawns two `imsg rpc` processes or doubles inbound replies. The dedupe applies only to watcher startup: duplicate accounts stay addressable for outbound sends, status, and capability listings, and `openclaw doctor` flags the redundant account with a rebinding hint. Fixes [#65141](https://github.com/openclaw/openclaw/issues/65141). ([#86705](https://github.com/openclaw/openclaw/issues/86705)) Thanks [@swang430](https://github.com/swang430).

</details>
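The iMessage entry above describes a dedupe that applies only to watcher startup. Below is a minimal TypeScript sketch of that idea. It assumes each account carries the path of its local Messages source, that the first listed account wins, and that paths are compared as plain strings. `IMessageAccount`, `sourcePath`, `accountsToWatch`, and `redundantAccounts` are hypothetical names and do not reflect OpenClaw's actual API.

```typescript
// Sketch only: these names are hypothetical, not OpenClaw's real types or API.
interface IMessageAccount {
  id: string; // e.g. "default" or a named account
  sourcePath: string; // local Messages source the account reads (assumed field)
}

// Picks one account per distinct Messages source so each source gets a single
// watcher. Assumption: the first account listed for a source wins, and paths
// are compared as plain strings (no symlink or case normalization).
function accountsToWatch(accounts: readonly IMessageAccount[]): IMessageAccount[] {
  const seenSources = new Set<string>();
  const watched: IMessageAccount[] = [];
  for (const account of accounts) {
    if (seenSources.has(account.sourcePath)) {
      continue; // this source already has a watcher; skip the duplicate
    }
    seenSources.add(account.sourcePath);
    watched.push(account);
  }
  return watched;
}

// Accounts skipped at watcher startup because an earlier account shares their
// source; a doctor-style check could report these as redundant.
function redundantAccounts(accounts: readonly IMessageAccount[]): IMessageAccount[] {
  const watched = new Set(accountsToWatch(accounts));
  return accounts.filter((account) => !watched.has(account));
}
```

Only `accountsToWatch` filters the list. Code paths for outbound sends, status, and capability listings can keep using the full account list, which matches the entry's note that duplicate accounts stay addressable.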

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/682

Labels

- `extensions: memory-core`: Extension: memory-core
- `merge-risk: 🚨 compatibility 🚨`: May break existing users, config, migrations, defaults, or upgrade paths.
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `proof: sufficient`: ClawSweeper judged the real behavior proof convincing.
- `proof: supplied`: External PR includes structured after-fix real behavior proof.
- `rating: 🦞 diamond lobster`: Very strong PR readiness with only minor maintainer review expected.
- `size: S`
- `status: 👀 ready for maintainer look`: ClawSweeper has no concrete contributor-facing blocker left for this PR.
