memory: strip inbound metadata envelopes from user messages in session corpus by zqchris · Pull Request #66548 · openclaw/openclaw

zqchris · 2026-04-14T12:52:41Z

Summary

Session corpus ingestion was feeding raw Telegram/Discord/Slack inbound envelopes into the dreaming corpus unchanged. Each user message in the transcript carries a ~338-char `Conversation info` + `Sender` JSON prefix built by `buildInboundUserContextPrefix`, which exceeds the `SESSION_INGESTION_MAX_SNIPPET_CHARS` (280) cap used downstream in `dreaming-phases.ts`. Result: the user's actual words never made it into the corpus, and REM topic extraction latched onto envelope words like `assistant` / `untrusted metadata` as the top "topics".
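A small sketch of the truncation arithmetic. It assumes the downstream cap keeps the first N characters of each snippet; the PR does not spell out the truncation strategy. `toSnippet` is a hypothetical helper, not the code in `dreaming-phases.ts`, and the envelope is a 338-character stand-in string rather than the real `Conversation info` + `Sender` prefix.

```typescript
// 280 mirrors SESSION_INGESTION_MAX_SNIPPET_CHARS as described in the PR.
const SESSION_INGESTION_MAX_SNIPPET_CHARS = 280;

// Hypothetical head truncation: keep only the first maxChars characters.
function toSnippet(text: string, maxChars: number = SESSION_INGESTION_MAX_SNIPPET_CHARS): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

const envelopePrefix = "x".repeat(338); // stand-in for the ~338-char inbound envelope
const userWords = "can you remind me about the dentist on friday?";
const snippet = toSnippet(envelopePrefix + "\n" + userWords);

console.log(snippet.length); // 280
console.log(snippet.includes("dentist")); // false
```

Because the prefix alone is longer than the cap, no character of the user's message survives, which matches the symptom described above.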

Root cause is ordering in `buildSessionEntry` → `extractSessionText`: `normalizeSessionText` collapses newlines to spaces per text block, and once newlines are gone, `stripInboundMetadata` can no longer locate sentinel lines or fenced-json blocks. So stripping must happen before normalization, and only for `user` role (assistant messages may legitimately discuss envelope formats).
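A minimal sketch of why the order matters. `normalizeSessionText` and `stripInboundMetadata` below are hypothetical stand-ins, not the OpenClaw implementations, and the envelope layout (a `... (untrusted metadata):` sentinel line followed by a fenced json block) is an assumed approximation of the real one. The stand-in stripper only matches across line boundaries, so running it after whitespace collapse finds nothing.

```typescript
// Hypothetical stand-ins modelling only the property the PR relies on:
// stripping needs line structure, normalization removes it.
const FENCE = "`".repeat(3); // built at runtime so the snippet can live inside a Markdown fence

// Collapse every whitespace run, newlines included, into a single space.
function normalizeSessionText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Remove each "<label> (untrusted metadata):" sentinel line plus the fenced json block after it.
// The pattern relies on line anchors and literal newlines, so it cannot match collapsed text.
function stripInboundMetadata(text: string): string {
  const envelope = new RegExp(
    `^.*\\(untrusted metadata\\):\\n${FENCE}json\\n[\\s\\S]*?\\n${FENCE}$`,
    "gm",
  );
  return text.replace(envelope, "").trim();
}

// Assumed envelope shape, loosely following the "Conversation info" + "Sender" prefix.
const raw = [
  "Conversation info (untrusted metadata):",
  FENCE + "json",
  '{ "channel": "telegram", "chat_type": "direct" }',
  FENCE,
  "",
  "Sender (untrusted metadata):",
  FENCE + "json",
  '{ "name": "Alice" }',
  FENCE,
  "",
  "can you remind me about the dentist on friday?",
].join("\n");

const wrongOrder = stripInboundMetadata(normalizeSessionText(raw)); // envelope survives
const rightOrder = normalizeSessionText(stripInboundMetadata(raw)); // only the user's words remain

console.log(rightOrder); // can you remind me about the dentist on friday?
console.log(wrongOrder.startsWith("Conversation info")); // true
```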

Changes

`src/memory-host-sdk/host/session-files.ts`: add `role` parameter to `extractSessionText`; strip inbound metadata on user-role text blocks before `normalizeSessionText` runs.
`src/memory-host-sdk/host/session-files.test.ts`: new test using a real multi-line Telegram envelope asserts the corpus entry contains only the actual user text; separate test confirms assistant messages containing sentinel-like text are preserved untouched.
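The role-aware extraction can be sketched roughly as follows. This is a hypothetical reconstruction, not the merged code: the `ContentBlock` shape, the optional `role` string and both stand-in helpers are assumptions. It encodes the two constraints from the Summary: stripping runs per text block before `normalizeSessionText`, and only when the role is `user`, so assistant text that quotes an envelope passes through untouched.

```typescript
// Hypothetical content shape; real transcript blocks may carry more fields.
type ContentBlock = { type: string; text?: string };

const FENCE = "`".repeat(3); // built at runtime so the snippet can live inside a Markdown fence

// Stand-in: collapse whitespace runs (newlines included) to single spaces.
function normalizeSessionText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Stand-in: drop "<label> (untrusted metadata):" lines and the fenced json block that follows.
function stripInboundMetadata(text: string): string {
  const envelope = new RegExp(
    `^.*\\(untrusted metadata\\):\\n${FENCE}json\\n[\\s\\S]*?\\n${FENCE}$`,
    "gm",
  );
  return text.replace(envelope, "").trim();
}

// Strip per text block, before normalization, and only for user-role messages.
function extractSessionText(content: string | ContentBlock[], role?: string): string {
  const blocks =
    typeof content === "string"
      ? [content]
      : content.flatMap((block) =>
          block.type === "text" && typeof block.text === "string" ? [block.text] : [],
        );
  const parts: string[] = [];
  for (const block of blocks) {
    const stripped = role === "user" ? stripInboundMetadata(block) : block;
    const normalized = normalizeSessionText(stripped);
    if (normalized.length > 0) parts.push(normalized);
  }
  return parts.join(" ");
}

const envelopeText = [
  "Conversation info (untrusted metadata):",
  FENCE + "json",
  '{ "channel": "telegram" }',
  FENCE,
  "",
  "what did we decide about the trip?",
].join("\n");

const userEntry = extractSessionText([{ type: "text", text: envelopeText }], "user");
const assistantEntry = extractSessionText(envelopeText, "assistant");

console.log(userEntry); // what did we decide about the trip?
console.log(assistantEntry.includes("untrusted metadata")); // true
```

Gating on role rather than on content keeps the stripper from rewriting assistant replies that legitimately discuss the envelope format; the trade-off is that it trusts the `role` recorded in the transcript.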

Test plan

`pnpm test src/memory-host-sdk/host/session-files.test.ts` — 8/8 passing (6 existing + 2 new)
`pnpm check` — clean (lint, format, type, import cycles, madge, webhook/auth guards)
Manual verification on next dreaming cycle: REM topics should reflect real user words instead of envelope noise

greptile-apps · 2026-04-14T12:55:06Z

Greptile Summary

This PR fixes a corpus-ingestion bug where raw inbound metadata envelopes (prepended by buildInboundUserContextPrefix for Telegram/Discord/Slack messages) were being fed into the dreaming corpus unchanged, causing SESSION_INGESTION_MAX_SNIPPET_CHARS truncation to cut the actual user text entirely. The fix — stripping the envelope before normalizeSessionText collapses newlines — is correct and well-targeted, and the new tests directly cover the root-cause ordering constraint.

Confidence Score: 5/5

Safe to merge; the fix is correct and well-tested with no production-affecting issues.
All remaining findings are P2. The only notable gap is that the parallel packages/memory-host-sdk/src/host/session-files.ts copy was not updated, but it is not exported or used in any production code path so it poses no current risk.
packages/memory-host-sdk/src/host/session-files.ts — parallel standalone copy not updated with this fix.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/memory-host-sdk/host/session-files.ts
Line: 188

Comment:
**Parallel copy in `packages/` not updated**

`packages/memory-host-sdk/src/host/session-files.ts` is a separate standalone implementation (not a re-export facade like the rest of the package) and was not patched. It still calls `extractSessionText(message.content)` without passing `message.role` and has no `stripInboundMetadata` import. While that copy is currently unexported and not on any production code path, if it is ever promoted or wired up the corpus-truncation bug will silently return.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "memory: strip inbound metadata envelopes..."

greptile-apps · 2026-04-14T12:55:10Z

        continue;
      }
-      const text = extractSessionText(message.content);
+      const text = extractSessionText(message.content, message.role);


Parallel copy in packages/ not updated

packages/memory-host-sdk/src/host/session-files.ts is a separate standalone implementation (not a re-export facade like the rest of the package) and was not patched. It still calls extractSessionText(message.content) without passing message.role and has no stripInboundMetadata import. While that copy is currently unexported and not on any production code path, if it is ever promoted or wired up the corpus-truncation bug will silently return.


zqchris · 2026-04-14T13:08:55Z

Addressed Greptile's P2: applied the same strip to the parallel packages/memory-host-sdk/src/host/session-files.ts copy in ab90812. That file isn't currently exported or on any production code path, but keeping the two copies consistent prevents a silent regression if it ever gets wired up later.

…n corpus Session ingestion was feeding raw Telegram/Discord/Slack inbound envelopes into the dreaming corpus. The 338-char Conversation info + Sender JSON prefix on every user message blew past SESSION_INGESTION_MAX_SNIPPET_CHARS (280), so the user's actual words never made it in and REM extraction latched onto envelope words like 'assistant' as top topics. Strip inbound metadata on user-role text blocks BEFORE normalizeSessionText collapses newlines. stripInboundMetadata needs the line structure and fenced-json markers to find sentinels, so the order matters. Assistant messages are left alone — they may legitimately discuss the envelope format. Fixes openclaw#63921

…-sdk copy Greptile flagged the `packages/memory-host-sdk/src/host/session-files.ts` copy as a P2 gap: the parallel standalone implementation was not updated with the same fix, so if it ever gets wired up the corpus-truncation bug returns silently. The file isn't currently on any production code path, but keeping the two copies consistent prevents a future regression.

jalehman · 2026-04-16T18:15:50Z

Merged via squash.

Prepared head SHA: 98562b2a84450b039d78034fcb10122edafc235f
Merge commit: 82e349a48ad9b672c18d0eec5057d51c8ceafbd8

Thanks @zqchris!

@jalehman

…n corpus (openclaw#66548) Merged via squash. Prepared head SHA: 98562b2 Co-authored-by: zqchris <4436110+zqchris@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman

openclaw-barnacle bot added the size: S label Apr 14, 2026

greptile-apps bot reviewed Apr 14, 2026


jalehman self-assigned this Apr 15, 2026

openclaw-barnacle bot added size: M and removed size: S labels Apr 15, 2026

jalehman force-pushed the fix/dreaming-corpus-strip-inbound-envelope branch 3 times, most recently from 3c92f0b to 4cfe940 on April 15, 2026 22:16

Chris Zhang and others added 5 commits April 16, 2026 11:10

memory: handle split inbound envelopes in session corpus

62b50d4

memory: document session corpus envelope stripping

d8ce992

changelog: drop rebase parent marker

98562b2

jalehman force-pushed the fix/dreaming-corpus-strip-inbound-envelope branch from 4cfe940 to 98562b2 on April 16, 2026 18:11

jalehman merged commit 82e349a into openclaw:main Apr 16, 2026
42 checks passed

github-actions bot mentioned this pull request Apr 16, 2026

📡 Upstream Digest — 2026-04-16 19:01 UTC curtismercier/openclaw-mods#592

Open

jalehman mentioned this pull request Apr 16, 2026

docs: unify duplicated 2026.4.15-beta.1 changelog block #67827

Merged

24 tasks

zqchris mentioned this pull request Apr 17, 2026

REM topic extraction produces role-tag / boilerplate "themes" on chat transcripts (frequency-based signal is structurally unsuited to chat corpus) #67942

Open

jalehman merged 5 commits into openclaw:main from zqchris:fix/dreaming-corpus-strip-inbound-envelope