feat(memory): add heading-aware chunking option for improved search by chinar-amrutkar · Pull Request #59154 · openclaw/openclaw

chinar-amrutkar · 2026-04-01T16:46:02Z

Introduces a headingAware config option in memory search chunking that preserves markdown heading structure during text chunking. Instead of splitting purely by token count, the chunker splits on heading boundaries first, then sub-chunks within sections.

Add headingAware boolean to chunking config schema and merge logic
chunkMarkdown() flushes current chunk on heading lines when enabled
Track chunkHeadingAware in MemoryIndexMeta for reindex detection
Add config label and help text for the new option

Closes #45545

Introduces a headingAware config option in memory search chunking that preserves markdown heading structure during text chunking. Instead of splitting purely by token count, the chunker splits on heading boundaries first, then sub-chunks within sections. - Add headingAware boolean to chunking config schema and merge logic - chunkMarkdown() flushes current chunk on heading lines when enabled - Track chunkHeadingAware in MemoryIndexMeta for reindex detection - Add config label and help text for the new option Closes openclaw#45545

greptile-apps · 2026-04-01T16:48:34Z

Greptile Summary

This PR adds an opt-in headingAware boolean to the memory-search chunking config that flushes the current chunk whenever a Markdown heading is encountered, keeping section content together. The schema, label, help text, mergeConfig wiring, chunkMarkdown implementation, and MemoryIndexMeta persistence all look cohesive.

Remaining concerns:

Fenced code block false positives (P1 — new): isHeading has no awareness of fenced code blocks. A # line inside a ``` block (Python/shell comment, C preprocessor directive, etc.) will be treated as a heading and trigger a spurious chunk split. This makes headingAware actively harmful for content that mixes prose and fenced code. See the inline comment for a minimal fence-tracking fix.
Spurious full reindex on first upgrade (P1 — flagged in prior thread, still unresolved): meta.chunkHeadingAware is undefined in any existing persisted index; comparing it against the resolved default false via !== evaluates to true, triggering a full reindex for every existing user immediately after upgrading.
.trim() misidentifies indented code block lines as headings (P1 — flagged in prior thread, still unresolved): Stripping leading whitespace before the heading regex means a line with four or more leading spaces (an indented code block per the CommonMark spec) can falsely match as a heading.

Confidence Score: 4/5

The feature is opt-in and defaults to off, so no regression for existing users in production behaviour, but three P1 correctness bugs remain unaddressed.

Three P1 issues exist: one newly identified (fenced code block false positives that corrupt chunk boundaries) and two from prior threads that are still unresolved (spurious full reindex on upgrade, indented-code-block false heading detection). All affect users who enable headingAware; the reindex bug also silently fires for all users on first upgrade. Score held at 4 rather than lower because the feature is opt-in and the codebase is otherwise clean.

src/memory/internal.ts (heading detection logic) and src/memory/manager-sync-ops.ts (undefined vs false reindex comparison) need the most attention before merging.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/memory/internal.ts
Line: 348-350

Comment:
**Fenced code blocks not excluded from heading detection**

`isHeading` has no awareness of fenced code blocks (`` ``` `` or `~~~` delimiters). Any line inside a fenced code block that happens to start with `#` — e.g. a Python/shell comment (`# install deps`), a C preprocessor directive (`#include`), or a Ruby comment — will be incorrectly treated as a Markdown heading and trigger a spurious chunk flush in the middle of the code block.

This is a distinct problem from the indented-code-block `.trim()` issue: fenced blocks are far more common in practice and the `#` character appears frequently inside them.

A minimal fix is to track the fence state as you iterate over lines:

```typescript
let inFencedBlock = false;
for (let i = 0; i < lines.length; i += 1) {
  const line = lines[i] ?? "";
  const lineNo = i + 1;

  if (/^(`{3,}|~{3,})/.test(line)) {
    inFencedBlock = !inFencedBlock;
  }

  // Heading-aware: flush on heading (unless it's the first heading)
  if (headingAware && !inFencedBlock && isHeading(line) && current.length > 0) {
    flush();
    carryOverlap();
  }
  // ... rest of loop
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (2): Last reviewed commit: "feat(memory): add heading-aware chunking..." | Re-trigger Greptile}

greptile-apps · 2026-04-01T16:48:38Z

src/memory/manager-sync-ops.ts

      meta.scopeHash !== configuredScopeHash ||
      meta.chunkTokens !== this.settings.chunking.tokens ||
      meta.chunkOverlap !== this.settings.chunking.overlap ||
+      meta.chunkHeadingAware !== this.settings.chunking.headingAware ||


Undefined vs false comparison triggers spurious full reindex

chunkHeadingAware is declared optional (chunkHeadingAware?: boolean) in MemoryIndexMeta, so any existing persisted index will have undefined for this field. The default resolved value of this.settings.chunking.headingAware is false (from mergeConfig). This means every existing user will hit undefined !== false → true on the very first run after this upgrade, triggering a full reindex even though they never enabled headingAware.

The fix is to normalise meta.chunkHeadingAware before comparing:

Suggested change

meta.chunkHeadingAware !== this.settings.chunking.headingAware ||

(meta.chunkHeadingAware ?? false) !== this.settings.chunking.headingAware ||

Prompt To Fix With AI

This is a comment left during a code review. Path: src/memory/manager-sync-ops.ts Line: 1006 Comment: **Undefined vs false comparison triggers spurious full reindex** `chunkHeadingAware` is declared optional (`chunkHeadingAware?: boolean`) in `MemoryIndexMeta`, so any existing persisted index will have `undefined` for this field. The default resolved value of `this.settings.chunking.headingAware` is `false` (from `mergeConfig`). This means every existing user will hit `undefined !== false` → `true` on the very first run after this upgrade, triggering a full reindex even though they never enabled `headingAware`. The fix is to normalise `meta.chunkHeadingAware` before comparing: ```suggestion (meta.chunkHeadingAware ?? false) !== this.settings.chunking.headingAware || ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-01T16:48:39Z

src/memory/internal.ts

+  const isHeading = (line: string): boolean => {
+    return /^#{1,6}\s+/.test(line.trim());
+  };


line.trim() causes false heading detection inside indented code blocks

Calling line.trim() strips all leading whitespace before applying the heading regex. In Markdown (CommonMark spec), a line with 4 or more leading spaces is treated as an indented code block, not a heading — so # not a heading inside a code block should not trigger a chunk boundary. After trimming, it becomes # not a heading and matches the regex, producing an incorrect split.

The fix is to drop .trim() and instead allow only 0–3 optional leading spaces (per spec):

Suggested change

const isHeading = (line: string): boolean => {

return /^#{1,6}\s+/.test(line.trim());

};

return /^ {0,3}#{1,6}[ \t]/.test(line);

This matches valid Markdown headings (≤3 spaces indent, followed by a space/tab after the # run) while correctly ignoring # content inside indented code blocks.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/memory/internal.ts Line: 348-350 Comment: **`line.trim()` causes false heading detection inside indented code blocks** Calling `line.trim()` strips all leading whitespace before applying the heading regex. In Markdown (CommonMark spec), a line with 4 or more leading spaces is treated as an indented code block, not a heading — so ` # not a heading` inside a code block should not trigger a chunk boundary. After trimming, it becomes `# not a heading` and matches the regex, producing an incorrect split. The fix is to drop `.trim()` and instead allow only 0–3 optional leading spaces (per spec): ```suggestion return /^ {0,3}#{1,6}[ \t]/.test(line); ``` This matches valid Markdown headings (≤3 spaces indent, followed by a space/tab after the `#` run) while correctly ignoring `#` content inside indented code blocks. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32e45e501d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-01T16:49:44Z

src/memory/manager-sync-ops.ts

      meta.scopeHash !== configuredScopeHash ||
      meta.chunkTokens !== this.settings.chunking.tokens ||
      meta.chunkOverlap !== this.settings.chunking.overlap ||
+      meta.chunkHeadingAware !== this.settings.chunking.headingAware ||


Treat absent headingAware metadata as false in reindex check

This comparison forces a full reindex for every pre-existing index after upgrade because old metadata does not have chunkHeadingAware (so it is undefined) while the resolved setting defaults to false; undefined !== false evaluates true and triggers reindex even though chunking behavior is unchanged. On large memory stores this adds unnecessary startup/sync cost and can look like a regression. Compare with a defaulted value (e.g. meta.chunkHeadingAware ?? false) to avoid one-time mass reindexing.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-01T16:49:44Z

src/memory/internal.ts

+      flush();
+      carryOverlap();


Avoid carrying overlap when splitting on new headings

In heading-aware mode this keeps overlap lines from the previous section and prepends them to the next heading chunk, so chunks can cross heading boundaries despite the option’s stated behavior (“split on heading boundaries first”). With non-zero overlap, searches may retrieve mixed-section chunks and reduce relevance for structured docs. Heading-triggered flushes should start a fresh section chunk instead of reusing prior-section overlap.

Useful? React with 👍 / 👎.

Adds a headingAware config option that splits on level-2 and level-3 markdown headings (##, ###) before falling back to token-based chunking. This keeps heading-delimited sections as discrete chunks, improving retrieval precision for structured memory files. - chunkMarkdown() flushes current chunk on ## and ### headings when enabled - Tracks chunkHeadingAware in MemoryIndexMeta for reindex detection - Config schema, labels, and help text for the new option - Backward-compatible: files without headings behave exactly as today Closes openclaw#44395

openclaw-barnacle bot added agents Agent runtime and tooling size: XS labels Apr 1, 2026

chinar-amrutkar closed this Apr 1, 2026

greptile-apps bot reviewed Apr 1, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 1, 2026

View reviewed changes

chinar-amrutkar reopened this Apr 1, 2026

chinar-amrutkar closed this Apr 1, 2026

chinar-amrutkar deleted the fix/memory-heading-aware-chunking branch April 1, 2026 21:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(memory): add heading-aware chunking option for improved search#59154

feat(memory): add heading-aware chunking option for improved search#59154
chinar-amrutkar wants to merge 2 commits intoopenclaw:mainfrom
chinar-amrutkar:fix/memory-heading-aware-chunking

chinar-amrutkar commented Apr 1, 2026

Uh oh!

greptile-apps bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Apr 1, 2026

Uh oh!

greptile-apps bot Apr 1, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 1, 2026

Uh oh!

chatgpt-codex-connector bot Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	meta.chunkHeadingAware !== this.settings.chunking.headingAware \|\|
	(meta.chunkHeadingAware ?? false) !== this.settings.chunking.headingAware \|\|

Uh oh!

Conversation

chinar-amrutkar commented Apr 1, 2026

Uh oh!

greptile-apps bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Apr 1, 2026 •

edited

Loading