Semantic search ignores note body — `firstParagraph` misreads gray-matter output, embeddings built from title+tags+title

## Summary

Two related bugs cause semantic search to operate on title and tags only, leaving note bodies effectively invisible to the embedder. Both stem from the same input shape: `gray-matter` returns content that begins with a newline after stripping the YAML frontmatter, so the first paragraph is `\n# Title` rather than `# Title`. In `firstParagraph`, `p.startsWith('#')` runs without a prior `trim()`, so that paragraph is not recognized as a heading and slips through. In `buildEmbeddingText`, the first paragraph is taken with no heading check at all.

## Observed behavior

Running `kg_search` over a vault where every note has shape:

```
---
<YAML frontmatter>
---

# Note title

First body paragraph...

## Section
```

produces:

- Excerpts in results equal the `# Heading` line rather than the first body paragraph.
- The top-1 score is comparatively high (≈0.35) when the query matches the title literally; ranks 2-3 collapse toward zero or go negative, because the embedding vector does not cover the note body at all.
- Notes whose title does not mention the query keyword are effectively unreachable via semantic search, even when their body is a strong semantic match.

Minimal JSON-RPC repro (after vault is indexed):

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"kg_search","arguments":{"query":"<topic-in-body-only>","limit":3}}}' | node dist/mcp/index.js
```

Inspect the returned `excerpt` fields — they are `# <title>` rather than the first body paragraph.

## Root cause

### Site 1 — `src/lib/store.ts:283` (`firstParagraph` helper)

```ts
function firstParagraph(content: string, maxLen: number): string {
  const para = content.split(/\n\n+/).find(
    p => p.trim().length > 0 && !p.startsWith('#')
  );
  ...
}
```

`gray-matter` returns content with a leading newline after stripping the closing `---` of the frontmatter. So `content.split(/\n\n+/)[0]` is commonly `\n# Title`, not `# Title`. The predicate `p.startsWith('#')` evaluates to `false` on `\n# Title`, so the title paragraph is *not* skipped and becomes the "first paragraph".
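The effect can be reproduced without `gray-matter` by hard-coding the string shape described above. This is a minimal sketch: `parsedBody` is an assumed stand-in for what the parser hands back for a note from this vault, and the variable names are illustrative only.

```ts
// Assumed stand-in for the content returned for a note laid out as above:
// the blank line after the closing `---` survives as a leading "\n".
const parsedBody = "\n# Note title\n\nFirst body paragraph...\n\n## Section\n";

const paragraphs = parsedBody.split(/\n\n+/);

// Current predicate from firstParagraph: startsWith runs on the untrimmed text.
const current = paragraphs.find(
  p => p.trim().length > 0 && !p.startsWith("#")
);

// Proposed predicate: trim first, then test for a heading marker.
const proposed = paragraphs.find(
  p => p.trim().length > 0 && !p.trim().startsWith("#")
);

console.log(JSON.stringify(paragraphs[0])); // "\n# Note title"
console.log(JSON.stringify(current));       // "\n# Note title"
console.log(JSON.stringify(proposed));      // "First body paragraph..."
```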

### Site 2 — `src/lib/embedder.ts:37` (`buildEmbeddingText`) — **higher-impact**

```ts
static buildEmbeddingText(title, tags, content): string {
  const firstParagraph = content.split(/\n\n+/)[0] ?? '';
  const parts = [title];
  if (tags.length > 0) parts.push(tags.join(', '));
  if (firstParagraph) parts.push(firstParagraph);
  return parts.join('\n');
}
```

Here the first element of the split is taken unconditionally, with no heading check. Given the gray-matter behavior above, that element is the note's `# Title` line for virtually every note shaped like the example. The embedding text then becomes `title + tags + # title`, and the body never enters the vector. This is the core reason semantic recall collapses outside literal title matches.
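To make the resulting embedding text concrete, here is a standalone copy of the logic quoted above applied to a sample note. The parameter types, the sample title and tags, and the hard-coded content string are assumptions for illustration; only the function body mirrors the snippet.

```ts
// Standalone copy of the quoted logic; parameter types are assumed.
function buildEmbeddingText(title: string, tags: string[], content: string): string {
  const firstParagraph = content.split(/\n\n+/)[0] ?? "";
  const parts = [title];
  if (tags.length > 0) parts.push(tags.join(", "));
  if (firstParagraph) parts.push(firstParagraph);
  return parts.join("\n");
}

// Sample note body with the leading newline described in Site 1.
const embeddingText = buildEmbeddingText(
  "Note title",
  ["alpha", "beta"],
  "\n# Note title\n\nFirst body paragraph...\n\n## Section\n"
);

console.log(JSON.stringify(embeddingText));
// "Note title\nalpha, beta\n\n# Note title"
console.log(embeddingText.includes("First body paragraph")); // false
```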

## Suggested fix

Same predicate in both sites: `!p.trim().startsWith('#')`, applied via a shared helper that picks the first non-empty, non-heading paragraph.

```ts
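// First non-empty paragraph that is not a Markdown heading, trimmed so the
// leading newline left by gray-matter never reaches excerpts or embeddings.
function firstBodyParagraph(content: string): string {
  return content
    .split(/\n\n+/)
    .map(p => p.trim())
    .find(p => p.length > 0 && !p.startsWith('#')) ?? '';
}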
```

- `store.ts:283` — use the helper and then cap length.
- `embedder.ts:37` — use the helper instead of `split[0]`.
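The two call sites listed above might then look roughly like this. It is a sketch under stated assumptions: the helper is restated (with its result trimmed) so the snippet is self-contained, the truncation in `firstParagraph` is assumed to be a plain `slice` because the original body is elided in the snippet above, and the parameter types are guesses.

```ts
// Shared helper, restated so this snippet runs on its own.
function firstBodyParagraph(content: string): string {
  return content
    .split(/\n\n+/)
    .map(p => p.trim())
    .find(p => p.length > 0 && !p.startsWith("#")) ?? "";
}

// store.ts call site (sketch): pick the paragraph, then cap its length.
// Plain slice is an assumption; the real truncation logic is not shown in the issue.
function firstParagraph(content: string, maxLen: number): string {
  return firstBodyParagraph(content).slice(0, maxLen);
}

// embedder.ts call site (sketch): shared helper instead of the first split element.
function buildEmbeddingText(title: string, tags: string[], content: string): string {
  const body = firstBodyParagraph(content);
  const parts = [title];
  if (tags.length > 0) parts.push(tags.join(", "));
  if (body) parts.push(body);
  return parts.join("\n");
}

const note = "\n# Note title\n\nFirst body paragraph...\n\n## Section\n";
console.log(firstParagraph(note, 10)); // First body
console.log(JSON.stringify(buildEmbeddingText("Note title", ["alpha"], note)));
// "Note title\nalpha\nFirst body paragraph..."
```

A note that contains only headings yields an empty string from the helper, so the embedding text falls back to title and tags, which matches the guard already present in `buildEmbeddingText`.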

After the patch, existing indexes need a full rebuild because embeddings change (`kg index --force`).

## Impact

- Search recall: any note whose title omits the query keyword is currently invisible to `kg_search`, even when its body is strongly relevant. After the fix, body-level matches surface.
- Score distribution: top-K scores should cluster less tightly around title matches alone.
- Excerpts become meaningful intro text instead of duplicating the title.

## Environment

- `knowledge-graph` HEAD: commit `1d2481e` (feat: add write operations)
- `@modelcontextprotocol/sdk` 1.27.1
- Node 24.13.1, macOS
- Vault: ~70 notes, every note authored with a YAML frontmatter block, a single `# Title` as the first line of the body, then `## Section` headings

Happy to open a PR if the maintainer agrees with the approach.

