feat(bench): --adapter + --use-raw-turns flags + skill adapter schema fix #154
Merged
KailasMahavarkar merged 2 commits into main on Apr 20, 2026
Conversation
Adds `--adapter {graphstore,skill}` to the run_locomo CLI so both the
deterministic baseline and the LLM-driven skill adapter can be
exercised from one entry point.
Extras (only relevant to `--adapter skill`):
- `--skill-dump-dir PATH`: save raw LLM output per session
- `--no-carry-facts`: disable cross-session fact memory
Default adapter unchanged (graphstore). Tests and prior benchmark
invocations keep working as-is.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
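For context, a minimal sketch of how these flags might be wired up in run_locomo.py. The option names and defaults come from this PR; the parser structure and help strings here are assumptions, not the shipped code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wiring; the real run_locomo.py parser may differ.
    p = argparse.ArgumentParser(prog="run_locomo")
    p.add_argument(
        "--adapter",
        choices=["graphstore", "skill"],
        default="graphstore",  # default unchanged: deterministic baseline
        help="which memory adapter to benchmark",
    )
    # Extras, only meaningful with --adapter skill:
    p.add_argument("--skill-dump-dir", metavar="PATH",
                   help="save raw LLM output per session")
    p.add_argument("--no-carry-facts", action="store_true",
                   help="disable cross-session fact memory")
    return p
```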
Two coupled fixes needed for a fair LLM-ingest A/B test on LoCoMo.
1. datasets.py: add a `use_raw_turns` parameter to `load_locomo` (default
False matches prior behaviour: feeds the author-distilled observations).
When True, it forces the raw-turns branch: ~20 dialogue turns per session
with actual speaker/text instead of the 9 pre-extracted facts.
run_locomo.py: expose it as a `--use-raw-turns` CLI flag.
Why this matters: LoCoMo's observations are hand-distilled by the
dataset authors. Feeding observations to a Mem0-style "LLM distills
conversations into facts" adapter is distillation-of-distillation;
you cannot tell if the LLM is adding value. The fair test is: both
adapters ingest raw turns. Baseline stores turns verbatim; skill
adapter runs LLM to produce its own facts. Then their F1 delta
measures what LLM-at-ingest is actually worth.
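A sketch of the datasets.py change; the flag semantics are from this commit, but the JSON field names below ("sessions", "turns", "observations", "speaker", "text") are assumptions about the LoCoMo layout rather than the real implementation:

```python
import json

def load_locomo(path: str, use_raw_turns: bool = False):
    """Yield (session_id, messages) for one LoCoMo conversation.

    Sketch only: the field names used here are assumptions about
    the LoCoMo JSON, not the real datasets.py.
    """
    with open(path) as f:
        conv = json.load(f)
    for session_id, session in conv["sessions"].items():
        if use_raw_turns:
            # Raw branch: ~20 dialogue turns per session, real speaker/text.
            messages = [f'{t["speaker"]}: {t["text"]}' for t in session["turns"]]
        else:
            # Default branch (prior behaviour): ~9 author-distilled observations.
            messages = list(session["observations"])
        yield session_id, messages
```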
2. graphstore_skill.py: drop the schema override that re-registered
`message` without the `content` field. The parent schema (`content`
REQUIRED + `EMBED content`) is load-bearing because the parent's query
strategies call `n.get("content")`. The earlier version switched to a
`DOCUMENT "..."` clause, which populates the blob but leaves the column
empty; retrieval dropped those rows, `retrieved_memories` bled entity
`name` fields into answers ("Melanie" as answer text), and F1 crashed
to 0.023 on conv-26.
The prompt is updated accordingly: the LLM is told to emit the
`content = "text"` field directly, with no DOCUMENT clause.
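To make the failure mode concrete, here is a sketch of the parent-side retrieval step; the function name and surrounding logic are illustrative, not the actual query-strategy code:

```python
def render_context(nodes):
    # Query strategies read the typed column via n.get("content").
    # Under the DOCUMENT-clause schema the blob/FTS5/vector stores were
    # populated but this column stayed empty, so every skill-adapter
    # row fell through the filter below, and entity `name` fields were
    # all that bled into the answer context.
    lines = []
    for n in nodes:
        text = n.get("content")
        if not text:  # empty column -> row silently dropped
            continue
        lines.append(text)
    return "\n".join(lines)
```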
No API break. Both the observations and raw-turns paths were tested on conv-26:
- observations: 7 msgs in session 1 (distilled facts)
- raw turns: 18 msgs in session 1 (real dialogue), 419 total across the 19 sessions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bench-side changes that should ride v0.4.
1. Skill adapter schema fix (bug)
`GraphStoreSkillAdapter.reset()` previously unregistered the `message` kind and re-registered it without `content` REQUIRED and without `EMBED content`, and the prompt told the LLM to emit a `DOCUMENT "text"` clause.
Problem: the parent's query strategies read `n.get("content")`, which was empty because DOCUMENT populates the blob/FTS5/vector stores but not the column. Retrieval dropped those rows, and `retrieved_memories` bled entity `name` fields into the context. Conv-26 F1 crashed to 0.023.
Fix: keep the baseline schema (`content:string` REQUIRED + `EMBED content`). Update the skill-adapter prompt to instruct the LLM to emit `content = "..."` as a typed field. A/B parity restored.
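The prompt-side half of the fix might read something like this; the wording is an assumption, only the `content = "..."` vs DOCUMENT distinction comes from this PR:

```python
# Illustrative prompt fragment (wording assumed; not the shipped prompt):
SCHEMA_NOTE = (
    "Emit message text as a typed field, e.g.\n"
    '    content = "adopted a puppy in May"\n'
    'Never use a DOCUMENT "..." clause: it fills the blob/FTS5/vector '
    "stores but leaves the content column empty, and retrieval drops "
    "rows with an empty content column."
)
```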
2. `--adapter {graphstore,skill}` flag on run_locomo
Lets users pick the baseline vs the skill adapter from the CLI without editing Python. Default `graphstore` preserves prior behaviour.
Extras, only meaningful with `--adapter skill`:
- `--skill-dump-dir PATH`: save raw LLM output per session
- `--no-carry-facts`: disable cross-session fact memory
3. `--use-raw-turns` flag on run_locomo + `use_raw_turns` param on `load_locomo`
Forces the raw dialogue turns branch instead of the default author-distilled observations. Needed for a fair A/B of any LLM-ingest adapter: feeding pre-distilled facts to a distiller is a distillation-of-distillation test.
Default `False` preserves prior behaviour.
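Putting the flags together, the fair A/B described above might then be run as follows (invocation shape assumed; dataset-path arguments etc. omitted):

```bash
python run_locomo.py --use-raw-turns                   # baseline: turns stored verbatim
python run_locomo.py --use-raw-turns --adapter skill   # skill: LLM distills its own facts
```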
🤖 Generated with Claude Code