feat: improve search quality with new semantic+keyword search (#343) #346

Merged

kiyotis merged 381 commits into main from 343-improve-search-quality

May 25, 2026

kiyotis commented May 19, 2026

Closes #343

Approach

2-stage search architecture replacing the legacy full-text-search approach:

Semantic search: 2-stage (Stage1: page selection from index.md, Stage2: section selection from knowledge JSON) using LLM judgment
Keyword search: deterministic term→section_id lookup via RBKC-generated terms.json
QA workflow: hearing → semantic search → read sections → answer generation → hallucination verification
Code analysis workflow: keyword search → read sections → report
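
As a rough illustration of the keyword path listed above, a deterministic term→section_id lookup might look like the sketch below. The layout of terms.json (a mapping from each term to a list of section IDs), the sample terms, and the function name `lookup_sections` are all assumptions made for this example; the PR does not show the actual RBKC output format.

```python
import json
from typing import Dict, List


def lookup_sections(terms_index: Dict[str, List[str]], keywords: List[str]) -> List[str]:
    """Return section IDs for the keywords, in first-seen order, without duplicates."""
    seen = set()
    result: List[str] = []
    for keyword in keywords:
        # Exact-match lookup only: no ranking and no LLM judgment is involved.
        for section_id in terms_index.get(keyword, []):
            if section_id not in seen:
                seen.add(section_id)
                result.append(section_id)
    return result


# Hypothetical terms.json content, inlined so the sketch is self-contained.
EXAMPLE_TERMS_JSON = (
    '{"term-a": ["page-1:s1", "page-1:s3"], "term-b": ["page-2:s2", "page-1:s1"]}'
)
terms_index = json.loads(EXAMPLE_TERMS_JSON)
```

Because the lookup is a plain dictionary access, the same keywords always yield the same sections, which is what makes this path deterministic in contrast to the LLM-judged semantic stages.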

Phase A built and benchmarked each component independently. Phase B deployed the components to the skill and ran E2E benchmarks.

Tasks

Expert Review

AI-driven expert reviews were conducted before PR creation (see .claude/rules/expert-review.md):

Software Engineer (Phase A) - 0 Findings
QA Engineer (Phase A) - 0 Findings
Prompt Engineer (Answer) - 0 Findings (after fixes)
Software Engineer (Answer) - 0 Findings

Success Criteria Check

Criterion	Status	Evidence
E2E benchmark runs without errors (all 30 scenarios)	❌ Not Met	13 errors in run-1 (B-4-1 in progress)
New search accuracy ≥ baseline-current (83.7%)	❌ Not Met	Pending error fix + 3 runs
Hallucination PASS ≥ baseline-current (14.4%)	❌ Not Met	Pending error fix + 3 runs

🤖 Generated with Claude Code

kiyotis added the enhancement label

kiyotis commented

tools/rbkc/docs/rbkc-verify-quality-design.md Outdated

.gitignore

.claude/skills/nabledge-6/workflows/qa.md Outdated

.claude/skills/nabledge-6/assets/answer-generation.md Outdated

kiyotis and others added 28 commits

May 25, 2026 10:18


          docs: update tasks.md — rewrite B-4-1 as run-1 stabilization loop

6767dc8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — add workflow rules

024abbb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4 list all 6 design docs

6769e85

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — remove benchmark-design from B-0-4 scope

41fdafd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: fix semantic-search-design.md — stage prompts are inlined in workflow

6ee1223

Also update tasks.md B-0-4 steps to reflect actual file structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: remove assets/ from search-design.md directory layout

86478a4

Prompts are inlined in workflow MDs — assets/ directory no longer exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — session save 2026-05-19

aa261cf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — add B-0-4 subtasks from design/impl gap analysis

c73c366

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: remove Step 0 pre-supplied context from qa.md; expand hearing_answer into question text in benchmark

ae8ce90

qa.md Step 0 injected a structured context section to bypass hearing,
coupling the skill to a benchmark-internal convention. Removed it so
the skill always runs hearing for every call.

run_e2e.py now appends hearing_answer (processing_type + goal) directly
to the question text, letting the hearing workflow classify it naturally.
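
The change to run_e2e.py described above could be sketched roughly as follows. The function name `build_question` and the exact wording used to append the hearing answer are assumptions; the commit only states that processing_type and goal are appended to the question text.

```python
def build_question(question: str, processing_type: str, goal: str) -> str:
    """Append the hearing answer to the question as ordinary text.

    The "Processing type / Goal" phrasing is hypothetical; any natural-language
    form that the hearing workflow can classify would serve the same purpose.
    """
    hearing_answer = f"Processing type: {processing_type}. Goal: {goal}."
    return f"{question.rstrip()}\n\n{hearing_answer}"
```

With this shape the benchmark sends a single question string, so the skill needs no benchmark-specific input section and runs hearing on every call.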

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-A complete

e22f728

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: replace verify FAIL warning with retry in qa.md

0b754aa

On FAIL, qa.md now re-runs answer.md with verify issues as exclusions
(max 1 retry) instead of appending a warning to the answer.
verify.md returns raw JSON; qa.md owns the retry logic.
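
A minimal sketch of the retry logic described above, assuming the verify result is a dictionary with `status` and `issues` keys (the actual JSON shape returned by verify.md is not shown in this PR). The callables `generate` and `verify` are hypothetical stand-ins for the answer.md and verify.md steps.

```python
from typing import Callable, Dict, List


def answer_with_retry(
    generate: Callable[[List[str]], str],
    verify: Callable[[str], Dict],
    max_retries: int = 1,
) -> str:
    """Generate an answer, then regenerate on FAIL, at most max_retries times."""
    exclusions: List[str] = []
    answer = generate(exclusions)
    for _ in range(max_retries):
        result = verify(answer)
        if result.get("status") != "FAIL":
            break
        # Feed the verify issues back as exclusions for the next attempt.
        exclusions = list(result.get("issues", []))
        answer = generate(exclusions)
    return answer
```

In this sketch the regenerated answer is returned without a second verification pass; whether qa.md verifies the retried answer again is not stated in the commit, so treat that as an open assumption.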

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-B complete

57be904

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: centralize read-sections.sh call in qa.md Step 3

c298e6d

qa.md now fetches sections_content once (max 10, high-first) and passes
it to both answer.md and verify.md. answer.md and verify.md no longer
call read-sections.sh themselves, ensuring both use identical data.
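
The "fetch once, max 10, high-first" rule might be sketched like this. The `relevance` field, its `"high"` label, and the function name `select_sections` are assumptions for illustration; the commit does not show how sections are ranked internally.

```python
from typing import Dict, List

MAX_SECTIONS = 10  # limit stated in the commit message


def select_sections(
    candidates: List[Dict[str, str]], limit: int = MAX_SECTIONS
) -> List[Dict[str, str]]:
    """Order high-relevance sections first, then truncate to the limit."""
    # sorted() is stable, so sections with equal relevance keep their input order.
    ranked = sorted(candidates, key=lambda s: 0 if s["relevance"] == "high" else 1)
    return ranked[:limit]
```

The selected list would then be read once and the same content handed to both the answer step and the verify step, so the verifier checks the answer against exactly the text the answer was generated from.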

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-C complete

c7e106d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: hearing.md returns only hearing_answer_str, not status/questions

2f87702

status and questions were internal classification state leaking into
the downstream interface. hearing.md now returns only the formatted
hearing_answer string that callers actually need.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-D complete

69a7827

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: add low_confidence_note to Stage 1 trace when fewer than 3 pages selected

8a9e353
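
A small sketch of the rule in this commit title, assuming the trace is a plain dictionary. The field names `stage1_files` and `low_confidence_note` appear elsewhere in this PR, but their value types and the wording of the note are assumptions here.

```python
from typing import Dict, List

MIN_PAGES = 3  # threshold named in the commit title


def build_stage1_trace(selected_pages: List[str]) -> Dict[str, object]:
    """Build a Stage 1 trace, flagging selections with fewer than MIN_PAGES pages."""
    trace: Dict[str, object] = {"stage1_files": list(selected_pages)}
    if len(selected_pages) < MIN_PAGES:
        # Hypothetical note text; only the presence of the field is from the PR.
        trace["low_confidence_note"] = (
            f"Only {len(selected_pages)} page(s) selected in Stage 1 "
            f"(fewer than {MIN_PAGES})."
        )
    return trace
```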

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-E complete

cb5a39e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: add candidates field to hearing.md trace

ecb148e

Records the list of choices presented to the user (empty for skip,
processing_types list for ask), enabling quality evaluation of what
options were shown in each hearing session.
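
For illustration, the `candidates` rule described above could be expressed as below. The `action` key and its `"ask"` and `"skip"` values are assumed names for the two hearing outcomes; only the `candidates` field itself is taken from the commit.

```python
from typing import Dict, List


def hearing_trace(action: str, processing_types: List[str]) -> Dict[str, object]:
    """Record which choices were presented to the user in a hearing session."""
    # "ask" presents the processing types; "skip" presents nothing.
    candidates = list(processing_types) if action == "ask" else []
    return {"action": action, "candidates": candidates}
```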

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-F complete

c38fda9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: add trace schema to semantic-search-design.md output spec

fa9a853

Documents trace.excluded, trace.low_confidence_note, and trace.stage1_files
fields that are present in the implementation but were missing from the spec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-0-4-G complete, all impl fixes done

1040d32

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: align hearing_answer variable name in answer.md and qa.md

dc140fc

answer.md declared {hearing_answer_str} but the prompt body used
{hearing_answer}, causing the hearing context to never reach the LLM.
Renamed the input to {hearing_answer} throughout (answer.md input,
qa.md Step 4 and Step 6 calls).
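
A mismatch like this one fails silently, so a small check can help catch it. The sketch below is not part of this PR; it assumes prompts use `{name}` placeholders as quoted in the commit message, and the helper name `undeclared_placeholders` is hypothetical.

```python
import re
from typing import Iterable, Set

# Matches {identifier} style placeholders such as {hearing_answer}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def undeclared_placeholders(prompt_body: str, declared_inputs: Iterable[str]) -> Set[str]:
    """Return placeholders used in the prompt body that no declared input supplies."""
    used = set(PLACEHOLDER.findall(prompt_body))
    return used - set(declared_inputs)
```

Run against the pre-fix state (input declared as `hearing_answer_str`, body using `{hearing_answer}`), the function reports `hearing_answer` as undeclared; after the rename it reports nothing.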

Also documents the retry in Step 6 as "once only" to make the bound
self-documenting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — SE review complete, Finding fixed

429f32b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — session save 2026-05-19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: remove Critical Constraints, Error Handling, Knowledge Structure from SKILL.md

ac8e752

These sections belong in the individual workflow files (answer.md,
verify.md, etc.) where the LLM executes them. Keeping them in SKILL.md
gives the LLM a shortcut to respond without executing the workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — add B-0-4-H workflow boilerplate removal

0e70be0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          refactor: remove boilerplate from verify.md workflow

a66ca56

Remove '**Tool**: In-memory (LLM generation)', 'Call LLM with the
following prompt, substituting the variables:', and '---' separators.
Prompt content is preserved as direct step instructions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kiyotis and others added 28 commits

May 25, 2026 11:53


          docs: update tasks.md — C-2b semantic-search PASS, code-analysis pending

0f2efbf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-2b code-analysis PASS, all flows complete

c8f0c44


          docs: update tasks.md — session end, C-3 next

77aba26


          docs: update tasks.md — add verification on other versions to C-3

c2bc36a


          docs: update tasks.md — C-3 checklist with full cross-version diff results

f2989f1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: expand code-analysis.md and command examples to v5/v1.4/v1.3/v1.2

f1c5119

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: expand read-sections.sh and prefill-template.sh updates to v5/v1.4/v1.3/v1.2

5ad00ee

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-3 step1 all checks done

0f10748

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-3 step2 all flows verified (v5/v1.4/v1.3/v1.2)

56cf1ae

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          feat: add code-analysis template to workflows/code-analysis/ for v5/v1.4/v1.3/v1.2

62aec4f

prefill-template.sh was updated (session 4) to reference
workflows/code-analysis/template.md (matching v6 layout), but the
template file was not placed at that path — only assets/code-analysis-template.md
existed. Copy template to the path the script expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: align code-analysis template paths with v6 for v5/v1.4/v1.3/v1.2

code-analysis.md referenced assets/code-analysis-template*.md while
prefill-template.sh referenced workflows/code-analysis/template.md.
Add template-guide.md to workflows/code-analysis/ and update
code-analysis.md to use workflows/code-analysis/ consistently (matching v6).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: update self-reference paths in template-guide.md for v5/v1.4/v1.3/v1.2

11364ae

template-guide.md was copied from assets/ but still contained
assets/code-analysis-template.md references internally.
Update to workflows/code-analysis/template.md to match actual path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: remove version name from Knowledge Base heading in code-analysis template

a0469e7

### Knowledge Base (Nabledge-X) is unnecessary in user-facing output.
Replace with ### Knowledge Base across all versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: simplify dependency classification in code-analysis.md (all versions)

6cc6204

Merge JDK/Jakarta EE and third-party libraries into a single
"Others" category since both have the same handling: note but don't trace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: normalize trailing newline in SKILL.md for v5/v1.4/v1.3/v1.2

24f09ab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: remove invalid official docs example from template-guide for v1.4/v1.3/v1.2

f89c2ee

v1.4/v1.3/v1.2 have no public official documentation. The sample example
contained 5-LATEST URLs that could mislead LLM to include invalid links
in generated output. Remove the example and clarify in placeholder description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-3 diff check completed (session 6 additional fixes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: make prefill-template.sh work regardless of current directory

01e3c69

Output path and file searches used relative paths, causing wrong output
location when Bash tool's cwd differed from repo root. Use PROJECT_ROOT
(from git rev-parse) for all path calculations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: strip PROJECT_ROOT from file paths in prefill-template.sh links

d3cd043

find with PROJECT_ROOT returned absolute paths; strip the prefix so
generated source and knowledge base links remain relative.
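
The two prefill-template.sh fixes (resolve paths from the repository root, then strip that root so links stay relative) can be illustrated in Python as below. This is a sketch of the idea, not a translation of the script: the function names are hypothetical, and `--show-toplevel` is an assumed flag, since the commit only says PROJECT_ROOT comes "from git rev-parse".

```python
import subprocess
from pathlib import Path


def project_root() -> Path:
    """Ask git for the repository root, independent of the current directory."""
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],  # assumed flag, see lead-in
        check=True,
        capture_output=True,
        text=True,
    )
    return Path(out.stdout.strip())


def to_repo_relative(path: Path, root: Path) -> str:
    """Strip the root prefix from an absolute path so generated links stay relative."""
    # relative_to() raises ValueError if path is not located under root.
    return path.relative_to(root).as_posix()
```

Searching from an absolute root fixes the dependency on the working directory, and converting results back to relative form keeps the generated links portable across checkouts.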

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-3 step2 v1.4 code-analysis verified + prefill-template.sh bugs recorded

2b6cf10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — C-3 approved, move to Done

57f2605

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          refactor: remove _scenario_field from test-setup.sh, hardcode questions/keywords

157d1ee

Removes dependency on nabledge-test scenarios.json before B-7 deletes it.
Questions and keywords are hardcoded directly (qa-002 for v6/v5, qa-001 for v1.4/v1.3/v1.2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          chore: remove nabledge-test references from settings.json and rules

f9d0dfe

- settings.json: remove Skill(nabledge-test) from allowedTools
- rules/nabledge-skill.md: remove nabledge-test cross-version and baseline rules
- rules/temporary-files.md: update example to reflect test-setup.sh workspace structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-7 complete

8b62a35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — B-8-pre complete

c015d17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: remove marker-based prompt extraction from test-setup.sh dynamic checks

a2b1aba

CC: use /n{v} slash command directly instead of extracting from n{v}.md
GHC: use full prompt file content instead of extracting from #runSubagent marker

The marker approach broke when subagent delegation was removed (commit 20838f2).
File paths are determined by version alone — no extraction needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          fix: remove old GHC prompt files before installing new version in setup-ghc.sh

f80be35

Without this, a filename change leaves the old prompt file in the user's
.github/prompts/ directory. rm -f n{v}*.prompt.md before copy catches it.
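
The "remove, then copy" step can be sketched in Python as follows. The real change lives in setup-ghc.sh as a shell `rm -f`; this version only mirrors the idea, and the function name `install_prompt` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def install_prompt(version: str, source: Path, prompts_dir: Path) -> Path:
    """Delete stale n{version}*.prompt.md files, then copy the new prompt in."""
    prompts_dir.mkdir(parents=True, exist_ok=True)
    # Same glob as the commit message: n{v}*.prompt.md
    for stale in prompts_dir.glob(f"n{version}*.prompt.md"):
        stale.unlink()
    return Path(shutil.copy(source, prompts_dir / source.name))
```

Deleting by pattern before copying means a renamed prompt file replaces the old one instead of sitting next to it, while prompt files for other versions are left untouched.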

Also update nabledge-skill.md rules: remove stale marker references
(markers removed in 20838f2), document the GHC prompt filename risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


          docs: update tasks.md — session 9 fixes recorded

e80798b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kiyotis merged commit 44f206e into main

kiyotis deleted the 343-improve-search-quality branch

May 25, 2026 07:00
