feat: improve search quality with new semantic+keyword search (#343) #346

Merged
kiyotis merged 381 commits into main from 343-improve-search-quality, May 25, 2026

Conversation

@kiyotis (Contributor) commented May 19, 2026

Closes #343

Approach

A 2-stage search architecture replaces the legacy full-text search approach:

  • Semantic search: 2-stage (Stage1: page selection from index.md, Stage2: section selection from knowledge JSON) using LLM judgment
  • Keyword search: deterministic term→section_id lookup via RBKC-generated terms.json
  • QA workflow: hearing → semantic search → read sections → answer generation → hallucination verify
  • Code analysis workflow: keyword search → read sections → report
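The keyword-search bullet above can be sketched as a plain dictionary lookup. This is only an illustration: it assumes terms.json maps each term to a list of section IDs, which this PR description does not confirm, and the function name, terms, and section IDs are hypothetical.

```python
from __future__ import annotations

import json


def keyword_search(terms_index: dict[str, list[str]], keywords: list[str]) -> list[str]:
    """Return section IDs for the keywords, de-duplicated, in first-seen order."""
    seen: set[str] = set()
    hits: list[str] = []
    for keyword in keywords:
        # Unknown terms simply contribute nothing; no fuzzy matching is attempted.
        for section_id in terms_index.get(keyword, []):
            if section_id not in seen:
                seen.add(section_id)
                hits.append(section_id)
    return hits


# Hypothetical terms.json content (assumed schema: term -> list of section IDs).
SAMPLE_TERMS_JSON = '{"term-a": ["sec-001", "sec-003"], "term-b": ["sec-002", "sec-003"]}'
terms_index = json.loads(SAMPLE_TERMS_JSON)
```

Because the lookup involves no model call, the same keywords always return the same sections, which is the sense in which this path is deterministic.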

Phase A built and benchmarked each component independently. Phase B deployed to skill and ran E2E benchmarks.
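The 2-stage semantic search from the Approach list can be pictured roughly as follows. This is a sketch under stated assumptions: the `judge` callable stands in for the LLM judgment, its signature is invented for illustration, and the page and section names are hypothetical. The toy `substring_judge` exists only so the example runs without a model.

```python
from __future__ import annotations

from typing import Callable

# judge(question, candidates) -> the subset of candidates judged relevant.
# Stand-in for the LLM call; this signature is an assumption of the sketch.
Judge = Callable[[str, list[str]], list[str]]


def semantic_search(question: str, pages: dict[str, list[str]], judge: Judge) -> list[str]:
    """Stage 1 narrows to pages; Stage 2 narrows to sections within those pages."""
    selected_pages = judge(question, list(pages))  # Stage 1: page selection
    candidates = [s for page in selected_pages for s in pages.get(page, [])]
    if not candidates:
        return []
    return judge(question, candidates)  # Stage 2: section selection


def substring_judge(question: str, candidates: list[str]) -> list[str]:
    """Toy deterministic judge: keep candidates containing any question word."""
    words = question.lower().split()
    return [c for c in candidates if any(w in c.lower() for w in words)]
```

Stage 2 only sees sections from the pages kept by Stage 1, so the second judgment works over a much smaller candidate set than the full knowledge base.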

Tasks

See tasks.md.

Expert Review

AI-driven expert reviews conducted before PR creation (see .claude/rules/expert-review.md):

Success Criteria Check

| Criterion | Status | Evidence |
|---|---|---|
| E2E benchmark runs without errors (all 30 scenarios) | ❌ Not Met | 13 errors in run-1 (B-4-1 in progress) |
| New search accuracy ≥ baseline-current (83.7%) | ❌ Not Met | Pending error fix + 3 runs |
| Hallucination PASS ≥ baseline-current (14.4%) | ❌ Not Met | Pending error fix + 3 runs |

🤖 Generated with Claude Code

@kiyotis added the enhancement (New feature or request) label May 19, 2026
Comment thread tools/rbkc/docs/rbkc-verify-quality-design.md Outdated
Comment thread tools/rbkc/docs/rbkc-verify-quality-design.md Outdated
Comment thread tools/rbkc/docs/rbkc-verify-quality-design.md Outdated
Comment thread tools/rbkc/docs/rbkc-verify-quality-design.md Outdated
Comment thread tools/rbkc/docs/rbkc-verify-quality-design.md Outdated
Comment thread .gitignore
Comment thread .claude/skills/nabledge-6/workflows/qa.md Outdated
Comment thread .claude/skills/nabledge-6/assets/answer-generation.md Outdated
kiyotis and others added 28 commits May 25, 2026 10:18
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rkflow

Also update tasks.md B-0-4 steps to reflect actual file structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prompts are inlined in workflow MDs — assets/ directory no longer exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nswer into question text in benchmark

qa.md Step 0 injected a structured context section to bypass hearing,
coupling the skill to a benchmark-internal convention. Removed it so
the skill always runs hearing for every call.

run_e2e.py now appends hearing_answer (processing_type + goal) directly
to the question text, letting the hearing workflow classify it naturally.
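The change to run_e2e.py described above might look something like this. The actual code is not shown in this PR page, so the function name, the dict shape of hearing_answer, and the newline-joined format are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional


def build_question(question: str, hearing_answer: Optional[dict[str, str]]) -> str:
    """Append processing_type and goal to the question as plain text.

    No structured context section is injected; the hearing workflow is left
    to classify the combined text on its own.
    """
    if not hearing_answer:
        return question
    parts = [hearing_answer.get(key, "").strip() for key in ("processing_type", "goal")]
    extra = " ".join(part for part in parts if part)
    return f"{question}\n{extra}" if extra else question
```

The point of the design is that the benchmark supplies information the way a user would, as free text, instead of relying on a convention the skill has to know about.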

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On FAIL, qa.md now re-runs answer.md with verify issues as exclusions
(max 1 retry) instead of appending a warning to the answer.
verify.md returns raw JSON; qa.md owns the retry logic.
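The bounded retry described in this commit can be sketched in ordinary code. The real logic lives in workflow Markdown executed by the LLM, so this is an analogy only: the `generate` and `verify` callables and the `status` / `issues` keys of the verify result are assumed names, not taken from the source.

```python
from __future__ import annotations

from typing import Callable


def answer_with_verify(
    generate: Callable[[list[str]], str],
    verify: Callable[[str], dict],
    max_retries: int = 1,
) -> tuple[str, dict]:
    """Generate an answer, verify it, and regenerate at most max_retries times on FAIL."""
    exclusions: list[str] = []
    answer = generate(exclusions)
    result = verify(answer)
    for _ in range(max_retries):
        if result.get("status") != "FAIL":
            break
        # Feed the verifier's issues back as exclusions instead of
        # appending a warning to the failed answer.
        exclusions = list(result.get("issues", []))
        answer = generate(exclusions)
        result = verify(answer)
    return answer, result
```

Keeping the loop in the caller means the verifier stays a pure check that returns data, and the retry bound is visible in one place.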

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
qa.md now fetches sections_content once (max 10, high-first) and passes
it to both answer.md and verify.md. answer.md and verify.md no longer
call read-sections.sh themselves, ensuring both use identical data.
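A rough picture of "fetch once, pass to both" follows. It assumes each candidate carries a `relevance` field where the value "high" marks the top tier; the field name, the callables, and their signatures are hypothetical and only illustrate the data flow.

```python
from __future__ import annotations

from typing import Callable


def select_sections(candidates: list[dict], limit: int = 10) -> list[dict]:
    """Order candidates high-relevance first and keep at most `limit`."""
    # sorted() is stable, so the original order is preserved within each tier.
    ordered = sorted(candidates, key=lambda c: c.get("relevance") != "high")
    return ordered[:limit]


def answer_then_verify(
    candidates: list[dict],
    read_sections: Callable[[list[dict]], str],
    answer: Callable[[str], str],
    verify: Callable[[str, str], dict],
) -> tuple[str, dict]:
    """Read section content once and hand the same content to both steps."""
    sections_content = read_sections(select_sections(candidates))
    draft = answer(sections_content)
    return draft, verify(draft, sections_content)
```

With a single read, the verifier checks the draft against exactly the text the generator saw, so a mismatch cannot come from the two steps loading different sections.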

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
status and questions were internal classification state leaking into
the downstream interface. hearing.md now returns only the formatted
hearing_answer string that callers actually need.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s selected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Records the list of choices presented to the user (empty for skip,
processing_types list for ask), enabling quality evaluation of what
options were shown in each hearing session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents trace.excluded, trace.low_confidence_note, and trace.stage1_files
fields that are present in the implementation but were missing from the spec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
answer.md declared {hearing_answer_str} but the prompt body used
{hearing_answer}, causing the hearing context to never reach the LLM.
Renamed the input to {hearing_answer} throughout (answer.md input,
qa.md Step 4 and Step 6 calls).

Also documents the retry in Step 6 as "once only" to make the bound
self-documenting.
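A mismatch like `{hearing_answer_str}` versus `{hearing_answer}` is easy to miss by eye. One way to catch it mechanically is a small check that compares declared inputs with the placeholders actually used in the prompt body. This helper is not part of the PR; it is a hypothetical sketch, and the sample prompt text is invented.

```python
from __future__ import annotations

import re

# Matches {name} style placeholders; names follow identifier rules.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def placeholder_mismatch(declared: set[str], prompt_body: str) -> tuple[set[str], set[str]]:
    """Return (declared but never used, used but never declared)."""
    used = set(PLACEHOLDER.findall(prompt_body))
    return declared - used, used - declared


# Hypothetical prompt body reproducing the bug described above.
body = "Question: {question}\nHearing context: {hearing_answer}"
unused, undeclared = placeholder_mismatch({"question", "hearing_answer_str"}, body)
```

Before the rename, `unused` would hold `hearing_answer_str` and `undeclared` would hold `hearing_answer`; after the rename both sets are empty.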

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e from SKILL.md

These sections belong in the individual workflow files (answer.md,
verify.md, etc.) where the LLM executes them. Keeping them in SKILL.md
gives the LLM a shortcut to respond without executing the workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove '**Tool**: In-memory (LLM generation)', 'Call LLM with the
following prompt, substituting the variables:', and '---' separators.
Prompt content is preserved as direct step instructions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kiyotis and others added 28 commits May 25, 2026 11:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sults

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1.4/v1.3/v1.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1.4/v1.3/v1.2

prefill-template.sh was updated (session 4) to reference
workflows/code-analysis/template.md (matching v6 layout), but the
template file was not placed at that path — only assets/code-analysis-template.md
existed. Copy template to the path the script expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
code-analysis.md referenced assets/code-analysis-template*.md while
prefill-template.sh referenced workflows/code-analysis/template.md.
Add template-guide.md to workflows/code-analysis/ and update
code-analysis.md to use workflows/code-analysis/ consistently (matching v6).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…3/v1.2

template-guide.md was copied from assets/ but still contained
assets/code-analysis-template.md references internally.
Update to workflows/code-analysis/template.md to match actual path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… template

The "(Nabledge-X)" suffix in the "### Knowledge Base (Nabledge-X)" heading is unnecessary in user-facing output.
Replace the heading with "### Knowledge Base" across all versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions)

Merge JDK/Jakarta EE and third-party libraries into a single
"Others" category since both have the same handling: note but don't trace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…4/v1.3/v1.2

v1.4/v1.3/v1.2 have no public official documentation. The sample
contained 5-LATEST URLs that could mislead the LLM into including invalid links
in generated output. Remove the example and clarify this in the placeholder description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l fixes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Output paths and file searches used relative paths, so output landed in the
wrong location when the Bash tool's cwd differed from the repo root. Use PROJECT_ROOT
(from git rev-parse) for all path calculations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
find with PROJECT_ROOT returned absolute paths; strip the prefix so
generated source and knowledge base links remain relative.
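The two path fixes above (anchor everything at the repo root, then strip that root from anything written into links) can be illustrated as below. The actual script is shell; this Python version is an analogy with hypothetical function names and paths, and it takes the root as a parameter instead of calling git so the example stays self-contained.

```python
from __future__ import annotations

from pathlib import PurePath, PurePosixPath


def output_path(project_root: PurePath, relative_target: str) -> PurePath:
    """Anchor an output location at the repo root, not the current directory."""
    return project_root / relative_target


def to_relative_link(project_root: PurePath, absolute_path: PurePath) -> str:
    """Strip the repo-root prefix so a generated link stays relative."""
    # relative_to raises ValueError if the path is not under project_root.
    return absolute_path.relative_to(project_root).as_posix()


# Hypothetical root; in the script this value comes from git rev-parse.
root = PurePosixPath("/repo")
report = output_path(root, "out/report.md")
link = to_relative_link(root, PurePosixPath("/repo/src/main/Foo.java"))
```

Absolute paths are used for every filesystem operation and relative paths only for text that ends up in the generated document, so the output no longer depends on the caller's working directory.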

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ll-template.sh bugs recorded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ns/keywords

Removes dependency on nabledge-test scenarios.json before B-7 deletes it.
Questions and keywords are hardcoded directly (qa-002 for v6/v5, qa-001 for v1.4/v1.3/v1.2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- settings.json: remove Skill(nabledge-test) from allowedTools
- rules/nabledge-skill.md: remove nabledge-test cross-version and baseline rules
- rules/temporary-files.md: update example to reflect test-setup.sh workspace structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… checks

CC: use /n{v} slash command directly instead of extracting from n{v}.md
GHC: use full prompt file content instead of extracting from #runSubagent marker

The marker approach broke when subagent delegation was removed (commit 20838f2).
File paths are determined by version alone — no extraction needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…up-ghc.sh

Without this, a filename change leaves the old prompt file in the user's
.github/prompts/ directory. rm -f n{v}*.prompt.md before copy catches it.

Also update nabledge-skill.md rules: remove stale marker references
(markers removed in 20838f2), document the GHC prompt filename risk.
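The "remove before copy" step can be sketched as follows. The setup script itself is shell and uses rm -f; this Python equivalent is illustrative only, and the function name, arguments, and sample filenames are assumptions.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def install_prompt(src: Path, prompts_dir: Path, version: str) -> Path:
    """Delete any existing n{version}*.prompt.md files, then copy the new one."""
    prompts_dir.mkdir(parents=True, exist_ok=True)
    # Clearing by pattern removes a prompt file left behind under an old name.
    for stale in prompts_dir.glob(f"n{version}*.prompt.md"):
        stale.unlink()
    return Path(shutil.copy2(src, prompts_dir / src.name))
```

A plain copy would only overwrite a file with the same name, so a renamed prompt would otherwise coexist with its predecessor in the user's prompts directory.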

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kiyotis merged commit 44f206e into main May 25, 2026
@kiyotis deleted the 343-improve-search-quality branch May 25, 2026 07:00
Development

Successfully merging this pull request may close these issues.

As a nabledge user, I want search quality held to the same standard as RBKC so that answers are accurate and hallucination-free
