feat: improve search quality with new semantic+keyword search (#343) #346
Merged
kiyotis commented May 19, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rkflow
Also update tasks.md B-0-4 steps to reflect the actual file structure.
Prompts are inlined in workflow MDs; the assets/ directory no longer exists.
…nswer into question text in benchmark
qa.md Step 0 injected a structured context section to bypass hearing, coupling the skill to a benchmark-internal convention. Removed it so the skill always runs hearing for every call. run_e2e.py now appends hearing_answer (processing_type + goal) directly to the question text, letting the hearing workflow classify it naturally.
On FAIL, qa.md now re-runs answer.md with the verify issues as exclusions (max 1 retry) instead of appending a warning to the answer. verify.md returns raw JSON; qa.md owns the retry logic.
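The bounded retry described above can be sketched in Python. This is only an illustration of the control flow, not the workflow's actual implementation (which lives in the qa.md instructions): `generate_answer` and `verify` are hypothetical callables, and the `"verdict"` and `"issues"` keys are assumed names for the raw JSON that verify.md returns.

```python
from typing import Callable


def answer_with_retry(
    generate_answer: Callable[[list[str]], str],
    verify: Callable[[str], dict],
    max_retries: int = 1,
) -> tuple[str, dict]:
    """Generate an answer, verify it, and re-generate on FAIL (bounded)."""
    # First attempt: nothing is excluded yet.
    answer = generate_answer([])
    result = verify(answer)

    retries = 0
    # The caller owns the retry loop; verify() only reports what it found.
    while result.get("verdict") == "FAIL" and retries < max_retries:
        # Feed the reported issues back in as exclusions for the next attempt.
        exclusions = list(result.get("issues", []))
        answer = generate_answer(exclusions)
        result = verify(answer)
        retries += 1

    # If the retry also fails, the last answer and verdict are returned as-is.
    return answer, result
```

Keeping the loop in the caller means the verifier stays a pure check, and the `max_retries` default of 1 mirrors the "max 1 retry" bound stated in the commit message.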
qa.md now fetches sections_content once (max 10, high-first) and passes it to both answer.md and verify.md. answer.md and verify.md no longer call read-sections.sh themselves, ensuring both use identical data.
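A minimal Python sketch of the "fetch once, share with both steps" idea follows. The candidate shape (`id`, `relevance`), the `"high"` label, and the `read_section`, `answer`, and `verify` callables are assumptions made for illustration; the real workflow reads sections through read-sections.sh.

```python
from typing import Callable


def select_sections(candidates: list[dict], limit: int = 10) -> list[dict]:
    """Return at most `limit` candidates, with high-relevance ones first."""
    # sorted() is stable, so the original order is kept within each group.
    ranked = sorted(candidates, key=lambda s: 0 if s["relevance"] == "high" else 1)
    return ranked[:limit]


def run_qa(
    candidates: list[dict],
    read_section: Callable[[object], str],
    answer: Callable[[list[str]], str],
    verify: Callable[[str, list[str]], dict],
) -> tuple[str, dict]:
    """Read the selected sections once and hand the same data to both steps."""
    selected = select_sections(candidates)
    sections_content = [read_section(s["id"]) for s in selected]
    draft = answer(sections_content)
    verdict = verify(draft, sections_content)
    return draft, verdict
```

Because both steps receive the same list, the verifier cannot judge the answer against a different set of sections than the one used to write it.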
status and questions were internal classification state leaking into the downstream interface. hearing.md now returns only the formatted hearing_answer string that callers actually need.
…s selected
Records the list of choices presented to the user (empty for skip, processing_types list for ask), enabling quality evaluation of what options were shown in each hearing session.
Documents trace.excluded, trace.low_confidence_note, and trace.stage1_files fields that are present in the implementation but were missing from the spec.
answer.md declared {hearing_answer_str} but the prompt body used
{hearing_answer}, causing the hearing context to never reach the LLM.
Renamed the input to {hearing_answer} throughout (answer.md input,
qa.md Step 4 and Step 6 calls).
Also documents the retry in Step 6 as "once only" to make the bound
self-documenting.
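A mismatch like this between a declared input and the placeholder used in the prompt body can be caught mechanically. The sketch below is a hypothetical lint helper, not part of the PR: it assumes placeholders are written as `{name}` and that the declared inputs are available as a set of names.

```python
import re

# Matches {identifier}-style placeholders such as {hearing_answer}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def placeholder_mismatches(
    declared: set[str], prompt_body: str
) -> tuple[set[str], set[str]]:
    """Return (declared but never used, used but never declared)."""
    used = set(PLACEHOLDER.findall(prompt_body))
    return declared - used, used - declared


unused, undeclared = placeholder_mismatches(
    {"question", "hearing_answer_str"},
    "Question: {question}\nContext: {hearing_answer}",
)
print(unused)      # {'hearing_answer_str'}
print(undeclared)  # {'hearing_answer'}
```

Either set being non-empty signals the kind of silent drop described above, where a value is passed in but never reaches the prompt.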
…e from SKILL.md
These sections belong in the individual workflow files (answer.md, verify.md, etc.), where the LLM executes them. Keeping them in SKILL.md gives the LLM a shortcut to respond without executing the workflow.
Remove '**Tool**: In-memory (LLM generation)', 'Call LLM with the following prompt, substituting the variables:', and '---' separators. Prompt content is preserved as direct step instructions.
…sults
…1.4/v1.3/v1.2
…1.4/v1.3/v1.2
prefill-template.sh was updated (session 4) to reference workflows/code-analysis/template.md (matching the v6 layout), but the template file was not placed at that path; only assets/code-analysis-template.md existed. Copy the template to the path the script expects.
code-analysis.md referenced assets/code-analysis-template*.md while prefill-template.sh referenced workflows/code-analysis/template.md. Add template-guide.md to workflows/code-analysis/ and update code-analysis.md to use workflows/code-analysis/ consistently (matching v6).
…3/v1.2
template-guide.md was copied from assets/ but still contained assets/code-analysis-template.md references internally. Update them to workflows/code-analysis/template.md to match the actual path.
… template
The "### Knowledge Base (Nabledge-X)" heading is unnecessary in user-facing output. Replace it with "### Knowledge Base" across all versions.
…ions)
Merge JDK/Jakarta EE and third-party libraries into a single "Others" category, since both are handled the same way: note but don't trace.
…4/v1.3/v1.2
v1.4/v1.3/v1.2 have no public official documentation. The sample contained 5-LATEST URLs that could mislead the LLM into including invalid links in generated output. Remove the example and clarify this in the placeholder description.
…l fixes)
Output paths and file searches used relative paths, which put the output in the wrong location when the Bash tool's cwd differed from the repo root. Use PROJECT_ROOT (from git rev-parse) for all path calculations.
find with PROJECT_ROOT returned absolute paths; strip the prefix so generated source and knowledge base links remain relative.
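The two fixes above (resolve everything against the repository root, then strip that root from generated links) can be sketched in Python. The actual change is in the shell scripts; this is a rough equivalent under assumptions: the commit says only "git rev-parse", so the `--show-toplevel` flag is assumed here, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def project_root() -> Path:
    """Ask git for the repository root, independent of the current directory."""
    # Assumption: --show-toplevel is the flag used; the commit does not say.
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True,
        capture_output=True,
        text=True,
    )
    return Path(out.stdout.strip())


def relative_link(path: Path, root: Path) -> str:
    """Strip the root prefix so a link stays relative to the repository."""
    # relative_to() raises ValueError if `path` is not located under `root`.
    return path.relative_to(root).as_posix()
```

For example, `relative_link(Path("/repo/src/Foo.java"), Path("/repo"))` yields `src/Foo.java`, which stays valid when the repository is checked out somewhere else.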
…ll-template.sh bugs recorded
…ns/keywords
Removes the dependency on nabledge-test scenarios.json before B-7 deletes it. Questions and keywords are hardcoded directly (qa-002 for v6/v5, qa-001 for v1.4/v1.3/v1.2).
- settings.json: remove Skill(nabledge-test) from allowedTools
- rules/nabledge-skill.md: remove nabledge-test cross-version and baseline rules
- rules/temporary-files.md: update example to reflect test-setup.sh workspace structure
… checks
CC: use /n{v} slash command directly instead of extracting from n{v}.md
GHC: use full prompt file content instead of extracting from #runSubagent marker
The marker approach broke when subagent delegation was removed (commit 20838f2).
File paths are determined by version alone, so no extraction is needed.
…up-ghc.sh
Without this, a filename change leaves the old prompt file in the user's
.github/prompts/ directory. rm -f n{v}*.prompt.md before copy catches it.
Also update nabledge-skill.md rules: remove stale marker references
(markers removed in 20838f2), document the GHC prompt filename risk.
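The "remove stale files, then copy" step can be illustrated in Python. The real change is a `rm -f` in the shell setup script; `install_prompt`, its parameters, and the way the version is interpolated into the glob pattern are assumptions for this sketch.

```python
import shutil
from pathlib import Path


def install_prompt(src: Path, prompts_dir: Path, version: str) -> Path:
    """Delete old prompt files for this version, then copy the current one."""
    prompts_dir.mkdir(parents=True, exist_ok=True)

    # Mirrors `rm -f n{v}*.prompt.md`: a renamed prompt must not leave its
    # predecessor behind in the user's prompts directory.
    for stale in prompts_dir.glob(f"n{version}*.prompt.md"):
        stale.unlink()

    # Copy after cleanup so the directory holds exactly one file per version.
    return Path(shutil.copy2(src, prompts_dir / src.name))
```

Deleting by pattern rather than by exact name is what makes the step robust to a filename change between releases.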
Closes #343
Approach
A 2-stage search architecture replaces the legacy full-text-search approach.
Phase A built and benchmarked each component independently. Phase B deployed the components to the skill and ran E2E benchmarks.
Tasks
See tasks.md.
Expert Review
AI-driven expert reviews conducted before PR creation (see .claude/rules/expert-review.md):
Success Criteria Check
🤖 Generated with Claude Code