refactor: feature-weight model with lextool tune#210
Conversation
Introduce PathFeatures and FeatureWeights to separate feature extraction from weight application. The reranker now computes features via extract_features() and applies weighted_cost() instead of inline penalty calculations. Features: - structure_cost: capped transition cost sum - length_variance: segment length uniformity - script_cost: kanji+kana preference - te_kanji_count: te-form kanji penalty occurrences - single_kanji_count: single-char kanji penalty occurrences Weights are loaded from settings and can be tuned independently. This is preparation for automated weight tuning via accuracy corpus. Accuracy: 61/61 pass (4 skip), history 6/6 — identical to baseline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a tune subcommand that pre-computes Viterbi candidates once per reading, then evaluates all weight combinations using pure arithmetic. Default grid: 6×6×6×6 = 1,296 combos, finishes in under a second. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update defaults based on lextool tune results against accuracy corpus: structure: 25 → 0 length_variance: 2000 → 1000 te_kanji: 3500 → 2000 single_kanji: 4000 → 0 Accuracy unchanged (100%, 61/61). Reranker tests updated to verify feature extraction directly rather than depending on specific weight magnitudes for ranking assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors reranker scoring into an explicit feature/weight model and adds a new lextool tune subcommand to grid-search feature weights against the accuracy corpus, then updates the default reranker weights based on tuning results.
Changes:
- Split reranker cost adjustments into
PathFeaturesextraction andFeatureWeightsapplication. - Add
lex-core::converter::tune+lextool tuneto precompute candidates and evaluate many weight combinations quickly. - Update default reranker weights in settings/tests (
lv=1000,te=2000,sk=0).
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| engine/crates/lex-core/src/settings.rs | Updates settings test expectations for tuned reranker defaults. |
| engine/crates/lex-core/src/default_settings.toml | Updates default reranker weights. |
| engine/crates/lex-core/src/converter/features.rs | Introduces feature extraction + weight application for reranking. |
| engine/crates/lex-core/src/converter/reranker.rs | Switches reranker step 3 to feature-based weighted adjustment; updates tests accordingly. |
| engine/crates/lex-core/src/converter/tune.rs | Adds precompute + grid-search implementation for weight tuning. |
| engine/crates/lex-core/src/converter/mod.rs | Exposes features internally and tune publicly. |
| engine/crates/lex-core/src/converter/explain.rs | Adjusts explain logic after refactor, but currently contains TODO/placeholder behavior. |
| engine/crates/lex-cli/src/bin/lextool.rs | Adds the tune CLI subcommand and text/JSON reporting. |
Comments suppressed due to low confidence (1)
engine/crates/lex-core/src/converter/explain.rs:157
explain_segmentssetssingle_char_kanji_penaltyto a hard-coded0with a TODO. This changes the diagnostic output to no longer reflect the actual reranker behavior for single-char kanji penalties. Either restore the previous per-segment penalty calculation (including the dictionary compound exemption), or rename/remove the field fromExplainSegmentuntil it can be reported accurately.
// te_form and single_char are path-level counts; for per-segment
// display we inline the check here.
let te_penalty = if let Some(c) = conn {
if let Some(prev) = prev_seg {
if c.is_function_word(prev.left_id)
&& (prev.surface == "て" || prev.surface == "で")
&& seg.surface.chars().any(crate::unicode::is_kanji)
{
settings().reranker.te_form_kanji_penalty
} else {
0
}
} else {
0
}
} else {
0
};
let _ = &features; // suppress unused; features used for validation
let sc_penalty = 0i64; // TODO: per-segment single_char needs refactor
ExplainSegment {
reading: seg.reading.clone(),
surface: seg.surface.clone(),
word_cost: seg.word_cost as i64,
segment_penalty: settings().cost.segment_penalty,
script_cost: script_cost(&seg.surface, seg.reading.chars().count()),
connection_cost: connection,
te_form_kanji_penalty: te_penalty,
single_char_kanji_penalty: sc_penalty,
left_id: seg.left_id,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Remove stale "Default: 25" doc comment on FeatureWeights.structure - Return &str from top1_surface to avoid 130K string clones in grid search - Split evaluate_weights into count_passes (hot loop) and collect_surfaces (called only for default + best) to avoid redundant evaluation - Remove unused extract_features call in explain_segments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fall back to default weights instead of panicking when the grid produces zero combinations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract is_single_char_kanji_penalised() in features.rs and reuse it from both extract_features() and explain_segments(), eliminating the duplicated logic. The per-segment penalty was hardcoded to 0 after the feature extraction refactor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Extract compute_structure_cost() and pass precomputed value to extract_features() to avoid duplicate transition-cost computation in the reranker hot path - Add deterministic tie-break to grid_search sort (prefer weights closer to defaults when pass_count is equal) - Clarify from_settings() doc: structure/script are compile-time constants, not loaded from settings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Extract is_te_form_kanji_penalised() in features.rs, reuse from both extract_features() and explain_segments() (eliminates duplication) - Use FeatureWeights::from_settings() as grid_search baseline instead of Default, so the tune command matches actual production behaviour Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids silent divergence if the constant values are changed later. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Remove structure/script from WeightGrid (fixed in from_settings) - Add 5 unit tests for tune module (precompute, grid_search, empty grid, top1_surface, weight_distance) - Remove redundant extract_features import in reranker tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The serde default functions for te_form_kanji_penalty (3500 → 2000) and single_char_kanji_penalty (4000 → 0) still had old values, which would silently apply if a custom settings TOML omitted those keys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract DEFAULT_LENGTH_VARIANCE_WEIGHT, DEFAULT_TE_FORM_KANJI_PENALTY, DEFAULT_SINGLE_CHAR_KANJI_PENALTY, and DEFAULT_STRUCTURE_COST_TRANSITION_CAP as pub constants in settings.rs. Both the serde fallback functions and FeatureWeights::default() now reference these constants, eliminating the risk of values drifting out of sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When both te_kanji and single_kanji weights are 0 (current defaults), pass dict=None to extract_features so the per-segment loop skips dictionary compound lookups entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
te-form scoring uses connection matrix only, not dictionary lookups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
PathFeatures/FeatureWeightsに分離し、重みの独立管理を可能にしたBefore (default weights)
After (tuned weights)
tune で当たらない 1 件 (
にじゅうさん) は NumericRewriter が補っており、フルパイプラインでは pass。Test plan
cargo fmt --all --check && cargo clippy --workspace --all-features -- -D warningscargo test --workspace --all-features— 全 411 テスト passmise run accuracy— 100.0% (61/61)lextool tune— 実辞書+コーパスで実行確認🤖 Generated with Claude Code