refactor: feature-weight model with lextool tune by send · Pull Request #210 · send/lexime

send · 2026-04-05T16:14:26Z

Summary

features.rs: reranker のコスト調整ロジックを PathFeatures / FeatureWeights に分離し、重みの独立管理を可能にした
lextool tune: accuracy コーパスに対して FeatureWeights の grid search を行うサブコマンドを追加。Viterbi 候補を事前計算し、1,296 組の重みを <1秒で評価する
重み適用: tune 結果に基づきデフォルト重みを更新（structure=0, lv=1000, te=2000, sk=0）。accuracy 100% 維持

Before (default weights)

structure=25, length_variance=2000, te_kanji=3500, single_kanji=4000
tune pass rate: 96.7% (59/61)

After (tuned weights)

structure=0, length_variance=1000, te_kanji=2000, single_kanji=0
tune pass rate: 98.4% (60/61)

tune で当たらない 1 件 (にじゅうさん) は NumericRewriter が補っており、フルパイプラインでは pass。

Test plan

cargo fmt --all --check && cargo clippy --workspace --all-features -- -D warnings
cargo test --workspace --all-features — 全 411 テスト pass
mise run accuracy — 100.0% (61/61)
lextool tune — 実辞書+コーパスで実行確認

🤖 Generated with Claude Code

Introduce PathFeatures and FeatureWeights to separate feature extraction from weight application. The reranker now computes features via extract_features() and applies weighted_cost() instead of inline penalty calculations. Features: - structure_cost: capped transition cost sum - length_variance: segment length uniformity - script_cost: kanji+kana preference - te_kanji_count: te-form kanji penalty occurrences - single_kanji_count: single-char kanji penalty occurrences Weights are loaded from settings and can be tuned independently. This is preparation for automated weight tuning via accuracy corpus. Accuracy: 61/61 pass (4 skip), history 6/6 — identical to baseline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds a tune subcommand that pre-computes Viterbi candidates once per reading, then evaluates all weight combinations using pure arithmetic. Default grid: 6×6×6×6 = 1,296 combos, finishes in under a second. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update defaults based on lextool tune results against accuracy corpus: structure: 25 → 0 length_variance: 2000 → 1000 te_kanji: 3500 → 2000 single_kanji: 4000 → 0 Accuracy unchanged (100%, 61/61). Reranker tests updated to verify feature extraction directly rather than depending on specific weight magnitudes for ranking assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR refactors reranker scoring into an explicit feature/weight model and adds a new lextool tune subcommand to grid-search feature weights against the accuracy corpus, then updates the default reranker weights based on tuning results.

Changes:

Split reranker cost adjustments into PathFeatures extraction and FeatureWeights application.
Add lex-core::converter::tune + lextool tune to precompute candidates and evaluate many weight combinations quickly.
Update default reranker weights in settings/tests (lv=1000, te=2000, sk=0).

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
engine/crates/lex-core/src/settings.rs	Updates settings test expectations for tuned reranker defaults.
engine/crates/lex-core/src/default_settings.toml	Updates default reranker weights.
engine/crates/lex-core/src/converter/features.rs	Introduces feature extraction + weight application for reranking.
engine/crates/lex-core/src/converter/reranker.rs	Switches reranker step 3 to feature-based weighted adjustment; updates tests accordingly.
engine/crates/lex-core/src/converter/tune.rs	Adds precompute + grid-search implementation for weight tuning.
engine/crates/lex-core/src/converter/mod.rs	Exposes `features` internally and `tune` publicly.
engine/crates/lex-core/src/converter/explain.rs	Adjusts explain logic after refactor, but currently contains TODO/placeholder behavior.
engine/crates/lex-cli/src/bin/lextool.rs	Adds the `tune` CLI subcommand and text/JSON reporting.

Comments suppressed due to low confidence (1)

engine/crates/lex-core/src/converter/explain.rs:157

explain_segments sets single_char_kanji_penalty to a hard-coded 0 with a TODO. This changes the diagnostic output to no longer reflect the actual reranker behavior for single-char kanji penalties. Either restore the previous per-segment penalty calculation (including the dictionary compound exemption), or rename/remove the field from ExplainSegment until it can be reported accurately.

            // te_form and single_char are path-level counts; for per-segment
            // display we inline the check here.
            let te_penalty = if let Some(c) = conn {
                if let Some(prev) = prev_seg {
                    if c.is_function_word(prev.left_id)
                        && (prev.surface == "て" || prev.surface == "で")
                        && seg.surface.chars().any(crate::unicode::is_kanji)
                    {
                        settings().reranker.te_form_kanji_penalty
                    } else {
                        0
                    }
                } else {
                    0
                }
            } else {
                0
            };
            let _ = &features; // suppress unused; features used for validation
            let sc_penalty = 0i64; // TODO: per-segment single_char needs refactor
            ExplainSegment {
                reading: seg.reading.clone(),
                surface: seg.surface.clone(),
                word_cost: seg.word_cost as i64,
                segment_penalty: settings().cost.segment_penalty,
                script_cost: script_cost(&seg.surface, seg.reading.chars().count()),
                connection_cost: connection,
                te_form_kanji_penalty: te_penalty,
                single_char_kanji_penalty: sc_penalty,
                left_id: seg.left_id,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Remove stale "Default: 25" doc comment on FeatureWeights.structure - Return &str from top1_surface to avoid 130K string clones in grid search - Split evaluate_weights into count_passes (hot loop) and collect_surfaces (called only for default + best) to avoid redundant evaluation - Remove unused extract_features call in explain_segments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fall back to default weights instead of panicking when the grid produces zero combinations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract is_single_char_kanji_penalised() in features.rs and reuse it from both extract_features() and explain_segments(), eliminating the duplicated logic. The per-segment penalty was hardcoded to 0 after the feature extraction refactor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Extract compute_structure_cost() and pass precomputed value to extract_features() to avoid duplicate transition-cost computation in the reranker hot path - Add deterministic tie-break to grid_search sort (prefer weights closer to defaults when pass_count is equal) - Clarify from_settings() doc: structure/script are compile-time constants, not loaded from settings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Extract is_te_form_kanji_penalised() in features.rs, reuse from both extract_features() and explain_segments() (eliminates duplication) - Use FeatureWeights::from_settings() as grid_search baseline instead of Default, so the tune command matches actual production behaviour Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Avoids silent divergence if the constant values are changed later. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Remove structure/script from WeightGrid (fixed in from_settings) - Add 5 unit tests for tune module (precompute, grid_search, empty grid, top1_surface, weight_distance) - Remove redundant extract_features import in reranker tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The serde default functions for te_form_kanji_penalty (3500 → 2000) and single_char_kanji_penalty (4000 → 0) still had old values, which would silently apply if a custom settings TOML omitted those keys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract DEFAULT_LENGTH_VARIANCE_WEIGHT, DEFAULT_TE_FORM_KANJI_PENALTY, DEFAULT_SINGLE_CHAR_KANJI_PENALTY, and DEFAULT_STRUCTURE_COST_TRANSITION_CAP as pub constants in settings.rs. Both the serde fallback functions and FeatureWeights::default() now reference these constants, eliminating the risk of values drifting out of sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

When both te_kanji and single_kanji weights are 0 (current defaults), pass dict=None to extract_features so the per-segment loop skips dictionary compound lookups entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

te-form scoring uses connection matrix only, not dictionary lookups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

send and others added 3 commits April 6, 2026 00:41

Copilot AI review requested due to automatic review settings April 5, 2026 16:14

Copilot started reviewing on behalf of send April 5, 2026 16:15 View session