refactor: absorb 3 reranker heuristics into compile-time dictionary costs by send · Pull Request #209 · send/lexime

send · 2026-04-05T14:36:37Z

Summary

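Moves three of the reranker's five heuristics into cost adjustments applied at dictionary compile time, reducing runtime code.

person_name_penalty (+2000): adds to the cost of person-name entries at dictionary compile time
pronoun_bonus (-3500): subtracts from the cost of pronoun entries at dictionary compile time
non_independent_kanji_penalty (+1500): adds to the cost of non-independent-word entries that have a kanji surface

Adds an --id-def option to dictool compile, which uses the role information in id.def to adjust costs. 216,994 entries are affected.

The remaining two (te_form_kanji_penalty, single_char_kanji_penalty) are context-dependent and stay in the reranker.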
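
The role-based adjustment described above can be sketched roughly as follows. This is a minimal illustration, not the code in dict_ops.rs: the `Entry` struct, the `i16` cost type, the `roles` lookup table indexed by `left_id`, the saturating clamp, and the numeric value of `ROLE_NON_INDEPENDENT` are all assumptions. Only the offsets (+2000, -3500, +1500) and the role numbers 5 (pronoun) and 6 (person name) come from this PR. Treating "kanji surface" as "surface contains at least one kanji" is also an assumption.

```rust
// Role codes: 5 and 6 are taken from the commit message in this PR.
// The value for NON_INDEPENDENT is hypothetical.
const ROLE_NON_INDEPENDENT: u8 = 2; // assumed value
const ROLE_PRONOUN: u8 = 5;
const ROLE_PERSON_NAME: u8 = 6;

// Offsets stated in the PR description.
const PERSON_NAME_PENALTY: i32 = 2000;
const PRONOUN_BONUS: i32 = 3500;
const NON_INDEPENDENT_KANJI_PENALTY: i32 = 1500;

/// Hypothetical dictionary entry; the real struct may differ.
#[derive(Debug)]
struct Entry {
    surface: String,
    left_id: u16,
    cost: i16,
}

/// True if the surface contains at least one CJK ideograph
/// (CJK Unified Ideographs or Extension A).
fn contains_kanji(surface: &str) -> bool {
    surface
        .chars()
        .any(|c| matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}'))
}

/// Cost offset for an entry with the given role and surface.
fn cost_offset(role: u8, surface: &str) -> i32 {
    match role {
        ROLE_PERSON_NAME => PERSON_NAME_PENALTY,
        ROLE_PRONOUN => -PRONOUN_BONUS,
        ROLE_NON_INDEPENDENT if contains_kanji(surface) => NON_INDEPENDENT_KANJI_PENALTY,
        _ => 0,
    }
}

/// Applies the offset in place. Returns Ok(true) if the cost changed,
/// and an error if `left_id` is outside the roles table.
fn adjust_entry(entry: &mut Entry, roles: &[u8]) -> Result<bool, String> {
    let role = *roles
        .get(usize::from(entry.left_id))
        .ok_or_else(|| format!("left_id {} has no role entry", entry.left_id))?;
    let offset = cost_offset(role, &entry.surface);
    if offset == 0 {
        return Ok(false);
    }
    // Clamp so the adjusted cost still fits the assumed i16 storage.
    let adjusted = (i32::from(entry.cost) + offset)
        .clamp(i32::from(i16::MIN), i32::from(i16::MAX));
    entry.cost = adjusted as i16;
    Ok(true)
}
```

Because the offset is baked into the stored cost, the Viterbi search sees the adjusted value directly and no per-candidate pass is needed at runtime.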

Changes

dict_ops.rs: compile-time cost adjustment (--id-def option)
reranker.rs: remove 3 functions (pronoun_bonus, person_name_penalty, non_independent_kanji_penalty)
settings.rs / default_settings.toml: remove the 3 corresponding parameters
explain.rs: remove the deleted penalties from the cost breakdown display
mise.toml: add --id-def to dictool compile

Test plan

cargo fmt/clippy/test all pass (323 + 68 + 20 tests)
mise run accuracy: 61/61 pass (4 skip), identical to baseline
mise run accuracy-history: 6/6 pass, identical to baseline

🤖 Generated with Claude Code

…y costs

Move two reranker heuristics into compile-time dictionary cost adjustments:
- person_name (role 6): +2000 cost offset at compile time
- pronoun (role 5): -3500 cost offset at compile time

This eliminates two post-hoc reranker passes, making the Viterbi search see more accurate costs during beam search. 216,105 entries adjusted.

Changes:
- dict compile: add --id-def option for role-based cost adjustment
- reranker: remove pronoun_bonus() and person_name_penalty()
- settings: remove pronoun_cost_bonus and person_name_penalty params
- explain: remove pronoun_bonus from cost breakdown display

Accuracy: 61/61 pass (4 skip), history 6/6, identical to baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move non-independent kanji penalty to compile-time: entries with role=NON_INDEPENDENT and kanji surface get +1500 cost offset. 889 additional entries adjusted (216,994 total with person_name/pronoun).

Removes non_independent_kanji_penalty from reranker, settings, and explain output. te_form_kanji_penalty remains (context-dependent, requires role expansion to absorb).

Accuracy: 61/61 pass (4 skip), history 6/6, identical to baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR moves 3 reranker heuristics (person-name penalty, pronoun bonus, non-independent-kanji penalty) into dictionary compile-time cost offsets using Mozc id.def role information, reducing runtime reranker complexity while keeping the remaining context-dependent heuristics in the reranker.

Changes:

Add dictool compile --id-def support and apply compile-time cost offsets based on morpheme roles.
Remove the corresponding 3 settings knobs and runtime reranker/explain accounting for those heuristics.
Update the mise dictionary build task to pass --id-def.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

File	Description
mise.toml	Update dictionary build task to pass `--id-def` to `dictool compile`.
engine/crates/lex-core/src/settings.rs	Remove 3 reranker settings parameters and validation/tests for them.
engine/crates/lex-core/src/default_settings.toml	Remove the 3 deleted reranker parameters from defaults.
engine/crates/lex-core/src/converter/reranker.rs	Remove 3 heuristics from runtime reranking and delete associated tests.
engine/crates/lex-core/src/converter/explain.rs	Remove display/breakdown fields for the deleted heuristics.
engine/crates/lex-cli/src/commands/dict_ops.rs	Add compile-time cost adjustment logic driven by `id.def` roles.
engine/crates/lex-cli/src/bin/dictool.rs	Add `--id-def` option to the `compile` subcommand and plumb through.

Address Copilot review:
- Auto-detect id.def in input_dir when --id-def is not specified
- Validate left_id against roles table size instead of silently defaulting to role 0
- Add id.def to mise.toml dict-mozc sources for rebuild tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update stale reranker comment (removed heuristics still mentioned)
- Use PathBuf instead of String for id_def resolution
- Remove dead code: is_non_independent(), is_pronoun(), is_person_name() on ConnectionMatrix (no callers after compile-time cost absorption)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
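
The two review follow-ups above (auto-detecting id.def and rejecting an out-of-range left_id) might look something like the sketch below. The function names `resolve_id_def` and `role_for`, their signatures, and the `&[u8]` roles table are hypothetical and chosen only for illustration. The PR itself states only that id.def is looked up in input_dir when --id-def is absent, that PathBuf is used for the resolution, and that left_id is validated against the roles table size.

```rust
use std::path::{Path, PathBuf};

/// Picks the id.def path: an explicit --id-def value wins; otherwise
/// fall back to `<input_dir>/id.def` if that file exists.
/// (Hypothetical helper; not the signature used in dictool.)
fn resolve_id_def(explicit: Option<PathBuf>, input_dir: &Path) -> Option<PathBuf> {
    explicit.or_else(|| {
        let candidate = input_dir.join("id.def");
        candidate.is_file().then_some(candidate)
    })
}

/// Looks up the role for `left_id`, returning an error instead of
/// silently treating an unknown id as role 0.
fn role_for(roles: &[u8], left_id: u16) -> Result<u8, String> {
    roles.get(usize::from(left_id)).copied().ok_or_else(|| {
        format!(
            "left_id {} is out of range for a roles table of size {}",
            left_id,
            roles.len()
        )
    })
}
```

Returning an error here surfaces a mismatch between the dictionary and id.def at compile time, rather than leaving affected entries unadjusted without notice.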

Copilot

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

send and others added 2 commits April 5, 2026 23:27

Copilot AI review requested due to automatic review settings April 5, 2026 14:36

Copilot started reviewing on behalf of send April 5, 2026 14:37

Copilot AI reviewed Apr 5, 2026

Comment thread engine/crates/lex-cli/src/commands/dict_ops.rs Outdated

Comment thread mise.toml

send and others added 2 commits April 5, 2026 23:43

send requested a review from Copilot April 5, 2026 14:52

Copilot started reviewing on behalf of send April 5, 2026 14:53

Copilot AI reviewed Apr 5, 2026

Comment thread engine/crates/lex-core/src/settings.rs

Comment thread engine/crates/lex-cli/src/commands/dict_ops.rs Outdated

style: rename shadowed entries variable for clarity

164b4e6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

send merged commit 16061bd into main Apr 5, 2026
10 checks passed

send deleted the refactor/absorb-reranker-costs branch April 5, 2026 15:01

send merged 5 commits into main from refactor/absorb-reranker-costs
