refactor: absorb 3 reranker heuristics into compile-time dictionary costs #209

Merged: send merged 5 commits into main from refactor/absorb-reranker-costs on Apr 5, 2026

@send (Owner) commented Apr 5, 2026

Summary

Move three of the reranker's five heuristics into cost adjustments applied at dictionary compile time, reducing runtime code.

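  • person_name_penalty (+2000): added to the cost of person-name entries at dictionary compile time
  • pronoun_bonus (-3500): subtracted from the cost of pronoun entries at dictionary compile time
  • non_independent_kanji_penalty (+1500): added to the cost of non-independent-word entries whose surface form is kanji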

Add an --id-def option to dictool compile and use the role information in id.def to adjust costs. 216,994 entries are affected.

The remaining two (te_form_kanji_penalty, single_char_kanji_penalty) are context-dependent, so they stay in the reranker.
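The three absorbed adjustments can be sketched roughly as follows. This is a minimal illustration, not the code in dict_ops.rs: the `Role` enum, the `adjusted_cost` function, the `i16` cost type, saturating arithmetic, and the "contains at least one kanji" test are all assumptions made for the example. Only the three offsets (+2000, -3500, +1500) come from the description above.

```rust
/// Hypothetical role tag for a dictionary entry (assumed, not the project's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Other,
    NonIndependent,
    Pronoun,
    PersonName,
}

// Offsets taken from the PR description.
const PERSON_NAME_PENALTY: i16 = 2000;
const PRONOUN_BONUS: i16 = 3500;
const NON_INDEPENDENT_KANJI_PENALTY: i16 = 1500;

/// Assumption: "kanji" means CJK Unified Ideographs, Extension A, or the
/// iteration mark. The real check may differ.
fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}' | '々')
}

/// Assumption: a "kanji surface" is one containing at least one kanji.
fn has_kanji(surface: &str) -> bool {
    surface.chars().any(is_kanji)
}

/// Returns the entry cost after the role-based compile-time offset.
/// Saturating arithmetic is an assumption to keep the sketch overflow-safe.
fn adjusted_cost(cost: i16, role: Role, surface: &str) -> i16 {
    match role {
        Role::PersonName => cost.saturating_add(PERSON_NAME_PENALTY),
        Role::Pronoun => cost.saturating_sub(PRONOUN_BONUS),
        Role::NonIndependent if has_kanji(surface) => {
            cost.saturating_add(NON_INDEPENDENT_KANJI_PENALTY)
        }
        _ => cost,
    }
}
```

Under these assumptions, a non-independent entry written in kana (for example "こと") keeps its cost, while the same role with a kanji surface ("事") is penalized.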

Changes

  • dict_ops.rs: compile-time cost adjustment (--id-def option)
  • reranker.rs: remove 3 functions (pronoun_bonus, person_name_penalty, non_independent_kanji_penalty)
  • settings.rs / default_settings.toml: remove the 3 corresponding parameters
  • explain.rs: remove the deleted penalties from the cost breakdown display
  • mise.toml: add --id-def to dictool compile

Test plan

  • cargo fmt/clippy/test: all pass (323 + 68 + 20 tests)
  • mise run accuracy: 61/61 pass (4 skip) ← identical to baseline
  • mise run accuracy-history: 6/6 pass ← identical to baseline

🤖 Generated with Claude Code

send and others added 2 commits April 5, 2026 23:27
…y costs

Move two reranker heuristics into compile-time dictionary cost adjustments:
- person_name (role 6): +2000 cost offset at compile time
- pronoun (role 5): -3500 cost offset at compile time

This eliminates two post-hoc reranker passes, making the Viterbi search
see more accurate costs during beam search. 216,105 entries adjusted.

Changes:
- dict compile: add --id-def option for role-based cost adjustment
- reranker: remove pronoun_bonus() and person_name_penalty()
- settings: remove pronoun_cost_bonus and person_name_penalty params
- explain: remove pronoun_bonus from cost breakdown display

Accuracy: 61/61 pass (4 skip), history 6/6 — identical to baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
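
The point about the search seeing adjusted costs can be illustrated with a toy comparison. This sketch is not the engine's Viterbi or beam implementation: it flattens the lattice into a single candidate list, and the function names, the `Candidate` tuple, and the costs in the usage note are made up for illustration. It only shows that an offset applied after pruning cannot rescue a candidate the beam already dropped.

```rust
/// (surface, base_cost, offset): a hypothetical flattened candidate.
type Candidate = (&'static str, i32, i32);

/// Post-hoc style: prune by base cost first, then apply offsets to survivors.
fn post_hoc_rerank(cands: &[Candidate], beam: usize) -> Vec<&'static str> {
    let mut kept: Vec<Candidate> = cands.to_vec();
    kept.sort_by_key(|&(_, cost, _)| cost);
    kept.truncate(beam);
    // Offsets are only visible now, after pruning has already happened.
    kept.sort_by_key(|&(_, cost, offset)| cost + offset);
    kept.into_iter().map(|(s, _, _)| s).collect()
}

/// Compile-time style: offsets are already part of the cost when pruning.
fn compile_time_costs(cands: &[Candidate], beam: usize) -> Vec<&'static str> {
    let mut kept: Vec<Candidate> = cands.to_vec();
    kept.sort_by_key(|&(_, cost, offset)| cost + offset);
    kept.truncate(beam);
    kept.into_iter().map(|(s, _, _)| s).collect()
}
```

With the made-up input `[("彼", 6000, -3500), ("枯れ", 4000, 0), ("カレ", 4500, 0)]` and a beam of 2, the post-hoc variant prunes "彼" before its bonus is applied, while the compile-time variant keeps it and ranks it first.
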
Move non-independent kanji penalty to compile-time: entries with
role=NON_INDEPENDENT and kanji surface get +1500 cost offset.
889 additional entries adjusted (216,994 total with person_name/pronoun).

Removes non_independent_kanji_penalty from reranker, settings, and
explain output. te_form_kanji_penalty remains (context-dependent,
requires role expansion to absorb).

Accuracy: 61/61 pass (4 skip), history 6/6 — identical to baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 5, 2026 14:36

Copilot AI left a comment

Pull request overview

This PR moves 3 reranker heuristics (person-name penalty, pronoun bonus, non-independent-kanji penalty) into dictionary compile-time cost offsets using Mozc id.def role information, reducing runtime reranker complexity while keeping the remaining context-dependent heuristics in the reranker.

Changes:

  • Add dictool compile --id-def support and apply compile-time cost offsets based on morpheme roles.
  • Remove the corresponding 3 settings knobs and runtime reranker/explain accounting for those heuristics.
  • Update the mise dictionary build task to pass --id-def.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| mise.toml | Update dictionary build task to pass --id-def to dictool compile. |
| engine/crates/lex-core/src/settings.rs | Remove 3 reranker settings parameters and validation/tests for them. |
| engine/crates/lex-core/src/default_settings.toml | Remove the 3 deleted reranker parameters from defaults. |
| engine/crates/lex-core/src/converter/reranker.rs | Remove 3 heuristics from runtime reranking and delete associated tests. |
| engine/crates/lex-core/src/converter/explain.rs | Remove display/breakdown fields for the deleted heuristics. |
| engine/crates/lex-cli/src/commands/dict_ops.rs | Add compile-time cost adjustment logic driven by id.def roles. |
| engine/crates/lex-cli/src/bin/dictool.rs | Add --id-def option to the compile subcommand and plumb through. |

Comment thread engine/crates/lex-cli/src/commands/dict_ops.rs Outdated
Comment thread mise.toml
send and others added 2 commits April 5, 2026 23:43
Address Copilot review:
- Auto-detect id.def in input_dir when --id-def is not specified
- Validate left_id against roles table size instead of silently
  defaulting to role 0
- Add id.def to mise.toml dict-mozc sources for rebuild tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
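
The two review fixes above (auto-detecting id.def and rejecting out-of-range left_id values) might look roughly like this. It is a sketch under assumptions: `resolve_id_def` and `role_for` are hypothetical names, the roles table is assumed to be a slice of `u8` indexed by left_id, and the error type is a plain `String`.

```rust
use std::path::{Path, PathBuf};

/// Prefer an explicit --id-def path; otherwise fall back to `<input_dir>/id.def`
/// if that file exists. Returns None when neither is available.
/// Note: the explicit path is returned as given, without checking it exists.
fn resolve_id_def(explicit: Option<PathBuf>, input_dir: &Path) -> Option<PathBuf> {
    explicit.or_else(|| {
        let candidate = input_dir.join("id.def");
        candidate.is_file().then_some(candidate)
    })
}

/// Look up the role for a left_id, failing loudly instead of silently
/// treating an out-of-range id as role 0.
fn role_for(roles: &[u8], left_id: u16) -> Result<u8, String> {
    roles.get(left_id as usize).copied().ok_or_else(|| {
        format!(
            "left_id {} is out of range for roles table of size {}",
            left_id,
            roles.len()
        )
    })
}
```

Returning an error for an unknown left_id surfaces a mismatch between the dictionary and id.def at compile time rather than producing silently unadjusted costs.
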
- Update stale reranker comment (removed heuristics still mentioned)
- Use PathBuf instead of String for id_def resolution
- Remove dead code: is_non_independent(), is_pronoun(), is_person_name()
  on ConnectionMatrix (no callers after compile-time cost absorption)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

Comment thread engine/crates/lex-core/src/settings.rs
Comment thread engine/crates/lex-cli/src/commands/dict_ops.rs Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@send send merged commit 16061bd into main Apr 5, 2026
10 checks passed
@send send deleted the refactor/absorb-reranker-costs branch April 5, 2026 15:01