feat(cli): add lextool history-audit subcommand #256

Merged: send merged 3 commits into main from feat/history-audit on Jul 2, 2026

Conversation

send (Owner) commented Jul 2, 2026

Summary

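First step toward making conversion-quality improvements data-driven. This adds lextool history-audit, which mines the user's learning history (checkpoint + WAL) and cross-checks the raw engine top-1 against the surface the user has actually committed most often.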

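  • lex-core: add a public unigrams() iterator to UserHistory (a read path for offline tools)
  • lextool: new history-audit subcommand
    • Attaches the rank within the raw N-best to each miss (separating ranking problems from missing candidates)
    • Determines whether the top-1 after applying the history boost fixes the miss (history: fixed / NOT fixed)
    • Reports flip-flops, i.e. readings for which the user regularly commits multiple surfaces, in a separate list
    • text / --json output
  • mise: mise run history-audit (defaults to the on-device history path)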

The command is read-only and never writes to the history files.
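
The per-miss classification described above can be sketched roughly as follows. This is a minimal illustration, not the code in lextool.rs: the `Verdict` enum and the `classify` and `history_fixed` helpers are hypothetical names, and the N-best list is modelled as a plain slice of surfaces.

```rust
/// Outcome of comparing the user's dominant surface with the raw N-best.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The raw top-1 already matches the dominant surface.
    Agree,
    /// The dominant surface is in the raw N-best at this 1-based rank (2 or lower in the list).
    Ranked(usize),
    /// The dominant surface does not appear in the raw N-best at all.
    Absent,
}

/// Separates ranking problems (`Ranked`) from missing candidates (`Absent`).
fn classify(dominant: &str, raw_nbest: &[&str]) -> Verdict {
    match raw_nbest.iter().position(|s| *s == dominant) {
        Some(0) => Verdict::Agree,
        Some(i) => Verdict::Ranked(i + 1),
        None => Verdict::Absent,
    }
}

/// A miss counts as "history: fixed" when the top-1 after the history boost
/// equals the dominant surface.
fn history_fixed(dominant: &str, boosted_top1: Option<&str>) -> bool {
    boosted_top1 == Some(dominant)
}

fn main() {
    // Made-up candidates; not taken from any real history.
    let raw = ["下さい", "ください", "クダサイ"];
    println!("{:?}", classify("ください", &raw)); // Ranked(2)
    println!("{}", history_fixed("ください", Some("ください"))); // true
}
```

Keeping `Ranked` and `Absent` apart matters because the two point at different fixes: a ranked miss suggests a scoring adjustment, while an absent one suggests the candidate is never generated.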

Results on real data (aggregates only)

Readings audited: 3971 (dominant freq >= 2)
Raw top-1 agreement: 82.3% (3268/3971)
Misses: 703 (history fixes 623, leaves 80)
Flip-flops: 162

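The 80 misses that the history boost still does not fix become the primary source of future improvement candidates (trends in the breakdown: over-conversion to kanji where the user prefers kana for auxiliary expressions, an excessive penalty on katakana words, and broken segmentation). The individual entries come from input history, so they are not pasted into this PR.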
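
As a sanity check on the aggregates, the arithmetic behind the summary lines can be reproduced with a small sketch. The `Summary` struct and its field names are hypothetical; only the four numbers come from the output above.

```rust
/// Hypothetical aggregate counters for one audit run.
struct Summary {
    audited: usize, // readings audited
    agree: usize,   // raw top-1 equals the dominant surface
    fixed: usize,   // misses that the history boost fixes
}

impl Summary {
    /// Misses are the audited readings where the raw top-1 disagrees.
    fn misses(&self) -> usize {
        self.audited - self.agree
    }

    /// Misses that remain even after the history boost.
    fn unfixed(&self) -> usize {
        self.misses() - self.fixed
    }

    /// Agreement rate in percent; 0.0 for an empty audit.
    fn agreement_pct(&self) -> f64 {
        if self.audited == 0 {
            0.0
        } else {
            100.0 * self.agree as f64 / self.audited as f64
        }
    }
}

fn main() {
    let s = Summary { audited: 3971, agree: 3268, fixed: 623 };
    println!(
        "Raw top-1 agreement: {:.1}% ({}/{})",
        s.agreement_pct(),
        s.agree,
        s.audited
    );
    println!(
        "Misses: {} (history fixes {}, leaves {})",
        s.misses(),
        s.fixed,
        s.unfixed()
    );
}
```

With the reported counts this gives 703 misses, 80 of them unfixed, and an agreement rate that rounds to 82.3%.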

Test plan

  • cargo fmt --all --check / cargo clippy --workspace --all-features -- -D warnings pass
  • cargo test --workspace --all-features pass (including the newly added tests for the unigrams() iterator)
  • Ran mise run history-audit against the on-device user_history.lxud (1.1MB + 47KB WAL) and checked both the text and JSON output

🤖 Generated with Claude Code

Mine the user's learning history (checkpoint + WAL) for conversion
misses: for each learned reading, compare the raw (no-history) top-1
against the surface the user actually committed most often.

Each miss reports the dominant surface's rank in the raw N-best
(distinguishing ranking problems from missing candidates) and whether
the history boost currently fixes it. Readings regularly committed
with multiple surfaces are listed separately as flip-flops.

- lex-core: expose a public unigrams() iterator on UserHistory
- lextool: new history-audit subcommand (text/JSON output)
- mise: history-audit task wired to the live history path

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 2, 2026 05:41

Copilot AI left a comment

Pull request overview

Adds an offline auditing workflow to mine a user’s conversion history (checkpoint + WAL) and compare the user’s dominant committed surface per reading against the raw engine top-1, helping identify ranking vs missing-candidate problems and “flip-flop” readings.

Changes:

  • Exposes a public UserHistory::unigrams() iterator for offline tooling to read unigram history records.
  • Adds lextool history-audit subcommand with text/--json output, miss classification (rank/absent), history-fixed detection, and flip-flop reporting.
  • Adds a mise run history-audit task that runs the tool against the default on-device history path.
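
To illustrate how an offline tool might consume a read-only unigram iterator and report flip-flops, here is a rough sketch. The `Unigram` and `UserHistory` types below are simplified stand-ins, not the lex-core definitions, and the `min_freq` threshold for "regularly committed" is an assumption of this example.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for one learned unigram record (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Unigram {
    reading: String,
    surface: String,
    frequency: u32,
}

/// Simplified stand-in for lex-core's UserHistory.
struct UserHistory {
    records: Vec<Unigram>,
}

impl UserHistory {
    /// Read-only view over the records, in the spirit of `unigrams()`.
    fn unigrams(&self) -> impl Iterator<Item = &Unigram> + '_ {
        self.records.iter()
    }
}

/// Readings that have at least two surfaces with frequency >= `min_freq`.
/// The threshold is an assumption, not the tool's actual rule.
fn flip_flops(history: &UserHistory, min_freq: u32) -> Vec<(String, Vec<String>)> {
    let mut by_reading: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for u in history.unigrams().filter(|u| u.frequency >= min_freq) {
        by_reading
            .entry(u.reading.as_str())
            .or_default()
            .push(u.surface.as_str());
    }
    by_reading
        .into_iter()
        .filter(|(_, surfaces)| surfaces.len() >= 2)
        .map(|(r, s)| (r.to_string(), s.into_iter().map(String::from).collect()))
        .collect()
}

fn rec(reading: &str, surface: &str, frequency: u32) -> Unigram {
    Unigram {
        reading: reading.to_string(),
        surface: surface.to_string(),
        frequency,
    }
}

fn main() {
    // Made-up records for illustration only.
    let history = UserHistory {
        records: vec![
            rec("こと", "事", 4),
            rec("こと", "こと", 9),
            rec("きょう", "今日", 7),
            rec("きょう", "京", 1),
        ],
    };
    for (reading, surfaces) in flip_flops(&history, 2) {
        println!("{reading}: {surfaces:?}");
    }
}
```

Because the iterator only borrows the history, a consumer written this way cannot mutate the records, which fits the read-only intent of the audit.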

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| mise.toml | Adds tasks.history-audit to run the new audit command against the default user history path. |
| engine/crates/lex-core/src/user_history/tests.rs | Adds a unit test validating unigrams() output and frequencies. |
| engine/crates/lex-core/src/user_history/mod.rs | Adds public unigrams() iterator to expose unigram records for offline consumers. |
| engine/crates/lex-cli/src/bin/lextool.rs | Implements the history-audit subcommand, report types, and text/JSON output. |

send (Owner, Author) commented Jul 2, 2026

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6a4ebf9b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

3 comment threads on engine/crates/lex-cli/src/bin/lextool.rs (all outdated)

- Limit rank search to the requested N-best depth: post-Viterbi
  rewriters can append candidates beyond n, which misreported
  beyond-depth hits as ranked (e.g. rank 11 with -n 10)
- Add surface-text tie-breaker so the dominant pick is deterministic
  when frequency and last_used collide
- Fail fast when neither the history checkpoint nor its WAL exists,
  instead of auditing an implicitly-created empty history

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
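
The three fixes in this commit can be sketched as below. This is an illustrative reconstruction under assumptions, not the committed code: `Entry`, `rank_within`, `pick_dominant`, and `ensure_history_exists` are hypothetical names, and the direction of the surface tie-break (the lexicographically greater surface wins) is chosen arbitrarily here, since the commit message only states that the pick becomes deterministic.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Hypothetical per-surface history entry for one reading.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    surface: String,
    frequency: u32,
    last_used: u64,
}

/// 1-based rank of `dominant`, searching only the first `n` candidates.
/// Candidates appended beyond the requested depth are ignored.
fn rank_within(dominant: &str, candidates: &[&str], n: usize) -> Option<usize> {
    candidates
        .iter()
        .take(n)
        .position(|s| *s == dominant)
        .map(|i| i + 1)
}

/// Total order: frequency, then last_used, then surface text as the final
/// tie-breaker (direction is an assumption of this sketch).
fn dominance(a: &Entry, b: &Entry) -> Ordering {
    a.frequency
        .cmp(&b.frequency)
        .then(a.last_used.cmp(&b.last_used))
        .then_with(|| a.surface.cmp(&b.surface))
}

/// Deterministic dominant pick, independent of input order.
fn pick_dominant(entries: &[Entry]) -> Option<&Entry> {
    entries.iter().max_by(|a, b| dominance(a, b))
}

/// Fail fast when neither the checkpoint nor the WAL exists.
fn ensure_history_exists(checkpoint: &Path, wal: &Path) -> Result<(), String> {
    if checkpoint.exists() || wal.exists() {
        Ok(())
    } else {
        Err(format!(
            "no history found: neither {} nor {} exists",
            checkpoint.display(),
            wal.display()
        ))
    }
}

fn main() {
    // Eleven candidates, with the dominant surface in 11th place.
    let candidates = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"];
    println!("{:?}", rank_within("k", &candidates, 10)); // None: beyond depth 10

    let entries = vec![
        Entry { surface: "事".into(), frequency: 3, last_used: 100 },
        Entry { surface: "こと".into(), frequency: 3, last_used: 100 },
    ];
    println!("{:?}", pick_dominant(&entries).map(|e| e.surface.as_str()));

    // Hypothetical paths, used only to show the error case.
    let missing = Path::new("/nonexistent-lextool-demo/user_history.lxud");
    let missing_wal = Path::new("/nonexistent-lextool-demo/user_history.wal");
    println!("{:?}", ensure_history_exists(missing, missing_wal));
}
```

Without the `take(n)` cap, the 11th candidate above would be reported as rank 11 even though only 10 were requested, which is the misreport the first bullet describes.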

send (Owner, Author) commented Jul 2, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

Reviewed commit: 233d76a164
