perf(WordAligner): optimize _fuzzy_align_extraction function for better performance #410
liuming-dev wants to merge 1 commit into google:main
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
No linked issues found. Please link an issue in your pull request description or title. Per our Contributing Guidelines, all PRs must:
You can also use cross-repo references like
Branch updated from 85ca156 to 78355a1.
…tter performance

Signed-off-by: Liu Ming <hit_oak_tree@126.com>
Your branch is 1 commit behind. To update it, run `git fetch origin main`, `git merge origin/main`, and `git push`.

Note: Enable "Allow edits by maintainers" to allow automatic updates.
Hi @liuming-dev, thanks for working on this. This has been addressed in #442, which replaced the fuzzy aligner with a much faster approach. Feel free to reopen if you're still seeing issues.
Improve the execution performance of the `_fuzzy_align_extraction` function:

- Dynamically shrink `max_window` to `max_window = min(int(len_e / fuzzy_alignment_threshold) + 1, len(source_tokens))`. When `window_size > int(len_e / fuzzy_alignment_threshold) + 1`, the condition `(extraction_counts & window_counts).total() >= min_overlap` is guaranteed never to be satisfied.
- When `best_ratio > 0.0`, the iteration range of `window_size` can be narrowed further: once `window_size > int(len_e / best_ratio) + 1`, `best_ratio` can no longer improve.
- Pre-compute the `_normalize_token` results to avoid redundant normalization calls inside the `for` loop.
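A minimal sketch of the three changes, written as standalone Python rather than the actual WordAligner code. The names `fuzzy_align` and `normalize_token` are hypothetical (the latter stands in for `_normalize_token`), each window is rescanned instead of being maintained with a sliding counter, and the scoring is an assumption: a window is scored only if at least `threshold * window_size` of its tokens overlap the extraction, and its score is `matches / window_size`. Under that assumption both bounds follow from `matches <= overlap <= len_e`. A window longer than `len_e / threshold` cannot pass the overlap filter, and a window's score is at most `len_e / window_size`, so no window longer than `len_e / best_ratio` can strictly beat `best_ratio`. If the real score is normalized differently (for example by `len_e`), the bounds would have to be re-derived for that definition.

```python
import collections
import difflib


def normalize_token(token: str) -> str:
    """Hypothetical stand-in for _normalize_token: lowercase, drop a plural 's'."""
    token = token.lower()
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def fuzzy_align(extraction_tokens, source_tokens, threshold, prune=True):
    """Return (best_ratio, best_span, windows_scored); best_span is (start, size).

    ASSUMPTION (not from the PR): a window is scored only if at least
    threshold * window_size of its tokens overlap the extraction, and its
    score is matches / window_size.
    """
    ext = [normalize_token(t) for t in extraction_tokens]
    # Pre-compute: normalize each source token once instead of inside the loops.
    src = [normalize_token(t) for t in source_tokens]
    len_e = len(ext)
    if len_e == 0:
        return 0.0, None, 0

    ext_counts = collections.Counter(ext)
    matcher = difflib.SequenceMatcher(autojunk=False, b=ext)
    best_ratio, best_span, scored = 0.0, None, 0

    max_window = len(src)
    if prune:
        # Bound 1: overlap <= len_e, so a window longer than len_e / threshold
        # can never pass the overlap filter below.
        max_window = min(int(len_e / threshold) + 1, len(src))

    window_size = len_e
    while window_size <= max_window:
        for start in range(len(src) - window_size + 1):
            window = src[start:start + window_size]
            overlap = sum((ext_counts & collections.Counter(window)).values())
            if overlap < threshold * window_size:
                continue  # cheap filter before the expensive sequence match
            scored += 1
            matcher.set_seq1(window)
            matches = sum(block.size for block in matcher.get_matching_blocks())
            ratio = matches / window_size
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (start, window_size)
        if prune and best_ratio > 0.0:
            # Bound 2: ratio <= len_e / window_size, so a window longer than
            # len_e / best_ratio cannot strictly beat best_ratio.
            max_window = min(max_window, int(len_e / best_ratio) + 1)
        window_size += 1
    return best_ratio, best_span, scored


if __name__ == "__main__":
    print(fuzzy_align(["red", "cats"], "the red cat sat".split(), 0.75))
    # (1.0, (1, 2), 1)
```

Calling it with `prune=False` disables both bounds, which makes it easy to check on sample inputs that, under the assumed scoring, pruning only reduces the number of scored windows and never changes the chosen span. The `+ 1` in each bound leaves slack for `int()` truncation and floating-point rounding.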