perf(WordAligner): optimize _fuzzy_align_extraction function for better performance #410

Closed
liuming-dev wants to merge 1 commit into google:main from liuming-dev:main

Conversation

@liuming-dev

Improve the execution performance of the _fuzzy_align_extraction function:

  1. Dynamically shrink max_window.

    • Start with max_window = min(int(len_e / fuzzy_alignment_threshold) + 1, len(source_tokens)).
      For any window_size > int(len_e / fuzzy_alignment_threshold) + 1, the check
      (extraction_counts & window_counts).total() >= min_overlap can never be satisfied.
    • Once best_ratio > 0.0, narrow the iteration range of window_size further.
      For any window_size > int(len_e / best_ratio) + 1, best_ratio can no longer improve.
  2. Pre-compute the _normalize_token results once to avoid redundant normalization calls inside the for loop.
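
A minimal sketch of the two ideas above, not the actual patch. The helper names (normalize_token, initial_max_window, shrunk_max_window, window_overlaps) are hypothetical, and normalize_token is only a lowercase stand-in for _normalize_token. The sketch shows how the bounds from the description would be computed and how pre-normalized tokens can feed a sliding window; whether the bounds preserve the original results depends on how the real function scores a window.

```python
import collections


def normalize_token(token: str) -> str:
    """Hypothetical stand-in for _normalize_token: lowercases the token."""
    return token.lower()


def initial_max_window(len_e: int, threshold: float, n_source: int) -> int:
    """Initial cap from the description: min(int(len_e / threshold) + 1, n_source)."""
    return min(int(len_e / threshold) + 1, n_source)


def shrunk_max_window(len_e: int, best_ratio: float, current_max: int) -> int:
    """Tighten the cap once a match exists, using int(len_e / best_ratio) + 1."""
    if best_ratio <= 0.0:
        return current_max  # no match yet, nothing to tighten
    return min(current_max, int(len_e / best_ratio) + 1)


def window_overlaps(source_tokens, extraction_tokens, window_size):
    """Yield (start, overlap) for every window of the given size.

    Each source token is normalized exactly once, before the loop,
    instead of being re-normalized for every window.
    """
    source_norm = [normalize_token(t) for t in source_tokens]
    extraction_counts = collections.Counter(
        normalize_token(t) for t in extraction_tokens
    )
    window_counts = collections.Counter(source_norm[:window_size])
    for start in range(len(source_norm) - window_size + 1):
        # Counter intersection keeps the minimum positive count per token.
        overlap = sum((extraction_counts & window_counts).values())
        yield start, overlap
        end = start + window_size
        if end < len(source_norm):
            # Slide the window by one token: drop the left edge, add the right.
            window_counts[source_norm[start]] -= 1
            window_counts[source_norm[end]] += 1
```

In a loop over window_size, shrunk_max_window would be called whenever best_ratio improves, and the loop would stop as soon as window_size exceeds the returned cap.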

@google-cla

google-cla Bot commented Mar 5, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions

github-actions Bot commented Mar 5, 2026

No linked issues found. Please link an issue in your pull request description or title.

Per our Contributing Guidelines, all PRs must:

  • Reference an issue with one of:
    • Closing keywords: Fixes #123, Closes #123, Resolves #123 (auto-closes on merge in the same repository)
    • Reference keywords: Related to #123, Refs #123, Part of #123, See #123 (links without closing)
  • The linked issue should have 5+ 👍 reactions from unique users (excluding bots and the PR author)
  • Include discussion demonstrating the importance of the change

You can also use cross-repo references like owner/repo#123 or full URLs.

@github-actions github-actions Bot added the size/XS Pull request with less than 50 lines changed label Mar 5, 2026
@liuming-dev liuming-dev force-pushed the main branch 5 times, most recently from 85ca156 to 78355a1 Compare March 5, 2026 15:22
…tter performance

Signed-off-by: Liu Ming <hit_oak_tree@126.com>
@sx4im

sx4im commented Mar 7, 2026

You've been writing CLAUDE.md from scratch every project? 😭

Stop. I built 20 ready-made templates for every stack —
Next.js, React, Python, Flutter, Go, Rust and more.

Copy. Paste. Done. ✅

🔗 github.com/sx4im/awesome-claude-md

Free & open source 🙏

@github-actions

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@github-actions

⚠️ Branch Update Required

Your branch is 6 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@github-actions

github-actions Bot commented Apr 6, 2026

⚠️ Branch Update Required

Your branch is 9 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@github-actions

⚠️ Branch Update Required

Your branch is 11 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@aksg87
Collaborator

aksg87 commented Apr 14, 2026

Hi @liuming-dev, thanks for working on this. This has been addressed in #442, which replaced the fuzzy aligner with a much faster approach. Feel free to reopen if you're still seeing issues.

@aksg87 aksg87 closed this Apr 14, 2026
Labels

size/XS Pull request with less than 50 lines changed

3 participants