perf(WordAligner): optimize `_fuzzy_align_extraction` function for better performance by liuming-dev · Pull Request #410 · google/langextract

liuming-dev · 2026-03-05T14:49:14Z

Improve the execution performance of the _fuzzy_align_extraction function

Dynamically shrink max_window:
- Set the initial value to max_window = min(int(len_e / fuzzy_alignment_threshold) + 1, len(source_tokens)).
  For any window_size > int(len_e / fuzzy_alignment_threshold) + 1, the condition
  (extraction_counts & window_counts).total() >= min_overlap is guaranteed never to be satisfied.
- Once best_ratio > 0.0, narrow the range of window_size further:
  for any window_size > int(len_e / best_ratio) + 1, best_ratio can no longer improve.
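The pruning rule above can be illustrated with a minimal, hypothetical sketch. It is not langextract's code: `fuzzy_align_window` is an illustrative name, and the sketch assumes a window's score is its matched-token count (from `difflib.SequenceMatcher.get_matching_blocks()`) divided by the window size. Under that assumption no window of size w can score above len_e / w, which is the property both bounds rely on; if the real scoring is defined differently, the bounds would need to be re-derived before applying them.

```python
import difflib


def fuzzy_align_window(source_tokens, extraction_tokens, threshold=0.75):
    """Return (best_ratio, (start, window_size)), or (0.0, None) if nothing qualifies.

    Hypothetical sketch; threshold must be > 0.
    ASSUMPTION: a window's score is matched_tokens / window_size, so a window
    of size w scores at most len_e / w. Both pruning bounds depend on this.
    """
    len_e = len(extraction_tokens)
    if len_e == 0 or not source_tokens:
        return 0.0, None

    matcher = difflib.SequenceMatcher(autojunk=False, b=extraction_tokens)

    # Windows longer than len_e / threshold can never reach the threshold.
    max_window = min(int(len_e / threshold) + 1, len(source_tokens))
    best_ratio, best_span = 0.0, None

    window_size = len_e
    # A while loop re-reads max_window on every pass, so shrinking it takes effect.
    while window_size <= max_window:
        for start in range(len(source_tokens) - window_size + 1):
            matcher.set_seq1(source_tokens[start:start + window_size])
            matches = sum(block.size for block in matcher.get_matching_blocks())
            ratio = matches / window_size
            if ratio >= threshold and ratio > best_ratio:
                best_ratio, best_span = ratio, (start, window_size)
                # Windows longer than len_e / best_ratio cannot beat best_ratio.
                max_window = min(max_window, int(len_e / best_ratio) + 1)
        window_size += 1

    return best_ratio, best_span
```

The sketch uses `while` rather than `range(...)` because a `range` object is fixed when it is created, so lowering `max_window` mid-iteration would otherwise have no effect on which window sizes are scanned.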
Pre-compute the _normalize_token results to avoid redundant normalization calls inside the for loop.
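The effect of pre-computing normalization can be shown with a small, hypothetical comparison. `normalize_token` here is a stand-in for `_normalize_token` that only lowercases and counts its calls (the real helper may normalize differently), and `all_normalized_windows` is an illustrative name. Both paths return the same windows; the only difference is how many times the normalizer runs.

```python
import collections

# Counts how often the normalizer runs, so the two strategies can be compared.
CALL_COUNTS = collections.Counter()


def normalize_token(token):
    """Hypothetical stand-in for `_normalize_token`: lowercases only."""
    CALL_COUNTS["normalize"] += 1
    return token.lower()


def all_normalized_windows(source_tokens, window_sizes, precompute):
    """Return the normalized token list of every window, for each window size."""
    # Pre-computed path: normalize each source token exactly once, before any loop.
    norm = [normalize_token(t) for t in source_tokens] if precompute else None
    windows = []
    for window_size in window_sizes:
        for start in range(len(source_tokens) - window_size + 1):
            if precompute:
                windows.append(norm[start:start + window_size])
            else:
                # Inline path: re-normalizes every token of every window.
                windows.append(
                    [normalize_token(t) for t in source_tokens[start:start + window_size]]
                )
    return windows
```

With six source tokens and window sizes 2 and 3, the inline path calls the normalizer 22 times (5 windows of 2 plus 4 windows of 3), while the pre-computed path calls it 6 times; the gap grows with every additional window size scanned.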

google-cla · 2026-03-05T14:49:19Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

github-actions · 2026-03-05T14:49:24Z

No linked issues found. Please link an issue in your pull request description or title.

Per our Contributing Guidelines, all PRs must:

- Reference an issue with one of:
  - Closing keywords: Fixes #123, Closes #123, Resolves #123 (auto-closes on merge in the same repository)
  - Reference keywords: Related to #123, Refs #123, Part of #123, See #123 (links without closing)
- The linked issue should have 5+ 👍 reactions from unique users (excluding bots and the PR author)
- Include discussion demonstrating the importance of the change

You can also use cross-repo references like owner/repo#123 or full URLs.

sx4im · 2026-03-07T11:22:20Z

You've been writing CLAUDE.md from scratch every project? 😭

Stop. I built 20 ready-made templates for every stack —
Next.js, React, Python, Flutter, Go, Rust and more.

Copy. Paste. Done. ✅

🔗 github.com/sx4im/awesome-claude-md

Free & open source 🙏

github-actions · 2026-03-21T06:39:19Z

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2026-03-29T03:03:43Z

⚠️ Branch Update Required

Your branch is 6 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2026-04-06T01:56:05Z

⚠️ Branch Update Required

Your branch is 9 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2026-04-13T03:04:43Z

⚠️ Branch Update Required

Your branch is 11 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

aksg87 · 2026-04-14T23:01:23Z

Hi @liuming-dev, thanks for working on this. This has been addressed in #442 which replaced the fuzzy aligner with a much faster approach. Feel free to reopen if you're still seeing issues.

github-actions bot added the size/XS label (Pull request with less than 50 lines changed) on Mar 5, 2026

liuming-dev force-pushed the main branch 5 times, most recently from 85ca156 to 78355a1 (March 5, 2026 15:22)

Commit 1964849: perf(WordAligner): optimize _fuzzy_align_extraction function for better performance

Signed-off-by: Liu Ming <hit_oak_tree@126.com>

liuming-dev force-pushed the main branch from 78355a1 to 1964849 (March 6, 2026 02:15)

aksg87 closed this Apr 14, 2026


liuming-dev wants to merge 1 commit into google:main from liuming-dev:main


3 participants
