Optimize Floki.text/2 by extracting text in a single pass #684

Merged
philss merged 1 commit into philss:main from preciz:optimize-text-extraction on Apr 16, 2026
@preciz commented Apr 16, 2026

This change improves performance by removing redundant tree-filtering passes
that built full HTMLTree structures.

Benchmarks show:

  • ~10x speedup for small documents
  • ~474x speedup for large documents
  • ~644x less memory usage for large documents

The filtering of script and style tags is now performed during the initial
traversal in the DeepText and FlatText strategies.
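
To illustrate the idea of filtering during the traversal rather than in a separate pass, here is a minimal sketch. It is not the code from this PR: the module name `SinglePassTextSketch` and its functions are hypothetical, and it assumes nodes in the `{tag, attributes, children}` tuple shape with text nodes as binaries.

```elixir
defmodule SinglePassTextSketch do
  # Hypothetical module for illustration only; not Floki's actual implementation.
  # Assumes nodes shaped as {tag, attributes, children}, with text nodes as binaries.

  @skipped ["script", "style"]

  # Collects text in one traversal, skipping script/style subtrees as they are met,
  # so no filtered copy of the tree is built first.
  def text(tree) do
    tree
    |> List.wrap()
    |> collect([])
    |> Enum.reverse()
    |> IO.iodata_to_binary()
  end

  # Walks a list of sibling nodes, threading the accumulator through each one.
  defp collect([], acc), do: acc
  defp collect([node | rest], acc), do: collect(rest, visit(node, acc))

  # Text node: keep it (prepended; the list is reversed once at the end).
  defp visit(text, acc) when is_binary(text), do: [text | acc]
  # Skipped tag: drop the whole subtree without descending into it.
  defp visit({tag, _attrs, _children}, acc) when tag in @skipped, do: acc
  # Any other element: descend into its children.
  defp visit({_tag, _attrs, children}, acc), do: collect(children, acc)
  # Anything else (for example comment tuples) contributes no text.
  defp visit(_other, acc), do: acc
end
```

With this sketch, `SinglePassTextSketch.text({"div", [], ["Hello ", {"script", [], ["alert(1)"]}, {"b", [], ["world"]}]})` returns `"Hello world"`: the script subtree is never visited, and no intermediate tree is allocated.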

@philss merged commit a06531c into philss:main on Apr 16, 2026
6 checks passed