Bitset-based TagSet refactor: precompute tag masks and speed up hook type filtering #1665

Merged
j178 merged 2 commits into master from bitset-tagset
Feb 19, 2026

@j178
Owner

@j178 j178 commented Feb 19, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 19, 2026 03:28
@j178 j178 added the enhancement New feature or request label Feb 19, 2026
@j178 j178 changed the title bitset tagset Bitset-based TagSet refactor: precompute tag masks and speed up hook type filtering Feb 19, 2026
@j178 j178 changed the base branch from master to gen-identify February 19, 2026 03:29
@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 97.52066% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.61%. Comparing base (25a02de) to head (ddd9663).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/prek-identify/src/lib.rs | 97.27% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1665      +/-   ##
==========================================
+ Coverage   91.44%   91.61%   +0.16%     
==========================================
  Files          96       96              
  Lines       18529    18583      +54     
==========================================
+ Hits        16944    17024      +80     
+ Misses       1585     1559      -26     

Contributor

Copilot AI left a comment

Pull request overview

This pull request refactors the TagSet implementation from a SmallVec-based dynamic collection to a fixed-size bitset representation for improved performance. The changes also introduce code generation from the Python identify package to maintain compatibility.

Changes:

  • Replaced SmallVec<[&'static str; 8]> with a fixed-size bitset array [u64; TAG_WORDS] in TagSet
  • Generated tag mappings and constants from Python's identify package via gen.py
  • Changed hook type fields from Vec<String> to TagSet for direct bitset operations
  • Removed smallvec dependency and added iter-without-into-iter clippy allow
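
As a rough illustration of the representation described above, here is a minimal sketch of a fixed-size bitset TagSet. It is not the code from this PR: the value of `TAG_WORDS`, the numeric tag IDs, and the `from_ids` and `contains` names are assumptions made for the example. Only the `[u64; TAG_WORDS]` layout, the const constructor, the iterator, and the `intersects` / `is_subset_of` operations are taken from the review summary.

```rust
/// Number of 64-bit words backing the set.
/// Assumption: 2 words (128 tag slots); the real value would follow
/// from the generated tag table.
const TAG_WORDS: usize = 2;

// Hypothetical tag IDs, chosen only for this example.
const TAG_FILE: u16 = 0;
const TAG_TEXT: u16 = 1;
const TAG_BINARY: u16 = 2;
const TAG_PYTHON: u16 = 70; // lands in the second word

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TagSet([u64; TAG_WORDS]);

impl TagSet {
    pub const fn empty() -> Self {
        TagSet([0; TAG_WORDS])
    }

    /// Const constructor, so masks can be computed at compile time.
    /// An ID outside the backing array fails the index bounds check.
    pub const fn from_ids(ids: &[u16]) -> Self {
        let mut words = [0u64; TAG_WORDS];
        let mut i = 0;
        while i < ids.len() {
            let id = ids[i] as usize;
            words[id / 64] |= 1u64 << (id % 64);
            i += 1;
        }
        TagSet(words)
    }

    pub const fn contains(&self, id: u16) -> bool {
        let id = id as usize;
        (self.0[id / 64] & (1u64 << (id % 64))) != 0
    }

    /// True if the two sets share at least one tag.
    pub fn intersects(&self, other: &TagSet) -> bool {
        self.0.iter().zip(other.0.iter()).any(|(a, b)| (a & b) != 0)
    }

    /// True if every tag in `self` is also in `other`.
    pub fn is_subset_of(&self, other: &TagSet) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| (a & !b) == 0)
    }

    /// Yields the IDs of the set tags in ascending order.
    pub fn iter(&self) -> impl Iterator<Item = u16> + '_ {
        self.0.iter().enumerate().flat_map(|(w, &word)| {
            (0..64u16)
                .filter(move |&bit| (word & (1u64 << bit)) != 0)
                .map(move |bit| w as u16 * 64 + bit)
        })
    }
}

/// A mask precomputed at compile time, in the spirit of the generated tables.
pub const PY_TAGS: TagSet = TagSet::from_ids(&[TAG_FILE, TAG_TEXT, TAG_PYTHON]);
```

With this layout, a membership or overlap check costs a few word-wise AND operations instead of a string comparison per tag. Exposing `iter()` without a matching `IntoIterator` impl is the pattern that Clippy's `iter_without_into_iter` lint reports, which would explain the lint allow in the last bullet.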

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/prek-identify/src/lib.rs | Rewrote TagSet as a bitset with const constructor, iterator, and set operations (intersects, is_subset_of) |
| crates/prek-identify/src/tags.rs | Generated file containing tag constants, tag-to-ID mappings, and precomputed TagSets for interpreters/extensions/names |
| crates/prek-identify/gen.py | Added tag ID mapping logic and TagSet generation for all lookup tables |
| crates/prek/src/hook.rs | Changed types/types_or/exclude_types from Vec to TagSet with TAG_FILE default |
| crates/prek/src/cli/run/filter.rs | Updated filter logic to use TagSet bitset operations instead of iterator-based checks |
| crates/prek-identify/Cargo.toml | Removed smallvec dependency |
| Cargo.toml | Removed smallvec from workspace dependencies, allowed iter-without-into-iter lint |
| Cargo.lock | Updated lockfile to reflect dependency removal |
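
To make the filter.rs change more concrete, the sketch below shows how a hook's type filters could reduce to bitset operations. It assumes the semantics documented by pre-commit: every tag in `types` must be present, at least one tag in `types_or` must be present when that set is non-empty, and no tag in `exclude_types` may be present. The `HookTypes` struct, the `matches` function, the `from_ids` and `is_empty` helpers, and the tag IDs are hypothetical names for this example and may not match the actual code in this PR.

```rust
// Assumption: 2 words are enough for the example IDs below.
const TAG_WORDS: usize = 2;

// Hypothetical tag IDs, chosen only for this example.
const TAG_FILE: u16 = 0;
const TAG_TEXT: u16 = 1;
const TAG_PYTHON: u16 = 2;
const TAG_BINARY: u16 = 3;
const TAG_PYI: u16 = 65;

#[derive(Clone, Copy)]
struct TagSet([u64; TAG_WORDS]);

impl TagSet {
    const fn from_ids(ids: &[u16]) -> Self {
        let mut words = [0u64; TAG_WORDS];
        let mut i = 0;
        while i < ids.len() {
            let id = ids[i] as usize;
            words[id / 64] |= 1u64 << (id % 64);
            i += 1;
        }
        TagSet(words)
    }

    fn is_empty(&self) -> bool {
        self.0.iter().all(|w| *w == 0)
    }

    fn intersects(&self, other: &TagSet) -> bool {
        self.0.iter().zip(other.0.iter()).any(|(a, b)| (a & b) != 0)
    }

    fn is_subset_of(&self, other: &TagSet) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| (a & !b) == 0)
    }
}

/// Hypothetical stand-in for the three type filters on a hook.
struct HookTypes {
    types: TagSet,
    types_or: TagSet,
    exclude_types: TagSet,
}

/// Decide whether a file with the given tags passes the hook's filters.
fn matches(hook: &HookTypes, file_tags: &TagSet) -> bool {
    // All required tags present.
    hook.types.is_subset_of(file_tags)
        // At least one alternative present, unless none were requested.
        && (hook.types_or.is_empty() || hook.types_or.intersects(file_tags))
        // No excluded tag present.
        && !hook.exclude_types.intersects(file_tags)
}
```

For example, a hook with `types = {file}`, `types_or = {python, pyi}`, and `exclude_types = {binary}` would accept a file tagged `{file, text, python}` and reject one tagged `{file, binary}`, with each check being a handful of word-wise operations.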

@prek-ci-bot

prek-ci-bot bot commented Feb 19, 2026

📦 Cargo Bloat Comparison

Binary size change: +0.00% (23.8 MiB → 23.8 MiB)

Head Branch Results

 File  .text    Size             Crate Name
 0.3%   0.8% 79.5KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.7% 70.9KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6% 65.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.5% 51.2KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.5% 50.5KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4% 41.7KiB              prek prek::cli::run::run::run::{{closure}}
 0.2%   0.4% 39.1KiB              prek prek::run::{{closure}}
 0.1%   0.3% 31.3KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.3% 28.5KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2% 25.0KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2% 22.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2% 22.5KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2% 21.1KiB      clap_builder clap_builder::parser::parser::Parser::get_matches_with
 0.1%   0.2% 20.4KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2% 20.0KiB   cargo_metadata? <cargo_metadata::_::<impl serde_core::de::Deserialize for cargo_metadata::Package>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2% 19.5KiB              prek prek::archive::unzip::{{closure}}
 0.1%   0.2% 19.3KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2% 19.2KiB              prek <prek::languages::ruby::ruby::Ruby as prek::languages::LanguageImpl>::install::{{closure}}
 0.1%   0.2% 18.7KiB     serde_saphyr? <serde_saphyr::de::YamlDeserializer as serde_core::de::Deserializer>::deserialize_map
 0.1%   0.2% 18.6KiB     serde_saphyr? <serde_saphyr::de::YamlDeserializer as serde_core::de::Deserializer>::deserialize_map
38.3%  91.7%  9.1MiB                   And 20977 smaller methods. Use -n N to show more.
41.7% 100.0%  9.9MiB                   .text section size, the file size is 23.8MiB

Base Branch Results

 File  .text    Size             Crate Name
 0.3%   0.7% 71.3KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.7% 68.4KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6% 65.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.5% 51.2KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.5% 50.5KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4% 41.6KiB              prek prek::cli::run::run::run::{{closure}}
 0.2%   0.4% 39.8KiB              prek prek::run::{{closure}}
 0.1%   0.3% 31.8KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.3% 28.5KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2% 25.0KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2% 22.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2% 22.7KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2% 21.1KiB      clap_builder clap_builder::parser::parser::Parser::get_matches_with
 0.1%   0.2% 20.6KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2% 20.0KiB   cargo_metadata? <cargo_metadata::_::<impl serde_core::de::Deserialize for cargo_metadata::Package>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2% 19.5KiB              prek prek::archive::unzip::{{closure}}
 0.1%   0.2% 19.3KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2% 19.2KiB              prek <prek::languages::ruby::ruby::Ruby as prek::languages::LanguageImpl>::install::{{closure}}
 0.1%   0.2% 19.0KiB              prek prek::hook::HookBuilder::build::{{closure}}
 0.1%   0.2% 19.0KiB              prek prek::hook::HookBuilder::build::{{closure}}
38.4%  91.8%  9.1MiB                   And 20993 smaller methods. Use -n N to show more.
41.8% 100.0%  9.9MiB                   .text section size, the file size is 23.8MiB

@j178 j178 added the performance Performance improvements label Feb 19, 2026
@j178 j178 force-pushed the bitset-tagset branch 2 times, most recently from 6d3fc68 to 3a56823 Compare February 19, 2026 04:48
Base automatically changed from gen-identify to master February 19, 2026 04:56
@j178 j178 merged commit 23941a7 into master Feb 19, 2026
49 checks passed
@j178 j178 deleted the bitset-tagset branch February 19, 2026 05:07
Labels

enhancement: New feature or request
performance: Performance improvements
