Preserve non-UTF8 filenames from git#2023

Merged
j178 merged 1 commit into master from non-utf8
Apr 30, 2026

Conversation

@j178
Owner

@j178 j178 commented Apr 30, 2026

Preserve raw Git path bytes on Unix so non-UTF8 filenames no longer abort file collection.

Glob filters now match paths directly; regex filters remain UTF-8-only.

Closes #1701
Closes #649
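
As a rough illustration of the first point in the description above, the sketch below splits NUL-delimited output (the shape produced by `git ... -z`) into `PathBuf`s without any UTF-8 decoding. It is Unix-only, and `parse_nul_paths` is a hypothetical name chosen for this example, not a function taken from prek.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::PathBuf;

/// Split NUL-delimited bytes into paths, keeping the raw bytes intact.
/// Hypothetical helper for illustration only (Unix-only).
fn parse_nul_paths(output: &[u8]) -> Vec<PathBuf> {
    output
        .split(|&b| b == 0)
        // A trailing NUL yields an empty final chunk; skip it.
        .filter(|chunk| !chunk.is_empty())
        // On Unix an OsStr is an arbitrary byte sequence, so this step
        // performs no UTF-8 validation and cannot fail.
        .map(|chunk| PathBuf::from(OsStr::from_bytes(chunk)))
        .collect()
}
```

Called on `b"a.txt\0caf\xe9.txt\0"`, this returns two paths; the second is not valid UTF-8, so `to_str()` on it returns `None`, but its bytes are preserved.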

@j178 j178 added the bug Something isn't working label Apr 30, 2026
@j178 j178 marked this pull request as ready for review April 30, 2026 09:37
Copilot AI review requested due to automatic review settings April 30, 2026 09:37
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.21%. Comparing base (3268a83) to head (417f030).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2023      +/-   ##
==========================================
- Coverage   92.22%   92.21%   -0.01%     
==========================================
  Files         117      117              
  Lines       23790    23825      +35     
==========================================
+ Hits        21941    21971      +30     
- Misses       1849     1854       +5     


@prek-ci-bot

prek-ci-bot Bot commented Apr 30, 2026

📦 Cargo Bloat Comparison

Binary size change: +0.00% (25.8 MiB → 25.8 MiB)

Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.6% 332.0KiB        aws_lc_sys aws_lc_0_39_1_aes_gcm_encrypt_avx512
 1.3%   2.6% 332.0KiB        aws_lc_sys aws_lc_0_39_1_aes_gcm_decrypt_avx512
 0.3%   0.7%  87.1KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.7%  82.2KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.5%  60.9KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.4%  56.8KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  53.3KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  46.8KiB              prek prek::run::{{closure}}
 0.2%   0.3%  40.9KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  33.3KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_39_1_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  28.0KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  27.6KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_39_1_edwards25519_scalarmuldouble
 0.1%   0.2%  27.4KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  26.4KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  22.8KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.5KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  21.5KiB      clap_builder clap_builder::parser::parser::Parser::get_matches_with
41.2%  86.3%  10.6MiB                   And 23716 smaller methods. Use -n N to show more.
47.7% 100.0%  12.3MiB                   .text section size, the file size is 25.8MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.6% 332.0KiB        aws_lc_sys aws_lc_0_39_1_aes_gcm_encrypt_avx512
 1.3%   2.6% 332.0KiB        aws_lc_sys aws_lc_0_39_1_aes_gcm_decrypt_avx512
 0.3%   0.7%  84.0KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.7%  82.2KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.5%  61.1KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.4%  56.8KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  53.3KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  46.8KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.3KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  33.3KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_39_1_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  28.0KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  27.6KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_39_1_edwards25519_scalarmuldouble
 0.1%   0.2%  27.4KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  26.4KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  22.7KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.5KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  21.5KiB      clap_builder clap_builder::parser::parser::Parser::get_matches_with
41.2%  86.3%  10.6MiB                   And 23741 smaller methods. Use -n N to show more.
47.7% 100.0%  12.3MiB                   .text section size, the file size is 25.8MiB

@prek-ci-bot

prek-ci-bot Bot commented Apr 30, 2026

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment
  • OS: Linux 6.17.0-1010-azure
  • CPU: 4 cores
  • prek version: prek 0.3.11+5 (abc39b8 2026-04-30)
  • Rust version: rustc 1.95.0 (59807616e 2026-04-14)
  • Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

prek --version

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base --version | 2.5 ± 0.2 | 2.1 | 3.1 | 1.00 |
| prek-head --version | 2.5 ± 0.2 | 2.2 | 3.2 | 1.01 ± 0.10 |

prek list

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base list | 10.0 ± 0.2 | 9.6 | 11.1 | 1.01 ± 0.04 |
| prek-head list | 9.9 ± 0.3 | 9.0 | 10.7 | 1.00 |

prek validate-config .pre-commit-config.yaml

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base validate-config .pre-commit-config.yaml | 3.4 ± 0.1 | 3.2 | 3.8 | 1.04 ± 0.08 |
| prek-head validate-config .pre-commit-config.yaml | 3.3 ± 0.2 | 3.0 | 4.1 | 1.00 |

prek sample-config

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base sample-config | 2.6 ± 0.1 | 2.5 | 2.9 | 1.00 |
| prek-head sample-config | 2.8 ± 0.1 | 2.6 | 3.1 | 1.07 ± 0.05 |

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

prek run --all-files (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 147.7 ± 5.8 | 140.6 | 160.1 | 1.00 |
| prek-head run --all-files | 149.8 ± 3.8 | 144.4 | 156.1 | 1.01 ± 0.05 |

prek run --all-files (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 147.2 ± 4.6 | 138.1 | 155.9 | 1.01 ± 0.04 |
| prek-head run --all-files | 145.5 ± 3.7 | 140.0 | 153.8 | 1.00 |

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

prek run --all-files (full builtin hook suite)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 143.7 ± 4.1 | 136.3 | 151.9 | 1.00 |
| prek-head run --all-files | 152.0 ± 12.3 | 138.9 | 224.0 | 1.06 ± 0.09 |

Individual Hook Performance

Benchmarking each hook individually on the test repo:

prek run trailing-whitespace --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run trailing-whitespace --all-files | 21.9 ± 1.1 | 20.5 | 25.5 | 1.04 ± 0.06 |
| prek-head run trailing-whitespace --all-files | 20.9 ± 0.6 | 20.0 | 22.4 | 1.00 |

prek run end-of-file-fixer --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run end-of-file-fixer --all-files | 28.7 ± 2.0 | 24.4 | 32.8 | 1.01 ± 0.11 |
| prek-head run end-of-file-fixer --all-files | 28.3 ± 2.4 | 24.6 | 34.6 | 1.00 |

prek run check-json --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-json --all-files | 12.7 ± 0.5 | 11.9 | 13.9 | 1.10 ± 0.06 |
| prek-head run check-json --all-files | 11.5 ± 0.4 | 10.9 | 12.4 | 1.00 |

prek run check-yaml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-yaml --all-files | 11.6 ± 0.4 | 10.9 | 12.6 | 1.00 ± 0.06 |
| prek-head run check-yaml --all-files | 11.6 ± 0.5 | 10.9 | 13.0 | 1.00 |

prek run check-toml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-toml --all-files | 11.7 ± 0.5 | 10.9 | 12.5 | 1.02 ± 0.06 |
| prek-head run check-toml --all-files | 11.4 ± 0.5 | 10.8 | 12.7 | 1.00 |

prek run check-xml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-xml --all-files | 11.8 ± 0.4 | 11.3 | 13.2 | 1.01 ± 0.05 |
| prek-head run check-xml --all-files | 11.7 ± 0.5 | 10.8 | 12.9 | 1.00 |

prek run detect-private-key --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run detect-private-key --all-files | 18.3 ± 1.6 | 15.7 | 21.6 | 1.02 ± 0.11 |
| prek-head run detect-private-key --all-files | 17.9 ± 1.1 | 15.9 | 20.1 | 1.00 |

prek run fix-byte-order-marker --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run fix-byte-order-marker --all-files | 23.3 ± 2.1 | 20.7 | 28.8 | 1.00 |
| prek-head run fix-byte-order-marker --all-files | 23.6 ± 1.6 | 20.8 | 26.7 | 1.01 ± 0.11 |

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

prek install-hooks (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base install-hooks | 5.2 ± 0.1 | 5.1 | 5.3 | 1.08 ± 0.03 |
| prek-head install-hooks | 4.8 ± 0.1 | 4.7 | 4.9 | 1.00 |

prek install-hooks (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base install-hooks | 4.8 ± 0.1 | 4.7 | 5.0 | 1.03 ± 0.04 |
| prek-head install-hooks | 4.7 ± 0.1 | 4.5 | 4.8 | 1.00 |

File Filtering/Scoping Performance

Testing different file selection modes:

prek run (staged files only)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run | 49.5 ± 1.6 | 47.4 | 53.4 | 1.00 ± 0.04 |
| prek-head run | 49.4 ± 1.0 | 47.7 | 51.8 | 1.00 |

prek run --files '*.json' (specific file type)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --files '*.json' | 8.5 ± 0.5 | 8.1 | 9.6 | 1.01 ± 0.07 |
| prek-head run --files '*.json' | 8.4 ± 0.3 | 8.1 | 9.3 | 1.00 |

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

prek run --dry-run --all-files (measures init overhead)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --dry-run --all-files | 12.8 ± 0.2 | 12.6 | 13.3 | 1.00 |
| prek-head run --dry-run --all-files | 13.2 ± 0.5 | 12.5 | 14.4 | 1.03 ± 0.04 |

Meta Hooks Performance

Benchmarking meta hooks separately:

prek run check-hooks-apply --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-hooks-apply --all-files | 13.4 ± 0.5 | 12.9 | 14.6 | 1.00 |
| prek-head run check-hooks-apply --all-files | 13.5 ± 0.4 | 12.9 | 14.4 | 1.01 ± 0.05 |

prek run check-useless-excludes --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-useless-excludes --all-files | 13.4 ± 0.4 | 12.7 | 14.5 | 1.13 ± 0.05 |
| prek-head run check-useless-excludes --all-files | 11.8 ± 0.4 | 11.5 | 13.1 | 1.00 |

prek run identity --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run identity --all-files | 10.4 ± 0.2 | 10.1 | 10.8 | 1.02 ± 0.03 |
| prek-head run identity --all-files | 10.2 ± 0.2 | 10.0 | 10.5 | 1.00 |

Contributor

Copilot AI left a comment

Pull request overview

This PR updates prek’s file collection and filtering pipeline to tolerate non‑UTF8 filenames on Unix by preserving raw Git path bytes and switching filename matching to operate on &Path (with UTF‑8-only regex semantics).

Changes:

  • Preserve raw Git path bytes on Unix when parsing git ... -z outputs, avoiding UTF‑8 decode failures during file collection.
  • Update FilePattern / GlobPatterns matching APIs to take &Path and apply globs directly to OS paths while keeping regex matching UTF‑8-only.
  • Improve prek-identify filename tagging to avoid panicking on non‑UTF8 filenames and add targeted unit tests.
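
The second change above can be pictured with a small std-only sketch. The `Pattern` enum and its variants here are hypothetical stand-ins, not prek's `FilePattern` or `GlobPatterns`: a byte-suffix check plays the role of a glob that sees the OS path directly, and a plain `fn(&str) -> bool` plays the role of a regex that needs a `&str`. Treating a non-UTF8 path as "no match" for the string-only variant is an assumption made for this illustration.

```rust
use std::path::Path;

/// Hypothetical stand-in for a file pattern type; not prek's actual API.
enum Pattern {
    /// Glob stand-in: compares against the raw encoded bytes of the path.
    Suffix(Vec<u8>),
    /// Regex stand-in: a predicate that can only inspect valid UTF-8.
    Utf8Only(fn(&str) -> bool),
}

impl Pattern {
    fn is_match(&self, path: &Path) -> bool {
        match self {
            // Works for any path, UTF-8 or not.
            Pattern::Suffix(suffix) => path
                .as_os_str()
                .as_encoded_bytes()
                .ends_with(suffix),
            // `to_str` returns None for non-UTF8 paths; in this sketch
            // that is treated as "does not match" (an assumption).
            Pattern::Utf8Only(pred) => path.to_str().map_or(false, |s| pred(s)),
        }
    }
}
```

With this shape, a path whose bytes are `caf\xe9.txt` matches `Pattern::Suffix(b".txt".to_vec())`, while a `Utf8Only` predicate is never invoked on it and the match simply fails instead of raising an error.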

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| crates/prek/src/git.rs | Parse NUL-delimited Git path output into PathBuf without UTF‑8 decoding on Unix; add regression test. |
| crates/prek/src/config.rs | Change pattern matching APIs to accept &Path; make regex matching conditional on UTF‑8 conversion. |
| crates/prek/src/cli/run/filter.rs | Stop rejecting non‑UTF8 paths up front; add tests covering no-pattern/glob/regex behavior. |
| crates/prek/src/hooks/meta_hooks.rs | Update meta-hook patterns and matching to use &Path-based matching. |
| crates/prek/src/hooks/builtin_hooks/mod.rs | Update builtin hook pattern constructor call to renamed FilePattern::regex. |
| crates/prek/src/cli/auto_update/mod.rs | Update tag glob matching call sites for new &Path-based matcher signature. |
| crates/prek-identify/src/lib.rs | Avoid panics when filename isn’t UTF‑8; keep extension-based tagging; add unit test. |
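
For the prek-identify row, one way to keep extension-based tagging without panicking on a non-UTF8 filename is to convert only the extension and to return `None` when even that is not valid UTF-8. The helper name `extension_tag` is hypothetical and the sketch is not taken from the crate; it only uses `Path::extension` and `OsStr::to_str` from the standard library.

```rust
use std::path::Path;

/// Return the extension as a `&str`, if there is one and it is valid UTF-8.
/// Hypothetical helper for illustration; it never panics.
fn extension_tag(path: &Path) -> Option<&str> {
    path.extension() // None for names like "Makefile"
        // The stem may contain arbitrary bytes; only the extension
        // has to be UTF-8 for this lookup to succeed.
        .and_then(|ext| ext.to_str())
}
```

A file named with the bytes `caf\xe9.txt` still yields `Some("txt")` here, because the invalid byte sits in the stem rather than in the extension.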

Comment thread crates/prek/src/hooks/meta_hooks.rs
@j178 j178 merged commit 50a3e25 into master Apr 30, 2026
34 checks passed
@j178 j178 deleted the non-utf8 branch April 30, 2026 09:47
@tyilo
Contributor

tyilo commented Apr 30, 2026

Thanks, it works perfectly for my use case!

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Prek fails to run when there are files with non-utf8 names
Use camino::Utf8PathBuf

3 participants