-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-138407: avoid useless join operation when calculate the hash vaule for pathlib.Path #138645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… vaule for pathlib.Path Signed-off-by: Manjusaka <me@manjusaka.me>
Signed-off-by: Manjusaka <me@manjusaka.me>
Please check.
Benchmark this with paths of various lengths (very short and very long) and with many arguments to |
Make sense, I will update benchmark later(I will fix failed CI first) |
FWIW the tests are failing because you need to case-normalize the path on Windows. This is difficult to optimize because it depends on the whether the user calls
|
To add another wrinkle: sometimes paths are instantiated with their normalized string representation already set, e.g. this is true for the results of |
I have made a full benchmark test The following is the result 🧪 Available Tests
=� Benchmark DetailsThe benchmark tests hash performance on single-level paths (
=� Performance ResultsLocal Benchmark Results
Summary: The optimized implementation shows a 1.44x performance improvement (44% faster) for short single-level path hashing operations. Long Path Benchmark Results
Summary: The optimized implementation shows a 1.34x performance improvement (34% faster) for long single-level path hashing operations. Deep Path Benchmark Results
Summary: The optimized implementation shows a 1.12x performance improvement (12% faster) for 512-level deep path hashing operations. Deep Long Path Benchmark Results
Summary: The optimized implementation shows a 1.06x performance regression (6% slower) for 512-level deep paths with 10-character component names. Operation Pattern Benchmark ResultsTesting different operation patterns on short single-level paths ( 1. hash(p) only - 1.44x faster ✅
2. str(p) only - Equal performance ✅
3. str(p); hash(p) - 1.12x faster ❌ (Expected: Slower)
4. hash(p); str(p) - 1.06x faster ❌ (Expected: Slower)
|
Here's full benchmark repo https://github.com/Zheaoli/pathlib-benchmark |
I'm struggling to understand why the new code is faster for |
I'm not sure if it's faster, but this is the simplest way I can think of to implement def __hash__(self):
try:
return self._hash
except AttributeError:
if self.parser is posixpath:
self._hash = hash((self.root, tuple(self._tail)))
else:
self._hash = hash((self.drive.lower(),
self.root.lower(),
tuple([part.lower() for part in self._tail])))
return self._hash Would you mind running that through your benchmarks please? |
Sorry for reply late. I'm working on PyCon China 2025 last week. I will update this PR ASAP I can |
Co-authored-by: Barney Gale <barney.gale@gmail.com> Signed-off-by: Manjusaka <me@manjusaka.me>
@barneygale Here's new benchmark 📊 Benchmark DetailsThe benchmark tests hash performance on single-level paths (
📈 Performance ResultsShort Single-Level Path Results
Summary: The optimized implementation shows a 1.55x performance improvement (55% faster) for short single-level path hashing operations. Long Single-Level Path Results
Summary: The optimized implementation shows a 1.46x performance improvement (46% faster) for long single-level path hashing operations. Deep Path Results
Summary: The optimized implementation shows a 1.13x performance improvement (13% faster) for 512-level deep path hashing operations. Deep Long Path Results
Summary: The optimized implementation shows a 1.02x performance regression (2% slower) for 512-level deep paths with 10-character component names. Operation Pattern ResultsTesting different operation patterns on short single-level paths ( 1. hash(p) only - 1.55x faster ✅
2. str(p) only - Equal performance ✅
3. str(p); hash(p) - 1.18x faster ✅
4. hash(p); str(p) - 1.15x faster ✅
|
@barneygale PTAL when you have time |
Signed-off-by: Manjusaka me@manjusaka.me<!--
Try to fix #138407
The benchmark:
Before this patch
After