-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-139001: Fix thread-safety issue in pathlib.Path
#139066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-139001: Fix thread-safety issue in pathlib.Path
#139066
Conversation
Don't cache the joined path in `_raw_path` because the caching isn't thread safe.
It's a shame to lose the caching. Presumably something like this wouldn't work? @property
def _raw_path(self):
paths = self._raw_paths
if len(paths) == 1:
return paths[0]
elif paths:
# Join path segments from the initializer.
path = self.parser.join(*paths)
# Cache the joined path.
paths[:] = [path]
return path
else:
paths[:] = ['']
return '' |
That doesn't seem to work, at least not in the free threaded build. I've added a test case. I think we could probably make the caching thread safe with some extra work, but I'm not sure it's worth it. It looks like |
import pyperf
from pathlib import Path # or from pathlib_new which is a patched version
PATH = "/" + "a" * 511
p = Path(PATH)
def bench():
for i in range(100000):
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench)
|
By bad, previous bench have some mistake import pyperf
from pathlib import Path # or from pathlib_new which is a patched version
PATH = [chr(ord('a') + (i % 26)) for i in range(20)]
p = Path(*PATH)
def bench():
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench) Here's results
|
Updated: import pyperf
from pathlib_new import Path
PATH = ["abc123" for _ in range(200)]
def bench():
p = Path(*PATH)
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench) Base on the code, I find that if the path level < 200 , this patch is fater than old way. |
@barneygale, are you okay with this change? |
As an illustration of the current caching behaviour, consider: p = Path('/usr', 'local', 'foo', 'bar')
q0 = p / 'spam.txt'
q1 = p / 'eggs.txt'
str(p)
str(q0)
str(q1) The current implementation would make these calls: os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr/local/foo/bar', 'spam.txt')
os.path.join('/usr/local/foo/bar', 'eggs.txt') With this PR: os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt') The extra arguments have a cost IIRC. And maybe that's fine, I just wanted to explain the current behaviour better. |
@barneygale, I've added a print statement to os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt') |
Oh! If you move |
Yes, then you get the behavior you described |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can bring the optimization back in future that would be great, but for now it's better to have pathlib working in free-threaded python. Thanks for fixing this.
Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14. |
…139066) Don't cache the joined path in `_raw_path` because the caching isn't thread safe. (cherry picked from commit d9b4eef71e7904fbe3a3786a908e493be7debbff) Co-authored-by: Sam Gross <colesbury@gmail.com>
GH-139926 is a backport of this pull request to the 3.14 branch. |
@barneygale thanks for the review |
Don't cache the joined path in
_raw_path
because the caching isn't thread safe.pathlib
on3.14t
#139001