@colesbury colesbury commented Sep 17, 2025

Don't cache the joined path in `_raw_path` because the caching isn't thread safe.
@barneygale
Contributor

It's a shame to lose the caching. Presumably something like this wouldn't work?

    @property
    def _raw_path(self):
        paths = self._raw_paths
        if len(paths) == 1:
            return paths[0]
        elif paths:
            # Join path segments from the initializer.
            path = self.parser.join(*paths)
            # Cache the joined path.
            paths[:] = [path]
            return path
        else:
            paths[:] = ['']
            return ''

@colesbury
Contributor Author

That doesn't seem to work, at least not in the free-threaded build. I've added a test case.

I think we could probably make the caching thread safe with some extra work, but I'm not sure it's worth it. It looks like `_raw_path` is only used to compute `drive`, `root`, and `_tail`, which are themselves cached, so I'm not sure there's any benefit to caching `_raw_path`.
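
As a rough illustration of the trade-off described above, here is a minimal sketch of a path-like class that recomputes the joined path on every access but still caches a value derived from it. `SketchPath`, its `_root` attribute, and the use of `posixpath.join` are assumptions made for illustration; this is not pathlib's actual implementation.

```python
import posixpath


class SketchPath:
    """Hypothetical stand-in for a pure path; not pathlib's real code."""

    def __init__(self, *segments):
        # The segment list is written once here and never mutated again.
        self._raw_paths = list(segments)

    @property
    def _raw_path(self):
        # Recompute the joined path on each access instead of writing it
        # back into self._raw_paths.
        paths = self._raw_paths
        if len(paths) == 1:
            return paths[0]
        elif paths:
            return posixpath.join(*paths)
        else:
            return ''

    @property
    def root(self):
        # The derived value is still cached, so repeated access does not
        # join again. If two threads race here, both compute and store
        # the same string.
        try:
            return self._root
        except AttributeError:
            self._root = '/' if self._raw_path.startswith('/') else ''
            return self._root


p = SketchPath('/usr', 'local', 'foo')
print(p._raw_path, repr(p.root))
```

In this sketch the join only runs again when `_raw_path` itself is read more than once, which matches the observation that the properties built on top of it already hold their own cached results.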

@Zheaoli
Contributor

Zheaoli commented Sep 30, 2025

Hi @colesbury @barneygale, I think we may need to think more about this PR.

I made a benchmark:

    import pyperf
    from pathlib import Path # or from pathlib_new which is a patched version

    PATH = "/" + "a" * 511
    p = Path(PATH)

    def bench():
        for i in range(100000):
            a = p.root
            del a

    runner = pyperf.Runner()
    runner.bench_func('bench', bench)

Run on 3.13.0 (not the free-threaded build):

+-----------+----------+-----------------------+
| Benchmark | demo_old | demo_new              |
+===========+==========+=======================+
| bench     | 1.97 ms  | 5.60 ms: 2.84x slower |
+-----------+----------+-----------------------+

Maybe we could use a flag as a feature gate?

@Zheaoli
Contributor

Zheaoli commented Sep 30, 2025

My bad, the previous benchmark had a mistake. Here is a corrected version:

    import pyperf
    from pathlib import Path # or from pathlib_new which is a patched version

    PATH = [chr(ord('a') + (i % 26)) for i in range(20)]
    p = Path(*PATH)

    def bench():
        a = p.root
        del a

    runner = pyperf.Runner()
    runner.bench_func('bench', bench)

Here are the results:

+-----------+------------+-----------------------+
| Benchmark | demo_old14 | demo_new14            |
+===========+============+=======================+
| bench     | 30.1 ns    | 29.4 ns: 1.02x faster |
+-----------+------------+-----------------------+

@Zheaoli
Contributor

Zheaoli commented Sep 30, 2025

Updated:

    import pyperf
    from pathlib_new import Path

    PATH = ["abc123" for _ in range(200)]

    def bench():
        p = Path(*PATH)
        a = p.root
        del a

    runner = pyperf.Runner()
    runner.bench_func('bench', bench)

Based on this code, I find that if the path has fewer than 200 levels, this patch is faster than the old way.

@colesbury
Contributor Author

@barneygale, are you okay with this change?

@barneygale
Contributor

barneygale commented Oct 10, 2025

As an illustration of the current caching behaviour, consider:

    p = Path('/usr', 'local', 'foo', 'bar')
    q0 = p / 'spam.txt'
    q1 = p / 'eggs.txt'
    str(p)
    str(q0)
    str(q1)

The current implementation would make these calls:

    os.path.join('/usr', 'local', 'foo', 'bar')
    os.path.join('/usr/local/foo/bar', 'spam.txt')
    os.path.join('/usr/local/foo/bar', 'eggs.txt')

With this PR:

    os.path.join('/usr', 'local', 'foo', 'bar')
    os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
    os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt')

The extra arguments have a cost IIRC. And maybe that's fine, I just wanted to explain the current behaviour better.
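
One way to check the cost of the extra arguments is a small `timeit` sketch that compares joining five separate segments with joining a pre-joined parent and one child name. The helper `time_join` and the iteration count are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import posixpath
import timeit

# Five separate segments, as joined when the parent was never cached.
SEGMENTS = ('/usr', 'local', 'foo', 'bar', 'spam.txt')
# Pre-joined parent plus one child, as joined after the parent was cached.
PREJOINED = ('/usr/local/foo/bar', 'spam.txt')


def time_join(parts, number=100_000):
    """Return the total seconds spent on `number` calls to posixpath.join."""
    return timeit.timeit(lambda: posixpath.join(*parts), number=number)


if __name__ == '__main__':
    # Both forms produce the same string; only the argument count differs.
    assert posixpath.join(*SEGMENTS) == posixpath.join(*PREJOINED)
    print('5 args:', time_join(SEGMENTS))
    print('2 args:', time_join(PREJOINED))
```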

@colesbury
Contributor Author

@barneygale, I've added a print statement to join in posixpath.py and I see the same thing on both main and the PR:

    os.path.join('/usr', 'local', 'foo', 'bar')
    os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
    os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt')

@barneygale
Contributor

Oh! If you move str(p) before the creation of q0 and q1 does that change?

@colesbury
Contributor Author

Yes, then you get the behavior you described
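
The ordering effect discussed in this exchange can be reproduced with a toy model. In the sketch below, `ToyPath`, `counting_join`, and `join_arg_counts` are hypothetical names, and the model only assumes what the thread describes: a child copies the parent's current segment list, and the cached variant overwrites that list with the joined string once the path is stringified. It is not pathlib's real code.

```python
import posixpath

calls = []  # argument tuples seen by counting_join


def counting_join(*parts):
    """Record the arguments, then delegate to posixpath.join."""
    calls.append(parts)
    return posixpath.join(*parts)


class ToyPath:
    """Toy model of the segment handling described in the thread."""

    def __init__(self, *segments, cache=False):
        self._cache = cache
        self._segments = []
        for seg in segments:
            if isinstance(seg, ToyPath):
                # A child copies whatever the parent currently holds.
                self._segments.extend(seg._segments)
            else:
                self._segments.append(seg)

    def __truediv__(self, name):
        return ToyPath(self, name, cache=self._cache)

    def __str__(self):
        segs = self._segments
        if not segs:
            return ''
        if len(segs) == 1:
            return segs[0]
        joined = counting_join(*segs)
        if self._cache:
            # Old behaviour: replace the segments with the joined string.
            segs[:] = [joined]
        return joined


def join_arg_counts(cache, str_first):
    """Return the number of arguments passed to each join call."""
    calls.clear()
    p = ToyPath('/usr', 'local', 'foo', 'bar', cache=cache)
    if str_first:
        str(p)
    q0 = p / 'spam.txt'
    q1 = p / 'eggs.txt'
    if not str_first:
        str(p)
    str(q0)
    str(q1)
    return [len(parts) for parts in calls]


print(join_arg_counts(cache=True, str_first=True))
print(join_arg_counts(cache=True, str_first=False))
print(join_arg_counts(cache=False, str_first=True))
```

In this model the two-argument joins appear only when caching is enabled and the parent is stringified before its children are created; every other combination joins all five segments for each child.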

Contributor

@barneygale barneygale left a comment

If we can bring the optimization back in future that would be great, but for now it's better to have pathlib working in free-threaded Python. Thanks for fixing this.

@colesbury colesbury merged commit d9b4eef into python:main Oct 10, 2025
49 checks passed
@miss-islington-app

Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 10, 2025
…139066)

Don't cache the joined path in `_raw_path` because the caching isn't thread safe.
(cherry picked from commit d9b4eef71e7904fbe3a3786a908e493be7debbff)

Co-authored-by: Sam Gross <colesbury@gmail.com>
@bedevere-app

bedevere-app bot commented Oct 10, 2025

GH-139926 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 10, 2025
@colesbury
Contributor Author

@barneygale thanks for the review

@colesbury colesbury deleted the gh-139001-pathlib-dont-cache branch October 10, 2025 21:20
colesbury added a commit that referenced this pull request Oct 10, 2025
… (gh-139926)

Don't cache the joined path in `_raw_path` because the caching isn't thread safe.
(cherry picked from commit d9b4eef)

Co-authored-by: Sam Gross <colesbury@gmail.com>