pathlib's deferred joining slows real workloads #113888

@barneygale

In #104996 I made pathlib defer joining of arguments given to path initialisers (like PurePath('a', 'b')) and to joinpath(), __truediv__() and __rtruediv__().

This "optimisation" often results in more path joining work, not less. Consider:

import pathlib

test_path = pathlib.Path('/home', 'barney', 'projects', 'cpython', 'Lib', 'test')
print(test_path / 'test_abc.py')
print(test_path / 'test_pathlib')
print(test_path / 'test_zipfile')

(The print() calls stand in for any operation on the path object other than a further join.)

Under the hood this results in the following calls:

os.path.join('/home', 'barney', 'projects', 'cpython', 'Lib', 'test', 'test_abc.py')  # cost=7
os.path.join('/home', 'barney', 'projects', 'cpython', 'Lib', 'test', 'test_pathlib')  # cost=7
os.path.join('/home', 'barney', 'projects', 'cpython', 'Lib', 'test', 'test_zipfile')  # cost=7
# total cost: 21

If we'd naively joined the paths, we'd instead have:

os.path.join('/home', 'barney', 'projects', 'cpython', 'Lib', 'test')  # cost=6
os.path.join('/home/barney/projects/cpython/Lib/test', 'test_abc.py')  # cost=2
os.path.join('/home/barney/projects/cpython/Lib/test', 'test_pathlib')  # cost=2
os.path.join('/home/barney/projects/cpython/Lib/test', 'test_zipfile')  # cost=2
# total cost: 12
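
The two totals above can be reproduced with a small sketch. It assumes the cost model implied by the annotations, where one os.path.join() call costs one unit per argument; that model is inferred from the numbers in this issue and is not a measurement. The helper names deferred_cost and eager_cost are hypothetical, and posixpath is used so the joined strings are the same on every platform.

```python
import posixpath


def deferred_cost(base_parts, children):
    # Deferred joining: every child path re-joins all base segments plus the child.
    return sum(len(base_parts) + 1 for _ in children)


def eager_cost(base_parts, children):
    # Eager joining: join the base once, then do a two-argument join per child.
    return len(base_parts) + 2 * len(children)


base = ['/home', 'barney', 'projects', 'cpython', 'Lib', 'test']
children = ['test_abc.py', 'test_pathlib', 'test_zipfile']

# Both strategies produce identical path strings; only the amount of work differs.
joined_base = posixpath.join(*base)
deferred_paths = [posixpath.join(*base, child) for child in children]
eager_paths = [posixpath.join(joined_base, child) for child in children]
assert deferred_paths == eager_paths

print(deferred_cost(base, children))  # 21
print(eager_cost(base, children))     # 12
```

Under this model the gap widens with both the depth of the base path and the number of children derived from it, since the deferred strategy pays for the base segments once per child.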
