-
-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-32689: Updates shutil.move to allow for Path objects to be used as source arg #5393
Conversation
Lib/shutil.py
Outdated
# 'foo' | ||
# >>> os.path.basename('/bar/foo/') | ||
# '' | ||
# _basename('/bar/foo/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing '>>> '
@@ -550,7 +557,10 @@ def move(src, dst, copy_function=copy2): | |||
os.rename(src, dst) | |||
return | |||
|
|||
real_dst = os.path.join(dst, _basename(src)) | |||
# Using _basename instead of os.path.basename is important, as we must | |||
# ignore any trailing slash to avoid the basename returning '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the need for str()
too here? (It's clear from the PR title, but it won't be clear immediately to future readers of the code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
…be used as source argument by casting to string before performing rstrip. Added documentation on why rstrip needs to be used in the helper _basename method.
c10ec50
to
cc5645d
Compare
This will LGTM once you add a Misc/NEWS entry (see the "test" failure). |
# ignore any trailing slash to avoid the basename returning '' | ||
# Forcing src to a string allows flexibility of objects being passed in, | ||
# including Path objects | ||
real_dst = os.path.join(dst, _basename(str(src))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to not use os.fspath(src)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to use os.fspath(src)
instead of casting to a string?
I suppose it would guard users against passing in anything that isn't a string, bytes or path. Otherwise, if it is a path, __fspath__
simply calls str()
anyway. However, _basename
relies on having a string as its argument, bytes would indeed break that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fspath() is better. I'm surprised there are no other functions in shutil.py with the same need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fspath()
is used elsewhere in shutil.py, but it seems that only shutil.move
needs to handle its basename differently. If we really want to use fspath()
now, I can look at refactoring _basename()
, otherwise I can add this to my list of things to improve for shutil.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutil doesnt explicitly document that operations don't support all path-like objects (str
, bytes
, and os.PathLike
), though the copy*
functions do say their arguments should be strings . For the most part shutil does work with path-like objects, but _basename
doesn't.
Since _basename
breaks the abstraction of os.path.basename
, it's _basename
that needs to be fixed, not everywhere that it gets called, even if for now that's only in one place. It should use path = os.fspath(path)
. If this returns bytes
, it should also use sep = os.fsencode(sep)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that's actually a typo in the PR name (it wouldn't let me edit it after I realized). This modifies shutil.move
not shutil.copy
, which doesn't specify that its arguments are strings.
I'm not sure I'd say that _basename
breaks the abstraction of os.path.basename
, it just gives us part of the path with different requirements. But, I didn't write the original code :)
I'm happy to take another look at this though.
Ah, the problem is that os.fspath(x) returns bytes if x is bytes, which
breaks _basename. Oh well.
|
68109bd
to
61f1019
Compare
I have superpowers that let me edit the subject. :-) I'll look into whether you can have them too (also, I thought everyone could edit their own subject -- unsure why you couldn't in this case.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilyemorehouse: would you mind to add an unit test to make sure that the reported bug no longer fails? https://bugs.python.org/issue32689#msg310900
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@vstinner of course! i'm planning on tackling this during the sprint, probably won't have time before then |
Ok, fine ;-) |
# >>> os.path.basename('/bar/foo/') | ||
# '' | ||
# >>> _basename('/bar/foo/') | ||
# 'foo' | ||
sep = os.path.sep + (os.path.altsep or '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind considering patching here with something like:
if isinstance(path, os.PathLike):
return path.name
sep = os.path.sep + (os.path.altsep or '')
return os.path.basename(path.rstrip(sep))
?
That would be the only place to apply changes then.
@emilyemorehouse: Ping? I still don't see a new test to the change? Are you available to add it? Or would you prefer that I try to find someone else to write it? |
@emilyemorehouse It seems like this is ready to go pending the addition of the test that Victor requested. Would you be able to add that test for this change to make it into 3.8? Thanks! |
@emilyemorehouse Do you mind if I continue the work on that subject? I am currently pacthing |
How is this for making it into 3.8? Pathlib really needs it. Needing to do str() at random functions instead of having something that cleaning does the In other words, shutil is not PEP 519 compliant, and it would be great if it were. |
_basename already breaks on bytes. Not an introduced break from this code. I'll add a test, and let's get this in please! |
# ignore any trailing slash to avoid the basename returning '' | ||
# Forcing src to a string allows flexibility of objects being passed in, | ||
# including Path objects | ||
real_dst = os.path.join(dst, _basename(str(src))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it considered poor form to get rid of _basename altogether and make use of pathlib in the move function? I'm not sure if the idea is for all these modules to strictly avoid circular dependencies. They are already using os.path which is just as much a citizen in 3.8 as pathlib right?
e.g.
real_dst = os.path.join(dst, _basename(src))
becomes
real_dst = Path(dst) / Path(src).name
I've looked around and familiarized myself, and I now think importing pathlib here is fine. My only remaining concern is that of performance.
Here's the performance difference for this step.
In [46]: %timeit real_dst = os.path.join("a/b/c", _basename('b/'))
2.71 µs ± 62.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [47]: %timeit real_dst = Path("a/b/c") / Path('b/').name
12.4 µs ± 65.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Is 10us significant or insignificant compared to the least expensive operation this function will do? I don't know. Let's find out.
In [55]: %timeit os.rename('/tmp/a/a.txt', '/tmp/a/b.txt'); os.rename('/tmp/a/b.txt', '/tmp/a/a.txt')
124 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
62us to rename. 10us seems significant enough that we wouldn't want to favor the Path sugar suggestion. 16% speed decrease from adding the 10us.
What do people think? I was hoping to get to use pathlib.Path here, but I suspect for this low level move, it should be as fast as possible, and 16% is not worth one line of sugary code to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question, would it then be poor form to drop _basename from python3.8? Not sure on the policy regarding removing privates without a deprecation warning. Could always deprecate it for removal in 3.9.
Added a test that fails but is addressed to pass in this PR
After change:
Running all the tests
|
@vstinner Test is added, code added to make it pass. Performance testing was also done to check out a few other viable solutions to ensure mass renaming doesn't get a performance regression as a result of this change. |
This PR has been stuck for over two weeks. Who's waiting for whom? |
Looks like reviewers were waiting on @emilyemorehouse to add tests. @maxwellmckinnon made their own PR iterating on the feedback here: #15326 @emilyemorehouse: Would you like to finish this off yourself or do you mind if Maxwell continues it? |
I guess no one had any luck pinging @emilyemorehouse over the last month? I found Emily’s email and pinged her there to come take a look. Should we just merge the additional tweaks (giving Emily 1st credit of course) and new tests if nothing happens in two more weeks? |
Sorry, too late. Rc1 is around the corner.
On Sun, Sep 29, 2019 at 12:11 Maxwell A McKinnon ***@***.***> wrote:
I guess no one had any luck pinging @emilyemorehouse
<https://github.com/emilyemorehouse> over the last month?
I found Emily’s email and pinged her there to come take a look.
Should we just merge the additional tweaks (giving Emily 1st credit of
course) and new tests if nothing happens in two more weeks?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5393?email_source=notifications&email_token=AAWCWMSREVBCHSQHKTSCXRDQMD4XRA5CNFSM4EN7IE4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD734L4A#issuecomment-536331760>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAWCWMU3JKSGSLSBQB4545DQMD4XRANCNFSM4EN7IE4A>
.
--
--Guido (mobile)
|
From Emily over email: “I haven't had the time to finish it, feel free to propose yours!” No worries on getting this in for anything. I imagine it can always be put in as a patch release to 3.7? All it does is take the tediousness out of getting hit with “that one shutil function that doesn’t work with pathlib”, as the stackoverflow on this bug can attest to. |
…s source arg (GH-15326) Important work originally done by @emilyemorehouse two years ago and nearly ready to go in. This bug has affected many people and in some cases has been a dealbreaker to the adoption of the otherwise wonderful pathlib and PEP519. https://stackoverflow.com/questions/33625931/copy-file-with-pathlib-in-python. This adds the outstanding test request from that PR @vstinner (#5393). Test fails without the change, passes with it, along with every other test in test_shutil. Some variants were experimented with to make the one line change and the most performant one was picked. # Added Test for PathLike directory destination, the current fail case ``` Lib/test/test_shutil.py::TestMove::test_move_file_pathlike FAILED [100%] ============================================================== FAILURES =============================================================== __________________________________________________ TestMove.test_move_file_pathlike ___________________________________________________ self = <test.test_shutil.TestMove testMethod=test_move_file_pathlike> def test_move_file_pathlike(self): # Move a file to another location on the same filesystem. src = pathlib.Path(self.src_file) > self._check_move_file(src, self.dst_dir, self.dst_file) Lib/test/test_shutil.py:1563: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Lib/test/test_shutil.py:1545: in _check_move_file shutil.move(src, dst) /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:562: in move real_dst = os.path.join(dst, _basename(src)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ path = PosixPath('/var/folders/r2/psq74t5x3nbfzlph8bh2pvdw0000gn/T/tmp9ie0wh9_/foo') def _basename(path): # A basename() variant which first strips the trailing slash, if present. # Thus we always get the last component of the path, even for directories. sep = os.path.sep + (os.path.altsep or '') > return os.path.basename(path.rstrip(sep)) E AttributeError: 'PosixPath' object has no attribute 'rstrip' /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:526: AttributeError ============================================== 1 failed, 102 deselected in 0.30 seconds =============================================== ``` After change: ``` ========================================================= test session starts ========================================================= platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7 cachedir: .pytest_cache rootdir: /Users/maxwellmckinnon/dev/cpython plugins: cov-2.7.1, mock-1.10.4 collected 103 items / 102 deselected / 1 selected Lib/test/test_shutil.py::TestMove::test_move_file_pathlike PASSED [100%] ============================================== 1 passed, 102 deselected in 0.06 seconds =============================================== ``` Running all the tests in test_shutil.py ``` ╰─ pytest Lib/test/test_shutil.py -v ========================================================= test session starts ========================================================= platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7 cachedir: .pytest_cache rootdir: /Users/maxwellmckinnon/dev/cpython plugins: cov-2.7.1, mock-1.10.4 collected 103 items Lib/test/test_shutil.py::TestShutil::test_chown PASSED [ 0%] Lib/test/test_shutil.py::TestShutil::test_copy PASSED [ 1%] ... Lib/test/test_shutil.py::TermsizeTests::test_stty_match SKIPPED [ 99%] Lib/test/test_shutil.py::PublicAPITests::test_module_all_attribute PASSED [100%] ================================================ 96 passed, 7 skipped in 1.25 seconds ================================================= ``` # Performance Considerations Is it considered poor form to get rid of _basename altogether and make use of pathlib in the move function? I'm not sure if the idea is for all these modules to strictly avoid circular dependencies. They are already using os.path which is just as much a citizen in 3.8 as pathlib right? e.g. `real_dst = os.path.join(dst, _basename(src))` becomes `real_dst = Path(dst) / Path(src).name` I've looked around and familiarized myself, and I now think importing pathlib here is fine. My only remaining concern is that of performance. Here's the performance difference for this step. ``` In [46]: %timeit real_dst = os.path.join("a/b/c", _basename('b/')) 2.71 µs ± 62.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [47]: %timeit real_dst = Path("a/b/c") / Path('b/').name 12.4 µs ± 65.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` Is 10us significant or insignificant compared to the least expensive operation this function will do? I don't know. Let's find out. ``` In [55]: %timeit os.rename('/tmp/a/a.txt', '/tmp/a/b.txt'); os.rename('/tmp/a/b.txt', '/tmp/a/a.txt') 124 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` 62us to rename. 10us seems significant enough that we wouldn't want to favor the Path sugar suggestion. 16% speed decrease from adding the 10us. What do people think? I was hoping to get to use pathlib.Path here, but I suspect for this low level move, it should be as fast as possible, and 16% is not worth one line of sugary code to me. https://bugs.python.org/issue32689 Automerge-Triggered-By: @gvanrossum
Superseded by #15326. |
…s source arg (pythonGH-15326) Important work originally done by @emilyemorehouse two years ago and nearly ready to go in. This bug has affected many people and in some cases has been a dealbreaker to the adoption of the otherwise wonderful pathlib and PEP519. https://stackoverflow.com/questions/33625931/copy-file-with-pathlib-in-python. This adds the outstanding test request from that PR @vstinner (python#5393). Test fails without the change, passes with it, along with every other test in test_shutil. Some variants were experimented with to make the one line change and the most performant one was picked. # Added Test for PathLike directory destination, the current fail case ``` Lib/test/test_shutil.py::TestMove::test_move_file_pathlike FAILED [100%] ============================================================== FAILURES =============================================================== __________________________________________________ TestMove.test_move_file_pathlike ___________________________________________________ self = <test.test_shutil.TestMove testMethod=test_move_file_pathlike> def test_move_file_pathlike(self): # Move a file to another location on the same filesystem. src = pathlib.Path(self.src_file) > self._check_move_file(src, self.dst_dir, self.dst_file) Lib/test/test_shutil.py:1563: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Lib/test/test_shutil.py:1545: in _check_move_file shutil.move(src, dst) /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:562: in move real_dst = os.path.join(dst, _basename(src)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ path = PosixPath('/var/folders/r2/psq74t5x3nbfzlph8bh2pvdw0000gn/T/tmp9ie0wh9_/foo') def _basename(path): # A basename() variant which first strips the trailing slash, if present. # Thus we always get the last component of the path, even for directories. sep = os.path.sep + (os.path.altsep or '') > return os.path.basename(path.rstrip(sep)) E AttributeError: 'PosixPath' object has no attribute 'rstrip' /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:526: AttributeError ============================================== 1 failed, 102 deselected in 0.30 seconds =============================================== ``` After change: ``` ========================================================= test session starts ========================================================= platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7 cachedir: .pytest_cache rootdir: /Users/maxwellmckinnon/dev/cpython plugins: cov-2.7.1, mock-1.10.4 collected 103 items / 102 deselected / 1 selected Lib/test/test_shutil.py::TestMove::test_move_file_pathlike PASSED [100%] ============================================== 1 passed, 102 deselected in 0.06 seconds =============================================== ``` Running all the tests in test_shutil.py ``` ╰─ pytest Lib/test/test_shutil.py -v ========================================================= test session starts ========================================================= platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7 cachedir: .pytest_cache rootdir: /Users/maxwellmckinnon/dev/cpython plugins: cov-2.7.1, mock-1.10.4 collected 103 items Lib/test/test_shutil.py::TestShutil::test_chown PASSED [ 0%] Lib/test/test_shutil.py::TestShutil::test_copy PASSED [ 1%] ... Lib/test/test_shutil.py::TermsizeTests::test_stty_match SKIPPED [ 99%] Lib/test/test_shutil.py::PublicAPITests::test_module_all_attribute PASSED [100%] ================================================ 96 passed, 7 skipped in 1.25 seconds ================================================= ``` # Performance Considerations Is it considered poor form to get rid of _basename altogether and make use of pathlib in the move function? I'm not sure if the idea is for all these modules to strictly avoid circular dependencies. They are already using os.path which is just as much a citizen in 3.8 as pathlib right? e.g. `real_dst = os.path.join(dst, _basename(src))` becomes `real_dst = Path(dst) / Path(src).name` I've looked around and familiarized myself, and I now think importing pathlib here is fine. My only remaining concern is that of performance. Here's the performance difference for this step. ``` In [46]: %timeit real_dst = os.path.join("a/b/c", _basename('b/')) 2.71 µs ± 62.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [47]: %timeit real_dst = Path("a/b/c") / Path('b/').name 12.4 µs ± 65.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` Is 10us significant or insignificant compared to the least expensive operation this function will do? I don't know. Let's find out. ``` In [55]: %timeit os.rename('/tmp/a/a.txt', '/tmp/a/b.txt'); os.rename('/tmp/a/b.txt', '/tmp/a/a.txt') 124 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` 62us to rename. 10us seems significant enough that we wouldn't want to favor the Path sugar suggestion. 16% speed decrease from adding the 10us. What do people think? I was hoping to get to use pathlib.Path here, but I suspect for this low level move, it should be as fast as possible, and 16% is not worth one line of sugary code to me. https://bugs.python.org/issue32689 Automerge-Triggered-By: @gvanrossum
Updates shutil.move function to allow for Path objects to be used as source argument by casting to string before performing rstrip. Added documentation on why rstrip needs to be used in the helper _basename method.
https://bugs.python.org/issue32689