-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
filecmp.cmp(shallow=True) isn't actually shallow when only mtime differs #87124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Consider the case where 2 files are shallow compared where only the mtime differs (i.e. the mode and size is identical). With filecmp.cmp(f1, f2, shallow=True) a deep compare would be performed behind the scenes since the guard clauses fell through. This discrepancy is either a problem in the docstring or a problem in the comparison itself. Two options remain:
My preference is to adjust the function to safeguard the meaning of 'shallow' in this context. |
This is a problem with the docstring. The actual docs for it are a bit more clear, https://docs.python.org/3/library/filecmp.html#filecmp.cmp : "If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared." Your patch can't be used because it changes longstanding documented behavior. If you'd like to submit a patch to fix the docstring, that's fine, but we're not going to break existing code to make the function less accurate. The patch should probably make the documentation more clear while it's at it.
Proposed rewording of main docs would be: "If shallow is true, files with identical os.stat() signatures (file type, size, and modification time) are taken to be equal. When shallow is false, or the file signatures are identical, the contents of the files are compared." A similar wording (or at least, a shorter version of the above, rather than a strictly wrong description of the shallow parameter) could be applied to the docstring. |
Hi, this is also discussed in https://bugs.python.org/issue41354. Ciao, |
Christof and Alexander, as you both said your preference is to have some way of enforcing "only shallow" compare, I want to clarify -- it seems if you do that, you will get the answer of True if the files are most likely the same (sig is equal), but if the sig is not equal, I'm not sure what answer you expect or why. The sig may be different because of mtime, and then the files may be different or the same, it's anyone's guess. I wonder if both of you expect the same behavior, and if so, for the same use case or not? |
Also see this 16 years old issue: https://bugs.python.org/issue1234674 |
Hi Andrei, I would follow rsync. so, yes you can have false positives with a shallow comparison of size + mtime only. But that's usually ok for e.g. incremental backups. Wow, the bug is that old... |
But rsync is a quite more specific tool. For example, unix cmp command does not guess based on any part of |
Andrei, Thus,
|
I've put up the doc update PR here: I've also reviewed a few dozen SO questions about filecmp.cmp and shallow arg, many of them find the docs confusing or lacking, so I thought updating docs is definitely very useful. I haven't seen any of them asking for "force shallow" behavior, but there's many of them and I haven't read all; searching for "force shallow" doesn't find any. I think it might be useful to also add a new 'force shallow' arg, but I hope more people ask for it or compile a list of SO questions asking for it, otherwise it might not be worth additional complexity. |
Alexander: sorry, I didn't notice your PR because I was first working on a different issue, and then found this as a duplicate, and only noticed there's already an open PR here. If you prefer, we can close my PR and work on updating yours. |
Thanks all for your effort on improving this! ✨ 🍰 ✨ |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: