Deprecate os.path.commonprefix #74453
The function os.path.commonprefix computes the longest common prefix of strings (any iterable, actually) character by character, regardless of their meaning as paths. I do not see any reason to use this function for paths, and keeping it in the os.path module makes it easy to confuse with os.path.commonpath (which was introduced in Python 3.5). I believe making this function raise a DeprecationWarning would help avoid this kind of bug. |
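To make the confusion concrete, here is a small sketch comparing the two functions on the same input. It uses posixpath (the POSIX flavour of os.path) so the result does not depend on the platform; the example paths are made up for illustration.

```python
import posixpath

# Two illustrative paths that share the directory "/usr" and nothing deeper.
paths = ["/usr/lib", "/usr/local/lib"]

# commonprefix compares character by character, so the result can end
# in the middle of a path component and need not be a real directory.
char_prefix = posixpath.commonprefix(paths)

# commonpath compares whole path components.
path_prefix = posixpath.commonpath(paths)

print(char_prefix)  # /usr/l
print(path_prefix)  # /usr
```

The first result, "/usr/l", is not a directory that either input lives in, which is the kind of surprise the issue is about.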
Ned Batchelder wrote an article about this function in 2010 :-) https://nedbatchelder.com/blog/201003/whats_the_point_of_ospathcommonprefix.html """
But it should say:
""" |
We shouldn't deprecate a function until we add an alternative. There is a working alternative for paths, but commonprefix() is used in the wild for non-paths (for example in unittest.util). The problem is that there is no right place for this function. The string module is the wrong place because commonprefix() supports more than just strings. Perhaps a new seqtools module would be the right place. |
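As a quick illustration of the non-string use mentioned above, the following sketch calls commonprefix() on lists and tuples. The input values are arbitrary; the behaviour shown relies on the current implementation comparing elements of the smallest and largest items.

```python
import os.path

# Lists of integers: the common prefix is itself a list.
list_prefix = os.path.commonprefix([[1, 2, 3], [1, 2, 4]])

# Tuples work the same way; here the shorter tuple is a prefix of the longer.
tuple_prefix = os.path.commonprefix([(1, 2), (1, 2, 3)])

print(list_prefix)   # [1, 2]
print(tuple_prefix)  # (1, 2)
```

Nothing here is path-related, which is why a string-only or path-only home for the function would not cover existing callers.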
I agree with Serhiy that it might be time to create a seqtools module. |
The |
For now, there are three uses of commonprefix() in the stdlib:
I think that we should add commonprefix() to the string module (or maybe to a new module for operations on sequences), deprecate os.path.commonprefix() in the documentation only, add a deprecation warning to os.path.commonprefix() several versions later, and remove os.path.commonprefix() several versions after that. |
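A rough sketch of what the warning stage of such a plan might look like. Both function names are hypothetical (neither exists in the stdlib), and the warning text is only a placeholder; this is not the actual CPython change.

```python
import warnings


def sequence_commonprefix(items):
    """Hypothetical relocated helper: longest common prefix of sequences."""
    items = list(items)
    if not items:
        return ""
    # The smallest and largest items bound all the others lexicographically,
    # so their common prefix is the common prefix of the whole collection.
    lo, hi = min(items), max(items)
    for i, element in enumerate(lo):
        if element != hi[i]:
            return lo[:i]
    return lo


def deprecated_commonprefix(items):
    """Hypothetical stand-in for os.path.commonprefix during deprecation."""
    warnings.warn(
        "commonprefix() is not path-aware; use os.path.commonpath() for paths",
        DeprecationWarning,
        stacklevel=2,
    )
    return sequence_commonprefix(items)
```

Existing callers would keep working during the warning period, and non-path users would have somewhere to migrate to before the removal.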
In the meantime, maybe that note in the |
Why is this a security issue? |
The function can be misused to prevent directory traversal. |
/cc @barneygale |
I agree with Ned Batchelder's blog post: "This function is worse than useless, it’s misleading."
+1 to this. I note that |
Hi, Regarding why the The So what is the issue in that case? The TrellixVulnTeam's

    with tarfile.open('my.tar') as f:
        def is_within_directory(directory, target):
            abs_directory = os.path.abspath(directory)
            abs_target = os.path.abspath(target)
            prefix = os.path.commonprefix([abs_directory, abs_target])
            return prefix == abs_directory

        def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
            for member in tar.getmembers():
                member_path = os.path.join(path, member.name)
                if not is_within_directory(path, member_path):
                    raise Exception("Attempted Path Traversal in Tar File")
            tar.extractall(path, members, numeric_owner=numeric_owner)

        safe_extract(f, tmp_dir)

Now, this all boils down to the following check:

    prefix = os.path.commonprefix([abs_directory, abs_target])
    return prefix == abs_directory

The

    [*] Will extract archive_file_member.name: ../tmpXXX
    target = ./tmp/../tmpXXX
    abs_directory = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmp
    abs_target = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmpXXX
    prefix = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmp

As a result, while we intended to extract the
which matched the

You can find the full proof of concept from above code here: https://gist.github.com/disconnect3d/00a22838380cd2a29cfc87a8599261f6

With all that being said, I think it may make sense to add a security warning that the

[0] https://www.trellix.com/en-us/about/newsroom/stories/research/trellix-advanced-research-center-patches-vulnerable-open-source-projects.html |
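For comparison, here is a minimal sketch of the same containment check written with os.path.commonpath, which compares whole path components. It keeps the function name from the snippet above; the ValueError handling is an assumption added for inputs that commonpath refuses to compare (for example, paths on different Windows drives). It does not resolve symlinks, so it is not a complete defence on its own.

```python
import os.path


def is_within_directory(directory, target):
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    try:
        # commonpath works on path components, so ".../tmpXXX" no longer
        # counts as being inside ".../tmp".
        common = os.path.commonpath([abs_directory, abs_target])
    except ValueError:
        # Assumption: treat paths that cannot be compared as "outside".
        return False
    return common == abs_directory


print(is_within_directory("tmp", "tmp/../tmpXXX"))  # False
print(is_within_directory("tmp", "tmp/a/b"))        # True
```

With the member name from the proof of concept above (../tmpXXX), this version rejects the target, whereas the commonprefix-based check accepted it.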
Q: If we add I ask because I suspect finding the common prefix of precisely two strings is the most common use case, and users might not anticipate a need to wrap the arguments in a list. |