Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate os.path.commonprefix #74453

Open
progval mannequin opened this issue May 4, 2017 · 13 comments
Open

Deprecate os.path.commonprefix #74453

progval mannequin opened this issue May 4, 2017 · 13 comments
Labels
stdlib Python modules in the Lib dir

Comments

@progval
Copy link
Mannequin

progval mannequin commented May 4, 2017

BPO 30267
Nosy @brettcannon, @rhettinger, @vstinner, @nedbat, @progval, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-05-04.16:02:33.332>
labels = ['library']
title = 'Deprecate os.path.commonprefix'
updated_at = <Date 2022-02-16.07:38:29.868>
user = 'https://github.com/ProgVal'

bugs.python.org fields:

activity = <Date 2022-02-16.07:38:29.868>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2017-05-04.16:02:33.332>
creator = 'Valentin.Lorentz'
dependencies = []
files = []
hgrepos = []
issue_num = 30267
keywords = []
message_count = 6.0
messages = ['292993', '292994', '293004', '293140', '293173', '413320']
nosy_count = 6.0
nosy_names = ['brett.cannon', 'rhettinger', 'vstinner', 'nedbat', 'Valentin.Lorentz', 'serhiy.storchaka']
pr_nums = []
priority = 'low'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue30267'
versions = []

@progval
Copy link
Mannequin Author

progval mannequin commented May 4, 2017

The function os.path.commonprefix computes the longest prefix of strings (any iterable, actually), regardless of their meaning as paths.

I do not see any reason to use this function for paths, and keeping it in the os.path module makes it prone to be confused with os.path.commonpath (which was introduced in Python 3.5).

I believe making this function raise a DeprecationWarning would help avoid having this kind of bugs.

@progval progval mannequin added the stdlib Python modules in the Lib dir label May 4, 2017
@vstinner
Copy link
Member

vstinner commented May 4, 2017

Ned Batchelder wrote an article about this function in 2010 :-)

https://nedbatchelder.com/blog/201003/whats_the_point_of_ospathcommonprefix.html

"""
The docs helpfully include the warning:

Note that this may return invalid paths because it works a character at a time.

But it should say:

This function is in the wrong place, and has nothing to do with paths, don't use it if you are interested in file paths!

"""

@serhiy-storchaka
Copy link
Member

We shouldn't deprecate a function until add an alternative. There is a working alternative for paths, but commonprefix() is used in the wild for non-paths (for example in unittest.util). The problem is that there is no right place for this function. The string module is wrong place because commonprefix() supports not only strings. Perhaps the new seqtools module would be a right place.

@brettcannon
Copy link
Member

I agree with Serhiy that it might be time to create a seqtools module.

@rhettinger
Copy link
Contributor

The os.path.commonprefix function has been around for a very long time. Deprecating it will just cause unnecessary pain for users and make it harder to upgrade to Python 3. The function isn't broken, the only issue here is that a new function was added with a similar name.

@serhiy-storchaka
Copy link
Member

For now, there are three uses of commonprefix() in the stdlib:

  1. In urllib.request it causes a security issue (see bpo-46756). commonpath() or just str.startswith() should be used instead.

  2. In lib2to3.main. The code contains a workaround around commonprefix(). It could be simplified by using commonpath().

  3. In unittest.util. This is the only correct use of commonprefix(). It cannot be replaced by commonpath().

I think that we should add commonprefix() in the string module (or maybe in a new module for operations on sequences), deprecate os.path.commonprefix() in documentation only, several versions later add a deprecation warning in os.path.commonprefix(), and several versions later remove os.path.commonprefix().

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@gbdlin
Copy link

gbdlin commented Jan 8, 2023

In the meantime, maybe that note in the commonprefix documentation entry should be escalated to a security warning?

@iritkatriel
Copy link
Member

In the meantime, maybe that note in the commonprefix documentation entry should be escalated to a security warning?

Why is this a security issue?

@vstinner
Copy link
Member

vstinner commented Apr 5, 2023

Why is this a security issue?

The function can be misused to prevent directory traversal.

@brettcannon
Copy link
Member

/cc @barneygale

@barneygale
Copy link
Contributor

I agree with Ned Batchelder's blog post: "This function is worse than useless, it’s misleading."

I think that we should add commonprefix() in the string module (or maybe in a new module for operations on sequences), deprecate os.path.commonprefix() in documentation only, several versions later add a deprecation warning in os.path.commonprefix(), and several versions later remove os.path.commonprefix().

+1 to this.

I note that os.path.commonprefix() called with no arguments returns the empty string '', which makes me feel slightly better about it going in string rather than a new seqtools module. In a similar vein we have a str.removeprefix() method that works on strings, despite the potential for it to be extended to sequences more generally.

@disconnect3d
Copy link
Contributor

disconnect3d commented Apr 7, 2023

Hi,

Regarding why the os.path.commonprefix may be a security vulnerability, here is an actual case where it (likely, depending on the source of a tar archive) is.

The os.path.commonprefix function is being used in 61000+ instances of an attempt to fix CVE-2007-4559, the tarfile.extractall path traversal made by the @TrellixVulnTeam [0]. However, apart from being spam (since Trellix hasn't evaluated whether a project needs such a "fix", or didn't bother to sign CLAs etc. [1]), the fix that was made still allows for a path traversal due to the use (or incorrect use) of os.path.commonprefix.

So what is the issue in that case?

The TrellixVulnTeam's tar.extractall fix looks as follows:

with tarfile.open('my.tar') as f:
    def is_within_directory(directory, target):
        abs_directory = os.path.abspath(directory)
        abs_target = os.path.abspath(target)

        prefix = os.path.commonprefix([abs_directory, abs_target])
        return prefix == abs_directory

    def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
        for member in tar.getmembers():
            member_path = os.path.join(path, member.name)
            if not is_within_directory(path, member_path):
                raise Exception("Attempted Path Traversal in Tar File")

        tar.extractall(path, members, numeric_owner=numeric_owner)

    safe_extract(f, tmp_dir)

Now, this all boils down to the following check:

        prefix = os.path.commonprefix([abs_directory, abs_target])
        return prefix == abs_directory

The abs_target is just an absolute path from os.path.join(path_to_extract_to, archive_file_member.name) and the problem is that if we know the path we are extracting to, e.g., ./tmp we can then set the member.name to ../tmpXXX and then we will get:

[*] Will extract archive_file_member.name: ../tmpXXX
target           = ./tmp/../tmpXXX
abs_directory    = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmp
abs_target       = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmpXXX
prefix           = /Users/dc/playground/py-tarfile-extractall-cve-patch/tmp

As a result, while we intended to extract the tmpXXX file into ./tmp/tmpXXX, we ended up extracting it to ./tmpXXX and this is because of the os.path.commonprefix used here that made it so that:

os.path.commonprefix([abs_directory, abs_target])
==
abs_directory, abs_target([
    "/Users/dc/playground/py-tarfile-extractall-cve-patch/tmp",
    "/Users/dc/playground/py-tarfile-extractall-cve-patch/tmpXXX"
])
==
"/Users/dc/playground/py-tarfile-extractall-cve-patch/tmp"

which matched the abs_directory being checked in return prefix == abs_directory.

You can find the full proof of concept from above code here: https://gist.github.com/disconnect3d/00a22838380cd2a29cfc87a8599261f6

With all that being said, I think it may make sense to add a security warning that the os.path.commonprefix function should not be used for checking paths, giving a counter example like the one shown in this post, even if it is moved to another module/class.

[0] https://www.trellix.com/en-us/about/newsroom/stories/research/trellix-advanced-research-center-patches-vulnerable-open-source-projects.html
[1] Among others (I can't list 61k PRs here):

@barneygale
Copy link
Contributor

Q: If we add string.commonprefix(), should it accept a single list of strings, or should it have a signature more like os.path.join, i.e. def commonprefix(s, *strings):?

I ask because I suspect finding the common prefix of precisely two strings is the most common use case, and users might not anticipate a need to wrap the arguments in a list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

8 participants