Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib's request.pathname2url not compatible with extended-length Windows file paths #87773

Closed
levineds mannequin opened this issue Mar 23, 2021 · 8 comments
Closed

urllib's request.pathname2url not compatible with extended-length Windows file paths #87773

levineds mannequin opened this issue Mar 23, 2021 · 8 comments
Assignees
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@levineds
Copy link
Mannequin

levineds mannequin commented Mar 23, 2021

BPO 43607
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @miss-islington, @levineds
PRs
  • bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix #25539
  • [3.9] bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539) #25558
  • [3.8] bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539) #25559
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/zooba'
    closed_at = <Date 2021-04-23.17:04:46.645>
    created_at = <Date 2021-03-23.22:56:49.070>
    labels = ['3.10', 'type-bug', '3.8', '3.9', 'OS-windows']
    title = "urllib's request.pathname2url not compatible with extended-length Windows file paths"
    updated_at = <Date 2021-04-23.18:21:54.014>
    user = 'https://github.com/levineds'

    bugs.python.org fields:

    activity = <Date 2021-04-23.18:21:54.014>
    actor = 'steve.dower'
    assignee = 'steve.dower'
    closed = True
    closed_date = <Date 2021-04-23.17:04:46.645>
    closer = 'steve.dower'
    components = ['Windows']
    creation = <Date 2021-03-23.22:56:49.070>
    creator = 'levineds'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43607
    keywords = ['patch']
    message_count = 8.0
    messages = ['389415', '389430', '389432', '389433', '389434', '391708', '391714', '391722']
    nosy_count = 7.0
    nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'miss-islington', 'levineds']
    pr_nums = ['25539', '25558', '25559']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue43607'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @levineds
    Copy link
    Mannequin Author

    levineds mannequin commented Mar 23, 2021

    Windows file paths are limited to 256 characters, and one of Windows's prescribed methods to address this is to prepend "\\?\" before a Windows absolute path (see: https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation)

    urllib.request.pathname2url raises an error on such paths as this function calls nturl2path.py's pathname2url function which explicitly checks that the number of characters before the ":" in a Windows path is precisely one, which is, of course, not the case if you are using an extended-length path (e.g. "\\?\C:\Python39").

    As a result, urllib cannot handle pathname2url conversion for some valid Windows paths.

    @levineds levineds mannequin added type-crash A hard crash of the interpreter, possibly with a core dump 3.9 only security fixes OS-windows type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Mar 23, 2021
    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 24, 2021

    RFC8089 doesn't specify "a mechanism for translating namespaced paths ["\\?\" and "\\.\"] to or from file URIs", and the Windows shell doesn't support them. So what's the practical benefit of supporting them in nturl2path?

    Windows file paths are limited to 256 characters,

    Classically, normal filepaths are limited to MAX_PATH - 1 (259) characters, in most cases, except for a few cases in which the limit is even smaller.

    For a normal filepath, the API replaces slashes with backlashes; resolves relative paths; resolves "." and ".." components; strips trailing dots and spaces from the final path component; and, for relative paths and DOS drive-letter paths, reserves DOS device names in the final path component (e.g. CON, NUL).

    The kernel supports filepaths with up to 32,767 characters, but classically this was only accessible by using a verbatim \\?\ filepath, or by using workarounds based on substitute drives or filesystem mountpoints and symlinks.

    With Python 3.6+ in Windows 10, if long paths are enabled in the system, normal filepaths support up to the full 32,767 characters in most cases. The need for the \\?\ prefix is thus limited to the rare case when a verbatim path is required, or when a filepath has to be passed to a legacy application that doesn't support long paths.

    @levineds
    Copy link
    Mannequin Author

    levineds mannequin commented Mar 24, 2021

    I really meant 255 characters not 256 because I was leaving three for "<drive name>:/". I suppose the most reasonable behavior is to strip out the "\\?\" before attempting the conversion as the path is sensible and parsable without that, as opposed to the current behavior which is to crash. The practical benefit is to permit the function to work on a wider range of inputs than currently is possible for essentially no cost.

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 24, 2021

    I suppose the most reasonable behavior is to strip out the "\\?\" before
    attempting the conversion as the path is sensible and parsable without

    Okay, so you're not looking to preserve the fact that it's a \\?\ verbatim path in the URI. You just want to automatically convert from verbatim \\?\X: or \\?\UNC\server\share to normal form. Devices other than drive letters and "UNC" wouldn't be supported.

    @levineds
    Copy link
    Mannequin Author

    levineds mannequin commented Mar 24, 2021

    I think that would make the most sense, yes.

    @zooba zooba added 3.8 (EOL) end of life 3.10 only security fixes labels Apr 22, 2021
    @zooba zooba self-assigned this Apr 22, 2021
    @zooba zooba added 3.8 (EOL) end of life 3.10 only security fixes labels Apr 22, 2021
    @zooba zooba self-assigned this Apr 22, 2021
    @zooba
    Copy link
    Member

    zooba commented Apr 23, 2021

    New changeset 3513d55 by Steve Dower in branch 'master':
    bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
    3513d55

    @zooba zooba closed this as completed Apr 23, 2021
    @zooba zooba closed this as completed Apr 23, 2021
    @miss-islington
    Copy link
    Contributor

    New changeset 04bcfe0 by Miss Islington (bot) in branch '3.9':
    bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
    04bcfe0

    @zooba
    Copy link
    Member

    zooba commented Apr 23, 2021

    New changeset e92d110 by Miss Islington (bot) in branch '3.8':
    bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
    e92d110

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants