Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.FancyURLopener does not treat URL fragments correctly #62319

Open
takahashishuhei mannequin opened this issue Jun 2, 2013 · 7 comments
Open

urllib.FancyURLopener does not treat URL fragments correctly #62319

takahashishuhei mannequin opened this issue Jun 2, 2013 · 7 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@takahashishuhei
Copy link
Mannequin

takahashishuhei mannequin commented Jun 2, 2013

BPO 18119
Nosy @orsenthil, @ezio-melotti, @mher, @karlcow
Files
  • issue18119-code-only.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2013-06-02.10:22:10.085>
    labels = ['type-bug', 'library']
    title = 'urllib.FancyURLopener does not treat URL fragments correctly'
    updated_at = <Date 2019-04-26.17:49:04.587>
    user = 'https://bugs.python.org/takahashishuhei'

    bugs.python.org fields:

    activity = <Date 2019-04-26.17:49:04.587>
    actor = 'BreamoreBoy'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2013-06-02.10:22:10.085>
    creator = 'takahashi.shuhei'
    dependencies = []
    files = ['36832']
    hgrepos = []
    issue_num = 18119
    keywords = ['patch']
    message_count = 7.0
    messages = ['190480', '228417', '228660', '228720', '228744', '228748', '228755']
    nosy_count = 5.0
    nosy_names = ['orsenthil', 'ezio.melotti', 'mher', 'karlcow', 'takahashi.shuhei']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue18119'
    versions = ['Python 2.7']

    @takahashishuhei
    Copy link
    Mannequin Author

    takahashishuhei mannequin commented Jun 2, 2013

    When urllib.FancyURLopener encounters 302 redirection to a URL with fragments, it sends wrong URL to servers.

    For example, if we run:
    urllib.urlopen('http://example.com/foo')
    and the server responds like following.
    HTTP/1.1 302 Found
    Location: /bar#test
    Then urllib tries next to fetch http://example.com/bar#test, instead of http://example.com/bar.

    Correctly, urllib should strip fragment part of URL before issuing requests.

    @takahashishuhei takahashishuhei mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 2, 2013
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Oct 4, 2014

    Slipped under the radar?

    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Oct 6, 2014

    This is the correct behavior
    GET http://example.com/foo
    with a response containing
    302 and Location: /bar#test
    must trigger
    http://example.com/bar#test

    Location is defined in
    http://tools.ietf.org/html/rfc7231#section-7.1.2

    7.1.2. Location

    The "Location" header field is used in some responses to refer to a
    specific resource in relation to the response. The type of
    relationship is defined by the combination of request method and
    status code semantics.

         Location = URI-reference

    The field value consists of a single URI-reference. When it has the
    form of a relative reference ([RFC3986], Section 4.2), the final
    value is computed by resolving it against the effective request URI
    ([RFC3986], Section 5).

    A bit after in the spec.

    If the Location value provided in a 3xx (Redirection) response does
    not have a fragment component, a user agent MUST process the
    redirection as if the value inherits the fragment component of the
    URI reference used to generate the request target (i.e., the
    redirection inherits the original reference's fragment, if any).

    For example, a GET request generated for the URI reference
    "http://www.example.org/~tim" might result in a 303 (See Other)
    response containing the header field:

     Location: /People.html#tim
    

    which suggests that the user agent redirect to
    "http://www.example.org/People.html#tim"

    Likewise, a GET request generated for the URI reference
    "http://www.example.org/index.html#larry" might result in a 301
    (Moved Permanently) response containing the header field:

     Location: http://www.example.net/index.html
    

    which suggests that the user agent redirect to
    "http://www.example.net/index.html#larry", preserving the original
    fragment identifier.

    @takahashishuhei
    Copy link
    Mannequin Author

    takahashishuhei mannequin commented Oct 6, 2014

    Hi karl,

    Of course it is correct that the user agent is redirected to http://example.com/bar#test when it got such response. However, it never means UA can send an HTTP request containing fragment part.

    In RFC7230 section 3.1.1, HTTP request line is defined as:
    request-line = method SP request-target SP HTTP-version CRLF
    where request-target is defined in section 5.3. In usual case, it is origin-form:
    origin-form = absolute-path [ "?" query ]
    which never includes fragment part.

    This means current urllib behavior violates HTTP/1.1 spec.

    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Oct 6, 2014

    Takahashi-san,

    Ah sorry misunderstood which part your were talking about. I assume wrongly you were talking about navigation.

    Yes for the request which is sent to the server it should be
    http://tools.ietf.org/html/rfc7230#section-5.3.1
    So refactoring your example.

    1st request:

    GET /foo HTTP/1.1
    Accept: text/html
    Host: example.com
    

    server response
    HTTP/1.1 302 Found
    Location: /bar#test

    second request must be
    GET /bar HTTP/1.1
    Accept: text/html
    Host: example.com

    As for the navigation context is indeed part of the piece of code taking in charge the document after being parsed and not the one doing the HTTP request. (putting it here just that people understand)

    (to be tested)
    For server side receiving invalid Request-line
    http://tools.ietf.org/html/rfc7230#section-3.1.1

    Recipients of an invalid request-line SHOULD respond with either a
    400 (Bad Request) error or a 301 (Moved Permanently) redirect with
    the request-target properly encoded. A recipient SHOULD NOT attempt
    to autocorrect and then process the request without a redirect, since
    the invalid request-line might be deliberately crafted to bypass
    security filters along the request chain.

    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Oct 6, 2014

    In class urlopen_HttpTests
    https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l191

    there is a test for invalid redirects

    def test_invalid_redirect(self):
    https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l247

    And one for fragments
    def test_url_fragment(self):
    https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l205

    which refers to http://bugs.python.org/issue11703

    code in
    https://hg.python.org/cpython/file/d5688a94a56c/Lib/urllib.py

    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Oct 7, 2014

    OK I fixed the code. The issue is here
    https://hg.python.org/cpython/file/1e1c6e306eb4/Lib/urllib/request.py#l656

            newurl = urlunparse(urlparts)

    Basically it reinjects the fragment in the new url. The fix is easy.

            if urlparts.fragment:
                urlparts = list(urlparts)
                urlparts[5] = ""
            newurl = urlunparse(urlparts)

    I was trying to make a test for it, but failed. Could someone help me for the test so I can complete the patch?

    Added the code patch only.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    0 participants