Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

Closed
crustymonkey mannequin opened this issue Feb 27, 2012 · 16 comments
Closed

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

crustymonkey mannequin opened this issue Feb 27, 2012 · 16 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@crustymonkey
Copy link
Mannequin

crustymonkey mannequin commented Feb 27, 2012

BPO 14144
Nosy @orsenthil, @merwok
Files
  • urllib2.py.patch: Patch to fix POST data not being redirected
  • urllib2.py.redirect_option.patch: Patch to make the redirecting of POST data an option
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-03-08.17:44:29.703>
    created_at = <Date 2012-02-27.22:44:09.043>
    labels = ['extension-modules', 'type-feature']
    title = 'urllib2 HTTPRedirectHandler not forwarding POST data in redirect'
    updated_at = <Date 2012-03-16.23:22:45.554>
    user = 'https://bugs.python.org/crustymonkey'

    bugs.python.org fields:

    activity = <Date 2012-03-16.23:22:45.554>
    actor = 'eric.araujo'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-03-08.17:44:29.703>
    closer = 'orsenthil'
    components = ['Extension Modules']
    creation = <Date 2012-02-27.22:44:09.043>
    creator = 'crustymonkey'
    dependencies = []
    files = ['24665', '24711']
    hgrepos = []
    issue_num = 14144
    keywords = ['patch']
    message_count = 16.0
    messages = ['154516', '154517', '154518', '154726', '154730', '154732', '154788', '154830', '154831', '154914', '154922', '154940', '154942', '154997', '155168', '156103']
    nosy_count = 3.0
    nosy_names = ['orsenthil', 'eric.araujo', 'crustymonkey']
    pr_nums = []
    priority = 'normal'
    resolution = 'works for me'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14144'
    versions = ['Python 3.3']

    @crustymonkey
    Copy link
    Mannequin Author

    crustymonkey mannequin commented Feb 27, 2012

    I've noticed that urllib2's HTTPRedirectHandler does not redirect a POST request with the POST data.

    If you send a POST request to a server with data, the data is dropped when the new Request is made to the new url. As stated in a comment in the library itself, redirecting a POST request is not strictly RFC compliant, but it's generally supported anyway. The problem here being that our POST data is not also being redirected. I ran into this issue when writing a web api wrapper in Python.

    I'm submitting a small patch that fixes this issue:

    --- /usr/lib/python2.7/urllib2.py	2011-10-04 16:07:28.000000000 -0500
    +++ urllib2.py	2012-02-27 16:03:36.000000000 -0600
    @@ -551,7 +551,11 @@
                 newheaders = dict((k,v) for k,v in req.headers.items()
                                   if k.lower() not in ("content-length", "content-type")
                                  )
    +            data = None
    +            if req.has_data():
    +                data = req.get_data()
                 return Request(newurl,
    +                           data=data,
                                headers=newheaders,
                                origin_req_host=req.get_origin_req_host(),
                                unverifiable=True)

    @crustymonkey crustymonkey mannequin added type-bug An unexpected behavior, bug, or error extension-modules C modules in the Modules dir labels Feb 27, 2012
    @orsenthil
    Copy link
    Member

    If I recollect correctly, redirecting a POST request with POST data is not a recommended thing to do and could be a security hazard. That's why RFC prohibits it too. It is intentional that urllib2 is not redirecting carrying along the POST data. Could you show any browsers showing this behavior of redirecting along with POST?

    @crustymonkey
    Copy link
    Mannequin Author

    crustymonkey mannequin commented Feb 27, 2012

    Senthil,

    That is a good point about the potential for security issues. What if it was an explicit option in HTTPRedirectHandler since there is a possibility of value in being able to do it. I know my case is probably unusual, but I imagine that others might have run into this too. Something roughly along this line is what I'm thinking:

    ----------------

    class HTTPRedirectHandler(BaseHandler):
        redirect_post_data = False
        ...
        ...
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            ...
            ...
            data = None
            if req.has_data() and self.redirect_post_data:
                data = req.get_data()
            return Request(newurl,
                           data=data,
                           headers=newheaders,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)

    That would leave the current default behavior as-is, but leave the option to explicitly override it by the user, perhaps with a BIG DISCLAIMER comment about security.

    @merwok
    Copy link
    Member

    merwok commented Mar 2, 2012

    Could you give the RFC section or URI? I glanced at the page about status codes but did not find a prohibition on POST.

    @merwok merwok changed the title urllib2 HTTPRedirectHandler not handling POST data in redirect urllib2 HTTPRedirectHandler not forwarding POST data in redirect Mar 2, 2012
    @orsenthil
    Copy link
    Member

    Here is a section which talks about 3xx redirection

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

    10.3 Redirection 3xx

    This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

    @merwok
    Copy link
    Member

    merwok commented Mar 2, 2012

    That’s clear as mud :) If I read correctly, the text means that the user agent is free to follow redirects without asking the user if the request is GET or HEAD, and needs to ask the user if the request is e.g. POST. That’s in line with what Firefox does when you refresh after a POST: It asks you to confirm if you want the data to be re-sent. POST not being idempotent explains the need for this precaution.

    So, I think a library like urllib should not prevent people from doing something that is allowed. +1 to a new param to transfer the body when following a redirect under a POST.

    @crustymonkey
    Copy link
    Mannequin Author

    crustymonkey mannequin commented Mar 2, 2012

    Senthil,

    The HTTPRedirectHandler is already breaking RFC2616 by it's own admission in the code comments (from the source):

    # Strictly (according to RFC 2616), 301 or 302 in response
    # to a POST MUST NOT cause a redirection without confirmation
    # from the user (of urllib2, in this case). In practice,
    # essentially all clients do redirect in this case, so we
    # do the same.
    # be conciliant with URIs containing a space

    I can definitely understand the issue with changing the default behavior to redirect the POST data. However, an added option which leaves the current behavior as the default shouldn't hurt. I'm submitting a new patch file (urllib2.py.redirect_option.patch), which will do exactly that.

    @merwok
    Copy link
    Member

    merwok commented Mar 3, 2012

    However, an added option which leaves the current behavior as the default shouldn't hurt.
    My opinion too. urllib is sometimes a client, sometimes a library used to build clients, which need a knob to implement their own decisions or possibly ask the user for confirmation.

    A new argument being a new feature, this patch must target 3.3.

    Some comments on the patch:

    + # NOTE: Setting redirect_post_data to True *can* introduce security
    + # issues and is not recommended unless you are sure of where the
    + # POST data is being redirected!
    I would tone down this note, for example:

     # setting redirect_post_data to True can introduce security
     # issues, use with caution
    

    + redirect_post_data = False
    Is an attribute okay or should methods (init, maybe methods that do the requests too) grow a new parameter?

    @merwok merwok added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 3, 2012
    @merwok
    Copy link
    Member

    merwok commented Mar 3, 2012

    Also, new tests and a doc update would be needed, but you may want to wait for Senthil’s approval before doing more work on this.

    @crustymonkey
    Copy link
    Mannequin Author

    crustymonkey mannequin commented Mar 4, 2012

    I have no problem making doc and test changes. I'll probably need a pointer as to where these changes need to be made and submitted to, but like you said, I'll wait until the patch is accepted before doing that.

    @merwok
    Copy link
    Member

    merwok commented Mar 5, 2012

    I have no problem making doc and test changes. I'll probably need a
    pointer as to where these changes need to be made
    Great! I think the right test file is Lib/test/test_urllib2.py (the test files were not renamed in Python 3 when some modules were renamed); there is a test_redirect method that should be the right place. Senthil will confirm.

    Remember to work from a 3.3 checkout (i.e. the default branch of http://hg.python.org/cpython). More help about contributing (like where to find files) is at http://docs.python.org/devguide.

    and submitted to,
    Just as a patch on this report.

    @orsenthil
    Copy link
    Member

    Hi Jay & Éric,

    I understand your points and providing an extra argument seems like an idea that could be useful to circumvent , what you see as a problem.

    The RFC section states that -

    "The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD".

    By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the redirect which is happening and is "permitting" to POST the data to new location.

    The interaction happens like this:

    User: Post to /a
    Browser: Posts to /a and Server says oh /a is /b
    Browser: Hello user! Server says /a is now /b. Shall I post to /b?
    User: Yes, you may.

    This is different from what you are saying, which is like with having an option in the browser settings which will enable following redirect on POST.

    User: Post to /a (and if there is redirect follow that post to the corresponding site).
    Browser: Posts to /a and Server says /a is /b.
    Browser: Posts to /b

    But do you know if any such browser setting exist? No. Browsers for good reasons do not provide such a setting and they prompt user if they want to follow the redirect with POST.

    In a similar way, developers using urllib as library in their applications can obtain the redirected URL and then POST to the redirected URL. That would be equivalent behavior.

    Providing an automatic follow redirect on POST could serious security issue, both for clients/libraries and browser. Even with a word of caution, it has a high chance of being misused. So, I am -1 on this proposal.

    I hope you understand my argument. I had thought about this earlier a for a similar issue and I remember we made the decision to drop the data following the redirected POST. If my argument is not convincing enough, then I think, it would be good idea to bring this bug to discussion on python-dev or web-sig and provide some concrete real world examples. That could bring some use cases for/against this issue and might be helpful.

    Thanks,
    Senthil

    @merwok
    Copy link
    Member

    merwok commented Mar 5, 2012

    "The action required MAY be carried out by the user agent without interaction with the user if and
    only if the method used in the second request is GET or HEAD".

    By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the
    redirect which is happening and is "permitting" to POST the data to new location.

    My reading too.

    The interaction happens like this:
    User: Post to /a
    Browser: Posts to /a and Server says oh /a is /b
    Browser: Hello user! Server says /a is now /b. Shall I post to /b?
    User: Yes, you may.

    This is different from what you are saying, which is like with having an option in the browser settings
    which will enable following redirect on POST.

    The big difference that I see is that urllib is a library, not a browser. A browser could be implemented on top of urllib, and it is the browser’s responsibility to ask the user if they want POST data to be sent again. The browser then needs to tell urllib to forward POST data: this is the piece that’s missing.

    In a similar way, developers using urllib as library in their applications can obtain the redirected URL
    and then POST to the redirected URL. That would be equivalent behavior.

    Ah, so there is a way to achieve it! If Jay is satisfied by this, we could add an example of using that in the urllib howto, with the appropriate warnings. That would have much less risk than a new parameter.

    @orsenthil
    Copy link
    Member

    On Mon, Mar 05, 2012 at 11:15:38AM +0000, Éric Araujo wrote:

    Ah, so there is a way to achieve it! If Jay is satisfied by this,
    we could add an example of using that in the urllib howto, with the
    appropriate warnings. That would have much less risk than a new
    parameter.

    Yeah. This could be cookbook recipe style example, if it's utility
    value is high.

    @orsenthil
    Copy link
    Member

    I am closing this issue as I feel that the requirement may not be addressed by a change. If there is patch for HowTo, we can tackle that in another issue. Thank you for contribution.

    @merwok
    Copy link
    Member

    merwok commented Mar 16, 2012

    Continued in bpo-14338.

    @merwok merwok removed the invalid label Mar 16, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants