urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

crustymonkey · 2012-02-27T22:44:09Z

BPO	14144
Nosy	@orsenthil, @merwok
Files	urllib2.py.patch: Patch to fix POST data not being redirected urllib2.py.redirect_option.patch: Patch to make the redirecting of POST data an option

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-03-08.17:44:29.703>
created_at = <Date 2012-02-27.22:44:09.043>
labels = ['extension-modules', 'type-feature']
title = 'urllib2 HTTPRedirectHandler not forwarding POST data in redirect'
updated_at = <Date 2012-03-16.23:22:45.554>
user = 'https://bugs.python.org/crustymonkey'

bugs.python.org fields:

activity = <Date 2012-03-16.23:22:45.554>
actor = 'eric.araujo'
assignee = 'none'
closed = True
closed_date = <Date 2012-03-08.17:44:29.703>
closer = 'orsenthil'
components = ['Extension Modules']
creation = <Date 2012-02-27.22:44:09.043>
creator = 'crustymonkey'
dependencies = []
files = ['24665', '24711']
hgrepos = []
issue_num = 14144
keywords = ['patch']
message_count = 16.0
messages = ['154516', '154517', '154518', '154726', '154730', '154732', '154788', '154830', '154831', '154914', '154922', '154940', '154942', '154997', '155168', '156103']
nosy_count = 3.0
nosy_names = ['orsenthil', 'eric.araujo', 'crustymonkey']
pr_nums = []
priority = 'normal'
resolution = 'works for me'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue14144'
versions = ['Python 3.3']

crustymonkey · 2012-02-27T22:44:09Z

I've noticed that urllib2's HTTPRedirectHandler does not redirect a POST request with the POST data.

If you send a POST request to a server with data, the data is dropped when the new Request is made to the new url. As stated in a comment in the library itself, redirecting a POST request is not strictly RFC compliant, but it's generally supported anyway. The problem here being that our POST data is not also being redirected. I ran into this issue when writing a web api wrapper in Python.

I'm submitting a small patch that fixes this issue:

--- /usr/lib/python2.7/urllib2.py	2011-10-04 16:07:28.000000000 -0500
+++ urllib2.py	2012-02-27 16:03:36.000000000 -0600
@@ -551,7 +551,11 @@
             newheaders = dict((k,v) for k,v in req.headers.items()
                               if k.lower() not in ("content-length", "content-type")
                              )
+            data = None
+            if req.has_data():
+                data = req.get_data()
             return Request(newurl,
+                           data=data,
                            headers=newheaders,
                            origin_req_host=req.get_origin_req_host(),
                            unverifiable=True)

orsenthil · 2012-02-27T22:47:32Z

If I recollect correctly, redirecting a POST request with POST data is not a recommended thing to do and could be a security hazard. That's why RFC prohibits it too. It is intentional that urllib2 is not redirecting carrying along the POST data. Could you show any browsers showing this behavior of redirecting along with POST?

crustymonkey · 2012-02-27T23:07:49Z

Senthil,

That is a good point about the potential for security issues. What if it was an explicit option in HTTPRedirectHandler since there is a possibility of value in being able to do it. I know my case is probably unusual, but I imagine that others might have run into this too. Something roughly along this line is what I'm thinking:

----------------

class HTTPRedirectHandler(BaseHandler):
    redirect_post_data = False
    ...
    ...
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        ...
        ...
        data = None
        if req.has_data() and self.redirect_post_data:
            data = req.get_data()
        return Request(newurl,
                       data=data,
                       headers=newheaders,
                       origin_req_host=req.get_origin_req_host(),
                       unverifiable=True)

That would leave the current default behavior as-is, but leave the option to explicitly override it by the user, perhaps with a BIG DISCLAIMER comment about security.

merwok · 2012-03-02T04:36:35Z

Could you give the RFC section or URI? I glanced at the page about status codes but did not find a prohibition on POST.

orsenthil · 2012-03-02T05:15:46Z

Here is a section which talks about 3xx redirection

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

merwok · 2012-03-02T05:27:33Z

That’s clear as mud :) If I read correctly, the text means that the user agent is free to follow redirects without asking the user if the request is GET or HEAD, and needs to ask the user if the request is e.g. POST. That’s in line with what Firefox does when you refresh after a POST: It asks you to confirm if you want the data to be re-sent. POST not being idempotent explains the need for this precaution.

So, I think a library like urllib should not prevent people from doing something that is allowed. +1 to a new param to transfer the body when following a redirect under a POST.

crustymonkey · 2012-03-02T19:18:54Z

Senthil,

The HTTPRedirectHandler is already breaking RFC2616 by it's own admission in the code comments (from the source):

# Strictly (according to RFC 2616), 301 or 302 in response
# to a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib2, in this case). In practice,
# essentially all clients do redirect in this case, so we
# do the same.
# be conciliant with URIs containing a space

I can definitely understand the issue with changing the default behavior to redirect the POST data. However, an added option which leaves the current behavior as the default shouldn't hurt. I'm submitting a new patch file (urllib2.py.redirect_option.patch), which will do exactly that.

merwok · 2012-03-03T11:44:53Z

However, an added option which leaves the current behavior as the default shouldn't hurt.
My opinion too. urllib is sometimes a client, sometimes a library used to build clients, which need a knob to implement their own decisions or possibly ask the user for confirmation.

A new argument being a new feature, this patch must target 3.3.

Some comments on the patch:

+ # NOTE: Setting redirect_post_data to True *can* introduce security
+ # issues and is not recommended unless you are sure of where the
+ # POST data is being redirected!
I would tone down this note, for example:

 # setting redirect_post_data to True can introduce security
 # issues, use with caution

+ redirect_post_data = False
Is an attribute okay or should methods (init, maybe methods that do the requests too) grow a new parameter?

merwok · 2012-03-03T11:45:48Z

Also, new tests and a doc update would be needed, but you may want to wait for Senthil’s approval before doing more work on this.

crustymonkey · 2012-03-04T23:40:59Z

I have no problem making doc and test changes. I'll probably need a pointer as to where these changes need to be made and submitted to, but like you said, I'll wait until the patch is accepted before doing that.

merwok · 2012-03-05T05:17:34Z

I have no problem making doc and test changes. I'll probably need a
pointer as to where these changes need to be made
Great! I think the right test file is Lib/test/test_urllib2.py (the test files were not renamed in Python 3 when some modules were renamed); there is a test_redirect method that should be the right place. Senthil will confirm.

Remember to work from a 3.3 checkout (i.e. the default branch of http://hg.python.org/cpython). More help about contributing (like where to find files) is at http://docs.python.org/devguide.

and submitted to,
Just as a patch on this report.

orsenthil · 2012-03-05T10:10:53Z

Hi Jay & Éric,

I understand your points and providing an extra argument seems like an idea that could be useful to circumvent , what you see as a problem.

The RFC section states that -

"The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD".

By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the redirect which is happening and is "permitting" to POST the data to new location.

The interaction happens like this:

User: Post to /a
Browser: Posts to /a and Server says oh /a is /b
Browser: Hello user! Server says /a is now /b. Shall I post to /b?
User: Yes, you may.

This is different from what you are saying, which is like with having an option in the browser settings which will enable following redirect on POST.

User: Post to /a (and if there is redirect follow that post to the corresponding site).
Browser: Posts to /a and Server says /a is /b.
Browser: Posts to /b

But do you know if any such browser setting exist? No. Browsers for good reasons do not provide such a setting and they prompt user if they want to follow the redirect with POST.

In a similar way, developers using urllib as library in their applications can obtain the redirected URL and then POST to the redirected URL. That would be equivalent behavior.

Providing an automatic follow redirect on POST could serious security issue, both for clients/libraries and browser. Even with a word of caution, it has a high chance of being misused. So, I am -1 on this proposal.

I hope you understand my argument. I had thought about this earlier a for a similar issue and I remember we made the decision to drop the data following the redirected POST. If my argument is not convincing enough, then I think, it would be good idea to bring this bug to discussion on python-dev or web-sig and provide some concrete real world examples. That could bring some use cases for/against this issue and might be helpful.

Thanks,
Senthil

merwok · 2012-03-05T11:15:38Z

"The action required MAY be carried out by the user agent without interaction with the user if and
only if the method used in the second request is GET or HEAD".

By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the
redirect which is happening and is "permitting" to POST the data to new location.

My reading too.

The interaction happens like this:
User: Post to /a
Browser: Posts to /a and Server says oh /a is /b
Browser: Hello user! Server says /a is now /b. Shall I post to /b?
User: Yes, you may.

This is different from what you are saying, which is like with having an option in the browser settings
which will enable following redirect on POST.

The big difference that I see is that urllib is a library, not a browser. A browser could be implemented on top of urllib, and it is the browser’s responsibility to ask the user if they want POST data to be sent again. The browser then needs to tell urllib to forward POST data: this is the piece that’s missing.

In a similar way, developers using urllib as library in their applications can obtain the redirected URL
and then POST to the redirected URL. That would be equivalent behavior.

Ah, so there is a way to achieve it! If Jay is satisfied by this, we could add an example of using that in the urllib howto, with the appropriate warnings. That would have much less risk than a new parameter.

orsenthil · 2012-03-06T01:35:06Z

On Mon, Mar 05, 2012 at 11:15:38AM +0000, Éric Araujo wrote:

Ah, so there is a way to achieve it! If Jay is satisfied by this,
we could add an example of using that in the urllib howto, with the
appropriate warnings. That would have much less risk than a new
parameter.

Yeah. This could be cookbook recipe style example, if it's utility
value is high.

orsenthil · 2012-03-08T17:44:30Z

I am closing this issue as I feel that the requirement may not be addressed by a change. If there is patch for HowTo, we can tackle that in another issue. Thank you for contribution.

merwok · 2012-03-16T23:22:45Z

Continued in bpo-14338.

crustymonkey mannequin added type-bug An unexpected behavior, bug, or error extension-modules C modules in the Modules dir labels Feb 27, 2012

merwok changed the title ~~urllib2 HTTPRedirectHandler not handling POST data in redirect~~ urllib2 HTTPRedirectHandler not forwarding POST data in redirect Mar 2, 2012

merwok added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 3, 2012

orsenthil added the invalid label Mar 8, 2012

orsenthil closed this as completed Mar 8, 2012

merwok removed the invalid label Mar 16, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

crustymonkey mannequin commented Feb 27, 2012

crustymonkey mannequin commented Feb 27, 2012

orsenthil commented Feb 27, 2012

crustymonkey mannequin commented Feb 27, 2012

merwok commented Mar 2, 2012

orsenthil commented Mar 2, 2012

merwok commented Mar 2, 2012

crustymonkey mannequin commented Mar 2, 2012

merwok commented Mar 3, 2012

merwok commented Mar 3, 2012

crustymonkey mannequin commented Mar 4, 2012

merwok commented Mar 5, 2012

orsenthil commented Mar 5, 2012

merwok commented Mar 5, 2012

orsenthil commented Mar 6, 2012

orsenthil commented Mar 8, 2012

merwok commented Mar 16, 2012

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352

Comments

crustymonkey mannequin commented Feb 27, 2012

crustymonkey mannequin commented Feb 27, 2012

orsenthil commented Feb 27, 2012

crustymonkey mannequin commented Feb 27, 2012

merwok commented Mar 2, 2012

orsenthil commented Mar 2, 2012

merwok commented Mar 2, 2012

crustymonkey mannequin commented Mar 2, 2012

merwok commented Mar 3, 2012

merwok commented Mar 3, 2012

crustymonkey mannequin commented Mar 4, 2012

merwok commented Mar 5, 2012

orsenthil commented Mar 5, 2012

merwok commented Mar 5, 2012

orsenthil commented Mar 6, 2012

orsenthil commented Mar 8, 2012

merwok commented Mar 16, 2012