New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urllib2 HTTPRedirectHandler not forwarding POST data in redirect #58352
Comments
I've noticed that urllib2's HTTPRedirectHandler does not redirect a POST request with the POST data. If you send a POST request to a server with data, the data is dropped when the new Request is made to the new url. As stated in a comment in the library itself, redirecting a POST request is not strictly RFC compliant, but it's generally supported anyway. The problem here being that our POST data is not also being redirected. I ran into this issue when writing a web api wrapper in Python. I'm submitting a small patch that fixes this issue: --- /usr/lib/python2.7/urllib2.py 2011-10-04 16:07:28.000000000 -0500
+++ urllib2.py 2012-02-27 16:03:36.000000000 -0600
@@ -551,7 +551,11 @@
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type")
)
+ data = None
+ if req.has_data():
+ data = req.get_data()
return Request(newurl,
+ data=data,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True) |
If I recollect correctly, redirecting a POST request with POST data is not a recommended thing to do and could be a security hazard. That's why RFC prohibits it too. It is intentional that urllib2 is not redirecting carrying along the POST data. Could you show any browsers showing this behavior of redirecting along with POST? |
Senthil, That is a good point about the potential for security issues. What if it was an explicit option in HTTPRedirectHandler since there is a possibility of value in being able to do it. I know my case is probably unusual, but I imagine that others might have run into this too. Something roughly along this line is what I'm thinking: ---------------- class HTTPRedirectHandler(BaseHandler):
redirect_post_data = False
...
...
def redirect_request(self, req, fp, code, msg, headers, newurl):
...
...
data = None
if req.has_data() and self.redirect_post_data:
data = req.get_data()
return Request(newurl,
data=data,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True) That would leave the current default behavior as-is, but leave the option to explicitly override it by the user, perhaps with a BIG DISCLAIMER comment about security. |
Could you give the RFC section or URI? I glanced at the page about status codes but did not find a prohibition on POST. |
Here is a section which talks about 3xx redirection http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html 10.3 Redirection 3xx This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. |
That’s clear as mud :) If I read correctly, the text means that the user agent is free to follow redirects without asking the user if the request is GET or HEAD, and needs to ask the user if the request is e.g. POST. That’s in line with what Firefox does when you refresh after a POST: It asks you to confirm if you want the data to be re-sent. POST not being idempotent explains the need for this precaution. So, I think a library like urllib should not prevent people from doing something that is allowed. +1 to a new param to transfer the body when following a redirect under a POST. |
Senthil, The HTTPRedirectHandler is already breaking RFC2616 by it's own admission in the code comments (from the source): # Strictly (according to RFC 2616), 301 or 302 in response I can definitely understand the issue with changing the default behavior to redirect the POST data. However, an added option which leaves the current behavior as the default shouldn't hurt. I'm submitting a new patch file (urllib2.py.redirect_option.patch), which will do exactly that. |
A new argument being a new feature, this patch must target 3.3. Some comments on the patch: + # NOTE: Setting redirect_post_data to True *can* introduce security
+ redirect_post_data = False |
Also, new tests and a doc update would be needed, but you may want to wait for Senthil’s approval before doing more work on this. |
I have no problem making doc and test changes. I'll probably need a pointer as to where these changes need to be made and submitted to, but like you said, I'll wait until the patch is accepted before doing that. |
Remember to work from a 3.3 checkout (i.e. the default branch of http://hg.python.org/cpython). More help about contributing (like where to find files) is at http://docs.python.org/devguide.
|
Hi Jay & Éric, I understand your points and providing an extra argument seems like an idea that could be useful to circumvent , what you see as a problem. The RFC section states that - "The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD". By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the redirect which is happening and is "permitting" to POST the data to new location. The interaction happens like this: User: Post to /a This is different from what you are saying, which is like with having an option in the browser settings which will enable following redirect on POST. User: Post to /a (and if there is redirect follow that post to the corresponding site). But do you know if any such browser setting exist? No. Browsers for good reasons do not provide such a setting and they prompt user if they want to follow the redirect with POST. In a similar way, developers using urllib as library in their applications can obtain the redirected URL and then POST to the redirected URL. That would be equivalent behavior. Providing an automatic follow redirect on POST could serious security issue, both for clients/libraries and browser. Even with a word of caution, it has a high chance of being misused. So, I am -1 on this proposal. I hope you understand my argument. I had thought about this earlier a for a similar issue and I remember we made the decision to drop the data following the redirected POST. If my argument is not convincing enough, then I think, it would be good idea to bring this bug to discussion on python-dev or web-sig and provide some concrete real world examples. That could bring some use cases for/against this issue and might be helpful. Thanks, |
My reading too.
The big difference that I see is that urllib is a library, not a browser. A browser could be implemented on top of urllib, and it is the browser’s responsibility to ask the user if they want POST data to be sent again. The browser then needs to tell urllib to forward POST data: this is the piece that’s missing.
Ah, so there is a way to achieve it! If Jay is satisfied by this, we could add an example of using that in the urllib howto, with the appropriate warnings. That would have much less risk than a new parameter. |
On Mon, Mar 05, 2012 at 11:15:38AM +0000, Éric Araujo wrote:
Yeah. This could be cookbook recipe style example, if it's utility |
I am closing this issue as I feel that the requirement may not be addressed by a change. If there is patch for HowTo, we can tackle that in another issue. Thank you for contribution. |
Continued in bpo-14338. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: