-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urllib2 requests history + HEAD support #44649
Comments
1)Add history off all sent and received headers/requests >>> fd = urllib2.urlopen("http://www.python.org/")
>>> print fd.history[0].request_line
GET / HTTP/1.1
>>> print fd.history[0].sended
[('Accept-Encoding', 'identity'), ('Host', 'www.python.org'), ('Connection', 'cl
ose'), ('User-Agent', 'Python-urllib/2.6')] 2)Add support for HEAD (and other) requests: >>> fd = urllib2.urlopen("http://www.python.org/",
request_cmd = "HEAD")
>>> print len(fd.read())
0 Please send email here: |
Make diff file with svn and delete previos |
koder_ua, what's the history useful for? I'm not against it... exactly... it just seems like I would never use it. If I did want a history of HTTP activity, I would need to maintain it myself. I wouldn't want it tied to a particular object--these urlopen() objects, typically you open them, read them, and then throw them away. I'm iffy on the API for 2) as well, but I can see the appeal. The thing is... this patch is really, really rough. In one of the tests, an attribute is called "sended_hdrs" in one place and "sended_headers" in another. (Actually it should be called "request_headers". Likewise "recived" and "res" should each be spelled "response".) The whitespace isn't PEP-8. It's also missing a doc patch. |
Patch has tests too, might need updating. :) |
Having a HEAD request for urllib2 might be a good idea. I shall use this But, having a history support in the urllib2 module is not a good idea |
+1 for HEAD |
If you know you want an HEAD request, it means you already know it Aside: s/sended/sent/ Cheers |
Here is a discussion and explanation from the submitter on what is meant |
Seems to be +1 for HEAD, -1 for history, keyword says easy so should be feasable for 3.2. :) |
I have attached a patch to add support for HEAD, PUT and DELETE methods. The code review is available here: http://codereview.appspot.com/1696061. I have started working on another patch that validates that the method is properly set. For instance, it doesn't make sense to have a HEAD or DELETE with post data. The problem is that the interface is so wide open that it is hard to catch all possible user errors. A user could call Request.__init__ correctly, but then set req.method to an invalid method. If there is some interest I'll finish up the patch. |
Can this be somehow implemented as a bugfix patch as well on other versions? |
Only stable (2.7 and 3.1) and development versions (3.2) get bug fixes. |
New features can only go in 3.2. From a quick look, the patch looks ok. |
I decided to take a look at this old, forgotten issue and propose an updated patch. I like the submitter's idea that urllib.Request.__init__() should take a "method" parameter to override the return value of get_method(). I've created and attached a patch (issue1673007_urllib_Request_method_v1.patch) which implements this functionality, adds unit tests, and updates the documentation. |
Attached an updated patch that addresses the comments of Éric in the review and adds an entry to the whatsnew. |
Hi Ezio, I had probably overlooked this one. But It's a very interesting one
|
I made some more comments. |
We have discussed the API a bit on IRC and these are the outcomes:
IMHO r = Request(url, data, method=foo) should behave like class FooRequest(Request):
def get_method(self):
return foo
r = FooRequest(url, data) when foo is not None and class FooRequest(Request): pass
r = FooRequest(url, data) when foo is None, so the answers to the 5 questions would be
This is also what the patch implements. Regarding 3), one might look at Request.method and see None instead of GET/POST, and this might be confusing. We could document that get_method() should be used instead to see what method will be actually used or make Request.method private and avoid the problem. If we do so there won't be a way to change the method after instance creation (but that wasn't even supported before[0], so it's probably not a problem). OTOH, the other args (e.g. data) are available and documented. I'm fine with either documenting that get_method() should be used instead or making Request.method private. Another approach is to move the get_method() logic in the __init__, set self.method to method (if specified) or either POST (if there are data) or GET. Then have get_method() simply return self.method (and possibly deprecate it). [0]: it's actually possible to replace the get_method() method, but if Request.method is public one could set it to a new value or to None to restore the normal behavior (GET or POST depending on data). |
Our discussion stemmed from this point. If you look at the change proposed, Request class is taking a new parameter by name 'method' and it is initialized to None: class Request: def __init__(self, url, data=None, headers={},
- origin_req_host=None, unverifiable=False):
+ origin_req_host=None, unverifiable=False,
+ method=None): But that actually defaults to GET method in terms of HTTP request for often used scenarios where the only required parameter (url) is sent. This happens in the get_method call: def get_method(self):
- if self.data is not None:
+ """Return a string indicating the HTTP request method."""
+ if self.method is not None:
+ return self.method
+ elif self.data is not None:
return "POST"
else:
return "GET" Since, it is understood that the default action of Request(url) is to do a GET, I proposed that we have Request's method parameter default to GET instead of None, so the change would look like: class Request: def __init__(self, url, data=None, headers={},
- origin_req_host=None, unverifiable=False):
+ origin_req_host=None, unverifiable=False,
+ method="GET"): And it is more meaningful when someone is looking at the Request signature. Specifying method=None and implicitly meaning it as "GET" for normal situations was not intuitive to me. (This is not case when we do not pass an explicit method arg). At this point, Ezio's summary of API changes discussed becomes interesting. I read again and it seems to me that, the assumption is get_method is an important method which determines what method should be used and method should be given preference over data. My point is, get_method is an useful, helper function that is helpful is sending the correct method to the http.client code which does the actual task. In the current situation, get_method "determines" based on data parameter should it send a "GET" or a "POST", but if we start using method=arg then, get_method should just return what was initialized by the method arg (defaulting to "GET").
My personal on this is -1 on throwing an error and when data is sent, just do the POST (data overrides method). BTW, this needs to discussed irrespective of point 1). But having method="GET" could give raise to his scenario more often. A person would just send data and forget about changing the method to "POST". Coming to specific questions which Ezio pointed: My take:
If data is passed use POST.
Nope, give data the priority and do POST. (As urllib is currently doing)
Not necessarily, it should be public.
My take - It should be initialized to default (GET), instead of None.
This is an interesting question. get_method essentially becomes less useful or it could serve as an arbiter when data and GET is sent and may be used as reliable way to get the Request's method. It should not be deprecated. |
Senthil, I highly disagree with what you said:
What I tried to respect with the patch, and I think this was also Denver’s intention, is to add this functionality without breaking the current behaviour. That behaviour is that GET is default, and POST is set as default if there is any data. The functionality requires that the request method can be set (and the default behaviour can be overriden) without looking at the data (as explained above). Ideally I would probably like to see the functionality of And again on the Regarding those questions on the implementation, I agree with Ezio, and as I said this is probably the only way to add this functionality without breaking previous usages. And if we break previous usages (or restrict its functionality with weird priority rules based on the data), we better not add this functionality at all. |
Patrick, Lots of valid points. I had not looked at the RFC spec when I mentioned about data over request (GET) method, but was trying to derive the current functionality of module (so that users can have a seamless experience) with additional method="GET" as default. Note, my intention was to be explicit when we give method arg. But yeah, when user has specified the methods (PUT/DELETE etc) and given the data, correct rules should apply on how that method should deal with data. As you pointed out to RFC, I realize RFC clearly points out the data (message-body) should be ignored and method should be given preference, whenever specification of method has nothing to with data. I should take back my argument on giving data as preference even over GET. Now, question arises- Can we in anyway default the method="GET" and maintain compatibility as well consistency with user expectations? At the moment, just by sending data over Request, the method is assumed to be POST. If that is not possible, then the way, current patch does seems to be a good way to acheive the purpose. |
New changeset 0a0aafaa9bf5 by Senthil Kumaran in branch 'default': |
I have committed the patch accomodating the doc review comments which Ezio had mentioned. At the moment, the current way seems to be most backwards compatible one and did not want to delay it. Let's hope we get some feedback on this method arg. Thanks! |
Attached patch changes one occurrence of ugly whitespace, changes “not x == y” to “x != y” and “not x in y” to “x not in y”. Senthil, feel free to apply none, some or all of these minor cleanups. |
Hi Eric, The changes suggested in the patch are good for readability, I |
Doc patch to fix a reST error and tweak a few things. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: