-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urllib.parse.unquote raises incorrect errormessage when string parameter is bytes #76679
Comments
urllib.parse.unquote(b'abc%20def')
...
TypeError: a bytes-like object is required, not 'str' |
If you read the traceback the message is "correct" for some definition of correct: the right hand side controls the type of the expression, so it is objecting to trying to look for the string '%' in a bytes object. There are probably ways this could be improved, but I'm not sure it is worth it, since this is just a general behavior of the 'in' operator. |
The message is incorrect for the input to the function, and confusing because the caller did supply a bytes-like object. The exception is from an implementation detail, and if the implementation changes the exception could as well. I suggest the implementation is changed to follow the example of urllib.parse.unquote_to_bytes, which encodes its string parameter to bytes if it receives a str object. Alternatively, a check for required type should be implemented and raise a TypeError with the correct error-message. |
We generally don't do type checking (see discussions of "duck typing"). We generally do just let the implementation detail bubble up. I don't think the encoding suggestion works, since we can't know what encoding the byte string is in, or even if it is a valid one. I wonder if unquote should be polymorphic: accept both bytes and strings and produce the same type of output as its input. I think there are other urllib methods that are, but I haven't checked. |
The unquote function does take an encoding argument, but I am more worried about unquote_to_to_bytes which just assumes utf-8. (But this might be an implementation detail of its intended usage.) I must agree that a polymorphic implementation would be better. Do you think this tracker is the correct place for such a change, or would an enhancement proposal be more suited? |
I think Nick was the last one who touched the byte/string issues in urllib, so I've nosied him. We'll see what he thinks (but this tracker accepts enhancement proposals as long as they are smallish ones). |
As David noted, we updated all the URL parsing functions to be polymorphic back in 3.2: https://docs.python.org/3/library/urllib.parse.html#parsing-ascii-encoded-bytes We left the quoting functions alone, because they already had their own way of dealing with the bytes-vs-str distinction (quote_from_bytes, unquote_to_bytes, etc) that meant the polymorphic approach we adopted for the parsing functions didn't make sense. That said, I think it would be reasonable to enhance unquote() to accept a bytes object, processing it as follows:
|
Added a patch containing the fix, is this the proper way or should I create a pull request? |
Patch for tests. Added test for calling unquote with bytes input and removed Exception raised in original test. |
@stein-k Thanks for the patch and tests. The patches apply cleanly on master branch. The project accepts pull requests and it will be helpful if you can make a PR for this on GitHub to get this merged. Since this is a bug fix I think it would require a News entry. Please find the below links for reference PR workflow : https://devguide.python.org/pullrequest/ Thanks |
I have made the News Entry and created the Pull request as described here: https://devguide.python.org/pullrequest/. Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation? |
The status is changed after the patch is merged. The person that merges will usually change the status of the issue or If he/she forgets, anyone with developer role can update the status too. |
So urllib.parse.unquote() will accept bytes in addition to str starting with 3.9. The unclear exception remains in prior versions. Would someone like to add a better exception for 3.7 and 3.8? (I'm marking this as "newcomer friendly", referring only to improving the exception for 3.7 and 3.8.) |
Can this be closed? (It's intended as a 3.9-only change). |
Thanks for the reminder Irit! No, this should not be closed yet, as there is still the possibility of improving the exception in version 3.8. |
Ah yes, I missed that. I've pushed a PR for 3.8 that does this: >>> urllib.parse.unquote(b'abc%20def')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\src\cpython\lib\urllib\parse.py", line 635, in unquote
raise TypeError('Expected str, got bytes')
TypeError: Expected str, got bytes |
Thanks for the report and the PR, Stein! And thanks for the PR for 3.8, Irit! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: