urllib.parse.parse_qsl does not handle unicode data properly #74668
After decoding percentage encoded
As seen in the partial traceback above, parsing breaks when the query string contains percent-encoded Unicode values.
Would you be able to include an example for recreating this? Looking at the code, it uses the ascii encoding for bytes (which can only contain ASCII literal characters) and should not be using that encoding for strings. Thanks!
I have recently stumbled upon this bug, and I can present the example and a solution I've used.
This happens in the parse_qsl function because _coerce_result is a synonym of _encode_result and is called with default parameter encoding='ascii'. As far as I understand, it should be called with the encoding parameter of the parse_qsl function:
I am not sure whether I should commit this to the repo and create a pull request, as described in the devguide.
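For readers without the original snippet, here is a minimal sketch of the failure mode described in this comment. It does not call the internal `_coerce_result` helper; it only replays, step by step, what a bytes argument goes through (ASCII decode, percent-decoding to `str`, then re-encoding), so it behaves the same on patched and unpatched Python versions. The sample query string is an assumption chosen for illustration.

```python
from urllib.parse import parse_qsl, unquote

# Illustrative input (an assumption): "Привет" percent-encoded as UTF-8.
raw = b"name=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82"

# Step 1: a bytes argument is first decoded to str using ASCII.
text = raw.decode("ascii")

# Step 2: the value is percent-decoded with the caller's encoding (UTF-8 by default).
_, _, quoted_value = text.partition("=")
decoded = unquote(quoted_value, encoding="utf-8", errors="replace")

# Step 3: converting the result back to bytes with ASCII cannot succeed,
# because the decoded text now contains non-ASCII characters.
try:
    decoded.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Re-encoding with the same encoding that was used for decoding works.
utf8_value = decoded.encode("utf-8")

# Passing str instead of bytes avoids the re-encoding step entirely.
str_pairs = parse_qsl(text)
```

On affected versions, a practical workaround is the last line: decode the bytes yourself and pass a `str` to `parse_qsl`.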
Can confirm that this issue exists in Python 3.8.10 and that cyrkov's solution works. (I manually re-implemented |
I've also run into this issue on Python 3.11.5 in code that calls |
Both decoding and encoding can fail or lose information. To avoid this we should either use a lossless encoding or error handler ('latin1' or 'surrogateescape') in both directions, or omit decoding and encoding altogether. The latter is usually more efficient, and can even be simpler, as in this case. #115771 supports arbitrary raw and percent-encoded bytes.
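The two options mentioned here can be sketched as follows. This is not the code from #115771; `parse_qsl_bytes` is a hypothetical helper written for illustration, and it deliberately omits options such as `keep_blank_values`, `strict_parsing` and `max_num_fields`.

```python
from urllib.parse import unquote_to_bytes

# Option 1: lossless round trips. Every byte value survives decode + encode.
all_bytes = bytes(range(256))
latin1_roundtrip = all_bytes.decode("latin1").encode("latin1")
escaped_roundtrip = (
    all_bytes.decode("ascii", "surrogateescape")
             .encode("ascii", "surrogateescape")
)

# Option 2: never leave bytes at all.
def parse_qsl_bytes(qs: bytes, separator: bytes = b"&"):
    """Hypothetical bytes-only parser; unlike parse_qsl it keeps blank values."""
    pairs = []
    for field in qs.split(separator):
        if not field:
            continue  # skip empty fields such as the one produced by "a=1&&b=2"
        name, _, value = field.partition(b"=")
        # "+" means space in form-encoded data; percent-decoding stays in bytes.
        name = unquote_to_bytes(name.replace(b"+", b" "))
        value = unquote_to_bytes(value.replace(b"+", b" "))
        pairs.append((name, value))
    return pairs

# Raw and percent-encoded non-ASCII bytes both pass through unchanged.
pairs = parse_qsl_bytes(b"a=%FF&b=\xe9+x")
```

Staying in bytes means no codec is involved, so nothing can raise `UnicodeEncodeError` or silently replace data.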
urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data.
…honGH-115771) urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data. (cherry picked from commit bdba8ef) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Restore support of None and other false values (fix regression introduced in pythongh-74668). * Raise TypeError for non-zero integers and non-empty sequences.
…H-116801) * Restore support of None and other false values. * Raise TypeError for non-zero integers and non-empty sequences. The regressions were introduced in pythongh-74668 (bdba8ef). (cherry picked from commit 1069a46) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…H-116801) * Restore support of None and other false values. * Raise TypeError for non-zero integers and non-empty sequences. The regressions were introduced in pythongh-74668 (bdba8ef).
…honGH-115771) urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data.