wsgiref.simple_server breaks unicode in URIs #70995
Example code is in the attachment. An example URI is: http://127.0.0.1:8005/тест
See also bpo-26717.
What do you mean by "breaks"? Also, why do you encode your string as utf-8?
Please take a look at the 'pi:' result. I am attaching a screenshot.
I am also attaching the same print output from the console.
My browser encodes the URL in UTF-8. To resolve this bug we need to look at the web standards, not at the PEP.
Your code should be written as:

res = """\
e:
{}
pi:
{}
qs:
{}
""".format(
    pprint.pformat(e),
    urllib.parse.unquote(e['PATH_INFO'].encode('Latin-1').decode('UTF-8')),
    urllib.parse.parse_qs(urllib.parse.unquote(e['QUERY_STRING'].encode('Latin-1').decode('UTF-8')))
)
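
To make the Latin-1/UTF-8 round trip in the snippet above concrete, here is a minimal sketch. It assumes the server follows the WSGI convention of exposing the percent-decoded request bytes as a Latin-1 "native string"; the helper names `wsgi_path_info` and `recover_text` are hypothetical and not part of wsgiref.

```python
from urllib.parse import unquote_to_bytes


def wsgi_path_info(raw_path: str) -> str:
    """Hypothetical helper: imitate how a WSGI server is assumed to build
    PATH_INFO, i.e. percent-decode to bytes, then decode as Latin-1."""
    return unquote_to_bytes(raw_path).decode("latin-1")


def recover_text(path_info: str) -> str:
    """Undo the Latin-1 step and interpret the original bytes as UTF-8."""
    return path_info.encode("latin-1").decode("utf-8")


# Percent-encoded UTF-8 form of "/тест", as a browser would send it.
raw = "/%D1%82%D0%B5%D1%81%D1%82"
environ_value = wsgi_path_info(raw)

print(repr(environ_value))          # mojibake-looking Latin-1 string
print(recover_text(environ_value))  # the readable path
```

The intermediate value looks garbled when printed directly, which may be what the 'pi:' output in the screenshot shows; it only becomes readable after the application re-encodes it as Latin-1 and decodes it as UTF-8.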
There does appear to be something wrong with wsgiref, because with that rewritten code, for:
curl http://127.0.0.1:8000/тест
you should get:
pi:
and for:
curl http://127.0.0.1:8000/?a=тест
you should get:
pi:
The PATH_INFO case appears to fail though and outputs:
pi:
I don't think I have missed anything.
This gets even weirder. Gunicorn behaves the same as wsgiref. However, it turns out they both only show the unexpected result when using curl. If you use Safari they are both fine.
Waitress blows up altogether with an exception when you use curl as the client, but is okay with Safari and gives what I expect.
My mod_wsgi package gives what I expect whether you use curl or Safari. So Apache may be doing some magic in there to allow it to always work. No idea. But obviously mod_wsgi rules as it works regardless. :-)
uWSGI doesn't want to compile on MacOS X for me at the moment.
That Apache works properly whether you use curl or Safari, and other WSGI servers don't, suggests something is amiss.
Graham: On my Linux computer, Curl seems to treat the test “URL” as a string of bytes and doesn’t percent encode it. Therefore you may be affected by bpo-26717 which I fixed the other day. But in real life, URLs are meant to only have literal ASCII characters (even if they encode other characters), so this shouldn’t be a big problem. Compare IRI vs URI. Browsers tend to percent-encode using UTF-8. |
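
As an illustration of the IRI vs URI point, the sketch below percent-encodes a non-ASCII path using UTF-8, which is what browsers tend to do before sending the request. The function name `iri_path_to_uri` is a hypothetical helper for this example; only `urllib.parse.quote` and `urllib.parse.unquote` are real library calls.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri(path: str) -> str:
    """Hypothetical helper: percent-encode a path as UTF-8, keeping '/' literal."""
    return quote(path, safe="/", encoding="utf-8")


iri_path = "/тест"
uri_path = iri_path_to_uri(iri_path)

print(uri_path)            # ASCII-only form suitable for the request line
print(uri_path.isascii())  # True: no literal non-ASCII characters remain
print(unquote(uri_path))   # decodes back to the original path
```

A client that sends the ASCII-only form avoids the raw-bytes request line that curl appears to produce here, so running `curl` against the output of `iri_path_to_uri` should be a closer match to what a browser sends.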
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.