wsgiref.simple_server breaks unicode in URIs #70995

Closed
animus mannequin opened this issue Apr 20, 2016 · 11 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@animus
Mannequin

animus mannequin commented Apr 20, 2016

BPO 26808
Nosy @orsenthil, @vadmium
Superseder
  • bpo-16679: Add advice about non-ASCII wsgiref PATH_INFO
Files
  • t.py
  • Screenshot from 2016-04-20 14-26-03.png
  • Screenshot from 2016-04-20 14-28-03.png
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-04-20.12:44:42.149>
    created_at = <Date 2016-04-20.11:07:30.231>
    labels = ['type-bug', 'library']
    title = 'wsgiref.simple_server breaks unicode in URIs'
    updated_at = <Date 2016-04-21.04:42:43.146>
    user = 'https://bugs.python.org/animus'

    bugs.python.org fields:

    activity = <Date 2016-04-21.04:42:43.146>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-04-20.12:44:42.149>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2016-04-20.11:07:30.231>
    creator = 'animus'
    dependencies = []
    files = ['42533', '42534', '42535']
    hgrepos = []
    issue_num = 26808
    keywords = []
    message_count = 11.0
    messages = ['263819', '263820', '263821', '263822', '263823', '263827', '263830', '263870', '263871', '263873', '263877']
    nosy_count = 6.0
    nosy_names = ['orsenthil', 'grahamd', 'SilentGhost', 'martin.panter', 'animus', 'Александр Эри']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '16679'
    type = 'behavior'
    url = 'https://bugs.python.org/issue26808'
    versions = ['Python 3.5']

    @animus
    Mannequin Author

    animus mannequin commented Apr 20, 2016

    Example code is in the attachment.

    An example URI: http://127.0.0.1:8005/тест

    @animus animus mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Apr 20, 2016
    @ghost

    ghost commented Apr 20, 2016

    See also bpo-26717.

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Apr 20, 2016

    What do you mean by "breaks"? Also, why do you encode your string as utf-8?

    @SilentGhost SilentGhost mannequin added stdlib Python modules in the Lib dir and removed extension-modules C modules in the Modules dir labels Apr 20, 2016
    @animus
    Mannequin Author

    animus mannequin commented Apr 20, 2016

    Please take a look at the 'pi:' result. Attaching a screenshot.

    @animus
    Mannequin Author

    animus mannequin commented Apr 20, 2016

    Also attaching the same print output from the console.

    @vadmium
    Member

    vadmium commented Apr 20, 2016

    I think this is already covered in bpo-16679. PEP-3333 says it’s meant to work this way.

    I admit it is very quirky. See also bpo-22264 discussing future enhancements.
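
    The PEP 3333 behaviour referred to above can be illustrated with a minimal sketch. It assumes the client sent the path as raw UTF-8 bytes and that the server exposed them as a str decoded with ISO-8859-1 (Latin-1). The helper name wsgi_str_to_text is hypothetical and is not part of wsgiref.

```python
def wsgi_str_to_text(value, encoding="utf-8"):
    """Undo the Latin-1 decoding applied to a WSGI environ string.

    Hypothetical helper for illustration; not a wsgiref API.
    """
    return value.encode("latin-1").decode(encoding)


# Bytes as they would travel on the wire for the path "/тест".
raw_path = "/тест".encode("utf-8")

# A PEP 3333 server exposes those bytes as a str decoded with ISO-8859-1,
# so every byte becomes exactly one character.
environ_path = raw_path.decode("latin-1")

# Re-encoding as Latin-1 restores the original bytes, which can then be
# decoded with the encoding the application actually expects.
recovered = wsgi_str_to_text(environ_path)

print(len(environ_path), recovered == "/тест")
```

    The round trip is lossless because Latin-1 maps each of the 256 byte values to exactly one code point, which is why the environ value looks broken but still carries the original bytes.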

    @vadmium vadmium closed this as completed Apr 20, 2016
    @ghost

    ghost commented Apr 20, 2016

    My browser encodes the URL in UTF-8. To resolve this bug we need to look at web standards, not at the PEP.

    @grahamd
    Mannequin

    grahamd mannequin commented Apr 21, 2016

    Your code should be written as:

        res = """\
    e:
    {}
    pi:
    {}
    qs:
    {}
    """.format(
            pprint.pformat(e),
            urllib.parse.unquote(e['PATH_INFO'].encode('Latin-1').decode('UTF-8')),
            urllib.parse.parse_qs(urllib.parse.unquote(e['QUERY_STRING'].encode('Latin-1').decode('UTF-8')))
            )
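
    A self-contained variant of the same idea, offered as a sketch under stated assumptions rather than as a replacement for the attached t.py (which is not shown here). It assumes the server has already percent-decoded PATH_INFO, as wsgiref's simple_server does, and it leaves percent-decoding of the query string to urllib.parse.parse_qs, so it omits the explicit unquote calls. The names transcode, app and call_app are hypothetical, and call_app fakes a minimal environ instead of starting a server.

```python
import urllib.parse


def transcode(value):
    """Re-interpret a Latin-1-decoded WSGI string as UTF-8 text."""
    return value.encode("latin-1").decode("utf-8", errors="replace")


def app(environ, start_response):
    """Echo the decoded path and query parameters as plain text."""
    path = transcode(environ.get("PATH_INFO", ""))
    # parse_qs percent-decodes names and values itself (UTF-8 by default).
    params = urllib.parse.parse_qs(transcode(environ.get("QUERY_STRING", "")))
    body = "pi:\n{}\nqs:\n{}\n".format(path, params).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path_bytes, query=""):
    """Invoke app with a faked, minimal environ (illustration only)."""
    environ = {
        "REQUEST_METHOD": "GET",
        # Mimic a PEP 3333 server: raw bytes exposed as Latin-1 text.
        "PATH_INFO": path_bytes.decode("latin-1"),
        "QUERY_STRING": query,
    }
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")
```

    Calling call_app("/тест".encode("utf-8")) exercises the raw-bytes case, and call_app(b"/", "a=%D1%82%D0%B5%D1%81%D1%82") exercises the percent-encoded query case, without needing a client or a listening socket.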

    @grahamd
    Mannequin

    grahamd mannequin commented Apr 21, 2016

    There does appear to be something wrong with wsgiref, because with that rewritten code, for:

    curl http://127.0.0.1:8000/тест

    you should get:

    pi:
    /тест
    qs:
    {}

    and for:

    curl http://127.0.0.1:8000/?a=тест

    get:

    pi:
    /
    qs:
    {'a': ['тест']}

    The PATH_INFO case appears to fail, though, and outputs:

    pi:
    /��
    qs:
    {}

    Don't think I have missed anything.

    @grahamd
    Mannequin

    grahamd mannequin commented Apr 21, 2016

    This gets even weirder.

    Gunicorn behaves the same as wsgiref.

    However, it turns out they both only show the unexpected result if using curl. If you use Safari, they are both fine.

    Waitress blows up altogether on it with an exception when you use curl as client, but is okay with Safari and gives what I expect.

    My mod_wsgi package gives what I expect whether you use curl or Safari. So Apache may be doing some magic in there to allow it to always work. No idea. But obviously mod_wsgi rules as it works regardless. :-)

    uWSGI doesn't want to compile on MacOS X for me at the moment.

    That Apache works properly whether you use curl or Safari, while other WSGI servers don't, suggests something is amiss.

    @vadmium
    Member

    vadmium commented Apr 21, 2016

    Graham: On my Linux computer, Curl seems to treat the test “URL” as a string of bytes and doesn’t percent encode it. Therefore you may be affected by bpo-26717 which I fixed the other day. But in real life, URLs are meant to only have literal ASCII characters (even if they encode other characters), so this shouldn’t be a big problem. Compare IRI vs URI. Browsers tend to percent-encode using UTF-8.
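
    The URI-versus-IRI point can be sketched with urllib.parse: a client that percent-encodes with UTF-8, as browsers tend to, puts only ASCII characters on the wire. The variable names below are illustrative.

```python
import urllib.parse

iri_path = "/тест"

# Percent-encode using UTF-8 (the default for quote); "/" stays unescaped.
uri_path = urllib.parse.quote(iri_path)

# The result contains ASCII characters only, so it is a valid URI path.
is_ascii = all(ord(ch) < 128 for ch in uri_path)

# Decoding the escapes as UTF-8 restores the original text.
round_trip = urllib.parse.unquote(uri_path)

print(uri_path, is_ascii)
```

    Passing the encoded form to curl, for example curl http://127.0.0.1:8000/%D1%82%D0%B5%D1%81%D1%82, sends the same bytes on the wire that a browser would send for the address typed as /тест.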

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022