WSGIProxy needs to quote PATH_INFO and QUERY_STRING on Python 3 #7

Closed
lrowe opened this issue Dec 19, 2014 · 2 comments · Fixed by #8

@lrowe
Contributor

lrowe commented Dec 19, 2014

I'm using WSGIProxy2 as part of WebTest and am seeing errors when using quoted Unicode URLs. Here is the stack at the point where a UnicodeEncodeError is raised:

  test_function():
    testapp = webtest.TestApp(server_url)
    path = '/targets/NR2F1%C3%82-human/'
-> res = testapp.get(path, status=200)

  eggs/WebTest-2.0.16-py3.4.egg/webtest/app.py(321)get()
-> expect_errors=expect_errors)

(Pdb) args
url = '/targets/NR2F1%C3%82-human/'
(Pdb) p req.environ['PATH_INFO']
'/targets/NR2F1Ã\x82-human/'

The path is unquoted by webob.request.environ_from_url(path) which for Python 3 is::

def url_unquote(s):
    return unquote(s.encode('ascii')).decode('latin-1')

  eggs/WebTest-2.0.16-py3.4.egg/webtest/app.py(604)do_request()
-> res = req.get_response(app, catch_exc_info=True)
  eggs/WebOb-1.4-py3.4.egg/webob/request.py(1316)send()
-> application, catch_exc_info=True)
  eggs/WebOb-1.4-py3.4.egg/webob/request.py(1284)call_application()
-> app_iter = application(self.environ, start_response)
  eggs/WebTest-2.0.16-py3.4.egg/webtest/lint.py(198)lint_app()
-> iterator = application(environ, start_response_wrapper)
  eggs/WSGIProxy2-0.4.1-py3.4.egg/wsgiproxy/proxies.py(182)__call__()
-> response = self.process_request(uri, method, new_headers, environ)

(Pdb) pp environ['PATH_INFO']
'/targets/NR2F1Ã\x82-human/'

  eggs/WSGIProxy2-0.4.1-py3.4.egg/wsgiproxy/proxies.py(134)process_request()
-> return self.http(uri, method, environ['wsgi.input'], headers)
  eggs/WSGIProxy2-0.4.1-py3.4.egg/wsgiproxy/proxies.py(79)__call__()
-> conn.request(method, path, body, headers, **self.options)
  lib/python3.4/http/client.py(1090)request()
-> self._send_request(method, url, body, headers)
  lib/python3.4/http/client.py(1118)_send_request()
-> self.putrequest(method, url, **skips)

  lib/python3.4/http/client.py(975)putrequest()
    request = '%s %s %s' % (method, url, self._http_vsn_str)
    # Non-ASCII characters should have been eliminated earlier
-> self._output(request.encode('ascii'))

(Pdb) args
url = '/targets/NR2F1Ã\x82-human/'
(Pdb) p request
'GET /targets/NR2F1Ã\x82-human/ HTTP/1.1'
(Pdb) p request.encode('ascii')
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)
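
For reference, the failure can probably be reproduced without WebTest or WSGIProxy2 at all. The following is a minimal sketch using only the standard library; `wsgi_unquote` is a hypothetical helper that mirrors the webob function quoted above, not webob's actual code:

```python
from urllib.parse import unquote_to_bytes


def wsgi_unquote(quoted_path):
    # Hypothetical stand-in for webob's url_unquote on Python 3:
    # percent-decode to bytes, then decode as latin-1 (PEP 3333 style).
    return unquote_to_bytes(quoted_path.encode('ascii')).decode('latin-1')


path_info = wsgi_unquote('/targets/NR2F1%C3%82-human/')

# http.client builds the request line as a str and encodes it as ASCII.
request_line = 'GET %s HTTP/1.1' % path_info

try:
    request_line.encode('ascii')
    failed = False
except UnicodeEncodeError:
    # The two latin-1 characters standing in for the UTF-8 bytes
    # 0xC3 0x82 cannot be represented in ASCII.
    failed = True
```

So the unquoted PATH_INFO cannot be passed to `http.client` as is; it has to be quoted again first.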

From pyramid.compat for reference:

if PY3: # pragma: no cover
    # see PEP 3333 for why we encode WSGI PATH_INFO to latin-1 before
    # decoding it to utf-8
    def decode_path_info(path):
        return path.encode('latin-1').decode('utf-8')
else:
    def decode_path_info(path):
        return path.decode('utf-8')

if PY3: # pragma: no cover
    # see PEP 3333 for why we decode the path to latin-1 
    from urllib.parse import unquote_to_bytes
    def unquote_bytes_to_wsgi(bytestring):
        return unquote_to_bytes(bytestring).decode('latin-1')
else:
    from urlparse import unquote as unquote_to_bytes
    def unquote_bytes_to_wsgi(bytestring):
        return unquote_to_bytes(bytestring)

(Pdb) from pyramid.compat import unquote_bytes_to_wsgi, decode_path_info
(Pdb) p unquote_bytes_to_wsgi(path)
'/targets/NR2F1Ã\x82-human/'
(Pdb) from webob.compat import url_unquote, url_quote
(Pdb) url_unquote(path)
'/targets/NR2F1Ã\x82-human/'
(Pdb) decode_path_info(url_unquote(path))
'/targets/NR2F1Â-human/'
(Pdb) url_quote(url_unquote(path))
'/targets/NR2F1%C3%83%C2%82-human/'

We'll have to be careful with the requoting. I'll see if I can unpick those parts soon and add to the ticket.
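
One possible approach to the requoting, as a sketch only: encode the WSGI string back to its latin-1 bytes before quoting, so each original byte is percent-encoded once. `requote_wsgi_path` is a hypothetical name, and this is not necessarily the fix that should go into WSGIProxy2:

```python
from urllib.parse import quote


def requote_wsgi_path(path_info):
    # Hypothetical helper. Under PEP 3333 the str holds raw bytes decoded
    # as latin-1, so encoding back to latin-1 recovers the original bytes.
    raw = path_info.encode('latin-1')
    # quote() accepts bytes and percent-encodes each non-safe byte.
    return quote(raw, safe='/')


wsgi_path = '/targets/NR2F1\xc3\x82-human/'

# Quoting the str directly encodes it as UTF-8 first and double-encodes.
naive = quote(wsgi_path)
fixed = requote_wsgi_path(wsgi_path)
```

Here `naive` is the double-encoded `'/targets/NR2F1%C3%83%C2%82-human/'` seen above, while `fixed` round-trips to the original `'/targets/NR2F1%C3%82-human/'`. A caveat: a `%2F` in the original URL has already become a plain `/` in PATH_INFO, so this cannot restore it.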

lrowe added a commit to ENCODE-DCC/encoded that referenced this issue Dec 19, 2014
All but one 'not bdd' test now passes. That error is only a test artefact due to gawel/WSGIProxy2#7.
Needed to remove aws as that is blocked by gevent. It can be moved to a separate buildout or virtualenv.
lrowe added a commit to ENCODE-DCC/encoded that referenced this issue Dec 19, 2014
All but one 'not bdd' test now passes. That error is only a test artefact due to gawel/WSGIProxy2#7.
Needed to remove wal-e as that is blocked by gevent. It can be moved to a separate buildout or virtualenv.
lrowe added a commit to ENCODE-DCC/encoded that referenced this issue Dec 19, 2014
All but one test passes. That error is only a test artefact due to gawel/WSGIProxy2#7.
Needed to remove wal-e as that is blocked by gevent. It can be moved to a separate buildout or virtualenv.
@gawel
Owner

gawel commented Dec 19, 2014

Also, it looks like some servers store the original URI, like gunicorn:

https://github.com/benoitc/gunicorn/blob/f3bb0e1e1d2ae8ba77c44dee20178dff3dde0f9f/gunicorn/http/wsgi.py#L90

lrowe added a commit to lrowe/WSGIProxy2 that referenced this issue Dec 19, 2014
…uoted utf8 characters.

Fixes gawel#7. Requires fix from Pylons/webtest#127 in order for debugapp not to raise encoding errors.
lrowe added a commit to lrowe/WSGIProxy2 that referenced this issue Dec 19, 2014
…cters.

Fixes gawel#7. Requires fix from Pylons/webtest#127 in order for debugapp not to raise encoding errors.
@lrowe
Contributor Author

lrowe commented Dec 19, 2014

@gawel Apache/mod_wsgi has something similar with REQUEST_URI. It could make sense to use it if supplied, though I'm more concerned here with fixing TestApp(server_url).get(path_with_quoted_utf8).
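
A rough sketch of how the two ideas might combine: prefer the server-supplied original URI when it is present, otherwise requote PATH_INFO and QUERY_STRING. The function name `upstream_request_target` and the set of safe characters for the query string are assumptions for illustration, as is the `RAW_URI` key (my reading of the gunicorn link above); none of this is existing WSGIProxy2 code:

```python
from urllib.parse import quote

# Assumption: characters left untouched in the query string, including '%'
# so that escapes already present are not quoted a second time.
QUERY_SAFE = "/?&=+%:;,@!$'()*"


def upstream_request_target(environ):
    # Hypothetical helper. RAW_URI (gunicorn) and REQUEST_URI (mod_wsgi)
    # are server-specific extras, not part of the WSGI spec.
    raw = environ.get('RAW_URI') or environ.get('REQUEST_URI')
    if raw:
        return raw
    # Fall back to requoting the latin-1 decoded WSGI values.
    target = quote(environ.get('PATH_INFO', '').encode('latin-1'), safe='/')
    query = environ.get('QUERY_STRING', '')
    if query:
        target += '?' + quote(query.encode('latin-1'), safe=QUERY_SAFE)
    return target


example = upstream_request_target({
    'PATH_INFO': '/targets/NR2F1\xc3\x82-human/',
    'QUERY_STRING': 'format=json',
})
```

With no server-supplied URI, `example` comes out as `'/targets/NR2F1%C3%82-human/?format=json'`. Whether a raw URI should be trusted unchanged, for instance when it is an absolute URI, is left open here.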

@gawel gawel closed this as completed in #8 Dec 20, 2014