WSGI applications are required to return bytes. #2

GrahamDumpleton · 2016-08-30T00:34:35Z

This code looks suspect:

https://github.com/fgallaire/wsgiserver/blob/master/wsgiserver.py#L2345

A WSGI application iterable is supposed to return byte strings for response body. If it returns Unicode strings it is an error.

A WSGI server is not meant to convert Unicode strings to byte strings automatically.

Thus concerned about what the code:

            for chunk in response:
                # "The start_response callable must not actually transmit
                # the response headers. Instead, it must store them for the
                # server or gateway to transmit only after the first
                # iteration of the application return value that yields
                # a NON-EMPTY string, or upon the application's first
                # invocation of the write() callable." (PEP 333)
                if chunk:
                    if isinstance(chunk, text_type):
                        chunk = chunk.encode('ISO-8859-1')
                    self.write(chunk)

is doing.

The text was updated successfully, but these errors were encountered:

fgallaire · 2016-08-30T00:42:39Z

Hello Graham,

Thanks for your investigations. Howerer, WSGIserver is a fork of CherryPy's WSGI Server which is widely used in production.

GrahamDumpleton · 2016-08-30T01:49:37Z

That doesn't excuse the fact that it is not compliant with the WSGI specification if it is converting Unicode strings in response byte to bytes. Someone could write code that works on this WSGI server and will then blow up when used on other WSGI compliant WSGI servers. The point of the WSGI specification is to ensure portability for peoples WSGI applications.

fgallaire · 2016-08-30T02:19:18Z

The result of using a unicode object where a string object is required, is undefined.

It's undefined, so this could be either an error or something else. It seems compliant.

GrahamDumpleton · 2016-08-30T02:58:40Z

Earlier in PEP 3333 it says:

When called by the server, the application object must return an iterable yielding zero or more bytestrings. This can be accomplished in a variety of ways, such as by returning a list of bytestrings, or by the application being a generator function that yields bytestrings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more byte strings.

So it MUST yield byte strings.

Then in Unicode issues section it says:

For values referred to in this specification as "bytestrings" (i.e., values read from wsgi.input , passed to write() or yielded by the application), the value must be of type bytes under Python 3, and str in earlier versions of Python.

So in neither Python 2 or 3 is it allowed to be Unicode string, which would be unicode type under Python 2 and str under Python 3.

fgallaire · 2016-08-30T03:42:22Z

Yes the application MUST, and is faulty not to do so, but it's unrelated to the server behaviour.

In this case, the server behaviour is undefined, so it can do what it wants and be compliant.

GrahamDumpleton · 2016-08-30T03:49:12Z

It is a poor server implementation that doesn't hold WSGI applications to the WSGI specification and tell them when they are violating it. It is a disservice to users to not tell them there is a problem and silently allow it as it will only come back and bite them later. For example, you are converting using Latin-1. So it will work if only ASCII characters were used in the Unicode string, so hello world will work, but would then blow up if required UTF-8. A user could go along blissfully unaware until their production application starts failing in ways they don't understand.

fgallaire · 2016-09-02T12:59:13Z

@GrahamDumpleton if you see something else you're welcome :-)

GrahamDumpleton · 2016-09-03T00:32:46Z

If you are going to add a check, it should be if not isinstance(chunk, bytes) then raise an exception.

fgallaire · 2016-09-03T03:07:29Z

it's already here 7352501 !

GrahamDumpleton · 2016-09-03T03:48:20Z

Tell me what is going to happen if someone incorrectly does:

return [1,2,3,4]

You are checking whether the items are text type (presumably that maps to unicode on Python 2 and str on Python 3) and only raise an exception in that very specific case.

There are still lots of other types besides bytes which someone could accidentally return and that check will not catch them and will pass it straight through.

That is why the exception should be for anything which IS NOT bytes.

fgallaire · 2016-09-03T06:13:47Z

Correct. Fixed. Thanks a lot @GrahamDumpleton

fgallaire mentioned this issue Aug 30, 2016

WSGI applications are required to return bytes. cherrypy/cherrypy#1489

Closed

fgallaire closed this as completed in 7352501 Sep 2, 2016

fgallaire added a commit that referenced this issue Sep 3, 2016

Finish fix #2

955fa77

Dfupa mentioned this issue Oct 1, 2020

Fixed string encoding for WebPlugin webapp.py for Python3 etetoolkit/ete#478

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WSGI applications are required to return bytes. #2

WSGI applications are required to return bytes. #2

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Aug 30, 2016

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Aug 30, 2016 •

edited

Loading

GrahamDumpleton commented Aug 30, 2016 •

edited

Loading

fgallaire commented Aug 30, 2016

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Sep 2, 2016

GrahamDumpleton commented Sep 3, 2016

fgallaire commented Sep 3, 2016

GrahamDumpleton commented Sep 3, 2016

fgallaire commented Sep 3, 2016

WSGI applications are required to return bytes. #2

WSGI applications are required to return bytes. #2

Comments

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Aug 30, 2016

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Aug 30, 2016 • edited Loading

GrahamDumpleton commented Aug 30, 2016 • edited Loading

fgallaire commented Aug 30, 2016

GrahamDumpleton commented Aug 30, 2016

fgallaire commented Sep 2, 2016

GrahamDumpleton commented Sep 3, 2016

fgallaire commented Sep 3, 2016

GrahamDumpleton commented Sep 3, 2016

fgallaire commented Sep 3, 2016

fgallaire commented Aug 30, 2016 •

edited

Loading

GrahamDumpleton commented Aug 30, 2016 •

edited

Loading