Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WSGI applications are required to return bytes. #2

Closed
GrahamDumpleton opened this issue Aug 30, 2016 · 11 comments
Closed

WSGI applications are required to return bytes. #2

GrahamDumpleton opened this issue Aug 30, 2016 · 11 comments

Comments

@GrahamDumpleton
Copy link

This code looks suspect:

A WSGI application iterable is supposed to return byte strings for response body. If it returns Unicode strings it is an error.

A WSGI server is not meant to convert Unicode strings to byte strings automatically.

Thus concerned about what the code:

            for chunk in response:
                # "The start_response callable must not actually transmit
                # the response headers. Instead, it must store them for the
                # server or gateway to transmit only after the first
                # iteration of the application return value that yields
                # a NON-EMPTY string, or upon the application's first
                # invocation of the write() callable." (PEP 333)
                if chunk:
                    if isinstance(chunk, text_type):
                        chunk = chunk.encode('ISO-8859-1')
                    self.write(chunk)

is doing.

@fgallaire
Copy link
Owner

Hello Graham,

Thanks for your investigations. Howerer, WSGIserver is a fork of CherryPy's WSGI Server which is widely used in production.

@GrahamDumpleton
Copy link
Author

That doesn't excuse the fact that it is not compliant with the WSGI specification if it is converting Unicode strings in response byte to bytes. Someone could write code that works on this WSGI server and will then blow up when used on other WSGI compliant WSGI servers. The point of the WSGI specification is to ensure portability for peoples WSGI applications.

@fgallaire
Copy link
Owner

fgallaire commented Aug 30, 2016

The result of using a unicode object where a string object is required, is undefined.

It's undefined, so this could be either an error or something else. It seems compliant.

@GrahamDumpleton
Copy link
Author

GrahamDumpleton commented Aug 30, 2016

Earlier in PEP 3333 it says:

When called by the server, the application object must return an iterable yielding zero or more bytestrings. This can be accomplished in a variety of ways, such as by returning a list of bytestrings, or by the application being a generator function that yields bytestrings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more byte strings.

So it MUST yield byte strings.

Then in Unicode issues section it says:

For values referred to in this specification as "bytestrings" (i.e., values read from wsgi.input , passed to write() or yielded by the application), the value must be of type bytes under Python 3, and str in earlier versions of Python.

So in neither Python 2 or 3 is it allowed to be Unicode string, which would be unicode type under Python 2 and str under Python 3.

@fgallaire
Copy link
Owner

Yes the application MUST, and is faulty not to do so, but it's unrelated to the server behaviour.

In this case, the server behaviour is undefined, so it can do what it wants and be compliant.

@GrahamDumpleton
Copy link
Author

It is a poor server implementation that doesn't hold WSGI applications to the WSGI specification and tell them when they are violating it. It is a disservice to users to not tell them there is a problem and silently allow it as it will only come back and bite them later. For example, you are converting using Latin-1. So it will work if only ASCII characters were used in the Unicode string, so hello world will work, but would then blow up if required UTF-8. A user could go along blissfully unaware until their production application starts failing in ways they don't understand.

@fgallaire
Copy link
Owner

@GrahamDumpleton if you see something else you're welcome :-)

@GrahamDumpleton
Copy link
Author

If you are going to add a check, it should be if not isinstance(chunk, bytes) then raise an exception.

@fgallaire
Copy link
Owner

it's already here 7352501 !

@GrahamDumpleton
Copy link
Author

Tell me what is going to happen if someone incorrectly does:

return [1,2,3,4]

You are checking whether the items are text type (presumably that maps to unicode on Python 2 and str on Python 3) and only raise an exception in that very specific case.

There are still lots of other types besides bytes which someone could accidentally return and that check will not catch them and will pass it straight through.

That is why the exception should be for anything which IS NOT bytes.

fgallaire added a commit that referenced this issue Sep 3, 2016
@fgallaire
Copy link
Owner

Correct. Fixed. Thanks a lot @GrahamDumpleton

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants