New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
short reads #5
Comments
Does this fully eliminate the problem? What is the JSON body is large such that it gets fragmented into many TCP segments, and the read is issued between the arrival of one segment and another? This doesn't really remove the need to confirm that a whole JSON body is present. There are also problems the other way: if JSON bodies are very small and sent close together you may get multiple frames at once. Really, what's required is a streaming JSON parser. This should be able to emit JSON tokens as they arrive. That, combined with a good non-blocking read, might do the trick. |
So I think it depends where we draw the parsing line - since websockets defines a framing layer, we can make it a server problem. I think we'll want short read support anyway, and short read support means folk can use the io buffer management library rather than having to invent something new. |
Fair enough, I can agree with that. |
The WSGI specification says about wsgi.input:
People have potentially come to rely on the 'file-like object' status of it, including that read(n) will only return less than n bytes if the end of the stream has been reached. Changing the definition of it is probably ill advised. I have investigated and implemented as an experiment the idea in mod_wsgi of a new readchunk() method on wsgi.input which had the partial read characteristics, but which would still block if no data was available in an attempt to come up with a way of having web sockets ride on top of a modified WSGI interface, after having upgraded the connection (remove the HTTP handling which in Apache consists of the HTTP_IN input filter), but came to the conclusion that trying to expand the ways data could be read from wsgi.input was just stupid and didn't provide a good solution. Rather than trying to create a even more frankenstein version of WSGI, I believe web sockets should be handled through a separate interface altogether. That might be a low level interface where a raw socket is made available and async mechanisms are used, or a high level structured API just for web sockets. Either way, trying to merge functionality in WSGI itself to allow it is likely just to create a mess. If one accepts that, then the question of short reads for wsgi.input is moot as no change would be done to WSGI. |
In asyncio, we have read(n) which may return a partial read whenever it OTOH in Python I/O streams we have read(n) which returns short reads only My personal preference is for the asyncio version, but I understand that On Wed, Oct 15, 2014 at 4:41 AM, Graham Dumpleton notifications@github.com
--Guido van Rossum (python.org/~guido) |
So one common use with websockets is to exchange JSON docs back and forth. They are self delimiting, but doing byte-at-a-time parsing can be excruciatingly slow.
json.load does this:
with the current stream definition folk need to do a loop on read(1), repeatedly trying to decode until they get a valid object. If we permitted the normal language read has in the io - 'up to N bytes' - then it would be safe to call read(65536) to read in a decent amount at once, and yet not block.
The text was updated successfully, but these errors were encountered: