Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

short reads #5

Open
rbtcollins opened this issue Sep 26, 2014 · 5 comments
Open

short reads #5

rbtcollins opened this issue Sep 26, 2014 · 5 comments

Comments

@rbtcollins
Copy link
Contributor

So one common use with websockets is to exchange JSON docs back and forth. They are self delimiting, but doing byte-at-a-time parsing can be excruciatingly slow.

json.load does this:

return loads(fp.read(),
    cls=cls, object_hook=object_hook,
    parse_float=parse_float, parse_int=parse_int,
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  • that is it reads the entire stream.

with the current stream definition folk need to do a loop on read(1), repeatedly trying to decode until they get a valid object. If we permitted the normal language read has in the io - 'up to N bytes' - then it would be safe to call read(65536) to read in a decent amount at once, and yet not block.

@Lukasa
Copy link
Contributor

Lukasa commented Sep 26, 2014

Does this fully eliminate the problem?

What is the JSON body is large such that it gets fragmented into many TCP segments, and the read is issued between the arrival of one segment and another? This doesn't really remove the need to confirm that a whole JSON body is present.

There are also problems the other way: if JSON bodies are very small and sent close together you may get multiple frames at once.

Really, what's required is a streaming JSON parser. This should be able to emit JSON tokens as they arrive. That, combined with a good non-blocking read, might do the trick.

@rbtcollins
Copy link
Contributor Author

So I think it depends where we draw the parsing line - since websockets defines a framing layer, we can make it a server problem. I think we'll want short read support anyway, and short read support means folk can use the io buffer management library rather than having to invent something new.

@Lukasa
Copy link
Contributor

Lukasa commented Sep 26, 2014

Fair enough, I can agree with that.

@rbtcollins rbtcollins changed the title Websockets - nonblocking reads short reads Sep 26, 2014
@GrahamDumpleton
Copy link

The WSGI specification says about wsgi.input:

An input stream (file-like object) from which the HTTP request body bytes can be read.

People have potentially come to rely on the 'file-like object' status of it, including that read(n) will only return less than n bytes if the end of the stream has been reached. Changing the definition of it is probably ill advised.

I have investigated and implemented as an experiment the idea in mod_wsgi of a new readchunk() method on wsgi.input which had the partial read characteristics, but which would still block if no data was available in an attempt to come up with a way of having web sockets ride on top of a modified WSGI interface, after having upgraded the connection (remove the HTTP handling which in Apache consists of the HTTP_IN input filter), but came to the conclusion that trying to expand the ways data could be read from wsgi.input was just stupid and didn't provide a good solution.

Rather than trying to create a even more frankenstein version of WSGI, I believe web sockets should be handled through a separate interface altogether. That might be a low level interface where a raw socket is made available and async mechanisms are used, or a high level structured API just for web sockets. Either way, trying to merge functionality in WSGI itself to allow it is likely just to create a mess. If one accepts that, then the question of short reads for wsgi.input is moot as no change would be done to WSGI.

@gvanrossum
Copy link

In asyncio, we have read(n) which may return a partial read whenever it
feels like (but still blocks if no data is available) and readexactly(n),
which reads n bytes or raises an exception (IncompleteReadError, a subclass
of EOFError).

OTOH in Python I/O streams we have read(n) which returns short reads only
near EOF, and read1(n) which blocks only if no data is immediately
available and then gives whatever it can from its buffer.

My personal preference is for the asyncio version, but I understand that
WSGI has its own tradition. But it wouldn't be horrible to add the read1(n)
interface.

On Wed, Oct 15, 2014 at 4:41 AM, Graham Dumpleton notifications@github.com
wrote:

The WSGI specification says about wsgi.input:

An input stream (file-like object) from which the HTTP request body bytes
can be read.

People have potentially come to rely on the 'file-like object' status of
it, including that read(n) will only return less than n bytes if the end of
the stream has been reached. Changing the definition of it is probably ill
advised.

I have investigated and implemented as an experiment the idea in mod_wsgi
of a new readchunk() method on wsgi.input which had the partial read
characteristics, but which would still block if no data was available in an
attempt to come up with a way of having web sockets ride on top of a
modified WSGI interface, after having upgraded the connection (remove the
HTTP handling which in Apache consists of the HTTP_IN input filter), but
came to the conclusion that trying to expand the ways data could be read
from wsgi.input was just stupid and didn't provide a good solution.

Rather than trying to create a even more frankenstein version of WSGI, I
believe web sockets should be handled through a separate interface
altogether. That might be a low level interface where a raw socket is made
available and async mechanisms are used, or a high level structured API
just for web sockets. Either way, trying to merge functionality in WSGI
itself to allow it is likely just to create a mess. If one accepts that,
then the question of short reads for wsgi.input is moot as no change would
be done to WSGI.


Reply to this email directly or view it on GitHub
#5 (comment)
.

--Guido van Rossum (python.org/~guido)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants