
streaming a response with iter_lines doesn't work #989

Closed
gdamjan opened this issue Dec 6, 2012 · 23 comments
Comments

@gdamjan

gdamjan commented Dec 6, 2012

Using the following simple server to simulate a streaming HTTP service, requests 0.14.2 doesn't stream the response: it waits until some amount of data has been received, or until the end of the response.

The client

import requests

url = 'http://localhost:8000/a/b/c'
# prefetch=False (the requests 0.x API) asks for the response body to be
# streamed rather than read up front.
req = requests.get(url, prefetch=False)

for line in req.iter_lines():
    print repr(line)

The server

from wsgiref.simple_server import make_server
import time

def simple_app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]

    start_response(status, headers)

    # Trickle out one short line per second so the response body arrives slowly.
    for i in range(10):
        yield "line %d\r\n" % i
        time.sleep(1)

httpd = make_server('', 8000, simple_app)
print("Serving on port 8000...")
httpd.serve_forever()
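
For readers on requests 1.x or later: the prefetch=False flag used above was replaced by stream=True. A roughly equivalent client for the same test server might look like the sketch below (added for reference, not part of the original report):

import requests

url = 'http://localhost:8000/a/b/c'
# stream=True is the requests 1.x+ replacement for the 0.x prefetch=False flag.
resp = requests.get(url, stream=True)

for line in resp.iter_lines():
    print(repr(line))

The chunk_size workaround discussed below applies to this form as well.
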
@Lukasa
Member

Lukasa commented Dec 6, 2012

Change your line:

for line in req.iter_lines():

to:

for line in req.iter_lines(chunk_size=10):

You'll find this works. =)

@sigmavirus24: IIRC, you worked on the streaming stuff last. Why is the default iter_lines chunk size so large (10240 bytes)? Is there a design decision I don't know about there?
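
For context on why the chunk size matters here: iter_lines pulls data through iter_content in chunk_size-byte reads, and the underlying file object generally blocks until that many bytes have arrived (or the connection closes) before anything is split into lines. With the 10240-byte default and a server emitting roughly 8 bytes per second, the first line cannot appear for a very long time. A simplified sketch of the line-splitting side (not the actual requests source; assumes a Python 3 bytes body):

def iter_lines_sketch(response, chunk_size=10240):
    """Rough model of Response.iter_lines, for illustration only."""
    pending = None
    for chunk in response.iter_content(chunk_size=chunk_size):
        if pending is not None:
            chunk = pending + chunk
        lines = chunk.splitlines()
        # Hold back a possibly incomplete trailing line until more data arrives.
        if chunk and not chunk.endswith((b'\r', b'\n')):
            pending = lines.pop() if lines else None
        else:
            pending = None
        for line in lines:
            yield line
    if pending is not None:
        yield pending
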

@gdamjan
Author

gdamjan commented Dec 6, 2012

Confirmed, chunk_size=10 does fix the issue.

I'll leave this issue open until it's decided what the default should be.

@sigmavirus24
Contributor

@Lukasa I think you have me mistaken for someone else, but if I remember correctly 1024 was considered because that's 1 kB. It isn't unreasonable; most web content is much larger than that, and it is configurable.

@sigmavirus24
Contributor

Also @gdamjan, it was already decided if I remember correctly, so you can close this if you feel your needs were met.

At this point, with the refactor coming up, we could change it to half the current size, since we can announce the breaking changes then. But that still wouldn't fix his problem. To phrase this the way @kennethreitz will see it: 90% of people's cases will be sufficiently met by this default, and probably 10% will be affected negatively. He likes to ignore that 10% if possible. (Paraphrasing from one of his talks.)

@slingamn
Contributor

slingamn commented Dec 6, 2012

Is this related to any of the issues discussed in #844?

@sigmavirus24
Contributor

@slingamn Yes, I believe the first item on your list is related.

@Lukasa
Member

Lukasa commented Dec 6, 2012

No, item 4 is the relevant one. =)

@Lukasa
Member

Lukasa commented Dec 6, 2012

10 kB is simply ludicrously large. If you load my website homepage, you'll get only slightly more than 10 kB of data in total. That includes CSS, images, JavaScript, the lot. To use that as the default line size is braindead.

@kennethreitz
Contributor

I agree, this seems foolishly large.

@kennethreitz
Contributor

iter_content's existence, however, is based on the concept of extremely large files.

@Lukasa
Member

Lukasa commented Dec 6, 2012

So leave iter_content as is, and just change the default chunk_size on iter_lines? Everybody wins?

@kennethreitz
Contributor

+1

@Lukasa
Member

Lukasa commented Dec 6, 2012

Got a preference for what default value to use? =)

@sigmavirus24
Contributor

Erp, my mistake; I thought this was all iter_content-related.

@sigmavirus24
Contributor

From #844 it seems like people liked 1024.

@Lukasa
Member

Lukasa commented Dec 6, 2012

It's certainly likely to be better than what we've got. It wouldn't have prevented this issue being raised, though.

@gdamjan
Author

gdamjan commented Dec 6, 2012

1024 is still too much, for example when streaming CouchDB's changes feed in continuous mode.
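
To illustrate that use case: a continuous _changes feed emits one small JSON document per line and can go quiet for long stretches, so any chunk size much larger than a typical line delays delivery. A hedged sketch of such a consumer (the URL, database name, and parameters are placeholders, and stream=True is the post-1.0 API, not what this thread was running):

import json
import requests

# Hypothetical CouchDB instance; nothing here comes from the original report.
url = 'http://localhost:5984/mydb/_changes'
params = {'feed': 'continuous', 'heartbeat': 10000}

resp = requests.get(url, params=params, stream=True)

# Keeping chunk_size no larger than a typical change line hands each change
# to the application as soon as it arrives instead of letting it sit in a
# 10 kB buffer.
for line in resp.iter_lines(chunk_size=64):
    if line:  # heartbeats arrive as empty lines
        change = json.loads(line)
        print(change.get('id'))
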

@sigmavirus24
Contributor

We can also do 512, 256, 128, 64, 32, 16, or 8. Only the last of which would have prevented this issue. Pick your poison. :P

@Lukasa
Member

Lukasa commented Dec 6, 2012

@sigmavirus24: Good point well made. =D

Nevertheless, I'd be inclined to go slightly smaller. 512 is tempting.

@gdamjan
Author

gdamjan commented Dec 6, 2012

What are the usage scenarios of iter_lines?
In my scenarios I don't see lines as long as 512 bytes; at most they are around 100 bytes.

@slingamn
Contributor

slingamn commented Dec 7, 2012

Oh, OK, I get the context now.

One thing I've been meaning to get around to: understanding _fileobject.readline() from Python's socket module. This is an API to wrap a socket and read lines from it like you would from a regular file. Here's a Gist of the code for convenience: https://gist.github.com/4231260

It looks like it reads a byte at a time in some cases and self._rbufsize bytes (default 8192) at a time in others. It also looks like it's doing something clever. Maybe we need something like this.
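
For reference, the pattern being described is a buffered readline: keep an internal byte buffer, refill it in reasonably large reads, and slice complete lines out of it. A minimal sketch of the idea (not the socket module's actual code, and not what requests ended up shipping):

class LineBuffer(object):
    """Minimal buffered line reader over any object with a read(n) method."""

    def __init__(self, raw, bufsize=8192):
        # Works well when read(n) may return *fewer* than n bytes as soon as
        # data is available (like socket.recv); it does not help if read(n)
        # insists on filling the whole buffer before returning.
        self.raw = raw
        self.bufsize = bufsize
        self.buf = b''

    def readline(self):
        while b'\n' not in self.buf:
            data = self.raw.read(self.bufsize)
            if not data:  # EOF: flush whatever is left
                line, self.buf = self.buf, b''
                return line
            self.buf += data
        line, sep, self.buf = self.buf.partition(b'\n')
        return line + sep
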

@sigmavirus24
Contributor

You mean buffer the stream so we can return the default sizes? I was thinking of this but I realized it isn't realistic for every situation.

@Lukasa
Member

Lukasa commented Jan 22, 2013

Resolved (finally!) by #1122.

Lukasa closed this as completed Jan 22, 2013