Connect timeout #1801

Closed

kevinburke wants to merge 8 commits into master

Conversation

7 participants
@kevinburke
Contributor

kevinburke commented Dec 16, 2013

Per discussion with @sigmavirus24 in IRC, this PR kills the Timeout class and adds support for connect timeouts via a simple tuple interface:

r = requests.get('https://api.twilio.com', timeout=(3.05, 27))

Sometimes you try to connect to a dead host (this happens to us all the time, because of AWS) and you would like to fail fairly quickly in this case; the request has no chance of succeeding. However, once the connection succeeds, you'd like to give the server a fairly long time to process the request and return a response.

The attached tests and documentation have more information. Hope this is okay!
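To make the proposed semantics concrete, here is a minimal sketch of the tuple interface in use, assuming the existing `requests.exceptions.Timeout` is what surfaces when either phase expires:

```python
import requests

try:
    # (3.05, 27): allow ~3 seconds for the TCP handshake, but give the
    # server up to 27 seconds to return a response once connected.
    r = requests.get('https://api.twilio.com', timeout=(3.05, 27))
except requests.exceptions.Timeout:
    # Raised whether the connect phase or the read phase timed out.
    print('request timed out')
```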

Kevin Burke added some commits Nov 18, 2013

Kevin Burke
Migrate connect timeout interface to use a tuple
Also:

- add comment to Makefile explaining how to run only one test
- remove classes implementing Timeout class
- add tests for invalid timeouts
@sigmavirus24
Member

sigmavirus24 commented Dec 16, 2013

I'm generally +1 on the idea and the API. I haven't combed over the diff though and a quick skim makes me think there's more there than needs to be. Also, did I see "TimeoutSauce"? Is there any reason to not simply call it "Timeout"?

@Lukasa
Member

Lukasa commented Dec 16, 2013

In principle I'm +0.5 on this. It's a nice extension. However, as memory serves, @kennethreitz considered this approach and ended up not taking it. It might be that now that someone has written the code he'll be happy with it, but equally, it might be that he doesn't like the approach.

@sigmavirus24 Yeah, TimeoutSauce is used for the urllib3 Timeout object, because we have our own Timeout object (an exception).

@sigmavirus24
Member

sigmavirus24 commented Dec 16, 2013

@Lukasa As I understood it @kennethreitz was more concerned with the addition (and requirement) of the Timeout class from urllib3. And thanks for clearing up the naming, I still think there has to be a better name for it. (I'm shaving a yak, I know)

@kevinburke
Contributor

kevinburke commented Dec 16, 2013

@sigmavirus24 That was my understanding from IRC as well.

@sigmavirus24
Member

sigmavirus24 commented Dec 16, 2013

@kevinburke I discussed it with you on IRC so the likelihood is that you came to that conclusion through me.

@kennethreitz
Member

kennethreitz commented Dec 18, 2013

I'd rather us have our current interface and just have it support the intended use case, personally. You don't have to tweak with a bunch of timeouts when you're using a web browser :)

@kennethreitz
Member

kennethreitz commented Dec 18, 2013

The main reason for this is that we have the tuple interface in another spot (e.g. an expansion of the file upload API) and it's my least favorite part of the API.

@kennethreitz
Member

kennethreitz commented Dec 18, 2013

I'm not against the class either, I just need to brew about it for a bit, basically.

@kennethreitz
Member

kennethreitz commented Dec 18, 2013

I was under the impression that your concern was for the streaming API, not for more standard uses like GET?

@kevinburke
Contributor

kevinburke commented Dec 18, 2013

Actually I care more about the connect timeout than the streaming case, though we use requests with both at Twilio and would like both changes to go through.

The use case is that if I want to wait 30 seconds for a request to complete (for a server to complete processing it and return), that shouldn't also imply that I need to wait 30 seconds for requests to tell me I can't establish a connection to the host, which is currently true for remote hosts that are dead (ones where your machine doesn't immediately send a TCP reset, like http://google.com:81 or 10.255.255.1).

There are other possible interfaces - adding a connect_timeout parameter, for example, or similar.

@kennethreitz
Member

kennethreitz commented Dec 20, 2013

Perhaps we can make this an option on the connection adapter itself. That was the intention for this interface.

@Lukasa
Member

Lukasa commented Dec 20, 2013

That's not unreasonable, the HTTPAdapter could happily take these as parameters/properties.
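As an illustration of the adapter-level idea, here is a minimal sketch of a subclass carrying a default (connect, read) timeout. The `TimeoutAdapter` name and `default_timeout` attribute are hypothetical, and the sketch assumes the underlying send() accepts a tuple timeout (which is exactly what this PR proposes):

```python
import requests
from requests.adapters import HTTPAdapter

class TimeoutAdapter(HTTPAdapter):
    """Hypothetical adapter carrying a default (connect, read) timeout."""

    def __init__(self, default_timeout=(3.05, 27), **kwargs):
        self.default_timeout = default_timeout
        super(TimeoutAdapter, self).__init__(**kwargs)

    def send(self, request, **kwargs):
        # Apply the default only when the caller didn't pass a timeout.
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.default_timeout
        return super(TimeoutAdapter, self).send(request, **kwargs)

session = requests.Session()
session.mount('https://', TimeoutAdapter())
session.mount('http://', TimeoutAdapter())
r = session.get('https://api.twilio.com')  # uses the adapter's default
```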

@kevinburke
Contributor

kevinburke commented Dec 28, 2013

Yeah, the HTTPAdapter seems like a good place for this. To try to avoid more go-rounds, what should the interface look like on the Adapter? So far we've proposed:

  • a Timeout class - seems the least popular
  • a tuple (connect, read) - which is also not used in very many other places in the library
  • separate parameters - a timeout parameter would apply to both, a connect_timeout param to the connect and a read_timeout param to the read
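For side-by-side comparison, the three proposals as call signatures (all hypothetical at this point; only the tuple form is implemented by this PR):

```python
# Hypothetical call signatures under discussion - none of these keyword
# forms shipped in requests at the time; illustration only.
#
#   requests.get(url, timeout=Timeout(connect=3.05, read=27))  # 1. Timeout class
#   requests.get(url, timeout=(3.05, 27))                      # 2. (connect, read) tuple
#   requests.get(url, connect_timeout=3.05, read_timeout=27)   # 3. separate parameters
```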
@kevinburke
Contributor

kevinburke commented Jan 6, 2014

Ping :)

@Lukasa
Member

Lukasa commented Jan 6, 2014

Hmm. I'd rather not do a Timeout class, I'd prefer the optional tuple I think. But hold off until @kennethreitz gets another chance to look at this.

@Lukasa referenced this pull request Feb 16, 2014: "The timeout is in seconds." #1923 (merged)

@p2

p2 commented Apr 4, 2014

I found this because my Flask app currently tries to connect to a dead host for which I've set a timeout of 5 seconds but it takes forever to actually time out. What is the status on this one?

@Lukasa
Member

Lukasa commented Apr 5, 2014

Right now we do apply the timeout to connections, we just don't have it configurable. Are you using stream=True?

@p2

p2 commented Apr 5, 2014

Hadn't been using stream=True, I guess I should use it if only to get the timeout? Just did a quick test with the following script (the host in the test, http://pillbox.nlm.nih.gov, is still down), and with stream=True it does time out after 5 seconds; without it, it runs anywhere from 20 to 120 seconds—not what I would expect.

import requests

url = 'http://pillbox.nlm.nih.gov/assets/large/abc.jpg'
requests.get(url, timeout=5)

Using requests 2.2.1 with Python 3.3.3

@Lukasa
Member

Lukasa commented Apr 5, 2014

That's weird, it works fine on Python 2.7. Seems like a Python 3 bug, because I can reproduce your problem in 3.4. @kevinburke, are you aware of any timeout bugs in urllib3?

@kevinburke
Contributor

kevinburke commented Apr 5, 2014

I believe connection timeouts would be retried 3 times, as they represent a failure to connect to the server, but I'm not at my computer at the moment.

@Lukasa
Member

Lukasa commented Apr 5, 2014

Weird, I'm not seeing that happen in Python 2.7 then. We'll need to investigate this inconsistency.

@kirelagin

kirelagin commented May 10, 2014

Hm, we definitely need a sane way of configuring both timeouts, and I think the tuple approach is nice. requests is for humans, and humans don't really like all those extra classes and the additional keyword-argument mess.

By the way, speaking of hosts that are down, I just did a test with

import requests

requests.get('http://google.com:81/', timeout=5)

and I get 35 seconds on both Python 2.7 and 3.3 (requests 2.2.1). That's not what I would expect from timeout=5… And with timeout=1 I get 7 seconds. I mean, we really need a sane interface…

@Lukasa
Member

Lukasa commented May 10, 2014

@kirelagin I see the same behaviour as you, but I believe this to be a socket-level problem. A Wireshark dump shows that my OS X box makes five independent attempts to connect. Each of those connection attempts only retransmits for five seconds.

I suspect this behaviour comes down to the fact that httplib uses socket.create_connection, not socket.connect(). Python's socket module documentation has this to say (emphasis mine):

This is a higher-level function than socket.connect(): if host is a non-numeric hostname, it will try to resolve it for both AF_INET and AF_INET6, and then try to connect to all possible addresses in turn until a connection succeeds.

Closer examination of the Wireshark trace shows that we are definitely hitting different addresses: five of them.

If we wanted to 'fix' this behaviour (more on those scare quotes in a minute) it would be incredibly complicated: we'd end up needing to either circumvent httplib's connection logic or use signals or some form of interrupt-processing to return control to us after a given timeout.

More generally, I don't know what 'fix' would mean here. This behaviour is naively surprising (you said we'd time out after 5 seconds but it took 30!), but makes sense once you understand what's happening. I believe that this is an 'expert friendly' interface (thanks to Nick Coghlan for the term), so I'm prepared to believe that we should change it. With that said, if we do change it, how do we give the 'I want to wait 5 seconds for each possible target' behaviour to expert users?
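To make the per-address behaviour concrete, here is a simplified sketch of what socket.create_connection() does internally (real socket APIs, illustrative loop): the timeout is applied to each candidate address, so total wall time grows with the number of records DNS returns.

```python
import socket

host, port, timeout = 'google.com', 81, 5.0

# Simplified sketch of socket.create_connection()'s internal loop: the
# timeout applies to EACH resolved address, so the worst-case wall time
# is roughly timeout * number_of_addresses.
for family, socktype, proto, _, addr in socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM):
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)  # per-attempt timeout, not a total budget
    try:
        sock.connect(addr)
        break  # the first address that answers wins
    except socket.error:
        sock.close()  # move on to the next resolved address
```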

@kevinburke
Contributor

kevinburke commented May 10, 2014

There's a retries branch of urllib3 that would let you specify exactly this.

@Lukasa
Member

Lukasa commented May 10, 2014

Uh, is there? Where? I can't see it...

@kirelagin

kirelagin commented May 10, 2014

Ah, yeah, that makes sense.

five independent attempts

different addresses: five of them.

Wait, google.com has six A records (plus one AAAA; that's why I see 35 = 5 × 7 seconds, I guess).

Anyway, I just checked Firefox and it tries exactly one IPv6 and exactly one IPv4 address. I believe that multiple DNS records are mostly used for load balancing, not fault tolerance, so attempting only the first address by default makes the most sense. Having an option to control this is useful, of course.

@p2

p2 commented May 10, 2014

Is there a need to be able to specify both timeouts independently? When I specify the timeout, I'm thinking of it as "I don't want this line of code to run longer than x seconds"; I don't care which part of the connection takes how long.

It seems to me this would be a true "human" interpretation, and it could be implemented without having to rely on urllib3, by an internal timer that kills the request if it hasn't returned within the timeout.

@kevinburke
Contributor

kevinburke commented May 10, 2014

@Lukasa urllib3/urllib3#326; though now that I read it more carefully, if the OS itself is trying each DNS record in turn then there's not much that can be done. That pull request lets you specify the number of times you would like to retry a connection failure, whether a timeout or an error.

@kevinburke
Contributor

kevinburke commented May 10, 2014

@p2 Sadly, computing the wall-clock time for an HTTP request remains incredibly difficult. In particular, measuring the time spent in system calls like DNS resolution, establishing a socket, and sending data (and then passing that information to the necessary places and subtracting it from a total budget) remains difficult.

My suggestion would be to run your HTTP request in a separate thread, then use a timer to cancel the wait if the thread takes longer than a stated amount of time to return a value. gevent can be used for this: http://www.gevent.org/gevent.html#timeouts

Sorry I can't be more helpful :(
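A standard-library variant of the same idea, using concurrent.futures instead of gevent (a sketch; note the worker thread is not actually killed when the wait expires, the caller simply stops waiting for it):

```python
import concurrent.futures

import requests

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = pool.submit(requests.get, 'http://pillbox.nlm.nih.gov/assets/large/abc.jpg')
try:
    response = future.result(timeout=5)  # wall-clock budget, in seconds
except concurrent.futures.TimeoutError:
    # The worker thread keeps running in the background; we simply
    # stop waiting for its result.
    print('gave up after 5 seconds')
```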

@p2

p2 commented May 10, 2014

@kevinburke That is kind of what I do now, I was just wondering if this would make sense as a default approach for requests as well. I personally don't have a need to specify individual timeouts, but that assumption may be too naïve.

@kirelagin

kirelagin commented May 10, 2014

@p2 I agree that the “human” interpretation is to have a total timeout. And, by the way, that's how requests works right now. But there still might be cases where you want more fine-grained control.

@kirelagin

kirelagin commented May 10, 2014

Also, as a human, when I'm telling a line of code “go, try to connect to Google with this timeout”, I'm not thinking about multiple DNS A records. I'm thinking of Google as a single entity. So there are, naturally, two sane options:

  1. Do not attempt multiple IPs. If some library code does this, I consider this code broken. If some OS does this, I consider the OS poor.
  2. Do whatever the library/OS wants, but have the timeout guard the total execution time of all the attempts. That's totally weird, but somewhat reasonable, and definitely more natural than what's happening now…
@Lukasa
Member

Lukasa commented May 11, 2014

Ok, a lot of conversation happened here, let me deal with it in turn.

Anyway, I just checked Firefox and it is trying exactly one IPv6 and exactly one IPv4 address. I believe that multiple DNS records are mostly used for load balancing, not fault-tolerance, so attempting only the first address by default makes most sense.

You checked Firefox against the actual google.com then, not on an incorrect port. Browsers will also fall back to the other addresses if the first doesn't respond. This makes sense. Having multiple A records means "this host is available at these addresses". If I can't contact that host at one of those addresses, it's nonsensical to say "welp, clearly the host is down" when I know several other addresses I might be able to contact it at.

This feature of 'multiple addresses' is widely used for both balancing load and fault tolerance. In fact, if you really want to balance load then DNS SRV is the mechanism to use, not A/AAAA, as it provides better control over how the load is spread.

Is there need to be able to specify both timeouts independently? When I specify the timeout, I'm thinking of it as "I don't want this line of code to run longer than x seconds", I don't care which part of the connection takes how long.

The short answer is 'yes', because of the stream keyword argument. If you've set stream=True and use iter_content() or iter_lines(), it's useful to be able to set a timeout for how long those calls can block.

It seems to me this would be a true "human" interpretation and could be implemented without having to rely on urllib3 by an internal timer that kills the request if it hasn't returned within the timeout.

As @kevinburke points out, this isn't as easy as it seems. More importantly, it also leaves us exposed to implementation details. 'Until the request returns' is not a well-defined notion. What does it mean for the request to return? Do I have to download the whole body? Just the headers? Just the status line? Whatever we choose is going to be utterly arbitrary.

Also, as a human when I'm telling a line of code “go, try to connect to Google with this timeout” I'm not thinking about multiple DNS A-records. I'm thinking of Google as a single entity.

Agreed.

  1. Do not attempt multiple IPs. If some library code does this, I consider this code broken. If some OS does this, I consider the OS poor.

Woah, now you go off the rails. If you are thinking of Google as a single entity, then you would expect us to connect to it if it's up. If one time in seven we fail to connect, even though you always connect fine in your browser, you're going to assume requests is bugged as hell.

If a host is up, we must be able to connect you to it.

The ideal fix, from my position, would be to take over the logic used in socket.create_connection(). This allows us to have fine-grained control over timeouts. Unfortunately, it also complicates the timeout logic further, as you'd now have per-host connection attempt timeouts, a total connection attempt timeout, and a read timeout. That's beginning to become hugely complicated and to expose people to the complexities of TCP and IP in a way that I'm not delighted about.
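A rough sketch of what "taking over socket.create_connection()" might look like, with a single budget shared across all resolved addresses rather than a per-address timeout (hypothetical code, not anything requests ships; the function name is illustrative):

```python
import socket
import time


def connect_with_total_budget(host, port, total_timeout):
    """Hypothetical create_connection() variant: total_timeout caps the
    combined time spent across ALL resolved addresses, instead of being
    applied to each address in turn."""
    deadline = time.time() + total_timeout
    last_err = socket.timeout('connection budget exhausted')
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        remaining = deadline - time.time()
        if remaining <= 0:
            break  # budget used up before this address could be tried
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(remaining)  # give this attempt whatever time is left
        try:
            sock.connect(addr)
            return sock
        except socket.error as err:
            last_err = err
            sock.close()
    raise last_err
```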

@kennethreitz
Member

kennethreitz commented May 12, 2014

Plans have been made. Expect something soon ;)

ralphbean added a commit to fedora-infra/supybot-fedora that referenced this pull request Jul 19, 2014

Add a new command to list GH pull requests.
The command looks like this:

```
threebean │ .pulls
   zodbot │ threebean: (pulls <username/repo>) -- List the pending pull requests on a github repo.
threebean │ .pulls fedora-infra
   zodbot │ threebean: Must be GitHub repo of the format <username/repo>
threebean │ .pulls fedora-infra/tahrir
   zodbot │ threebean: @hroncok's "Reenable gravatar fallback" fedora-infra/tahrir#209
threebean │ .pulls fedora-watwat/tahrir
   zodbot │ threebean: No such GitHub repository 'fedora-watwat/tahrir'
threebean │ .pulls kennethreitz/requests
   zodbot │ threebean: @dpursehouse's "Linkify Github usernames in authors list" requests/requests#2140
   zodbot │ threebean: @jamielennox's "Apply environment features to prepared requests" requests/requests#2129
   zodbot │ threebean: @alekstorm's "Create Request.links_multi and utils function for properly parsing Link headers" requests/requests#1980
   zodbot │ threebean: @kevinburke's "Connect timeout" requests/requests#1801
```

It requires that we add an ``github.oauth_token`` to the zodbot config which we
can get from https://github.com/settings/tokens/new

@kennethreitz
Member

kennethreitz commented Aug 22, 2014

@kevinburke can you rebase?

@kevinburke
Contributor

kevinburke commented Aug 23, 2014

Closing this one, I re-added it in #2176

@kevinburke kevinburke closed this Aug 23, 2014
