
Expose curl options in use_course() and friends? #988

Closed
jennybc opened this issue Jan 27, 2020 · 12 comments · Fixed by #1083
Labels
course 👩‍🏫 feature

Comments

@jennybc (Member) commented Jan 27, 2020

Lesson learned from rstudio::conf 2020.

If the internet is functional but slow, use_course() can time out. It inherits the default timeout from curl.

But we could offer some way to set a longer timeout in an individual call, or to do so globally via an env var or option.

Timeout can be set on the handle, e.g. h <- new_handle(timeout = 10), right around here:

h <- curl::new_handle(noprogress = FALSE, progressfunction = progress_fun)
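For illustration, a longer timeout could be passed when the handle is created. A minimal sketch, assuming the surrounding download code stays as-is (300 is an arbitrary value, not a proposed default):

```r
# Sketch: same handle as above, plus an overall timeout in seconds.
# progress_fun is the existing progress callback in use_course();
# 300 is an arbitrary illustrative value.
h <- curl::new_handle(
  noprogress = FALSE,
  progressfunction = progress_fun,
  timeout = 300
)
```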

@jennybc (Member Author) commented Feb 14, 2020

@gaborcsardi alerted me to some relevant subtleties. It's too crude to talk about a single timeout, because it applies to the whole download. What we really want to know is whether we are continuing to make progress (don't time out) vs. being dead in the water (do time out).

Some relevant code from async:

https://github.com/r-lib/async/blob/6494de0ba468cf25e5625fc5fde2236f90d5016f/R/http.R#L187-L190

Inlining:

modifyList(
    options,
    list(
      timeout = as.integer(getopt("timeout") %||% 3600),
      connecttimeout = as.integer(getopt("connecttimeout") %||% 30),
      low_speed_time = as.integer(getopt("low_speed_time") %||% 30),
      low_speed_limit = as.integer(getopt("low_speed_limit") %||% 100)
    )
  )

So here the default is 30s for the connection, 1 hour for the whole download (but this could be even more), and it will stop if it cannot get at least 100 bytes in 30 seconds.

If we bump up against the above, (lib)curl will error with something like:

Operation too slow. Less than 10 bytes/sec transferred the last 2 seconds

@gaborcsardi (Contributor) commented Feb 14, 2020

Yeah, the above code has some boilerplate specific to the async package; the gist is to set the timeout, connecttimeout, low_speed_time, and low_speed_limit curl options with curl::handle_setopt() to some reasonable values. E.g. the above sets:

  • 3600s for the whole download. This could probably be much more; it is very annoying if a big download stops after one hour...
  • 30s for the connection. I think this is mostly right; maybe it could be a bit more.
  • we stop if we receive less than 100 bytes in 30 seconds.
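Stripped of the async boilerplate, the same settings could be applied with curl::handle_setopt(). A sketch using the illustrative values above, not settled defaults:

```r
# Sketch: apply the four options discussed above to a fresh handle.
h <- curl::new_handle()
curl::handle_setopt(
  h,
  timeout         = 3600L,  # cap on the whole download (seconds)
  connecttimeout  = 30L,    # cap on the connection phase (seconds)
  low_speed_time  = 30L,    # abort if below low_speed_limit for this long
  low_speed_limit = 100L    # ...i.e. fewer than 100 bytes/s
)
```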

@hadley hadley added course 👩‍🏫 feature labels Mar 14, 2020
@hadley (Member) commented Mar 14, 2020

I'd be in favour of increasing the timeout, but not exposing it as an option.

@jennybc (Member Author) commented Mar 26, 2020

More about the relevant curl options:


From https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT.html

  • the maximum time in seconds that you allow the libcurl transfer operation to take
  • Since this option puts a hard limit on how long time a request is allowed to take, it has limited use in dynamic use cases with varying transfer times. ...You are advised to explore CURLOPT_LOW_SPEED_LIMIT, CURLOPT_LOW_SPEED_TIME ... to implement your own timeout logic
  • Current curl package sets this to 13 seconds (see below), which can be too short.

From https://curl.haxx.se/libcurl/c/CURLOPT_CONNECTTIMEOUT.html

  • the maximum time in seconds that you allow the connection phase to the server to take
  • Current curl package sets this to 78 seconds (see below). I have no evidence that we need to mess with this.

From https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_TIME.html

  • the time in number seconds that the transfer speed should be below the CURLOPT_LOW_SPEED_LIMIT for the library to consider it too slow and abort
  • Current curl package sets this to 20 seconds (see below), which may be too short.

From https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_LIMIT.html

  • the average transfer speed in bytes per second that the transfer should be below during CURLOPT_LOW_SPEED_TIME seconds for libcurl to consider it to be too slow and abort.
  • Current curl package sets this to 19 bytes per second (see below).

Defaults that we inherit from the curl package:

packageVersion("curl")
#> [1] '4.3'

interesting_options <- c(
  curl::curl_options("timeout"),
  curl::curl_options("speed")
)

keepers <- c("timeout", "connecttimeout", "low_speed_limit", "low_speed_time")
interesting_options[keepers]
#>         timeout  connecttimeout low_speed_limit  low_speed_time 
#>              13              78              19              20

Created on 2020-03-26 by the reprex package (v0.3.0.9001)

@gaborcsardi (Contributor) commented Mar 26, 2020

Defaults that we inherit from the curl package:

I believe these are just the code numbers of the options (as for an enum). Not the actual default values. curl only sets timeout by default AFAIR.
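That matches how curl::curl_options() works: it returns the libcurl option codes (enum values), not default values. For example:

```r
# These are CURLOPT enum codes, not defaults -- the 13 and 19 seen in the
# reprex above are CURLOPT_TIMEOUT's and CURLOPT_LOW_SPEED_LIMIT's codes.
curl::curl_options("timeout")[["timeout"]]
curl::curl_options("speed")[["low_speed_limit"]]
```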

@jennybc (Member Author) commented Mar 26, 2020

OK maybe here is where one default is set:

https://github.com/jeroen/curl/blob/d28579517f7af7646ab21300c65ee4e121398ab1/src/handle.c#L163

→ connecttimeout of 10 seconds

@jennybc (Member Author) commented Mar 26, 2020

@jeroen Is it true that there are no defaults set by the curl package for the overall timeout and the two low_speed_* options? I'm finding this hard to reconcile with what we saw at rstudio::conf, where we definitely experienced widespread timeouts, and I don't think it was from connecttimeout.

Also: is it true that the only way to control these is by setting them on the handle?

@jeroen (Member) commented Mar 26, 2020

Yes that sounds correct. The only defaults that the curl R package is setting are these:

https://github.com/jeroen/curl/blob/master/src/handle.c#L124-L210

Also: is it true that the only way to control these is by setting them on the handle?

Yes that is correct.

@jennybc (Member Author) commented Mar 27, 2020

So I guess we were hitting the connect timeout at rstudio::conf. I think that's the main (only?) curl option we need to provide a way to increase during a WiFi fiasco.

@jeroen (Member) commented Mar 27, 2020

The connect timeout is set to a default of 10 sec. I don't think raising that will help. If the host has not responded after 10 sec, likely it is unreachable.

If the wifi connection is flaky because the network is overloaded, connections get dropped. This is of course beyond our control :)

@jennybc (Member Author) commented Mar 27, 2020

I am more optimistic that extending the timeout will help.

At rstudio::conf we had overloaded internet -- but it technically worked. So use_course() would fail because of the connect timeout, even though we could download the exact same ZIP file "manually" with the browser, and it didn't even take that long. But then we had all the usual problems with people not knowing where the ZIP was stored, not knowing how to unpack it, or where it got unpacked, etc.

I see above that @gaborcsardi uses 30 seconds for the connect timeout in async.

@gaborcsardi (Contributor) commented Mar 27, 2020

Yeah, I think at the conf the bottleneck was at the local router, which only allowed a finite number of connections at any time. But once you got a connection, you were good. So it was worth having a long connection timeout.

But having that as a default can be annoying if the web server or app is down, because it will not timeout for a long time.

I think for use_course() it would make sense to increase the limit to at least a minute, and maybe also retry it a couple of times, and each time inform the user.
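A rough sketch of that idea (the function name, timeout, and retry count are all illustrative, not what the eventual fix in #1083 necessarily did):

```r
# Hypothetical helper: longer connect timeout plus a few informative retries.
download_with_retry <- function(url, destfile, tries = 3) {
  h <- curl::new_handle(connecttimeout = 60L)
  for (i in seq_len(tries)) {
    ok <- tryCatch(
      {
        curl::curl_download(url, destfile, handle = h)
        TRUE
      },
      error = function(e) {
        # Inform the user on each failed attempt, as suggested above.
        message("Attempt ", i, " of ", tries, " failed: ", conditionMessage(e))
        FALSE
      }
    )
    if (ok) return(invisible(destfile))
  }
  stop("Download failed after ", tries, " attempts.")
}
```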
