download package on windows behind proxy does not work (easily) #45

Closed
cderv opened this issue Nov 25, 2016 · 12 comments

@cderv
Contributor

commented Nov 25, 2016

Hi,

While trying to solve an issue with devtools::install_github (r-lib/devtools#1403), I found that the remotes package is probably what comes next for some functions in devtools, so I tried it.

However, I ran into a problem because I am on Windows and behind a proxy. Trying to install a GitHub package throws an error:

remotes::install_github("hadley/lubridate")
#> Error in utils::download.file(url, path, method = download_method(), quiet = quiet,  :  
#> cannot download all files

and can even crash the R session if the proxy is not configured correctly.

This issue is clearly proxy-related.

Below is what I did to make it work. See the Suggestion section at the end.

About proxy configuration

Proxy configuration is not easy, but I manage to deal with it most of the time. The article Configuring R to Use an HTTP or HTTPS Proxy by the RStudio support team helps.

However, the easiest way on Windows is to rely on the wininet method, which is the default for download.file. From the Windows help page:

If method = "auto" is chosen (the default), on Windows the "wininet" method is used apart from for ftps:// URLs where "libcurl" is tried. The "wininet" method uses the WinINet functions (part of the OS).

wininet handles the proxy through the system's 'Internet Options'. It is used even if libcurl is available (capabilities("libcurl") is TRUE).

In remotes, remotes:::download calls remotes:::download_method, which picks libcurl if it is available and falls back to wininet only if it is not.

At first, the default behavior threw an error. With some effort I managed to make it work by configuring the proxy differently.

  • The approach described in Configuring R to Use an HTTP or HTTPS Proxy no longer works, as it is correct for the "internal" method only.
  • For libcurl, the form to use is http[s]://[user:password@]machine[:port]. With that, the downloading part of the remotes package now works correctly.
  • For the httr package, I use set_config and use_proxy to make it work (needed for devtools, as it relies on httr), but I think the http_proxy environment variable also works.
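
As an illustration of the settings listed above, here is a minimal sketch that builds a proxy URL in the http[s]://[user:password@]machine[:port] form, exports it as environment variables for libcurl, and configures httr separately. The helper name `build_proxy_url`, the host `proxy.example.com`, the port and the credentials are all hypothetical placeholders, not something remotes or httr provides. It also assumes the credentials contain no characters that need percent-encoding.

```r
# Hypothetical helper (not part of remotes or httr): assemble a proxy URL
# in the form http[s]://[user:password@]machine[:port].
build_proxy_url <- function(host, port = NULL, user = NULL, password = NULL,
                            scheme = "http") {
  auth <- ""
  if (!is.null(user)) {
    auth <- user
    if (!is.null(password)) auth <- paste0(auth, ":", password)
    auth <- paste0(auth, "@")
  }
  port_part <- if (is.null(port)) "" else paste0(":", port)
  paste0(scheme, "://", auth, host, port_part)
}

# Placeholder values: replace them with your own proxy settings.
proxy <- build_proxy_url("proxy.example.com", port = 8080,
                         user = "user", password = "password")

# libcurl picks the proxy up from these environment variables.
Sys.setenv(http_proxy = proxy, https_proxy = proxy)

# httr is configured separately, if it is installed.
if (requireNamespace("httr", quietly = TRUE)) {
  httr::set_config(httr::use_proxy("proxy.example.com", port = 8080,
                                   username = "user", password = "password"))
}
```

Putting the `Sys.setenv` call (or the equivalent lines in `.Renviron`) at session start keeps the password out of individual scripts.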

Suggestion

In the end, I found how to deal with the issue. However, I think it would be useful to clarify all of this proxy configuration:

  • By making wininet the default on Windows, tested before capabilities("libcurl") in remotes:::download_method
  • If libcurl stays the default, by adding some information in a vignette or in a help file to help those behind a proxy, and maybe trying to prevent errors or crashes.
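
The first suggestion amounts to swapping two checks. The sketch below is not the actual source of remotes:::download_method; it is a simplified illustration in which `os_type` and `has_libcurl` are passed as arguments (an assumption made here so both orderings can be compared on any machine), and the `"auto"` fallback is likewise an assumption.

```r
# Current ordering as described in this issue: libcurl first, then wininet.
method_current <- function(os_type = .Platform$OS.type,
                           has_libcurl = isTRUE(unname(capabilities("libcurl")))) {
  if (has_libcurl) {
    "libcurl"
  } else if (os_type == "windows") {
    "wininet"
  } else {
    "auto"  # assumed fallback for this sketch
  }
}

# Suggested ordering: test for Windows before testing for libcurl.
method_suggested <- function(os_type = .Platform$OS.type,
                             has_libcurl = isTRUE(unname(capabilities("libcurl")))) {
  if (os_type == "windows") {
    "wininet"
  } else if (has_libcurl) {
    "libcurl"
  } else {
    "auto"  # assumed fallback for this sketch
  }
}

method_current("windows", has_libcurl = TRUE)    # "libcurl"
method_suggested("windows", has_libcurl = TRUE)  # "wininet"
```

On a Windows build with libcurl available, only the second ordering ends up using the system proxy settings through wininet.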

What do you think of all this? I am willing to help if you are interested, as I was planning to write something about all this configuration.

@gaborcsardi

Member

commented Nov 25, 2016

This is great, thanks much for investigating! I will have more time to look at it later, for now three quick comments:

  • it would be great to make proxies work on every platform. (just a general remark, not for you personally :)
  • making wininet the default is problematic AFAIR, because it does not always support HTTPS. But I need to double check this.
  • it might be worth looking at the downloader package and how it does its job: https://github.com/cran/downloader
@cderv

Contributor Author

commented Nov 25, 2016

Thanks for the encouragement !

  • I could not agree more. I just have the impression that only windows users have problems with proxy. I was surely misled :) Do not know what is the most cross-platform solution.
  • In the (Windows?) help file, it is explained that wininet is used for all but ftps:// URLs. But we should check this.

If method = "auto" is chosen (the default), on Windows the "wininet" method is used apart from for ftps:// URLs where "libcurl" is tried. The "wininet" method uses the WinINet functions (part of the OS).

  • What I meant by the default is testing if (os_type() == "windows") before if (isTRUE(unname(capabilities("libcurl")))). It is currently the other way around in remotes:::download_method

  • I will look into the downloader package. I had spotted this package but have not found time to use it yet. I will try it with my use case and keep you updated (here?)

@gaborcsardi

Member

commented Nov 25, 2016

I could not agree more. I just have the impression that only windows users have problems with proxy. I was surely misled :) Do not know what is the most cross-platform solution.

Oh, I don't think there is one. We would probably need a (slightly?) different solution for each platform

What I meant by the default is testing if (os_type() == "windows") before if (isTRUE(unname(capabilities("libcurl")))). It is currently the other way around in remotes:::download_method

Yes, and I think this is because wininet does not support HTTPS on older windows versions. So we might need to check the windows version as well. Anyway, we need to investigate this. I am pretty sure that just using wininet did not work. Maybe downloader or git log has some explanation.

@cderv

Contributor Author

commented Nov 25, 2016

From the help file of 'download' function in downloader package: 

This function also should follow http redirects on all platforms, which is something that does not happen by default when curl is used, as on Mac OS X.
With Windows, it either uses the "wininet" method (for R 3.2) or uses the "internal" method after first ensuring that setInternet2, is active (which tells R to use the internet2.dll).
On other platforms, it will try to use libcurl, wget, then curl, and then lynx to download the file. R 3.2 will typically have the libcurl method and for previous versions of R Linux platforms will have wget installed, and Mac OS X will have curl.
Note that for many (perhaps most) types of files, you will want to use mode="wb" so that the file is downloaded in binary mode

If we look at the function, it behaves as follows, in a kind of pseudocode:

If https url then
	If windows platform then
		If rversion >= 3.2 then
			Use wininet method
		Else
			setInternet2 and use internal method
	Else if other platform then
		If rversion >= 3.2 and libcurl available then
			Use libcurl method
		Else if wget available then
			Use wget method
		Else if curl available then
			Use curl method adding -L option
		Else if lynx available then
			Use lynx method
		Else stop
	Download file with the selected method
Else  # other than https url
	Use default or setting provided as args for utils::download.file function.
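
The pseudocode above could be sketched in R roughly as follows. This is not downloader's actual source: the function name `choose_method` and its arguments are hypothetical, and the platform, R version and tool availability are passed in explicitly (an assumption of this sketch) so the selection can be checked without touching the network.

```r
# Sketch of the method selection in the pseudocode above.
# Returns NULL when the choice is left to utils::download.file.
choose_method <- function(url,
                          os_type = .Platform$OS.type,
                          r_version = getRversion(),
                          has_libcurl = isTRUE(unname(capabilities("libcurl"))),
                          has_tool = function(x) nzchar(Sys.which(x)[[1]])) {
  if (!grepl("^https://", url)) {
    return(NULL)  # not https: keep the download.file default
  }
  is_r32 <- as.package_version(r_version) >= "3.2.0"
  if (os_type == "windows") {
    # Before R 3.2, "internal" additionally needs setInternet2(TRUE).
    return(if (is_r32) "wininet" else "internal")
  }
  if (is_r32 && has_libcurl) return("libcurl")
  if (has_tool("wget")) return("wget")
  if (has_tool("curl")) return("curl")  # needs the -L option to follow redirects
  if (has_tool("lynx")) return("lynx")
  stop("no download method found")
}

choose_method("https://github.com/", os_type = "windows", r_version = "3.3.2")
# "wininet"
choose_method("https://github.com/", os_type = "unix", r_version = "3.3.2",
              has_libcurl = TRUE)
# "libcurl"
```

The result would then be passed as the `method` argument of `utils::download.file`, or omitted when the function returns `NULL`.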

So wininet seems compatible with https from R 3.2 onward, and the Windows platform is tested before libcurl availability. It seems a good solution.

Regarding proxy configuration, with this solution the Windows methods seem to use the IE system settings and not environment variables. However, it relies on the download.file default for other types of URL.

I think [user:pwd@]machine[:port] is still the common form for the environment variable, and it works with all methods.

@gaborcsardi

Member

commented Nov 26, 2016

Thanks! I don't mind making wininet the default and basically doing the same algorithm in remotes. Would you like to submit a PR?

As for proxy environment variables, we are simply using download.file, so we can just point to the relevant section of its manual page.

@cderv

Contributor Author

commented Nov 26, 2016

I'll be happy to work on a PR.

Do you think we need to apply the same algorithm only to https URLs, as downloader does, or to all URLs, as is done now? It just means passing the url argument to the remotes:::download_method function, which is not the case today.
We could discuss it in the PR.

@gaborcsardi

Member

commented Nov 26, 2016

I think for https only.

@gaborcsardi

Member

commented Nov 26, 2016

Hmmm, actually, maybe all urls.

@cderv

Contributor Author

commented Nov 26, 2016

OK I will work on something for all urls.
In fact, I asked because I was wondering why downloader selects a method only for https, letting base download.file handle the rest.

@gaborcsardi

Member

commented Nov 26, 2016

@cderv

Contributor Author

commented Nov 26, 2016

I'll see. Thanks.
We'll continue to discuss this issue in the PR if needed.

@cderv

Contributor Author

commented Nov 26, 2016

I decided to dig into the methods of R's download.file function to understand how it evolved over recent R versions and what we should do now.

I compiled my research and understanding in this gist.

My conclusions:

  1. For R >= 3.3.0:
    • on all builds, the default download.file methods handle https with nothing else to be done
      • wininet on Windows
      • libcurl on other platforms
  2. For 3.2.5 <= R < 3.3.0:
    • Windows handles https by default with wininet
    • and so do some other builds, with libcurl
  3. For 3.2.0 <= R < 3.2.5: if the URL is https, the default must be changed manually
    • on Windows, method should be set to "wininet" (or "internal" with setInternet2(TRUE))
    • on other platforms, "libcurl" should be used
  4. For R < 3.2.0:
    • https was not supported.
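
These conclusions could be condensed into a small helper along the following lines. This is only a sketch: the name `https_method` is hypothetical, and it simplifies cases 2 and 3 by always choosing the method explicitly for 3.2.0 <= R < 3.3.0, on the assumption that being explicit is harmless where the default would already work. On non-Windows platforms, "libcurl" additionally requires capabilities("libcurl") to be TRUE, which is not checked here.

```r
# Sketch: which `method` to pass to download.file() for an https URL,
# following the four cases listed above. Returns NA when https is unsupported.
https_method <- function(r_version = getRversion(),
                         os_type = .Platform$OS.type) {
  v <- as.package_version(r_version)
  if (v >= "3.3.0") {
    "auto"  # the defaults already handle https
  } else if (v >= "3.2.0") {
    # Simplification of cases 2 and 3: always set the method explicitly.
    if (os_type == "windows") "wininet" else "libcurl"
  } else {
    NA_character_  # https was not supported
  }
}

https_method("3.3.1", "windows")  # "auto"
https_method("3.2.3", "windows")  # "wininet"
https_method("3.2.3", "unix")     # "libcurl"
https_method("3.1.2", "unix")     # NA
```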

"wget" and "curl" can be used in specific cases if they are available on the system, with a specific extra option for "curl". I am not sure we have to deal with this choice. We could offer users the ability to provide the download method manually if needed.

So, depending on which R versions remotes wants to support, we can either rely only on the default options or modify the default for older versions. We could also offer a verbose option to inform users of which method is or is not used.

Based on that, I will try to make a first version of a PR.
If you have new thoughts or recommendations after looking at my document, tell me.
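
One way a manual override and a verbose message could look is sketched below. The helper `resolve_method` is hypothetical and not part of remotes. The "download.file.method" option is the base R option that download.file itself consults; having remotes honor it is an assumption of this sketch.

```r
# Hypothetical helper: use the method chosen by the user through the base R
# option "download.file.method" if set, otherwise fall back to `default`,
# and optionally report the choice.
resolve_method <- function(default = "auto", verbose = FALSE) {
  method <- getOption("download.file.method", default = default)
  if (verbose) message("Using download method: ", method)
  method
}

# A user behind a proxy on Windows could force wininet for the session:
old <- options(download.file.method = "wininet")
resolve_method(verbose = TRUE)
options(old)  # restore the previous setting
```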
Thanks!
