Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate RCurl dependency #172

Closed
hadley opened this issue Dec 4, 2014 · 15 comments
Closed

Eliminate RCurl dependency #172

hadley opened this issue Dec 4, 2014 · 15 comments

Comments

@hadley
Copy link
Member

hadley commented Dec 4, 2014

Follow https://github.com/jeroenooms/curl strategy to link to system curl.

Need to implement:

  • RCurl::listCurlOptions()
  • RCurl::getCurlOptionsConstants()
  • RCurl::getCurlOptionTypes(constants)
  • RCurl::curlVersion()$version
  • RCurl::curlVersion()$ssl_version)
  • RCurl::curlEscape() and RCurl::curlUnescape()
  • RCurl::getCurlInfo()
  • RCurl::getCurlHandle(cookiefile = cookie_path, .defaults = list())
  • RCurl::base64(hash)
  • RCurl::curlSetOpt()
  • RCurl::.postForm()
  • RCurl::curlPerform()
  • RCurl::fileUpload()
  • RCurl::CFILE()
  • RCurl::binaryBuffer()
@hadley
Copy link
Member Author

hadley commented Dec 4, 2014

  1. Create a new branch
  2. Copy over setup from curl.
  3. Add Rcpp dependency
  4. Port simple function like curlVersion()

@jeroen
Copy link
Member

jeroen commented Dec 4, 2014

I can help with this by implementing the required stuff in the curl package if you would like.

@hadley
Copy link
Member Author

hadley commented Dec 4, 2014

@jeroenooms hmmmmm, that'd be great actually

@jeroen
Copy link
Member

jeroen commented Dec 4, 2014

Do we really need all of those functions? It seems like RCurl is overcomplicating things. For example names(getCurlOptionsConstants()) gives you listCurlOptions()? And getCurlOptionTypes doesn't call libcurl at all so you can pull that into httr. We can probably reduce the curl interface to a few basic bindings to libcurl without the bloat and craziness, and move the high level stuff into httr.

@hadley
Copy link
Member Author

hadley commented Dec 4, 2014

@jeroenooms that was just a list of all the functions I currently use. I imagine a new interface would be considerably simpler

Ideally, I'd like to eliminate curlSetOpt altogether, and always supply a complete set of options each time you curlPerform(). Similarly, it would be great if .postForm() could be eliminated, instead just being a special case of curlPerform()

@hadley
Copy link
Member Author

hadley commented Dec 4, 2014

That might be a bit specific to my needs, but it would be useful to have an interface that sets options, does the request and always resets the handle on failure.

@jeroen
Copy link
Member

jeroen commented Dec 4, 2014

I don't understand .postForm does either. There is no such function in libcurl. Why do we need to reset handles? Creating a new one takes like 0.01s.

@hadley
Copy link
Member Author

hadley commented Dec 4, 2014

That might be the better solution (most of httr is working around RCurl's binding to curl, so I'm not surprised if there's many better ways of doing things). I think the main reason to re-use a handle is to retain the http connection pool and cookies

@jeroen
Copy link
Member

jeroen commented Dec 13, 2014

It turns out that getCurlOptions is quite difficult to implement. All of the curl options are C constants which get resolved during compilation, and hence do not exist at runtime. It seems like RCurl is doing quite some nasty hacking to try and extract options constants from the curl headers at compile time.

The more typical and reliable use is to hardcode a list of curl options (and their integer) in the client, in our case the R package. If the user tries sets an option which is not available (because his package was built against an old version of libcurl), then curl_easy_setopt will return CURLE_UNKNOWN_OPTION. In this case we could raise a warning in R that the given option is not supported.

Do you think that this would be sufficient, or do we really need support for listing available options in curl.h at runtime? See also this post.

@hadley
Copy link
Member Author

hadley commented Dec 13, 2014

That seems fine to me.

@hadley
Copy link
Member Author

hadley commented Dec 13, 2014

I should mention that I'd actually prefer to include libcurl sources in the package - then we'd always know exactly what version of curl we're binding against.

@jeroen
Copy link
Member

jeroen commented Dec 13, 2014

@hadley that is unfortunately not going to work. First of all libcurl depends on openssl, libz, libidn, libssh, all of which are a bitch to build. Also CRAN has policy against duplicating sources already available on major systems, for copyright and maintenance reasons I think.

But especially on linux this is really not done, and you should always build against libraries that ship with the OS. This is quite important for system administration. For example, if there is a critical security patch in openssl, the sysadmin should be able to update libssl on the system, which automatically fixes any software on the system using ssl. If all software would ship their own version of dependencies, servers would be a complete mess.

@hadley
Copy link
Member Author

hadley commented Dec 13, 2014

I didn't realise curl shipped with most oses. I'm mostly concerned about the windows cran versions which seem to be very up to date - but you're dealing with windows separately, right?

@jeroen
Copy link
Member

jeroen commented Dec 13, 2014

Yes I'm taking care of windows.

@wibeasley
Copy link
Contributor

I saw this announced for the future R 3.2.0 release. Will this help avoid the dependency on RCurl?

Optional use of libcurl (version 7.28.0 from Oct 2012 or later) for Internet access:

  • capabilities("libcurl") reports if this is available.
  • libcurlVersion() reports the version in use, and other details of the "libcurl" build including which URL schemes it supports.
  • curlGetHeaders() retrieves the headers for http://, https://, ftp:// and ftps:// URLs: analysis of these headers can provide insights into the ‘existence’ of a URL (it might for example be permanently redirected) and is so used in R CMD check --as-cran.
  • download.file() has a new optional method "libcurl" which will handle more URL schemes, follow redirections, and allows simultaneous downloads of multiple URLs.
  • url() has a new method "libcurl" which handles more URL schemes and follows redirections. The default method is controlled by a new option url.method, which applies also to the opening of URLs via file() (which happens implicitly in functions such as read.table.)
  • When file() or url() is invoked with a https:// or ftps:// URL which the current method cannot handle, it switches to a suitable method if one is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants