Default user_agent: consistency across packages #653

Closed
dmi3kno opened this issue Jun 9, 2020 · 2 comments

@dmi3kno
Contributor

dmi3kno commented Jun 9, 2020

TL;DR: I am building polite web scraping tools on top of httr/curl. I would like httr to use the same default user agent string as curl (which I can then override in my package).


Below is a quick survey of the default user agent strings used by various web-facing packages.

The libcurl C library does not seem to have a default user agent: CURLOPT_USERAGENT is not set by default. Consequently, the default user agent in RCurl is NULL, and it can only be set in curlOptions() or directly on the handle. [1]

This is why httr introduced user_agent():

#' Override the default RCurl user agent of `NULL`

The curl package uses the "HTTPUserAgent" option as its default UA unless it is changed with curl::handle_setopt():

# in terminal
getOption("HTTPUserAgent")
#> [1] "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"

# in RStudio
getOption("HTTPUserAgent")
#> [1] "RStudio Desktop (1.3.959); R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"

The same is also used by download.file() in base R:

# reproducible help example
library(magrittr)
h <- help("download.file") %>% 
 utils:::.getHelpFile() %>% 
 {capture.output(tools:::Rd2txt(.))} 
  
s <- grep("headers: ", h) 
h[0:3+s]
#> [1] " headers: named character vector of HTTP headers to use in HTTP"        
#> [2] "          requests.  It is ignored for non-HTTP URLs.  The ‘User-Agent’"
#> [3] "          header, coming from the ‘HTTPUserAgent’ option (see"          
#> [4] "          ‘options’) is used as the first header, automatically."   

The same convention is followed by RStudio in rstudio/r-builds#30.

Currently httr does not use the HTTPUserAgent option. Instead it pastes the version numbers of libcurl, curl and httr into a single string:

httr/R/config.r

Line 142 in af25ebd

cache$default_ua <- paste0(names(versions), "/", versions, collapse = " ")

Therefore, on my machine httr has this as its default UA:

# internal function
httr:::default_ua()
#> [1] "libcurl/7.58.0 r-curl/4.3 httr/1.4.1"

Could we please prepend the content of getOption("HTTPUserAgent") to the current library versions? On my system it would then read:

RStudio Desktop (1.3.959); R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu); libcurl/7.58.0 r-curl/4.3 httr/1.4.1

This would allow the default user agent to be personalized in .Rprofile and would introduce consistency across modern web-accessing packages in R.
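To make the request concrete, here is a minimal sketch of the composition I have in mind. `compose_ua()` is a hypothetical helper, not an existing httr function, and the version numbers are hard-coded from my machine rather than looked up.

```r
# Hypothetical helper, not part of httr: prepend the HTTPUserAgent option
# to a "name/version" string built the same way as in httr's config.r.
compose_ua <- function(versions) {
  libs <- paste0(names(versions), "/", versions, collapse = " ")
  base <- getOption("HTTPUserAgent")
  # Fall back to the library versions alone if the option is unset or empty.
  if (is.null(base) || !nzchar(base)) {
    return(libs)
  }
  paste(base, libs, sep = "; ")
}

# Version numbers copied from the example above; a real implementation
# would look them up instead of hard-coding them.
versions <- c(libcurl = "7.58.0", `r-curl` = "4.3", httr = "1.4.1")
compose_ua(versions)
```

With the option unset, this falls back to the string httr produces today, so the change would only be visible to users who have HTTPUserAgent set.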

[1] Other connections in base R are a lost cause, though, because they use a mix of interfaces on different platforms, each with its own defaults. On Linux, at least, the user agent is populated with the libcurl version number.

@dmi3kno
Contributor Author

dmi3kno commented Jun 9, 2020

The current system of caching the default user agent might be a little problematic as well. I see why you would want to cache the (potentially expensive) lookup of version numbers, but since the user agent is always available via getOption("HTTPUserAgent"), why don't we store only the concatenated version string and combine it on the fly with the content of the HTTPUserAgent option?
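Roughly, I imagine something like the sketch below. The names `ua_cache`, `default_ua2()` and `lookup_versions()` are made up for illustration and do not exist in httr; the point is only that the version string is computed once while the option is read on every call.

```r
# Hypothetical sketch: cache the version string, read the option at call time.
ua_cache <- new.env(parent = emptyenv())

default_ua2 <- function(versions_fun) {
  # Compute the version string once and remember it.
  if (is.null(ua_cache$libs)) {
    versions <- versions_fun()
    ua_cache$libs <- paste0(names(versions), "/", versions, collapse = " ")
  }
  # Read the option on every call so late changes are picked up.
  base <- getOption("HTTPUserAgent")
  if (is.null(base)) ua_cache$libs else paste(base, ua_cache$libs, sep = "; ")
}

# Stand-in for the (potentially expensive) version lookup.
lookup_versions <- function() {
  c(libcurl = "7.58.0", `r-curl` = "4.3", httr = "1.4.1")
}

old <- options(HTTPUserAgent = "R (first)")
default_ua2(lookup_versions)
options(HTTPUserAgent = "polite bot")
default_ua2(lookup_versions)  # reflects the new option; versions are not recomputed
options(old)
```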

What I hope to accomplish with polite is running a session with a temporary, "exceptionally polite", human-readable user agent string. To achieve this I will need to either pass a custom header (which I may not be able to do) or temporarily swap the value of the option (like withr does). Therefore I would love the content of the HTTPUserAgent option not to be cached, so that last-minute updates at call time are possible.
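For reference, the temporary swap could look something like this in base R. `with_user_agent()` is a hypothetical helper in the spirit of withr::with_options(), and the user agent string is a placeholder. It only helps if the package making the request reads the option at call time rather than from a cache.

```r
# Hypothetical helper: set HTTPUserAgent for the duration of `code`,
# then restore the previous value (even if `code` throws an error).
with_user_agent <- function(ua, code) {
  old <- options(HTTPUserAgent = ua)
  on.exit(options(old), add = TRUE)
  # `code` is a lazy promise, so it is evaluated here, after the swap.
  force(code)
}

before <- getOption("HTTPUserAgent")
inside <- with_user_agent(
  "polite bot (contact: me@example.com)",
  getOption("HTTPUserAgent")
)
inside                                         # the temporary user agent
identical(getOption("HTTPUserAgent"), before)  # the option is restored afterwards
```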

@hadley
Member

hadley commented Oct 31, 2023

httr has been superseded in favour of httr2, so is no longer under active development. If this problem is still important to you in httr2, I'd suggest filing an issue over there 😄. Thanks for using httr!

hadley closed this as completed Oct 31, 2023