Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httr2 #1

Closed
hadley opened this issue Apr 1, 2016 · 17 comments
Closed

httr2 #1

hadley opened this issue Apr 1, 2016 · 17 comments

Comments

@hadley
Copy link
Member

hadley commented Apr 1, 2016

httr is a library for two main tasks: creating http requests and parsing the responses. Currently this dichotomoy is a little muddled because:

  • There is no explicit request object - it is only created internally by
    the request functions. This tends to lead to large request functions that
    have many arguments passed in ...
  • There's no consistent naming scheme that indicates whether a function
    works with a request or a response. This is particuarly confusing for
    functions like content_type(): does it set the content type of
    the request or extract the content type of the response?

Additionally, httr was designed prior to the pipe and so uses ... rather than functions that modify an object. This makes the API feel rather different to similar APIs (e.g. rvest). It also makes it harder to test, because you can only easily test the result of issuing a request, not the internal request object. Overall the API feels a little dated, and a little underdesigned.

Request API

Basics

req("http://url.com/") %>% req_fetch()

# req_fetch could be a generic so you could still do
req_fetch("http://url.com/")

Performing the request

The HTTP method doesn't affect the input arguments or the output type, so that suggests that the key API verb should be the output, rather than the HTTP method:

req("http://url.com") %>% req_fetch()
req("http://url.com") %>% req_async()
req("http://url.com") %>% req_save(path = path)
req("http://url.com") %>% req_stream(fun = fun)

The request verb would default to GET, unless a body was set, in which case it would change to POST. Otherwise, you could override yourself in two ways:

req("http://url.com") %>% req_method("DELETE") %>% req_fetch()
req("http://url.com") %>% req_fetch("DELETE")

The first form would be most useful when generating partial requests for API wrappers.

Url

# Would generally make smaller functions that took multiple arguments
# This would make iterative url construction much nicer for APIs
req("http://url.com/") %>% 
  req_path("a", username, "b") %>%  # replaces
  req_path_suffix("y") %>%  # appends
  req_path_prefix("x") %>%  # for completeness
  req_query(y = 1) %>% 
  req_params() %>% 
  req_fetch()

# req would need to parse the url so you could start with
req("http://url.com/api/v1/?q=2")

Body

# This would also lead to a nicer API for bodies
req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_fetch() 

# Setting body would change default request type, but you could
# override
req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_fetch("PUT") 

req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_save(path = "~/Desktop/bigfile.blah") 
  # If path is directory, automatically add name from url?

# req_body_file()
# req_body_form()
# req_body_json()
# req_body_multipart()
# req_body_raw()

Authentication

req("http://url.com/") %>% req_auth_basic()
req("http://url.com/") %>% req_auth_oauth1()
req("http://url.com/") %>% req_auth_oauth2()

Headers

req("http://url.com/") %>% 
  req_header(`Content-type` = "application/json") %>% 
  req_header(a_list_from_somewhere_else) # would unlist() inputs as needed.
# list(...) %>% flatten() %>% map_chr(as.character())

Curl

# And there would be a new function for setting specific curl requests
# These would be applied after other request parameters
req_config()

Response API

Basics

resp <- req("http://google.com") %>% req_fetch()

# Pipeable API to check that the response is what you expect
resp %>% 
  resp_check_ok() %>% 
  resp_check_body_xml() %>% 
  resp_content_xml()

# Other functions extract headers
resp %>% resp_headers()
resp %>% resp_content_type()

# And other http components
resp %>% resp_status()
resp %>% resp_url()
resp %>% resp_timings()

Content

resp %>% resp_content_raw()
resp %>% resp_content_text(encoding = "UTF-8")

# Would rely on user to check content type using helper from above
resp %>% resp_content_json()
resp %>% resp_content_xml()
resp %>% resp_content_html()
resp %>% resp_content_png()
resp %>% resp_content_jpeg()

Would not have resp_content_auto() because I think time has shown that this is a bad idea.

New package?

Should be a new package, httr2?

Pros:

  • Can start from scratch without having to work with existing API
  • Could aim for high unit test coverage from the beginning.
  • Documentation will be less confusing because it doesn't have to describe
    two APIs.

Cons:

  • May re-introduce bugs because I miss important logic
  • Will have to maintain two packages in the short term (in the long term
    would deprecate httr).

I think the pros probably outweigh the cons - the API will a sufficiently large change that it's worth starting from scratch.

@craigcitro, @jeroenooms, @jennybc I'd love your thoughts on this, if you have a little spare time.

@hadley
Copy link
Member Author

hadley commented May 22, 2016

(Also note that I probably won't work on this for a while since I have higher priority stuff that I should be working, but this might happen next time I have a couple of days downtime and a good internet connection)

@jeroen
Copy link
Member

jeroen commented May 23, 2016

It has been on my to do list to experiment with async libcurl bindings in order to support many concurrent requests. Something along the lines of jeroen/curl#51 but perhaps a bit simpler, like promises in JavaScript.

# New scheduler
pool <- multi_handle()

# Add requests with callback, returns immediately
req("http://url.com") %>% schedule(pool) %>% then(function(x){
  print("Done: ", x$status)
})

req("http://url.com") %>% schedule(pool) %>% then(function(x){
  print("Another one: ", x$status)
})

# This would block until all request are finished
complete(pool)

@jcheng5 and me briefly chatted about using the R event loop rather than something like complete and never block. But this might not be reliable on windows.

Only vaguely related to the above :)

@craigcitro
Copy link

More to say later, but one quick thought to throw out there:

I do think it'd be a bummer to no longer have a simple GET('http://some.url/') call for interactive use. There's a fine line to walk here -- I much prefer the explicit composability described above, especially when it comes to testing, but it's often useful to be able to quickly grab a page to use for examples or while experimenting.

I'm not sure what the right answer is here -- I chatted about this with @hadley who was also on the fence. What do other folks think?

@jennybc
Copy link
Member

jennybc commented May 25, 2016

Should be a new package, httr2?

It feels like that would be easier on everyone. Maybe you should name first packages foo0, so you can just drop the zero on the second pass :)

There's no consistent naming scheme that indicates whether a function works with a request or a response.
... large request functions that have many arguments passed in ...

This rings true. I have wondered whether it's best to build the pieces first (e.g. the URL), then pass to the verb. Or just do everything at once and write a monster VERB() call. The pipe-based design clears some of that up. I have pieces of the request that are conditionally included (e.g. token if URL matches some regex, progress if interactive). How do you express that? BTW progress isn't mentioned in the request API notes.

I do think it'd be a bummer to no longer have a simple GET('http://some.url/') call for interactive use.

I agree. And maybe not just for interactive use. When you're reading API docs, it's always in terms of GET, POST, etc. so it is nice to have those same verbs.

@hadley
Copy link
Member Author

hadley commented May 25, 2016

@jennybc my main worry about also having a simpler GET(), POST() etc set of functions is that it's going to be qplot() all over again - yes, it's easier to get started with, and it would be easier to copy examples from the web, but as soon as you want to do anything complicated, you'll want to switch to the full system.

(And progess would be something like req(url) %>% req_show_progress() %>% req_fetch())

@jcheng5
Copy link
Member

jcheng5 commented May 25, 2016

What if req_fetch can take a string? (Or really, any of the functions that operate on a request could take a string instead.)

@hadley
Copy link
Member Author

hadley commented May 25, 2016

@jcheng5 you mean like this example req_fetch("http://url.com/") 😛

You could certainly start any chain with a string instead of a request, but I think you'd still need to explicitly send the request with req_fetch() etc.

@jennybc
Copy link
Member

jennybc commented May 25, 2016

Will the design of .httr-oauth stay the same?

I know you are generally anti-options() but what about allowing us to set user agent package-wide with an option?

@jcheng5
Copy link
Member

jcheng5 commented May 25, 2016

@hadley Reading comprehension FTW

@craigcitro If req_fetch(url) works, do you still need GET(url)? Or you're thinking that GET would take options as parameters?

@hadley
Copy link
Member Author

hadley commented May 25, 2016

@jennybc Yes, .httr-oauth is likely to stay the same (and you'd be able to use both httr and httr2 with the same file)

I'd really prefer not to use options, but I think it won't be as important since with the pipeable syntax, you'd just create a basic prototype request that every function in your package would customise.

@jeroen
Copy link
Member

jeroen commented Jun 6, 2016

Wrote a first working version of fully async requests in the ‘curl’ package. See the ?multi help page and examples. Maybe this helps rethinking the httr api?

@jennybc
Copy link
Member

jennybc commented Jan 5, 2017

Re "Performing the request", it would be nice if the Request API offered a way to specify the behaviour available currently via RETRY().

@hadley hadley changed the title API improvements httr2 Jul 24, 2017
@hadley
Copy link
Member Author

hadley commented Jul 24, 2017

See also notes from @hrbrmstr at #396

@hadley
Copy link
Member Author

hadley commented Jul 24, 2017

Also need to break up OAuth apis into much smaller pieces as in #432

@hadley
Copy link
Member Author

hadley commented Jul 24, 2017

Need API to retrieve all requests performed when a sequence of requests are performed automatically (e.g. during redirects, #445)

@jennybc
Copy link
Member

jennybc commented Nov 20, 2017

Current logic re: adding token cache file to .Rbuildignore and .gitignore only looks at directory of cache file. usethis now has a notion of a project. Would be good to use something similar to identify if there's a relevant .Rbuildignore or .gitignore in some parent directory of the cache file.

@hadley hadley transferred this issue from r-lib/httr Nov 22, 2018
@hadley
Copy link
Member Author

hadley commented May 26, 2021

I think the major pieces are either now in place or tracked in issues, so I'm going to close this mega issue 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants