Add helper to perform a request in chunks #303

Closed · mgirlich opened this issue Sep 4, 2023 · 5 comments · Fixed by #361
Labels: feature (a feature request or enhancement)
Milestone: v0.3.0

@mgirlich (Collaborator) commented Sep 4, 2023

I frequently use such a helper when working with REST APIs:

  • to insert/update/delete many objects
  • to get information on many objects, where the endpoint actually uses POST because it takes a body

Usually there is a limit on the number of elements that can be updated/requested, so the requests have to be chunked.
This is somewhat similar to req_paginate(), but the user has to do the "pagination" themselves.

Below is an example of an implementation I use:

# Note: `api_perform()` and `with_indexed_errors()` are the author's own
# wrappers (not shown here): they perform the request and rethrow errors
# annotated with the index of the failing chunk.
api_perform_in_chunks <- function(req,
                                  x,
                                  f_body,
                                  chunk_size,
                                  progress = NULL,
                                  error_call = caller_env()) {
  chunks <- vec_chop_by_size(x, chunk_size)
  with_indexed_errors(
    purrr::map(
      chunks,
      \(chunk) {
        req <- httr2::req_body_raw(
          req,
          jsonlite::toJSON(f_body(chunk))
        )
        api_perform(req)
      },
      .progress = progress
    ),
    message = "In chunk {.field {cnd$location}}.",
    error_call = error_call
  )
}

# Split `x` into chunks of at most `size` elements.
vec_chop_by_size <- function(x, size) {
  n <- vctrs::vec_size(x)
  sizes <- vctrs::vec_rep(size, n %/% size)
  if (n %% size != 0) {
    sizes <- c(sizes, n %% size)
  }
  sizes <- vctrs::vec_cast(sizes, integer())

  vec_partition(x, sizes)
}

# Split `x` into consecutive pieces whose lengths are given by `sizes`.
vec_partition <- function(x, sizes) {
  size <- vctrs::vec_size(x)

  if (size != sum(sizes)) {
    abort("`sizes` must add up to size of `x`.")
  }

  starts <- c(1L, sizes)
  starts <- cumsum(starts)[-length(starts)]
  starts <- starts - 1L # 0-based

  # vec_chop_seq() is internal to vctrs (hence the `:::`).
  vctrs:::vec_chop_seq(x, starts, sizes)
}
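
For illustration, here is a hypothetical call (mine, not from the issue): the endpoint URL, the `ids` vector, and the body shape are made up, and `api_perform()` is still the author's own wrapper.

req <- httr2::request("https://api.example.com/objects/update")
ids <- 1:250

resps <- api_perform_in_chunks(
  req,
  x = ids,
  # Build the request body from one chunk of ids.
  f_body = \(chunk) list(ids = chunk),
  # At most 100 elements per request.
  chunk_size = 100,
  progress = TRUE
)
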
@hadley added the "feature" (a feature request or enhancement) label Sep 28, 2023
@hadley (Member) commented Oct 23, 2023

With the API in #353, I'm having a hard time seeing where httr2 could help out. This mostly looks pretty nice to me:

library(httr2)
library(magrittr) # for %>%

req <- request("https://api.restful-api.dev/objects")
ids <- sort(sample(100, 50))
chunks <- split(ids, (seq_along(ids) - 1) %/% (length(ids) / 10))
reqs <- chunks %>% lapply(\(idx) req %>% req_url_query(id = idx, .multi = "comma"))
resps <- reqs %>% req_perform_parallel()
# resps_combine() is from the draft API in #353; in released httr2 see resps_data().
resps_combine(resps, \(resp) resp_body_json(resp))

The only bit that feels like it wants a wrapper is the splitting, but it's hard to see how that wrapper would fit into httr2's mission.
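
For reference, a minimal base-R sketch (mine, not from the comment) of that splitting step, where `size` is the maximum number of elements per chunk:

chunk_by_size <- function(x, size) {
  # ceiling(seq_along(x) / size) yields group ids 1, 1, ..., 2, 2, ...,
  # so every chunk has at most `size` elements.
  split(x, ceiling(seq_along(x) / size))
}

chunk_by_size(1:10, 3) # four chunks: 1:3, 4:6, 7:9, 10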

@mgirlich (Collaborator, Author) commented:

Yes, that's right. Since this is a very common pattern, maybe we should at least mention it in a vignette and use your reprex as an example in the documentation of some of the functions? The splitting at least feels non-trivial to get right.

@mgirlich (Collaborator, Author) commented:

But if we recommend using req_perform_parallel(), we would need to fix some of its issues, e.g. throttling (which is quite likely to be hit with bigger data); handling expired OAuth tokens would also be nice. So we either try to fix those or add req_perform_sequential(). What do you think?

@hadley (Member) commented Oct 24, 2023

Something like this would almost work with req_perform_iteratively():

iterate_with_data <- function(data, chunk_size, req_update) {
  # Split `data` into chunks of at most `chunk_size` elements.
  chunks <- split(data, (seq_along(data) - 1) %/% chunk_size)

  i <- 1
  function(resp, req) {
    if (i <= length(chunks)) {
      req <- req_update(req, chunks[[i]])
      i <<- i + 1
      req
    }
  }
}

But to make it actually work, we'd need to give req_perform_iteratively() some way to also call it on the first request, either an argument to req_perform_iteratively(), or some attribute we attach to the function.
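
A hedged usage sketch (mine, not from the comment) that also illustrates the gap described above: the bare `req`, with no chunk applied yet, is what req_perform_iteratively() performs first.

next_req <- iterate_with_data(
  ids,
  chunk_size = 10,
  req_update = \(req, chunk) req_url_query(req, id = chunk, .multi = "comma")
)

# The first performed request is `req` itself, without any ids; only the
# follow-up requests produced by `next_req` carry a chunk.
resps <- req_perform_iteratively(req, next_req = next_req, max_reqs = Inf)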

@hadley (Member) commented Oct 24, 2023

One possible idea would be to allow you to use req = NULL in req_perform_iteratively().
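
In code, the idea might look like this (hypothetical; `req = NULL` is not something req_perform_iteratively() currently accepts, it is just the proposal above):

# Hypothetical API: with req = NULL, next_req() would also be asked to
# produce the very first request, so no chunk needs special-casing.
resps <- req_perform_iteratively(req = NULL, next_req = next_req)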

@hadley added this to the v0.3.0 milestone Oct 26, 2023
hadley added a commit that referenced this issue Oct 26, 2023
hadley added a commit that referenced this issue Oct 30, 2023
Fixes #303. Fixes #310

This has a very similar interface to `req_perform_parallel()` but because it just does one request after another, it doesn't have any of the limitations of that function.
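
For completeness, a hedged sketch (mine, not from the PR) of the chunked pattern with req_perform_sequential(), adapted from the example earlier in the thread; check the released httr2 documentation for the exact signatures:

library(httr2)

req <- request("https://api.restful-api.dev/objects")
ids <- sort(sample(100, 50))

# Chunks of at most 10 ids each.
chunks <- split(ids, (seq_along(ids) - 1) %/% 10)

reqs <- lapply(chunks, \(idx) req_url_query(req, id = idx, .multi = "comma"))

# One request after another, so per-request policies such as throttling
# and retries still apply.
resps <- req_perform_sequential(reqs, progress = TRUE)
resps_data(resps, resp_body_json)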