Cannot upload large files with req_body_file() #524

@zacdav-db

I maintain brickster, which uses httr2 for all requests.

A user on httr2 1.0.3 recently ran into issues (databrickslabs/brickster#63) uploading files.

The change to req_body_file() behaviour in #489 looks to be related.

remotes::install_github(repo = "r-lib/httr2", ref = "ff16551") # before change, works
remotes::install_github(repo = "r-lib/httr2", ref = "bdb13fe") # after change, fails

The requisite code in brickster is here

I've been unable to create a minimal repro that doesn't require Databricks credentials.

However, I do have the following examples of the behaviour, with these observations:

  • If the file is small enough (< ~60 KB), the upload works every time
  • If the file is between ~60 KB and ~67 KB, the upload sometimes works
  • If the file is > ~67 KB, I have not seen a single upload succeed
  • A previous version of httr2 (1.0.1) successfully uploads gigabytes to the same endpoint
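
To probe the size thresholds above without touching the network, the test file can be generated at several row counts and measured locally. This is a minimal sketch: `csv_size()` is a hypothetical helper (not part of httr2 or brickster), and the row counts are only guesses based on the two runs in this report, where 10 rows produced a 602-byte body and 1500 rows produced 89752 bytes.

```r
# Hypothetical helper: write `rows` sampled rows of mtcars to a CSV
# and return the resulting file size in bytes.
csv_size <- function(rows, path = tempfile(fileext = ".csv")) {
  idx <- sample.int(nrow(mtcars), rows, replace = TRUE)
  write.csv(mtcars[idx, ], path)
  file.size(path)
}

set.seed(1)
# Row counts chosen (as an assumption) to bracket the ~60-67 KB range
rows <- c(10, 500, 1000, 1100, 1200, 1500)
sizes <- vapply(rows, csv_size, numeric(1))

# Table of row count against file size, to pick inputs for upload attempts
data.frame(rows = rows, bytes = sizes, kb = round(sizes / 1000, 1))
```

The row counts that land just below and just above the threshold can then be fed into the example request to see where uploads start failing.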

Example Request

# https://docs.databricks.com/api/workspace/files/upload

library(httr2)

rows <- 10

token <- brickster::db_token()
host <- brickster::db_host()
vpath <- "/Volumes/zacdav/default/test"

# create data for upload
dir <- tempdir()
fpath <- file.path(dir, "cars.csv")
write.csv(dplyr::sample_n(mtcars, rows, TRUE), fpath)

# destination path for the upload
vol_dest <- file.path(vpath, "cars.csv")

# building url
url <- list(
  scheme = "https",
  hostname = host,
  path = paste0("/api/", "2.0")
)

url <- httr2::url_build(url)

# building request

req <- httr2::request(base_url = url) |>
  httr2::req_url_path_append(paste0("fs/files", vol_dest)) |>
  httr2::req_url_query(overwrite = "true") |>
  httr2::req_auth_bearer_token(token = token) |>
  httr2::req_method("PUT") |>
  httr2::req_body_file(fpath) |>
  httr2::req_verbose()

req

req$body

resp <- req |>
  httr2::req_perform()
  
resp

Successful (10 rows)

> req
<httr2_request>
PUT https://***************.databricks.com/api/2.0/fs/files/Volumes/zacdav/default/test/cars.csv?overwrite=true
Headers:
• Authorization: '<REDACTED>'
Body: a string
Options:
• debugfunction: a function
• verbose: TRUE
> 
> req$body
$data
[1] "/var/folders/s5/c339rs154nq554zg1s6pkmyr0000gp/T//RtmpZWFFdN/cars.csv"
attr(,"class")
[1] "httr2_path"

$type
[1] "raw-file"

$content_type
[1] ""

$params
list()

> 
> resp <- req |>
+   httr2::req_perform(path = NULL)
-> PUT /api/2.0/fs/files/Volumes/zacdav/default/test/cars.csv?overwrite=true HTTP/2
-> Host: ***************.cloud.databricks.com
-> User-Agent: httr2/1.0.3 r-curl/5.2.2 libcurl/8.4.0
-> Accept: */*
-> Accept-Encoding: deflate, gzip
-> Authorization: <REDACTED>
-> Content-Length: 602
-> 
<- HTTP/2 204 
<- x-request-id: eeefa1ab-589a-471b-bb83-e3761c7182d6
<- x-operation-id: 734dc31278096e10
<- x-trace-id: 6feb2f18574dd3761b87edecb89ce3e6
<- date: Mon, 2 Sep 2024 09:50:02 GMT
<- x-databricks-org-id: ***************
<- vary: Accept-Encoding
<- strict-transport-security: max-age=31536000; includeSubDomains; preload
<- x-content-type-options: nosniff
<- server: databricks
<- 
>   
> resp
<httr2_response>
PUT https://***************.cloud.databricks.com/api/2.0/fs/files/Volumes/zacdav/default/test/cars.csv?overwrite=true
Status: 204 No Content
Body: None

Failure (1500 rows)

> req
<httr2_request>
PUT https://***************.cloud.databricks.com/api/2.0/fs/files/Volumes/zacdav/default/test/cars.csv?overwrite=true
Headers:
• Authorization: '<REDACTED>'
Body: a string
Options:
• debugfunction: a function
• verbose: TRUE
> req$body
$data
[1] "/var/folders/s5/c339rs154nq554zg1s6pkmyr0000gp/T//RtmpZWFFdN/cars.csv"
attr(,"class")
[1] "httr2_path"

$type
[1] "raw-file"

$content_type
[1] ""

$params
list()

> resp <- req |>
+   httr2::req_perform(path = NULL)
-> PUT /api/2.0/fs/files/Volumes/zacdav/default/test/cars.csv?overwrite=true HTTP/2
-> Host: ***************.cloud.databricks.com
-> User-Agent: httr2/1.0.3 r-curl/5.2.2 libcurl/8.4.0
-> Accept: */*
-> Accept-Encoding: deflate, gzip
-> Authorization: <REDACTED>
-> Content-Length: 89752
-> 
Error in `httr2::req_perform()`:
! Failed to perform HTTP request.
Caused by error in `curl::curl_fetch_memory()`:
! HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
Run `rlang::last_trace()` to see where the error occurred.

> rlang::last_trace()
<error/httr2_failure>
Error in `httr2::req_perform()`:
! Failed to perform HTTP request.
Caused by error in `curl::curl_fetch_memory()`:
! HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
---
Backtrace:
    ▆
 1. └─httr2::req_perform(req, path = NULL)
 2.   └─base::tryCatch(...)
 3.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 4.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5.         └─value[[3L]](cond)
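
Since the error is reported at the HTTP/2 stream level, one possible diagnostic is to force the request onto HTTP/1.1 and see whether the large upload still fails. This is an untested sketch, not a confirmed workaround: it assumes libcurl's `http_version` option with the value 2 (`CURL_HTTP_VERSION_1_1`), and it uses a placeholder URL rather than the real endpoint.

```r
library(httr2)

# Placeholder URL (assumption); substitute the real request built earlier
req_h1 <- request("https://example.com") |>
  req_method("PUT") |>
  # libcurl option: 2 corresponds to CURL_HTTP_VERSION_1_1
  req_options(http_version = 2L)

# Inspect the stored curl option; nothing is sent until req_perform()
req_h1$options$http_version
```

If the same file uploads over HTTP/1.1 but not over HTTP/2, that would narrow the problem to how the body is streamed over HTTP/2.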

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Sydney
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] httr2_1.0.3

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.3         knitr_1.48        rlang_1.1.4       xfun_0.47         png_0.1-8        
 [7] purrr_1.0.2       generics_0.1.3    assertthat_0.2.1  jsonlite_1.8.8    glue_1.7.0        bit_4.0.5        
[13] fansi_1.0.6       rappdirs_0.3.3    grid_4.3.2        tibble_3.2.1      lifecycle_1.0.4   compiler_4.3.2   
[19] dplyr_1.1.4       Rcpp_1.0.13       pkgconfig_2.0.3   rstudioapi_0.16.0 brickster_0.2.5   lattice_0.22-6   
[25] R6_2.5.1          reticulate_1.37.0 tidyselect_1.2.1  utf8_1.2.4        curl_5.2.2        pillar_1.9.0     
[31] magrittr_2.0.3    Matrix_1.6-5      tools_4.3.2       bit64_4.0.5       arrow_17.0.0.1   
