Error in check_number() when using req_retry()  #385

@nclsbarreto

Hello,

I am trying to connect to a website to do some scraping. I am trying to learn how to slow down the requests, because if I make too many requests in a minute, I get an AWS challenge from the website that httr2 can't get past. If I wait, the challenge goes away and everything works fine (which is annoying, because it means the full pull will take a while, but whatever). The example below only requests one URL, but I have a couple hundred to pull (different URLs on the same website).

Any clarification, or advice, would be super helpful.

Thank you for rvest, httr, and httr2 in general. I have taught myself how to use them with the good documentation, and it has helped streamline a lot of my work!

Here is the normal code working just fine (i.e. when I have not called it in quick succession):

> table_url[1] %>%
+   request() %>% 
+   req_headers(
+     authority = "www.hospitalsafetygrade.org",
+     accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
+     `accept-language` = "en-US,en;q=0.9",
+     `cache-control` = "no-cache",
+   ) %>% 
+   req_retry(max_seconds = 15,
+             is_transient = ~resp_status(.x) %in% c(429, 500, 503, 202),
+             after = ~.x) %>%
+   req_perform(verbosity = 1) 
-> GET /table-details/the-queens-medical-center HTTP/1.1
-> Host: www.hospitalsafetygrade.org
-> User-Agent: httr2/0.2.3 r-curl/4.3.2 libcurl/7.64.1
-> Accept-Encoding: deflate, gzip
-> authority: www.hospitalsafetygrade.org
-> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
-> accept-language: en-US,en;q=0.9
-> cache-control: no-cache
-> 
<- HTTP/1.1 200 OK
<- Date: Sat, 11 Nov 2023 01:02:28 GMT
<- Content-Type: text/html; charset=utf-8
<- Transfer-Encoding: chunked
<- Connection: keep-alive
<- Server: Apache/2.2.34 (Amazon)
<- X-AWC-Cache: partial
<- Set-Cookie: sid=6b3aad3f8ec0fd5382d3bde1d0dbc2d0; path=/; HttpOnly
<- 
<httr2_response>
GET https://www.hospitalsafetygrade.org/table-details/the-queens-medical-center
Status: 200 OK
Content-Type: text/html
Body: In memory (80435 bytes)

Here is the code that causes the error. At the bottom it says "Error in check_number(): ! seconds must be a number", but I don't understand the error. I have been working on this for a while (it took me some time to realize that the 202 status meant I was being redirected to a challenge).

> table_url[1] %>%
+   request() %>% 
+   req_headers(
+     authority = "www.hospitalsafetygrade.org",
+     accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
+     `accept-language` = "en-US,en;q=0.9",
+     `cache-control` = "no-cache",
+   ) %>% 
+   req_retry(max_seconds = 15,
+             is_transient = ~resp_status(.x) %in% c(429, 500, 503, 202),
+             after = ~.x) %>%
+   req_perform(verbosity = 1) 
-> GET /table-details/the-queens-medical-center HTTP/1.1
-> Host: www.hospitalsafetygrade.org
-> User-Agent: httr2/0.2.3 r-curl/4.3.2 libcurl/7.64.1
-> Accept-Encoding: deflate, gzip
-> authority: www.hospitalsafetygrade.org
-> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
-> accept-language: en-US,en;q=0.9
-> cache-control: no-cache
-> 
<- HTTP/1.1 202 Accepted
<- Server: awselb/2.0
<- Date: Sat, 11 Nov 2023 00:50:46 GMT
<- Content-Length: 2411
<- Connection: keep-alive
<- x-amzn-waf-action: challenge
<- Cache-Control: no-store, max-age=0
<- Content-Type: text/html; charset=UTF-8
<- 
Error in `check_number()`:
! `seconds` must be a number
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In if (is.na(after)) { :
  the condition has length > 1 and only the first element will be used
> rlang::last_trace()
<error/rlang_error>
Error in `check_number()`:
! `seconds` must be a number
---
Backtrace:
    x
 1. +-... %>% req_perform(verbosity = 1)
 2. \-httr2::req_perform(., verbosity = 1)
 3.   \-httr2:::sys_sleep(delay)
 4.     \-httr2:::check_number(seconds, "`seconds`")
Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.
> ?httr2::check_number()
Error in .helpForCall(topicExpr, parent.frame()) : 
  no methods for ‘check_number’ and no documentation for it as a function
> ?check_number()
Error in .helpForCall(topicExpr, parent.frame()) : 
  no methods for ‘check_number’ and no documentation for it as a function
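
A possible reading of the trace above, offered as a sketch and not a confirmed diagnosis: the backtrace ends in `sys_sleep(delay)`, and the warning comes from `is.na(after)`, which suggests the value returned by the `after` callback is used as the wait time. With `after = ~.x` the callback returns the response object itself, not a number of seconds. The helper name `retry_after_seconds` below is hypothetical, and the assumptions that the server may send a `Retry-After` header and that returning `NA` falls back to the default backoff are not verified against this site or this httr2 version.

```r
library(httr2)

# Hypothetical helper: reduce a response to ONE number of seconds to wait.
# Returns NA when there is no usable Retry-After header (assumption: httr2
# then falls back to its backoff strategy).
retry_after_seconds <- function(resp) {
  value <- resp_header(resp, "Retry-After")
  if (is.null(value)) {
    return(NA_real_)
  }
  # A Retry-After given as an HTTP date (not seconds) becomes NA here.
  suppressWarnings(as.numeric(value))
}

# Build the request only; nothing is sent until req_perform() is called.
req <- request("https://www.hospitalsafetygrade.org/table-details/the-queens-medical-center") |>
  req_retry(
    max_seconds = 15,
    is_transient = function(resp) resp_status(resp) %in% c(429, 500, 503, 202),
    after = retry_after_seconds
  )
```

The 202 response shown above carries no `Retry-After` header, so under these assumptions the helper would return `NA` for it and the wait would come from the backoff setting instead.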

Here is my system information:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RPostgreSQL_0.7-4 tmap_3.3-3        odbc_1.3.4        logger_0.2.2      DBI_1.1.3         glue_1.6.2        httr2_0.2.3       chromote_0.1.2   
 [9] magrittr_2.0.3    jsonlite_1.8.4    xml2_1.3.3        openxlsx_4.2.5    dbplyr_2.3.2      rvest_1.0.3       lubridate_1.9.2   forcats_1.0.0    
[17] stringr_1.5.0     dplyr_1.1.1       purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.2.1      ggplot2_3.4.2     tidyverse_2.0.0  
[25] pacman_0.5.1     

loaded via a namespace (and not attached):
 [1] sf_1.0-12           bit64_4.0.5         RColorBrewer_1.1-3  httr_1.4.4          tools_4.1.1         utf8_1.2.2          R6_2.5.1           
 [8] KernSmooth_2.23-20  colorspace_2.0-3    raster_3.6-20       sp_1.5-0            withr_2.5.0         tidyselect_1.2.0    tictoc_1.2         
[15] processx_3.7.0      leaflet_2.1.2       curl_4.3.2          bit_4.0.4           compiler_4.1.1      leafem_0.2.0        cli_3.6.1          
[22] scales_1.2.1        classInt_0.4-7      proxy_0.4-27        rappdirs_0.3.3      digest_0.6.29       base64enc_0.1-3     dichromat_2.0-0.1  
[29] pkgconfig_2.0.3     htmltools_0.5.3     sessioninfo_1.2.2   fastmap_1.1.0       htmlwidgets_1.5.4   rlang_1.1.0         readxl_1.4.2       
[36] Microsoft365R_2.4.0 rstudioapi_0.14     generics_0.1.3      crosstalk_1.2.0     zip_2.2.1           AzureGraph_1.3.2    Rcpp_1.0.10        
[43] munsell_0.5.0       fansi_1.0.3         abind_1.4-5         terra_1.7-23        lifecycle_1.0.3     stringi_1.7.6       leafsync_0.1.0     
[50] snakecase_0.11.0    tmaptools_3.1-1     grid_4.1.1          blob_1.2.3          parallel_4.1.1      promises_1.2.0.1    lattice_0.20-45    
[57] stars_0.6-1         hms_1.1.2           ps_1.7.1            pillar_1.8.1        codetools_0.2-18    XML_3.99-0.10       selectr_0.4-2      
[64] png_0.1-7           vctrs_0.6.1         tzdb_0.3.0          cellranger_1.1.0    gtable_0.3.1        AzureAuth_1.3.3     janitor_2.1.0      
[71] lwgeom_0.2-11       e1071_1.7-11        later_1.3.0         viridisLite_0.4.1   class_7.3-20        websocket_1.4.1     units_0.8-0  
