lares::holidays produces empty frame #45

Closed
Patrikios opened this issue Jun 27, 2023 · 11 comments
Labels
bug Something isn't working


@Patrikios
Contributor

Patrikios commented Jun 27, 2023

I ran the basic example:

> lares::holidays(countries = "Argentina")
>>> Extracting Argentina's holidays for 2023
# A tibble: 0 × 10
# ℹ 10 variables: holiday <date>, holiday_name <chr>, holiday_type <chr>, national <lgl>, observance <lgl>, bank <lgl>, nonwork <lgl>, season <lgl>,
#   hother <lgl>, county <fct>

It returns an empty data frame, as it does for the other countries I tried.

Session Info:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so;  LAPACK version 3.8.0

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8   
 [6] LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] parsnip_1.1.0            scales_1.2.1             rsample_1.1.1            timetk_2.8.3             recipes_1.0.6            dplyr_1.1.2             
 [7] modeltime.h2o_0.1.1.9000 h2o_3.40.0.4             modeltime_1.2.6.9000     lubridate_1.9.2          prophet_1.0              rlang_1.1.1             
[13] Rcpp_1.0.10              glue_1.6.2               bettermc_1.2.2           readr_2.1.4              plotly_4.10.2            ggplot2_3.4.2           
[19] readxl_1.4.2             data.table_1.14.8        withr_2.5.0             

loaded via a namespace (and not attached):
  [1] rstudioapi_0.14     jsonlite_1.8.5      magrittr_2.0.3      farver_2.1.1        fs_1.6.2            vctrs_0.6.3         RCurl_1.98-1.12    
  [8] htmltools_0.5.5     dials_1.2.0         progress_1.2.2      curl_5.0.1          cellranger_1.1.0    pROC_1.18.2         parallelly_1.36.0  
 [15] StanHeaders_2.26.27 htmlwidgets_1.6.2   plyr_1.8.8          extraDistr_1.9.1    zoo_1.8-12          lifecycle_1.0.3     iterators_1.0.14   
 [22] pkgconfig_2.0.3     Matrix_1.5-4.1      R6_2.5.1            fastmap_1.1.1       future_1.32.0       tune_1.1.1          selectr_0.4-2      
 [29] digest_0.6.32       colorspace_2.1-0    furrr_0.3.1         patchwork_1.1.2     ps_1.7.5            crosstalk_1.2.0     labeling_0.4.2     
 [36] fansi_1.0.4         yardstick_1.2.0     timechange_0.2.0    httr_1.4.6          compiler_4.3.0      bit64_4.0.5         backports_1.4.1    
 [43] inline_0.3.19       pkgbuild_1.4.2      highr_0.10          R.utils_2.12.2      MASS_7.3-60         lava_1.7.2.1        sessioninfo_1.2.2  
 [50] loo_2.6.0           tools_4.3.0         zip_2.3.0           future.apply_1.11.0 nnet_7.3-18         R.oo_1.25.0         Metrics_0.1.4      
 [57] callr_3.7.3         R.cache_0.16.0      grid_4.3.0          checkmate_2.2.0     generics_0.1.3      gtable_0.3.3        tzdb_0.4.0         
 [64] R.methodsS3_1.8.2   class_7.3-21        tidyr_1.3.0         hms_1.1.3           xml2_1.3.4          utf8_1.2.3          foreach_1.5.2      
 [71] pillar_1.9.0        stringr_1.5.0       splines_4.3.0       lhs_1.1.6           lattice_0.21-8      renv_0.17.3         survival_3.5-3     
 [78] bit_4.0.5           tidyselect_1.2.0    knitr_1.43          gridExtra_2.3       stats4_4.3.0        xfun_0.39           hardhat_1.3.0      
 [85] timeDate_4022.108   matrixStats_1.0.0   rstan_2.21.8        stringi_1.7.12      DiceDesign_1.9      lazyeval_0.2.2      yaml_2.3.7         
 [92] workflows_1.1.3     evaluate_0.21       codetools_0.2-19    rpart.plot_3.1.1    lares_5.2.2         tibble_3.2.1        cli_3.6.1          
 [99] RcppParallel_5.1.7  rpart_4.1.19        munsell_0.5.0       processx_3.8.1      styler_1.10.1       globals_0.16.2      ellipsis_0.3.2     
[106] gower_1.0.1         prettyunits_1.1.1   bitops_1.0-7        GPfit_1.0-8         listenv_0.9.0       viridisLite_0.4.2   ipred_0.9-14       
[113] xts_0.13.1          prodlim_2023.03.31  openxlsx_4.2.5.2    purrr_1.0.1         crayon_1.5.2        rvest_1.0.3  
@laresbernardo
Owner

Hi @Patrikios, thanks for reporting this. Would you mind trying again? I've just tested it and it seems to be working. Maybe it was a temporary issue with your connection or the site's?
[Screenshot, 2023-06-27 11:38 AM]

@laresbernardo laresbernardo self-assigned this Jun 27, 2023
@Patrikios
Contributor Author

Thanks for the response.
No, it still doesn't work on my end. I suspect the firewall, although it is set up to allow the usual culprits like curl, bash and, of course, R. Does the function use a specialised system tool that needs to be configured?

Since I suspected the firewall, I also tried installing the package locally, and it also produces an empty frame.

@laresbernardo
Owner

Must be a connection issue then. Does it provide any warning or error? Can you try updating to the latest dev version to see if that helps? Lastly, you could try using prophet's holidays dataset if you want a simple alternative solution.
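A minimal sketch for telling a connection problem apart from a parsing problem, assuming the function requests the same timeanddate.com URL pattern used in the scraping loop later in this thread. `check_site()` is a hypothetical helper, not part of lares; it only reports the HTTP status and whether the page contains the `.table` element the scraper reads.

```r
# Hypothetical helper (not part of lares): checks whether R can reach the
# holiday page and whether the page contains the ".table" node the scraper uses.
# Requires the httr and rvest packages, plus network access when called.
check_site <- function(country = "argentina",
                       year = as.integer(format(Sys.Date(), "%Y"))) {
  url <- paste0("https://www.timeanddate.com/holidays/", tolower(country), "/", year)
  resp <- httr::GET(url)
  page <- rvest::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  list(
    url = url,
    status = httr::status_code(resp),  # 200 means the request succeeded
    has_table = length(rvest::html_elements(page, ".table")) > 0
  )
}
```

Calling `str(check_site("venezuela"))` should show whether the request succeeds; if the status is 200 and `has_table` is `TRUE`, the empty result more likely comes from a later processing step than from the connection.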

@Patrikios
Contributor Author

I debugged a bit on my machine.
The problem lies in the last expression, because the holiday column is all NAs, as shown below.

The first part, including the loop, runs OK:

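library(dplyr)      # %>%, filter(), mutate(), bind_rows(), .data
library(httr)       # GET(), content()
library(rvest)      # html_nodes(), html_table()
library(lubridate)  # year()

countries = "Venezuela"
years = year(Sys.Date())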

results <- NULL
year <- year(Sys.Date())
years <- years[years %in% ((year - 5):(year + 5))]
combs <- expand.grid(years, countries) %>% dplyr::rename(year = "Var1", country = "Var2")
for (i in seq_len(nrow(combs))) {
  message(paste0(">>> Extracting ", combs$country[i], "'s holidays for ", combs$year[i]))
  url <- paste0("https://www.timeanddate.com/holidays/", tolower(combs$country[i]), "/", combs$year[i])
  holidays <- content(GET(url))
  holidays <- holidays %>%
    html_nodes(".table") %>%
    html_table(fill = TRUE) %>%
    data.frame(.) %>%
    filter(!is.na(.data$Date))
  holidays <- holidays[, -2]
  colnames(holidays) <- c("Date", "Holiday", "Holiday.Type")
  holidays$Date <- paste(holidays$Date, combs$year[i])
  if (sum(grepl("de", holidays$Date)) > 0) {
    holidays$Date <- gsub("de ", "", holidays$Date)
  }
  holidays <- holidays[-1, ]
  first <- suppressWarnings(as.numeric(as.character(substr(holidays$Date, 1, 1))))
  if (!is.na(first[1])) {
    holidays$Date <- as.Date(holidays$Date, format = "%d %b %Y")
  } else {
    holidays$Date <- as.Date(holidays$Date, format = "%b %d %Y")
  }
  result <- data.frame(
    holiday = holidays$Date,
    holiday_name = holidays$Holiday,
    holiday_type = holidays$Holiday.Type
  ) %>%
    mutate(
      national = grepl("National|Federal", holidays$Holiday.Type),
      observance = grepl("Observance", holidays$Holiday.Type),
      bank = grepl("Bank", holidays$Holiday.Type),
      nonwork = grepl("Non-working", holidays$Holiday.Type),
      season = grepl("Season", holidays$Holiday.Type),
      hother = !grepl("National|Federal|Observance|Season", holidays$Holiday.Type)
    ) %>%
    {
      if (length(unique(countries)) > 1) {
        mutate(., country = combs$country[i])
      } else {
        .
      }
    }
  result$county <- combs$country[i]
  results <- bind_rows(results, result)
}

results in:

results

   holiday                                    holiday_name                holiday_type national observance  bank nonwork season hother    county
1     <NA>                          The Three Wise Men Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
2     <NA>                          The Three Wise Men Day                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
3     <NA>                         Three Kings Day Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
4     <NA>                              Divina Pastora Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
5     <NA>                        Carnival / Shrove Monday            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
6     <NA>         Carnival / Shrove Tuesday / Pancake Day            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
7     <NA>                              Saint Joseph's Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
8     <NA>                                   March Equinox                      Season    FALSE      FALSE FALSE   FALSE   TRUE  FALSE Venezuela
9     <NA>                                     Palm Sunday       Observance, Christian    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
10    <NA>                                 Maundy Thursday National holiday, Christian     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
11    <NA>                                     Good Friday National holiday, Christian     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
12    <NA>                                   Easter Sunday       Observance, Christian    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
13    <NA>                     Declaration of Independence            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
14    <NA>                             Labor Day / May Day            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
15    <NA>                                   Ascension Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
16    <NA>                           Ascension Day Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
17    <NA>                                  Corpus Christi                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
18    <NA>                          Corpus Christi Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
19    <NA>                                St Anthony's Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
20    <NA>                        St Anthony's Day Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
21    <NA>                                   June Solstice                      Season    FALSE      FALSE FALSE   FALSE   TRUE  FALSE Venezuela
22    <NA>                                 Carabobo Battle            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
23    <NA>                      Saint Peter and Saint Paul                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
24    <NA>              Saint Peter and Saint Paul Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
25    <NA>                                Independence Day            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
26    <NA>                        Simón Bolívar's Birthday            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
27    <NA>                                      Flag's Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
28    <NA>                            National Guard's Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
29    <NA>                      Assumption of Mary Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
30    <NA>                              Assumption of Mary                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
31    <NA>                            Our Lady of Coromoto                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
32    <NA>                               September Equinox                      Season    FALSE      FALSE FALSE   FALSE   TRUE  FALSE Venezuela
33    <NA>                    Day of Indigenous Resistance            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
34    <NA>                 Dr. José Gregorio Hernández Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
35    <NA>     Holiday for Dr. José Gregorio Hernández Day                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
36    <NA>                                  All Saints Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
37    <NA>                                   All Souls Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
38    <NA>                          All Saints Day Holiday                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
39    <NA> Day of the Virgin of the Rosary of Chiquinquirá                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
40    <NA>                                    Aviation Day                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
41    <NA>              Feast of the Immaculate Conception                  Observance    FALSE       TRUE FALSE   FALSE  FALSE  FALSE Venezuela
42    <NA>  Holiday for Feast of the Immaculate Conception                Bank holiday    FALSE      FALSE  TRUE   FALSE  FALSE   TRUE Venezuela
43    <NA>                               December Solstice                      Season    FALSE      FALSE FALSE   FALSE   TRUE  FALSE Venezuela
44    <NA>                                   Christmas Eve National holiday, Christian     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
45    <NA>                                   Christmas Day National holiday, Christian     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela
46    <NA>                                  New Year's Eve            National holiday     TRUE      FALSE FALSE   FALSE  FALSE  FALSE Venezuela

Then comes:

results <- results %>%
  filter(!is.na(.data$holiday)) %>%  # every row has an NA holiday, so all rows are dropped
  lares::cleanNames() %>%
  as_tibble()

and voila:

results

# A tibble: 0 × 10
# ℹ 10 variables: holiday <date>, holiday_name <chr>, holiday_type <chr>, national <lgl>, observance <lgl>, bank <lgl>, nonwork <lgl>, season <lgl>, hother <lgl>, county <fct>
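A base-R reproduction of why this final step empties the table, assuming the scraped dates arrive in the "18. Nov 2023" style quoted later in this thread. The date strings and holiday names below are illustrative, not scraped data: once `as.Date()` fails on every string, every `holiday` is `NA`, and the `!is.na()` filter removes every row.

```r
# Illustrative strings in the "18. Nov 2023" style (day followed by a dot)
raw <- c("6. Jan 2023", "14. Jan 2023", "18. Nov 2023")

# "%d %b %Y" has no literal "." after the day, so every string fails to parse
res <- data.frame(
  holiday = as.Date(raw, format = "%d %b %Y"),
  holiday_name = c("Holiday A", "Holiday B", "Holiday C")
)

# Base-R equivalent of filter(!is.na(.data$holiday)): drops every row
kept <- res[!is.na(res$holiday), ]
nrow(kept)  # 0
```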

@Patrikios
Contributor Author

So the problem is here:

as.Date(holidays$Date, format = "%d %b %Y")

which tries to parse strings like:

18. Nov 2023

A more flexible alternative would be:

lubridate::dmy(holidays$Date)

But the underlying problem remains. I wonder whether the table extracted from the HTTP response depends on my German locale. If so, withr could solve the issue by temporarily switching the locale to English.
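A minimal base-R sketch of both points, assuming the dates look like "18. Nov 2023": the literal "." after the day has to appear in the format string, and `%b` only recognises month abbreviations of the current `LC_TIME` locale. `parse_dot_date()` is a hypothetical helper; it saves and restores `LC_TIME` by hand, which is the same pattern `withr::with_locale()` wraps.

```r
# Hypothetical helper: parse "18. Nov 2023"-style strings with English month
# names, temporarily switching LC_TIME to "C" and restoring it on exit.
parse_dot_date <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = "%d. %b %Y")  # note the literal "." after %d
}

as.Date("18. Nov 2023", format = "%d %b %Y")  # NA: the "." breaks the match
parse_dot_date("18. Nov 2023")                 # "2023-11-18"
```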

@Patrikios
Contributor Author

There are still issues, as the response contains formats that don't match the German locale, such as 19. Mär 2023 or 20. Mär 2023, which are non-standard; the locale expects 19. Mrz 2023. In that case, the best option would be to set options in the call to GET so that the response comes back in a consistent format.
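One way to sidestep locale-specific abbreviations on the parsing side is to avoid `%b` entirely. The sketch below is a hypothetical helper, not what lares does: it matches month names against base R's `month.abb`, which is always English, so an abbreviation that is not English (such as "Mrz") becomes `NA` instead of depending on `LC_TIME`.

```r
# Hypothetical helper: parse "19. Mar 2023" or "Mar 19 2023" style strings
# without depending on LC_TIME. Month names are looked up in month.abb
# (always English); anything else yields NA.
parse_en_date <- function(x) {
  x <- trimws(gsub("[.,]", " ", x))  # "19. Mar 2023" -> "19  Mar 2023"
  iso <- vapply(strsplit(x, "\\s+"), function(p) {
    if (length(p) != 3) return(NA_character_)
    day_first <- grepl("^[0-9]{1,2}$", p[1])
    day <- if (day_first) p[1] else p[2]
    mon <- if (day_first) p[2] else p[1]
    m <- match(tolower(substr(mon, 1, 3)), tolower(month.abb))
    if (is.na(m) || !grepl("^[0-9]{1,2}$", day)) return(NA_character_)
    sprintf("%s-%02d-%02d", p[3], m, as.integer(day))
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")  # numeric-only format, locale-independent
}

parse_en_date(c("19. Mar 2023", "Nov 18 2023", "19. Mrz 2023"))
# "2023-03-19" "2023-11-18" NA
```

This only makes failures explicit; it still needs English input, which is why fixing the request itself (as discussed next in the thread) is the more robust option.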

@laresbernardo
Owner

Hi @Patrikios, thanks for taking the time to check this. I've just deployed your suggestion to use lubridate::dmy() instead, and enabled ... so that a locale parameter can be passed in case that helps fix the issue. Let me know if it does!

@Patrikios
Contributor Author

Patrikios commented Jul 20, 2023

Hi,
there is still some mismatch with the German locale: the website returns some abbreviated month names in a non-standard format, for instance for the month of 'März'. I'll look into it.

@Patrikios
Contributor Author

I've opened a pull request; please take a look. I think the best way to solve the problem is to set the request header properly so that the response uses English.
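A sketch of the header-based approach, assuming the site chooses the page language from the standard `Accept-Language` request header. `fetch_holidays_en()` is a hypothetical helper, not the code from the pull request; it reuses the URL pattern and `.table` selector from the scraping loop earlier in this thread.

```r
# Hypothetical helper (not the PR's code): request the holiday page with an
# explicit Accept-Language header so that, assuming the site honours it, the
# month names come back in English regardless of the client's locale.
# Requires httr, rvest and network access when called.
fetch_holidays_en <- function(country,
                              year = as.integer(format(Sys.Date(), "%Y")),
                              lang = "en-US,en;q=0.9") {
  url <- paste0("https://www.timeanddate.com/holidays/", tolower(country), "/", year)
  resp <- httr::GET(url, httr::add_headers(`Accept-Language` = lang))
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  page <- rvest::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  # Same ".table" selector as the scraping loop
  rvest::html_table(rvest::html_element(page, ".table"))
}
```

For example, `head(fetch_holidays_en("venezuela", 2023))`. Fixing the language at request time keeps the parsing step the same for every user, instead of adapting it to each local locale.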

@laresbernardo
Owner

I've merged it into main. Thanks for the improvement.
However, there's an "All formats failed to parse. No formats found." error when running R CMD check, for a reason I can't replicate. Can you please help me debug?

@Patrikios
Copy link
Contributor Author

Patrikios commented Jul 27, 2023

Yes, of course.

Can you share an excerpt of the holidays table just before the tryCatch part runs? Are the dates in the expected English format?

Otherwise, it ran fine on my Windows and Debian 10 machines, so unfortunately I can't debug it locally.
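A sketch of the kind of excerpt requested here. `report_unparsed()` is a hypothetical debug helper, not part of lares: given the raw date strings from the scraped table and their parsed values, it reports how many failed and prints a few of the failing strings, which shows whether the dates arrived in the expected English format.

```r
# Hypothetical debug helper: call it right after parsing, passing the raw
# strings scraped from the page and the parsed Date vector.
report_unparsed <- function(raw, parsed, n = 5) {
  bad <- is.na(parsed) & !is.na(raw)
  message(sum(bad), " of ", length(raw), " date strings failed to parse")
  if (any(bad)) print(utils::head(raw[bad], n))
  invisible(bad)
}

raw <- c("18. Nov 2023", "Nov 18 2023")
bad <- report_unparsed(raw, as.Date(raw, format = "%b %d %Y"))
```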


2 participants