Consider delaying data download #15

Closed
Enchufa2 opened this issue Sep 23, 2022 · 4 comments · Fixed by #17 or #25

Comments

@Enchufa2

Data downloads are executed on package load. As a result, the package cannot be installed on systems without an Internet connection. This is the case for the package builders used for RPM creation, where connectivity is not allowed for security reasons. The data could instead be placed in user directories, e.g. using tools::R_user_dir.

@guga31bb
Member

Yes, I realize this is annoying, sorry. The reason for this is that CRAN has a very strict package size limit that the models included in nfl4th exceed, so the package has to load these models externally and there's no way to make it work without internet connectivity.

An option for not using the package itself is just getting the pre-computed data here.

@Enchufa2
Author

Enchufa2 commented Sep 23, 2022

Yes, I know the CRAN policy. My suggestion would be to provide a function that the user needs to call in order to download that data and place it under the user dir, instead of doing that on package load. But then again, it's just a suggestion. :)
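A minimal sketch of what such an explicit download function could look like. The function name `download_nfl4th_data`, its arguments, and the idea of passing the file URL in are assumptions made for illustration; none of them are part of nfl4th's actual API. Only `tools::R_user_dir` (available since R 4.0.0) and `utils::download.file` are real functions.

```r
# Hypothetical helper, not part of nfl4th: download a file once and keep it
# in a per-user cache directory instead of fetching it on package load.
download_nfl4th_data <- function(url,
                                 dir = tools::R_user_dir("nfl4th", which = "cache"),
                                 force = FALSE) {
  # Create the cache directory on first use.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dir, basename(url))

  # Touch the network only when the file is missing or a refresh is forced.
  if (force || !file.exists(dest)) {
    utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }

  # Return the local path so callers can read the file from disk.
  invisible(dest)
}
```

With something along these lines, installing and loading the package would need no connectivity; only the first call to the download function would.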

The reason for this request is that I maintain an RPM repository for Fedora with all (most) CRAN packages, and this one cannot be built due to this issue. For now, I had to remove nfl4th and nflverse (which depends on nfl4th).

If you decide to address this, please let me know so that I can add them back.

@mrcaseb
Member

mrcaseb commented Sep 27, 2022

Instead of loading the files on package load, we could add something like this to the package and call it from inside the other functions:

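# Package-internal cache environment, created when the namespace is built.
# A dedicated environment is used because ls("package:nfl4th") only lists
# exported, non-dot names of an attached package, and the namespace itself
# is locked after loading, so new bindings cannot be assigned into it.
.nfl4th_cache <- new.env(parent = emptyenv())

init_nfl4th <- function() {
  # Fetch the games file only if it is not cached in this session yet.
  if (!exists(".games_nfl4th", envir = .nfl4th_cache, inherits = FALSE)) {
    assign(".games_nfl4th", get_games_file(), envir = .nfl4th_cache)
  }

  # Same for the first-down model.
  if (!exists("fd_model", envir = .nfl4th_cache, inherits = FALSE)) {
    assign("fd_model", load_fd_model(), envir = .nfl4th_cache)
  }
}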

@Enchufa2
Author

If these downloads are not small, an improvement would be to cache them for the current session under R's temporary directory. In this way, it would be easier to enable an option so that the user can specify a permanent cache directory.
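As a rough sketch of this idea: the cache location defaults to a subdirectory of R's session temporary directory, and an option lets the user point it at a permanent directory. The option name `nfl4th.cache_dir` and the helpers `nfl4th_cache_dir()` and `cached()` are hypothetical names chosen for illustration, not something the package is known to provide.

```r
# Hypothetical: resolve the cache directory from an option, falling back to
# a session-scoped directory under tempdir().
nfl4th_cache_dir <- function() {
  dir <- getOption("nfl4th.cache_dir", default = file.path(tempdir(), "nfl4th"))
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dir
}

# Hypothetical: return the cached object `name`, calling `loader()` (for
# example, a function that downloads a model) only on a cache miss.
cached <- function(name, loader) {
  path <- file.path(nfl4th_cache_dir(), paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- loader()
  saveRDS(obj, path)
  obj
}
```

Under these assumptions, a user who wants the files to survive across sessions would set `options(nfl4th.cache_dir = "some/permanent/dir")` once, for example in their `.Rprofile`; everyone else gets a cache that disappears with the session.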

Alternatively, tools::R_user_dir returns an appropriate directory on every platform to cache data permanently. This is essentially the same functionality that the rappdirs package provides, ported to R itself.
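For illustration, the snippet below queries the three kinds of per-user directories that `tools::R_user_dir` (R >= 4.0.0) knows about. It also sets the `R_USER_CACHE_DIR` environment variable, which the function consults first, to show how a user or a build system could redirect the cache root. The package name "nfl4th" is used only as an example argument; whether the package stores anything in these locations is an assumption.

```r
# Temporarily redirect the per-user cache root (restored further down).
old <- Sys.getenv("R_USER_CACHE_DIR", unset = NA)
Sys.setenv(R_USER_CACHE_DIR = file.path(tempdir(), "r-user-cache"))

# R_user_dir() only computes the paths; it does not create the directories.
dirs <- vapply(
  c("data", "config", "cache"),
  function(which) tools::R_user_dir("nfl4th", which = which),
  character(1)
)
print(dirs)

# Restore the previous state of the environment variable.
if (is.na(old)) {
  Sys.unsetenv("R_USER_CACHE_DIR")
} else {
  Sys.setenv(R_USER_CACHE_DIR = old)
}
```

The "data" and "config" entries are platform dependent, while the "cache" entry follows the overridden root.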

@mrcaseb mrcaseb linked a pull request Sep 29, 2022 that will close this issue
@guga31bb guga31bb linked a pull request Jul 25, 2023 that will close this issue