cache_filesystem() shared folder across computers unexpectedly causes twice as many cache files #106

Closed
katrinleinweber opened this issue Jan 24, 2020 · 5 comments

Comments

@katrinleinweber

katrinleinweber commented Jan 24, 2020

I'm querying a public API (metadata for DOIs) and sharing the cache folder between 2 computers. From a fixed set of ca. 10k DOIs, my script draws sample_frac(0.1) in each run, so that the 2 computers (macOS & Win8) have a fair chance of downloading different DOIs and later reading each other's API responses from the cache.

However, I observe that with:

download_from_DC_ <- function(DOI, test = FALSE) {
  if (test == FALSE) Sys.sleep(1)

  return(rdatacite::dc_dois(query = DOI))
}

c <- memoise::cache_filesystem(paste0(rprojroot::find_root("DESCRIPTION"), "/data/cache"))
download_from_DC <- memoise::memoise(download_from_DC_, cache = c)

plus:

  1. a README.Rmd file in which the dataset is loaded, sampled & plotted, and
  2. a bash script to loop through rmarkdown::render('README.Rmd'),

my shared cache grows towards twice the size I expect (20k files in the cache folder instead of 10k). Otherwise, everything seems fine: each subsequent rendering is faster than the one before, logging the output of memoise::has_cache(download_from_DC)(DOI) shows that each DOI does get cached after its first download, etc.

Do I need to set the envir parameter in memoise() in a way that is device-independent?
Or could the different absolute paths from rprojroot::find_root() (Mac & Win) cause this?

I'm about to check this latter possibility, and/or rerun the check with 3 computers, but I'd also be happy about any advice about this :-) Thank you!
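
A sketch of how this could be narrowed down with base R only: record which files a single call adds to the shared cache folder, run the same DOI on both machines, and compare the file names. If both machines add a file for the same DOI, the keys differ between them. The helper new_cache_files() is hypothetical, and the temporary folder and file.create() call below only stand in for the real cache folder and download_from_DC(DOI).

```r
# Hypothetical helper: evaluate `expr` and return the names of the files that
# appeared in `cache_dir` while it ran.
new_cache_files <- function(cache_dir, expr) {
  before <- list.files(cache_dir)
  force(expr)  # triggers the lazily evaluated call, e.g. download_from_DC(DOI)
  setdiff(list.files(cache_dir), before)
}

# Stand-in demo: a temporary folder plays the role of "data/cache", and
# file.create() plays the role of a memoised call writing one cache file.
cache_dir <- tempfile("cache_")
dir.create(cache_dir)
file.create(file.path(cache_dir, "existing_key"))

added <- new_cache_files(cache_dir, file.create(file.path(cache_dir, "new_key")))
print(added)
```

Run on each computer with the same DOI, the returned names can be compared directly. Identical names would point away from the keys and towards something else, such as sync conflicts in the shared folder.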

@katrinleinweber
Author

Removing rprojroot::find_root() in favor of the relative cache_filesystem("data/cache/") did not fix the unexpected behaviour: the cache still ends up with twice as many files.

@katrinleinweber katrinleinweber changed the title Can cache_filesystem() be used with a shared folder to share download burden across several computers? cache_filesystem() shared folder across computers unexpectedly causes twice as many cache files Jan 24, 2020
@katrinleinweber
Author

katrinleinweber commented Jan 24, 2020

Possibly related to #105 because my project is also organised as an R package. The analysis code in README.Rmd uses devtools::load_all(".") to get the function definitions from R/download_from_DC.R.

@katrinleinweber
Author

I've given up on this approach. Not sure why or how exactly, but something about the local environment is being hashed into the cache files' names, thus preventing the cache from being shared between my 2 computers.

@jimhester
Member

jimhester commented Mar 6, 2020

It is likely the source references for the functions. There are some fixes in the devel version of memoise, but they might not be on CRAN yet.
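
For readers landing here, a minimal base-R sketch of the source-reference hypothesis. It does not reproduce how memoise builds its cache keys (that is not shown in this thread); it only illustrates that the same function text sourced from two different files carries different srcref metadata, and that utils::removeSource() strips it. The helpers load_fun() and srcref_file() and the file names machine_a.R / machine_b.R are made up for the illustration.

```r
# Hypothetical helper: write the same function text to `path`, source it with
# source references kept, and return the resulting function.
load_fun <- function(path) {
  writeLines("f <- function(x) x + 1", path)
  env <- new.env()
  source(path, local = env, keep.source = TRUE)
  env$f
}

# Stand-ins for the two computers: same code, different file locations.
f_a <- load_fun(file.path(tempdir(), "machine_a.R"))
f_b <- load_fun(file.path(tempdir(), "machine_b.R"))

# File name recorded in a function's source reference (NULL if there is none).
srcref_file <- function(fun) {
  ref <- attr(fun, "srcref")
  if (is.null(ref)) NULL else attr(ref, "srcfile")$filename
}

same_code   <- identical(deparse(body(f_a)), deparse(body(f_b)))
same_srcref <- identical(srcref_file(f_a), srcref_file(f_b))

# Stripping the source references removes the location-specific metadata.
f_a_clean <- utils::removeSource(f_a)
f_b_clean <- utils::removeSource(f_b)
clean_has_srcref <- !is.null(attr(f_a_clean, "srcref"))

print(c(same_code = same_code, same_srcref = same_srcref,
        clean_has_srcref = clean_has_srcref))
```

If a cache key were derived from the function including such attributes, the two machines would write separate files for the same DOI, which would match the doubling described above. Whether that is exactly what happens here is only suggested, not confirmed, in this thread.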

@katrinleinweber
Author

katrinleinweber commented Mar 6, 2020

> some fixes in the devel version

Thank you for the hint :-) I'll install from there when I need a shared cache again. Alternatively: How about a minor release?
