
Add use_memoise helper to memoise chull fct on the fly #80

Merged: 29 commits from memoise-on-the-fly into main on Apr 3, 2024

Conversation

@Bisaloo (Collaborator) commented Nov 17, 2023

This is a test to fix #77. I seem to remember we tried a similar approach initially but encountered some issues. I don't remember what exactly so this PR will at least serve the purpose of documenting our design choice better.

To do:

  • verify the function is actually memoised and behaving as expected
  • update tests
  • update docs
  • find a better name than f
  • add a check in use_memoise() for whether we're running in parallel (sketched below)
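
Below is a minimal sketch of what use_memoise() could look like, assembled only from the pieces discussed in this thread (the fundiversity.memoise option used in the benchmarks, the future::plan() check reviewed further down, and a guard on memoise being installed). The default option value and the exact shape of the merged helper are assumptions, not taken from the PR diff.

use_memoise <- function() {
  # memoisation relies on the suggested memoise package
  if (!requireNamespace("memoise", quietly = TRUE)) {
    return(FALSE)
  }
  # skip memoisation when a non-sequential future plan is active
  if (!inherits(future::plan(), "sequential")) {
    return(FALSE)
  }
  # assumed default: memoise unless the user opts out via the option
  isTRUE(getOption("fundiversity.memoise", default = TRUE))
}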

@Rekyt (Member) commented Dec 2, 2023

I like the way you designed it!
I also wonder about the need to check requireNamespace("fundiversity") within the fundiversity functions themselves: if we're already inside fd_fric(), fundiversity must have been installed. Or maybe I missed something?

I do have a question though: as I understand how memoise works, each time you call memoise::memoise(fun) it reinitialises the function and thus resets the cache.
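
A quick standalone illustration of that point (not code from this PR):

library(memoise)

slow_identity <- function(x) {
  Sys.sleep(1)
  x
}

f1 <- memoise(slow_identity)
f1(1) # slow: computes and stores the result in f1's cache
f1(1) # fast: served from the cache

f2 <- memoise(slow_identity) # a fresh wrapper gets a fresh, empty cache
f2(1) # slow again: nothing is shared with f1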

What I envisioned initially was to create a memoised version of fd_chull() at load time (as recommended in the memoise docs), name it differently, e.g. fd_chull_memoised(), and use your branching if/else pattern to say:

if (use_memoise()) {
  f <- fd_chull_memoised
} else {
  f <- fd_chull
}

@Rekyt (Member) commented Dec 2, 2023

I'm coming back with some data to support my understanding.
So I benchmarked the solution you wrote in this branch, which gave me:

# A tibble: 2 × 13
  expression              min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result      memory                  time       gc      
  <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>      <list>                  <list>     <list>  
1 regular_no_memoise    1.16s    1.24s     0.811      26MB     7.30     5    45      6.17s <list [50]> <Rprofmem [21,217 × 3]> <bench_tm> <tibble>
2 regular_memoised      1.07s    1.27s     0.818    32.5MB    12.8      3    47      3.67s <list [50]> <Rprofmem [25,040 × 3]> <bench_tm> <tibble>

Compared to the solution that creates a memoised version of chull when loading the package with:

.onLoad <- function(libname, pkgname) {
  # create a memoised copy of fd_chull() once, when the package is loaded
  fd_chull_memoised <<- memoise::memoise(fd_chull)
}

and

f <- if (use_memoise()) {
  fd_chull_memoised
} else {
  fd_chull
}

These are the results I'm getting:

> bench::mark(
+   onload_no_memoise = {
+   options(fundiversity.memoise = FALSE)
+   
+   lapply(1:50, \(x) fd_fric(traits_birds, site_sp_birds))
+ },
+   onload_memoised = {
+   options(fundiversity.memoise = TRUE)
+   
+   lapply(1:50, \(x) fd_fric(traits_birds, site_sp_birds))
+ }, iterations = 50)
# A tibble: 2 × 13
  expression             min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result      memory                  time            gc      
  <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>      <list>                  <list>          <list>  
1 onload_no_memoise    1.12s    2.26s     0.492    26.1MB     3.02     7    43      14.2s <list [50]> <Rprofmem [28,169 × 3]> <bench_tm [50]> <tibble>
2 onload_memoised   730.43ms 792.26ms     1.26     20.5MB     1.74    21    29      16.7s <list [50]> <Rprofmem [15,607 × 3]> <bench_tm [50]> <tibble>

So, clearly, memoising fd_chull() each time fd_fric() runs doesn't bring as much of a time gain.
But then it becomes more of a design philosophy question. Do we want to memoise at the package scale, in which case the cache may grow bigger, but the potential time gain is higher (unless you're really doing a lot of computation)?
Or do we want to do it at the function scale, which limits the usefulness to large site-species matrices where the same trait and species combination is encountered several times?

I would rather go for the former because I think it may be more useful, but this depends on how people use fundiversity.
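
As a side note on the .onLoad() snippet above: for fd_chull_memoised <<- ... to rebind inside the package namespace rather than create an object in the user's global environment, the name usually has to already exist in the package. A minimal sketch of that placeholder pattern, following the memoise docs' recommendation; the code that was eventually merged may differ:

# placeholder defined in the package sources so that <<- finds an existing binding
fd_chull_memoised <- NULL

.onLoad <- function(libname, pkgname) {
  # replace the placeholder with a memoised copy of fd_chull() at load time
  fd_chull_memoised <<- memoise::memoise(fd_chull)
}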

@Bisaloo (Collaborator, Author) commented Dec 3, 2023

Yes, I think you're right and what you're proposing makes sense. I can implement it next week if that sounds good to you. I also plan to address #71 (comment) in this PR, as mentioned in the first comment.

@Rekyt (Member) commented Dec 3, 2023

Sounds great!

@codecov-commenter commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (d11d749) to head (fffbaab).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #80   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        11    +1     
  Lines          269       295   +26     
=========================================
+ Hits           269       295   +26     


@Rekyt marked this pull request as ready for review on March 20, 2024, 08:12
@Rekyt (Member) commented Mar 20, 2024

Hey @Bisaloo, I finally took the time to work on this PR.
I think it answers most of the comments.

I have two main issues with it for now:

  • It adds withr to Suggests because it's the cleanest way to fiddle with options in the test suite (see the sketch after this comment)
  • It doesn't actually check that the convex hull function being used is the memoised version, because that would require access to the internals of fd_fric() (and the other functions)

Please tell me whether you have time to review the PR.
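
For context, here is a minimal sketch of how withr typically scopes an option inside a testthat test. These are not the PR's actual tests, and the expectation shown is only illustrative:

test_that("fd_fric() returns the same values with and without memoisation", {
  skip_if_not_installed("withr")

  res_memoised <- withr::with_options(
    list(fundiversity.memoise = TRUE),
    fd_fric(traits_birds, site_sp_birds)
  )
  res_plain <- withr::with_options(
    list(fundiversity.memoise = FALSE),
    fd_fric(traits_birds, site_sp_birds)
  )

  # each with_options() call restores the previous option value on exit,
  # so other tests are unaffected
  expect_identical(res_memoised, res_plain)
})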

@Rekyt (Member) commented Mar 21, 2024

One thing to do before merging is to bump the minor version number and update the NEWS file accordingly.

I also wonder if we should document memoization elsewhere in the package.

R/fd_fric_intersect.R: review thread (outdated, resolved)
Comment on lines +7 to +9
if (!inherits(future::plan(), "sequential")) {
  return(FALSE)
}
@Bisaloo (Collaborator, Author):

Do we need to warn users here?

@Rekyt (Member):

Yes we should!
A warning or a message?

@Rekyt (Member):

It should be a message, but the risk is that it would be triggered every time use_memoise() is called. We wouldn't want the user to be swamped by such messages when running in parallel.
Do you know a way to deal with that in base R?
I don't want to add more dependencies to fundiversity...

@Bisaloo (Collaborator, Author):

My first intuition would be to have a logical flag, warned_future_memoise, in an internal environment that starts at FALSE and is set to TRUE once the message has been printed.

It feels a bit like over-engineering though, and I'm not completely sure how it would behave in parallel (is the package attached anew in each worker?) 🤔

Maybe we can just document it very clearly somewhere and leave it as is. It's probably not an actionable piece of info for the users anyways.
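
A minimal sketch of that idea, with hypothetical names (the internal environment and notify_memoise_disabled() are illustrative, not the PR's code):

# internal environment holding package-level state, created at build time
.fd_state <- new.env(parent = emptyenv())
.fd_state$warned_future_memoise <- FALSE

notify_memoise_disabled <- function() {
  # emit the message only the first time memoisation is skipped
  if (!.fd_state$warned_future_memoise) {
    message("Memoisation is disabled because a non-sequential future plan is set.")
    .fd_state$warned_future_memoise <- TRUE
  }
  invisible(NULL)
}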

@Rekyt (Member):

I'll document it as thoroughly as possible everywhere I find it useful.

R/use_memoise.R: review threads (resolved; one outdated)
Bisaloo and others added 2 commits March 22, 2024 15:30
Co-authored-by: Hugo Gruson <10783929+Bisaloo@users.noreply.github.com>
@Rekyt merged commit cc35fc4 into main on Apr 3, 2024
14 checks passed
@Rekyt deleted the memoise-on-the-fly branch on April 3, 2024

Merging this pull request may close the following issue:

Be more specific in the help with memoization