
Add use_memoise helper to memoise chull fct on the fly #80

Merged: 29 commits from memoise-on-the-fly into main on Apr 3, 2024

Conversation

@Bisaloo (Collaborator) commented Nov 17, 2023

This is a test to fix #77. I seem to remember we tried a similar approach initially but encountered some issues. I don't remember what exactly so this PR will at least serve the purpose of documenting our design choice better.

To do:

  • verify the function is actually memoised and behaving as expected
  • update tests
  • update docs
  • find a better name than f
  • add a check in use_memoise() for whether we're running in parallel (sketched below)
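
Below is a minimal sketch of what use_memoise() could look like, assembled only from the pieces discussed in this thread (the fundiversity.memoise option used in the benchmarks, the future::plan() check reviewed further down, and a guard on memoise being installed). The default option value and the exact shape of the merged helper are assumptions, not taken from the PR diff.

use_memoise <- function() {
  # memoisation relies on the suggested memoise package
  if (!requireNamespace("memoise", quietly = TRUE)) {
    return(FALSE)
  }
  # skip memoisation when a non-sequential future plan is active
  if (!inherits(future::plan(), "sequential")) {
    return(FALSE)
  }
  # assumed default: memoise unless the user opts out via the option
  isTRUE(getOption("fundiversity.memoise", default = TRUE))
}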

@Rekyt (Member) commented Dec 2, 2023

I like the way you designed it!
I also wonder about the need to check requireNamespace("fundiversity") within the fundiversity functions themselves: if we're already inside fd_fric(), fundiversity must have been installed. Or maybe I missed something?

I do have a question though: as I understand how memoise works, each time you call memoise::memoise(fun) it reinitialises the function and thus resets the cache.
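
A quick standalone illustration of that point (not code from this PR):

library(memoise)

slow_identity <- function(x) {
  Sys.sleep(1)
  x
}

f1 <- memoise(slow_identity)
f1(1) # slow: computes and stores the result in f1's cache
f1(1) # fast: served from the cache

f2 <- memoise(slow_identity) # a fresh wrapper gets a fresh, empty cache
f2(1) # slow again: nothing is shared with f1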

What I envisioned initially was to create a memoised version of fd_chull() at load time (as recommended in the memoise docs), name it differently, e.g. fd_chull_memoised(), and use your branching if/else pattern to say:

if (use_memoise()) {
  f <- fd_chull_memoised
} else {
  f <- fd_chull
}

@Rekyt (Member) commented Dec 2, 2023

I'm coming back with some data to support my understanding.
So I benchmarked the solution you wrote in this branch, which gave me:

# A tibble: 2 × 13
  expression              min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result      memory                  time       gc      
  <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>      <list>                  <list>     <list>  
1 regular_no_memoise    1.16s    1.24s     0.811      26MB     7.30     5    45      6.17s <list [50]> <Rprofmem [21,217 × 3]> <bench_tm> <tibble>
2 regular_memoised      1.07s    1.27s     0.818    32.5MB    12.8      3    47      3.67s <list [50]> <Rprofmem [25,040 × 3]> <bench_tm> <tibble>

Compared to the solution that creates a memoised version of chull when loading the package with:

.onLoad <- function(libname, pkgname) {
  # create a memoised copy of fd_chull() once, when the package is loaded
  fd_chull_memoised <<- memoise::memoise(fd_chull)
}

and

f <- if (use_memoise()) {
  fd_chull_memoised
} else {
  fd_chull
}

These are the results I'm getting:

> bench::mark(
+   onload_no_memoise = {
+   options(fundiversity.memoise = FALSE)
+   
+   lapply(1:50, \(x) fd_fric(traits_birds, site_sp_birds))
+ },
+   onload_memoised = {
+   options(fundiversity.memoise = TRUE)
+   
+   lapply(1:50, \(x) fd_fric(traits_birds, site_sp_birds))
+ }, iterations = 50)
# A tibble: 2 × 13
  expression             min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result      memory                  time            gc      
  <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>      <list>                  <list>          <list>  
1 onload_no_memoise    1.12s    2.26s     0.492    26.1MB     3.02     7    43      14.2s <list [50]> <Rprofmem [28,169 × 3]> <bench_tm [50]> <tibble>
2 onload_memoised   730.43ms 792.26ms     1.26     20.5MB     1.74    21    29      16.7s <list [50]> <Rprofmem [15,607 × 3]> <bench_tm [50]> <tibble>

So, clearly, memoising fd_chull() each time fd_fric() runs doesn't bring as much of a time gain.
But then it becomes more of a design philosophy question. Do we want to memoise at the package scale, in which case the cache may grow bigger, but the potential time gain is higher (unless you're really doing a lot of computation)?
Or do we want to do it at the function scale, which limits the usefulness to large site-species matrices where the same trait and species combination is encountered several times?

I would rather go for the former because I think it may be more useful, but this depends on how people use fundiversity.
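
As a side note on the .onLoad() snippet above: for fd_chull_memoised <<- ... to rebind inside the package namespace rather than create an object in the user's global environment, the name usually has to already exist in the package. A minimal sketch of that placeholder pattern, following the memoise docs' recommendation; the code that was eventually merged may differ:

# placeholder defined in the package sources so that <<- finds an existing binding
fd_chull_memoised <- NULL

.onLoad <- function(libname, pkgname) {
  # replace the placeholder with a memoised copy of fd_chull() at load time
  fd_chull_memoised <<- memoise::memoise(fd_chull)
}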

@Bisaloo (Collaborator, Author) commented Dec 3, 2023

Yes, I think you're right and what you're proposing makes sense. I can implement it next week if that sounds good to you. I also plan to address #71 (comment) in this PR, as mentioned in the first comment.

@Rekyt (Member) commented Dec 3, 2023

Sounds great!

@codecov-commenter commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (d11d749) to head (fffbaab).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #80   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        11    +1     
  Lines          269       295   +26     
=========================================
+ Hits           269       295   +26     


@Rekyt marked this pull request as ready for review on March 20, 2024, 08:12
@Rekyt (Member) commented Mar 20, 2024

Hey @Bisaloo, I finally took the time to work on this PR.
I think it answers most of the comments.

I have two main issues with it for now:

  • It adds withr to Suggests because it's the cleanest way to fiddle with options in the test suite (see the sketch after this comment)
  • It doesn't actually check that the convex hull function being used is the memoised version, because that would require access to the internals of fd_fric() (and the other functions)

Please tell me whether you have time to review the PR.
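
For context, here is a minimal sketch of how withr typically scopes an option inside a testthat test. These are not the PR's actual tests, and the expectation shown is only illustrative:

test_that("fd_fric() returns the same values with and without memoisation", {
  skip_if_not_installed("withr")

  res_memoised <- withr::with_options(
    list(fundiversity.memoise = TRUE),
    fd_fric(traits_birds, site_sp_birds)
  )
  res_plain <- withr::with_options(
    list(fundiversity.memoise = FALSE),
    fd_fric(traits_birds, site_sp_birds)
  )

  # each with_options() call restores the previous option value on exit,
  # so other tests are unaffected
  expect_identical(res_memoised, res_plain)
})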

@Rekyt (Member) commented Mar 21, 2024

One thing to do before merging is to bump the minor version number and update the NEWS file accordingly.

I also wonder if we should document memoization elsewhere in the package.

R/fd_fric_intersect.R: review thread (outdated, resolved)
Comment on lines +7 to +9
if (!inherits(future::plan(), "sequential")) {
  return(FALSE)
}
@Bisaloo (Collaborator, Author):

Do we need to warn users here?

@Rekyt (Member):

Yes we should!
A warning or a message?

@Rekyt (Member):

It should be a message, but the risk is that it would be triggered every time use_memoise() is called. We wouldn't want the user to be swamped by such messages when running in parallel.
Do you know a way to deal with that in base R?
I don't want to add more dependencies to fundiversity...

@Bisaloo (Collaborator, Author):

My first intuition would be to have a logical flag, warned_future_memoise, in an internal environment that starts at FALSE and is set to TRUE once the message has been printed.

It feels a bit like over-engineering though, and I'm not completely sure how it would behave in parallel (is the package attached anew in each worker?) 🤔

Maybe we can just document it very clearly somewhere and leave it as is. It's probably not an actionable piece of info for the users anyways.
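
A minimal sketch of that idea, with hypothetical names (the internal environment and notify_memoise_disabled() are illustrative, not the PR's code):

# internal environment holding package-level state, created at build time
.fd_state <- new.env(parent = emptyenv())
.fd_state$warned_future_memoise <- FALSE

notify_memoise_disabled <- function() {
  # emit the message only the first time memoisation is skipped
  if (!.fd_state$warned_future_memoise) {
    message("Memoisation is disabled because a non-sequential future plan is set.")
    .fd_state$warned_future_memoise <- TRUE
  }
  invisible(NULL)
}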

@Rekyt (Member):

I'll document it as thoroughly as possible everywhere I find it useful.

R/use_memoise.R: review threads (resolved; one outdated)
Bisaloo and others added 2 commits March 22, 2024 15:30
Co-authored-by: Hugo Gruson <10783929+Bisaloo@users.noreply.github.com>
@Rekyt merged commit cc35fc4 into main on Apr 3, 2024
14 checks passed
@Rekyt deleted the memoise-on-the-fly branch on April 3, 2024

Merging this pull request may close the following issue:

Be more specific in the help with memoization