-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rcpp implementation of final size with risk groups #66
Conversation
dd8697d
to
0c3332a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ implementation is triggering segfaults on my computer. Please leave me some time to try and figure out what's going on.
@@ -38,7 +38,7 @@ solve_final_size_iterative <- function(contact_matrix, | |||
epi_final_size[i_here] <- 0.0 | |||
|
|||
# matrix filled by columns | |||
contact_matrix_ <- contact_matrix * demography_vector %o% susceptibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it normal that the fix here differs from the fix in R/newton_solver.R
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great question, which I don't have the theoretical background to answer. I was also surprised to find that the contact matrix filling differs between the Newton and iterative solvers in Edwin's C++ code, see
- Iterative solver (only multiplying cols by demography): https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L51
- Newton solver (multiplying rows by susceptibility, and cols by demography): https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L144
Initially, I thought that this must be a mistake, so I implemented both to be the same (taking the Newton solver as 'correct'). While fixing #69 I decided to go with what Edwin had written in case I was wrong (about him making a mistake), and returned it to his implementation. I tried both implementations, and they both pass the solver equivalence tests in both R and C++, even with differences among susceptibility groups, which should make some difference. I have no idea why.
# partial pivoting LU decomposition | ||
cache_m_pivlu <- Matrix::lu(cache_m) | ||
cache_m_pivlu <- Matrix::expand(cache_m_pivlu) | ||
cache_m_pivlu <- cache_m_pivlu$L %*% cache_m_pivlu$U |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment why you removed this? Was it a mistake? Or just unnecessary?
It would probably be good to re-run the benchmarks now as this looks like a pretty expensive step performance-wise (and it was called inside the loop).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially added this step when the Newton solver was failing at particular r0
(~2, 4, 12), because (partial?) pivoting LU decomposition is used in Edwin's code: https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L166, and using Matrix::lu
appeared to be the correct way to do this in R.
After finding that the implementation of contact matrix filling (your comment above) did not impact the results, I looked to see what other code might be redundant. This chunk seemed suspicious, and the full test suite (including language and solver equivalence) passed after removing it, so I did. (I've also completed a Julia implementation, which doesn't require this pivoting LU decomp either). Overall the evidence points to this step being unnecessary except for Eigen
's solve
. Once again, I don't have the theoretical background to guess why this is the case.
You're correct about the speed gain, of course, see this benchmarking from this morning from a conversation with Tim. The Newton solver is now takes about 2x time of the iterative solver in R, whereas it was about 5x before this step was removed.
library(finalsize)
# prepare arguments
contact_matrix <- c(
5.329620e-08, 1.321156e-08, 1.832293e-08, 7.743492e-09, 5.888440e-09,
2.267918e-09, 1.321156e-08, 4.662496e-08, 1.574182e-08, 1.510582e-08,
7.943038e-09, 3.324235e-09, 1.832293e-08, 1.574182e-08, 2.331416e-08,
1.586565e-08, 1.146566e-08, 5.993247e-09, 7.743492e-09, 1.510582e-08,
1.586565e-08, 2.038011e-08, 1.221124e-08, 9.049331e-09, 5.888440e-09,
7.943038e-09, 1.146566e-08, 1.221124e-08, 1.545822e-08, 8.106812e-09,
2.267918e-09, 3.324235e-09, 5.993247e-09, 9.049331e-09, 8.106812e-09,
1.572736e-08
) |> matrix(6, 6)
# make a demography vector
demography_vector <- c(
10831795, 11612456, 13511496,
11499398, 8167102, 4587765
)
# get an example r0
r0 <- 1.3
contact_matrix = r0 * contact_matrix
# a p_susceptibility matrix
p_susceptibility <- matrix(0.5, nrow(contact_matrix), 2)
susceptibility <- matrix(c(0.1, 0.7), nrow(contact_matrix), 2, byrow = T)
# benchmark implementations
microbenchmark::microbenchmark(
times = 1000,
"iterative_solver_r" = final_size_grps(
contact_matrix = contact_matrix,
demography_vector = demography_vector,
susceptibility = susceptibility,
p_susceptibility = p_susceptibility,
solver = "iterative",
control = list(
iterations = 10000,
tolerance = 1e-6
)
),
"newton_solver_r" = final_size_grps(
contact_matrix = contact_matrix,
demography_vector = demography_vector,
susceptibility = susceptibility,
p_susceptibility = p_susceptibility,
solver = "newton",
control = list(
iterations = 10000,
tolerance = 1e-6
)
),
"iterative_solver_cpp" = final_size_grps_cpp(
contact_matrix = contact_matrix,
demography_vector = demography_vector,
susceptibility = susceptibility,
p_susceptibility = p_susceptibility,
solver = "iterative",
control = list(
iterations = 10000,
tolerance = 1e-6,
step_rate = 1.9,
adapt_step = TRUE
)
),
"newton_solver_cpp" = final_size_grps_cpp(
contact_matrix = contact_matrix,
demography_vector = demography_vector,
susceptibility = susceptibility,
p_susceptibility = p_susceptibility,
solver = "newton",
control = list(
iterations = 10000,
tolerance = 1e-6
)
)
)
#> Warning in microbenchmark::microbenchmark(times = 1000, iterative_solver_r =
#> final_size_grps(contact_matrix = contact_matrix, : less accurate nanosecond
#> times to avoid potential integer overflows
#> Unit: microseconds
#> expr min lq mean median uq max
#> iterative_solver_r 63.509 67.7730 75.760374 69.577 74.3535 4012.588
#> newton_solver_r 136.653 145.2425 166.597596 148.789 156.4970 5910.601
#> iterative_solver_cpp 5.371 5.8630 6.413671 6.273 6.6420 15.170
#> newton_solver_cpp 12.628 13.3660 14.171486 13.858 14.5140 53.628
#> neval
#> 1000
#> 1000
#> 1000
#> 1000
Created on 2022-10-12 by the reprex package (v2.0.1)
7.943038e-09, 1.146566e-08, 1.221124e-08, 1.545822e-08, 8.106812e-09, | ||
2.267918e-09, 3.324235e-09, 5.993247e-09, 9.049331e-09, 8.106812e-09, | ||
1.572736e-08 | ||
) |> matrix(6, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of opening an issue about this topic later but please refrain from using the native pipe and lambda functions anywhere in the package, including the examples and the tests as we want:
- all users to be able to run examples
- R CMD check to pass on all supported R versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm converting this to an issue as there are multiple instances of pipes and lambdas in the tests, will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# check for correct final size calculation in complex data case | ||
# using newton solver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a case, e.g. a published paper, with known values of final size? This would be even better than checking the range.
You now added tests to ensure all implementations return the same result but what if the result was wrong in all cases? Do we have a reference implementation / set of values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look around, and also ask one of the PIs. My understanding is that the tests for 'correct answers' in the files test-*_solver.R
, as well as those using the upper_limit
function in the files test-*_solver_vary_r0.R
ensure that the answers are exactly correct in some simple cases at least.
Co-authored-by: Hugo Gruson <Bisaloo@users.noreply.github.com>
Co-authored-by: Hugo Gruson <Bisaloo@users.noreply.github.com>
Thanks @Bisaloo, hope the C++ code works for you - if the errors persist I can also look into them. |
Turns out it's not a segfault. I assumed wrongly when RStudio crashed. When running from the terminal, I get the following error report. Not entirely sure what to get from this 🤔:
|
Thanks. I see that |
I've changed how |
This PR implements an Rcpp function
final_size_grps_cpp
, using the Eigen C++ library, and fixes #40.The Rcpp code ports C++ code from https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp.
The PR also fixes #69 by correcting the preparation of the contact matrix, the corrected version in the
*_solver.R
follows its*_solver.h
implementation. Language equivalence tests are included in 3c6ea00.