Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rcpp implementation of final size with risk groups #66

Merged
merged 27 commits into from
Oct 18, 2022

Conversation

pratikunterwegs
Copy link
Member

@pratikunterwegs pratikunterwegs commented Oct 7, 2022

This PR implements an Rcpp function final_size_grps_cpp, using the Eigen C++ library, and fixes #40.

The Rcpp code ports C++ code from https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp.

The PR also fixes #69 by correcting the preparation of the contact matrix, the corrected version in the *_solver.R follows its *_solver.h implementation. Language equivalence tests are included in 3c6ea00.

@pratikunterwegs pratikunterwegs self-assigned this Oct 7, 2022
@pratikunterwegs pratikunterwegs added the New feature New feature or request label Oct 7, 2022
src/iterative_solver.h Outdated Show resolved Hide resolved
@pratikunterwegs pratikunterwegs added Cleanup Clean up files or code for readability. Bug Something isn't working labels Oct 12, 2022
@pratikunterwegs pratikunterwegs marked this pull request as ready for review October 12, 2022 12:31
Copy link
Member

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ implementation is triggering segfaults on my computer. Please leave me some time to try and figure out what's going on.

@@ -38,7 +38,7 @@ solve_final_size_iterative <- function(contact_matrix,
epi_final_size[i_here] <- 0.0

# matrix filled by columns
contact_matrix_ <- contact_matrix * demography_vector %o% susceptibility
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal that the fix here differs from the fix in R/newton_solver.R?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question, which I don't have the theoretical background to answer. I was also surprised to find that the contact matrix filling differs between the Newton and iterative solvers in Edwin's C++ code, see

  1. Iterative solver (only multiplying cols by demography): https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L51
  2. Newton solver (multiplying rows by susceptibility, and cols by demography): https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L144

Initially, I thought that this must be a mistake, so I implemented both to be the same (taking the Newton solver as 'correct'). While fixing #69 I decided to go with what Edwin had written in case I was wrong (about him making a mistake), and returned it to his implementation. I tried both implementations, and they both pass the solver equivalence tests in both R and C++, even with differences among susceptibility groups, which should make some difference. I have no idea why.

Comment on lines -57 to -60
# partial pivoting LU decomposition
cache_m_pivlu <- Matrix::lu(cache_m)
cache_m_pivlu <- Matrix::expand(cache_m_pivlu)
cache_m_pivlu <- cache_m_pivlu$L %*% cache_m_pivlu$U
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment why you removed this? Was it a mistake? Or just unnecessary?

It would probably be good to re-run the benchmarks now as this looks like a pretty expensive step performance-wise (and it was called inside the loop).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added this step when the Newton solver was failing at particular r0 (~2, 4, 12), because (partial?) pivoting LU decomposition is used in Edwin's code: https://gitlab.com/epidemics-r/code_snippets/-/blob/feature/newton_solver/include/finalsize.hpp#L166, and using Matrix::lu appeared to be the correct way to do this in R.

After finding that the implementation of contact matrix filling (your comment above) did not impact the results, I looked to see what other code might be redundant. This chunk seemed suspicious, and the full test suite (including language and solver equivalence) passed after removing it, so I did. (I've also completed a Julia implementation, which doesn't require this pivoting LU decomp either). Overall the evidence points to this step being unnecessary except for Eigen's solve. Once again, I don't have the theoretical background to guess why this is the case.

You're correct about the speed gain, of course, see this benchmarking from this morning from a conversation with Tim. The Newton solver is now takes about 2x time of the iterative solver in R, whereas it was about 5x before this step was removed.

library(finalsize)
# prepare arguments
contact_matrix <- c(
  5.329620e-08, 1.321156e-08, 1.832293e-08, 7.743492e-09, 5.888440e-09,
  2.267918e-09, 1.321156e-08, 4.662496e-08, 1.574182e-08, 1.510582e-08,
  7.943038e-09, 3.324235e-09, 1.832293e-08, 1.574182e-08, 2.331416e-08,
  1.586565e-08, 1.146566e-08, 5.993247e-09, 7.743492e-09, 1.510582e-08,
  1.586565e-08, 2.038011e-08, 1.221124e-08, 9.049331e-09, 5.888440e-09,
  7.943038e-09, 1.146566e-08, 1.221124e-08, 1.545822e-08, 8.106812e-09,
  2.267918e-09, 3.324235e-09, 5.993247e-09, 9.049331e-09, 8.106812e-09,
  1.572736e-08
) |> matrix(6, 6)

# make a demography vector
demography_vector <- c(
  10831795, 11612456, 13511496,
  11499398, 8167102, 4587765
)
# get an example r0
r0 <- 1.3
contact_matrix = r0 * contact_matrix

# a p_susceptibility matrix
p_susceptibility <- matrix(0.5, nrow(contact_matrix), 2)
susceptibility <- matrix(c(0.1, 0.7), nrow(contact_matrix), 2, byrow = T)

# benchmark implementations
microbenchmark::microbenchmark(
  times = 1000,
  "iterative_solver_r" = final_size_grps(
    contact_matrix = contact_matrix,
    demography_vector = demography_vector,
    susceptibility = susceptibility,
    p_susceptibility = p_susceptibility,
    solver = "iterative",
    control = list(
      iterations = 10000,
      tolerance = 1e-6
    )
  ),
  "newton_solver_r" = final_size_grps(
    contact_matrix = contact_matrix,
    demography_vector = demography_vector,
    susceptibility = susceptibility,
    p_susceptibility = p_susceptibility,
    solver = "newton",
    control = list(
      iterations = 10000,
      tolerance = 1e-6
    )
  ),
  "iterative_solver_cpp" = final_size_grps_cpp(
    contact_matrix = contact_matrix,
    demography_vector = demography_vector,
    susceptibility = susceptibility,
    p_susceptibility = p_susceptibility,
    solver = "iterative",
    control = list(
      iterations = 10000,
      tolerance = 1e-6,
      step_rate = 1.9,
      adapt_step = TRUE
    )
  ),
  "newton_solver_cpp" = final_size_grps_cpp(
    contact_matrix = contact_matrix,
    demography_vector = demography_vector,
    susceptibility = susceptibility,
    p_susceptibility = p_susceptibility,
    solver = "newton",
    control = list(
      iterations = 10000,
      tolerance = 1e-6
    )
  )
)
#> Warning in microbenchmark::microbenchmark(times = 1000, iterative_solver_r =
#> final_size_grps(contact_matrix = contact_matrix, : less accurate nanosecond
#> times to avoid potential integer overflows
#> Unit: microseconds
#>                  expr     min       lq       mean  median       uq      max
#>    iterative_solver_r  63.509  67.7730  75.760374  69.577  74.3535 4012.588
#>       newton_solver_r 136.653 145.2425 166.597596 148.789 156.4970 5910.601
#>  iterative_solver_cpp   5.371   5.8630   6.413671   6.273   6.6420   15.170
#>     newton_solver_cpp  12.628  13.3660  14.171486  13.858  14.5140   53.628
#>  neval
#>   1000
#>   1000
#>   1000
#>   1000

Created on 2022-10-12 by the reprex package (v2.0.1)

7.943038e-09, 1.146566e-08, 1.221124e-08, 1.545822e-08, 8.106812e-09,
2.267918e-09, 3.324235e-09, 5.993247e-09, 9.049331e-09, 8.106812e-09,
1.572736e-08
) |> matrix(6, 6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of opening an issue about this topic later but please refrain from using the native pipe and lambda functions anywhere in the package, including the examples and the tests as we want:

  • all users to be able to run examples
  • R CMD check to pass on all supported R versions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm converting this to an issue as there are multiple instances of pipes and lambdas in the tests, will remove.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f0de8d7 fixes #70

Comment on lines +149 to +150
# check for correct final size calculation in complex data case
# using newton solver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a case, e.g. a published paper, with known values of final size? This would be even better than checking the range.

You now added tests to ensure all implementations return the same result but what if the result was wrong in all cases? Do we have a reference implementation / set of values?

Copy link
Member Author

@pratikunterwegs pratikunterwegs Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look around, and also ask one of the PIs. My understanding is that the tests for 'correct answers' in the files test-*_solver.R, as well as those using the upper_limit function in the files test-*_solver_vary_r0.R ensure that the answers are exactly correct in some simple cases at least.

tests/testthat/test-finalsize_grps_cpp_iterative.R Outdated Show resolved Hide resolved
Co-authored-by: Hugo Gruson <Bisaloo@users.noreply.github.com>
pratikunterwegs and others added 2 commits October 12, 2022 19:02
Co-authored-by: Hugo Gruson <Bisaloo@users.noreply.github.com>
@pratikunterwegs
Copy link
Member Author

The C++ implementation is triggering segfaults on my computer. Please leave me some time to try and figure out what's going on.

Thanks @Bisaloo, hope the C++ code works for you - if the errors persist I can also look into them.

@Bisaloo
Copy link
Member

Bisaloo commented Oct 13, 2022

Turns out it's not a segfault. I assumed wrongly when RStudio crashed. When running from the terminal, I get the following error report. Not entirely sure what to get from this 🤔:

R: /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.2/RcppEigen/include/Eigen/src/Core/CwiseBinaryOp.h:110: Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&) [with BinaryOp = Eigen::internal::scalar_product_op<double, double>; LhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; RhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >]: Assertion `aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()' failed.

@pratikunterwegs
Copy link
Member Author

Turns out it's not a segfault. I assumed wrongly when RStudio crashed. When running from the terminal, I get the following error report. Not entirely sure what to get from this 🤔:

R: /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.2/RcppEigen/include/Eigen/src/Core/CwiseBinaryOp.h:110: Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&) [with BinaryOp = Eigen::internal::scalar_product_op<double, double>; LhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; RhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >]: Assertion `aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()' failed.

Thanks. I see that Eigen::Map is invovled, and I think it can be safely removed in epi_spread.h, will do so and convert to vectors instead, maybe that will help

@pratikunterwegs
Copy link
Member Author

Turns out it's not a segfault. I assumed wrongly when RStudio crashed. When running from the terminal, I get the following error report. Not entirely sure what to get from this 🤔:

R: /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.2/RcppEigen/include/Eigen/src/Core/CwiseBinaryOp.h:110: Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&) [with BinaryOp = Eigen::internal::scalar_product_op<double, double>; LhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; RhsType = const Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0> > >]: Assertion `aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()' failed.

Thanks. I see that Eigen::Map is invovled, and I think it can be safely removed in epi_spread.h, will do so and convert to vectors instead, maybe that will help

I've changed how Eigen::Map is used, and cut down on its use where possible. I also edited the members of the epi_spread function to be VectorXd rather than MatrixXd (although this shouldn't be an issue). Hopefully this will resolve the error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Cleanup Clean up files or code for readability. New feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discrepency between R and C implementations Cpp implementation of final size with susceptibility groups
3 participants