Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault on making use of ode_rk45 with integrate_1d #2848

Open
jpmvferreira opened this issue Nov 15, 2022 · 11 comments
Open

Segmentation fault on making use of ode_rk45 with integrate_1d #2848

jpmvferreira opened this issue Nov 15, 2022 · 11 comments
Assignees

Comments

@jpmvferreira
Copy link

Description

When solving an ODE using ode_rk45 which takes the form
$\left.u^{\prime}=\frac{e^{-\lambda u^2}}{\lambda u\left(u^{-2}-2 \lambda\right)-u^{-3}}\left[\frac{3}{2}\Omega_m(1+z)^2+2 \Omega_r(1+z)^3\right)\right]$,

where $\Omega_m$, $\Omega_r$ and $\lambda$ are parameters, and then integrating $u(t)$ using integrate_1d, i.e. computing

$X(t) = \int_0^t u(x) dx$,

where the vector of observations is the set of values $(t, X, \sigma)$, yields a segmentation fault.

However, if I make use of ode_ckrk instead, it does not give a segmentation fault, and the program runs as expected.
Even though it runs as expected it takes a long time due to the fact that the sampler rejects a lot of steps due to the error estimate in the integral being great than the given relative tolerance.

Using ode_rk45 successfully

Here's a Stan model file that makes use of ode_rk45 to solve the previous ODE successfully:

functions {
  // return u'(t) to use in ODE solver
  vector ode(real t, vector y, array[] real theta) {
    real Omega_m = theta[1];
    real Omega_r = theta[2];
    real lambda = theta[3];
    real u = y[1];

    real f1 = exp(-lambda*u^2) / (lambda*u*(u^(-2)-2*lambda) - u^(-3));
    real f2 = 1.5*(Omega_m)*(1+t)^2 + 2*Omega_r*(1+t)^3;

    vector[1] uderiv;
    uderiv[1] = f1*f2;

    return uderiv;
  }
}

data {
  array[10] real t;
  array[10] real uobs;
  array[10] real error;
}

transformed data {
}

parameters {
  real Omega_m;
  real Omega_r;
  real lambda;
}

transformed parameters {
  // parameter array
  array[3] real theta = {Omega_m, Omega_r, lambda};

  // initial conditions
  real t0 = 0;
  vector[1] init;
  init[1] = 1;

  // obtain u(t) using ODE solver
  array[10] real u;
  array[10] vector[1] sol = ode_rk45(ode, init, t0, t, theta);
  for (i in 1:10) {
    u[i] = sol[i][1];
  }
}

model {
  // priors
  Omega_m ~ normal(0.3, 10);
  Omega_r ~ normal(0, 10);
  lambda ~ normal(1, 10);

  // likelihood
  u ~ normal(uobs, error);
}

generated quantities {
}

Compiling and running this model using stanc reveals no issues whatsoever, and I am able to recover the parameters I have used to generate a mock dataset (given in data.csv).

Adding the integration routine to the previous model

If I now add a new function, which returns the value of $u(t)$, i.e. it solves the ODE for a specific time, so that I can then integrate it using integrate_1d in the transformed parameters block.
Doing so the model now looks as follows:

functions {
  // return u'(t) to use in ODE solver
  vector dudt(real t, vector y, array[] real theta) {
    real Omega_m = theta[1];
    real Omega_r = theta[2];
    real lambda = theta[3];
    real u = y[1];

    real f1 = exp(-lambda*u^2) / (lambda*u*(u^(-2)-2*lambda) - u^(-3));
    real f2 = 1.5*(Omega_m)*(1+t)^2 + 2*Omega_r*(1+t)^3;

    vector[1] uderiv;
    uderiv[1] = f1*f2;

    return uderiv;
  }

  // return u(t) to use in integration routine
  real ut(real t, real xc, array[] real theta, array[] real x_r, array[] int x_i) {
    // initial conditions
    real t0 = 0;
    vector[1] init;
    init[1] = 1;

    // obtain u(t) using ODE solver
    real sol = ode_rk45(dudt, init, t0, {t}, theta)[1][1];

    return sol;
  }
}

data {
  array[10] real t;
  array[10] real intofuobs;
  array[10] real error;
}

transformed data {
  // create empty x_r and x_i arrays to provide to integrate_1d given that they are required arguments
  array[0] real x_r;
  array[0] int x_i;
}

parameters {
  real Omega_m;
  real Omega_r;
  real lambda;
}

transformed parameters {
  // parameter array
  array[3] real theta = {Omega_m, Omega_r, lambda};

  // value of intofu given by the current parameters
  array[10] real intofu;
  for (i in 1:10) {
    intofu[i] = integrate_1d(ut, 0, t[i], theta, x_r, x_i);
  }
}

model {
  // priors
  Omega_m ~ normal(0.3, 10);
  Omega_r ~ normal(0, 10);
  lambda ~ normal(1, 10);

  // likelihood
  intofu ~ normal(intofuobs, error);
}

generated quantities {
}

I compiled this model with two additional flags CXXFLAGS += -fsanitize=undefined -fsanitize=address and when running this model with a new mock dataset (given in data-int.csv) the following error was caught during runtime:

method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 1 (Default)
data
  file = tmp/data.json
init = 2 (Default)
random
  seed = 2184794190 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/bits/stl_iterator.h:1026:17: runtime error: reference binding to null pointer of type 'const double'
stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/algebra/detail/for_each.hpp:90:15: runtime error: reference binding to null pointer of type 'const double'
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/bits/stl_iterator.h:1026:17: runtime error: reference binding to null pointer of type 'const double'
stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/algebra/detail/for_each.hpp:90:15: runtime error: reference binding to null pointer of type 'const double'
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/bits/stl_iterator.h:1026:17: runtime error: reference binding to null pointer of type 'const double'
stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/algebra/detail/for_each.hpp:90:15: runtime error: reference binding to null pointer of type 'const double'
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/bits/stl_iterator.h:1026:17: runtime error: reference binding to null pointer of type 'const double'
stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/algebra/detail/for_each.hpp:90:15: runtime error: reference binding to null pointer of type 'const double'
stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/algebra/default_operations.hpp:198:59: runtime error: load of null pointer of type 'const double'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==708412==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x564fa26a2936 bp 0x7fffc985a6f0 sp 0x7fffc98587a0 T0)
==708412==The signal is caused by a READ memory access.
==708412==Hint: address points to the zero page.
    #0 0x564fa26a2936 in std::vector<Eigen::Matrix<stan::return_type<Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> > >::type, -1, 1, 0, -1, 1>, std::allocator<Eigen::Matrix<stan::return_type<Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> > >::type, -1, 1, 0, -1, 1> > > stan::math::ode_rk45_tol_impl<model_model_namespace::dudt_odefunctor__, Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> >, (void*)0>(char const*, model_model_namespace::dudt_odefunctor__ const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double, std::vector<double, std::allocator<double> > const&, double, double, long, std::ostream*, std::vector<double, std::allocator<double> > const&) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x82f936)
    #1 0x564fa24f3940 in boost::math::tools::promote_args<double, double, double, double, float, float>::type model_model_namespace::ut<double, double, double, double, (void*)0>(double const&, double const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<int, std::allocator<int> > const&, std::ostream*) [clone .isra.0] (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x680940)
    #2 0x564fa26ac79b in _ZZN5boost4math10quadrature9tanh_sinhIdNS0_8policies6policyINS3_14default_policyES5_S5_S5_S5_S5_S5_S5_S5_S5_S5_S5_S5_EEE9integrateIZN4stan4math9integrateIZNSA_17integrate_1d_implI20integrate_1d_adapterIN21model_model_namespace12ut_functor__EEJSt6vectorIdSaIdEESJ_SH_IiSaIiEEELPv0EEEdRKT_dddPSoDpRKT0_EUlSP_RKT0_E_EEdSP_dddEUlddE2_EEKDTclcl7declvalISN_EEcl7declvalIdEEcl7declvalIdEEEESN_dddPdS12_PmENKUlddE_clEdd (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x83979b)
    #3 0x564fa26c6b35 in _ZNK5boost4math10quadrature6detail16tanh_sinh_detailIdNS0_8policies6policyINS4_14default_policyES6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EEE9integrateIZNS1_9tanh_sinhIdS7_E9integrateIZN4stan4math9integrateIZNSE_17integrate_1d_implI20integrate_1d_adapterIN21model_model_namespace12ut_functor__EEJSt6vectorIdSaIdEESN_SL_IiSaIiEEELPv0EEEdRKT_dddPSoDpRKT0_EUlST_RKT0_E_EEdST_dddEUlddE2_EEKDTclcl7declvalISR_EEcl7declvalIdEEcl7declvalIdEEEESR_dddPdS16_PmEUlddE_EES14_SR_S16_S16_PKcdddS17_ (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x853b35)
    #4 0x564fa26d388a in _ZN5boost4math10quadrature9tanh_sinhIdNS0_8policies6policyINS3_14default_policyES5_S5_S5_S5_S5_S5_S5_S5_S5_S5_S5_S5_EEE9integrateIZN4stan4math9integrateIZNSA_17integrate_1d_implI20integrate_1d_adapterIN21model_model_namespace12ut_functor__EEJSt6vectorIdSaIdEESJ_SH_IiSaIiEEELPv0EEEdRKT_dddPSoDpRKT0_EUlSP_RKT0_E_EEdSP_dddEUlddE2_EEKDTclcl7declvalISN_EEcl7declvalIdEEcl7declvalIdEEEESN_dddPdS12_Pm (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x86088a)
    #5 0x564fa28dbcb3 in double stan::math::integrate<stan::math::integrate_1d_impl<integrate_1d_adapter<model_model_namespace::ut_functor__>, std::vector<double, std::allocator<double> >, std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, (void*)0>(integrate_1d_adapter<model_model_namespace::ut_functor__> const&, double, double, double, std::ostream*, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<int, std::allocator<int> > const&)::{lambda(auto:1 const&, auto:2 const&)#1}>(stan::math::integrate_1d_impl<integrate_1d_adapter<model_model_namespace::ut_functor__>, std::vector<double, std::allocator<double> >, std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, (void*)0>(integrate_1d_adapter<model_model_namespace::ut_functor__> const&, double, double, double, std::ostream*, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<int, std::allocator<int> > const&)::{lambda(auto:1 const&, auto:2 const&)#1} const&, double, double, double) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xa68cb3)
    #6 0x564fa28e8a81 in stan::scalar_type<std::vector<double, std::allocator<double> >, void>::type model_model_namespace::model_model::log_prob_impl<false, true, std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, (void*)0, (void*)0>(std::vector<double, std::allocator<double> >&, std::vector<int, std::allocator<int> >&, std::ostream*) const (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xa75a81)
    #7 0x564fa2992af0 in std::vector<double, std::allocator<double> > stan::services::util::initialize<true, stan::model::model_base, stan::io::var_context, boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> > >(stan::model::model_base&, stan::io::var_context const&, boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >&, double, bool, stan::callbacks::logger&, stan::callbacks::writer&) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xb1faf0)
    #8 0x564fa29ae76c in int stan::services::sample::hmc_nuts_diag_e_adapt<stan::model::model_base>(stan::model::model_base&, stan::io::var_context const&, stan::io::var_context const&, unsigned int, unsigned int, double, int, int, int, bool, int, double, double, int, double, double, double, double, unsigned int, unsigned int, unsigned int, stan::callbacks::interrupt&, stan::callbacks::logger&, stan::callbacks::writer&, stan::callbacks::writer&, stan::callbacks::writer&) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xb3b76c)
    #9 0x564fa29b1374 in int stan::services::sample::hmc_nuts_diag_e_adapt<stan::model::model_base, std::shared_ptr<stan::io::var_context>, stan::callbacks::writer, stan::callbacks::unique_stream_writer<std::ostream>, stan::callbacks::unique_stream_writer<std::ostream> >(stan::model::model_base&, unsigned long, std::vector<std::shared_ptr<stan::io::var_context>, std::allocator<std::shared_ptr<stan::io::var_context> > > const&, unsigned int, unsigned int, double, int, int, int, bool, int, double, double, int, double, double, double, double, unsigned int, unsigned int, unsigned int, stan::callbacks::interrupt&, stan::callbacks::logger&, std::vector<stan::callbacks::writer, std::allocator<stan::callbacks::writer> >&, std::vector<stan::callbacks::unique_stream_writer<std::ostream>, std::allocator<stan::callbacks::unique_stream_writer<std::ostream> > >&, std::vector<stan::callbacks::unique_stream_writer<std::ostream>, std::allocator<stan::callbacks::unique_stream_writer<std::ostream> > >&) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xb3e374)
    #10 0x564fa2950200 in cmdstan::command(int, char const**) (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0xadd200)
    #11 0x564fa24d27f5 in main (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x65f7f5)
    #12 0x7f45713c228f  (/usr/lib/libc.so.6+0x2328f)
    #13 0x7f45713c2349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
    #14 0x564fa24d2fc4 in _start ../sysdeps/x86_64/start.S:115

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/undercover/downloads/cmdstan-2.30.1/tmp/model+0x82f936) in std::vector<Eigen::Matrix<stan::return_type<Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> > >::type, -1, 1, 0, -1, 1>, std::allocator<Eigen::Matrix<stan::return_type<Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> > >::type, -1, 1, 0, -1, 1> > > stan::math::ode_rk45_tol_impl<model_model_namespace::dudt_odefunctor__, Eigen::Matrix<double, -1, 1, 0, -1, 1>, double, double, std::vector<double, std::allocator<double> >, (void*)0>(char const*, model_model_namespace::dudt_odefunctor__ const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double, std::vector<double, std::allocator<double> > const&, double, double, long, std::ostream*, std::vector<double, std::allocator<double> > const&)
==708412==ABORTING

Replacing ode_rk45 with ode_rkck

By replacing ode_rk45 with ode_rkck the model runs successfully, making use of the previously mentioned dataset.
However, as mentioned before, the model takes a long time to run and a lot of samples are rejected due to the the error estimate in the integral being great than the given relative tolerance.

System information:

  • O.S.: Manjaro
  • Kernel: 5.4.218-2-MANJARO
  • Architecture: x86_64
  • CmdStan: v. 2.30-1 (manual installation from Github)
  • GCC: v. 11.3.0

Additional information

This bug report was the result of a thread I posted over at the Stan's forums, where the user WardBrian was able to replicate this issue and helped me a lot with the debugging (link).

@wds15
Copy link
Contributor

wds15 commented Nov 15, 2022

Thanks for reporting this. Seems to be an ugly bug! FYI: ode_ckrk uses the CVODES utilities while ode_rk45 is based on odeint. Both should work, of course.

@wds15 wds15 self-assigned this Nov 16, 2022
@jpmvferreira
Copy link
Author

jpmvferreira commented Nov 16, 2022

I would like to state that using ode_ckrk isn't a viable solution.
I'm saying this because after waiting almost a day for the code to run the amount of errors printed to STDOUT has 23k lines and weights 3.1Mb.
Almost all of them seem to be due to low integral precision, and a few about singularities in tanh_sinh.

Could it be that this model is problematic to solve numerically? How would I go about and test that?

Regardless of how troublesome the model, it shouldn't give a segmentation fault, right?

@rok-cesnovar
Copy link
Member

Regardless of how troublesome the model, it shouldn't give a segmentation fault, right?

100%

@wds15
Copy link
Contributor

wds15 commented Nov 16, 2022

ode_rk45 should not segfault, really not. This is a legitimate bug which needs to be hunted down. Thanks for making this reproducible. I assigned myself to work on it, but I am hesitant to make promises as to when I get to it.

@wds15
Copy link
Contributor

wds15 commented Mar 9, 2023

Looks like I forgot this one...just got the example to run and can confirm that the segfault happens on my system as well.

@wds15
Copy link
Contributor

wds15 commented Aug 24, 2023

Alright! After a bit of debugging this I am pretty sure that the problem is in RK45 integrator of odeint. The segfault does even happen if you run this thing without any var stuff in it and you even trigger the segfault when getting rid of all parameters of the problem.

The problem is caused by the integration for integrate_1d starting at 0 AND the ode integrator is also being asked to start integrating at 0. That will trigger the integrator to evaluate time-points for the function which are super close to 0. So close that the odeint integrator is being put off the rails (which id should NOT).

So this slightly modified Stan program runs just fine:

functions {
  // return u'(t) to use in ODE solver
  vector dudt(real t, vector y, array[] real theta) {
    real Omega_m = theta[1];
    real Omega_r = theta[2];
    real lambda = theta[3];
    real u = y[1];

    real f1 = exp(-lambda*u^2) / (lambda*u*(u^(-2)-2*lambda) - u^(-3));
    real f2 = 1.5*(Omega_m)*(1+t)^2 + 2*Omega_r*(1+t)^3;

    vector[1] uderiv;
    uderiv[1] = f1*f2;

    return uderiv;
  }

  // return u(t) to use in integration routine
  real ut(real t, real xc, array[] real theta, array[] real x_r, array[] int x_i) {
    // initial conditions
    real t0 = 0;
    vector[1] init;
    init[1] = 1;

    // obtain u(t) using ODE solver
    real sol = ode_rk45(dudt, init, t0, {t}, theta)[1][1];

    return sol;
  }
}

data {
  array[10] real t;
  array[10] real intofuobs;
  array[10] real error;
}

transformed data {
  // create empty x_r and x_i arrays to provide to integrate_1d given that they are required arguments
  array[0] real x_r;
  array[0] int x_i;
}

parameters {
  real Omega_m;
  real Omega_r;
  real lambda;
}

transformed parameters {
  // parameter array
  array[3] real theta = {Omega_m, Omega_r, lambda};

  // value of intofu given by the current parameters
  array[10] real intofu;
  for (i in 1:10) {
   // NOTE: integrate from almost 0 here...since odeint will always start at 0 we do not run into issues.
    intofu[i] = integrate_1d(ut, 1E-2, t[i], theta, x_r, x_i);
  }
}

model {
  // priors
  Omega_m ~ normal(0.3, 10);
  Omega_r ~ normal(0, 10);
  lambda ~ normal(1, 10);

  // likelihood
  intofu ~ normal(intofuobs, error);
}

generated quantities {
}

As a next step I would try to compile a simple example which throws over odeint using a small example. I'd guess that the odeint author will fix this quite likely. Not sure if we need this issue here open any longer. At least I am confident that this is nothing wrong in Stan-math here.

@jpmvferreira
Copy link
Author

Interesting, can I know where is Stan getting the ode_rk45 function from? I haven't found it in the documentation.
May I ask if you tested this with other ODE or just this one?
And now that I'm here, what should be the recommended way of solving a second order ODE? Should I turn it into a first order system of ODE's and use the builtin solver instead of doing as I did in this example right here?

@wds15
Copy link
Contributor

wds15 commented Aug 24, 2023

I solved a much simpler ODE and triggered the same problem.

You should turn things into a first oder oder, of course!

@wds15
Copy link
Contributor

wds15 commented Aug 24, 2023

ode_rk45 is based on the odeint library part of boost.

@jpmvferreira
Copy link
Author

Many thanks!
If this isn't an issue related to Stan, feel free to close this issue when you see fit.

@wds15
Copy link
Contributor

wds15 commented Aug 25, 2023

I just saw that there has not been any activity on the odeint GitHub for ~4 years. Not a good sign. So let's keep this open here as we may have to deal with the issue from the Stan-math side. Still... I am very much relieved to see that there is not any severe issues on Stan-math side of things. It always worried me, but we are good here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants