Windows compiler segfaults #1615

Closed
rok-cesnovar opened this issue Jan 14, 2020 · 20 comments

@rok-cesnovar
Member

Description

We have recently been seeing compiler segmentation faults on the Jenkins Windows machines. These are most probably related to the flatten and the limitations of g++ 4.9.3.

If the flatten and the increased file sizes are not the culprit, I would suspect PR #1471, which introduced additional templating.

Examples of failures:

https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1603/9/pipeline/
https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1610/4/

If anyone sees a similar error on Jenkins, please post it here! And if anyone can replicate this locally, please post more info as well. So far I have been unable to replicate it locally.

We have 4 more days until the release to figure out:

  • how to avoid this
  • whether this only affects the Jenkins machines

Current Version:

v3.0.0

@rok-cesnovar rok-cesnovar self-assigned this Jan 14, 2020
@andrjohns
Collaborator

Thanks for catching this! I've got a similar issue here:

https://jenkins.mc-stan.org/blue/organizations/jenkins/Stan/detail/downstream_tests/1148/pipeline

The math tests pass on Windows, but the downstream tests segfault.

@rok-cesnovar
Member Author

@t4c1 brought this to my attention. I am now trying to find a reliable way to replicate this locally or on Jenkins. It seems that #1558 hits this issue most reliably on the upstream test. I will branch off of that and abuse Jenkins for debugging a bit.

It's probably either the master includes like <stan/math/prim/err.hpp> or the added templating. I am hoping it's the former, as the latter would be really disappointing.

@mcol
Contributor

mcol commented Jan 15, 2020

Not sure if you need other reports, but here's one: https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1612/9/pipeline/174

@rok-cesnovar
Member Author

Thanks to everyone tagging me, I should have time to get to the bottom of this tomorrow.

@rok-cesnovar
Member Author

@serban-nicusor-toptal
Contributor

Stan, downstream_tests, 1147: Running on gelman-group-win-new
Stan, downstream_tests, 1148: Running on gelman-group-win-new
Stan, downstream_tests, 1149: Running on gelman-group-win-new
Stan, downstream_tests, 1149: Running on gelman-group-win2

Getting this through Jenkins is a pain because Chrome just hangs with more than 4 MB of data on a page (even on high-end PCs).
So what I do is the following:

  1. Download the log with wget. Since Jenkins is partly public, anyone can do it without authentication or tokens.
    wget https://jenkins.mc-stan.org/job/Stan/job/downstream_tests/1150/consoleText
  2. Open that file in an editor like Visual Studio Code, then simply search (CTRL + F) for "Running on" and you'll see which machines were used for that particular job.
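
The two steps above can also be scripted. The following is only a rough sketch: the function names are made up for illustration, and it assumes that every agent assignment shows up in the console log as "Running on <agent-name>", possibly followed by more text on the same line.

```python
import re
import urllib.request
from typing import List

# Assumption: the agent name is the first whitespace-delimited token
# after the literal text "Running on".
RUNNING_ON = re.compile(r"Running on (\S+)")


def fetch_console_text(url: str) -> str:
    """Download a console log from a publicly readable Jenkins job."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def agents_in_log(log_text: str) -> List[str]:
    """Return the distinct agent names, in order of first appearance."""
    seen: List[str] = []
    for match in RUNNING_ON.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

Calling agents_in_log(fetch_console_text(url)) with the consoleText URL from step 1 should then list the machines without having to open the log in a browser or an editor.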

I'm currently building more detailed documentation about Jenkins, from jobs to debug procedures, to bring a bit more transparency so that others can review the CI/CD stages and spot issues more easily.

If I can help with anything else please let me know!

@rok-cesnovar
Member Author

Thanks, that more or less confirms that this is not an issue related to a specific machine. I am still struggling to reproduce it locally.

How much RAM is given to Jenkins on Windows? The first issue some time ago was that it was only allowed to use 1 GB of RAM, right? How much does it have now? That could potentially be an issue here, since more C++ templating and the use of master includes mean more RAM is needed for the compilation stage.

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Jan 16, 2020

Only the Jenkins Java process was affected by the RAM limits; I was wrong in thinking that they would also affect processes run by it. So any process run by Jenkins can use all the available RAM.

What I found out while fixing some bugs is that some jobs require more swap than the default allocation. Windows allocates around 50 MB, which is very low, so I had to increase it. I can't give you the exact amount per server right now, but I will update the list below when I get home (I forgot to add the swap specifications).

If you want to see the other machine specs, I've made a list here. Note that it may change!

@rok-cesnovar
Member Author

@serban-nicusor-toptal you fixed those swaps a few days ago, right? There have been no segfaults on the Windows Jenkins tests for a day now and I still can't reproduce this locally...

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Jan 16, 2020

Hey, I've checked now and the Windows machines have 20 and 25 GB of swap respectively, so more than enough.
To be honest, I didn't change anything on the Windows machines this week; I've been writing docs.
The only anomaly I saw was that the new Windows machine went down and back up a few times in the past few days. Everything else was running fine.


The issue with cc1plus.exe: out of memory allocating 65536 bytes is still happening on the EC2 instances. I haven't found a way to fix it yet... if that is the error you mean by swaps.


If it helps in any way, math PRs that failed lately are: PR-1525, PR-1607, PR-1604, PR-1558, PR-1612.
I've tried to find a pattern but can't; it doesn't make much sense. It fails a few times and then just works normally, on both of our Windows machines.

Please tell me how I can help you figure this out.

While going through the logs I found these, not sure if related:

lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/version_test.exe' failed
mingw32-make: *** [test/unit/version_test.exe] Error 1
mingw32-make: *** Waiting for unfinished jobs....
make/tests:10: recipe for target 'test/unit/callbacks/stream_writer_test.exe' failed
mingw32-make: *** [test/unit/callbacks/stream_writer_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/analyze/mcmc/compute_potential_scale_reduction_test.exe' failed
mingw32-make: *** [test/unit/analyze/mcmc/compute_potential_scale_reduction_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
collect2.exe: error: ld returned 1 exit status
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/callbacks/logger_test.exe' failed
mingw32-make: *** [test/unit/callbacks/logger_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/callbacks/writer_test.exe' failed
mingw32-make: *** [test/unit/callbacks/writer_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/analyze/mcmc/compute_effective_sample_size_test.exe' failed
mingw32-make: *** [test/unit/analyze/mcmc/compute_effective_sample_size_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/io/cmd_line_test.exe' failed
mingw32-make: *** [test/unit/io/cmd_line_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/analyze/mcmc/autocovariance_test.exe' failed
mingw32-make: *** [test/unit/analyze/mcmc/autocovariance_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/analyze/mcmc/split_chains_test.exe' failed
mingw32-make: *** [test/unit/analyze/mcmc/split_chains_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/callbacks/tee_writer_test.exe' failed
mingw32-make: *** [test/unit/callbacks/tee_writer_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/io/empty_var_context_test.exe' failed
mingw32-make: *** [test/unit/io/empty_var_context_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/io/chained_var_context_test.exe' failed
mingw32-make: *** [test/unit/io/chained_var_context_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/callbacks/stream_logger_test.exe' failed
mingw32-make: *** [test/unit/callbacks/stream_logger_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/io/array_var_context_test.exe' failed
mingw32-make: *** [test/unit/io/array_var_context_test.exe] Error 1
make/tests:10: recipe for target 'test/unit/io/dump_test.exe' failed
mingw32-make: *** [test/unit/io/dump_test.exe] Error 1
lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
make/tests:10: recipe for target 'test/unit/callbacks/interrupt_test.exe' failed
mingw32-make: *** [test/unit/callbacks/interrupt_test.exe] Error 1
Generating local tbbvars.bat
Generating local tbbvars.sh
Generating local tbbvars.csh
g++   -o tbb.dll concurrent_hash_map.o concurrent_queue.o concurrent_vector.o dynamic_link.o itt_notify.o cache_aligned_allocator.o pipeline.o queuing_mutex.o queuing_rw_mutex.o reader_writer_lock.o spin_rw_mutex.o x86_rtm_rw_mutex.o spin_mutex.o critical_section.o mutex.o recursive_mutex.o condition_variable.o tbb_thread.o concurrent_monitor.o semaphore.o private_server.o rml_tbb.o tbb_misc.o tbb_misc_ex.o task.o task_group_context.o governor.o market.o arena.o scheduler.o observer_proxy.o tbb_statistics.o tbb_main.o concurrent_vector_v2.o concurrent_queue_v2.o spin_rw_mutex_v2.o task_v2.o    -lpsapi -shared -m64 -Wl,-L,"C:/Jenkins/workspace/Stan_downstream_tests/lib/stan_math/lib/tbb" -Wl,-rpath,"C:/Jenkins/workspace/Stan_downstream_tests/lib/stan_math/lib/tbb" -Wl,--version-script,tbb.def
task.o: duplicate section `.rdata$_ZTIN3tbb4taskE[_ZTIN3tbb4taskE]' has different size
arena.o: duplicate section `.rdata$_ZTIN3tbb4taskE[_ZTIN3tbb4taskE]' has different size
scheduler.o: duplicate section `.rdata$_ZTIN3tbb4taskE[_ZTIN3tbb4taskE]' has different size
mingw32-make[1]: Leaving directory 'C:/Jenkins/workspace/Stan_downstream_tests/lib/stan_math/lib/tbb'
In file included from <built-in>:1:
./test/test-models/good/nullary-unconflicted.hpp:351:104: error: no matching function for call to 'log10'
            stan::math::assign(mu, ((((((((stan::math::e() + stan::math::pi()) + stan::math::log2()) + stan::math::log10()) + stan::math::sqrt2()) + stan::math::not_a_number()) + stan::math::positive_infinity()) + stan::math::negative_infinity()) + stan::math::machine_precision()));
                                                                                                       ^~~~~~~~~~~~~~~~~
lib/stan_math/stan/math/prim/mat/fun/log10.hpp:35:13: note: candidate function template not viable: requires single argument 'x', but no arguments were provided
inline auto log10(const T& x) {
            ^
lib/stan_math/stan/math/prim/mat/fun/log10.hpp:47:13: note: candidate function template not viable: requires single argument 'x', but no arguments were provided
inline auto log10(const Eigen::MatrixBase<Derived>& x) {
            ^
lib/stan_math/stan/math/rev/fun/log10.hpp:51:12: note: candidate function not viable: requires single argument 'a', but no arguments were provided
inline var log10(const var& a) { return var(new internal::log10_vari(a.vi_)); }
           ^
In file included from <built-in>:1:
./test/test-models/good/nullary-unconflicted.hpp:351:127: error: no member named 'sqrt2' in namespace 'stan::math'; did you mean simply 'sqrt2'?
            stan::math::assign(mu, ((((((((stan::math::e() + stan::math::pi()) + stan::math::log2()) + stan::math::log10()) + stan::math::sqrt2()) + stan::math::not_a_number()) + stan::math::positive_infinity()) + stan::math::negative_infinity()) + stan::math::machine_precision()));
                                                                                                                              ^~~~~~~~~~~~~~~~~
                                                                                                                              sqrt2
./test/test-models/good/nullary-unconflicted.hpp:303:30: note: 'sqrt2' declared here
            local_scalar_t__ sqrt2;
                             ^
./test/test-models/good/nullary-unconflicted.hpp:492:104: error: no matching function for call to 'log10'
            stan::math::assign(mu, ((((((((stan::math::e() + stan::math::pi()) + stan::math::log2()) + stan::math::log10()) + stan::math::sqrt2()) + stan::math::not_a_number()) + stan::math::positive_infinity()) + stan::math::negative_infinity()) + stan::math::machine_precision()));
                                                                                                       ^~~~~~~~~~~~~~~~~
lib/stan_math/stan/math/prim/mat/fun/log10.hpp:35:13: note: candidate function template not viable: requires single argument 'x', but no arguments were provided
inline auto log10(const T& x) {
            ^
lib/stan_math/stan/math/prim/mat/fun/log10.hpp:47:13: note: candidate function template not viable: requires single argument 'x', but no arguments were provided
inline auto log10(const Eigen::MatrixBase<Derived>& x) {
            ^
lib/stan_math/stan/math/rev/fun/log10.hpp:51:12: note: candidate function not viable: requires single argument 'a', but no arguments were provided
inline var log10(const var& a) { return var(new internal::log10_vari(a.vi_)); }
           ^
In file included from <built-in>:1:
./test/test-models/good/nullary-unconflicted.hpp:492:139: error: no member named 'sqrt2' in namespace 'stan::math'
            stan::math::assign(mu, ((((((((stan::math::e() + stan::math::pi()) + stan::math::log2()) + stan::math::log10()) + stan::math::sqrt2()) + stan::math::not_a_number()) + stan::math::positive_infinity()) + stan::math::negative_infinity()) + stan::math::machine_precision()));
                                                                                                                              ~~~~~~~~~~~~^
4 errors generated.
make/tests:98: recipe for target 'test/test-models/good/nullary-unconflicted.hpp-test' failed
make: *** [test/test-models/good/nullary-unconflicted.hpp-test] Error 1
make: *** Waiting for unfinished jobs....
rm test/integration/mtu/model_1.o test/integration/mtu/model_2.o test/integration/mtu/model_1.cpp test/integration/mtu/model_2.cpp test/integration/mtu/mcmc_1.o test/integration/mtu/mcmc_2.cpp test/integration/mtu/mcmc_1.cpp test/integration/mtu/mcmc_2.o
make -j25 test/integration/multiple_translation_units_test test/integration/compile_models_test test/integration/compile_standalone_functions_test failed
exit now (01/11/20 05:49:03 UTC)
In file included from <built-in>:1:
./test/test-models/good/vec-expr/row_vector_expr_terms.hpp:178:107: error: call to 'pow' is ambiguous
            stan::math::assign(td_rv1, stan::math::to_row_vector(stan::math::array_builder<double >().add(pow(x, 2)).add(pow(y, 2)).array()));
                                                                                                          ^~~
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:17: note: candidate function
__MATHCALL_VEC (pow,, (_Mdouble_ __x, _Mdouble_ __y));
                ^
lib/stan_math/stan/math/rev/fun/pow.hpp:125:12: note: candidate function [with Arith = int, $1 = <>]
inline var pow(const var& base, Arith exponent) {
           ^
lib/stan_math/stan/math/rev/fun/pow.hpp:160:12: note: candidate function [with Arith = double, $1 = <>]
inline var pow(Arith base, const var& exponent) {
           ^
lib/stan_math/stan/math/rev/fun/pow.hpp:108:12: note: candidate function
inline var pow(const var& base, const var& exponent) {
           ^
In file included from <built-in>:1:
./test/test-models/good/vec-expr/row_vector_expr_terms.hpp:178:122: error: call to 'pow' is ambiguous
            stan::math::assign(td_rv1, stan::math::to_row_vector(stan::math::array_builder<double >().add(pow(x, 2)).add(pow(y, 2)).array()));
                                                                                                                         ^~~
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:17: note: candidate function
__MATHCALL_VEC (pow,, (_Mdouble_ __x, _Mdouble_ __y));
                ^
lib/stan_math/stan/math/rev/fun/pow.hpp:125:12: note: candidate function [with Arith = int, $1 = <>]
inline var pow(const var& base, Arith exponent) {
           ^
lib/stan_math/stan/math/rev/fun/pow.hpp:160:12: note: candidate function [with Arith = double, $1 = <>]
inline var pow(Arith base, const var& exponent) {
           ^
lib/stan_math/stan/math/rev/fun/pow.hpp:108:12: note: candidate function
inline var pow(const var& base, const var& exponent) {
           ^
2 errors generated.
make/tests:98: recipe for target 'test/test-models/good/vec-expr/row_vector_expr_terms.hpp-test' failed
make: *** [test/test-models/good/vec-expr/row_vector_expr_terms.hpp-test] Error 1
make: *** Waiting for unfinished jobs....
rm test/integration/mtu/model_1.o test/integration/mtu/model_2.o test/integration/mtu/model_1.cpp test/integration/mtu/model_2.cpp test/integration/mtu/mcmc_1.o test/integration/mtu/mcmc_2.cpp test/integration/mtu/mcmc_1.cpp test/integration/mtu/mcmc_2.o
make -j25 test/integration/compile_models_test test/integration/multiple_translation_units_test test/integration/compile_standalone_functions_test failed
exit now (01/13/20 06:18:10 UTC)
clang++-6.0  -std=c++1y -D_REENTRANT -Wno-sign-compare      -I lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.3 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_4.1.0/include -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1 -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1      -DBOOST_DISABLE_ASSERTS        -c src/test/unit/mcmc/hmc/static_uniform/base_static_uniform_test.cpp -o test/unit/mcmc/hmc/static_uniform/base_static_uniform_test.o
In file included from lib/stan_math/lib/eigen_3.3.3/Eigen/Core:440:0,
                 from lib/stan_math/lib/eigen_3.3.3/Eigen/Dense:1,
                 from lib/stan_math/stan/math/prim/fun/Eigen.hpp:13,
                 from lib/stan_math/stan/math/prim/meta/append_return_type.hpp:4,
                 from lib/stan_math/stan/math/prim/meta.hpp:4,
                 from lib/stan_math/stan/math/prim/err/invalid_argument.hpp:4,
                 from lib/stan_math/stan/math/prim/core/init_threadpool_tbb.hpp:4,
                 from lib/stan_math/stan/math/prim/core.hpp:4,
                 from lib/stan_math/stan/math/prim.hpp:6,
                 from ./src/stan/analyze/mcmc/autocovariance.hpp:4,
                 from <command-line>:0:
lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/CwiseBinaryOp.h: In instantiation of 'Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&) [with BinaryOp = Eigen::internal::scalar_quotient_op<double, double>; LhsType = const Eigen::Matrix<double, -1, -1>; RhsType = const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1> >; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::Matrix<double, -1, -1>; Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1> >]':
lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/../plugins/CommonCwiseBinaryOps.h:69:1:   required from 'typename Eigen::internal::enable_if<true, const Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::promote_scalar_arg<typename Eigen::internal::traits<T>::Scalar, T, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<typename Eigen::internal::traits<T>::Scalar, T, Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, T> > >::value)>::type>, const Derived, const typename Eigen::internal::plain_constant_type<Derived, typename Eigen::internal::promote_scalar_arg<typename Eigen::internal::traits<T>::Scalar, T, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<typename Eigen::internal::traits<T>::Scalar, T, Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, T> > >::value)>::type>::type> >::type Eigen::MatrixBase<Derived>::operator/(const T&) const [with T = double; Derived = Eigen::Matrix<double, -1, -1>; typename Eigen::internal::enable_if<true, const Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::promote_scalar_arg<typename Eigen::internal::traits<T>::Scalar, T, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<typename Eigen::internal::traits<T>::Scalar, T, Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, T> > >::value)>::type>, const Derived, const typename Eigen::internal::plain_constant_type<Derived, typename Eigen::internal::promote_scalar_arg<typename Eigen::internal::traits<T>::Scalar, T, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<typename Eigen::internal::traits<T>::Scalar, T, Eigen::internal::scalar_quotient_op<typename Eigen::internal::traits<T>::Scalar, T> > >::value)>::type>::type> >::type = const Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<double, 
double>, const Eigen::Matrix<double, -1, -1>, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1> > >]'
lib/stan_math/stan/math/prim/fun/welford_covar_estimator.hpp:37:40:   required from here
lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/CwiseBinaryOp.h:111:5: internal compiler error: Segmentation fault
     }
     ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://sourceforge.net/projects/mingw-w64> for instructions.
clang++-6.0  -std=c++1y -D_REENTRANT -Wno-sign-compare      -I lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.3 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_4.1.0/include -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1 -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1      -DBOOST_DISABLE_ASSERTS        -c src/test/unit/mcmc/hmc/static_uniform/derived_static_uniform_test.cpp -o test/unit/mcmc/hmc/static_uniform/derived_static_uniform_test.o
make/tests:76: recipe for target 'src/stan/analyze/mcmc/autocovariance.hpp-test' failed
mingw32-make: *** [src/stan/analyze/mcmc/autocovariance.hpp-test] Error 1
clang++-6.0  -std=c++1y -D_REENTRANT -Wno-sign-compare      -I lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.3 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_4.1.0/include -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1 -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1      -DBOOST_DISABLE_ASSERTS        -c src/test/unit/lang/parser/algebra_solver_test.cpp -o test/unit/lang/parser/algebra_solver_test.o
In file included from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/crtdefs.h:10:0,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/wchar.h:9,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/cwchar:44,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/bits/postypes.h:40,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/iosfwd:40,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/ios:38,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/istream:38,
                 from C:/Rtools/mingw_64/x86_64-w64-mingw32/include/c++/sstream:38,
                 from src/stan/io/var_context.hpp:4,
                 from ./src/stan/io/array_var_context.hpp:4,
                 from <command-line>:0:
C:/Rtools/mingw_64/x86_64-w64-mingw32/include/_mingw.h:660:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://sourceforge.net/projects/mingw-w64> for instructions.
clang++-6.0  -std=c++1y -D_REENTRANT -Wno-sign-compare      -I lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.3 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_4.1.0/include -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1 -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1      -DBOOST_DISABLE_ASSERTS        -c src/test/unit/lang/parser/get_lp_test.cpp -o test/unit/lang/parser/get_lp_test.o
clang++-6.0  -std=c++1y -D_REENTRANT -Wno-sign-compare      -I lib/stan_math/lib/tbb_2019_U8/include -O3 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.3 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_4.1.0/include -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1 -I lib/stan_math/lib/gtest_1.8.1/include -I lib/stan_math/lib/gtest_1.8.1      -DBOOST_DISABLE_ASSERTS        -c src/test/unit/lang/ast/variable_map_test.cpp -o test/unit/lang/ast/variable_map_test.o
make/tests:76: recipe for target 'src/stan/io/array_var_context.hpp-test' failed
mingw32-make: *** [src/stan/io/array_var_context.hpp-test] Error 1
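
One way to triage a mixed log like this is to bucket the error lines by kind, so that compiler crashes can be told apart from linker errors and ordinary compile errors. The sketch below is an illustration only: the patterns are copied from messages that appear in this thread, while the category names and function names are invented labels.

```python
import re
from collections import Counter

# Order matters: the first matching category wins.
# Category names are illustrative; they are not emitted by any tool.
CATEGORIES = [
    ("compiler_segfault", re.compile(r"internal compiler error: Segmentation fault")),
    ("out_of_memory", re.compile(r"out of memory allocating")),
    ("linker_error", re.compile(r"file not recognized|ld returned \d+ exit status")),
    ("compile_error", re.compile(r": error: ")),
]

# Captures "<file>:<line>:<col>" in front of an internal compiler error.
ICE_LOCATION = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): internal compiler error"
)


def classify(line):
    """Return the first category whose pattern matches, or None."""
    for name, pattern in CATEGORIES:
        if pattern.search(line):
            return name
    return None


def summarize(log_text):
    """Count error lines per category and collect compiler crash locations."""
    counts = Counter()
    ice_locations = []
    for line in log_text.splitlines():
        category = classify(line)
        if category is None:
            continue
        counts[category] += 1
        match = ICE_LOCATION.match(line)
        if match:
            ice_locations.append((match["file"], int(match["line"])))
    return counts, ice_locations
```

Run over the excerpt above, a summary like this would put the libsundials_idas.a lines, the log10/pow errors and the two "internal compiler error" lines into separate buckets.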

@rok-cesnovar
Member Author

The ones with "internal compiler error: Segmentation fault" are related and those should be looked at.

I looked at

https://jenkins.mc-stan.org/blue/organizations/jenkins/Stan/detail/downstream_tests/1147/pipeline/66
https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1603/9/pipeline/
https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1610/4/
https://jenkins.mc-stan.org/blue/organizations/jenkins/Stan/detail/downstream_tests/1148/pipeline/66
https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1612/9/pipeline/174

which all failed for this reason, and they all ran on gelman-group-win-new. None have failed in the last day. If anyone finds any other segfaults on Windows, please keep posting them or tagging me.

@rok-cesnovar
Member Author

If these are limited to a single machine that doesn't have enough resources to handle this, that would be the best-case scenario. We would just reduce the parallel env var and should be good.
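
As a back-of-the-envelope illustration of how a lower parallel setting could be picked, the sketch below caps the job count by memory as well as by core count. Everything in it is an assumption: the function name is hypothetical, and the peak RAM per compile job is a placeholder that would have to be measured on the machine in question.

```python
import math


def safe_parallel_jobs(available_ram_gb, peak_ram_per_job_gb, cpu_count):
    """Pick a job count limited by both CPU cores and available memory.

    peak_ram_per_job_gb is a hypothetical, measured-by-you estimate of
    how much RAM one compiler process needs at its peak.
    """
    if peak_ram_per_job_gb <= 0:
        raise ValueError("peak_ram_per_job_gb must be positive")
    by_memory = math.floor(available_ram_gb / peak_ram_per_job_gb)
    # Always allow at least one job, never more than there are cores.
    return max(1, min(cpu_count, by_memory))
```

For example, with 16 GB available and an assumed 2 GB peak per job, this would suggest 8 jobs even on a machine with 25 cores.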

@serban-nicusor-toptal
Contributor

All of the above are extracted from the segfault jobs.
From what I understand, this is only related to RAM: either the software is trying to access bad parts of memory (maybe a code/compiler issue in multi-threading) or there is a faulty RAM stick. Source
I would assume it's not related to just one machine, since it failed and then succeeded on both.

@rok-cesnovar
Member Author

I am thinking more along the lines that it segfaults because it runs out of RAM during multi-threaded compilation.
Have you seen any segfaults on gelman-group-win2? If we find any of those, then it's certainly not limited to one machine.

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Jan 16, 2020

Yes, I've found one. Give me a sec to find it.

@serban-nicusor-toptal
Contributor

You are actually right, it only happened on the new Windows instance. The failures on the other one weren't related to segmentation faults.
I've rebooted the machine; maybe it wasn't rebooted after the swap increase? That may be the case.
I've started this job; it should pick gelman-group-win-new since gelman-group-win2 is busy.

@rok-cesnovar
Member Author

Thanks!

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Jan 16, 2020

I think this confirms it: https://jenkins.mc-stan.org/blue/organizations/jenkins/Stan/detail/downstream_tests/1156/pipeline/66
To be entirely sure we should wait for it to finish successfully.

@rok-cesnovar
Member Author

Thank you for the help. Closing for now; if any such error occurs again, we can reopen.

@wds15
Contributor

wds15 commented Jan 17, 2020

Thanks all for figuring this out!
