[ML] Autodetect fatal error in gallery 302 #94

Closed
sophiec20 opened this issue May 14, 2018 · 4 comments · Fixed by #96

Comments

@sophiec20

sophiec20 commented May 14, 2018

Found in "native_code_info": { "version": "7.0.0-alpha1-SNAPSHOT", "build_hash": "d219beab8c7d34" }

Linux version 3.10.0-693.11.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Mon Dec 4 23:52:40 UTC 2017

  • Using gallery2018 dataset
  • count at bucket_span=15m
  • datafeed query {"bool":{"must":[{"term":{"status":{"value":"302"}}}]}}
  • Datafeed started (from: 1970-01-02T10:00:00.000Z to: 2018-12-31T00:00:00.000Z)

The job fails with an unexpected death of autodetect.
This occurs on repeated runs, always at the same latest timestamp, 2018-10-09 15:29:11 (UTC).
That timestamp corresponds to the end of the time series for 302; however, the job end date was specified as 2018-12-31 00:00:00.

[2018-05-14T10:14:04,309][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-302-15m]
[2018-05-14T10:14:04,499][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Opening job [ga3-count-302-15m]
[2018-05-14T10:14:04,555][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] [ga3-count-302-15m] Loading model snapshot [N/A], job latest_record_timestamp [N/A]
[2018-05-14T10:14:04,705][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [ga3-count-302-15m] [autodetect/6086] [CResourceMonitor.cc@67] Setting model memory limit to 20 MB
[2018-05-14T10:14:04,740][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Successfully set job state to [opened] for job [ga3-count-302-15m]
[2018-05-14T10:14:05,227][INFO ][o.e.x.m.d.DatafeedJob    ] [ga3-count-302-15m] Datafeed started (from: 1970-01-02T10:00:00.000Z to: 2018-12-31T00:00:00.000Z) with frequency [450000ms]
[2018-05-14T10:14:06,887][INFO ][o.e.x.m.j.p.DataCountsReporter] [node1] [ga3-count-200-15m] 700000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-05-14T10:14:26,360][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-400-15m]
[2018-05-14T10:14:48,150][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-303-15m]
[2018-05-14T10:14:51,946][INFO ][o.e.x.m.j.p.DataCountsReporter] [node1] [ga3-count-200-15m] 800000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-05-14T10:14:52,463][ERROR][o.e.x.m.j.p.a.NativeAutodetectProcess] [ga3-count-302-15m] autodetect process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0x7f39a97c3eaa, library: /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f39a948e000, normalized address: 0x335eaa'

[2018-05-14T10:14:52,465][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcess] [ga3-count-302-15m] State output finished
[2018-05-14T10:14:52,482][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/24993] [CDetachedProcessSpawner.cc@184] Child process with PID 6086 was terminated by signal 11
[2018-05-14T10:14:52,529][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Successfully set job state to [failed] for job [ga3-count-302-15m]
[2018-05-14T10:14:52,763][ERROR][o.e.x.m.j.p.a.AutodetectCommunicator] [ga3-count-302-15m] Unexpected exception writing to process
org.elasticsearch.ElasticsearchException: [ga3-count-302-15m] Unexpected death of autodetect: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0x7f39a97c3eaa, library: /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f39a948e000, normalized address: 0x335eaa'

        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.checkProcessIsAlive(AutodetectCommunicator.java:307) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.waitFlushToCompletion(AutodetectCommunicator.java:282) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.lambda$flushJob$4(AutodetectCommunicator.java:241) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator$1.doRun(AutodetectCommunicator.java:363) ~[?:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager$AutodetectWorkerExecutorService.start(AutodetectProcessManager.java:678) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-05-14T10:14:52,767][ERROR][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] [ga3-count-302-15m] exception while flushing job
[2018-05-14T10:14:52,772][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m]
[2018-05-14T10:14:52,780][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m]...
[2018-05-14T10:14:52,780][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m], acquired [true]...
[2018-05-14T10:14:52,781][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m] has been stopped
[2018-05-14T10:14:54,614][WARN ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [ga3-count-302-15m] some results not processed due to the termination of autodetect
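
The fatal error line above reports both the raw faulting address and a normalized address. The numbers in this log are consistent with the normalized value being the raw address minus the load base of libMlMaths.so, which is the kind of offset a symbolizer works with. A minimal C++ sketch of that arithmetic, using the values from the log (the helper names are illustrative and are not part of ml-cpp):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative helper (not an ml-cpp function): offset of the faulting
// instruction relative to the address at which the library was loaded.
std::uint64_t normalizedAddress(std::uint64_t faultAddress, std::uint64_t libraryBase) {
    return faultAddress - libraryBase;
}

// Format a value the way the log prints it, e.g. 0x335eaa.
std::string toHex(std::uint64_t value) {
    char buffer[2 + 16 + 1]; // "0x" + up to 16 hex digits + terminator
    std::snprintf(buffer, sizeof buffer, "0x%llx",
                  static_cast<unsigned long long>(value));
    return buffer;
}
```

Given symbol files that match the build, an offset like this can usually be resolved to a function name with a tool such as addr2line.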
@sophiec20
Author

(gdb) bt
#0  0x00007f9a42d4a59b in raise () from /lib64/libpthread.so.0
#1  0x00007f9a41ffed80 in ml::core::crashHandler(int, siginfo*, void*) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so
#2  <signal handler called>
#3  0x00007f9a41b97eaa in ml::maths::CTimeSeriesDecompositionDetail::CComponents::reweightOutliers(long, long, std::function<double (long)>, std::vector<ml::maths::CBasicStatistics::SSampleCentralMoments<ml::core::CFloatStorage, 1u>, std::allocator<ml::maths::CBasicStatistics::SSampleCentralMoments<ml::core::CFloatStorage, 1u> > >&) const () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#4  0x00007f9a41ba05b5 in ml::maths::CTimeSeriesDecompositionDetail::CComponents::addSeasonalComponents(ml::maths::CPeriodicityHypothesisTestsResult const&, ml::maths::CExpandingWindow const&, std::function<double (long)> const&, ml::maths::CTrendComponent&, std::vector<ml::maths::CSeasonalComponent, std::allocator<ml::maths::CSeasonalComponent> >&, std::vector<ml::maths::CTimeSeriesDecompositionDetail::CComponents::CComponentErrors, std::allocator<ml::maths::CTimeSeriesDecompositionDetail::CComponents::CComponentErrors> >&) const ()
   from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#5  0x00007f9a41ba1534 in ml::maths::CTimeSeriesDecompositionDetail::CComponents::handle(ml::maths::CTimeSeriesDecompositionDetail::SDetectedSeasonal const&) ()
   from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#6  0x00007f9a41b939b7 in ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::test(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&) ()
   from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#7  0x00007f9a41b93bb5 in ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::handle(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&) ()
   from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#8  0x00007f9a41b8666c in ml::maths::CTimeSeriesDecomposition::addPoint(long, double, boost::array<double, 4ul> const&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#9  0x00007f9a41bd0a13 in ml::maths::CUnivariateTimeSeriesModel::updateTrend(std::vector<ml::core::CTriple<long, ml::core::CSmallVector<double, 2ul>, unsigned long>, std::allocator<ml::core::CTriple<long, ml::core::CSmallVector<double, 2ul>, unsigned long> > > const&, std::vector<boost::array<ml::core::CSmallVector<double, 2ul>, 4ul>, std::allocator<boost::array<ml::core::CSmallVector<double, 2ul>, 4ul> > > const&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#10 0x00007f9a41bd0f5e in ml::maths::CUnivariateTimeSeriesModel::addSamples(ml::maths::CModelAddSamplesParams const&, std::vector<ml::core::CTriple<long, ml::core::CSmallVector<double, 2ul>, unsigned long>, std::allocator<ml::core::CTriple<long, ml::core::CSmallVector<double, 2ul>, unsigned long> > >) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so
#11 0x00007f9a4138a295 in ml::model::CEventRateModel::sample(long, long, ml::model::CResourceMonitor&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlModel.so
#12 0x00007f9a412d49c7 in ml::model::CAnomalyDetector::sample(long, long, ml::model::CResourceMonitor&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlModel.so
#13 0x00007f9a412d831c in ml::model::CAnomalyDetector::buildResults(long, long, ml::model::CHierarchicalResults&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlModel.so
#14 0x00007f9a40f2fce0 in ml::api::CAnomalyJob::outputResults(long) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlApi.so
#15 0x00007f9a40f30553 in ml::api::CAnomalyJob::outputBucketResultsUntil(long) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlApi.so
#16 0x00007f9a40f316a8 in ml::api::CAnomalyJob::handleRecord(boost::unordered::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlApi.so
#17 0x00007f9a40fb6e30 in ml::api::CLengthEncodedInputParser::readStream(std::function<bool (boost::unordered::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)> const&) () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlApi.so
#18 0x00007f9a40f5d647 in ml::api::CCmdSkeleton::ioLoop() () from /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlApi.so
#19 0x00005627315a37c9 in main ()

@hendrikmuhs
Contributor

Looking at the build hash, this seems to be a regression introduced with #92. I assume this test has been run before.

hendrikmuhs self-assigned this May 14, 2018
@tveasey
Contributor

tveasey commented May 14, 2018

I suspect that this is caused by numberOutliers in reweightOutliers being less than 0.5 for very sparse data. In that case, wrapping the code which does the reweighting in an if (numberOutliers > 1.0) would be the natural fix. (Note that this isn't released, nor targeted at 6.3, so I am also marking this as a non-issue.)
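
To illustrate the shape of the proposed guard, here is a minimal, self-contained sketch. It is not the ml-cpp implementation: the signature, the outlierFraction and outlierWeight parameters, and the way outliers are selected (largest absolute residuals) are all assumptions made for the example. Only the `numberOutliers > 1.0` check comes from the comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the reweighting step (assumed interface).
// Scales the weights of the points with the largest absolute residuals
// and returns how many points were re-weighted.
std::size_t reweightOutliers(std::vector<double>& weights,
                             const std::vector<double>& residuals,
                             double outlierFraction,
                             double outlierWeight) {
    assert(weights.size() == residuals.size());

    double numberOutliers = outlierFraction * static_cast<double>(residuals.size());

    // Guard from the thread: for very sparse data the expected number of
    // outliers can be below one, in which case there is nothing to re-weight.
    if (!(numberOutliers > 1.0)) {
        return 0;
    }

    // Safe now: numberOutliers > 1.0, so n >= 1.
    std::size_t n = std::min(static_cast<std::size_t>(numberOutliers),
                             residuals.size());

    // Find the indices of the n largest absolute residuals.
    std::vector<std::size_t> order(residuals.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::partial_sort(order.begin(), order.begin() + n, order.end(),
                      [&residuals](std::size_t lhs, std::size_t rhs) {
                          return std::fabs(residuals[lhs]) > std::fabs(residuals[rhs]);
                      });

    for (std::size_t i = 0; i < n; ++i) {
        weights[order[i]] *= outlierWeight;
    }
    return n;
}
```

With four residuals and an assumed outlier fraction of 0.1, the expected count is 0.4, so the function returns early and leaves the weights untouched.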

@hendrikmuhs
Contributor

@tveasey This is indeed the problem and can/should be fixed as you proposed.

But: I would like to further harden the code.
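
One way such hardening could look, purely as a sketch: an accumulator that keeps the largest values seen and refuses to be created with zero capacity, so that later element access never hits an empty container. The class below is hypothetical (its name, interface, and the clamping policy are assumptions) and is not the accumulator used in ml-cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical accumulator (not the ml-cpp class): retains the `capacity`
// largest values added so far, stored as a min-heap.
class LargestValues {
public:
    // Hardening: clamp the capacity to at least 1 so the container can
    // never be configured to hold nothing.
    explicit LargestValues(std::size_t capacity)
        : m_Capacity(std::max<std::size_t>(capacity, 1)) {}

    void add(double x) {
        if (m_Values.size() < m_Capacity) {
            m_Values.push_back(x);
            std::push_heap(m_Values.begin(), m_Values.end(), std::greater<double>());
        } else if (x > m_Values.front()) {
            // m_Values is non-empty here because m_Capacity >= 1.
            std::pop_heap(m_Values.begin(), m_Values.end(), std::greater<double>());
            m_Values.back() = x;
            std::push_heap(m_Values.begin(), m_Values.end(), std::greater<double>());
        }
    }

    std::size_t capacity() const { return m_Capacity; }
    std::size_t count() const { return m_Values.size(); }

    // Smallest retained value, or `fallback` if nothing has been added yet.
    double smallestRetained(double fallback) const {
        return m_Values.empty() ? fallback : m_Values.front();
    }

private:
    std::size_t m_Capacity;
    std::vector<double> m_Values;
};
```

Clamping in the constructor and offering a fallback on read are two independent defences; either one alone would avoid reading from an empty container in this sketch.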

hendrikmuhs pushed a commit to hendrikmuhs/ml-cpp that referenced this issue May 16, 2018
…stream

and harden accumulators to prevent empty containers.

fixes elastic#94
hendrikmuhs pushed a commit that referenced this issue May 17, 2018
#96)

Do not re-weight outliers if there is just 1, preventing a crash downstream
and harden accumulators to prevent empty containers.

fixes #94
hendrikmuhs pushed a commit to elastic/elasticsearch that referenced this issue May 18, 2018
…30674)

This change adds version information in case a native ML process crashes, the version is important for choosing the right symbol files when analyzing the crash. Adding the version combines all necessary information on one line.

relates elastic/ml-cpp#94
hendrikmuhs pushed a commit to elastic/elasticsearch that referenced this issue May 18, 2018
…30674)

This change adds version information in case a native ML process
crashes, the version is important for choosing the right symbol files
when analyzing the crash. Adding the version combines all necessary
information on one line.

relates elastic/ml-cpp#94
tveasey pushed a commit that referenced this issue May 22, 2018
#96)

Do not re-weight outliers if there is just 1, preventing a crash downstream
and harden accumulators to prevent empty containers.

fixes #94
ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue May 23, 2018
…lastic#30674)

This change adds version information in case a native ML process crashes, the version is important for choosing the right symbol files when analyzing the crash. Adding the version combines all necessary information on one line.

relates elastic/ml-cpp#94

3 participants