Closed
Description
A SIGFPE crashed an ML job in 6.8.1. This is the failure message:
Fatal error: 'si_signo 8, si_code: 1, si_errno: 0, address: 0x7f64eccb39b1, library: /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f64ec95c000, normalized address: 0x3579b1', version: 6.8.1 (build 6e3432237cefa4)
It looks like the function where the SIGFPE occurred was ml::maths::CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(long, long):
$ objdump -T libMlMaths.so | grep '^00000000003579' | sort
0000000000357930 g DF .text 000000000000016e Base _ZN2ml5maths30CTimeSeriesDecompositionDetail11CComponents9CSeasonal17propagateForwardsEll
$ c++filt _ZN2ml5maths30CTimeSeriesDecompositionDetail11CComponents9CSeasonal17propagateForwardsEll
ml::maths::CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(long, long)
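The address arithmetic behind that lookup can be checked with a small sketch. The helper names normalizedAddress and symbolContains are hypothetical and exist only for this illustration; the numeric values are copied from the failure message and the objdump output above.

```cpp
#include <cassert>
#include <cstdint>

// Offset of a faulting address within the shared library that contains it.
// (Hypothetical helper, not part of ml-cpp.)
inline std::uint64_t normalizedAddress(std::uint64_t address, std::uint64_t base) {
    assert(address >= base); // the address must lie at or above the load base
    return address - base;
}

// True if `offset` lies inside the symbol occupying [start, start + size).
// (Hypothetical helper, not part of ml-cpp.)
inline bool symbolContains(std::uint64_t start, std::uint64_t size, std::uint64_t offset) {
    return offset >= start && offset < start + size;
}
```

With address 0x7f64eccb39b1 and base 0x7f64ec95c000 the offset is 0x3579b1, which falls inside the symbol that starts at 0x357930 with size 0x16e (so the range ends at 0x357a9e), 0x81 bytes into propagateForwards.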
The relevant code looks like this in 6.8.1:
void CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(core_t::TTime start,
                                                                               core_t::TTime end) {
    for (std::size_t i = 0u; i < m_Components.size(); ++i) {
        core_t::TTime period{m_Components[i].time().period()};
        core_t::TTime a{CIntegerTools::floor(start, period)};
        core_t::TTime b{CIntegerTools::floor(end, period)};
        if (b > a) {
            double time{static_cast<double>(b - a) /
                        static_cast<double>(CTools::truncate(period, DAY, WEEK))};
            m_Components[i].propagateForwardsByTime(time);
            m_PredictionErrors[i].age(std::exp(-m_Components[i].decayRate() * time));
        }
    }
}
The code is unchanged in the latest 6.8 branch. It was refactored in version 7.3.0 in #496. We should probably backport a minimal change to 6.8 that at least logs an error instead of crashing with a SIGFPE.
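As a rough illustration of what such a guard could look like, here is a standalone sketch. It is not the change made in #496 and not the actual ml-cpp code. It rests on two assumptions: that the crash comes from a zero period reaching an integer division or modulo inside CIntegerTools::floor (on Linux, si_code 1 for SIGFPE is FPE_INTDIV, integer divide by zero), and that CTools::truncate clamps its first argument to the given range. TTime, floorToMultiple and propagationTime are hypothetical stand-ins, and std::cerr stands in for the real logging macro.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

using TTime = std::int64_t; // stand-in for core_t::TTime

constexpr TTime DAY = 86400;
constexpr TTime WEEK = 7 * DAY;

// Stand-in for CIntegerTools::floor: round `value` down to a multiple of `period`.
// Requires period > 0; `value % 0` is an integer division by zero.
TTime floorToMultiple(TTime value, TTime period) {
    TTime remainder = value % period;
    if (remainder < 0) {
        remainder += period;
    }
    return value - remainder;
}

// Computes the propagation time for one component. Returns false, leaving
// `time` untouched, if the period is invalid (after logging an error) or if
// no whole period boundary was crossed between `start` and `end`.
bool propagationTime(TTime start, TTime end, TTime period, double& time) {
    if (period <= 0) {
        std::cerr << "ERROR: invalid seasonal period " << period
                  << ", skipping component\n";
        return false;
    }
    TTime a = floorToMultiple(start, period);
    TTime b = floorToMultiple(end, period);
    if (b <= a) {
        return false;
    }
    // Clamp the divisor to [DAY, WEEK], mirroring the assumed behaviour of
    // CTools::truncate(period, DAY, WEEK).
    TTime divisor = std::min(std::max(period, DAY), WEEK);
    time = static_cast<double>(b - a) / static_cast<double>(divisor);
    return true;
}
```

With this shape, the loop body would call propagateForwardsByTime and age only when propagationTime returns true, so a component with a bad period is skipped with an error message rather than taking the whole process down.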