[ML] Fix memory errors picked up by Valgrind #852

edsavage · 2019-11-27T19:08:12Z

Also fixes a couple more warnings picked up by clang_tidy

closes #845

droberts195 · 2019-11-27T21:59:14Z

docs/CHANGELOG.asciidoc

 * Reduce memory usage of {ml} native processes on Windows. (See {ml-pull}844[#844].)

+=== Bug Fixes
+* Fixes memory errors picked up by Valgrind. (See {ml-pull}852[#852].)


I don’t think the release notes should mention Valgrind. The end user won’t care how the problem was discovered (or if they really care they can drill down to the GitHub issue).

Maybe something like, “Fixes potential memory corruption when determining seasonality.”

droberts195 · 2019-11-27T22:03:22Z

lib/maths/CSignal.cc


-    TComplexVec f;
-    f.reserve(n);
+    TComplexVec f(n, TComplex{});


The loop below contains push_back and emplace_back calls on this vector. Is the final size meant to be n? If so I think it will end up too big unless the loop contents are changed too.

Having said that, given that a couple of branches inside the loop set values to (0, 0) you’re probably right that it’s clearest to just initialize a correctly sized vector to all (0, 0) and then change the elements that need changing.
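To make the size concern concrete, here is a minimal standalone sketch. It assumes `TComplex` is `std::complex<double>` and `TComplexVec` is a `std::vector` of it (the stack trace later in this thread suggests as much); the three helper function names are hypothetical and exist only for illustration, they are not part of `CSignal`.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Assumed aliases; the real definitions live in the ml-cpp maths library.
using TComplex = std::complex<double>;
using TComplexVec = std::vector<TComplex>;

// reserve() only allocates capacity: size() stays 0, so n appends give n elements.
TComplexVec reserveThenAppend(std::size_t n) {
    TComplexVec f;
    f.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        f.push_back(TComplex{static_cast<double>(i), 0.0});
    }
    return f;
}

// The sized constructor creates n elements up front, so n further appends
// give 2n elements: n zeros of padding followed by the real data.
TComplexVec presizeThenAppend(std::size_t n) {
    TComplexVec f(n, TComplex{});
    for (std::size_t i = 0; i < n; ++i) {
        f.push_back(TComplex{static_cast<double>(i), 0.0});
    }
    return f;
}

// Presizing keeps the size at n only if the loop writes elements in place.
TComplexVec presizeThenAssign(std::size_t n) {
    TComplexVec f(n, TComplex{});
    for (std::size_t i = 0; i < n; ++i) {
        f[i] = TComplex{static_cast<double>(i), 0.0};
    }
    return f;
}
```

With `n = 4`, the first and third helpers return 4 elements and the second returns 8, which is the mismatch being asked about.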

Yeah, I've stepped through this several times. The call to emplace_back happens after the vector has been truncated by resize.

Actually, on the face of it I think the code was probably fine as it stood but the change keeps Valgrind happy. The other option then is to leave this bit of the code alone and to simply add the offending stack trace to a Valgrind suppression file (that would need to be created and committed as part of this PR).

Which leads me to another suggestion. Should we include Valgrind checks as part of the Jenkins CI builds for PRs? Maybe we could also run a full Valgrind check once a week or so, over the weekend?

edsavage · 2019-11-28T09:34:48Z

Looks like there are a few test case failures, mainly due to changed threshold values. Investigating.

tveasey

The change to the bound check looks right to me. However, I don't think it is necessary to pad the value vector with default initialised values.

tveasey · 2019-11-27T19:40:35Z

lib/maths/CSignal.cc


-    TComplexVec f;
-    f.reserve(n);
+    TComplexVec f(n, TComplex{});


This doesn't seem right to me. Consider the case that count(values[i]) > 0 for all i: the vector then contains a pad of n TComplex{} values followed by n TComplex{CBasicStatistics::mean(values[i]) - mean, 0.0} values. Basically, I'm not sure why this is now resized. Also, I can't see why it could cause a Valgrind error, since we only resize or append to f in the following.

Yeah, this change was just an attempt to shut Valgrind up about the following...

Conditional jump or move depends on uninitialised value(s)
    std::__1::complex<double> std::__1::operator*<double>(std::__1::complex<double> const&, std::__1::complex<double> const&)
    std::__1::complex<double>& std::__1::complex<double>::operator*=<double>(std::__1::complex<double> const&)
    ml::maths::CSignal::hadamard(std::__1::vector<std::__1::complex, std::__1::allocator> const&, std::__1::vector<std::__1::complex, std::__1::allocator>&)
    ml::maths::CSignal::fft(std::__1::vector<std::__1::complex, std::__1::allocator>&)
    ml::maths::CSignal::autocorrelations(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&, std::__1::vector<double, std::__1::allocator>&)
    ml::maths::(anonymous namespace)::mostSignificantPeriodicComponent(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator>)
    ml::maths::testForPeriods(ml::maths::CPeriodicityHypothesisTestsConfig const&, long, long, std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&)
    ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::test(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)
    ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::handle(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)
    ml::maths::CTimeSeriesDecomposition::addPoint(long, double, std::__1::array<double, 4ul> const&, std::__1::function<void (std::__1::vector)> const&)
    CTimeSeriesDecompositionTest::testMixedSmoothAndSpikeyDataProblemCase::test_method()
    CTimeSeriesDecompositionTest::testMixedSmoothAndSpikeyDataProblemCase_invoker()
Uninitialised value was created by a heap allocation
    malloc
    operator new(unsigned long)
    std::__1::__libcpp_allocate(unsigned long, unsigned long)
    std::__1::allocator<std::__1::complex>::allocate(unsigned long, void const*)
    std::__1::allocator_traits<std::__1::allocator>::allocate(std::__1::allocator<std::__1::complex>&, unsigned long)
    std::__1::__split_buffer<std::__1::complex, std::__1::allocator&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::complex>&)
    std::__1::__split_buffer<std::__1::complex, std::__1::allocator&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::complex>&)
    std::__1::vector<std::__1::complex, std::__1::allocator>::reserve(unsigned long)
    ml::maths::CSignal::autocorrelations(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&, std::__1::vector<double, std::__1::allocator>&)
    ml::maths::(anonymous namespace)::mostSignificantPeriodicComponent(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator>)
    ml::maths::testForPeriods(ml::maths::CPeriodicityHypothesisTestsConfig const&, long, long, std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&)
    ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::test(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)

Maybe Valgrind just got confused? If so, as I said in a comment above, I can always just add a suppression for this warning.

Let me study the code a bit and have a think about this. At present I can't see any reason why the loop should be triggering an error, so it may be a false positive, but I think we would need an alternative fix if we wanted to tackle this. One possibility would be to presize the vector and then not extend it, just writing values into place.

Thanks Tom. Welcome back btw :-)

Along these lines, how about the following:

    TComplexVec f(n, TComplex{0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t j = i;
        for (/**/; j < n && CBasicStatistics::count(values[j]) == 0; ++j) {
            // no-op
        }
        if (i < j) {
            // Infer missing values by linearly interpolating.
            if (j == n) {
                break;
            } else if (i > 0) {
                for (std::size_t k = i; k < j; ++k) {
                    double alpha{static_cast<double>(k - i + 1) / static_cast<double>(j - i + 1)};
                    double fj{CBasicStatistics::mean(values[j]) - mean};
                    f[k] = (1.0 - alpha) * f[i - 1] + alpha * TComplex{fj, 0.0};
                }
            }
            i = j;
        }
        f[i] = TComplex{CBasicStatistics::mean(values[i]) - mean, 0.0};
    }

This is actually clearer IMO and may incidentally fix the valgrind error (which still seems spurious to me).
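For anyone who wants to step through the presize-and-write-in-place idea outside the library, the following is a standalone sketch of the same loop. `SSample` and `centredWithGapsFilled` are hypothetical stand-ins: the real code uses `CBasicStatistics::count` and `CBasicStatistics::mean` on `SSampleCentralMoments` accumulators, and the `TComplex` aliases are assumed to wrap `std::complex<double>`.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Assumed aliases; the real definitions live in the ml-cpp maths library.
using TComplex = std::complex<double>;
using TComplexVec = std::vector<TComplex>;

// Hypothetical stand-in for the accumulator type stored in `values`.
struct SSample {
    double s_Count; // number of observations in the bucket (0 means missing)
    double s_Mean;  // mean of the observations in the bucket
};
using TSampleVec = std::vector<SSample>;

// Returns the mean-centred series with interior gaps linearly interpolated.
// The result always has values.size() elements because nothing is appended.
TComplexVec centredWithGapsFilled(const TSampleVec& values, double mean) {
    std::size_t n{values.size()};
    TComplexVec f(n, TComplex{0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {
        // Find the next bucket at or after i which has data.
        std::size_t j = i;
        while (j < n && values[j].s_Count == 0.0) {
            ++j;
        }
        if (i < j) {
            if (j == n) {
                break; // trailing gap: leave the remaining elements at zero
            }
            if (i > 0) {
                // Interior gap: interpolate between f[i - 1] and the value at j.
                for (std::size_t k = i; k < j; ++k) {
                    double alpha{static_cast<double>(k - i + 1) /
                                 static_cast<double>(j - i + 1)};
                    double fj{values[j].s_Mean - mean};
                    f[k] = (1.0 - alpha) * f[i - 1] + alpha * TComplex{fj, 0.0};
                }
            }
            i = j;
        }
        f[i] = TComplex{values[i].s_Mean - mean, 0.0};
    }
    return f;
}
```

Every element is either left at its initial (0, 0) or assigned exactly once, so no element is ever read before it has been written.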

That also looked odd to me. The offending call was from fft, but it is applied to a and b which are initialised as

    TComplexVec a(m, TComplex(0.0));
    TComplexVec b(m, TComplex(0.0));

so they must be of the same size. The only other thing I wondered was whether calling TComplex(0.0) leaves the imaginary part uninitialised; I haven't checked. It may be clearer anyway to change to TComplex{0.0, 0.0} in these definitions.

Did you see the error above before or after the error in radix2fft was fixed?

@droberts195 I think the error was visible after the radix2fft was fixed (at least I didn't notice it in the initial Valgrind report)

Also @tveasey, your suggested change above ^^^ looks good. No Valgrind errors in CTimeSeriesDecompositionTest/testMixedSmoothAndSpikeyDataProblemCase at least.

> so must be of the same size

hadamard is a public method, so not necessarily called from there. Also, a bug in another method called in between initialization and the call to hadamard could accidentally change the size of one of the vectors. Saying "so must be of the same size" assumes the reader is intimately familiar with all possible code paths.
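One way to stop relying on callers for that guarantee is an explicit size check inside the function. The sketch below is only an illustration of that idea: `checkedHadamard` is a hypothetical name, the argument order is taken from the stack trace above, and the body is not the actual `CSignal::hadamard` implementation.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Assumed aliases; the real definitions live in the ml-cpp maths library.
using TComplex = std::complex<double>;
using TComplexVec = std::vector<TComplex>;

// Hypothetical element-wise product y[i] *= x[i] with a precondition check.
// Returns false and leaves y untouched if the sizes differ, so the loop can
// never read past the end of the shorter vector.
bool checkedHadamard(const TComplexVec& x, TComplexVec& y) {
    if (x.size() != y.size()) {
        return false;
    }
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] *= x[i];
    }
    return true;
}
```

Whether a mismatch should return an error, log, or assert is a design choice this sketch does not try to settle.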

> does calling TComplex(0.0) leave the imaginary part uninitialised

No, it initializes it to 0.0. I checked the library code when I first opened #845. But I agree it's clearer and equally efficient to explicitly pass both arguments.
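A quick standalone check of this, assuming `TComplex` is `std::complex<double>`. The `std::complex<double>` constructor defaults both its real and imaginary arguments to 0.0, so all three spellings below produce the same value; the function name is hypothetical.

```cpp
#include <complex>

// Assumed alias; the real definition lives in the ml-cpp maths library.
using TComplex = std::complex<double>;

// Returns true if the one-argument, two-argument and value-initialised
// forms all yield (0, 0), i.e. the imaginary part is never left unset.
bool singleArgumentZeroesImaginaryPart() {
    TComplex oneArg(0.0);
    TComplex twoArgs{0.0, 0.0};
    TComplex valueInit{};
    return oneArg.imag() == 0.0 && oneArg == twoArgs && valueInit == twoArgs;
}
```

The two-argument form is therefore a readability choice rather than a behavioural fix.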

> hadamard is a public method, so not necessarily called from there.

But the stack trace includes CSignal::fft

tveasey

LGTM

droberts195

LGTM

Fixes another valgrind error in CSignal

eaa7744

Also fixes a couple more warnings picked up by clang_tidy

edsavage added >bug review :ml v8.0.0 v7.6.0 labels Nov 27, 2019

Updated Changelog

918ada2

edsavage requested review from droberts195 and tveasey November 27, 2019 19:18

droberts195 reviewed Nov 27, 2019

tveasey reviewed Nov 28, 2019

edsavage added 2 commits November 28, 2019 10:12

Updated Changelog

77d13c7

Addresses code review comments

45b11cc

tveasey approved these changes Nov 28, 2019

droberts195 approved these changes Nov 28, 2019

edsavage merged commit 8fd6ed3 into elastic:master Nov 28, 2019

edsavage added a commit to edsavage/ml-cpp that referenced this pull request Nov 28, 2019

[ML] Fix memory errors picked up by Valgrind (elastic#852)

8500f18

* Fixes Valgrind errors in `CSignal`
* Also fixes a couple more warnings picked up by clang_tidy

(cherry picked from commit 8fd6ed3)

edsavage mentioned this pull request Nov 28, 2019

[ML][7.6] Fix memory errors picked up by Valgrind (#852) #858

Merged

edsavage deleted the valgrind_error branch November 28, 2019 17:28

edsavage added a commit that referenced this pull request Nov 28, 2019

[ML] [7.6] Fix memory errors picked up by Valgrind (#852) (#858)

667a2b0

* Fixes Valgrind errors in CSignal
* Also fixes a couple more warnings picked up by clang_tidy

Backports #852

droberts195 mentioned this pull request Dec 4, 2019

[ML] Adds "make memcheck" target to build system. #876

Closed
