-
Notifications
You must be signed in to change notification settings - Fork 66
[ML] Fix memory errors picked up by Valgrind #852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also fixes a couple more warnings picked up by clang_tidy
docs/CHANGELOG.asciidoc
Outdated
* Reduce memory usage of {ml} native processes on Windows. (See {ml-pull}844[#844].) | ||
|
||
=== Bug Fixes | ||
* Fixes memory errors picked up by Valgrind. (See {ml-pull}852[#852].) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think the release notes should mention Valgrind. The end user won’t care how the problem was discovered (or if they really care they can drill down to the GitHub issue).
Maybe something like, “Fixes potential memory corruption when determining seasonality.”
lib/maths/CSignal.cc
Outdated
|
||
TComplexVec f; | ||
f.reserve(n); | ||
TComplexVec f(n, TComplex{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop below contains push_back
and emplace_back
calls on this vector. Is the final size meant to be n
? If so I think it will end up too big unless the loop contents are changed too.
Having said that, given that a couple of branches inside the loop set values to (0, 0) you’re probably right that it’s clearest to just initialize a correctly sized vector to all (0, 0) and then change the elements that need changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've stepped through this several times. The call to emplace_back
happens after the vector has been truncated by resize
.
Actually, on the face of it I think the code was probably fine as it stood but the change keeps Valgrind
happy. The other option then is to leave this bit of the code alone and to simply add the offending stack trace to a Valgrind
suppression file (that would need to be created and committed as part of this PR).
Which leads me to another suggestion. Should we include Valgrind
checks as part of the Jenkins CI builds for PRs? Maybe we could also run a full Valgrind
check once a week or so, over the weekend?
Looks like there's a few test case failures due to changed threshold values mainly, investigating. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to the bound check looks right to me. However, I don't think it is necessary to pad the value vector with default initialised values.
lib/maths/CSignal.cc
Outdated
|
||
TComplexVec f; | ||
f.reserve(n); | ||
TComplexVec f(n, TComplex{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me. Consider the case that count(value[i]) > 0
for all i
then the vector contains a pad of n TComplex{}
followed by n TComplex{CBasicStatistics::mean(values[i]) - mean, 0.0}
values. Basically, I'm not sure why this is now resized. Also, I can't see why it could cause a valgrind error since we only resize or append to f
in the following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this change was just an attempt to shut Valgrind
up about the following...
Conditional jump or move depends on uninitialised value(s)
std::__1::complex<double> std::__1::operator*<double>(std::__1::complex<double> const&, std::__1::complex<double> const&)
std::__1::complex<double>& std::__1::complex<double>::operator*=<double>(std::__1::complex<double> const&)
ml::maths::CSignal::hadamard(std::__1::vector<std::__1::complex, std::__1::allocator> const&, std::__1::vector<std::__1::complex, std::__1::allocator>&)
ml::maths::CSignal::fft(std::__1::vector<std::__1::complex, std::__1::allocator>&)
ml::maths::CSignal::autocorrelations(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&, std::__1::vector<double, std::__1::allocator>&)
ml::maths::(anonymous namespace)::mostSignificantPeriodicComponent(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator>)
ml::maths::testForPeriods(ml::maths::CPeriodicityHypothesisTestsConfig const&, long, long, std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&)
ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::test(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)
ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::handle(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)
ml::maths::CTimeSeriesDecomposition::addPoint(long, double, std::__1::array<double, 4ul> const&, std::__1::function<void (std::__1::vector)> const&)
CTimeSeriesDecompositionTest::testMixedSmoothAndSpikeyDataProblemCase::test_method()
CTimeSeriesDecompositionTest::testMixedSmoothAndSpikeyDataProblemCase_invoker()
Uninitialised value was created by a heap allocation
malloc
operator new(unsigned long)
std::__1::__libcpp_allocate(unsigned long, unsigned long)
std::__1::allocator<std::__1::complex>::allocate(unsigned long, void const*)
std::__1::allocator_traits<std::__1::allocator>::allocate(std::__1::allocator<std::__1::complex>&, unsigned long)
std::__1::__split_buffer<std::__1::complex, std::__1::allocator&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::complex>&)
std::__1::__split_buffer<std::__1::complex, std::__1::allocator&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::complex>&)
std::__1::vector<std::__1::complex, std::__1::allocator>::reserve(unsigned long)
ml::maths::CSignal::autocorrelations(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&, std::__1::vector<double, std::__1::allocator>&)
ml::maths::(anonymous namespace)::mostSignificantPeriodicComponent(std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator>)
ml::maths::testForPeriods(ml::maths::CPeriodicityHypothesisTestsConfig const&, long, long, std::__1::vector<ml::maths::CBasicStatistics::SSampleCentralMoments, std::__1::allocator> const&)
ml::maths::CTimeSeriesDecompositionDetail::CPeriodicityTest::test(ml::maths::CTimeSeriesDecompositionDetail::SAddValue const&)
Maybe Valgrind
just got confused? If so, as I said in a comment above, I can always just add a suppression for this warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me study the code a bit and have a think about this. I can't at present see any reason why the loop should be triggering an error so may be a false positive, but I think we would need an alternative fix if we wanted to tackle this. One possibility would be to presize then not extend, but just write values into place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me study the code a bit and have a think about this. I can't at present see any reason why the loop should be triggering an error so may be a false positive, but I think we would need an alternative fix if we wanted to tackle this. One possibility would be to presize then not extend, but just write values into place.
Thanks Tom. Welcome back btw :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along these lines, how about the following:
TComplexVec f(n, TComplex{0.0, 0.0});
for (std::size_t i = 0; i < n; ++i) {
std::size_t j = i;
for (/**/; j < n && CBasicStatistics::count(values[j]) == 0; ++j) {
// no-op
}
if (i < j) {
// Infer missing values by linearly interpolating.
if (j == n) {
break;
} else if (i > 0) {
for (std::size_t k = i; k < j; ++k) {
double alpha{static_cast<double>(k - i + 1) / static_cast<double>(j - i + 1)};
double fj{CBasicStatistics::mean(values[j]) - mean};
f[k] = (1.0 - alpha) * f[i - 1] + alpha * TComplex{fj, 0.0};
}
}
i = j;
}
f[i] = TComplex{CBasicStatistics::mean(values[i]) - mean, 0.0};
}
This is actually clearer IMO and may incidentally fix the valgrind error (which still seems spurious to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also looked odd to me. The offending call was from fft
, but it is applied to a
and b
which are initialised as
TComplexVec a(m, TComplex(0.0));
TComplexVec b(m, TComplex(0.0));
so must be of the same size. The only other thing I thought on this was does calling TComplex(0.0)
leave the imaginary part uninitialised, I haven't checked? May be clearer anyway to change to TComplex{0.0, 0.0}
in these definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see the error above before or after the error in radix2fft was fixed?
@droberts195 I think the error was visible after the radix2fft
was fixed (at least I didn't notice it in the initial Valgrind report)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @tveasey, your suggested change above ^^^ looks good. No Valgrind errors in CTimeSeriesDecompositionTest/testMixedSmoothAndSpikeyDataProblemCase
at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so must be of the same size
hadamard
is a public method, so not necessarily called from there. Also, a bug in another method called in between initialization and the call to hadamard
could accidentally change the size of one of the vectors. Saying "so must be of the same size" assumes the reader is intimately familiar with all possible code paths.
does calling
TComplex(0.0)
leave the imaginary part uninitialised
No, it initializes it to 0.0
. I checked the library code when I first opened #845. But I agree it's clearer and equally efficient to explicitly pass both arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hadamard is a public method, so not necessarily called from there.
But the stack trace includes CSignal::fft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Fixes Valgrind errors in `CSignal` * Also fixes a couple more warnings picked up by clang_tidy (cherry picked from commit 8fd6ed3)
Also fixes a couple more warnings picked up by clang_tidy
closes #845