Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valgrind errors in multiple unit tests #4439

Closed
saatvikshah opened this issue Dec 17, 2018 · 14 comments
Closed

Valgrind errors in multiple unit tests #4439

saatvikshah opened this issue Dec 17, 2018 · 14 comments

Comments

@saatvikshah
Copy link
Contributor

On the current latest develop branch, when running valgrind --leak-check=full ./shogun-unit-test --gtest_filter=*NeuralInputLayer*(or other filters) Im getting thousands of errors all of which seem to be complaining Conditional jump or move depends on uninitialised value(s).

When running the same on a version from one month back(commit hash f6efd89467a7016be8a50bf3f7c0fdf32b075369) theres no errors.

Here is the valgrind output log:
valgrind_out.txt

@gf712
Copy link
Member

gf712 commented Dec 17, 2018

Hi, which compiler did you use to compile shogun? I just did the same check (both in debug and release mode) and did not get any errors. (I also used valgrind 3.13.0)

@saatvikshah
Copy link
Contributor Author

I basically setup a CLion project on my Ubuntu 18.04 machine and tested it on the executable generated there. In that case the above errors show.
But when I build it in a separate build directory there are no errors.
The CXX compiler used seems to be the same. Since Im not able to reproduce it - I guess it can be closed?

@gf712
Copy link
Member

gf712 commented Dec 18, 2018

What are the compile flags passed by your CLion via cmake? I think it might be worth investigating it now, considering it is a recent change that causes that error and we can narrow it down.

@karlnapf
Copy link
Member

Many thanks for reporting. Can you track down the exact change that started causing this?

@saatvikshah
Copy link
Contributor Author

I thought might be a one off thing, but after deleting and rebuilding valgrind still reports the errors. That said, here is what the cmake configuration step looks like with the CLion build:
cmake-build-out.txt

@karlnapf
Copy link
Member

again, if we want to find out what is going on here, it might be best to find out what commit (or even better line changed) started causing those issues ... whether they are real issues, and what the best solution would be

@saatvikshah
Copy link
Contributor Author

Sure, I can try to find out where its failing - Regarding whether its a real issue I'm not sure how to figure out. It does seem awkward that the error only occurs when building with CLion.
A few other observations:

  1. This warning always creeps up right at the start: ==25464== Warning: set address range perms: large range [0x8f21000, 0x24c75000) (defined).
  2. I'm running valgrind from a Parallels Virtual machine.

@saatvikshah
Copy link
Contributor Author

Another update, so I've figured out when Valgrind reports the error - its when I build with cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_TESTING=ON -DBUILD_META_EXAMPLES=ON -DENABLE_ASAN=ON -DENABLE_MSAN=ON -DENABLE_TSAN=ON -DENABLE_UBSAN=ON .. . There are no errors showing up if the address sanitizers are disabled. Apologies for the cluttered/slow approach but my main OS is a Mac which doesnt support Valgrind so I have to run this on my parallels VM(Ubuntu 18.04) which takes like an hour to build each time :/.

I'll try checking a few commits in between and let you'll know. But could someone else confirm theyre seeing these errors and its not just me.

@saatvikshah
Copy link
Contributor Author

It seems the commit on Nov 15th, AnyParameterProperties refactor is the source.

Heres the rough analysis on a few commits(all are Debug builds with sanitizer support enabled - flags seen in previous commit):

Commit#, Date, Name, Errors?
f6efd89467a7016be8a50bf3f7c0fdf32b075369, Oct 31, Add svmlin & svmsgd meta example: No errors
b45243cf5afadcea5dee1df21284c2442ee01e98, Nov 15, AnyParameterProperties refactor: errors reported 
a92ba8d9e43adf61979b5cf03e89690ad2c6313f(Current develop HEAD), Dec 21, static initialisation of maps: errors

@karlnapf
Copy link
Member

very interesting! This is a great investigation, appreciated.

@gf712 is away for a week and a bit so it might take a while to fix it. But I will have a look at the commit and check/think a bit

@gf712
Copy link
Member

gf712 commented Jan 1, 2019

Just had a better look, and it seems like the default constructor of AnyParameterProperties does not set the old class member values which then causes issues when the copy constructor is called. I’ll fix it! I’m not sure what the large range is about though.

@gf712
Copy link
Member

gf712 commented Jan 4, 2019

OK, should be fixed.
@saatvikshah1994 if you used MacOS and need valgrind with the newest OS update you should use Docker instead of VM (it should be quicker). I wrote a python script that runs cmake configs with the shogun docker image (https://github.com/gf712/shogun-docker) and then runs valgrind

@saatvikshah
Copy link
Contributor Author

@gf712 thanks for the suggestion, I'll try it out!

@saatvikshah
Copy link
Contributor Author

Also, isnt it a bit surprising that the problem only reveals itself when running valgrind with sanitizers. I read up more and apparently running valgrind with sanitizers leads to problems at times. That said, it still looks like this wasnt a false positive. Im curious about others thoughts on this? What should be the correct way to run valgrind/sanitizers independently to catch this bug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants