valgrind chokes on rdrand from gcc libstdc++ #6705

Closed
ezyang opened this issue Apr 18, 2018 · 6 comments
Labels
module: cpu (CPU specific problem, e.g., perf, algorithm)
module: dependency bug (problem is not caused by us, but by an upstream library we use)
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ezyang
Contributor

ezyang commented Apr 18, 2018

I've attempted to use valgrind on PyTorch in the past and ran into https://bugs.kde.org/show_bug.cgi?id=387940

Apparently, the culprit was -march=native. If you're going to valgrind your PyTorch, you shouldn't build with this flag.

This issue is to track:

  1. Testing that this indeed resolves the issue
  2. Making it easy to turn off -march=native in case you are doing a valgrind run

cc @VitalyFedyunin

@apaszke
Contributor

apaszke commented Apr 18, 2018

When and what do we build with -march=native? 😮

@ezyang
Contributor Author

ezyang commented Apr 18, 2018

Maybe it's not actually -march=native, but explicit requests for funny instructions, e.g., in our CPU kernels.

@ezyang changed the title from "Don't use -march=native when running valgrind" to "Don't use -march when running valgrind" on Apr 18, 2018
@apaszke
Contributor

apaszke commented Apr 18, 2018

That's possible. We should just make it possible to control the implementation selector via env vars, so you can disable those paths for debugging and testing.
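
A minimal sketch of what such an env-var-controlled selector could look like. Everything here is hypothetical and for illustration only: `CpuPath`, `env_flag_set`, `select_cpu_path`, and the `DEMO_DISABLE_*` variable names are made up, and the "set to exactly 1" convention is an assumption, not PyTorch's actual dispatch code.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical capability levels, ordered from most generic to most specialized.
enum class CpuPath { Default, Avx, Avx2 };

// Returns true only if the environment variable exists and equals "1".
// (Assumed convention for this sketch.)
inline bool env_flag_set(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Picks the most specialized path that the hardware supports and that
// has not been switched off through the environment.
inline CpuPath select_cpu_path(bool hw_has_avx, bool hw_has_avx2) {
  if (hw_has_avx2 && !env_flag_set("DEMO_DISABLE_AVX2")) {
    return CpuPath::Avx2;
  }
  if (hw_has_avx && !env_flag_set("DEMO_DISABLE_AVX")) {
    return CpuPath::Avx;
  }
  return CpuPath::Default;
}
```

With both variables set to 1, `select_cpu_path(true, true)` falls through to `CpuPath::Default`, which is the behavior you would want for a debugging or valgrind run.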

@yf225
Contributor

yf225 commented Apr 18, 2018

I ran into this exact same issue when running https://github.com/pytorch/pytorch/blob/master/aten/tools/run_tests.sh#L24 with CUDA enabled. There is no error if the CUDA path is avoided and THCRandom_init is not called. Could this be an incompatibility between Valgrind and CUDA?

@colesbury
Member

You can disable the AVX and AVX2 kernels by setting both environment variables ATEN_DISABLE_AVX2=1 ATEN_DISABLE_AVX=1.

But that's NOT the issue you're seeing. The issue is that GCC's libstdc++ uses RDRAND for std::random_device on Linux. I suspect that's in the library and doesn't depend on any compiler flag we provide. (We're not compiling the relevant code with -march=...)

http://www.pcg-random.org/posts/cpps-random_device.html

On Linux, we could switch to reading from /dev/urandom instead of std::random_device.
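
A rough sketch of the /dev/urandom approach, assuming a Linux system where that device exists. The function name `read_urandom_seed` and the choice of a 64-bit seed are assumptions for illustration, not existing ATen code.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>

// Reads 8 bytes from the kernel entropy device and returns them as a seed.
// The path is a parameter only so the failure branch can be exercised;
// in normal use the default is what you want.
inline std::uint64_t read_urandom_seed(const char* path = "/dev/urandom") {
  std::ifstream source(path, std::ios::in | std::ios::binary);
  if (!source) {
    throw std::runtime_error("cannot open entropy source");
  }
  std::uint64_t seed = 0;
  source.read(reinterpret_cast<char*>(&seed), sizeof(seed));
  if (source.gcount() != static_cast<std::streamsize>(sizeof(seed))) {
    throw std::runtime_error("short read from entropy source");
  }
  return seed;
}
```

The returned value could then seed a generator such as `std::mt19937_64` in place of a call to `std::random_device`. Since this path is plain file I/O, it should not depend on libstdc++'s RDRAND usage.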

@ezyang changed the title from "Don't use -march when running valgrind" to "valgrind chokes on rdrand from gcc libstdc++" on Oct 20, 2020
@ezyang added the labels module: cpu, triaged, and module: dependency bug on Oct 20, 2020
@ezyang
Contributor Author

ezyang commented Oct 20, 2020

Upstream reports this bug is fixed (https://bugs.kde.org/show_bug.cgi?id=353370), so I'm gonna assume that you can get a recent enough version of valgrind to work around this problem.

@ezyang closed this as completed on Oct 20, 2020