Random blocks test fails on over 16400 MiB memory #7
Comments
My guess is that this is related to #3. When I originally wrote this tool, no GPUs had even 4GB of memory, so there may be 32-bitness issues sitting around. Unfortunately I haven't been actively working on this tool for over 5 years now (and I don't have a GPU with so much RAM), so I won't be able to fix this myself. If someone is interested in submitting a pull request with a fix I'd be happy to merge it, though.
It appears there may be a bug with random blocks that is separate from the memory size issue: https://forums.geforce.com/default/topic/1080529/rtx-strix-2080-errors-with-memtestg80-help/ My guess would be some kind of synchronization/warp size issue on newer GPUs, though it's also possible that the code is making assumptions about pointer size that are no longer true.
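To make the 32-bitness guess concrete, here is a minimal Python sketch of the arithmetic involved. It assumes the tested region is indexed as 32-bit words through an unsigned 32-bit counter; that is an assumption for illustration only, not something confirmed from the memtestG80 source, and the helper names are hypothetical.

```python
MIB = 1024 * 1024
WORD_BYTES = 4  # assumption: memory is indexed as 32-bit words
U32_MAX = 0xFFFFFFFF


def word_count(mib):
    """Number of 32-bit words in a region of `mib` MiB."""
    return mib * MIB // WORD_BYTES


def wrapped_u32(value):
    """Value as it would appear after being stored in an unsigned 32-bit integer."""
    return value & U32_MAX


def fits_in_u32(mib):
    """True if the word count for `mib` MiB is representable in 32 bits."""
    return word_count(mib) <= U32_MAX


if __name__ == "__main__":
    # Sizes around the hypothetical boundary plus the sizes from the report.
    for mib in (16383, 16384, 16400, 16402, 20000):
        words = word_count(mib)
        print(mib, words, wrapped_u32(words), fits_in_u32(mib))
```

Under that assumption the word count stops fitting in 32 bits at 16384 MiB. The failures reported in this issue begin just above 16400 MiB, which is close to that boundary but not exactly on it, so this arithmetic alone does not pin down the cause.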
Skip random blocks test if more than 16400 MiB are tested.
I have just tried testing on two new Quadro P6000 cards. Both return the same errors when testing more than 16400 MiB of memory.
The following are the results for 16400 MiB (passing), 16401 MiB (failing), and 20000 MiB (failing):
./memtestG80 16400 1
Running 1 iterations of tests over 16400 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 8200 MB transfers...
Estimated bandwidth 328000000.00 MB/s
Test iteration 1 (GPU 0, 16400 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (256 ms)
Memtest86 Walking 8-bit: 0 errors (2049 ms)
True Walking zeros (8-bit): 0 errors (1011 ms)
True Walking ones (8-bit): 0 errors (1012 ms)
Moving Inversions (random): 0 errors (258 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 0 errors (456 ms)
Memtest86 Modulo-20: 0 errors (23933 ms)
Logic (one iteration): 0 errors (129 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)
Final error count after 1 iterations over 16400 MiB of GPU memory: 0 errors
./memtestG80 16401 1
Running 1 iterations of tests over 16402 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 8201 MB transfers...
Estimated bandwidth 328040000.00 MB/s
Test iteration 1 (GPU 0, 16402 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (257 ms)
Memtest86 Walking 8-bit: 0 errors (2052 ms)
True Walking zeros (8-bit): 0 errors (1010 ms)
True Walking ones (8-bit): 0 errors (1014 ms)
Moving Inversions (random): 0 errors (257 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 67198032 errors (457 ms)
Memtest86 Modulo-20: 0 errors (23952 ms)
Logic (one iteration): 0 errors (128 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)
Final error count after 1 iterations over 16402 MiB of GPU memory: 67198032 errors
./memtestG80 20000 1
Running 1 iterations of tests over 20000 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 10000 MB transfers...
Estimated bandwidth 2030456.85 MB/s
Test iteration 1 (GPU 0, 20000 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (313 ms)
Memtest86 Walking 8-bit: 0 errors (2499 ms)
True Walking zeros (8-bit): 0 errors (1232 ms)
True Walking ones (8-bit): 0 errors (1234 ms)
Moving Inversions (random): 0 errors (314 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4932 ms)
Memtest86 Walking ones (32-bit): 0 errors (4933 ms)
Random blocks: 2270811672 errors (557 ms)
Memtest86 Modulo-20: 0 errors (29190 ms)
Logic (one iteration): 0 errors (157 ms)
Logic (4 iterations): 0 errors (158 ms)
Logic (shared memory, one iteration): 0 errors (157 ms)
Logic (shared-memory, 4 iterations): 0 errors (157 ms)
Final error count after 1 iterations over 20000 MiB of GPU memory: 2270811672 errors
The number of errors is the same for each card. All other tests pass, which makes me think this is a bug and not a failure of the card.
This is a great tool and has helped me find GPUs with problems.
Thank you
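The comparison described above (the same error counts on both cards, with only one test failing) can be checked mechanically. The following is a minimal Python sketch that parses per-test result lines in the format shown in the logs and compares the failing tests between two runs. The function names and the regular expression are hypothetical helpers written for this illustration; they are not part of memtestG80.

```python
import re

# Matches per-test result lines such as
# "Random blocks: 67198032 errors (457 ms)".
# Summary lines ("... 0 errors so far", "Final error count ...") lack the
# trailing "(N ms)" and are therefore ignored.
RESULT_LINE = re.compile(
    r"^\s*(?P<name>.+?):\s+(?P<errors>\d+) errors \((?P<ms>\d+) ms\)\s*$"
)


def parse_results(log_text):
    """Return {test name: error count} for every per-test line in the log."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group("name")] = int(match.group("errors"))
    return results


def failing_tests(results):
    """Keep only the tests that reported at least one error."""
    return {name: count for name, count in results.items() if count > 0}


def same_failures(log_a, log_b):
    """True if both logs report the same failing tests with the same counts."""
    return failing_tests(parse_results(log_a)) == failing_tests(parse_results(log_b))


# Excerpt from the 16401 MiB run quoted in this issue.
SAMPLE = """Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 67198032 errors (457 ms)
Memtest86 Modulo-20: 0 errors (23952 ms)
Final error count after 1 iterations over 16402 MiB of GPU memory: 67198032 errors"""

if __name__ == "__main__":
    print(failing_tests(parse_results(SAMPLE)))
```

Running `same_failures` on the saved output from each card would confirm that the failures match. Identical counts on independent hardware point toward a deterministic software problem rather than defective memory, though they do not prove it.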