Random blocks test fails on over 16400 MiB memory #7

Open
jsc2 opened this issue Feb 14, 2019 · 2 comments


jsc2 commented Feb 14, 2019

I have just tested two new Quadro P6000 cards. Both report the same errors in the Random blocks test when testing more than 16400 MiB of memory.

The following are the results for 16400 MiB (passing), 16401 MiB (failing; the tool rounds it up to 16402 MiB) and 20000 MiB (failing):

./memtestG80 16400 1

 -------------------------------------------------------------
 |                      MemtestG80 v1.00                     |
 |                                                           |
 | Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters]  |
 |                                                           |
 | Defaults: GPU 0, 128MB RAM, 50 test iterations            |
 | Amount of tested RAM will be rounded up to nearest 2MB    |
 -------------------------------------------------------------

  Available flags:
    --gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
    --license ,-l : show license terms for this build

Running 1 iterations of tests over 16400 MB of GPU memory on card 0: Quadro P6000

Running memory bandwidth test over 20 iterations of 8200 MB transfers...
Estimated bandwidth 328000000.00 MB/s

Test iteration 1 (GPU 0, 16400 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (256 ms)
Memtest86 Walking 8-bit: 0 errors (2049 ms)
True Walking zeros (8-bit): 0 errors (1011 ms)
True Walking ones (8-bit): 0 errors (1012 ms)
Moving Inversions (random): 0 errors (258 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 0 errors (456 ms)
Memtest86 Modulo-20: 0 errors (23933 ms)
Logic (one iteration): 0 errors (129 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)

Final error count after 1 iterations over 16400 MiB of GPU memory: 0 errors

./memtestG80 16401 1

 -------------------------------------------------------------
 |                      MemtestG80 v1.00                     |
 |                                                           |
 | Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters]  |
 |                                                           |
 | Defaults: GPU 0, 128MB RAM, 50 test iterations            |
 | Amount of tested RAM will be rounded up to nearest 2MB    |
 -------------------------------------------------------------

  Available flags:
    --gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
    --license ,-l : show license terms for this build

Running 1 iterations of tests over 16402 MB of GPU memory on card 0: Quadro P6000

Running memory bandwidth test over 20 iterations of 8201 MB transfers...
Estimated bandwidth 328040000.00 MB/s

Test iteration 1 (GPU 0, 16402 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (257 ms)
Memtest86 Walking 8-bit: 0 errors (2052 ms)
True Walking zeros (8-bit): 0 errors (1010 ms)
True Walking ones (8-bit): 0 errors (1014 ms)
Moving Inversions (random): 0 errors (257 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 67198032 errors (457 ms)
Memtest86 Modulo-20: 0 errors (23952 ms)
Logic (one iteration): 0 errors (128 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)

Final error count after 1 iterations over 16402 MiB of GPU memory: 67198032 errors

./memtestG80 20000 1

 -------------------------------------------------------------
 |                      MemtestG80 v1.00                     |
 |                                                           |
 | Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters]  |
 |                                                           |
 | Defaults: GPU 0, 128MB RAM, 50 test iterations            |
 | Amount of tested RAM will be rounded up to nearest 2MB    |
 -------------------------------------------------------------

  Available flags:
    --gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
    --license ,-l : show license terms for this build

Running 1 iterations of tests over 20000 MB of GPU memory on card 0: Quadro P6000

Running memory bandwidth test over 20 iterations of 10000 MB transfers...
Estimated bandwidth 2030456.85 MB/s

Test iteration 1 (GPU 0, 20000 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (313 ms)
Memtest86 Walking 8-bit: 0 errors (2499 ms)
True Walking zeros (8-bit): 0 errors (1232 ms)
True Walking ones (8-bit): 0 errors (1234 ms)
Moving Inversions (random): 0 errors (314 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4932 ms)
Memtest86 Walking ones (32-bit): 0 errors (4933 ms)
Random blocks: 2270811672 errors (557 ms)
Memtest86 Modulo-20: 0 errors (29190 ms)
Logic (one iteration): 0 errors (157 ms)
Logic (4 iterations): 0 errors (158 ms)
Logic (shared memory, one iteration): 0 errors (157 ms)
Logic (shared-memory, 4 iterations): 0 errors (157 ms)

Final error count after 1 iterations over 20000 MiB of GPU memory: 2270811672 errors

The number of errors is the same on each card. All other tests pass, which makes me think this is a bug in the tool and not a failure of the cards.

This is a great tool and has helped me find GPUs with problems.
Thank you
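For anyone comparing runs at different sizes, the per-test lines in the logs above can be scanned mechanically. The following is a minimal sketch, assuming the output format shown in this issue (`<test name>: <N> errors (<T> ms)`); the `failing_tests` helper and the regular expression are hypothetical and not part of memtestG80.

```python
import re

# Matches per-test result lines such as
# "Random blocks: 67198032 errors (457 ms)".
# Assumption: the format is taken from the logs pasted in this issue.
RESULT_LINE = re.compile(
    r"^(?P<name>.+?): (?P<errors>\d+) errors \((?P<ms>\d+) ms\)$"
)


def failing_tests(log_text):
    """Return {test name: error count} for every test that reported errors."""
    failures = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and int(match.group("errors")) > 0:
            failures[match.group("name")] = int(match.group("errors"))
    return failures


# Lines copied from the 16402 MiB run above.
sample = """\
Test iteration 1 (GPU 0, 16402 MiB): 0 errors so far
Moving Inversions (random): 0 errors (257 ms)
Random blocks: 67198032 errors (457 ms)
Memtest86 Modulo-20: 0 errors (23952 ms)
"""
print(failing_tests(sample))  # {'Random blocks': 67198032}
```

The "so far" header and the final summary line do not end in a `(… ms)` timing, so the pattern skips them and only per-test results are counted.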

ihaque (Owner) commented Feb 16, 2019

My guess is that this is related to #3. When I originally wrote this tool, no GPUs had even 4GB of memory, so there may be 32-bit assumptions lingering in the code.

Unfortunately, I haven't actively worked on this tool for over 5 years now (and I don't have a GPU with that much RAM), so I won't be able to fix this myself. If someone is interested in submitting a pull request with a fix, I'd be happy to merge it, though.
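To make the 32-bit concern above concrete, here is a small arithmetic sketch of where an unsigned 32-bit counter would stop being able to represent a buffer size. It is illustrative only: it does not come from the memtestG80 source, and the helper names are hypothetical. Whether the tool counts bytes or 32-bit words anywhere in the Random blocks test is an assumption.

```python
MIB = 1 << 20                 # bytes per MiB
UINT32_MAX = (1 << 32) - 1    # largest value an unsigned 32-bit integer holds


def largest_size_mib(unit_bytes):
    """Largest whole-MiB size whose count of unit_bytes-sized units fits in 32 bits."""
    units_per_mib = MIB // unit_bytes
    return UINT32_MAX // units_per_mib


def wrapped_count(size_mib, unit_bytes):
    """Unit count for size_mib after truncation to an unsigned 32-bit value."""
    return (size_mib * MIB // unit_bytes) & UINT32_MAX


print(largest_size_mib(1))  # counting bytes: 4095
print(largest_size_mib(4))  # counting 32-bit words: 16383

# Sizes reported in this issue, counted in 32-bit words and truncated.
for size in (16400, 16402, 20000):
    print(size, wrapped_count(size, 4))
# 16400 4194304
# 16402 4718592
# 20000 947912704
```

A count of 32-bit words overflows at 16384 MiB, which is close to the reported threshold, but the 16400 MiB run passes even though it is already past that point. A plain word-count wraparound therefore does not fully explain the observed behavior, and the real cause may lie elsewhere.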

ihaque (Owner) commented Jul 3, 2019

It appears there may be a bug with random blocks that is separate from the memory size issue:

https://forums.geforce.com/default/topic/1080529/rtx-strix-2080-errors-with-memtestg80-help/

My guess would be some kind of synchronization/warp size issue on newer GPUs, though it's also possible that the code is making assumptions about pointer size that are no longer true.

aeszter added a commit to aeszter/memtestG80 that referenced this issue Dec 15, 2021
Skip random blocks test if more than 16400 MiB are tested.