GPU preemption failure #291
Comments
Could you please provide some logs with log level Do you have some way to reproduce the problem?
2022-11-24T23:07:17.607 INFO storage_proofs_core::compound_proof > snark_proof:start
From the log messages it's hard to tell which lines come from which process/thread. It could well be that the WinningPoSt one got priority. Why are you sure it didn't? Are you able to reproduce the issue? Are you compiling the Rust parts from source? I'm asking because, if you are, I might be able to provide you with a version that also logs the thread ID, so that we can distinguish them.
You can run `cargo test test_parallel_prover --features "cuda" -- --nocapture` with v0.21.0 and v0.22.0 and then compare the Rust DEBUG logs. With v0.21.0, when a conflict happens, we get `[2022-11-28T13:26:12Z WARN bellperson::gpu::locks] GPU acquired by a high priority process! Freeing up Multiexp kernels...`, but v0.22.0 never emits this log line. As for my lotus-miner: when the WindowPoSt (wdpost) and WinningPoSt computations occur at the same time, WinningPoSt still fails to preempt the GPU, even though its priority is true and wdpost's is false, and the WinningPoSt computation then times out.
Thanks @Elhorses for providing the command to run. I think I can reproduce it; I'm having a look.
OK, thank you!
Thanks, that'll save me a lot of time!
Due to refactorings, the `PriorityLock::should_break()` logic was quite confusing and used the wrong way. Make it work correctly while simplifying the logic. This commit also removes `PriorityLock` from the public API as it isn't really needed. Tweak some values in the parallel prover test, so that aborting a low priority kernel from running on the GPU happens more frequently. Fixes #291.
@Elhorses here's my version of a fix: #293. It's for the master branch, but it should be easily applicable to older
OK, thanks for your help, I'll use it.
Due to refactorings, the `PriorityLock::should_break()` logic was quite confusing and used the wrong way. Make it work correctly while simplifying the logic. This commit also removes `PriorityLock` from the public API as it isn't really needed. Tweak some values in the parallel prover test, so that aborting a low priority kernel from running on the GPU happens more frequently. It needs a newer version of `ec-gpu-gen`, else it would cause panics (which are not fatal, as they happen within a thread. Though, they still show up in the logs). Fixes #291.
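For readers unfamiliar with the mechanism, here is a minimal sketch of the behavior the fix is meant to restore: a low-priority job checks between chunks of work whether a high-priority job wants the GPU, and aborts if so. This is not bellperson's actual code. `PriorityFlag`, `announce`, `release`, `run_chunks`, and `Outcome` are hypothetical names, and an in-process atomic counter stands in for whatever cross-process lock the library really uses. Only the name `should_break` is borrowed from the commit message, and its signature here is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a priority lock: counts the high-priority
/// jobs that currently want the GPU. (Assumption: the real library may
/// coordinate across processes; this sketch is single-process only.)
#[derive(Default)]
struct PriorityFlag {
    high_priority_waiters: AtomicUsize,
}

impl PriorityFlag {
    /// A high-priority job announces that it wants the GPU.
    fn announce(&self) {
        self.high_priority_waiters.fetch_add(1, Ordering::SeqCst);
    }

    /// A high-priority job signals that it is done.
    fn release(&self) {
        self.high_priority_waiters.fetch_sub(1, Ordering::SeqCst);
    }

    /// Only low-priority jobs yield, and only while a high-priority job
    /// is waiting. A high-priority job never breaks.
    fn should_break(&self, is_high_priority: bool) -> bool {
        !is_high_priority && self.high_priority_waiters.load(Ordering::SeqCst) > 0
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(u64),
    Aborted { chunks_done: usize },
}

/// Processes `chunks` one at a time (here: just sums them), checking the
/// flag before each chunk so a low-priority job can be preempted.
fn run_chunks(flag: &PriorityFlag, is_high_priority: bool, chunks: &[u64]) -> Outcome {
    let mut acc = 0u64;
    for (i, chunk) in chunks.iter().enumerate() {
        if flag.should_break(is_high_priority) {
            return Outcome::Aborted { chunks_done: i };
        }
        acc += chunk;
    }
    Outcome::Finished(acc)
}

fn main() {
    let flag = PriorityFlag::default();
    let work = [1, 2, 3, 4];

    // No contention: the low-priority job runs to completion.
    println!("{:?}", run_chunks(&flag, false, &work));

    // A high-priority job shows up: low priority aborts, high priority runs.
    flag.announce();
    println!("{:?}", run_chunks(&flag, false, &work));
    println!("{:?}", run_chunks(&flag, true, &work));
    flag.release();
}
```

In this sketch, the failure reported in the issue would correspond to `should_break` never returning `true` for the low-priority job, so the low-priority work keeps the GPU until it finishes.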
Hello, can we use bellperson on AMD GPUs?
The OpenCL version should run on AMD GPUs. If it doesn't, it's a bug. Please report if you run into problems.
OK, thanks for your help!
When the WindowPoSt (wdpost) and WinningPoSt computations occur at the same time, WinningPoSt still fails to preempt the GPU, even though its priority is true and wdpost's is false, and the WinningPoSt computation then times out.