[BUG] Docker Image CUDA ERROR #104
Comments
Thank you for reporting, @aerdem4. I am receiving the same error on my machine with a 3090 and the host CUDA: As everything ran smoothly on all other tested machines, I expected this to be a rare issue. It seems I was wrong. I'll investigate further and try to find a solution. Do you have any special environment variables set on your host machine regarding CUDA? That is one thing I have set differently on the machine where Docker can't initialize the GPU.
I don't think so. Maybe 11.8 is just not compatible with the 3090? Were any of the successful tests on a 3090?
No tests that I am aware of. Other tests included A100, A10G, A6000, and V100 (all successful). I just tested a docker build with
It didn't work for me. Same error.
When I change the base image to match my host machine's CUDA version, it works.
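To make the comparison above concrete, here is a minimal sketch that reads the driver and CUDA versions from the `nvidia-smi` header and checks whether an image's CUDA version is newer than the host's. The function names `parse_nvidia_smi_header` and `image_newer_than_host` are hypothetical helpers, not part of H2O LLM Studio or any NVIDIA tool, and the header layout is assumed to match the usual `Driver Version: ... CUDA Version: ...` line.

```python
import re
from typing import Optional, Tuple

# Assumed header layout, e.g.
# "| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4 |"
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, cuda_version), or None if the header is absent."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def image_newer_than_host(host_cuda: str, image_cuda: str) -> bool:
    """True when the image's CUDA major.minor is newer than the host's."""
    def to_tuple(version: str) -> Tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))

    return to_tuple(image_cuda)[:2] > to_tuple(host_cuda)[:2]


if __name__ == "__main__":
    header = "| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4 |"
    parsed = parse_nvidia_smi_header(header)
    if parsed is not None:
        driver, host_cuda = parsed
        # 11.8 is the image CUDA version reported in this issue.
        print(driver, host_cuda, image_newer_than_host(host_cuda, "11.8"))
```

If the check reports that the image is newer than the host, rebuilding on a base image that matches the host CUDA version is the workaround described in this thread.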
Yeah, unfortunately I hope that this PR fixes it: For now I am hesitant to address this too much, as it seems to be an issue only on Docker for some setups. If manually fixing the CUDA version fixes it for you, that sounds like a good workaround. Otherwise it might also be a good idea to run it outside of Docker with the make commands.
It seems to be working for me on a 3090 with Docker, but I appear to have different versions of things. Docker image used: gcr.io/vorvan/h2oai/h2o-llmstudio:nightly (appears to have been created on May 21, 2023, 6:12:32 AM)
nvidia-smi (run from inside the container)
We updated
Please reopen if issues still persist.
🐛 Bug
I am getting the warning below, and the nightly Docker image doesn't see my GPU. I have an RTX 3090 with Driver Version: 470.182.03 and CUDA Version: 11.4 on the host machine.
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
The Docker image's CUDA version seems to be 11.8, and my driver version should support it.
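The claim that the driver should support CUDA 11.8 can be sanity-checked with a small sketch like the one below. The two minimum driver versions are assumptions taken from memory of NVIDIA's Linux compatibility tables (450.80.02 for CUDA 11.x minor-version compatibility, 520.61.05 for the driver released with CUDA 11.8) and should be verified against the official release notes. `classify_driver` is a hypothetical helper. This only compares version numbers; it does not diagnose Error 804 itself.

```python
from typing import Tuple

# ASSUMPTION: Linux x86_64 minimums recalled from NVIDIA's compatibility
# tables; verify against the official CUDA release notes before relying on them.
MIN_DRIVER_CUDA_11_MINOR_COMPAT = "450.80.02"
MIN_DRIVER_CUDA_11_8_NATIVE = "520.61.05"


def as_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as '470.182.03' to (470, 182, 3)."""
    return tuple(int(part) for part in version.split("."))


def classify_driver(driver: str) -> str:
    """Classify a host driver relative to a CUDA 11.8 image (hypothetical helper)."""
    parsed = as_tuple(driver)
    if parsed >= as_tuple(MIN_DRIVER_CUDA_11_8_NATIVE):
        return "native"
    if parsed >= as_tuple(MIN_DRIVER_CUDA_11_MINOR_COMPAT):
        return "minor-version-compat"
    return "too-old"


if __name__ == "__main__":
    # Driver version reported on the host in this issue.
    print(classify_driver("470.182.03"))
```

Under these assumed thresholds, the reported 470.182.03 driver falls in the minor-version-compatibility range rather than the native range, which is consistent with the expectation stated above that the driver should work with an 11.8 image.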
To Reproduce
sudo docker run --runtime=nvidia --shm-size=64g --init --rm -p 10101:10101 -v `pwd`/data:/workspace/data -v `pwd`/output:/workspace/output gcr.io/vorvan/h2oai/h2o-llmstudio:nightly