Error running docker image #53
Dear @miguel-negrao, thanks for this report.
Note that I don't have CUDA 11.2 installed. I've read in multiple places that nvidia-smi does not report the right version.
It's quite a strange behavior. I guess the docker image uses its own CUDA components, but they need to be compatible with your drivers. In other words, I would suggest modifying the first line of the Dockerfile. Could you try this trick and let me know if it fixes your problem?
You'll find below the driver information for machines on which the docker image works fine: NVIDIA-SMI 418.56, Driver Version: 418.56, CUDA Version: 10.1.
Will try downgrading the driver. I thought that drivers were backwards compatible with older versions of the CUDA runtime...
Downgraded to 418.56. I get the same error:
Did you manage to run any other program using tensorflow 2.3.0 and cudnn?
I did not attempt that.
Sorry, actually I did. I can run Mozilla's Deepspeech, which is built with tensorflow v2.3.0-6-g23ad988, but it only runs with a driver version higher than 418; for instance, with 460 it runs fine.
With driver 418.x I get:
I've compiled a simple cudnn example from here.
It runs without problem:
If I add some code for GPU info,
then I get:
This was done using the setup below. Info on the CUDA installation:
Right now, I do not know the reason for this issue... Here are my suggestions:
- If you need to process a small amount of files, you may use only the CPU and avoid these cudnn issues. To do this, you may remove the "--gpus all" option from docker's command line. This would be the fastest way to test if you need results on small amounts of data.
- If you have some more time, I would suggest changing the Dockerfile in order to use more recent versions of tensorflow & cudnn, replacing FROM tensorflow/tensorflow:2.3.0-gpu-jupyter with a more recent tag.

Let me know if any of these options works for you. Kind regards, and sorry for the inconvenience.
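The base-image change can be sketched as a Dockerfile fragment. The replacement tag below is an assumption, not a tested recommendation; check Docker Hub for the gpu-jupyter tags actually available:

```dockerfile
# Old first line of the Dockerfile:
#   FROM tensorflow/tensorflow:2.3.0-gpu-jupyter
# Replace it with a more recent tensorflow gpu-jupyter image.
# The exact tag here is an assumption; pick a recent 2.x tag from Docker Hub.
FROM tensorflow/tensorflow:2.4.1-gpu-jupyter
```

Newer base images bundle newer CUDA/cudnn libraries, which may resolve incompatibilities with the host driver.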
I've also tested a simple keras example from here. It runs correctly. The environment was set up as follows:
The output:
I think that NVIDIA drivers are backwards compatible with older CUDA versions, since I have no problem running these examples or Mozilla's Deepspeech.
I've got an additional question: the keras example you used does not seem to take advantage of convolutional layers.
Thanks a lot for looking into this issue. :-) Indeed, that example doesn't run, so we are getting closer to understanding the issue.
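A convolutional test case like the one discussed can be sketched as follows. This is a minimal sketch, not the exact example from the thread; the model shape and random data are assumptions. On an affected setup, `model.fit()` is the call that triggers `CUDNN_STATUS_INTERNAL_ERROR`; on a healthy one it completes:

```python
# Minimal convolutional sketch (assumptions: Keras bundled with TF 2.x,
# toy model and random data) to check whether cudnn conv kernels work.
try:
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    x = np.random.rand(4, 28, 28, 1).astype("float32")
    y = np.random.randint(0, 10, size=(4,))
    # On an affected GPU setup, this fit() raises the cudnn handle error.
    model.fit(x, y, epochs=1, verbose=0)
    status = "ok"
except ImportError:
    status = "no-tf"  # tensorflow not installed in this environment
```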
Ok. I would also suggest trying to make this work outside of a docker image first.
Ok, after searching a bit more for the types of errors I'm getting, I was able to get the example and inaSpeechSegmenter to work fine on tensorflow 2.3.0 with cuda 10.1 by adding the following code:
I have no idea why, but the problem seems to be related to the fact that without this code tf tries to grab all GPU memory. I'm using the nvidia card as the main graphics card on my system, so it is also being used by graphical programs. In any case, this solves it for me. Btw, I get the same errors on tf 2.4 with cuda 11.0, and they are also fixed in the same way. It might be worth mentioning this in the help file, although it is difficult to determine in which cases this change is needed. Another suggestion would be to document which version of tensorflow each version of iss uses, at least placing that info in the readme, so that when going back to previous git versions it's possible to know which tf to use. Again, thanks for all the help!
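The exact snippet is not shown in the thread, but the described fix (stopping TensorFlow from grabbing all GPU memory up front) is commonly done in one of two equivalent ways. This is a sketch under that assumption, not necessarily the author's code; both must run before any op touches the GPU:

```python
import os

# 1) Environment variable: ask TensorFlow's GPU allocator to grow memory
# on demand instead of reserving all of it at startup. Must be set before
# TensorFlow initialises the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# 2) In-code API (TF 2.x), equivalent effect. Guarded here so the snippet
# also runs where tensorflow is not installed.
try:
    import tensorflow as tf

    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
except ImportError:
    pass  # tensorflow not available; the env var above still documents the idea
```

This matters in particular when the GPU also drives the desktop, as described above: graphical programs already hold part of the GPU memory, so an up-front full allocation can make cudnn handle creation fail.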
Hi, I'm running into the same problem. I want to ask where the code should be added. Should it go in my own code, like this:
Looking forward to your reply! It will help me a lot. Thank you.
System information
Expected Behavior
Calling ina_speech_segmenter.py inside the docker container would run.
Current Behavior
ina_speech_segmenter.py does not run:
2021-02-15 18:51:19.309054: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Steps to Reproduce
note:
pip install matplotlib==3.2
is needed, otherwise the program does not start.