"CUBLAS_STATUS_NOT_INITIALIZED" returned from 'cublas_copy(GetCublasHandle(), dim_, src.Data(), 1, data_, 1)' #4501
Comments
CUDA 11.2, verified on latest kaldi master. |
Could you check if the CUDA device destructor gets called between the two? (kaldi/src/cudamatrix/cu-device.cc, line 605 at commit 5caf2c0)
|
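An editor's hypothetical sketch (not part of Kaldi) of one way to check that: add a log line to the destructor in src/cudamatrix/cu-device.cc and watch whether it fires between the two pipeline lifetimes; KALDI_LOG is Kaldi's logging macro.

CuDevice::~CuDevice() {
  // Hypothetical debugging aid: if this message appears between the
  // teardown of the first pipeline and the construction of the second,
  // the cuBLAS handle has been destroyed in between.
  KALDI_LOG << "~CuDevice() running: CUDA/cuBLAS state is being torn down.";
  // ... existing teardown continues here ...
}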
I can repro. Looking at this. Thanks |
Sorry, I didn't get to this yesterday -- please let me know if I can be of any help.
y.
|
Thanks, it seems to be an exotic bug. I've already spent some time with gdb and nothing obvious shows up. I'm wondering if some previous cublas call goes wrong and the cublas handle somehow gets reset. I'll continue searching. |
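For context, a minimal standalone sketch (an editor's illustration, not Kaldi code): CUBLAS_STATUS_NOT_INITIALIZED is typically what cuBLAS returns when a call goes through a handle that was never created or has already been destroyed, which would fit the handle-reset theory. Compile with nvcc and -lcublas.

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  cublasHandle_t handle;
  cublasCreate(&handle);
  cublasDestroy(handle);  // stands in for a teardown resetting cuBLAS state
  float *d_x = nullptr, *d_y = nullptr;
  cudaMalloc(&d_x, 4 * sizeof(float));
  cudaMalloc(&d_y, 4 * sizeof(float));
  // The same BLAS-1 copy the issue title reports on; with a dead handle
  // it typically fails with CUBLAS_STATUS_NOT_INITIALIZED (value 1).
  cublasStatus_t st = cublasScopy(handle, 4, d_x, 1, d_y, 1);
  printf("cublasScopy status: %d\n", st);
  cudaFree(d_x);
  cudaFree(d_y);
  return 0;
}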
@hugovbraun, FWIW, if it helps your analysis: I'm wondering if a similar call to … |
First of all, for those curious, I'm going to be working with Hugo a lot more now from within NVIDIA. Anyway, I found what is likely to be the bug. @jtrmal I couldn't exactly reproduce your issue. Instead, I encountered a CUDA runtime error (rather than a cublas error) at kaldi/src/cudamatrix/cu-matrix.cc, line 253 at commit e28927f. This is probably because I am using ivectors in my pipeline, while you are not.
It became clear fairly quickly that the source of the error was somewhere in the destructor of BatchedThreadedNnet3CudaPipeline2. The destructor of an object runs before the destructors of any of its members, so I learned as well that one of the members was the problem. I used ltrace to figure out which CUDA call in particular was the cause of the issue:
ltrace -i -l 'libcu*' ./batched-wav-nnet3-cuda2 \
  --frame-subsampling-factor=3 \
  --config=/home/dgalvez/code/asr/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/online.conf \
  --max-active=7000 \
  --beam=15.0 \
  --lattice-beam=6.0 \
  --acoustic-scale=1.0 \
  --cuda-decoder-copy-threads=2 \
  --cuda-worker-threads=2 \
  --word-symbol-table=/home/dgalvez/code/asr/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt \
  /home/dgalvez/code/asr/kaldi/egs/aspire/s5/exp/chain/tdnn_7b/final.mdl \
  /home/dgalvez/code/asr/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
  scp:wav.scp \
  ark,t:-
And then I learned that cudaFreeHost() was being called on a pointer allocated via cudaMalloc() (therefore, on the device) in the destructor of BatchedStaticNnet3. So that's most likely the issue. A fix is on the way. In addition, I found a memory leak in ThreadPoolLight (circular shared_ptr references), but that can wait until later. |
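A minimal standalone sketch (an editor's illustration, not the actual Kaldi code) of the allocator mismatch described above: memory from cudaMalloc() lives on the device and must be released with cudaFree(); cudaFreeHost() is only valid for pinned host memory from cudaMallocHost()/cudaHostAlloc().

#include <cuda_runtime.h>
#include <cstdio>

int main() {
  float *d_ptr = nullptr;
  cudaMalloc(&d_ptr, 16 * sizeof(float));  // device allocation
  // Wrong deallocator for this pointer: reports cudaErrorInvalidValue,
  // and the error can then surface from later, unrelated-looking calls.
  cudaError_t err = cudaFreeHost(d_ptr);
  printf("cudaFreeHost on a device pointer: %s\n", cudaGetErrorString(err));
  cudaFree(d_ptr);  // the correct deallocator
  return 0;
}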
💝
|
cudaFreeHost() was called instead of cudaFree() on d_batch_slot_assignment_, which is a pointer to device memory, causing an error. This hadn't been noticed before because people usually destroyed the BatchedThreadedNnet3CudaPipeline2 only when terminating the program. Testing: I manually applied the change described in kaldi-asr#4501 (comment). No unit test. Additionally, add several defensive CU_SAFE_CALL guards that weren't there before.
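A hedged sketch of the shape of that fix, paraphrased from the commit message rather than taken from the actual diff (CU_SAFE_CALL is Kaldi's CUDA error-checking macro; the surrounding destructor body is elided):

BatchedStaticNnet3::~BatchedStaticNnet3() {
  // d_batch_slot_assignment_ was allocated with cudaMalloc(), so it
  // must be released with cudaFree(), not cudaFreeHost(); wrapping the
  // call in CU_SAFE_CALL makes any failure visible immediately.
  CU_SAFE_CALL(cudaFree(d_batch_slot_assignment_));
  // ... remaining teardown ...
}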
Hi, in proprietary code I have discovered an issue with the cudadecoder. The repro is quite easy: in the file src/cudadecoderbin/batched-wav-nnet3-cuda2.cc, simply change the pipeline construction so that the object is allocated, deallocated, and then allocated again (a sketch follows below).
The backtrace is below. Any suggestion on how to fix that? Imagine allocating the decoder again with different params (it will still crash).
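A hedged reconstruction of that repro (the original before/after snippets did not survive in this thread; the constructor arguments below are assumed to match the stock batched-wav-nnet3-cuda2 binary):

// The stock binary constructs the pipeline once. The repro scopes a
// first instance so its destructor runs, then constructs a second one;
// before the fix, the first destructor corrupted CUDA state.
{
  BatchedThreadedNnet3CudaPipeline2 cuda_pipeline(
      batched_decoder_config, *decode_fst, am_nnet, trans_model);
}  // first instance destroyed here
BatchedThreadedNnet3CudaPipeline2 cuda_pipeline(
    batched_decoder_config, *decode_fst, am_nnet, trans_model);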