Cannot run training program with CUDA 10.2 #220

Closed
t13m opened this issue Jan 23, 2020 · 3 comments

t13m commented Jan 23, 2020

Hi, I was trying to run Eesen in NVIDIA's Docker container, and it failed.

The container has CUDA 10.2 installed. Eesen compiles, but invoking "train-ctc-parallel" crashes with the following log:

LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
LOG (train-ctc-parallel:SetUpdateAlgorithm():net.cc:483) Selecting SGD with momentum as optimization algorithm.
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 0
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 1
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 2
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 3
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 4
add-deltas ark:- ark:-
copy-feats scp:exp/train_char_l5_c320/train_local.scp ark:-
LOG (train-ctc-parallel:main():train-ctc-parallel.cc:133) TRAINING STARTED
ERROR (train-ctc-parallel:AddVecToRows():cuda-matrix.cc:541) cudaError_t 209 : "no kernel image is available for execution on the device" returned from 'cudaGetLastError()'
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe gunzip -c exp/train_char_l5_c320/labels.tr.gz| had nonzero return status 13
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe copy-feats scp:exp/train_char_l5_c320/train_local.scp ark:- | add-deltas ark:- ark:- | had nonzero return status 36096
ERROR (train-ctc-parallel:AddVecToRows():cuda-matrix.cc:541) cudaError_t 209 : "no kernel image is available for execution on the device" returned from 'cudaGetLastError()'
[stack trace: ]
eesen::KaldiGetStackTrace[abi:cxx11]()
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::CuMatrixBase::AddVecToRows(float, eesen::CuVectorBase const&, float)
eesen::BiLstmParallel::PropagateFncVanillaPassForward(eesen::CuMatrixBase const&, int, int)
eesen::BiLstmParallel::PropagateFnc(eesen::CuMatrixBase const&, eesen::CuMatrixBase*)
eesen::Layer::Propagate(eesen::CuMatrixBase const&, eesen::CuMatrix*)
eesen::Net::Propagate(eesen::CuMatrixBase const&, eesen::CuMatrix*)
train-ctc-parallel(main+0x148d) [0x5583f00fe692]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f385afb9b97]
train-ctc-parallel(_start+0x2a) [0x5583f00fb44a]
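For context, cudaError_t 209 is CUDA's cudaErrorNoKernelImageForDevice: the binary contains neither machine code (SASS) for the GPU's compute capability nor PTX that the driver could JIT-compile for it. The sketch below is a hypothetical helper (can_run is not part of Eesen or CUDA) that encodes the CUDA binary-compatibility rules as I understand them: SASS built for sm_XY runs only on devices with the same major version X and a minor version >= Y, while PTX built for compute_XY can be JIT-compiled for any device with compute capability >= X.Y. It ignores driver/toolkit version limits on PTX, so treat it as an illustration, not a diagnosis of this particular build.

```python
from typing import Iterable, Tuple

# Compute capability as (major, minor), e.g. (7, 5) for a Turing GPU.
CC = Tuple[int, int]


def can_run(device_cc: CC, sass: Iterable[CC], ptx: Iterable[CC]) -> bool:
    """Hypothetical check: is any embedded kernel image usable on the device?

    Assumed rules (simplified; driver/PTX version limits are ignored):
      * SASS for sm_XY runs on devices with major == X and minor >= Y.
      * PTX for compute_XY can be JIT-compiled for devices with CC >= X.Y.
    """
    major, minor = device_cc
    # Binary (SASS) compatibility only holds within the same major version.
    for s_major, s_minor in sass:
        if s_major == major and s_minor <= minor:
            return True
    # PTX is forward-compatible: tuples compare as (major, minor).
    for p_cc in ptx:
        if tuple(p_cc) <= tuple(device_cc):
            return True
    # Nothing usable: this is the situation that yields error 209.
    return False


if __name__ == "__main__":
    # SASS for an older major version only, no PTX: no usable image.
    print(can_run((7, 5), sass=[(5, 2)], ptx=[]))
    # Same binary plus older PTX: the driver can JIT-compile it.
    print(can_run((7, 5), sass=[(5, 2)], ptx=[(5, 2)]))
```

In other words, adding SASS for the exact architecture, or PTX for an architecture at or below the GPU's, should be enough to avoid the error, provided the flags actually reach the nvcc invocation that builds the failing kernels.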

Is there any workaround for this? I don't know much about CUDA. I tried adding "-gencode arch=compute_{70,72,75},code={70,72,75}" to gpucompute/Makefile, but it still crashes.
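In nvcc's documented syntax, each -gencode flag pairs a virtual architecture (arch=compute_XY) with either a real architecture (code=sm_XY, producing SASS) or a virtual one (code=compute_XY, embedding PTX); the code= value is not a bare number. The sketch below is a hypothetical helper (gencode_flags is my own name, not an Eesen or nvcc facility) that expands a list of compute capabilities into that form and, optionally, embeds PTX for the highest one so newer GPUs can JIT-compile it. How eesen's gpucompute/Makefile passes such flags to nvcc is not shown in this thread, so where to put them is left open.

```python
from typing import Iterable, List, Tuple


def gencode_flags(ccs: Iterable[Tuple[int, int]],
                  embed_ptx_for_highest: bool = True) -> List[str]:
    """Hypothetical helper: build nvcc -gencode flags for (major, minor) pairs."""
    targets = sorted(set(ccs))  # deduplicate and order by compute capability
    flags = []
    for major, minor in targets:
        num = f"{major}{minor}"
        # SASS for this exact architecture, e.g. code=sm_75.
        flags.append(f"-gencode arch=compute_{num},code=sm_{num}")
    if embed_ptx_for_highest and targets:
        major, minor = targets[-1]
        num = f"{major}{minor}"
        # PTX for the newest target, which later GPUs can JIT-compile.
        flags.append(f"-gencode arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    # The three architectures mentioned in the issue (Volta, Xavier, Turing).
    print(" ".join(gencode_flags([(7, 0), (7, 2), (7, 5)])))
```

After rebuilding, it is worth cleaning old object files first, since stale kernels compiled without the new flags would still trigger the same error.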

t13m closed this as completed Jan 30, 2020
@liyongze

Could you tell me how you fixed this problem? I ran into the same problem.

t13m commented Aug 21, 2021

Hi, I didn't manage to make it work. I ran my experiments on the CPU instead.

liyongze commented Aug 23, 2021 via email