
CUDA out of memory in decoding #66

Closed

cdxie opened this issue Oct 3, 2021 · 4 comments

cdxie (Contributor) commented Oct 3, 2021

Hi, I am new to icefall. I finished training tdnn_lstm_ctc, but when I run the decoding step I get the following error. I have tried changing --max-duration, but the error persists:

2021-10-04 00:42:07,942 INFO [decode.py:383] Decoding started
2021-10-04 00:42:07,942 INFO [decode.py:384] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 19, 'avg': 5, 'method': 'whole-lattice-rescoring', 'num_paths': 100, 'lattice_score_scale': 0.5, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 50, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-04 00:42:08,361 INFO [lexicon.py:113] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-04 00:42:08,614 INFO [decode.py:393] device: cuda:0
2021-10-04 00:42:23,560 INFO [decode.py:406] Loading G_4_gram.fst.txt
2021-10-04 00:42:23,560 WARNING [decode.py:407] It may take 8 minutes.
Traceback (most recent call last):
File "./tdnn_lstm_ctc/decode.py", line 492, in
main()
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "./tdnn_lstm_ctc/decode.py", line 420, in main
G = k2.arc_sort(G)
File "/opt/conda/lib/python3.8/site-packages/k2-1.8.dev20210918+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 441, in arc_sort
ragged_arc, arc_map = _k2.arc_sort(fsa.arcs, need_arc_map=need_arc_map)
RuntimeError: CUDA out of memory. Tried to allocate 884.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 461.19 MiB free; 14.29 GiB reserved in total by PyTorch)

The devices used:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02 Driver Version: 440.118.02 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 27C P0 25W / 250W | 12MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:D8:00.0 Off | 0 |
| N/A 28C P0 25W / 250W | 12MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Would you give me some advice? Thanks.

cdxie (Contributor, Author) commented Oct 3, 2021

Another question: we also have a machine cluster, but on the cluster we cannot choose the device number. Should I change the following lines of code in decode.py:

    if torch.cuda.is_available():
        device = torch.device("cuda", 0)

csukuangfj (Collaborator) commented:

> I get the following error; I have tried changing --max-duration, but the error persists:

There are several things you can do:

(1) Switch to a GPU with more RAM, e.g., 32 GB.
(2) Use a decoding method that does not involve an LM, e.g., --method 1best.
(3) Change

G = k2.Fsa.from_fsas([G]).to(device)
G = k2.arc_sort(G)

to

G = k2.arc_sort(G)
G = k2.Fsa.from_fsas([G]).to(device)

so that the arc sort runs while G is still on the CPU (see the sketch after this list). I assume it will not cause OOM errors in the later decoding steps.
(4) Prune your G. You can use the script from kaldi-asr/kaldi#4594 to prune it.
(Note: it is a single Python script with no dependencies on Kaldi.)
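For readers following along, here is a minimal sketch of suggestion (3) as it might sit in decode.py. The lines that load G are reconstructed from the log above and are an assumption of this sketch (including acceptor=False), not a verbatim copy of the script; only the reordering of k2.arc_sort and the move to the device is the actual suggestion.

    import k2
    import torch

    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    # Load the 4-gram LM on the CPU; the path is taken from the log above.
    with open("data/lm/G_4_gram.fst.txt") as f:
        G = k2.Fsa.from_openfst(f.read(), acceptor=False)

    # Original order (k2.arc_sort then runs on the GPU, where the OOM occurs):
    #   G = k2.Fsa.from_fsas([G]).to(device)
    #   G = k2.arc_sort(G)

    # Suggested order: arc-sort while G is still on the CPU, then move it to the GPU.
    G = k2.arc_sort(G)
    G = k2.Fsa.from_fsas([G]).to(device)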

csukuangfj (Collaborator) commented:

> should I change the following lines of code in decode.py:
>     if torch.cuda.is_available():
>         device = torch.device("cuda", 0)

Can you use device = torch.device("cuda") to select your default CUDA device?

If you use the CPU, decoding will be slow.
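
A minimal sketch of that change, assuming decode.py only needs a single device object; selecting the physical GPU through CUDA_VISIBLE_DEVICES on the cluster side is an assumption of this sketch, not something prescribed above.

    import torch

    # On a cluster, let the scheduler (or CUDA_VISIBLE_DEVICES) decide which GPU
    # is visible, e.g.  CUDA_VISIBLE_DEVICES=1 ./tdnn_lstm_ctc/decode.py ...
    if torch.cuda.is_available():
        device = torch.device("cuda")  # default CUDA device, no hard-coded index
    else:
        device = torch.device("cpu")   # works, but decoding will be slow

    print(f"device: {device}")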

cdxie (Contributor, Author) commented Oct 4, 2021

> I get the following error; I have tried changing --max-duration, but the error persists:
>
> There are several things you can do:
>
> (1) Switch to a GPU with more RAM, e.g., 32 GB.
> (2) Use a decoding method that does not involve an LM, e.g., --method 1best.
> (3) Change
>
>     G = k2.Fsa.from_fsas([G]).to(device)
>     G = k2.arc_sort(G)
>
> to
>
>     G = k2.arc_sort(G)
>     G = k2.Fsa.from_fsas([G]).to(device)
>
> I assume it will not cause OOM errors in the later decoding steps.
> (4) Prune your G. You can use the script from kaldi-asr/kaldi#4594 to prune it. (Note: it is a single Python script with no dependencies on Kaldi.)

I tried method (3), but there are still errors:

2021-10-05 00:00:07,427 INFO [decode.py:387] Decoding started
2021-10-05 00:00:07,427 INFO [decode.py:388] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 19, 'avg': 5, 'method': 'whole-lattice-rescoring', 'num_paths': 100, 'nbest_scale': 0.5, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 100, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-05 00:00:07,947 INFO [lexicon.py:113] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-05 00:00:08,310 INFO [decode.py:397] device: cuda
2021-10-05 00:00:46,069 INFO [decode.py:410] Loading G_4_gram.fst.txt
2021-10-05 00:00:46,070 WARNING [decode.py:411] It may take 8 minutes.
Traceback (most recent call last):
File "./tdnn_lstm_ctc/decode.py", line 497, in
main()
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "./tdnn_lstm_ctc/decode.py", line 435, in main
G = k2.add_epsilon_self_loops(G)
File "/opt/conda/lib/python3.8/site-packages/k2-1.8.dev20210918+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 499, in add_epsilon_self_loops
ragged_arc, arc_map = _k2.add_epsilon_self_loops(fsa.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 4.73 GiB (GPU 0; 15.78 GiB total capacity; 9.21 GiB already allocated; 3.90 GiB free; 10.85 GiB reserved in total by PyTorch)

I think I should try (1).

cdxie closed this as completed Oct 6, 2021