
CUDA out of memory in decoding #66

Closed

cdxie opened this issue Oct 3, 2021 · 4 comments

cdxie (Contributor) commented Oct 3, 2021

Hi, I am new to icefall. I finished training tdnn_lstm_ctc, but when I run the decoding step I get the following error. I have tried changing --max-duration, but the error persists:

2021-10-04 00:42:07,942 INFO [decode.py:383] Decoding started
2021-10-04 00:42:07,942 INFO [decode.py:384] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 19, 'avg': 5, 'method': 'whole-lattice-rescoring', 'num_paths': 100, 'lattice_score_scale': 0.5, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 50, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-04 00:42:08,361 INFO [lexicon.py:113] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-04 00:42:08,614 INFO [decode.py:393] device: cuda:0
2021-10-04 00:42:23,560 INFO [decode.py:406] Loading G_4_gram.fst.txt
2021-10-04 00:42:23,560 WARNING [decode.py:407] It may take 8 minutes.
Traceback (most recent call last):
File "./tdnn_lstm_ctc/decode.py", line 492, in
main()
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "./tdnn_lstm_ctc/decode.py", line 420, in main
G = k2.arc_sort(G)
File "/opt/conda/lib/python3.8/site-packages/k2-1.8.dev20210918+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 441, in arc_sort
ragged_arc, arc_map = _k2.arc_sort(fsa.arcs, need_arc_map=need_arc_map)
RuntimeError: CUDA out of memory. Tried to allocate 884.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 461.19 MiB free; 14.29 GiB reserved in total by PyTorch)

The devices used:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02 Driver Version: 440.118.02 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 27C P0 25W / 250W | 12MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:D8:00.0 Off | 0 |
| N/A 28C P0 25W / 250W | 12MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Would you give me some advice? Thanks.

cdxie (Contributor, Author) commented Oct 3, 2021

Another question: we also have a machine cluster, but on the cluster we cannot choose the device number. Should I change the following lines of code in decode.py:

    if torch.cuda.is_available():
        device = torch.device("cuda", 0)

csukuangfj (Collaborator) commented:

> I get the following error; I have tried changing --max-duration, but the error persists:

There are several things you can do:

(1) Switch to a GPU with more RAM, e.g., 32 GB.
(2) Use a decoding method that does not involve an LM, e.g., --method 1best.
(3) Change

G = k2.Fsa.from_fsas([G]).to(device)
G = k2.arc_sort(G)

to

G = k2.arc_sort(G)
G = k2.Fsa.from_fsas([G]).to(device)

so that the arc sort runs while G is still on the CPU (see the sketch after this list). I assume it will not cause OOM errors in the later decoding steps.
(4) Prune your G. You can use the script from kaldi-asr/kaldi#4594 to prune it.
(Note: it is a single Python script with no dependencies on Kaldi.)
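For readers following along, here is a minimal sketch of suggestion (3) as it might sit in decode.py. The lines that load G are reconstructed from the log above and are an assumption of this sketch (including acceptor=False), not a verbatim copy of the script; only the reordering of k2.arc_sort and the move to the device is the actual suggestion.

    import k2
    import torch

    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    # Load the 4-gram LM on the CPU; the path is taken from the log above.
    with open("data/lm/G_4_gram.fst.txt") as f:
        G = k2.Fsa.from_openfst(f.read(), acceptor=False)

    # Original order (k2.arc_sort then runs on the GPU, where the OOM occurs):
    #   G = k2.Fsa.from_fsas([G]).to(device)
    #   G = k2.arc_sort(G)

    # Suggested order: arc-sort while G is still on the CPU, then move it to the GPU.
    G = k2.arc_sort(G)
    G = k2.Fsa.from_fsas([G]).to(device)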

csukuangfj (Collaborator) commented:

> should I change the following lines of code in decode.py:
>     if torch.cuda.is_available():
>         device = torch.device("cuda", 0)

Can you use device = torch.device("cuda") to select your default CUDA device?

If you use the CPU, decoding will be slow.
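
A minimal sketch of that change, assuming decode.py only needs a single device object; selecting the physical GPU through CUDA_VISIBLE_DEVICES on the cluster side is an assumption of this sketch, not something prescribed above.

    import torch

    # On a cluster, let the scheduler (or CUDA_VISIBLE_DEVICES) decide which GPU
    # is visible, e.g.  CUDA_VISIBLE_DEVICES=1 ./tdnn_lstm_ctc/decode.py ...
    if torch.cuda.is_available():
        device = torch.device("cuda")  # default CUDA device, no hard-coded index
    else:
        device = torch.device("cpu")   # works, but decoding will be slow

    print(f"device: {device}")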

cdxie (Contributor, Author) commented Oct 4, 2021

> I get the following error; I have tried changing --max-duration, but the error persists:
>
> There are several things you can do:
>
> (1) Switch to a GPU with more RAM, e.g., 32 GB.
> (2) Use a decoding method that does not involve an LM, e.g., --method 1best.
> (3) Change
>
>     G = k2.Fsa.from_fsas([G]).to(device)
>     G = k2.arc_sort(G)
>
> to
>
>     G = k2.arc_sort(G)
>     G = k2.Fsa.from_fsas([G]).to(device)
>
> I assume it will not cause OOM errors in the later decoding steps.
> (4) Prune your G. You can use the script from kaldi-asr/kaldi#4594 to prune it. (Note: it is a single Python script with no dependencies on Kaldi.)

I tried method (3), but there are still errors:

2021-10-05 00:00:07,427 INFO [decode.py:387] Decoding started
2021-10-05 00:00:07,427 INFO [decode.py:388] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 19, 'avg': 5, 'method': 'whole-lattice-rescoring', 'num_paths': 100, 'nbest_scale': 0.5, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 100, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-05 00:00:07,947 INFO [lexicon.py:113] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-05 00:00:08,310 INFO [decode.py:397] device: cuda
2021-10-05 00:00:46,069 INFO [decode.py:410] Loading G_4_gram.fst.txt
2021-10-05 00:00:46,070 WARNING [decode.py:411] It may take 8 minutes.
Traceback (most recent call last):
File "./tdnn_lstm_ctc/decode.py", line 497, in
main()
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "./tdnn_lstm_ctc/decode.py", line 435, in main
G = k2.add_epsilon_self_loops(G)
File "/opt/conda/lib/python3.8/site-packages/k2-1.8.dev20210918+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 499, in add_epsilon_self_loops
ragged_arc, arc_map = _k2.add_epsilon_self_loops(fsa.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 4.73 GiB (GPU 0; 15.78 GiB total capacity; 9.21 GiB already allocated; 3.90 GiB free; 10.85 GiB reserved in total by PyTorch)

I think I should try (1).

cdxie closed this as completed Oct 6, 2021