CUDA out of memory in decoding #66
Another question: we also have a machine cluster, but on it we cannot choose which GPU device to use.
There are several things you can do:

(1) Change to a GPU with larger RAM, e.g., 32 GB.

Another option is to change icefall/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py, lines 423 to 424 (at commit adb068e), to

```python
G = k2.arc_sort(G)
G = k2.Fsa.from_fsas([G]).to(device)
```

I assume it will not cause OOM errors in later decoding steps.
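The fix above boils down to: do the memory-hungry construction (arc-sorting the large G) in host RAM, and only ship the finished structure to the GPU. A minimal sketch of that pattern, with `build` and `move` as hypothetical stand-ins for the real k2 calls (nothing below is k2's actual API; the comments only indicate what each stand-in corresponds to):

```python
def build_on_cpu_then_move(build, move):
    """Run a memory-hungry construction step on the CPU, then do one transfer.

    Arc-sorting a 4-gram G allocates large intermediates; doing that in
    host RAM sidesteps the 16 GB limit of the GPU, and only the finished
    object is shipped to the device afterwards.
    """
    obj = build()      # heavy step runs in host RAM (e.g. k2.arc_sort on a CPU-resident G)
    return move(obj)   # single transfer to the accelerator (e.g. G.to(device))

# Stand-in demo with plain Python objects (no k2/torch needed here):
result = build_on_cpu_then_move(
    build=lambda: sorted([3, 1, 2]),  # stands in for the CPU arc-sort
    move=lambda x: tuple(x),          # stands in for the move to the GPU
)
print(result)  # (1, 2, 3)
```

The point of the indirection is that only the final, compact object ever touches device memory; all scratch allocations live and die in host RAM.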
Can you use …? If you use the CPU, it is going to be slow when you decode.
I tried the (3) method, but there are still errors:

2021-10-05 00:00:07,427 INFO [decode.py:387] Decoding started

I think I should try (1).
Hi, I am new to icefall. I finished training tdnn_lstm_ctc, but when I run the decode step I hit the following error. I have changed --max-duration, but there are still errors:
```
2021-10-04 00:42:07,942 INFO [decode.py:383] Decoding started
2021-10-04 00:42:07,942 INFO [decode.py:384] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 19, 'avg': 5, 'method': 'whole-lattice-rescoring', 'num_paths': 100, 'lattice_score_scale': 0.5, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 50, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-04 00:42:08,361 INFO [lexicon.py:113] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-04 00:42:08,614 INFO [decode.py:393] device: cuda:0
2021-10-04 00:42:23,560 INFO [decode.py:406] Loading G_4_gram.fst.txt
2021-10-04 00:42:23,560 WARNING [decode.py:407] It may take 8 minutes.
Traceback (most recent call last):
  File "./tdnn_lstm_ctc/decode.py", line 492, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "./tdnn_lstm_ctc/decode.py", line 420, in main
    G = k2.arc_sort(G)
  File "/opt/conda/lib/python3.8/site-packages/k2-1.8.dev20210918+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 441, in arc_sort
    ragged_arc, arc_map = _k2.arc_sort(fsa.arcs, need_arc_map=need_arc_map)
RuntimeError: CUDA out of memory. Tried to allocate 884.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 461.19 MiB free; 14.29 GiB reserved in total by PyTorch)
```
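The figures in the RuntimeError tell the story: 14.28 GiB of the 15.78 GiB card is already allocated by the time arc_sort runs, so even an 884 MiB request cannot be served. A small sketch that pulls those numbers out of such a message (the regex is written against the exact wording in this traceback; other PyTorch versions phrase the message differently):

```python
import re

# The message from the traceback above:
MSG = ("CUDA out of memory. Tried to allocate 884.00 MiB "
       "(GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; "
       "461.19 MiB free; 14.29 GiB reserved in total by PyTorch)")

UNITS = {"MiB": 1.0, "GiB": 1024.0}  # normalize everything to MiB

def parse_oom(msg: str) -> dict:
    """Extract the memory figures (in MiB) from a PyTorch OOM message."""
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total":     r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free":      r"([\d.]+) (MiB|GiB) free",
    }
    out = {}
    for name, pattern in fields.items():
        value, unit = re.search(pattern, msg).groups()
        out[name] = float(value) * UNITS[unit]
    return out

stats = parse_oom(MSG)
print(stats["requested"] > stats["free"])  # True: the request exceeds what is left
```

Since "already allocated" is within 1.5 GiB of the card's total capacity, the memory was consumed before this allocation, while loading G_4_gram, which is why the suggestions above target where G is built rather than the decoding batches.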
The device used:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name  Persistence-M      | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   28C    P0    25W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
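On why lowering --max-duration did not help here: 'max_duration': 50 in the log caps the seconds of audio per batch, so it only bounds the size of decoding batches, while the traceback shows the OOM happens while arc-sorting G_4_gram, before any batch exists. A rough sketch of what the knob actually controls, assuming the usual 100 feature frames per second (an assumption; the log does not state the frame rate):

```python
def frames_per_batch(max_duration_s: float,
                     frames_per_second: int = 100,
                     subsampling_factor: int = 3) -> int:
    """Rough upper bound on encoder output frames in one decoding batch.

    max_duration_s caps the total seconds of audio per batch, so the
    feature-frame count scales linearly with it; the subsampling factor
    (3 in the log) then divides the frames the decoder has to process.
    """
    feature_frames = int(max_duration_s * frames_per_second)
    return feature_frames // subsampling_factor

# With the settings from the log: 50 s * 100 fps / 3:
print(frames_per_batch(50))  # 1666
```

Halving --max-duration roughly halves per-batch activation memory, which helps with OOM inside the decoding loop, but not with the one-time cost of building G.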
Would you give me some advice? Thanks!