
Error while training the librispeech model in multi-GPU mode #157

Open
dh7ahn opened this issue Sep 29, 2020 · 1 comment

dh7ahn commented Sep 29, 2020

I got the following error in stage 3 (LM training) while training the librispeech model in the example directory:

AttributeError: 'numpy.ndarray' object has no attribute 'items'

Checking the corresponding code in data_parallel.py, the variable 'inputs' turns out to be an 'ndarray', but the code calls 'inputs.items()' as if it were a dictionary. Is something wrong here, or how can I fix it?

This error only happens when I train with multiple GPUs; single-GPU training seems to work without errors.

============================================================================
LM Training stage (stage:3)

  0%|          | 0/11000320 [00:00<?, ?it/s]
Original utterance num: 281241
Removed 0 utterances (threshold)
Original utterance num: 2703
Removed 0 utterances (threshold)
Original utterance num: 2864
Removed 0 utterances (threshold)
Original utterance num: 2620
Removed 0 utterances (threshold)
Original utterance num: 2939
Removed 0 utterances (threshold)
Traceback (most recent call last):
  File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 340, in <module>
    save_path = pr.runcall(main)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 214, in main
    loss, hidden, observation = model(ys_train, state=hidden)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in scatter
    res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
  File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in <listcomp>
    res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
AttributeError: 'numpy.ndarray' object has no attribute 'items'
  0%|          | 0/11000320 [00:00<?, ?it/s]
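For anyone hitting this before an upstream fix lands, here is a minimal sketch of the kind of type guard in the scatter step that avoids the crash. It is an illustrative assumption, not the actual neural_sp code: the function name scatter_inputs and the ndarray-chunking path are made up for the example, and it assumes the LM batch arrives as a plain numpy array while other models pass a dict.

```python
import numpy as np
import torch

def scatter_inputs(inputs, device_ids):
    """Split one training batch across GPUs.

    Hypothetical guard: handles both dict batches (which the original
    comprehension over inputs.items() expects) and the raw ndarray
    batches that trigger the AttributeError in the LM trainer.
    """
    n_gpus = len(device_ids)
    if isinstance(inputs, dict):
        # Dict batch: scatter each value per device, as the original code does.
        return [{k: scatter_inputs(v, [device_ids[i]])[0] for k, v in inputs.items()}
                for i in range(n_gpus)]
    if isinstance(inputs, np.ndarray):
        # ndarray batch: .items() does not exist here, so chunk along the
        # batch axis and move each chunk to its device instead.
        chunks = np.array_split(inputs, n_gpus)
        return [torch.from_numpy(np.ascontiguousarray(c)).to(device_ids[i])
                for i, c in enumerate(chunks)]
    if torch.is_tensor(inputs):
        # Already a tensor: use the built-in chunking.
        return [c.to(device_ids[i]) for i, c in enumerate(inputs.chunk(n_gpus))]
    raise TypeError(f"Unsupported batch type: {type(inputs)}")
```

With a guard like this, an LM batch such as np.zeros((8, 100), dtype=np.int64) with device_ids=[0, 1] would be split into two (4, 100) tensors, one per GPU, instead of hitting the .items() call.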

hirofumi0810 (Owner) commented

I recommend using a single GPU for LM training, though I will fix the bug.
