Hi,
I am trying to run the CNCeleb recipe, but a RuntimeError appears:
#### Training will run for 6 epochs.
Traceback (most recent call last):
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 283, in run
loss, acc = self.train_one_batch(batch)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 182, in train_one_batch
loss = model.get_loss(model_forward(inputs), targets)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/support/utils.py", line 157, in wrapper
return function(self, *transformed)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/exp/SEResnet34_am_train_fbank40/config/resnet-se-xvector.py", line 559, in get_loss
return self.loss(inputs, targets)
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/nnet/loss.py", line 360, in forward
return self.loss_function(outputs/self.t, targets) + self.ring_loss * ring_loss
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1150, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/conda/conda-bld/pytorch_1634272172048/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [0,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
That means the number of speakers in the FC classifier's output is smaller than the largest label.
I find that num_targets in exp/egs/train_sequential/info is 2687, while the max label in train.egs.csv is 2711.
Could you please tell me which script generates exp/egs/train_sequential/info/num_targets?
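The mismatch can be confirmed without the GPU: on the CPU, cross_entropy fails eagerly with a readable IndexError when a target id is outside [0, num_classes), instead of the asynchronous device-side assert seen on CUDA. A minimal sketch (the numbers mirror this issue):

```python
import torch
import torch.nn.functional as F

num_targets = 2687                    # classifier output dimension (info/num_targets)
logits = torch.randn(4, num_targets)
labels = torch.tensor([0, 100, 2686, 2711])  # 2711 > num_targets - 1

try:
    F.cross_entropy(logits, labels)
except (IndexError, RuntimeError) as e:
    # On CPU this raises immediately with a clear message, which is the
    # quickest way to verify a label / num_targets mismatch.
    print("out-of-range label:", e)
```

Running the same check on a random batch from train.egs.csv pinpoints the bad labels before the CUDA assert fires.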
Thanks to syousen. For offline egs, get_chunk_egs() in subtools/pytorch/pipeline/onestep/get_chunk_egs.py generates the num_targets file.
I find that after the train/val split, the number of speakers in the train set is reduced, because some speakers are removed entirely when all of their utterances end up in the val set.
However, this has no effect on the utt2spk_int labels, because they are not updated in filter() of KaldiDataset in subtools/pytorch/libs/egs/kaldi_dataset.py: only attributes belonging to 'utt_first_files' and 'spk_first_files' are changed in the filter() function.
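The effect can be shown with a toy split (the data here is illustrative, not the KaldiDataset API): when a speaker's only utterances all land in the val set, the train set's speaker count shrinks, but the surviving labels keep their original integer ids, so the largest label can exceed the speaker count.

```python
# Hypothetical toy data: 3 speakers (ids 0, 1, 2) before the split.
utt2spk_int = {"u1": 0, "u2": 1, "u3": 1, "u4": 2}
val_utts = {"u1"}  # speaker 0's only utterance goes entirely to val

train = {u: s for u, s in utt2spk_int.items() if u not in val_utts}

num_spks = len(set(train.values()))  # 2 -> what ends up in info/num_targets
needed = max(train.values()) + 1     # 3 -> output dim the loss actually needs
print(num_spks, needed)              # the classifier is one class short
```

Any label >= num_spks then indexes past the FC layer's output, which is exactly the "index out of bounds" assert in the traceback.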
So I recommend that subtools/pytorch/pipeline/onestep/get_chunk_egs.py use dataset.num_spks instead of trainset.num_spks when generating the */info/num_targets file.
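A minimal sketch of that change, assuming `dataset` is the unsplit KaldiDataset and `trainset` the post-split subset (the helper name below is hypothetical, not the repo's API): counting speakers before the split guarantees the classifier covers every label id that can still appear in train.egs.csv.

```python
def write_num_targets(path, num_spks):
    # Write the class count that the FC classifier is later sized from.
    with open(path, "w") as f:
        f.write(str(num_spks))

# In get_chunk_egs.py (sketch):
# write_num_targets("exp/egs/train_sequential/info/num_targets",
#                   dataset.num_spks)  # full dataset, not trainset.num_spks
```

Alternatively, remapping utt2spk_int to contiguous ids after filter() would fix the same mismatch at the source.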