
Using conformer_transformer and rnnt_loss error #38

Closed
zhaoyang9425 opened this issue Jun 22, 2021 · 4 comments · Fixed by #56
Labels: DONE · GOOD FIRST ISSUE (Good for newcomers)

Comments


zhaoyang9425 commented Jun 22, 2021

Environment info

  • Platform: Ubuntu 16.04
  • Python version: 3.7.10
  • PyTorch version (GPU?): PyTorch 1.7.0, CUDA 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): conformer_transducer

The problem arises when using:
CUDA_VISIBLE_DEVICES=8 python ./openspeech_cli/hydra_train.py dataset=libri dataset.dataset_path=/data/dataset/Libri/LibriSpeech dataset.dataset_download=False dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt vocab=libri_subword vocab.vocab_size=5000 vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset model=conformer_transducer audio=fbank lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu trainer.batch_size=4 criterion=transducer

Error executing job with overrides: ['dataset=libri', 'dataset.dataset_path=/data/dataset/Libri/LibriSpeech', 'dataset.dataset_download=False', 'dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt', 'vocab=libri_subword', 'vocab.vocab_size=5000', 'vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset', 'model=conformer_transducer', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'trainer.batch_size=4', 'criterion=transducer']
Traceback (most recent call last):
  File "./openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 442, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 733, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 98, in training_step
    return self.model(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/conformer_transducer/model.py", line 110, in training_step
    return super(ConformerTransducerModel, self).training_step(batch, batch_idx)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 268, in training_step
    target_lengths=target_lengths,
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 90, in collect_outputs
    target_lengths=target_lengths.int(),
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/criterion/transducer/transducer.py", line 96, in forward
    gather=self.gather,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/warp_rnnt-0.4.0-py3.7-linux-x86_64.egg/warp_rnnt/__init__.py", line 74, in rnnt_loss
    index[:, :, :U-1, 1] = labels.unsqueeze(dim=1)
RuntimeError: The expanded size of the tensor (61) must match the existing size (62) at non-singleton dimension 2.  Target sizes: [4, 355, 61].  Tensor sizes: [4, 1, 62]
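
The last frame of the traceback is the key: inside warp_rnnt, the labels tensor is written into the first U-1 label positions of an index tensor built from log_probs, so labels must be exactly one entry shorter than the label dimension of log_probs. A small guard run before the loss call makes that contract explicit (a hypothetical helper, not part of OpenSpeech or warp_rnnt; the sizes in the comments mirror the error above):

import torch


def check_rnnt_shapes(log_probs: torch.Tensor,
                      labels: torch.Tensor,
                      frames_lengths: torch.Tensor,
                      labels_lengths: torch.Tensor) -> None:
    # warp_rnnt.rnnt_loss documents log_probs as (N, T, U, V) and labels as (N, U-1),
    # so the label axis of log_probs must be exactly one longer than the label sequence.
    N, T, U, V = log_probs.shape
    assert labels.shape == (N, U - 1), (
        f"labels should have shape (N, U-1) = ({N}, {U - 1}), got {tuple(labels.shape)}")
    assert frames_lengths.shape == (N,), f"frames_lengths should have shape ({N},)"
    assert labels_lengths.shape == (N,), f"labels_lengths should have shape ({N},)"
    # In the traceback above U == 62 while each sample carries 62 label tokens,
    # which is why broadcasting (4, 1, 62) into (4, 355, 61) fails.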

The loss function (from warp_rnnt/__init__.py):

from typing import AnyStr, Optional

import torch


def rnnt_loss(log_probs: torch.FloatTensor,
              labels: torch.IntTensor,
              frames_lengths: torch.IntTensor,
              labels_lengths: torch.IntTensor,
              average_frames: bool = False,
              reduction: Optional[AnyStr] = None,
              blank: int = 0,
              gather: bool = False) -> torch.Tensor:
    """The CUDA-Warp RNN-Transducer loss.

    Args:
        log_probs (torch.FloatTensor): Input tensor with shape (N, T, U, V)
            where N is the minibatch size, T is the maximum number of
            input frames, U is the maximum number of output labels and V is
            the vocabulary of labels (including the blank).
        labels (torch.IntTensor): Tensor with shape (N, U-1) representing the
            reference labels for all samples in the minibatch.
        frames_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            number of frames for each sample in the minibatch.
        labels_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            length of the transcription for each sample in the minibatch.
        average_frames (bool, optional): Specifies whether the loss of each
            sample should be divided by its number of frames.
            Default: False.
        reduction (string, optional): Specifies the type of reduction.
            Default: None.
        blank (int, optional): label used to represent the blank symbol.
            Default: 0.
        gather (bool, optional): Reduce memory consumption.
            Default: False.
    """
    ...
The log_probs and labels produced by the model are inconsistent with what the loss function requires: here labels has 62 entries per sample, but for a log_probs label dimension of 62 the loss expects labels of length 61 (U-1).
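
For comparison, a call whose tensor shapes satisfy the contract documented above would look roughly like this. This is only a sketch with made-up small sizes, assuming a CUDA device (warp_rnnt is the CUDA-Warp implementation), and it uses nothing beyond the signature quoted above:

import torch
from warp_rnnt import rnnt_loss

# Made-up sizes; U counts the blank-prepended step, so labels carry U-1 tokens.
N, T, U, V = 2, 50, 11, 32
logits = torch.randn(N, T, U, V, device="cuda", requires_grad=True)
log_probs = logits.log_softmax(dim=-1)                                       # (N, T, U, V)
labels = torch.randint(1, V, (N, U - 1), dtype=torch.int32, device="cuda")   # (N, U-1)
frames_lengths = torch.full((N,), T, dtype=torch.int32, device="cuda")
labels_lengths = torch.full((N,), U - 1, dtype=torch.int32, device="cuda")

losses = rnnt_loss(log_probs, labels, frames_lengths, labels_lengths)        # per-sample losses
losses.mean().backward()

Under that contract, the mismatch above means either the joint network emits one label step too few or the targets still carry an extra token (for example a SOS/EOS marker) when they reach the criterion.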

sooftware added the BUG (Something isn't working) and GOOD FIRST ISSUE (Good for newcomers) labels on Jun 22, 2021
sooftware (Member) commented

@hasangchun

sooftware (Member) commented

@hasangchun How's it going?

upskyy (Member) commented Jul 11, 2021

I'm really sorry for being late. I will solve it in a short time.

upskyy added a commit that referenced this issue Jul 21, 2021
upskyy mentioned this issue Jul 21, 2021
upskyy added a commit that referenced this issue Jul 21, 2021
upskyy (Member) commented Jul 21, 2021

@zhaoyang9425 I fixed that part. Thanks for reporting the issue.

upskyy added the DONE label and removed the BUG (Something isn't working) label on Jul 21, 2021