
Add MMI training with word pieces as modelling unit. #6

Merged
merged 23 commits on Oct 18, 2021

Conversation

csukuangfj
Collaborator

Will post the results once they are available.

@csukuangfj csukuangfj changed the title Add MMI training with word pieces as modelling unit. WIP: Add MMI training with word pieces as modelling unit. Aug 7, 2021
@pzelasko
Collaborator

IIRC the alignments from alimdl did not help before when we checked them in snowfall; do you expect different results with the current setup?

@danpovey
Collaborator

We are thinking they might help training get started, where that's a problem, e.g. for MMI with BPE.

@csukuangfj
Collaborator Author

csukuangfj commented Sep 28, 2021

I just started the MMI training with pre-computed alignments.

The tensorboard logs are:


Without attention decoder

It throws the following warnings at some point (after several hundred batches):
[Screenshot: warning messages printed in the training log]

At some other point, it stops printing the above warnings and the MMI loss starts to decrease:
[Screenshots: training log after the warnings stop, showing the MMI loss decreasing]


You can see that the pre-computed alignments help the training converge.
(We will see whether it diverges later.)
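For intuition, one common way to fold pre-computed framewise alignments into training is an auxiliary frame-level cross-entropy term on the encoder output. The sketch below is only an illustration under that assumption, not the exact code added in this PR; the names `encoder_out`, `ali`, and `ali_scale` are hypothetical.

```python
# Illustrative sketch (not the exact implementation in this PR):
# add an auxiliary frame-level cross-entropy against pre-computed
# token alignments to help early convergence of MMI training.
import torch
import torch.nn.functional as F


def mmi_plus_alignment_loss(
    mmi_loss: torch.Tensor,
    encoder_out: torch.Tensor,  # (N, T, C) logits over word pieces
    ali: torch.Tensor,          # (N, T) framewise word-piece ids
    ali_scale: float = 0.5,     # hypothetical weight for the auxiliary term
) -> torch.Tensor:
    # F.cross_entropy expects class scores as (N, C, T) for sequence targets.
    ce = F.cross_entropy(encoder_out.permute(0, 2, 1), ali, reduction="sum")
    return mmi_loss + ali_scale * ce
```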

@csukuangfj
Collaborator Author

The best WERs I get for this pull request are:

Training without attention decoder

(decoding using whole-lattice rescoring, i.e., HLG decoding followed by rescoring the whole lattice with a 4-gram LM)

  • test-clean: 2.79
  • test-other: 6.39

Training with attention decoder

(decoding using attention decoder for rescoring)

  • test-clean: 2.82
  • test-other: 6.67

LF-MMI + attention decoder seems not as good as CTC + attention decoder.
I will do more experiments on it after finishing the decoding script for #54


Let's merge it first since it contains code for integrating framewise alignment information into training, which can be used by @danpovey.

@csukuangfj csukuangfj changed the title WIP: Add MMI training with word pieces as modelling unit. Add MMI training with word pieces as modelling unit. Oct 18, 2021
@csukuangfj csukuangfj merged commit 53b79fa into k2-fsa:master Oct 18, 2021
@@ -0,0 +1,356 @@
# Copyright 2021 Piotr Żelasko
Collaborator


Are there any changes in this file? Was it supposed to be a symlink, like the other asr_datamodule.py in conformer_ctc?

Collaborator Author


It is the same as the one in conformer_ctc and tdnn_lstm. I should have placed a symlink here.

@@ -142,69 +205,66 @@ def tokens(self) -> List[int]:
return ans


class BpeLexicon(Lexicon):
class UniqLexicon(Lexicon):
Collaborator


Why is it named UniqLexicon? Not sure how to interpret it.

Collaborator Author

@csukuangfj csukuangfj Oct 19, 2021


Uniq here means each word in the lexicon has only one pronunciation, i.e., a unique pronunciation.

In BPE-based lexicons, each word is decomposed into word pieces in a deterministic way.
In phone-based lexicons, if a word has more than one pronunciation, there are scripts to keep only the first one.
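As a quick illustration of the deterministic decomposition, a BPE model always maps a given word to the same sequence of pieces, so one lexicon entry per word suffices. The sketch below assumes a trained sentencepiece model; the path "data/lang_bpe/bpe.model" is an assumption, not a file this PR necessarily creates.

```python
# Illustrative sketch: BPE decomposition is deterministic, so each word
# maps to exactly one sequence of word pieces (one "pronunciation").
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe/bpe.model")  # assumed model path

for word in ["HELLO", "WORLD"]:
    pieces = sp.encode(word, out_type=str)  # same output every time
    print(word, pieces)
```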

func = _compute_mmi_loss_pruned
else:
func = _compute_mmi_loss_exact_non_optimized
# func = _compute_mmi_loss_exact_optimized
Collaborator


Is this intended to be commented out?

Collaborator Author


Yes, the non_optimized version is easier to understand and consumes less memory.
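For readers unfamiliar with the exact (non-pruned) LF-MMI computation being chosen here, a rough sketch of its shape using k2 is below. It is for intuition only and is not the implementation in this PR; `num_graphs`, `den_graph`, and `dense_fsa_vec` are assumed to be built elsewhere (numerator graphs per utterance, a shared denominator graph, and the network output wrapped as a DenseFsaVec).

```python
# Rough sketch of an exact LF-MMI loss with k2 (illustration only).
import k2


def mmi_loss_exact(
    num_graphs: k2.Fsa,          # numerator graphs, one per utterance
    den_graph: k2.Fsa,           # shared denominator graph
    dense_fsa_vec: k2.DenseFsaVec,  # network log-probs as a DenseFsaVec
    output_beam: float = 10.0,
):
    num_lats = k2.intersect_dense(num_graphs, dense_fsa_vec, output_beam)
    den_lats = k2.intersect_dense(den_graph, dense_fsa_vec, output_beam)

    num_scores = num_lats.get_tot_scores(log_semiring=True, use_double_scores=True)
    den_scores = den_lats.get_tot_scores(log_semiring=True, use_double_scores=True)

    # MMI maximizes log p(numerator) - log p(denominator); the loss is its negative.
    return -(num_scores - den_scores).sum()
```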
