Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hybrid and Base Decoder Classes #110

Merged

Conversation

christophmluscher
Copy link
Contributor

No description provided.

Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments. Request changes w.r.t. some mixup of am_scale and pronunciation_scale.

There are many docstrings missing.

common/setups/rasr/base_decoder.py Show resolved Hide resolved
common/setups/rasr/base_decoder.py Show resolved Hide resolved
common/setups/rasr/base_decoder.py Outdated Show resolved Hide resolved
common/setups/rasr/base_decoder.py Show resolved Hide resolved
common/setups/rasr/base_decoder.py Outdated Show resolved Hide resolved
common/setups/rasr/hybrid_decoder.py Outdated Show resolved Hide resolved
common/setups/rasr/util/decode.py Outdated Show resolved Hide resolved
common/setups/rasr/util/decode.py Outdated Show resolved Hide resolved
common/setups/rasr/util/decode.py Show resolved Hide resolved
common/setups/rasr/util/decode.py Show resolved Hide resolved
@christophmluscher christophmluscher force-pushed the hybrid-base-decoder-dev branch 2 times, most recently from 57ff7af to f965db6 Compare March 30, 2023 16:37
elif self.value == 2:
return "cart"
elif self.value == 3:
return "dense"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For dense tying there are three different cases (monophone, diphone and triphone), and one needs to specify also boundary class and word end class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add other options (in fact all that are available in rasr) to the enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add other options

certainly, we can add different dense tyings, but then the word end class and boundary that is specific to the dense tying needs a separate enum class. I will take care of this when I integrate FH. working on it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options to add are:

monophone-dense, diphone-dense, and no-tying-dense, for now @christophmluscher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added all options from RASR

@christophmluscher
Copy link
Contributor Author

@JackTemaki @Marvin84 @vieting @michelwi can you review please? :)

@Marvin84
Copy link
Contributor

Marvin84 commented Apr 6, 2023

@JackTemaki @Marvin84 @vieting @michelwi can you review please? :)

Looks good, thanks. I will submit later with fh recipes something also for word-end and boundary classes for no-tying-dense.

@vieting
Copy link
Contributor

vieting commented Apr 6, 2023

I think I would just add nitpicks, not sure if this is helpful right now. I'm fine with merging.

@Atticus1806
Copy link
Contributor

So for me it would take a longer time to read into the code to review, I just quickly went over it.
It would be nice (even though I know its tedious) to add documentation where applicable to the parameters. This does not have to be every function, but maybe the major ones (there are even already some empty docstrings)

Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here I am, the only one not using this code.. All my concerns were addressed so I approve now. Have fun :)

@christophmluscher christophmluscher merged commit 5085cae into rwth-i6:main Apr 6, 2023
@christophmluscher christophmluscher deleted the hybrid-base-decoder-dev branch April 6, 2023 12:54


@dataclass()
class Tdp:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that for Viterbi alignment and decoding we generally use the non-normalized default tdp values, however, for full-sum training, we generally introduce normalized loop/forward values that might be even estimated from an alignment or defined based on aomse heuristics, e.g. average phoneme length. In addition to this class, we could also introduce an enum class for TdpType, out of default, heuristic, and alignment-based. For the two latter types one could also have jobs that estimate values from an alignment or from transcription. Daniel Mann has already such jobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider this as a new PR. Since this extends the functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address also the other comment, before merging the current commit. In my honest opinion, that type of search over the parameters is not correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created two new issues.
Currently you have a lot of flexibility, I rather like that...


def _get_iter(self):
return [
(am, lm, pri, pron, tdp, speech, sil, nonspeech, al)
Copy link
Contributor

@Marvin84 Marvin84 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When doing grid search on the decoding parameters for a hybrid model, doing such a cartesian product does not make so much sense. One should first tune model-related scales such as prior scale and tdp scale. Then tdp values, and only given the optimal values of the mentioned parameters one tunes the lm and pronunciation scales. Moreover, we should definitely consider the obligatory use of a high altas and small beam for the first two steps - only lm scale should not be tuned together with altas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also consider that experience shows that the only tdp value that is worth it to tune is the exit penalty for silence and non-word.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the option to pass a List of RecogParams or RecogParams with lists where the cartesian product is taken from. We can always change the RecogParam class later on or extend or add more versions.. that is why I tried to put it all in the RecogParamClass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants