Hybrid and Base Decoder Classes #110
Conversation
I have some comments. Requesting changes w.r.t. a mixup of am_scale and pronunciation_scale.
There are many docstrings missing.
```python
elif self.value == 2:
    return "cart"
elif self.value == 3:
    return "dense"
```
For dense tying there are three different cases (monophone, diphone, and triphone), and one also needs to specify the boundary class and word-end class.
We can add other options (in fact, all that are available in RASR) to the enum.
> we can add other options

Certainly, we can add different dense tyings, but then the word-end class and boundary class that are specific to the dense tying need a separate enum class. I will take care of this when I integrate FH; I am working on it now.
The options to add for now are: monophone-dense, diphone-dense, and no-tying-dense. @christophmluscher
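For illustration, a minimal sketch of what the extended enum might look like, continuing the value-to-string mapping quoted above; the class name StateTyingType, the member names, and the numeric values are assumptions rather than the actual PR code:

```python
from enum import Enum


class StateTyingType(Enum):
    MONOPHONE = 1
    CART = 2
    DENSE = 3
    MONOPHONE_DENSE = 4
    DIPHONE_DENSE = 5
    NO_TYING_DENSE = 6

    def __str__(self):
        # map each member to the string RASR expects for the state-tying type
        return {
            1: "monophone",
            2: "cart",
            3: "dense",
            4: "monophone-dense",
            5: "diphone-dense",
            6: "no-tying-dense",
        }[self.value]
```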
I added all options from RASR
@JackTemaki @Marvin84 @vieting @michelwi can you review please? :)
Looks good, thanks. I will later submit, together with the FH recipes, something for the word-end and boundary classes for no-tying-dense.
I think I would just add nitpicks, and I am not sure that is helpful right now. I'm fine with merging.
For me it would take a longer time to read into the code for a proper review, so I just quickly went over it.
So here I am, the only one not using this code... All my concerns were addressed, so I approve now. Have fun :)
```python
@dataclass()
class Tdp:
```
It is true that for Viterbi alignment and decoding we generally use the non-normalized default tdp values. However, for full-sum training we generally introduce normalized loop/forward values that might even be estimated from an alignment or defined based on some heuristic, e.g. the average phoneme length. In addition to this class, we could also introduce an enum class TdpType with the values default, heuristic, and alignment-based. For the two latter types one could also have jobs that estimate the values from an alignment or from a transcription; Daniel Mann already has such jobs.
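A rough sketch of how such a TdpType enum could sit next to the Tdp dataclass; the member names, their string values, and the fields of Tdp are illustrative assumptions, not the code of this PR:

```python
from dataclasses import dataclass
from enum import Enum


class TdpType(Enum):
    # non-normalized defaults, as used for Viterbi alignment and decoding
    DEFAULT = "default"
    # normalized loop/forward values from a heuristic, e.g. average phoneme length
    HEURISTIC = "heuristic"
    # normalized loop/forward values estimated from an existing alignment
    ALIGNMENT_BASED = "alignment-based"


@dataclass()
class Tdp:
    # the set of fields is an assumption for illustration
    loop: float
    forward: float
    skip: float
    exit: float
```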
I would consider this as a new PR, since it extends the functionality.
Please also address the other comment before merging the current commit. In my honest opinion, that type of search over the parameters is not correct.
I created two new issues.
Currently you have a lot of flexibility; I rather like that...
```python
def _get_iter(self):
    return [
        (am, lm, pri, pron, tdp, speech, sil, nonspeech, al)
```
When doing a grid search over the decoding parameters of a hybrid model, such a cartesian product does not make much sense. One should first tune the model-related scales, i.e. the prior scale and the tdp scale, then the tdp values, and only given the optimal values of these parameters tune the lm and pronunciation scales. Moreover, we should definitely consider the obligatory use of a high altas and a small beam for the first two steps; only the lm scale should not be tuned together with altas.
Please also consider that experience shows that the only tdp value worth tuning is the exit penalty for silence and non-word.
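A rough sketch of the staged tuning suggested in these two comments, as an alternative to one big cartesian product; the recognize callback, the keyword names, and the altas/beam values are assumptions for illustration only:

```python
import itertools


def staged_tuning(recognize, prior_scales, tdp_scales, exit_penalties,
                  lm_scales, pron_scales):
    """`recognize` is a hypothetical callback that runs one recognition pass
    and returns the WER; all keyword names below are assumptions."""
    # step 1: model-related scales, searched cheaply with high altas and a small beam
    best_prior, best_tdp_scale = min(
        itertools.product(prior_scales, tdp_scales),
        key=lambda p: recognize(prior=p[0], tdp_scale=p[1], altas=12.0, beam=14.0),
    )

    # step 2: tdp values; in practice mainly the silence/non-word exit penalty
    best_exit = min(
        exit_penalties,
        key=lambda e: recognize(prior=best_prior, tdp_scale=best_tdp_scale,
                                exit_penalty=e, altas=12.0, beam=14.0),
    )

    # step 3: lm and pronunciation scales, now without altas and with a full beam
    best_lm, best_pron = min(
        itertools.product(lm_scales, pron_scales),
        key=lambda p: recognize(prior=best_prior, tdp_scale=best_tdp_scale,
                                exit_penalty=best_exit, lm=p[0], pron=p[1],
                                beam=18.0),
    )
    return best_prior, best_tdp_scale, best_exit, best_lm, best_pron
```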
You have the option to pass a list of RecogParams, or a single RecogParams with lists from which the cartesian product is taken. We can always change or extend the RecogParams class later on, or add more versions; that is why I tried to put it all into the RecogParams class.
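For illustration, a minimal sketch of the two ways of passing parameters described here; the field names of RecogParams and the expand helper are hypothetical, not the PR's actual API:

```python
import itertools
from dataclasses import dataclass
from typing import List, Union


@dataclass
class RecogParams:
    # list-valued fields whose combinations define the recognition runs
    am_scales: List[float]
    lm_scales: List[float]
    prior_scales: List[float]


def expand(params: Union[RecogParams, List[RecogParams]]):
    """Yield one (am_scale, lm_scale, prior_scale) tuple per recognition run."""
    if isinstance(params, list):
        for p in params:
            yield from expand(p)
    else:
        yield from itertools.product(
            params.am_scales, params.lm_scales, params.prior_scales
        )


# a single RecogParams with lists expands to the full cartesian product ...
runs = list(expand(RecogParams(am_scales=[1.0], lm_scales=[8.0, 10.0], prior_scales=[0.3, 0.5])))
# ... while a list of RecogParams yields the union of their expansions
```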