-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use BucketingSampler for dev and test data #73
Conversation
+2 |
Do I need to run other tools besides black to make the CI happy? |
Yes,you have to run flake8 |
Please see icefall/.github/workflow/style_check.ymal |
sampler = SingleCutSampler( | ||
cuts_test, max_duration=self.args.max_duration | ||
sampler = BucketingSampler( | ||
cuts_test, max_duration=self.args.max_duration, shuffle=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guys, we need to give this "shuffle" argument some care, for valid and test.
I observed, after merging master code, very bad valid probs for the attention part of the model (like, 0.5 instead of 0.1).
After a lot of experimentation, I found this 'shuffle' arg to be responsible for at least the majority of this effect.
It seems that what happens is, with shuffle=False, within each bucket the durations vary by much less than with shuffle=True, hence there is less padding. At least that is what seems to happen in my setup. It's possible that the attention model is learning to rely on the very low energy at the end of the utterance, for termination; or something like that. We need to do some experiments with this; we should test whether the shuffle={True,False} arg makes a difference for testing as well, especially for the attention decoder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that makes sense we can randomly choose to pad from the left or from the right during the training to break the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, makes sense I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #71 -- I simply hardcoded BucketingSampler in place of SingleCutSampler as I don't see a reason not to use it.
Edit: I checked that it works just on yesno.