Use BucketingSampler for dev and test data #73
Merged: 2 commits into k2-fsa:master on Oct 9, 2021

Conversation

@pzelasko (Collaborator) commented on Oct 9, 2021:

As mentioned in #71 -- I simply hardcoded BucketingSampler in place of SingleCutSampler as I don't see a reason not to use it.

Edit: I only checked that it works on yesno.
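
For context, a minimal sketch of a test dataloader with BucketingSampler substituted for SingleCutSampler, based on the lhotse API as of this PR. The helper name, num_buckets value, and num_workers are illustrative assumptions; cuts_test and max_duration mirror the diff reviewed below.

```python
# Minimal sketch (not the exact icefall code): build a test DataLoader that
# batches cuts with BucketingSampler instead of SingleCutSampler.
import torch
from lhotse.dataset import BucketingSampler, K2SpeechRecognitionDataset


def make_test_dataloader(cuts_test, max_duration: float = 200.0):
    test_dataset = K2SpeechRecognitionDataset()  # yields padded feature batches
    sampler = BucketingSampler(
        cuts_test,
        max_duration=max_duration,  # cap on total speech seconds per batch
        shuffle=False,              # deterministic order for dev/test
        num_buckets=30,             # illustrative; groups cuts of similar duration
    )
    # lhotse samplers emit whole batches, so batch_size must be None here.
    return torch.utils.data.DataLoader(
        test_dataset, sampler=sampler, batch_size=None, num_workers=2
    )
```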

@csukuangfj (Collaborator):

+2

@pzelasko (Collaborator, Author) commented on Oct 9, 2021:

Do I need to run other tools besides black to make the CI happy?

@csukuangfj (Collaborator):

> Do I need to run other tools besides black to make the CI happy?

Yes, you have to run flake8.

@csukuangfj (Collaborator):

Please see icefall/.github/workflows/style_check.yml.

@pzelasko merged commit d54828e into k2-fsa:master on Oct 9, 2021.
The review comments below refer to this change:

```diff
-sampler = SingleCutSampler(
-    cuts_test, max_duration=self.args.max_duration
+sampler = BucketingSampler(
+    cuts_test, max_duration=self.args.max_duration, shuffle=False
```
Collaborator:

Guys, we need to give this "shuffle" argument some care, for valid and test.
I observed, after merging master code, very bad valid probs for the attention part of the model (like, 0.5 instead of 0.1).
After a lot of experimentation, I found this 'shuffle' arg to be responsible for at least the majority of this effect.
It seems that what happens is, with shuffle=False, within each bucket the durations vary by much less than with shuffle=True, hence there is less padding. At least that is what seems to happen in my setup. It's possible that the attention model is learning to rely on the very low energy at the end of the utterance, for termination; or something like that. We need to do some experiments with this; we should test whether the shuffle={True,False} arg makes a difference for testing as well, especially for the attention decoder.
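
For illustration only, one rough way to check the padding effect described above is to measure the padded fraction per batch under both settings. This sketch assumes the lhotse BucketingSampler of that era, where iterating the sampler yields CutSet batches; cuts and the default thresholds are placeholders.

```python
# Rough sketch: estimate what fraction of each padded batch is padding, for a
# given shuffle setting. Assumes the sampler yields CutSet batches (lhotse ~2021).
from lhotse.dataset import BucketingSampler


def padding_fraction(cuts, max_duration: float = 200.0, shuffle: bool = False) -> float:
    sampler = BucketingSampler(cuts, max_duration=max_duration, shuffle=shuffle)
    padded, total = 0.0, 0.0
    for batch in sampler:
        durations = [cut.duration for cut in batch]
        longest = max(durations)
        padded += sum(longest - d for d in durations)  # seconds of padding
        total += longest * len(durations)              # seconds in the padded tensor
    return padded / total


# Compare e.g. padding_fraction(cuts_valid, shuffle=True) vs. shuffle=False.
```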

@pzelasko (Collaborator, Author):

If that makes sense we can randomly choose to pad from the left or from the right during the training to break the pattern.
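
A minimal sketch of that idea, assuming padded features of shape (num_frames, num_features); the function name and the use of torch.nn.functional.pad are illustrative assumptions, not existing icefall code.

```python
# Illustrative sketch: split the required padding randomly between the start and
# end of each utterance, so the model cannot rely on trailing low-energy frames.
import random

import torch
import torch.nn.functional as F


def pad_randomly(feats: torch.Tensor, target_len: int, pad_value: float = 0.0) -> torch.Tensor:
    """feats: (num_frames, num_features) -> (target_len, num_features)."""
    num_pad = target_len - feats.size(0)
    if num_pad <= 0:
        return feats
    left = random.randint(0, num_pad)
    right = num_pad - left
    # F.pad pads from the last dim backwards: (feat_left, feat_right, time_left, time_right).
    return F.pad(feats, (0, 0, left, right), value=pad_value)
```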

Collaborator:

Sure, makes sense I think.

Collaborator:

See #97 and #98.
It is a bug, not related to padding.
