-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving OWSM inference interface #5618
Conversation
for more information, see https://pre-commit.ci
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #5618 +/- ##
===========================================
+ Coverage 24.25% 76.06% +51.81%
===========================================
Files 720 735 +15
Lines 66641 68696 +2055
===========================================
+ Hits 16161 52255 +36094
+ Misses 50480 16441 -34039
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hi, can we considering merging this soon? I will provide example usage in the project page based on this new interface. As reported previously, we can significantly improve the long-form ASR performance with some rules and heuristics |
Yes, but currently, CI is broken. |
@sw005320 The CI has passed the previous issue. Should be good to go ahead |
Cool! @pyf98, please deal with the CI error |
Thanks! It seems that some TTS tests failed |
@jctian98, this is a reminder. |
espnet2/bin/s2t_inference.py
Outdated
@@ -44,6 +43,105 @@ | |||
] | |||
|
|||
|
|||
class ScoreFilter(BatchScorerInterface, torch.nn.Module): | |||
"""Filter scores based on pre-defined rules.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more explanations here or in other places about what kind of pre-defined rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more comments below
The code is very clear. I'm ok with it. A simple discussion for future development: |
Thanks @jctian98 for the comment. In the previous version of
I think we can keep this separate design for now. |
One more note is that the order of tokens is: The language token is between In terms of the implementation, we have to set both |
Thanks, @pyf98! |
What?
preprocessor_conf
. We no longer need to provide them manually.lang_sym
,task_sym
andpredict_time
can be passed as additional arguments when calling__call__
, which will overwrite the default values in__init__
. This is more convenient to use.s2t_inference_language
.BeamSearch
is also removed. Only a single decoder forward step is performed. And it can return an N-best list of language and (normalized) probability.scorer
that assigns certain tokens with a-inf
score duringsearch
. With this modification, it can now predict the first timestamp by itself. Previously, the first timestamp has to be manually set, which is usually inaccurate.