added end detection #46
src/nets/e2e_asr_attctc_th.py
@@ -697,6 +697,27 @@ def recognize(self, h, recog_args):

return y_seq

# end detection described in Eq. (50) of
# S. Watanabe et al "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition"
def end_detect(self, ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
I think this function should be placed outside e2e_asr_attctc[_th].py and shared by both, e.g. from e2e_common import end_detection, because it appears to be static (it does not use self) and does not depend on numpy/torch operations.
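A minimal sketch of what such a shared, framework-free function might look like. This is not the code from the PR: the dictionary keys "yseq" and "score" and the function body are assumptions for illustration, and only the parameter names i, M, and D_end come from the diff. The default D_end of -10.0 equals np.log(1 * np.exp(-10)) from the diff, written without numpy.

```python
def end_detect(ended_hyps, i, M=3, D_end=-10.0):
    """Return True if decoding can plausibly stop at output length i.

    ended_hyps: list of finished hypotheses; each is assumed (hypothetically)
                to be a dict with "yseq" (token list) and "score" (log prob).
    i:          current output length.
    M:          number of most recent lengths to inspect.
    D_end:      log-domain threshold; -10.0 == log(exp(-10)).
    """
    if not ended_hyps:
        return False

    # Best score over all hypotheses that have ended so far.
    best_score = max(h["score"] for h in ended_hyps)

    count = 0
    for m in range(M):
        length = i - m
        # Scores of ended hypotheses whose length is exactly i - m.
        scores = [h["score"] for h in ended_hyps if len(h["yseq"]) == length]
        # Count this length if its best hypothesis is far worse than the overall best.
        if scores and max(scores) - best_score < D_end:
            count += 1

    # Stop only if all of the last M lengths produced clearly worse hypotheses.
    return count == M
```

Because the sketch uses only built-in Python, both the chainer and pytorch variants could import it unchanged, which is the point of the suggestion above.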
@@ -570,8 +596,11 @@ def recognize_beam(self, h, recog_args, char_list):

# prepare sos
y = self.xp.full(1, self.sos, 'i')
# maxlen >= 1
maxlen = max(1, int(recog_args.maxlenratio * h.shape[0]))
if recog_args.maxlenratio == 0:
Please document this special behavior in the argparse help of asr_recog[_th].py. If you think it is worth making the default, set it to 0.0 (the default is currently 0.5).
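A sketch of how the help text might read, assuming a --maxlenratio option with the current default of 0.5. The function name build_recog_parser and the exact wording of the help string are hypothetical, not taken from asr_recog.py.

```python
import argparse


def build_recog_parser():
    """Build a hypothetical parser containing only the option under discussion."""
    parser = argparse.ArgumentParser(description="Recognition options (sketch)")
    parser.add_argument(
        "--maxlenratio",
        type=float,
        default=0.5,  # current default according to the review comment
        help="Maximum output length as a ratio of the input length. "
             "If set to 0.0, the end of decoding is detected automatically.",
    )
    return parser


if __name__ == "__main__":
    args = build_recog_parser().parse_args()
    print(args.maxlenratio)
```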
I also think argparse.ArgumentParser should be shared, e.g. from e2e_args import asr_recog_parser, as I commented on def end_detect, because the chainer and pytorch tools should stay consistent.
I'll do it for end_detect, but let me make the argparse part common later.
Actually, asr_recog.py and asr_recog_th.py are almost the same, and I'm thinking of using a common asr_recog.py for both, which would be simpler for me. However, I'm not sure the same approach would apply to asr_train.py and asr_train_th.py, so it needs more consideration.
OK. It may be too much for this PR.
src/nets/e2e_asr_attctc.py
@@ -546,6 +551,27 @@ def recognize(self, h, recog_args):

return y_seq

# end detection described in Eq. (50) of
# S. Watanabe et al "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition"
def end_detect(self, ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
ditto (see e2e_asr_attctc_th.py)
…enratio behaviour in the end detect case in argparse
@ShigekiKarita Done!
LGTM
Added end detection described in Eq. (50) of S. Watanabe et al., "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition."
Setting maxlenratio to 0.0 triggers this automatic detection.
Experiments on WSJ and Voxforge confirm that the detection works well.
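For reference, one way to write the stopping rule that the end_detect signature suggests. The notation is an assumption and may differ from the paper: \Omega is the set of ended hypotheses, |C| is the length of hypothesis C, l is the current output length (i in the code), and M and D_end are the parameters shown in the diff.

```latex
\sum_{m=0}^{M-1} \left[
  \max_{C \in \Omega : |C| = l - m} \log P(C \mid X)
  \;-\; \max_{C' \in \Omega} \log P(C' \mid X)
  \;<\; D_{\mathrm{end}}
\right] = M
```

Here [\cdot] is 1 when the condition holds and 0 otherwise, so decoding stops once the best hypotheses ending at each of the last M lengths all score more than |D_end| below the best ended hypothesis overall. With the defaults in the diff, M = 3 and D_end = \log(1 \cdot e^{-10}) = -10.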