
added end detection #46

Merged
merged 5 commits on Dec 28, 2017

Conversation

@sw005320 (Contributor) commented Dec 27, 2017

Added end detection as described in Eq. (50) of S. Watanabe et al., "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition."
Setting maxlenratio to 0.0 triggers this automatic detection.
WSJ and Voxforge experiments confirmed that the detection works well.
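
For reference, Eq. (50) stops decoding once, for each of the last M output lengths, the best hypothesis ending at that length scores far below (by more than |D_end|) the best ended hypothesis found so far. A minimal sketch of that rule in Python; the 'yseq'/'score' dict keys are illustrative assumptions, not necessarily the PR's exact data structures:

import numpy as np

def end_detect(ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
    """Return True when decoding can stop, following Eq. (50).

    ended_hyps: list of dicts with 'yseq' (output token ids) and
                'score' (accumulated log probability)
    i: current output position of the beam-search loop
    Note: np.log(1 * np.exp(-10)) evaluates to exactly -10.0.
    """
    if len(ended_hyps) == 0:
        return False
    # best ended hypothesis found so far
    best_score = max(h['score'] for h in ended_hyps)
    count = 0
    for m in range(M):
        # best hypothesis that ended with output length i - m
        same_length = [h for h in ended_hyps if len(h['yseq']) == i - m]
        if same_length and max(h['score'] for h in same_length) - best_score < D_end:
            count += 1
    # the last M lengths all fell far below the global best -> stop decoding
    return count == M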

@sw005320 mentioned this pull request Dec 27, 2017
@sw005320 changed the title from "added endpoint detection" to "added end detection" Dec 27, 2017
@@ -697,6 +697,27 @@ def recognize(self, h, recog_args):

return y_seq

# end detection described in Eq. (50) of
# S. Watanabe et al "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition"
def end_detect(self, ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
@ShigekiKarita (Member) commented:
I think this function should be placed outside e2e_asr_attctc[_th].py and shared between them, e.g. from e2e_common import end_detection, because this function seems to be static (it does not use self) and free of numpy/torch operations.
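
For illustration, the suggested layout might look like this (e2e_common is the module name the reviewer proposes; the reviewer writes end_detection, while the PR's function is named end_detect, which this sketch keeps):

# e2e_common.py -- backend-agnostic helpers shared by both models
import numpy as np

def end_detect(ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
    """Static end-detection rule; no self, no chainer/torch calls."""
    ...  # body as sketched above

# in e2e_asr_attctc.py and e2e_asr_attctc_th.py
from e2e_common import end_detect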

@@ -570,8 +596,11 @@ def recognize_beam(self, h, recog_args, char_list):

# prepare sos
y = self.xp.full(1, self.sos, 'i')
# maxlen >= 1
maxlen = max(1, int(recog_args.maxlenratio * h.shape[0]))
if recog_args.maxlenratio == 0:
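
Presumably the new branch removes the fixed length cap and lets end detection terminate the loop instead; a sketch under that assumption (the surrounding beam-search loop is schematic, not the PR's exact code):

if recog_args.maxlenratio == 0:
    # no fixed limit: allow up to the number of encoder frames
    # and rely on end_detect() to stop earlier
    maxlen = h.shape[0]
else:
    # maxlen >= 1
    maxlen = max(1, int(recog_args.maxlenratio * h.shape[0]))

for i in range(maxlen):
    # ... expand the beam, move hypotheses that emit <eos> into ended_hyps ...
    if end_detect(ended_hyps, i):
        break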
@ShigekiKarita (Member) commented:
Please document this specialized behavior in the argparse help of asr_recog[_th].py. If you think it is worth being the default, set the default to 0.0 (the current default is 0.5).

@ShigekiKarita (Member) commented:
I also think the argparse.ArgumentParser should be shared, e.g. from e2e_args import asr_recog_parser, as I commented on def end_detect, because the chainer/pytorch tools should be consistent.
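
A sketch of the suggested shared parser (e2e_args and asr_recog_parser follow the comment's example names; the help text reflects the new 0.0 behavior):

# e2e_args.py (hypothetical shared module)
import argparse

def asr_recog_parser():
    parser = argparse.ArgumentParser(description='ASR decoding')
    parser.add_argument('--maxlenratio', type=float, default=0.5,
                        help='Input-length ratio used to set the maximum output '
                             'length. If set to 0.0, the maximum length is not '
                             'fixed and automatic end detection is used instead.')
    # ... beam size, penalty, and the other decoding options ...
    return parser

# in asr_recog.py and asr_recog_th.py
from e2e_args import asr_recog_parser
args = asr_recog_parser().parse_args()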

@sw005320 (Contributor, Author) commented:
I'll do it for end_detect, but let me make the argparse part common later.
Actually, asr_recog.py and asr_recog_th.py are almost the same, and I'm thinking of using a common asr_recog.py for both, which is simpler for me. But I'm not sure the same approach would apply to asr_train.py and asr_train_th.py, so it needs some more consideration.
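
One way the author's single-script idea could dispatch between the two implementations (the --backend flag and the E2E import targets are hypothetical illustrations, not part of this PR):

# asr_recog.py (hypothetical unified entry point)
parser.add_argument('--backend', default='chainer',
                    choices=['chainer', 'pytorch'],
                    help='Which model implementation to load')
args = parser.parse_args()

if args.backend == 'pytorch':
    from e2e_asr_attctc_th import E2E
else:
    from e2e_asr_attctc import E2E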

@ShigekiKarita (Member) commented:
OK. It may be too much for this PR.

@@ -546,6 +551,27 @@ def recognize(self, h, recog_args):

return y_seq

# end detection described in Eq. (50) of
# S. Watanabe et al "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition"
def end_detect(self, ended_hyps, i, M=3, D_end=np.log(1 * np.exp(-10))):
@ShigekiKarita (Member) commented:
ditto (see e2e_asr_attctc_th.py)

@sw005320 (Contributor, Author) commented:
@ShigekiKarita Done!

@ShigekiKarita (Member) left a review:

LGTM
