Transformer Chainer #774
Conversation
I wish you would also fix #755 in chainer.
I just added the fixes for most of the problems with the pytorch backend. Currently, I am training an LM to test the joint decoding and finish this PR.
By the way, the implemented CER/WER is based on greedy search. I am currently using this due to the large number of epochs required by the transformer. Let me know if this is OK, or whether it should be a beam search similar to the one implemented in the RNN decoder.
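For reference, a minimal sketch of what such a greedy-search CER/WER computation could look like; the function name, argument layout, and use of the `editdistance` package are illustrative assumptions, not code from this PR:

```python
import editdistance


def greedy_error_rate(ys_hat, ys_true, char_list):
    """Score greedy (argmax) hypotheses against references.

    ys_hat / ys_true are lists of token-id sequences; char_list maps ids to
    characters. Returns (cer, wer) as ratios.
    """
    char_eds, char_lens, word_eds, word_lens = [], [], [], []
    for hyp, ref in zip(ys_hat, ys_true):
        hyp_chars = "".join(char_list[idx] for idx in hyp)
        ref_chars = "".join(char_list[idx] for idx in ref)
        # character-level edit distance
        char_eds.append(editdistance.eval(hyp_chars, ref_chars))
        char_lens.append(len(ref_chars))
        # word-level edit distance on whitespace-split tokens
        word_eds.append(editdistance.eval(hyp_chars.split(), ref_chars.split()))
        word_lens.append(len(ref_chars.split()))
    cer = float(sum(char_eds)) / sum(char_lens)
    wer = float(sum(word_eds)) / sum(word_lens)
    return cer, wer
```

A shared helper along these lines could live in a common directory and be called from both the transformer and RNN models in either backend.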
CTC: I think this is fine.
@Fhrozen, can we make the CER computation a function, put it in some common directory, and call it from both the transformer and the RNN in both the chainer and pytorch backends?
@ShigekiKarita I just finished the requested tests:
I am using a model trained for 68 epochs (
Thank you for your work.
I added some comments.
When running without CTC (ctc_weight=0.0, lm_weight=1.0), the CER result is bad; is this expected?

| Corr | Sub | Del  | Ins | Err  | S.Err |
|------|-----|------|-----|------|-------|
| 6.5  | 1.1 | 92.4 | 0.0 | 93.5 | 99.1  |
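For context, a simplified sketch of how ESPnet-style hybrid CTC/attention beam search typically combines per-hypothesis scores; names and signature are illustrative, not the decoder code under review:

```python
def joint_score(att_score, ctc_score, lm_score, ctc_weight=0.0, lm_weight=1.0):
    """Combine attention, CTC prefix, and LM log-probabilities for one hypothesis."""
    score = (1.0 - ctc_weight) * att_score + ctc_weight * ctc_score
    if lm_weight > 0.0:
        score += lm_weight * lm_score
    return score
```

With ctc_weight=0.0 the CTC term vanishes, so nothing constrains the hypothesis against the encoder output length, which can favour degenerate, deletion-heavy outputs like the result above.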
```python
        if self.flag_return:
            loss_ctc = None
            return self.loss, loss_ctc, loss_att, acc
        else:
            return self.loss

    def recognize(self, x_block, recog_args, char_list=None, rnnlm=None):

    def recognize_beam2(self, x_block, recog_args, char_list=None, rnnlm=None, use_jit=False):
```
Is this method necessary? I think this PR does not use recognize_beam2 in other code.
I left this in on purpose, just in case recognize_beam did not work, but I will remove it before merging.
I see, thank you.
```diff
@@ -78,37 +78,48 @@ class CustomUpdater(training.StandardUpdater):
     def __init__(self, train_iter, optimizer, converter, device, accum_grad=1):
         super(CustomUpdater, self).__init__(
             train_iter, optimizer, converter=converter, device=device)
-        self.count = 0
+        self.forward_count = 0
```
I feel that changing from `count` to `forward_count` may have side effects. Do you want to fix the code of `accum_grad`?
I did not find any side effects on accum_grad with forward_count, but I will check it once more. Could you explain which possible effects could appear?
I'm sorry, this is my concern. I think `CustomUpdater` and `CustomParallelUpdater` are common components. Is this modification necessary for the Transformer PR? Actually, I'm not sure why this modification is necessary (this is just a comment).
If this modification to `CustomUpdater` and `CustomParallelUpdater` is not related to the Transformer method, I feel you could separate it into a different PR. But I'm not the main contributor of ESPnet, so I'm not sure whether we should separate the PRs or not.
@aonotas thank you for your support and comments. I will reflect the modifications later.
@aonotas Thanks for your help!
Unfortunately, this is expected. This strange behaviour is already known in the pytorch Transformer implementation. We found that LM integration without CTC seems to be difficult on WSJ.
Wow, this is interesting. Thank you for the information.
@creatorscan may fix it.
I just finished testing the chainer model with
The result did not change a lot; only the dev set shows a slight reduction of 0.2 in CER and 0.1 in WER.
Codecov Report
```diff
@@            Coverage Diff            @@
##           v.0.5.0     #774    +/-  ##
==========================================
+ Coverage    51.07%   51.07%   +<.01%
==========================================
  Files          102      110       +8
  Lines        10957    11133     +176
==========================================
+ Hits          5596     5686      +90
- Misses        5361     5447      +86
```
Continue to review full report at Codecov.
@Fhrozen, what is the status?
I only need to add the CER computation to the RNN to finish this PR.
OK. Enjoy IJCNN!
@sw005320 @kan-bayashi, please check it for merge before someone else updates v.0.5. ;)
Did you add a test for this PR?
```python
    def update(self):
        self.update_core()
        if self.forward_count == 0:
            self.iteration += 1
```
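For context, a minimal, Chainer-style sketch of how `forward_count` is commonly used to implement gradient accumulation in an updater like this one; the method body is illustrative only and the actual updater in this PR may differ:

```python
def update_core(self):
    # Illustrative sketch of gradient accumulation; not the exact PR code.
    batch = self.get_iterator('main').next()
    x = self.converter(batch, self.device)
    optimizer = self.get_optimizer('main')

    # Scale the loss so gradients accumulated over `accum_grad` mini-batches
    # match a single large-batch update.
    loss = optimizer.target(*x) / self.accum_grad
    loss.backward()

    self.forward_count += 1
    if self.forward_count != self.accum_grad:
        return  # keep accumulating gradients; skip the optimizer step
    self.forward_count = 0
    optimizer.update()             # apply the accumulated gradients
    optimizer.target.cleargrads()  # reset gradients for the next cycle
```

Under this pattern, `update()` only advances `self.iteration` once `forward_count` wraps back to zero, i.e. after an actual optimizer step, which is what the snippet above checks.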
Why did you do it here?
espnet/espnet/asr/pytorch_backend/asr.py (line 162 in 53ca7b9): `self.iteration += 1`
This was related to #777; I suppose I need to add it as a comment.
This is the second part of the updates for the transformer with Chainer:
- Common CER computation for RNN - Pytorch/Chainer (leave for another PR)
- Format call inside chainer transformer (leave for another PR)