
Transformer Chainer #774

Merged: 47 commits, merged into espnet:v.0.5.0 on Jul 30, 2019

Conversation

@Fhrozen (Member) commented May 29, 2019

This is the second part of the updates for the Transformer with Chainer.

@sw005320 added the Enhancement label on May 29, 2019
@ShigekiKarita (Member):

I hope you can also fix #755 in the Chainer backend.

@Fhrozen (Member Author) commented Jun 11, 2019

I just added fixes for most of the problems with the PyTorch backend.
I also trained the Chainer backend with mtl_alpha 0.3 without problems.
I set the patience to 10 and the model was trained for 72 epochs.

[accuracy curve plot]

Currently, I am training an LM to test the joint decoding and finish this PR.
BTW, the train.yml file is set up with an ln command inside run.sh, but it would be better to delete it and call the config files directly from run.sh. Let me know what you think about this.

@Fhrozen (Member Author) commented Jun 11, 2019

BTW, the implemented CER/WER is based on greedy search. I am currently using this due to the large number of epochs required by the transformer. Let me know if this is OK, or whether it should be a beam search similar to the one implemented in the RNN decoder.
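
For reference, a minimal sketch of what such a greedy-search CER computation could look like (a hypothetical helper, not the actual ESPnet function; it assumes the third-party editdistance package and a char_list mapping token ids to characters):

    import editdistance  # third-party edit-distance package, used here only for illustration

    def greedy_cer(ys_hat, ys_ref, char_list, ignore_id=-1):
        """Hypothetical helper: CER from greedy (argmax) decoder outputs.

        ys_hat: per-utterance argmax token-id sequences from the decoder
        ys_ref: per-utterance reference token-id sequences, padded with ignore_id
        """
        cers = []
        for hyp, ref in zip(ys_hat, ys_ref):
            hyp_chars = "".join(char_list[int(t)] for t in hyp if int(t) != ignore_id)
            ref_chars = "".join(char_list[int(t)] for t in ref if int(t) != ignore_id)
            if ref_chars:
                cers.append(editdistance.eval(hyp_chars, ref_chars) / len(ref_chars))
        return sum(cers) / len(cers) if cers else None

The same helper could report WER by splitting the recovered strings into words before computing the edit distance.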

@sw005320 (Contributor) commented Jun 11, 2019

CTC: I think this is fine.
Attention: Making transformer beam search work on GPU requires additional work. This is still fine as is, but people may find it confusing. We may not need it for now.

Resolved review comments on: egs/wsj/asr1/conf/tuning/train_pytorch_transformer.yaml, espnet/nets/pytorch_backend/e2e_asr_transformer.py (several threads), test/test_e2e_transformer.py

@sw005320 (Contributor):

@Fhrozen, can we make the CER computation a function, put it in a common directory, and call it from both the transformer and the RNN in both the Chainer and PyTorch backends?

@Fhrozen (Member Author) commented Jun 18, 2019

@ShigekiKarita I just finished with the requested tests:

  • w/o ctc, w/o lm (ctc_weight=0.0, lm_weight=0.0):
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_noctc_nolm_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333  33341 |  94.1   1.4   4.5   1.1   7.0   77.8 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_noctc_nolm_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333   5643 |  87.1   8.8   4.1   1.6  14.5   73.6 |
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_noctc_nolm_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503  48634 |  92.9   2.1   5.0   1.1   8.2   83.1 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_noctc_nolm_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503   8234 |  84.1  11.3   4.7   1.7  17.6   82.1 |

  • w/o ctc (ctc_weight=0.0, lm_weight=1.0):
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_noctc_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333  33341 |   5.1   0.0  94.9   0.0  94.9   99.4 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_noctc_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333   5643 |   6.5   1.1  92.4   0.0  93.5   99.1 |
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_noctc_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503  48634 |   5.7   0.1  94.2   0.0  94.3   99.4 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_noctc_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503   8234 |   7.2   1.1  91.7   0.0  92.9   99.2 |

  • w/o lm (ctc_weight=0.3, lm_weight=0.0):
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_nolm_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333  33341 |  97.1   1.5   1.4   1.0   3.9   80.2 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transf_nolm_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333   5643 |  89.0  10.2   0.8   1.5  12.5   75.4 |
write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_nolm_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503  48634 |  95.8   2.2   2.1   1.0   5.2   83.1 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transf_nolm_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503   8234 |  85.8  12.6   1.5   1.5  15.7   81.7 |

I am using a model trained for 68 epochs (patience=10), so the averaged model comes from epochs 59–68.
Let me know if any additional tests are required.
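
As a side note, the snapshot averaging mentioned above can be pictured roughly as below (a sketch only; the snapshot paths and loading details are assumptions, and the recipe's own averaging script remains the source of truth):

    import copy

    import chainer
    import numpy as np

    def average_snapshots(model, snapshot_paths):
        """Sketch: average the parameters of several saved model snapshots in place."""
        summed = None
        for path in snapshot_paths:
            m = copy.deepcopy(model)
            chainer.serializers.load_npz(path, m)  # load one epoch snapshot into a copy
            params = {name: p.array for name, p in m.namedparams()}
            if summed is None:
                summed = {k: v.astype(np.float64) for k, v in params.items()}
            else:
                for k, v in params.items():
                    summed[k] += v
        for name, p in model.namedparams():
            p.array[...] = (summed[name] / len(snapshot_paths)).astype(p.array.dtype)
        return model

    # e.g. averaging epochs 59-68 (the path pattern here is hypothetical):
    # model = average_snapshots(model, ["exp/.../snapshot.ep.%d" % ep for ep in range(59, 69)])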

@aonotas left a comment:

Thank you for your work.
I added some comments.

For the w/o ctc case (ctc_weight=0.0, lm_weight=1.0), is the CER result this bad?

 | Corr  Sub   Del  Ins   Err  S.Err |
 |  6.5  1.1  92.4  0.0  93.5   99.1 |

if self.flag_return:
    loss_ctc = None
    return self.loss, loss_ctc, loss_att, acc
else:
    return self.loss

def recognize(self, x_block, recog_args, char_list=None, rnnlm=None):
def recognize_beam2(self, x_block, recog_args, char_list=None, rnnlm=None, use_jit=False):

Is this method necessary?
I think this PR does not use recognize_beam2 anywhere else.

@Fhrozen (Member Author):

I left this on purpose, just in case recognize_beam did not work, but I will remove it before merging.


I see, thank you.

@@ -78,37 +78,48 @@ class CustomUpdater(training.StandardUpdater):
     def __init__(self, train_iter, optimizer, converter, device, accum_grad=1):
         super(CustomUpdater, self).__init__(
             train_iter, optimizer, converter=converter, device=device)
-        self.count = 0
+        self.forward_count = 0

I feel that changing from count to forward_count may have side effects.
Do you want to fix the code of accum_grad?

@Fhrozen (Member Author):

I did not find any side effects of forward_count on accum_grad, but I will check it once more. Could you explain which possible effects might appear?

I'm sorry, this is my concern.
I think CustomUpdater and CustomParallelUpdater are common components.
Is this modification necessary for the Transformer PR?
Actually, I'm not sure why this modification is necessary. (This is just a comment.)

If this modification to CustomUpdater and CustomParallelUpdater is not related to the Transformer method, I feel you could separate it into a different PR.
But I'm not a main contributor of ESPnet, so I'm not sure whether the PR should be split or not.
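
For reference, the behaviour under discussion can be sketched like this (an illustrative Chainer updater with assumed names, not the exact code of this PR): gradients are accumulated over accum_grad forward/backward passes, forward_count tracks how many passes have happened since the last optimizer step, and the trainer iteration is advanced only when the parameters are actually updated.

    from chainer import training

    class AccumGradUpdaterSketch(training.StandardUpdater):
        """Illustrative updater with gradient accumulation (class name is an assumption)."""

        def __init__(self, train_iter, optimizer, converter, device, accum_grad=1):
            super(AccumGradUpdaterSketch, self).__init__(
                train_iter, optimizer, converter=converter, device=device)
            self.accum_grad = accum_grad
            self.forward_count = 0  # forward/backward passes since the last optimizer step

        def update_core(self):
            optimizer = self.get_optimizer('main')
            batch = self.get_iterator('main').next()
            x = self.converter(batch, self.device)
            # scale the loss so the accumulated gradient averages over accum_grad batches
            loss = optimizer.target(*x) / self.accum_grad
            loss.backward()  # Chainer accumulates gradients across backward() calls
            self.forward_count += 1
            if self.forward_count == self.accum_grad:
                optimizer.update()             # apply the accumulated gradients
                optimizer.target.cleargrads()  # reset for the next accumulation window
                self.forward_count = 0

        def update(self):
            self.update_core()
            # count an iteration only when the parameters were actually updated
            if self.forward_count == 0:
                self.iteration += 1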

@Fhrozen (Member Author) commented Jun 19, 2019

@aonotas, thank you for your support and comments. I will apply the suggested modifications later.

@ShigekiKarita (Member):
@aonotas Thanks for your help!

> Thank you for your work.
> I added some comments.
>
> For the w/o ctc case (ctc_weight=0.0, lm_weight=1.0), is the CER result this bad?
>
>  | Corr  Sub   Del  Ins   Err  S.Err |
>  |  6.5  1.1  92.4  0.0  93.5   99.1 |

Unfortunately, this is expected. This strange behaviour is already known in the PyTorch Transformer implementation. We found that LM integration without CTC seems to be difficult on WSJ.
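
For readers following the thread, the score being searched in this hybrid setup combines attention, CTC, and LM terms roughly as below (a schematic per-hypothesis score, not the exact beam-search code). With ctc_weight=0.0 the CTC term vanishes, so a strong LM (lm_weight=1.0) can pull hypotheses away from the acoustics and produce the degraded CER shown above.

    def hypothesis_score(att_logp, ctc_logp, lm_logp, ctc_weight, lm_weight):
        """Schematic hybrid CTC/attention + LM score for one partial hypothesis.

        att_logp, ctc_logp, lm_logp: cumulative log-probabilities of the hypothesis
        under the attention decoder, the CTC prefix scorer, and the language model.
        """
        return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp

    # The three configurations tested above correspond to:
    #   ctc_weight=0.0, lm_weight=0.0  -> attention decoder only
    #   ctc_weight=0.0, lm_weight=1.0  -> attention + LM (the failing case)
    #   ctc_weight=0.3, lm_weight=0.0  -> attention + CTC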

@aonotas commented Jun 19, 2019

> We found that LM integration without CTC seems to be difficult on WSJ.

Wow, this is interesting. Thank you for your information.

@sw005320 (Contributor):

> We found that LM integration without CTC seems to be difficult on WSJ.

@creatorscan may fix it.
He told me that he found a bug related to this.

@Fhrozen (Member Author) commented Jun 20, 2019

I just finished testing the Chainer model with ngpu=2 & accum_grad=2.
The model was trained for 71 epochs (early stopping with patience=10).
Training time: 25 hrs (2 GTX TITAN X GPUs, CUDA 10).

write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transformer_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333  33341 |  98.1   1.0   0.9   0.7   2.6   55.6 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_eval92_decode_chainer_transformer_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   333   5643 |  95.2   4.4   0.4   1.0   5.8   47.7 |

write a CER (or TER) result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transformer_lm_word65000/result.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503  48634 |  97.0   1.5   1.5   0.8   3.8   65.0 |
write a WER result in exp/train_si284_chainer_train_chainer_transformer_no_preprocess/decode_test_dev93_decode_chainer_transformer_lm_word65000/result.wrd.txt
| SPKR    | # Snt  # Wrd |  Corr   Sub   Del   Ins   Err  S.Err |
| Sum/Avg |   503   8234 |  92.6   6.5   0.9   1.5   8.9   59.4 |

The results did not change a lot; only the dev set shows a slight reduction of 0.2 CER and 0.1 WER.
I will finish the CER computation and the additional small fixes by the weekend.

@codecov codecov bot commented Jul 4, 2019

Codecov Report

Merging #774 into v.0.5.0 will increase coverage by <.01%.
The diff coverage is 64.81%.


@@             Coverage Diff             @@
##           v.0.5.0     #774      +/-   ##
===========================================
+ Coverage    51.07%   51.07%   +<.01%     
===========================================
  Files          102      110       +8     
  Lines        10957    11133     +176     
===========================================
+ Hits          5596     5686      +90     
- Misses        5361     5447      +86
Impacted Files Coverage Δ
espnet/nets/chainer_backend/rnn/decoders.py 89.43% <ø> (ø)
espnet/nets/pytorch_backend/e2e_asr_transformer.py 70.63% <ø> (ø) ⬆️
espnet/nets/chainer_backend/rnn/attentions.py 98.18% <ø> (ø)
...pnet/nets/chainer_backend/transformer/attention.py 100% <ø> (ø)
espnet/nets/chainer_backend/rnn/encoders.py 98.29% <ø> (ø)
espnet/asr/chainer_backend/asr.py 0% <0%> (ø) ⬆️
...nets/chainer_backend/transformer/optimizer_rule.py 0% <0%> (ø)
espnet/asr/pytorch_backend/asr.py 0% <0%> (ø) ⬆️
...r_backend/transformer/positionwise_feed_forward.py 100% <100%> (ø)
.../nets/chainer_backend/transformer/decoder_layer.py 100% <100%> (ø)
... and 24 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8cbde38...f0863d6.

@sw005320 (Contributor):
@Fhrozen, what is the status?

@Fhrozen (Member Author) commented Jul 18, 2019

I only need to add the CER computation to the RNN to finish this PR.
I will do this once I get back to Japan on the weekend (the IJCNN conference finishes on the 19th).

@sw005320 (Contributor):
OK. Enjoy IJCNN!

@Fhrozen (Member Author) commented Jul 30, 2019

@sw005320 @kan-bayashi, please check it for merging before someone else updates v.0.5. ;)

@sw005320 (Contributor) left a comment:

Did you add a test for this PR?

Resolved review comment on: egs/wsj/asr1/conf/tuning/decode_chainer_transformer.yaml

def update(self):
    self.update_core()
    if self.forward_count == 0:
        self.iteration += 1

@sw005320 (Contributor):

Why did you do it here?

    self.iteration += 1

seems to increase the iterations. Do you need this? If so, could you add a comment about this?

@Fhrozen (Member Author):

This was related to #777. I suppose I need to add it as a comment.

Resolved review comment on: espnet/nets/chainer_backend/transformer/attention.py

@sw005320 merged commit 19b7916 into espnet:v.0.5.0 on Jul 30, 2019
@Fhrozen deleted the pr-transf-chainer branch on July 30, 2019 at 08:56
@kan-bayashi changed the title from "[WIP] Transformer Chainer" to "Transformer Chainer" on Jul 30, 2019