
Add mask is set for Attn during training. #67

Closed

yuntang opened this issue Jun 15, 2017 · 9 comments

Comments

@yuntang

yuntang commented Jun 15, 2017

In Decoder.forward, no mask is set for the attention model before the attention computation. The softmax will have 0 (the padding value) as input, and the output will be exp(0) / sum_i exp(x_i), which is not 0.
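For reference, a minimal PyTorch sketch of the usual fix (made-up function names, not the repository's Decoder.forward): fill the padded source positions with -inf before the softmax so they receive exactly zero attention weight.

```python
import torch
import torch.nn.functional as F

def masked_attention(scores, src_lengths):
    # scores: (batch, tgt_len, src_len) raw attention scores
    # src_lengths: (batch,) true source lengths before padding
    src_len = scores.size(-1)
    # True where the source position is padding
    pad_mask = torch.arange(src_len).unsqueeze(0) >= src_lengths.unsqueeze(1)  # (batch, src_len)
    scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
    return F.softmax(scores, dim=-1)  # padded positions now get exactly 0 weight

scores = torch.zeros(2, 1, 4)    # all-equal scores, as in the comment above
lengths = torch.tensor([4, 2])   # second sentence has 2 padded positions
print(masked_attention(scores, lengths))
# Without the masked_fill, the second row would be [0.25, 0.25, 0.25, 0.25],
# i.e. half of its attention mass would land on padding.
```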

@donglixp
Contributor

@magic282
Contributor

It seems that this mask application step is not used during training.

@donglixp
Contributor

@magic282 There's no mask applied during training in this implementation. I'm not sure whether it would make a huge difference.

@jekbradbury

I don't think it does, but I also haven't run any comparison tests.

@vene

vene commented Jun 27, 2017

I assumed that since the sentences are sorted by length, with small enough batches and large enough datasets, training batches would be fully filled out? Now I'm not sure anymore...

@magic282
Contributor

@vene But with the -extra-shuffle option, I guess things will be different.

@nelson-liu

Anecdotally speaking, I ran an informal comparison and it made almost no difference; as @vene said, my dataset was large enough and the batch size small enough that the majority of batches had no padding.

@vene

vene commented Jun 29, 2017

Thanks for checking @nelson-liu, that makes sense!

I wonder if skipping the masking really saves much time during training. With -extra-shuffle it does indeed seem like a bug, as @magic282 points out. And even with sorted batches and a huge number of sentences in each length bin, there will be some unfortunate batches with one sentence of length d+1 and N-1 sentences of length d, so in those cases the code does not correctly reflect the intended model.
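A quick way to check whether this matters for a given setup (a sketch with a hypothetical helper, not anything in the repository) is to measure the fraction of padded positions the batching actually produces:

```python
def padding_fraction(batches):
    # batches: iterable of lists of sentence lengths, one list per minibatch
    padded = total = 0
    for lengths in batches:
        max_len = max(lengths)
        total += max_len * len(lengths)          # tokens including padding
        padded += sum(max_len - l for l in lengths)  # padding tokens only
    return padded / total

# Length-sorted batching: lengths inside a batch differ only slightly,
# so padding is rare (vene's point).
sorted_batches = [[10, 10, 10, 11], [11, 11, 12, 12]]
# Shuffled batching (e.g. with -extra-shuffle): lengths mix freely, so padding
# grows and the unmasked softmax leaks attention onto it (magic282's point).
shuffled_batches = [[10, 12, 11, 10], [11, 12, 10, 11]]
print(padding_fraction(sorted_batches), padding_fraction(shuffled_batches))
```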

@srush changed the title from "No mask is set for Attn for nmt" to "Add mask is set for Attn during training." Jul 12, 2017
@vince62s
Member

vince62s commented Sep 3, 2018

Old thread; if someone is motivated to implement this, just reopen.

@vince62s closed this as completed Sep 3, 2018