Help understand the share* input arguments. #42
Also, this should be helpful in understanding which parameters are related and which are independent. Overall, I would not worry too much about this; I would simply suggest sharing everything. Sharing everything should not give you results far from the best performance you could get by not sharing some specific layers. Hope this helps.
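For concreteness, here is a rough sketch of what a "share everything" setting could look like for the share* arguments discussed in this thread. The flag types and values below (booleans, plus layer counts for share_enc/share_dec) are assumptions for illustration, not taken from the repository's documentation.

```python
# Hypothetical "share everything" configuration for the share* arguments
# discussed in this thread. The boolean flags are set to True and
# share_enc/share_dec to the full layer count (values are illustrative).
share_all = {
    "share_lang_emb": True,    # share embeddings across languages
    "share_encdec_emb": True,  # share encoder and decoder embeddings
    "share_decpro_emb": True,  # tie decoder input/output embeddings (Press & Wolf, 2016)
    "share_output_emb": True,  # share decoder output (projection) embeddings
    "share_lstm_proj": True,   # share the LSTM projection layer
    "share_enc": 4,            # number of encoder layers to share (here: all 4)
    "share_dec": 4,            # number of decoder layers to share (here: all 4)
}

# Hypothetical usage: turn the dict into command-line flags for a training script.
cli_args = [f"--{k} {v}" for k, v in share_all.items()]
print(" ".join(cli_args))
```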
Thanks a lot Guillaume for such a detailed response.
Thanks for releasing the code.
Here are the args:
share_lang_emb
share_encdec_emb
share_decpro_emb
share_output_emb
share_lstm_proj
share_enc
share_dec
Also, can you provide some intuition on when these should be changed from their default values, e.g. when the languages are distant or low-resource?
For share_enc and share_dec, I understand that if we have 4 encoder and 4 decoder layers and I set these to 2 and 2 respectively, I am sharing the first 2 encoder/decoder layers. Is that correct? What happens in the case of the reverse translation model (tgt-src): are all of these shared?
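To make the kind of sharing being asked about concrete, here is a minimal PyTorch sketch (not the repository's actual code) in which the first 2 of 4 encoder "layers" are the same module objects in both translation directions, so both src-tgt and tgt-src batches update their parameters. Whether the repository shares the first or the last layers is exactly what the question above is asking.

```python
import torch.nn as nn

N_LAYERS, N_SHARED, DIM = 4, 2, 512  # illustrative sizes

def make_layer():
    # Stand-in for a real encoder layer (e.g. an LSTM or Transformer block).
    return nn.Linear(DIM, DIM)

# Layers shared by both directions: the *same* module objects.
shared = [make_layer() for _ in range(N_SHARED)]

# Each direction gets its own copies of the remaining layers.
enc_src_tgt = nn.ModuleList(shared + [make_layer() for _ in range(N_LAYERS - N_SHARED)])
enc_tgt_src = nn.ModuleList(shared + [make_layer() for _ in range(N_LAYERS - N_SHARED)])

# Both encoders hold references to the same first two layers, but not the rest.
assert enc_src_tgt[0] is enc_tgt_src[0]
assert enc_src_tgt[1] is enc_tgt_src[1]
assert enc_src_tgt[2] is not enc_tgt_src[2]
```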
For share_decpro_emb, following Press and Wolf (2016), I understand the input and output embeddings of the decoder are shared. Currently, they are also tied to the reverse model decoder (tgt-src) because we have a joint vocabulary. How do I avoid sharing these decoder embeddings across languages (e.g. distant pairs like en-hi)?
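For context, the Press and Wolf (2016) tying mentioned above amounts to making the decoder's output projection reuse its input embedding matrix. A minimal PyTorch sketch of that idea (illustrative sizes, not the repository's code):

```python
import torch.nn as nn

VOCAB_SIZE, EMB_DIM = 32000, 512  # illustrative sizes (e.g. a joint BPE vocabulary)

embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)       # decoder input embeddings
proj = nn.Linear(EMB_DIM, VOCAB_SIZE, bias=False)   # decoder output projection

# Press & Wolf (2016) weight tying: the projection reuses the embedding matrix,
# so decoder input and output embeddings are a single shared parameter.
proj.weight = embedding.weight

assert proj.weight is embedding.weight
```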
For share_output_emb, when you say "share decoder output embeddings", sharing with what (the forward and reverse models)?

In your Unsupervised NMT + PBSMT paper, Section 4.3.1, it says that 'all lookup tables between encoder-decoder, and source-target language are shared'. Isn't the latter (src-tgt) a consequence of the joint BPE vocabulary? Also, can you clarify how many different lookup tables you are using, and how that choice might be affected in the case of distant languages with different alphabets?
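To make the lookup-table question concrete: with a joint BPE vocabulary, sharing the lookup tables across languages can mean a single embedding matrix indexed by both source and target token ids. A rough PyTorch sketch under that assumption (token ids and sizes are made up for illustration, and this is not the repository's actual code):

```python
import torch
import torch.nn as nn

JOINT_VOCAB_SIZE, EMB_DIM = 32000, 512  # illustrative: one BPE vocabulary for both languages

# One lookup table used for both languages (and, if encoder/decoder embeddings
# are also shared, for both sides of the model).
shared_lookup = nn.Embedding(JOINT_VOCAB_SIZE, EMB_DIM)

src_ids = torch.tensor([[12, 845, 3]])   # hypothetical source token ids
tgt_ids = torch.tensor([[7, 2201, 3]])   # hypothetical target token ids

src_emb = shared_lookup(src_ids)  # both lookups hit the same table,
tgt_emb = shared_lookup(tgt_ids)  # so gradients from both languages update it
```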
Thanks again,
Ashim