This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Help understand the share* input arguments. #42

Closed
ashim95 opened this issue Nov 20, 2018 · 2 comments

Comments


ashim95 commented Nov 20, 2018

Thanks for releasing the code.

  1. Please help me understand the share* input arguments and what each of them does. Which of these arguments define sharing between the forward and reverse translation models (src-tgt and tgt-src)?
    Here are the args:

share_lang_emb
share_encdec_emb
share_decpro_emb
share_output_emb
share_lstm_proj
share_enc
share_dec

  2. Also, can you provide some intuition about the scenarios in which these should be changed from their default values, for example when the languages are distant or the setting is low-resource?

  3. For share_enc and share_dec, I understand that if we have 4 encoder and 4 decoder layers and I set these to 2 and 2 respectively, I am sharing the first 2 encoder/decoder layers. Is that correct? And what happens in the reverse translation model (tgt-src): are all of these shared?

  4. For share_decpro_emb, following Press and Wolf (2016), I understand that the decoder's input and output embeddings are shared. Currently, they are also tied to the reverse-model decoder (tgt-src) because we have a joint vocabulary. How do I avoid sharing these decoder embeddings across languages (e.g., for distant pairs like en-hi)?

  5. For share_output_emb, when you say "Share decoder output embeddings", sharing with what? (The forward and reverse models?)

  6. In your unsupervised NMT + PBSMT paper, Section 4.3.1 says that "all lookup tables between encoder-decoder, and source-target language are shared". Isn't the latter (src-tgt) a consequence of the joint BPE vocabulary? Also, can you clarify how many different lookup tables you are using, and how that choice might be affected in the case of distant languages with different alphabets?

Thanks again,
Ashim


glample (Contributor) commented Nov 21, 2018

  1. You can check the argparse; this should be helpful: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/main.py#L78-L91
    Also, see point 2 below.

  2. Sharing all encoder and decoder layers (share_enc and share_dec) is usually good, or not far from the best you could get. Sharing the language embeddings (share_lang_emb) helps if the languages are related (like English-French), but is not very useful if they are very distant or have a different alphabet (like English-Russian). Sharing the decoder input embeddings with the output embeddings (share_decpro_emb), or sharing the output embeddings with the input ones (share_output_emb), usually doesn't make a big difference; I would still suggest setting them to True, as it might help in very low-resource scenarios.

  3. Not exactly. You will be sharing the first 2 layers of the decoder, but the last 2 layers of the encoder: the shared layers are the ones closest to the latent state (a toy illustration of this scheme is sketched after this list). See:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L60 for the encoder, and:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L166 for the decoder.

  4. For this, you should set share_lang_emb to False.

  5. I recognize this is a bit tricky, and some parameters can have a different effect depending on the others; they are not all totally independent. I would suggest looking at these few lines:
    https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/transformer.py#L184-L198
    It is probably much simpler to look at these to understand what is shared under which conditions.

  6. Yes, you need a shared vocabulary to share the lookup tables: share_lang_emb = True is only possible if the vocabulary is the same for the 2 languages. In total, there are 6 lookup tables: 2 for the encoder input, 2 for the decoder input, and 2 for the decoder output. If you set share_lang_emb = True, it becomes 1 for the encoder input, 1 for the decoder input, and 1 for the decoder output.
    If you also set share_decpro_emb = True, you only have 1 lookup table in the encoder and 1 in the decoder. If you also set share_encdec_emb = True, you only have one lookup table in the end (see the counting sketch further below).
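
The following is a minimal sketch, not the repository's actual transformer.py code, of the layer-sharing scheme described in point 3: placeholder nn.Linear modules stand in for the real Transformer layers, and the function and variable names are hypothetical.

```python
import torch.nn as nn

def build_stacks(n_langs, n_layers, n_shared, share_last):
    """Per-language layer stacks where n_shared layers are shared across languages.

    share_last=True  -> share the last layers  (encoder: closest to the latent state)
    share_last=False -> share the first layers (decoder: closest to the latent state)
    """
    assert 0 <= n_shared <= n_layers
    stacks = [nn.ModuleList() for _ in range(n_langs)]
    for k in range(n_layers):
        shared = (k >= n_layers - n_shared) if share_last else (k < n_shared)
        # Placeholder for a real Transformer layer. Reusing the same module
        # object across languages is what "sharing" means here.
        base = nn.Linear(512, 512)
        for lang in range(n_langs):
            stacks[lang].append(base if (shared or lang == 0) else nn.Linear(512, 512))
    return stacks

# n_layers = 4, share_enc = 2, share_dec = 2 (the setting from question 3):
enc = build_stacks(n_langs=2, n_layers=4, n_shared=2, share_last=True)
dec = build_stacks(n_langs=2, n_layers=4, n_shared=2, share_last=False)
assert [a is b for a, b in zip(enc[0], enc[1])] == [False, False, True, True]  # last 2 shared
assert [a is b for a, b in zip(dec[0], dec[1])] == [True, True, False, False]  # first 2 shared
```

The point is only that "sharing" means reusing the same module object, and hence the same parameters, in both language-specific stacks.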

Also, this should be helpful to understand which parameters are related and not independent:
https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/src/model/__init__.py#L23-L31
It checks that the parameters are valid and not contradictory with each other. For instance, assert not params.share_output_emb or params.share_lang_emb says that if we share the output embeddings, then we necessarily share the source and target embeddings, i.e. share_lang_emb has to be True.
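
To make the counting in point 6 concrete, here is a minimal sketch (not code from the repository) that merges the six lookup tables according to the flag semantics described above; share_output_emb and share_lstm_proj are left out for simplicity, and the real interactions are in the __init__.py check linked above.

```python
from itertools import product

def count_lookup_tables(share_lang_emb, share_encdec_emb, share_decpro_emb):
    """Count distinct embedding matrices for a 2-language model (sketch only)."""
    # One table per (role, language): encoder input, decoder input, decoder output.
    tables = set(product(['enc_in', 'dec_in', 'dec_out'], ['src', 'tgt']))
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def merge(a, b):
        parent[find(a)] = find(b)

    if share_lang_emb:                  # one table for both languages, per role
        for role in ('enc_in', 'dec_in', 'dec_out'):
            merge((role, 'src'), (role, 'tgt'))
    if share_decpro_emb:                # tie decoder input and output embeddings
        for lang in ('src', 'tgt'):
            merge(('dec_in', lang), ('dec_out', lang))
    if share_encdec_emb:                # encoder and decoder share their input embeddings
        for lang in ('src', 'tgt'):
            merge(('enc_in', lang), ('dec_in', lang))

    return len({find(t) for t in tables})

assert count_lookup_tables(False, False, False) == 6
assert count_lookup_tables(True,  False, False) == 3
assert count_lookup_tables(True,  False, True)  == 2  # 1 encoder table, 1 decoder table
assert count_lookup_tables(True,  True,  True)  == 1
```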

Overall, I would not worry too much about this, and I would just suggest sharing everything. Sharing everything should not be very far from the best performance you could get by selectively not sharing some specific layers.
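
For reference, "sharing everything" would correspond to something like the following settings for the flags listed in the question. This is a sketch only: the exact defaults, types, and the way the flags are passed should be checked against main.py and the README, and the 4/4 layer counts assume a 4-layer encoder and decoder.

```python
# Hypothetical "share everything" configuration for the flags discussed above.
# Check main.py for the exact defaults and the precise role of share_lstm_proj.
share_everything = {
    'share_lang_emb':   True,  # one embedding table for both languages
    'share_encdec_emb': True,  # encoder and decoder share input embeddings
    'share_decpro_emb': True,  # tie decoder input and output embeddings
    'share_output_emb': True,  # share decoder output embeddings across languages
    'share_lstm_proj':  True,  # LSTM-specific projection sharing (see main.py)
    'share_enc': 4,            # share all encoder layers (assuming n_enc_layers = 4)
    'share_dec': 4,            # share all decoder layers (assuming n_dec_layers = 4)
}
```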

Hope this helps.


ashim95 commented Nov 21, 2018

Thanks a lot Guillaume for such a detailed response.

ashim95 closed this as completed on Nov 21, 2018.