This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Add stacked transformers #1073

Closed
wants to merge 2 commits

Conversation

@mwu1993 (Contributor) commented Oct 24, 2019

Summary: Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4
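
A minimal sketch of the idea (not the actual PyText implementation; the class name, constructor arguments, and encoder interface are assumptions): a classification head and loss are attached to several encoder layers, so a stack truncated after any of those layers still carries a trained classifier.

```python
import torch
import torch.nn as nn

class StackedLayerClassifier(nn.Module):
    """Illustrative only: attach a classification head (and loss) to several
    encoder layers so a truncated stack still yields a usable classifier."""

    def __init__(self, encoder, hidden_dim, num_classes, loss_layers=(4, 8, 12)):
        super().__init__()
        self.encoder = encoder          # assumed to return all per-layer outputs
        self.loss_layers = loss_layers  # 1-indexed layers that get a head
        self.heads = nn.ModuleDict(
            {str(l): nn.Linear(hidden_dim, num_classes) for l in loss_layers}
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, tokens, labels=None):
        # encoder is assumed to return a list of per-layer representations,
        # one tensor of shape (batch, seq_len, hidden_dim) per layer.
        all_layers = self.encoder(tokens)
        logits, losses = {}, []
        for l in self.loss_layers:
            cls_repr = all_layers[l - 1][:, 0, :]      # [CLS] token of layer l
            logits[l] = self.heads[str(l)](cls_repr)
            if labels is not None:
                losses.append(self.loss_fn(logits[l], labels))
        # total loss is the sum over the selected layers
        loss = torch.stack(losses).sum() if losses else None
        return logits, loss
```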

Differential Revision: D18096913

@facebook-github-bot added the CLA Signed label on Oct 24, 2019
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

mwu1993 pushed a commit to mwu1993/pytext-1 that referenced this pull request Oct 30, 2019
Summary:
Pull Request resolved: facebookresearch#1073

Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4

Differential Revision: D18096913

fbshipit-source-id: 32e51dc43502f1f72fd577953eb55be04f93f16a
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

Michael Wu added 2 commits October 31, 2019 17:35
Summary:
Bert classification benefits from getting the full layer representations, because

- We can do multi-tasking of classification and MLM. (The masked LM needs output_encoded_layers = True, and since the tasks share the encoder module, the classification model needs it too; see the sketch after this list.)

- Next diff in the stack needs encoded layers to add losses to the intermediate layers.

These layer representations are computed by the underlying transformer module anyway, so there is no increase in training time.
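
A rough sketch of the multi-tasking this enables, under the assumption that the shared encoder returns (encoded_layers, pooled) when configured with output_encoded_layers = True; the function and head names are illustrative, not PyText's API:

```python
import torch
import torch.nn as nn

def multitask_forward(encoder, mlm_head, cls_head, tokens, mlm_targets, labels):
    """Hypothetical multi-task step: one encoder pass feeds both the masked-LM
    head (token-level outputs) and the classification head (pooled output)."""
    encoded_layers, pooled = encoder(tokens)
    mlm_logits = mlm_head(encoded_layers[-1])   # (batch, seq_len, vocab_size)
    cls_logits = cls_head(pooled)               # (batch, num_classes)
    mlm_loss = nn.functional.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_targets.view(-1),
        ignore_index=-100,                      # ignore unmasked positions
    )
    cls_loss = nn.functional.cross_entropy(cls_logits, labels)
    return mlm_loss + cls_loss
```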

Differential Revision: D18263698

fbshipit-source-id: 8807db8cdd4815c5fe3068910fa26ee692ada3b0
Summary:
Pull Request resolved: facebookresearch#1073

Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4

Differential Revision: D18096913

fbshipit-source-id: d22f360de6fb8b4519b91ad782564d2bc1748635
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

@mwu1993 mwu1993 closed this Jan 13, 2020
facebook-github-bot pushed a commit that referenced this pull request Mar 8, 2020
…nt (#1073)

Summary:
[This commit](facebookresearch/fairseq@dd1298e) made it so that duplicate entries in a dictionary are ignored. Unfortunately the Camembert model depends on overwriting `<unk>`, `<s>` and `</s>`.

The proposed solution here is to allow the dictionary to have entries like:
```
<unk> 999 #fairseq:overwrite
<s> 999 #fairseq:overwrite
</s> 999 #fairseq:overwrite
, 999
▁de 999
. 999
(...)
```

These entries preserve the old overwriting behavior, so we can release a new `camembert.v0.tar.gz` with a dictionary like the one above and it will load correctly.
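
To make the flag handling concrete, here is an illustrative parser for a single dictionary line (not fairseq's actual `Dictionary.add_from_file`; the helper name and its arguments are hypothetical):

```python
def parse_dict_line(line, symbols):
    """Parse '<word> <count> [#fairseq:overwrite]'. Without the trailing
    marker, a word already present in `symbols` is rejected as a duplicate;
    with it, the new entry replaces the existing one."""
    field, _, flag = line.rstrip("\n").rpartition(" ")
    overwrite = flag == "#fairseq:overwrite"
    if overwrite:
        word, _, count = field.rpartition(" ")
    else:
        word, count = field, flag
    if word in symbols and not overwrite:
        raise RuntimeError(f"Duplicate word found when loading Dictionary: '{word}'")
    symbols[word] = int(count)
    return word, int(count)

# Example: symbols = {}; parse_dict_line("<unk> 999 #fairseq:overwrite", symbols)
```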
Pull Request resolved: fairinternal/fairseq-py#1073

Reviewed By: kahne

Differential Revision: D20284569

Pulled By: myleott

fbshipit-source-id: bf78fbff13c94bf8a6485cbdda62305ddc30c056