This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Add stacked transformers #1073

Closed
wants to merge 2 commits

Conversation

@mwu1993 (Contributor) commented Oct 24, 2019

Summary: Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4
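
A minimal sketch of the idea (not the actual PyText implementation; the class name, constructor arguments, and encoder interface are assumptions): a classification head and loss are attached to several encoder layers, so a stack truncated after any of those layers still carries a trained classifier.

```python
import torch
import torch.nn as nn

class StackedLayerClassifier(nn.Module):
    """Illustrative only: attach a classification head (and loss) to several
    encoder layers so a truncated stack still yields a usable classifier."""

    def __init__(self, encoder, hidden_dim, num_classes, loss_layers=(4, 8, 12)):
        super().__init__()
        self.encoder = encoder          # assumed to return all per-layer outputs
        self.loss_layers = loss_layers  # 1-indexed layers that get a head
        self.heads = nn.ModuleDict(
            {str(l): nn.Linear(hidden_dim, num_classes) for l in loss_layers}
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, tokens, labels=None):
        # encoder is assumed to return a list of per-layer representations,
        # one tensor of shape (batch, seq_len, hidden_dim) per layer.
        all_layers = self.encoder(tokens)
        logits, losses = {}, []
        for l in self.loss_layers:
            cls_repr = all_layers[l - 1][:, 0, :]      # [CLS] token of layer l
            logits[l] = self.heads[str(l)](cls_repr)
            if labels is not None:
                losses.append(self.loss_fn(logits[l], labels))
        # total loss is the sum over the selected layers
        loss = torch.stack(losses).sum() if losses else None
        return logits, loss
```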

Differential Revision: D18096913

@facebook-github-bot added the CLA Signed label on Oct 24, 2019
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

mwu1993 pushed a commit to mwu1993/pytext-1 that referenced this pull request Oct 30, 2019
Summary:
Pull Request resolved: facebookresearch#1073

Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4

Differential Revision: D18096913

fbshipit-source-id: 32e51dc43502f1f72fd577953eb55be04f93f16a
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

Michael Wu added 2 commits October 31, 2019 17:35
Summary:
Bert classification benefits from getting the full layer representations, because

- We can do multi-tasking of classification and MLM. (The masked LM needs output_encoded_layers = True, and since the tasks share the encoder module, the classification model needs it too; see the sketch after this list.)

- Next diff in the stack needs encoded layers to add losses to the intermediate layers.

These layer representations are computed by the underlying transformer module anyway, so there is no increase in training time.
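
A rough sketch of the multi-tasking this enables, under the assumption that the shared encoder returns (encoded_layers, pooled) when configured with output_encoded_layers = True; the function and head names are illustrative, not PyText's API:

```python
import torch
import torch.nn as nn

def multitask_forward(encoder, mlm_head, cls_head, tokens, mlm_targets, labels):
    """Hypothetical multi-task step: one encoder pass feeds both the masked-LM
    head (token-level outputs) and the classification head (pooled output)."""
    encoded_layers, pooled = encoder(tokens)
    mlm_logits = mlm_head(encoded_layers[-1])   # (batch, seq_len, vocab_size)
    cls_logits = cls_head(pooled)               # (batch, num_classes)
    mlm_loss = nn.functional.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_targets.view(-1),
        ignore_index=-100,                      # ignore unmasked positions
    )
    cls_loss = nn.functional.cross_entropy(cls_logits, labels)
    return mlm_loss + cls_loss
```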

Differential Revision: D18263698

fbshipit-source-id: 8807db8cdd4815c5fe3068910fa26ee692ada3b0
Summary:
Pull Request resolved: facebookresearch#1073

Add stacked models for pre-training (MLM) and fine-tuning (classification). These models are Bert models where the losses can be applied to multiple layers, so that after fine-tuning we can keep several or all layers and have good final models (for faster inference). https://fb.quip.com/m2HOASKMgJc4

Differential Revision: D18096913

fbshipit-source-id: d22f360de6fb8b4519b91ad782564d2bc1748635
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D18096913

@mwu1993 mwu1993 closed this Jan 13, 2020
facebook-github-bot pushed a commit that referenced this pull request Mar 8, 2020
…nt (#1073)

Summary:
[This commit](facebookresearch/fairseq@dd1298e) made it so that duplicate entries in a dictionary are ignored. Unfortunately the Camembert model depends on overwriting `<unk>`, `<s>` and `</s>`.

The proposed solution here is to allow the dictionary to have entries like:
```
<unk> 999 #fairseq:overwrite
<s> 999 #fairseq:overwrite
</s> 999 #fairseq:overwrite
, 999
▁de 999
. 999
(...)
```

These entries preserve the old overwriting behavior, so we can release a new `camembert.v0.tar.gz` with a dictionary like the one above and it will load correctly.
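
To make the flag handling concrete, here is an illustrative parser for a single dictionary line (not fairseq's actual `Dictionary.add_from_file`; the helper name and its arguments are hypothetical):

```python
def parse_dict_line(line, symbols):
    """Parse '<word> <count> [#fairseq:overwrite]'. Without the trailing
    marker, a word already present in `symbols` is rejected as a duplicate;
    with it, the new entry replaces the existing one."""
    field, _, flag = line.rstrip("\n").rpartition(" ")
    overwrite = flag == "#fairseq:overwrite"
    if overwrite:
        word, _, count = field.rpartition(" ")
    else:
        word, count = field, flag
    if word in symbols and not overwrite:
        raise RuntimeError(f"Duplicate word found when loading Dictionary: '{word}'")
    symbols[word] = int(count)
    return word, int(count)

# Example: symbols = {}; parse_dict_line("<unk> 999 #fairseq:overwrite", symbols)
```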
Pull Request resolved: fairinternal/fairseq-py#1073

Reviewed By: kahne

Differential Revision: D20284569

Pulled By: myleott

fbshipit-source-id: bf78fbff13c94bf8a6485cbdda62305ddc30c056