
AMOS #19059

Open
2 tasks done
jpcorb20 opened this issue Sep 15, 2022 · 0 comments

Model description

Abstract

"We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs’ outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models."

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

HF Hub: https://huggingface.co/microsoft/amos
GitHub: https://github.com/microsoft/AMOS
Paper: https://arxiv.org/pdf/2204.03243.pdf

Authors: @yumeng5 @xiongchenyan
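Since AMOS is not yet supported natively in transformers (hence this request), the checkpoint can at least be pulled from the Hub for inspection in the meantime. A minimal sketch, assuming standard file names in the microsoft/amos repo (the file names are assumptions; check the repo's file list before relying on them):

```python
from huggingface_hub import hf_hub_download

# Fetch the raw checkpoint files for manual inspection. The file names
# ("config.json", "pytorch_model.bin") are assumptions about the repo layout.
config_path = hf_hub_download(repo_id="microsoft/amos", filename="config.json")
weights_path = hf_hub_download(repo_id="microsoft/amos", filename="pytorch_model.bin")
print(config_path, weights_path)
```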
