
Consider adding BERT as a backend #201

Closed
phiresky opened this issue Nov 19, 2018 · 8 comments

@phiresky

I was just wondering if you've considered adding BERT as an additional backend (as an alternative to OpenAI's GPT), since it seems to improve on GPT's performance in most tasks.

Their TensorFlow code is open source here: https://github.com/google-research/bert, and the paper is at https://arxiv.org/abs/1810.04805.

@madisonmay
Contributor

Hi @phiresky,

We're definitely tracking this -- in fact we're doing some of the preliminary work on this today. For the most part, we think the architecture of finetune is general enough to support this and we intend to support BERT as a base model for finetune as soon as we find enough cycles to complete a port!

@benleetownsend
Contributor

Just a quick update on this: we did some initial evaluations and found that, out of the box, BERT performs worse than Finetune on the <1,000-sample datasets we are specifically interested in targeting. For this reason, a compatible port was not a high priority for us.

We have started implementing features from the BERT model in Finetune, with the aim of training our own base models. This will be an ongoing process, and I will keep you updated.

@madisonmay
Contributor

madisonmay commented May 31, 2019

Hi @phiresky, support for BERT is live!

Example usage:

from finetune import Classifier
from finetune.base_models import BERT

model = Classifier(base_model=BERT)

https://finetune.indico.io/#using-different-base-models-e-g-bert-gpt2
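
For a quick end-to-end check, something along these lines should work (a minimal sketch, assuming `texts` and `labels` are placeholder lists of raw strings and class labels you already have, and that the usual Classifier fit/predict interface applies to the BERT base model as well):

from finetune import Classifier
from finetune.base_models import BERT

# texts: list of raw input strings, labels: list of class labels (placeholders for your data)
model = Classifier(base_model=BERT)
model.fit(texts, labels)            # finetunes the BERT base model on the labeled examples
predictions = model.predict(texts)  # returns one predicted label per input text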

@phiresky
Author

Nice, thanks! I'll try it out in a bit.

@phiresky
Author

Actually, I just realized I no longer have access to the dataset I previously wanted to try it on. Do you happen to have any results on performance? @benleetownsend previously mentioned that BERT performed worse than GPT when you tried it.

@madisonmay
Contributor

@phiresky BERT large will likely outperform GPT at >300 examples. For BERT small it's more of a toss-up. The legend on this plot is cut off, but blue is GPT, green is BERT large, and red is BERT small.

[Plot comparing performance across training set sizes: GPT (blue), BERT large (green), BERT small (red)]

@phiresky
Author

phiresky commented May 31, 2019

Thanks for the information! I guess I can close this :)

@madisonmay
Contributor

Sounds good! We have some more plots of BERT model training, but not against GPT baselines, so they're probably not as useful to you. LMK what results you see if you do end up running a GPT/BERT comparison; we're always looking to collect data points on how the different backends perform!
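
If it helps, such a comparison could look something like this (a sketch only, assuming `texts`/`labels` and `test_texts`/`test_labels` are hypothetical pre-split train and test sets, and that GPT is exposed in finetune.base_models alongside BERT):

from finetune import Classifier
from finetune.base_models import GPT, BERT

# texts/labels and test_texts/test_labels are placeholder names for pre-split data
for name, base in [("GPT", GPT), ("BERT", BERT)]:
    model = Classifier(base_model=base)
    model.fit(texts, labels)
    preds = model.predict(test_texts)
    accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
    print(name, "accuracy:", accuracy)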
