Consider adding BERT as a backend #201
I was just wondering whether you've considered adding BERT as an additional backend (as an alternative to the OpenAI GPT). It seems to improve on GPT's performance in most tasks.

Their TensorFlow code is open source at https://github.com/google-research/bert, and the paper is at https://arxiv.org/abs/1810.04805.

Comments
Hi @phiresky, we're definitely tracking this -- in fact, we're doing some of the preliminary work on it today. For the most part, we think the architecture of finetune is general enough to support this, and we intend to support BERT as a base model for finetune as soon as we find enough cycles to complete a port!
Just a quick update on this: we did some initial evaluations and found that, out of the box, BERT performs worse than Finetune on the <1000-sample datasets we are specifically interested in targeting. For that reason, a compatible port was not a high priority for us. We have started implementing features from the BERT model in Finetune, with the aim of training our own base models. This will be a continuous process, and I will keep you updated.
Hi @phiresky, support for BERT is live! Example usage: https://finetune.indico.io/#using-different-base-models-e-g-bert-gpt2
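For reference, a minimal usage sketch (assuming the `finetune.base_models` import path from the linked docs; the toy data is purely illustrative):

```python
from finetune import Classifier
from finetune.base_models import BERT

# Toy data, purely for illustration.
train_texts = ["great product, would buy again", "terrible support experience"]
train_labels = ["positive", "negative"]

# Swap BERT in for the default GPT base model.
model = Classifier(base_model=BERT)
model.fit(train_texts, train_labels)
print(model.predict(["pretty good overall"]))
```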
Nice, thanks! I'll try it out in a bit.
Actually, I just realized I no longer have access to the dataset I previously wanted to try it on. Do you happen to have any results on performance? @benleetownsend previously mentioned that you saw worse performance than with GPT when you tried it.
@phiresky BERT large will likely outperform GPT at >300 examples. For BERT small it's more of a toss-up. The legend on this plot is cut off, but blue is GPT, green is BERT large, and red is BERT small.
Thanks for the information! I guess I can close this :)
Sounds good! We have some more plots of BERT model training, but not against GPT baselines, so they're probably not as useful to you. LMK what results you see if you do end up running a GPT / BERT comparison -- always looking to collect datapoints on how the different backends perform!
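If anyone does run that comparison, a sketch along these lines should work (`load_my_dataset` is a hypothetical stand-in for whatever texts/labels you compare on; the rest only uses the `Classifier` / `base_model` usage from the docs linked above):

```python
from finetune import Classifier
from finetune.base_models import GPT, BERT
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical loader -- substitute your own texts and labels here.
texts, labels = load_my_dataset()
train_X, test_X, train_y, test_y = train_test_split(texts, labels, test_size=0.2)

# Fit the same task with each base model and compare held-out accuracy.
for name, base in [("GPT", GPT), ("BERT", BERT)]:
    model = Classifier(base_model=base)
    model.fit(train_X, train_y)
    acc = accuracy_score(test_y, model.predict(test_X))
    print(f"{name}: {acc:.3f}")
```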