Consider adding BERT as a backend #201
I was just wondering whether you've considered adding BERT as an additional backend (as an alternative to the OpenAI GPT). It seems to improve on GPT's performance in most tasks.

Their TensorFlow code is open source at https://github.com/google-research/bert, and the paper is at https://arxiv.org/abs/1810.04805.

Comments
Hi @phiresky, we're definitely tracking this -- in fact, we're doing some of the preliminary work on it today. For the most part, we think the architecture of finetune is general enough to support this, and we intend to support BERT as a base model for finetune as soon as we find enough cycles to complete a port!
Just a quick update on this: we did some initial evaluations and found that, out of the box, BERT performs worse than Finetune on the <1000-sample datasets we are specifically interested in targeting. For that reason, a compatible port was not a high priority for us. We have started implementing features from the BERT model in Finetune, with the aim of training our own base models. This will be a continuous process, and I will keep you updated.
Hi @phiresky, support for BERT is live! Example usage: https://finetune.indico.io/#using-different-base-models-e-g-bert-gpt2
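For reference, a minimal usage sketch (assuming the `finetune.base_models` import path from the linked docs; the toy data is purely illustrative):

```python
from finetune import Classifier
from finetune.base_models import BERT

# Toy data, purely for illustration.
train_texts = ["great product, would buy again", "terrible support experience"]
train_labels = ["positive", "negative"]

# Swap BERT in for the default GPT base model.
model = Classifier(base_model=BERT)
model.fit(train_texts, train_labels)
print(model.predict(["pretty good overall"]))
```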
Nice, thanks! I'll try it out in a bit.
Actually, I just realized I no longer have access to the dataset I previously wanted to try it on. Do you happen to have any results on performance? @benleetownsend previously mentioned that you saw worse performance than with GPT when you tried it.
@phiresky BERT large will likely outperform GPT at >300 examples. For BERT small it's more of a toss-up. The legend on this plot is cut off, but blue is GPT, green is BERT large, and red is BERT small.
Thanks for the information! I guess I can close this :)
Sounds good! We have some more plots of BERT model training, but not against GPT baselines, so they're probably not as useful to you. LMK what results you see if you do end up running a GPT / BERT comparison -- always looking to collect datapoints on how the different backends perform!
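If anyone does run that comparison, a sketch along these lines should work (`load_my_dataset` is a hypothetical stand-in for whatever texts/labels you compare on; the rest only uses the `Classifier` / `base_model` usage from the docs linked above):

```python
from finetune import Classifier
from finetune.base_models import GPT, BERT
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical loader -- substitute your own texts and labels here.
texts, labels = load_my_dataset()
train_X, test_X, train_y, test_y = train_test_split(texts, labels, test_size=0.2)

# Fit the same task with each base model and compare held-out accuracy.
for name, base in [("GPT", GPT), ("BERT", BERT)]:
    model = Classifier(base_model=base)
    model.fit(train_X, train_y)
    acc = accuracy_score(test_y, model.predict(test_X))
    print(f"{name}: {acc:.3f}")
```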