[CLOSED] Complete pytorch transformers interface, deprecate old GPT implementation #881

Closed
jeswan opened this issue Sep 17, 2020 · 18 comments
Labels: 0.x.0 release on fix (Put out a new 0.x.0 release when this is fixed.)

Comments


jeswan commented Sep 17, 2020

Issue by HaokunLiu
Thursday Aug 08, 2019 at 15:20 GMT
Originally opened as nyu-mll/jiant#881


  • Replace the old GPT implementation with the one from huggingface pytorch-transformers
  • Add GPT-2, Transformer-XL, and XLM to the pytorch-transformers interface (see the sketch below)
  • Refactor the pytorch-transformers interface a bit to reduce duplicated/outdated code and comments

HaokunLiu included the following code: https://github.com/nyu-mll/jiant/pull/881/commits
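For context, here is a minimal sketch of what loading these models through pytorch_transformers (later renamed to transformers) looks like. This is illustrative only and is not the interface code from the PR linked above; jiant's actual wrapper adds its own config handling and caching.

```python
# Minimal sketch: loading GPT-2, Transformer-XL, and XLM via pytorch_transformers.
import torch
from pytorch_transformers import (
    GPT2Model, GPT2Tokenizer,
    TransfoXLModel, TransfoXLTokenizer,
    XLMModel, XLMTokenizer,
)

# Map pretrained model names to their (model, tokenizer) classes.
MODEL_CLASSES = {
    "gpt2": (GPT2Model, GPT2Tokenizer),
    "transfo-xl-wt103": (TransfoXLModel, TransfoXLTokenizer),
    "xlm-mlm-en-2048": (XLMModel, XLMTokenizer),
}

def load_model_and_tokenizer(model_name):
    model_cls, tokenizer_cls = MODEL_CLASSES[model_name]
    tokenizer = tokenizer_cls.from_pretrained(model_name)
    model = model_cls.from_pretrained(model_name)
    return model, tokenizer

model, tokenizer = load_model_and_tokenizer("gpt2")
input_ids = torch.tensor([tokenizer.encode("jiant now wraps GPT-2.")])
hidden_states = model(input_ids)[0]  # (batch, seq_len, hidden_size)
```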

jeswan added the 0.x.0 release on fix label on Sep 17, 2020

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 08, 2019 at 17:43 GMT


Thanks for taking this on!

I'm interested in adding RoBERTa as soon as it comes out in pytorch-transformers, so if this isn't done by then, we may have competing PRs.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 08, 2019 at 17:43 GMT


#855 #846


jeswan commented Sep 17, 2020

Comment by pep8speaks
Monday Aug 12, 2019 at 02:58 GMT


Hello @HaokunLiu! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 203:101: E501 line too long (108 > 100 characters)
Line 932:101: E501 line too long (102 > 100 characters)
Line 1249:65: W291 trailing whitespace

Line 131:101: E501 line too long (106 > 100 characters)
Line 634:1: W293 blank line contains whitespace
Line 638:44: W291 trailing whitespace
Line 642:1: W293 blank line contains whitespace

Line 13:63: W291 trailing whitespace
Line 15:101: E501 line too long (105 > 100 characters)

Line 91:1: W293 blank line contains whitespace
Line 93:45: W291 trailing whitespace
Line 95:1: W293 blank line contains whitespace
Line 114:101: E501 line too long (121 > 100 characters)
Line 124:101: E501 line too long (104 > 100 characters)
Line 127:101: E501 line too long (114 > 100 characters)
Line 129:101: E501 line too long (124 > 100 characters)
Line 130:101: E501 line too long (120 > 100 characters)
Line 169:101: E501 line too long (110 > 100 characters)
Line 171:101: E501 line too long (104 > 100 characters)
Line 174:101: E501 line too long (109 > 100 characters)
Line 191:44: W291 trailing whitespace
Line 193:1: W293 blank line contains whitespace
Line 197:1: W293 blank line contains whitespace
Line 209:1: W293 blank line contains whitespace
Line 212:1: W293 blank line contains whitespace
Line 219:76: W291 trailing whitespace
Line 221:1: W293 blank line contains whitespace
Line 223:45: W291 trailing whitespace
Line 238:1: W293 blank line contains whitespace

Line 43:88: W291 trailing whitespace

You can repair most issues by installing black and running: `black -l 100 ./*`. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.

Comment last updated at 2019-08-26 16:26:29 UTC


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Tuesday Aug 13, 2019 at 17:00 GMT


Mostly done. I'll run some tests to see if there is any decline in performance. In the meantime, feel free to check the code, implementation decisions, naming, notation, etc., and tell me what needs to change.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 15, 2019 at 18:18 GMT


RoBERTa is out in pytorch_transformers! We should try to merge it in soon.

https://github.com/huggingface/pytorch-transformers/releases/tag/1.1.0

Mind either trying to finish this within a day or two so we can start a new RoBERTa PR, or else adding RoBERTa here directly (which might require other changes to match pytorch_transformers 1.0->1.1)?

The tuning and testing haven't finished running, so I'm not sure everything can be in place within one or two days. But I can add RoBERTa here.
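For reference, a minimal sketch of what using RoBERTa through pytorch_transformers 1.1.0 (the release that introduced it) looks like; this is illustrative only, not the code added in this PR.

```python
# Minimal sketch: RoBERTa became available in pytorch_transformers 1.1.0.
import torch
from pytorch_transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaModel.from_pretrained("roberta-large")

input_ids = torch.tensor([tokenizer.encode("RoBERTa is out in pytorch_transformers!")])
last_hidden_state = model(input_ids)[0]  # (batch, seq_len, hidden_size)
```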


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 15, 2019 at 22:43 GMT


Thanks for the update! I'll take a closer look when this is all done.

I'm away tonight–Sunday, but if you need help/feedback sooner, bug Yada/Alex/Ian. (If it's obviously done before then, they can dismiss my requests and merge.)


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Wednesday Aug 21, 2019 at 00:34 GMT


The GLUE val results are out.

  1. I tuned and evaluated bert-large-cased, roberta-large, openai-gpt, and xlm-mlm-en-2048. Except for openai-gpt, most results are up to expectations.
  2. In fact, there was a bug in gpt, gpt2, and transformer-xl. I fixed it today; it will probably take two days before I get the new results.
  3. The first column is from #845 (XLNet support and overhaul/cleanup of BERT support); however, I didn't notice it was bert-base until I finished running the models. I will run a bert-large-cased model from the master branch.
| task | bert-base (old) | bert-large-cased | roberta-large | openai-gpt | xlm-mlm-en-2048 |
| --- | --- | --- | --- | --- | --- |
| cola | mcc: 0.593, accuracy: 0.834 | mcc: 0.666, accuracy: 0.862 | mcc: 0.685, accuracy: 0.870 | mcc: 0.576, accuracy: 0.827 | mcc: 0.661, accuracy: 0.860 |
| sts-b | corr: 0.874, pearsonr: 0.876, spearmanr: 0.872 | corr: 0.898, pearsonr: 0.900, spearmanr: 0.897 | corr: 0.921, pearsonr: 0.922, spearmanr: 0.920 | corr: 0.850, pearsonr: 0.851, spearmanr: 0.849 | corr: 0.907, pearsonr: 0.908, spearmanr: 0.906 |
| mnli | accuracy: 0.837 | accuracy: 0.845 | accuracy: 0.896 | accuracy: 0.772 | accuracy: 0.872 |
| mrpc | acc_f1: 0.868, accuracy: 0.846, f1: 0.891, precision: 0.862, recall: 0.921 | acc_f1: 0.907, accuracy: 0.892, f1: 0.923, precision: 0.904, recall: 0.943 | acc_f1: 0.748, accuracy: 0.684, f1: 0.812, precision: 0.684, recall: 1.000 | acc_f1: 0.842, accuracy: 0.811, f1: 0.872, precision: 0.814, recall: 0.939 | acc_f1: 0.904, accuracy: 0.887, f1: 0.921, precision: 0.887, recall: 0.957 |
| qnli | accuracy: 0.913 | accuracy: 0.924 | accuracy: 0.948 | accuracy: 0.850 | accuracy: 0.932 |
| qqp | acc_f1: 0.886, accuracy: 0.903, f1: 0.869, precision: 0.862, recall: 0.877 | acc_f1: 0.853, accuracy: 0.874, f1: 0.832, precision: 0.815, recall: 0.849 | acc_f1: 0.870, accuracy: 0.889, f1: 0.851, precision: 0.839, recall: 0.863 | acc_f1: 0.821, accuracy: 0.843, f1: 0.798, precision: 0.761, recall: 0.838 | acc_f1: 0.866, accuracy: 0.885, f1: 0.847, precision: 0.832, recall: 0.863 |
| rte | accuracy: 0.693 | accuracy: 0.718 | accuracy: 0.834 | accuracy: 0.635 | accuracy: 0.765 |
| sst | accuracy: 0.921 | accuracy: 0.936 | accuracy: 0.960 | accuracy: 0.931 | accuracy: 0.960 |
| wnli | accuracy: 0.310 | accuracy: 0.563 | accuracy: 0.563 | accuracy: 0.437 | accuracy: 0.437 |


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Wednesday Aug 21, 2019 at 15:56 GMT


Thanks again for taking this on. I took a somewhat quick look at the newer changes, and everything seems reasonable to me. The new results are good enough that I'm not worried about major bugs/regressions, and I think it'll be hard to catch smaller performance problems through this kind of experiment.

I left a couple of small suggestions. Let me know if there's anything that you want to do, or anything that I should look at closely, before we merge.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Wednesday Aug 21, 2019 at 16:48 GMT


Thanks again for taking this on. I took a somewhat quick look at the newer changes, and everything seems reasonable to me. The new results are good enough that I'm not worried about major bugs/regressions, and I think it'll be hard to catch smaller performance problems through this kind of experiment.

I left a couple of small suggestions. Let me know if there's anything that you want to do, or anything that I should look at closely, before we merge.

I think I have addressed all suggestions on this code from the last round and this one. I'll make a PR to the site repo in a day or two.

One last thing: although pytorch_transformer_interface provides get_pretrained_lm_head and apply_lm_boundary_tokens (a small part of them is used in the NPI and Blimp projects, but not in this PR), I didn't update LanguageModelingTask and MultiTaskModel._lm_forward to support them. So I can't guarantee that they are correct or that they can reproduce the LM results in those papers.
If someone needs to do LM with pytorch_transformers input modules, they will need to update LanguageModelingTask and MultiTaskModel._lm_forward themselves, and possibly check whether get_pretrained_lm_head and apply_lm_boundary_tokens are correct as well.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Wednesday Aug 21, 2019 at 16:53 GMT


For the LM code, would you mind adding some asserts to prevent people from using the untested code? Once that's done, it sounds like this is ready to merge.
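A minimal sketch of the kind of guard being requested here; the helper and constant names are hypothetical, not jiant's actual code.

```python
# Hypothetical guard: refuse to run language modeling with the untested
# pytorch_transformers input modules until the LM path is verified.
UNTESTED_LM_INPUT_MODULES = ("bert", "roberta", "xlnet", "openai-gpt", "gpt2", "transfo-xl", "xlm")

def check_lm_support(input_module: str, task_name: str) -> None:
    """Raise if language modeling is requested with an untested input module."""
    assert not input_module.startswith(UNTESTED_LM_INPUT_MODULES), (
        f"Task '{task_name}': language modeling with pytorch_transformers input module "
        f"'{input_module}' is untested; LanguageModelingTask and MultiTaskModel._lm_forward "
        "have not been updated to support get_pretrained_lm_head / apply_lm_boundary_tokens."
    )
```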


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 22, 2019 at 20:39 GMT


This looks good to me. Do experiments with GPT1 match the old implementation?

I don't know yet; I'll let you know when the results come out.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 22, 2019 at 21:37 GMT


Generally, this PR looks good. Thanks for taking the time to do this, Haokun! My only concern is about TaskModulator. I do see what you mean in that some of the preprocessing belongs to the model, but I think we need to be careful about introducing any more complexity/abstraction to jiant (and about making functions that are already long even longer and harder to parse). I'd have to think a bit harder about how to make this abstraction as clear as it can be, but a start would be renaming TaskModulator to something like ModelPreprocessingInterface.

I was thinking about including the tokenizer and indexer inside model_preprocessing_interface as well, and changing the main pipeline from create_tasks -> preprocess -> create model to create_tasks -> create model -> preprocess, so that the members of model_preprocessing_interface can be passed in from the model, instead of being created from args once in preprocess and again in the model. But this is unnecessarily radical for the sake of #881.

Since you are one of the major developers of jiant, maybe you can consider this idea, and when the time comes, figure out a well-rounded overall architecture for jiant.
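A rough sketch of the ModelPreprocessingInterface idea being discussed, with hypothetical method names and only two model families shown; the class that actually landed in jiant may look different.

```python
# Hypothetical sketch: centralize model-specific preprocessing decisions
# (boundary tokens, pair concatenation) behind a single interface.
from typing import List, Optional

class ModelPreprocessingInterface:
    def __init__(self, input_module: str):
        self.input_module = input_module

    def boundary_token_fn(self, s1: List[str], s2: Optional[List[str]] = None) -> List[str]:
        """Add model-specific special tokens; concatenate sentence pairs for single-stream models."""
        if self.input_module.startswith("roberta"):
            tokens = ["<s>"] + s1 + ["</s>"]
            if s2 is not None:
                tokens += ["</s>"] + s2 + ["</s>"]
            return tokens
        # BERT-style default
        tokens = ["[CLS]"] + s1 + ["[SEP]"]
        if s2 is not None:
            tokens += s2 + ["[SEP]"]
        return tokens

# Example: ModelPreprocessingInterface("roberta-large").boundary_token_fn(["Hello"], ["world"])
```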


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Monday Aug 26, 2019 at 02:34 GMT


This looks good to me. Do experiments with GPT1 match the old implementation?

| task | openai-gpt (old) | openai-gpt (new) |
| --- | --- | --- |
| cola | mcc: 0.57882, accuracy: 0.82838 | mcc: 0.554, accuracy: 0.818 |
| sts-b | corr: 0.748, pearsonr: 0.738, spearmanr: 0.758 | corr: 0.857, pearsonr: 0.859, spearmanr: 0.856 |
| mnli | accuracy: 0.71000 | accuracy: 0.779 |
| mrpc | acc_f1: 0.76916, accuracy: 0.71569, f1: 0.82263, precision: 0.71733, recall: 0.96416 | acc_f1: 0.825, accuracy: 0.794, f1: 0.855, precision: 0.824, recall: 0.889 |
| qnli | accuracy: 0.750 | accuracy: 0.846 |
| qqp | acc_f1: 0.75931, accuracy: 0.78620, f1: 0.73242, precision: 0.66895, recall: 0.80918 | acc_f1: 0.824, accuracy: 0.847, f1: 0.801, precision: 0.771, recall: 0.833 |
| rte | accuracy: 0.560 | accuracy: 0.603 |
| sst | accuracy: 0.928 | accuracy: 0.939 |
| wnli | accuracy: 0.437 | accuracy: 0.493 |

Some results are different, but considering how the old GPT and the new GPT differ, I think this meets expectations.
We didn't concatenate the two sentences of a pair for the old GPT: when it processes pairwise tasks, it embeds the two sentences separately and then combines the two embeddings to make a prediction.
As we can see, on single-sentence tasks (i.e., cola and sst) the old and new GPT have similar results, while on pair tasks the new GPT is better.
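Schematically, the difference described above looks roughly like this; the helpers are hypothetical, and mean pooling plus a simple feature combination stand in for jiant's actual pooler and pair classifier.

```python
import torch

def old_gpt_pair_repr(model, s1_ids, s2_ids):
    # Old implementation: encode each sentence on its own, then combine the two
    # pooled embeddings (e.g. concatenation and absolute difference) for the classifier.
    h1 = model(torch.tensor([s1_ids]))[0].mean(dim=1)  # (1, hidden_size)
    h2 = model(torch.tensor([s2_ids]))[0].mean(dim=1)
    return torch.cat([h1, h2, torch.abs(h1 - h2)], dim=-1)

def new_gpt_pair_repr(model, s1_ids, s2_ids, delim_id):
    # New implementation: concatenate the pair into a single sequence, so the
    # transformer attends across both sentences before pooling.
    input_ids = torch.tensor([s1_ids + [delim_id] + s2_ids])
    return model(input_ids)[0].mean(dim=1)
```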


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 14:31 GMT


Thanks! That's not that informative of a comparison, but since you're getting numbers in the same ballpark as what OpenAI published, I think that's enough. I agree that it's okay to leave in some awkward abstractions for now—better to get this out there and refactor later than to put too much burden on you for doing it.

There are some new merge conflicts (we moved the config dir), BTW.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 16:49 GMT


Ready to merge?


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Monday Aug 26, 2019 at 16:55 GMT


Yes, it’s ready.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 20:06 GMT


Great—I'll make a proper release tm unless someone beats me to it.
