[CLOSED] Complete pytorch transformers interface, deprecate old GPT implementation #881

Closed
jeswan opened this issue Sep 17, 2020 · 18 comments
Labels: 0.x.0 release on fix (Put out a new 0.x.0 release when this is fixed.)

Comments


jeswan commented Sep 17, 2020

Issue by HaokunLiu
Thursday Aug 08, 2019 at 15:20 GMT
Originally opened as nyu-mll/jiant#881


  • Replace the old GPT implementation with the one from huggingface pytorch-transformers
  • Add GPT-2, Transformer-XL, and XLM to the pytorch-transformers interface (see the sketch below)
  • Refactor the pytorch-transformers interface a bit to reduce duplicated/outdated code and comments

HaokunLiu included the following code: https://github.com/nyu-mll/jiant/pull/881/commits
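For context, here is a minimal sketch of what loading these models through pytorch_transformers (later renamed to transformers) looks like. This is illustrative only and is not the interface code from the PR linked above; jiant's actual wrapper adds its own config handling and caching.

```python
# Minimal sketch: loading GPT-2, Transformer-XL, and XLM via pytorch_transformers.
import torch
from pytorch_transformers import (
    GPT2Model, GPT2Tokenizer,
    TransfoXLModel, TransfoXLTokenizer,
    XLMModel, XLMTokenizer,
)

# Map pretrained model names to their (model, tokenizer) classes.
MODEL_CLASSES = {
    "gpt2": (GPT2Model, GPT2Tokenizer),
    "transfo-xl-wt103": (TransfoXLModel, TransfoXLTokenizer),
    "xlm-mlm-en-2048": (XLMModel, XLMTokenizer),
}

def load_model_and_tokenizer(model_name):
    model_cls, tokenizer_cls = MODEL_CLASSES[model_name]
    tokenizer = tokenizer_cls.from_pretrained(model_name)
    model = model_cls.from_pretrained(model_name)
    return model, tokenizer

model, tokenizer = load_model_and_tokenizer("gpt2")
input_ids = torch.tensor([tokenizer.encode("jiant now wraps GPT-2.")])
hidden_states = model(input_ids)[0]  # (batch, seq_len, hidden_size)
```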

jeswan added the 0.x.0 release on fix label on Sep 17, 2020

jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 08, 2019 at 17:43 GMT


Thanks for taking this on!

I'm interested in adding RoBERTa as soon as it comes out in pytorch-transformers, so if this isn't done by then, we may have competing PRs.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 08, 2019 at 17:43 GMT


#855 #846


jeswan commented Sep 17, 2020

Comment by pep8speaks
Monday Aug 12, 2019 at 02:58 GMT


Hello @HaokunLiu! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 203:101: E501 line too long (108 > 100 characters)
Line 932:101: E501 line too long (102 > 100 characters)
Line 1249:65: W291 trailing whitespace

Line 131:101: E501 line too long (106 > 100 characters)
Line 634:1: W293 blank line contains whitespace
Line 638:44: W291 trailing whitespace
Line 642:1: W293 blank line contains whitespace

Line 13:63: W291 trailing whitespace
Line 15:101: E501 line too long (105 > 100 characters)

Line 91:1: W293 blank line contains whitespace
Line 93:45: W291 trailing whitespace
Line 95:1: W293 blank line contains whitespace
Line 114:101: E501 line too long (121 > 100 characters)
Line 124:101: E501 line too long (104 > 100 characters)
Line 127:101: E501 line too long (114 > 100 characters)
Line 129:101: E501 line too long (124 > 100 characters)
Line 130:101: E501 line too long (120 > 100 characters)
Line 169:101: E501 line too long (110 > 100 characters)
Line 171:101: E501 line too long (104 > 100 characters)
Line 174:101: E501 line too long (109 > 100 characters)
Line 191:44: W291 trailing whitespace
Line 193:1: W293 blank line contains whitespace
Line 197:1: W293 blank line contains whitespace
Line 209:1: W293 blank line contains whitespace
Line 212:1: W293 blank line contains whitespace
Line 219:76: W291 trailing whitespace
Line 221:1: W293 blank line contains whitespace
Line 223:45: W291 trailing whitespace
Line 238:1: W293 blank line contains whitespace

Line 43:88: W291 trailing whitespace

You can repair most issues by installing black and running: `black -l 100 ./*`. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.

Comment last updated at 2019-08-26 16:26:29 UTC


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Tuesday Aug 13, 2019 at 17:00 GMT


Mostly done. I'll run some tests to see if there is any decline in performance. In the meantime, feel free to check the code, implementation decisions, naming, notation, etc., and tell me what needs to change.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 15, 2019 at 18:18 GMT


RoBERTa is out in pytorch_transformers! We should try to merge it in soon.

https://github.com/huggingface/pytorch-transformers/releases/tag/1.1.0

Mind either trying to finish this within a day or two so we can start a new RoBERTa PR, or else adding RoBERTa here directly (which might require other changes to match pytorch_transformers 1.0->1.1)?

The tuning and testing haven't finished running, so I'm not sure everything can be in place within one or two days. But I can add RoBERTa here.
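For reference, a minimal sketch of what using RoBERTa through pytorch_transformers 1.1.0 (the release that introduced it) looks like; this is illustrative only, not the code added in this PR.

```python
# Minimal sketch: RoBERTa became available in pytorch_transformers 1.1.0.
import torch
from pytorch_transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaModel.from_pretrained("roberta-large")

input_ids = torch.tensor([tokenizer.encode("RoBERTa is out in pytorch_transformers!")])
last_hidden_state = model(input_ids)[0]  # (batch, seq_len, hidden_size)
```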


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Thursday Aug 15, 2019 at 22:43 GMT


Thanks for the update! I'll take a closer look when this is all done.

I'm away tonight–Sunday, but if you need help/feedback sooner, bug Yada/Alex/Ian. (If it's obviously done before then, they can dismiss my requests and merge.)


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Wednesday Aug 21, 2019 at 00:34 GMT


The GLUE val results are out.

  1. I tuned and evaluated bert-large-cased, roberta-large, openai-gpt, and xlm-mlm-en-2048. Except for openai-gpt, most results are up to expectations.
  2. In fact, there was a bug in gpt, gpt2, and transformer-xl. I fixed it today; it will probably take two days before I get the new results.
  3. The first column is from #845 (XLNet support and overhaul/cleanup of BERT support); however, I didn't notice it was bert-base until I finished running the models. I will run a bert-large-cased model from the master branch.
| task | bert-base (old) | bert-large-cased | roberta-large | openai-gpt | xlm-mlm-en-2048 |
| --- | --- | --- | --- | --- | --- |
| cola | mcc: 0.593, accuracy: 0.834 | mcc: 0.666, accuracy: 0.862 | mcc: 0.685, accuracy: 0.870 | mcc: 0.576, accuracy: 0.827 | mcc: 0.661, accuracy: 0.860 |
| sts-b | corr: 0.874, pearsonr: 0.876, spearmanr: 0.872 | corr: 0.898, pearsonr: 0.900, spearmanr: 0.897 | corr: 0.921, pearsonr: 0.922, spearmanr: 0.920 | corr: 0.850, pearsonr: 0.851, spearmanr: 0.849 | corr: 0.907, pearsonr: 0.908, spearmanr: 0.906 |
| mnli | accuracy: 0.837 | accuracy: 0.845 | accuracy: 0.896 | accuracy: 0.772 | accuracy: 0.872 |
| mrpc | acc_f1: 0.868, accuracy: 0.846, f1: 0.891, precision: 0.862, recall: 0.921 | acc_f1: 0.907, accuracy: 0.892, f1: 0.923, precision: 0.904, recall: 0.943 | acc_f1: 0.748, accuracy: 0.684, f1: 0.812, precision: 0.684, recall: 1.000 | acc_f1: 0.842, accuracy: 0.811, f1: 0.872, precision: 0.814, recall: 0.939 | acc_f1: 0.904, accuracy: 0.887, f1: 0.921, precision: 0.887, recall: 0.957 |
| qnli | accuracy: 0.913 | accuracy: 0.924 | accuracy: 0.948 | accuracy: 0.850 | accuracy: 0.932 |
| qqp | acc_f1: 0.886, accuracy: 0.903, f1: 0.869, precision: 0.862, recall: 0.877 | acc_f1: 0.853, accuracy: 0.874, f1: 0.832, precision: 0.815, recall: 0.849 | acc_f1: 0.870, accuracy: 0.889, f1: 0.851, precision: 0.839, recall: 0.863 | acc_f1: 0.821, accuracy: 0.843, f1: 0.798, precision: 0.761, recall: 0.838 | acc_f1: 0.866, accuracy: 0.885, f1: 0.847, precision: 0.832, recall: 0.863 |
| rte | accuracy: 0.693 | accuracy: 0.718 | accuracy: 0.834 | accuracy: 0.635 | accuracy: 0.765 |
| sst | accuracy: 0.921 | accuracy: 0.936 | accuracy: 0.960 | accuracy: 0.931 | accuracy: 0.960 |
| wnli | accuracy: 0.310 | accuracy: 0.563 | accuracy: 0.563 | accuracy: 0.437 | accuracy: 0.437 |


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Wednesday Aug 21, 2019 at 15:56 GMT


Thanks again for taking this on. I took a somewhat quick look at the newer changes, and everything seems reasonable to me. The new results are good enough that I'm not worried about major bugs/regressions, and I think it'll be hard to catch smaller performance problems through this kind of experiment.

I left a couple of small suggestions. Let me know if there's anything that you want to do, or anything that I should look at closely, before we merge.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Wednesday Aug 21, 2019 at 16:48 GMT


Thanks again for taking this on. I took a somewhat quick look at the newer changes, and everything seems reasonable to me. The new results are good enough that I'm not worried about major bugs/regressions, and I think it'll be hard to catch smaller performance problems through this kind of experiment.

I left a couple of small suggestions. Let me know if there's anything that you want to do, or anything that I should look at closely, before we merge.

I think I have addressed all suggestions on this code from the last round and this one. I'll make a PR to the site repo in a day or two.

One last thing: although pytorch_transformer_interface provides get_pretrained_lm_head and apply_lm_boundary_tokens (a small part of them is used in the NPI and Blimp projects, but not in this PR), I didn't update LanguageModelingTask and MultiTaskModel._lm_forward to support them. So I can't guarantee that they are correct or that they can reproduce the LM results in those papers.
If someone needs to do LM with pytorch_transformers input modules, they will need to update LanguageModelingTask and MultiTaskModel._lm_forward themselves, and possibly check whether get_pretrained_lm_head and apply_lm_boundary_tokens are correct as well.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Wednesday Aug 21, 2019 at 16:53 GMT


For the LM code, would you mind adding some asserts to prevent people from using the untested code? Once that's done, it sounds like this is ready to merge.
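A minimal sketch of the kind of guard being requested here; the helper and constant names are hypothetical, not jiant's actual code.

```python
# Hypothetical guard: refuse to run language modeling with the untested
# pytorch_transformers input modules until the LM path is verified.
UNTESTED_LM_INPUT_MODULES = ("bert", "roberta", "xlnet", "openai-gpt", "gpt2", "transfo-xl", "xlm")

def check_lm_support(input_module: str, task_name: str) -> None:
    """Raise if language modeling is requested with an untested input module."""
    assert not input_module.startswith(UNTESTED_LM_INPUT_MODULES), (
        f"Task '{task_name}': language modeling with pytorch_transformers input module "
        f"'{input_module}' is untested; LanguageModelingTask and MultiTaskModel._lm_forward "
        "have not been updated to support get_pretrained_lm_head / apply_lm_boundary_tokens."
    )
```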


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 22, 2019 at 20:39 GMT


This looks good to me. Do experiments with GPT1 match the old implementation?

I don't know yet; I'll let you know when the results come out.


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Thursday Aug 22, 2019 at 21:37 GMT


Generally, this PR looks good. Thanks for taking the time to do this, Haokun! My only concern is about TaskModulator. I do see what you mean in that some of the preprocessing belongs to the model, but I think we need to be careful about introducing any more complexity/abstraction to jiant (and about making functions that are already long even longer and harder to parse). I'd have to think a bit harder about how to make this abstraction as clear as it can be, but a start would be renaming TaskModulator to something like ModelPreprocessingInterface.

I was thinking about including the tokenizer and indexer inside model_preprocessing_interface as well, and changing the main pipeline from create_tasks -> preprocess -> create model to create_tasks -> create model -> preprocess, so that the members of model_preprocessing_interface can be passed in from the model, instead of being created from args once in preprocess and again in the model. But this is unnecessarily radical for the sake of #881.

Since you are one of the major developers of jiant, maybe you can consider this idea, and when the time comes, figure out a well-rounded overall architecture for jiant.
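A rough sketch of the ModelPreprocessingInterface idea being discussed, with hypothetical method names and only two model families shown; the class that actually landed in jiant may look different.

```python
# Hypothetical sketch: centralize model-specific preprocessing decisions
# (boundary tokens, pair concatenation) behind a single interface.
from typing import List, Optional

class ModelPreprocessingInterface:
    def __init__(self, input_module: str):
        self.input_module = input_module

    def boundary_token_fn(self, s1: List[str], s2: Optional[List[str]] = None) -> List[str]:
        """Add model-specific special tokens; concatenate sentence pairs for single-stream models."""
        if self.input_module.startswith("roberta"):
            tokens = ["<s>"] + s1 + ["</s>"]
            if s2 is not None:
                tokens += ["</s>"] + s2 + ["</s>"]
            return tokens
        # BERT-style default
        tokens = ["[CLS]"] + s1 + ["[SEP]"]
        if s2 is not None:
            tokens += s2 + ["[SEP]"]
        return tokens

# Example: ModelPreprocessingInterface("roberta-large").boundary_token_fn(["Hello"], ["world"])
```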


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Monday Aug 26, 2019 at 02:34 GMT


This looks good to me. Do experiments with GPT1 match the old implementation?

| task | openai-gpt (old) | openai-gpt (new) |
| --- | --- | --- |
| cola | mcc: 0.57882, accuracy: 0.82838 | mcc: 0.554, accuracy: 0.818 |
| sts-b | corr: 0.748, pearsonr: 0.738, spearmanr: 0.758 | corr: 0.857, pearsonr: 0.859, spearmanr: 0.856 |
| mnli | accuracy: 0.71000 | accuracy: 0.779 |
| mrpc | acc_f1: 0.76916, accuracy: 0.71569, f1: 0.82263, precision: 0.71733, recall: 0.96416 | acc_f1: 0.825, accuracy: 0.794, f1: 0.855, precision: 0.824, recall: 0.889 |
| qnli | accuracy: 0.750 | accuracy: 0.846 |
| qqp | acc_f1: 0.75931, accuracy: 0.78620, f1: 0.73242, precision: 0.66895, recall: 0.80918 | acc_f1: 0.824, accuracy: 0.847, f1: 0.801, precision: 0.771, recall: 0.833 |
| rte | accuracy: 0.560 | accuracy: 0.603 |
| sst | accuracy: 0.928 | accuracy: 0.939 |
| wnli | accuracy: 0.437 | accuracy: 0.493 |

Some results are different, but considering how the old GPT and the new GPT differ, I think this meets expectations.
We didn't concatenate the two sentences of a pair for the old GPT: when it processes pairwise tasks, it embeds the two sentences separately and then combines the two embeddings to make a prediction.
As we can see, on single-sentence tasks (i.e., cola and sst) the old and new GPT have similar results, while on pair tasks the new GPT is better.
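Schematically, the difference described above looks roughly like this; the helpers are hypothetical, and mean pooling plus a simple feature combination stand in for jiant's actual pooler and pair classifier.

```python
import torch

def old_gpt_pair_repr(model, s1_ids, s2_ids):
    # Old implementation: encode each sentence on its own, then combine the two
    # pooled embeddings (e.g. concatenation and absolute difference) for the classifier.
    h1 = model(torch.tensor([s1_ids]))[0].mean(dim=1)  # (1, hidden_size)
    h2 = model(torch.tensor([s2_ids]))[0].mean(dim=1)
    return torch.cat([h1, h2, torch.abs(h1 - h2)], dim=-1)

def new_gpt_pair_repr(model, s1_ids, s2_ids, delim_id):
    # New implementation: concatenate the pair into a single sequence, so the
    # transformer attends across both sentences before pooling.
    input_ids = torch.tensor([s1_ids + [delim_id] + s2_ids])
    return model(input_ids)[0].mean(dim=1)
```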


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 14:31 GMT


Thanks! That's not that informative of a comparison, but since you're getting numbers in the same ballpark as what OpenAI published, I think that's enough. I agree that it's okay to leave in some awkward abstractions for now—better to get this out there and refactor later than to put too much burden on you for doing it.

There are some new merge conflicts (we moved the config dir), BTW.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 16:49 GMT


Ready to merge?


jeswan commented Sep 17, 2020

Comment by HaokunLiu
Monday Aug 26, 2019 at 16:55 GMT


Yes, it’s ready.


jeswan commented Sep 17, 2020

Comment by sleepinyourhat
Monday Aug 26, 2019 at 20:06 GMT


Great—I'll make a proper release tm unless someone beats me to it.
