Adding Sentence Order Prediction #1061

pruksmhc · 2020-04-11T03:17:19Z

Adding Sentence Order Prediction Task for ALBERT
What this version of MLM supports: ALBERT embedder.
Below are the runs for MLM + SOP + intermediate task (which is how we intend to use SOP)
The results (on ALBERT) are:

| Experiment description | Intermediate task performance | SOP performance |
| --- | --- | --- |
| SST + SOP | 0.85 | 0.95 |
| QQP + SOP | 0.95 | 0.94 |
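
For readers unfamiliar with SOP: in ALBERT, a positive example is two consecutive segments from the same document in their original order, and a negative example is the same two segments swapped. The snippet below is a minimal sketch of that idea, not the code in this PR. The function name `make_sop_examples`, the label convention (1 = original order, 0 = swapped), the pairing of adjacent segments, and the default swap probability of 0.5 are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def make_sop_examples(
    segments: List[str], swap_prob: float = 0.5, seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Build (first, second, label) triples from the segments of ONE document.

    Hypothetical helper: label 1 means the pair is in its original order,
    label 0 means the two segments were swapped.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    examples = []
    # Walk over adjacent segment pairs; never pair across documents.
    for first, second in zip(segments, segments[1:]):
        if rng.random() < swap_prob:
            examples.append((second, first, 0))  # swapped order
        else:
            examples.append((first, second, 1))  # original order
    return examples


if __name__ == "__main__":
    doc = ["Sentence one.", "Sentence two.", "Sentence three."]
    for example in make_sop_examples(doc):
        print(example)
```

Because both segments always come from the same document, the classifier cannot rely on topic differences and has to model ordering, which is the motivation ALBERT gives for replacing next-sentence prediction with SOP.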

pyeres · 2020-04-15T13:49:17Z

Hi @phu-pmh & @pruksmhc — I see this is a "[WIP]". What remains to be done or reviewed?

phu-pmh · 2020-04-15T14:07:27Z

> Hi @phu-pmh & @pruksmhc — I see this is a "[WIP]". What remains to be done or reviewed?

I just pushed the instructions for data generation. Other than that, nothing left to be done or reviewed, I think.

sleepinyourhat · 2020-04-15T17:01:00Z

Okay—my major concerns have been addressed, though I'd still prefer Phil do the final review.

pruksmhc · 2020-04-16T15:38:00Z

@pyeres if you could take a look that would be great.

pyeres · 2020-04-16T16:31:17Z

> Okay—my major concerns have been addressed, though I'd still prefer Phil do the final review.

Per conversation with @sleepinyourhat — my review will be focused on SentenceOrderTask.

pyeres · 2020-04-17T19:58:46Z

Capturing some notes from conversation with @pruksmhc out-of-band:

* There are intentional differences between our implementation of SOP and the TF implementation; inline code comments will be added to call out these differences.
* The docstrings will be updated to fully outline the version of SOP implemented here.
* @pruksmhc considers the SOP implementation to have already been reviewed for correctness by other reviewers.

@phu-pmh, @sleepinyourhat, @zphang for visibility as contributors/reviewers on this PR — I'll plan to resume review once the docstrings and comments (mentioned above) are added.

sleepinyourhat · 2020-04-22T22:10:32Z

@pruksmhc @phu-pmh I talked through this PR with Phil earlier today, and I think it'll be ready to go with two more smallish fixes:

1. Fix the docstring issue mentioned here: Adding Sentence Order Prediction #1061 (comment)
2. Fix the credit assignment issue for the sections of code that use snippets of ALBERT code.

Try to get to these before you do anything else non-emergency on jiant, and we can get this merged in. Thanks!

pyeres

Thanks @pruksmhc and @phu-pmh

pyeres · 2020-04-24T13:49:10Z

> Below are the runs for MLM + SOP + intermediate task (which is how we intend to use SOP)

@pruksmhc — I'm not able to locate the run results you mentioned in the PR description (maybe they're buried somewhere else in the thread?). Please repost them in the PR description.

* misc run scripts
* sbatch
* sweep scripts
* update
* qa
* update
* update
* update
* update
* update
* sb file
* moving update_metrics to outside scope of dataparallel
* fixing micro_avg calculation
* undo debugging
* Fixing tests, moving update_metrics out of other tasks
* remove extraneous change
* MLM task
* Added MLM task
* update
* fix multiple choice dataparallel forward
* update
* add _mask_id to transformers
* Update
* MLM update
* adding update_metrics abstraction
* delete update_metrics_ notation
* fixed wrong index problem
* removed unrelated files
* removed unrelated files
* removed unrelated files
* fix PEP8
* Fixed get_pretained_lm_head for BERT and ALBERT
* spelling check
* black formatting
* fixing tests
* bug fix
* Adding batch_size constraints to multi-GPU setting
* adding documentation
* adding batch size test
* black correct version
* Fixing batch size assertion
* generalize batch size assertion for more than 2 GPU setting
* reducing label loops in code
* fixing span forward
* Fixing span prediction forward for multi-GPU
* fix commonsenseQA forward
* MLM
* adding function documentation
* resolving nits, fixing seq_gen forward
* remove nit
* fixing batch_size assert and SpanPrediction task
* Remove debugging
* Fix batch size mismatch multi-GPU test
* Fix order of assert checking for batch size mismatch
* mlm training
* update
* sbatch
* update
* data parallel
* update data parallel stuffs
* using sequencelabel, using 1 paragraph per example
* update label mapping
* adding exmaples-porportion-mixing
* changing dataloader to work with wikitext103
* weight sampling
* add early stopping only onb one task
* commit
* Cleaning up code
* Removing unecessarily tracked git folders
* Removing unnecesary changes
* revert README
* revert README.md again
* Making more general for Transformer-based embedders
* torch.uint8 -> torch.bool
* Fixing indexing issues
* get rid of unecessary changes
* black cleanup
* update
* Prevent updating update_metrics twice in one step
* update
* update
* update
* Fixing SOP to work with jiant
* delete debugging
* tying pooler weights from ALBERT
* fixed SOP tie weight, and MLM vocab error
* dataset update for SOP
* removed pdb
* Fix ALBERT -> MLM problem, reduce amount of times get_data_iter is called
* delete debugging
* adding utf-8 encoding
* Removing two-layer MLM class hierarchy
* MLM indexing bug
* fixing MLM error
* removed rest of the shifting code
* adding
* fixing batch[inputs] error
* change corpus to wikipedia raw
* change corpus to wikipedia raw
* Finish merge
* style
* Revert rest of mlm_weight
* Revert LM change
* Revert
* Merging SOP
* Improving documentation
* Revert base_roberta
* revert unecessary change
* Correcting documentation
* revert unnecessary changes
* Refactoring SOP to make clearer
* Adding SOPClassifier
* Fixing SOP Task
* Adding further documentation
* Adding more description of dataset
* fixing merge conflict
* cleaning up unnecessary files
* Making documentation clearer about our implementation of ALBERT SOP
* Fix docstring
* Refactoring SOP back as a PairClassificationTask, adding more documentation
* Adding more documentation, adding process_split
* Fix typo in comment
* Adding modified SOP code
* fixing based on comments
* fixing len(current_chunk)==1 condition
* fixing len(current_chunk)==1 condition
* documentation fix
* minor fix
* minor fix: tokenizer
* minor fix: current_length update
* minor fix: current_length update
* minor fix
* bug fix
* bug fix
* Fixing document leakage bug
* Fixing document delimiting bug
* Cleaning up test
* Black style
* Accurately updating current_length based on when len for_next_chunk > 2
* SOP data generation insturctions
* Fix documentation
* Fixing docstrings and adding source of code
* Fixing typos and data script documentation
* Revert merge mistake

Co-authored-by: phu-pmh <phumon91@gmail.com>
Co-authored-by: Haokun Liu <haokunliu412@gmail.com>
Co-authored-by: pruksmhc <pruks22y@mtholyoke.edu>
Co-authored-by: DeepLearning VM <google-dl-platform@googlegroups.com>

phu-pmh and others added 30 commits October 30, 2019 15:05

misc run scripts

430f942

sbatch

39603c3

sweep scripts

9b324f9

Merge branch 'master' of https://github.com/nyu-mll/jiant

d3cc769

Merge branch 'master' of https://github.com/nyu-mll/jiant

00bc40c

update

4e297b1

qa

b75d0f5

update

1aadf48

Merge branch 'master' of https://github.com/nyu-mll/jiant

8993b9e

update

a3f10e2

update

aa0d8b4

Merge branch 'master' of https://github.com/nyu-mll/jiant

275d7a3

Merge branch 'master' of https://github.com/nyu-mll/jiant

4b6b939

update

7252ea5

update

f0d9c56

Merge branch 'master' of https://github.com/nyu-mll/jiant

00223c6

sb file

b0a8ec3

moving update_metrics to outside scope of dataparallel

c4d2601

fixing micro_avg calculation

acb9d24

undo debugging

8bdec95

Merge branch 'master' of https://github.com/nyu-mll/jiant

0d879b1

Merge branch 'master' into fix_dataparallel_metric_calculation

4f0a169

Fixing tests, moving update_metrics out of other tasks

5bb8389

Merge branch 'master' of https://github.com/nyu-mll/jiant into fix_da…

fb59ecc

…taparallel_metric_calculation Conflicts: jiant/trainer.py jiant/utils/utils.py

Merge branch 'fix_dataparallel_metric_calculation' of https://github.…

04dbbda

…com/nyu-mll/jiant into fix_dataparallel_metric_calculation

remove extraneous change

3ddf564

MLM task

e588909

Added MLM task

dfa9fd9

update

46182a9

Merge branch 'MLM' of https://github.com/nyu-mll/jiant into MLM

607bcd2

SOP data generation insturctions

821c7b6

Fix documentation

e284ba9

sleepinyourhat reviewed Apr 15, 2020

tests/tasks/test_sop.py

pruksmhc changed the title ~~Adding Sentence Order Prediction [WIP]~~ Adding Sentence Order Prediction Apr 16, 2020

pyeres reviewed Apr 16, 2020

scripts/sop/README.md (outdated)

pyeres reviewed Apr 16, 2020

scripts/sop/README.md

pyeres reviewed Apr 17, 2020

jiant/tasks/tasks.py (outdated)

pyeres reviewed Apr 17, 2020

jiant/tasks/tasks.py

pyeres reviewed Apr 17, 2020

jiant/tasks/tasks.py (outdated)

Merge branch 'master' into add_sop

298330f

Yada Pruksachatkun and others added 6 commits April 22, 2020 16:16

Fixing docstrings and adding source of code

337d53e

Merge branch 'master' into add_sop

4f0fd05

Fixing typos and data script documentation

62d4bba

Merge branch 'add_sop' of https://github.com/nyu-mll/jiant into add_sop

0ff4596

Merge branch 'master' into add_sop

c58bc2c

Revert merge mistake

b5d711b

pyeres approved these changes Apr 24, 2020

pruksmhc merged commit ccad92a into master Apr 24, 2020

jeswan mentioned this pull request Sep 17, 2020

[CLOSED] Adding Sentence Order Prediction nyu-mll/jiant-v1-legacy#1061

Closed

jeswan added the jiant-v1-legacy Relevant to versions <= v1.3.2 label Sep 17, 2020

jeswan deleted the add_sop branch September 22, 2020 03:41
