
Change batch size on coding tasks to 1 to avoid OOM #654

Merged: 6 commits from fix_coding_yaml into main on Oct 10, 2023

Conversation

bmosaicml (Contributor) commented on Oct 7, 2023

This changes the YAMLs to use a batch size of 1 on the coding tasks to avoid OOMs.
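For context, the kind of edit involved would look roughly like the sketch below: pinning `batch_size: 1` on the code-evaluation entries of an ICL task YAML. The keys and dataset paths shown (`label`, `dataset_uri`, `icl_task_type`, `batch_size`, and the jsonl locations) are illustrative assumptions based on typical llm-foundry eval task configs, not copied from this PR's diff.

```yaml
# Illustrative sketch only -- keys and paths are assumed, not taken from this PR's diff.
icl_tasks:
- label: human_eval
  dataset_uri: eval/local_data/programming/human_eval.jsonl  # assumed path
  num_fewshot: [0]
  icl_task_type: code_evaluation
  batch_size: 1  # reduced from a larger default to avoid OOM during generation
- label: human_eval_cpp
  dataset_uri: eval/local_data/programming/human_eval_cpp.jsonl  # assumed path
  num_fewshot: [0]
  icl_task_type: code_evaluation
  batch_size: 1
```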

coding-eval-Nx2M6E

| model_name      |   average |   world_knowledge |   commonsense_reasoning |   language_understanding |   symbolic_problem_solving |   reading_comprehension |   programming |   world_knowledge_lm_task_subscore |   language_understanding_lm_task_subscore |   symbolic_problem_solving_lm_task_subscore |   reading_comprehension_lm_task_subscore |   world_knowledge_lite |   commonsense_reasoning_lite |   language_understanding_lite |   symbolic_problem_solving_lite |   reading_comprehension_lite |   programming_lite |
|:----------------|----------:|------------------:|------------------------:|-------------------------:|---------------------------:|------------------------:|--------------:|-----------------------------------:|------------------------------------------:|--------------------------------------------:|-----------------------------------------:|-----------------------:|-----------------------------:|------------------------------:|--------------------------------:|-----------------------------:|-------------------:|
| mosaicml/mpt-7b |  0.359149 |          0.355904 |                0.383602 |                 0.380392 |                   0.162548 |                 0.33564 |     0.0570501 |                            0.58963 |                                  0.385176 |                                    0.260441 |                                 0.452675 |               0.356409 |                     0.604679 |                      0.708952 |                         0.18653 |                     0.452675 |          0.0740854 |

Complete results for all models:

| Category                                  | Benchmark                        | Subtask                             |   Accuracy | Number few shot   | Model           |
|:------------------------------------------|:---------------------------------|:------------------------------------|-----------:|:------------------|:----------------|
| world_knowledge_lite                      | jeopardy                         | Average                             |  0.468222  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      |                                  | american_history                    |  0.525424  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      |                                  | literature                          |  0.571429  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      |                                  | science                             |  0.369748  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      |                                  | word_origins                        |  0.273973  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      |                                  | world_history                       |  0.600536  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lm_task_subscore          | bigbench_qa_wikidata             |                                     |  0.711038  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           | arc_easy                         |                                     |  0.722643  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge_lite                      | arc_challenge                    |                                     |  0.433447  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           | mmlu                             | Average                             |  0.280213  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | abstract_algebra                    |  0.28      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | anatomy                             |  0.237037  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | astronomy                           |  0.309211  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | business_ethics                     |  0.31      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | clinical_knowledge                  |  0.256604  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_biology                     |  0.277778  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_chemistry                   |  0.22      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_computer_science            |  0.22      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_mathematics                 |  0.31      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_medicine                    |  0.225434  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | college_physics                     |  0.215686  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | computer_security                   |  0.29      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | conceptual_physics                  |  0.293617  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | econometrics                        |  0.201754  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | electrical_engineering              |  0.241379  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | elementary_mathematics              |  0.275132  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | formal_logic                        |  0.246032  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | global_facts                        |  0.3       | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_biology                 |  0.280645  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_chemistry               |  0.231527  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_computer_science        |  0.33      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_european_history        |  0.290909  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_geography               |  0.308081  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_government_and_politics |  0.305699  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_macroeconomics          |  0.266667  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_mathematics             |  0.266667  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_microeconomics          |  0.289916  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_physics                 |  0.304636  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_psychology              |  0.249541  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_statistics              |  0.240741  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_us_history              |  0.264706  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | high_school_world_history           |  0.270042  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | human_aging                         |  0.367713  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | human_sexuality                     |  0.267176  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | international_law                   |  0.297521  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | jurisprudence                       |  0.324074  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | logical_fallacies                   |  0.220859  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | machine_learning                    |  0.294643  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | management                          |  0.291262  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | marketing                           |  0.264957  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | medical_genetics                    |  0.32      | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | miscellaneous                       |  0.309068  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | moral_disputes                      |  0.291908  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | moral_scenarios                     |  0.251397  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | nutrition                           |  0.303922  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | philosophy                          |  0.337621  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | prehistory                          |  0.342593  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | professional_accounting             |  0.287234  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | professional_law                    |  0.280313  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | professional_medicine               |  0.213235  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | professional_psychology             |  0.272876  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | public_relations                    |  0.327273  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | security_studies                    |  0.285714  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | sociology                           |  0.268657  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | us_foreign_policy                   |  0.3       | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | virology                            |  0.379518  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           |                                  | world_religions                     |  0.263158  | 10-shot           | mosaicml/mpt-7b |
| world_knowledge                           | bigbench_misconceptions          |                                     |  0.520548  | 10-shot           | mosaicml/mpt-7b |
| commonsense_reasoning_lite                | piqa                             |                                     |  0.804679  | 10-shot           | mosaicml/mpt-7b |
| commonsense_reasoning                     | bigbench_novel_concepts          |                                     |  0.5625    | 10-shot           | mosaicml/mpt-7b |
| commonsense_reasoning                     | bigbench_strange_stories         |                                     |  0.66092   | 10-shot           | mosaicml/mpt-7b |
| commonsense_reasoning                     | bigbench_strategy_qa             |                                     |  0.564875  | 10-shot           | mosaicml/mpt-7b |
| language_understanding_lite               | hellaswag                        |                                     |  0.765485  | 10-shot           | mosaicml/mpt-7b |
| language_understanding                    | bigbench_language_identification |                                     |  0.2536    | 10-shot           | mosaicml/mpt-7b |
| language_understanding                    | bigbench_conceptual_combinations |                                     |  0.320388  | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | bigbench_elementary_math_qa      |                                     |  0.276494  | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | bigbench_dyck_languages          |                                     |  0.318     | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lm_task_subscore | bigbench_cs_algorithms           |                                     |  0.478788  | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving                  | bigbench_logical_deduction       |                                     |  0.246     | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | bigbench_operators               |                                     |  0.342857  | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | bigbench_repeat_copy_logic       |                                     |  0.25      | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | simple_arithmetic_nospaces       |                                     |  0.081     | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving_lite             | simple_arithmetic_withspaces     |                                     |  0.092     | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving                  | math_qa                          |                                     |  0.263493  | 10-shot           | mosaicml/mpt-7b |
| symbolic_problem_solving                  | logi_qa                          |                                     |  0.261137  | 10-shot           | mosaicml/mpt-7b |
| reading_comprehension_lite                | pubmed_qa_labeled                |                                     |  0.322     | 10-shot           | mosaicml/mpt-7b |
| reading_comprehension_lite                | squad                            |                                     |  0.583349  | 10-shot           | mosaicml/mpt-7b |
| reading_comprehension                     | bigbench_understanding_fables    |                                     |  0.206349  | 10-shot           | mosaicml/mpt-7b |
| reading_comprehension                     | boolq                            |                                     |  0.747706  | 10-shot           | mosaicml/mpt-7b |
| commonsense_reasoning_lite                | copa                             |                                     |  0.8       | 0-shot            | mosaicml/mpt-7b |
| commonsense_reasoning                     | openbook_qa                      |                                     |  0.418     | 0-shot            | mosaicml/mpt-7b |
| language_understanding_lite               | lambada_openai                   |                                     |  0.70328   | 0-shot            | mosaicml/mpt-7b |
| language_understanding_lite               | winograd                         |                                     |  0.868132  | 0-shot            | mosaicml/mpt-7b |
| language_understanding                    | winogrande                       |                                     |  0.685083  | 0-shot            | mosaicml/mpt-7b |
| language_understanding_lm_task_subscore   | bigbench_conlang_translation     |                                     |  0.0670732 | 0-shot            | mosaicml/mpt-7b |
| programming_lite                          | human_eval                       |                                     |  0.0740854 | 0-shot            | mosaicml/mpt-7b |
| programming                               | human_eval_cpp                   |                                     |  0.0562112 | 0-shot            | mosaicml/mpt-7b |
| programming                               | human_eval_js                    |                                     |  0.0408537 | 0-shot            | mosaicml/mpt-7b |

bmosaicml merged commit 1bf0a93 into main on Oct 10, 2023
11 checks passed
bmosaicml deleted the fix_coding_yaml branch on October 10, 2023 at 14:07