Add opacus microbenchmarks code #368
Comments
moaradwan pushed a commit to moaradwan/opacus that referenced this issue on Aug 25, 2022
Summary:

## Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue
There is a task, pytorch#368, for committing benchmark code. This change adds the benchmarks to the CI integration tests. To choose thresholds, I ran the benchmarks locally on all the layers with (batch size: 16, num_runs: 100, num_repeats: 20, forward_only: False) and generated the following report:

| base_layer   | memory* control | memory* dp | memory* dp/control | memory* gsm | memory* gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|--------------|-----------------|------------|--------------------|-------------|---------------------|-----------------|------------|--------------------|-------------|---------------------|
| conv         | 0.0 | | | 0.0 | | 2.021756922606001 | | | 3.2889059911645036 | 1.6267563891534373 |
| embedding    | 0.0 | | | 0.0 | | 0.002484286398502263 | | | 0.013664713416999803 | 5.5004581698946 |
| groupnorm    | 0.0 | | | 0.0 | | 0.0001871487290072764 | | | 0.00043170701800136156 | 2.306759016165034 |
| gru          | 0.0 | 0.0 | | 0.0 | | 0.045029744959007065 | 0.057370035271503174 | 1.2740475284443677 | 0.2402042072270033 | 5.334345274344187 |
| instancenorm | 0.0 | | | 0.0 | | 0.004493124293996517 | | | 0.006058429501005777 | 1.3483779002287433 |
| layernorm    | 0.0 | | | 0.0 | | 0.00011227587499979562 | | | 0.0002241125804985131 | 1.9960884784814286 |
| linear       | 0.0 | | | 0.0 | | 0.001010556231000001 | | | 0.003052972127999998 | 3.021080900148341 |
| lstm         | 0.0 | 0.0 | | 0.0 | | 0.052634652085002925 | 0.06508583683050075 | 1.2365586975931682 | 0.2982182763324963 | 5.665816425477371 |
| mha          | 0.0 | 0.0 | | 0.0 | | 0.018872260358001765 | 0.01870937360499738 | 0.9913689854890476 | 0.02688384014700477 | 1.424516175435558 |
| rnn          | 0.0 | 0.0 | | 0.0 | | 0.01576623683249454 | 0.02184348723049516 | 1.3854597937711604 | 0.10178373254250346 | 6.455803856296582 |

(*) This report was not generated on a machine with CUDA, so memory was not measured. It will be updated once the benchmarks run in CI on a GPU machine.

Using the report and section 3 of the [paper](https://arxiv.org/pdf/2109.12298.pdf), I parameterised the runtime and memory thresholds for the different layers.

## How Has This Been Tested (if it applies)
I ran the jobs locally and generated reports.

## Checklist
- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [ ] All tests passed, and additional code has been covered with new tests.

Pull Request resolved: pytorch#479
Differential Revision: D38999201
Pulled By: moaradwan
fbshipit-source-id: 3d02931970e39ea331674c9f0676db9e22c5edaa
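The runtime columns above are mean per-call times, with "dp/control" and "gsm/control" giving the slowdown of the DP-wrapped layer relative to the plain one. As a minimal, library-free sketch of how such ratios can be produced (the `num_runs`/`num_repeats` knobs mirror the parameters quoted in the summary; `control` and `dp` are hypothetical stand-ins, not Opacus's actual benchmark harness):

```python
import statistics
import time

def benchmark(fn, num_runs=100, num_repeats=20):
    """Return the mean wall-clock seconds per call of `fn`.

    Each repeat times a batch of `num_runs` calls, and the
    per-call means of all repeats are averaged.
    """
    per_call = []
    for _ in range(num_repeats):
        start = time.perf_counter()
        for _ in range(num_runs):
            fn()
        per_call.append((time.perf_counter() - start) / num_runs)
    return statistics.mean(per_call)

# Hypothetical stand-ins for a plain layer and its DP-wrapped variant.
control = lambda: sum(range(1_000))
dp = lambda: sum(range(2_000))

# Analogous to the "dp/control" slowdown ratio in the report.
ratio = benchmark(dp, num_runs=50, num_repeats=5) / benchmark(control, num_runs=50, num_repeats=5)
assert ratio > 0
```

Averaging over repeats rather than taking a single timing reduces noise from scheduler jitter, which matters when the ratio is later compared against a CI threshold.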
facebook-github-bot pushed a commit that referenced this issue on Aug 31, 2022
Summary:

## Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

Issue: #368

## Motivation and Context / Related issue
There is a task, #368, for committing benchmark code. This change adds the benchmarks to the CI integration tests. To choose thresholds, I ran the benchmarks locally on all the layers with (batch size: 16, num_runs: 100, num_repeats: 20, forward_only: False) ~~and generated the following report:~~ (please check the [comment below](#481 (comment)))

Using the [report](#481 (comment)) and section 3 of the [paper](https://arxiv.org/pdf/2109.12298.pdf), I parameterised the runtime and memory thresholds for the different layers.

## How Has This Been Tested (if it applies)
- I ran the jobs locally and generated reports.
- Local CircleCI config validation: `circleci config process .circleci/config.yml`
- Local CircleCI job run: `circleci local execute --job JOB_NAME`

## Checklist
- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [x] All tests passed, and additional code has been covered with new tests.

Pull Request resolved: #481
Reviewed By: ffuuugor
Differential Revision: D39026827
Pulled By: moaradwan
fbshipit-source-id: 5c6a7b1b5faaca9e5e95bd172b479a02f59b2b69
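Once per-layer ratios are measured, the CI integration test boils down to comparing each ratio against its parameterised threshold. A hedged sketch of that check (the threshold values below are illustrative, loosely shaped like the gsm/control column of the report, and are not Opacus's actual configured thresholds):

```python
# Hypothetical per-layer slowdown thresholds; the real values live in
# the benchmark CI configuration, not here.
THRESHOLDS = {"linear": 4.0, "layernorm": 3.0, "rnn": 8.0}

def check_runtime(layer: str, gsm_runtime: float, control_runtime: float) -> bool:
    """Pass iff the DP-wrapped layer's slowdown stays within its threshold."""
    return gsm_runtime / control_runtime <= THRESHOLDS[layer]

assert check_runtime("linear", 3.0, 1.0)       # 3x slowdown, under the 4.0 limit
assert not check_runtime("rnn", 9.0, 1.0)      # 9x slowdown, over the 8.0 limit
```

Keying thresholds by layer lets fast layers (e.g. mha, with a ratio near 1.4) be held to tighter limits than inherently slower ones like rnn, instead of one global ceiling masking regressions.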
Merge the code used for Opacus microbenchmarks into the Opacus repo.