New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add benchmarks to CI #481
Add benchmarks to CI #481
Conversation
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
05c3d0b
to
b2911a4
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Results after running on GPUThe following table shows the memory and runtime metrics after running it on CircleCI using Group 1: groupnorm, instancenorm, layernorm, dpmhaThreshold based on paper:
Group 2: Linear layerThreshold based on paper:
Group 3: GSM-DPMHAThreshold based on paper:
Group 4: GRUThreshold based on paper:
Group 5: LSTMThreshold based on paper:
Group 6: RNNThreshold based on paper:
Group 7: EmbeddingThreshold based on paper:
Open points1. Reducing runtime of the jobsRight now I only excluded conv layer since it takes up to an hour when run locally. In total the new tasks increase the pipeline execution time by ~20 minutes, with recurrent layers taking most of that time. This makes the integration pipeline total execution time 32 minutes. Some improvements:
2. Changing thresholdsCurrently I used the paper to infer most of the highlights. The recurrent layer validation has failed though.
|
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
f86fe79
to
19f0a28
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
19f0a28
to
b87eca0
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
b87eca0
to
b8b2cac
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ffuuugor @ashkan-software regarding point number 2 mentioned in #481 (comment) should I just update the threshold of the failing tests to let it pass? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks a lot for this!
On thresholds - yes please, just put the current performance on the CI machine as norm. It might not match the numbers from the paper due to a different hardware - and that's not the point anyway. We want to track the relative changes to the performance, absolute numbers are not that important
On the 30 minutes - it's not a big deal, but I would suggest to move this to nightly tests instead and don't run it for every commit.
I would create a separate job (and not bundle together benchmarks and integration tests) and only run this job on nightly workflow. WDYT?
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ded5a7c
to
82474f1
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
8fb03aa
to
5ee3dd9
Compare
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Final results after updating thresholdsThe jobs will run separately under the name
Group 1: GSM of: (groupnorm, instancenorm, layernorm), and DPMHAThreshold based on paper:
Group 2: GSM-Linear layerThreshold based on paper:
Group 3: GSM-DPMHAThreshold based on paper:
Group 4&5: DPGRU and GSM-DPGRUDPGRU:
GSM-DPGRU:
Group 6&7: DLSTM and GSM-DPLSTMDLSTM
GSMDLSTM
Group 8&9: DPRNN and GSM-DPRNNDPRNN:
GSM-DPRNN:
Group 10: EmbeddingThreshold based on paper:
|
@moaradwan has updated the pull request. You must reimport the pull request before landing. |
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ffuuugor I updated the code as follows:
Consider the comment above for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you for this amazing contribution! LGTM
Please note, that once approved, you shouldn't merge PR on github, but rather land the diff on Phabricator, PR will then be closed automatically
Types of changes
Issue: #368
Motivation and Context / Related issue
There's a task #368 for committing benchmark code. In this change I add these benchmarks into CI integration tests. To choose thresholds I ran the benchmarks locally on all the layers with (batch size: 16, num_runs: 100, num_repeats: 20, forward_only: False), please check the comment below for more details.
Using the report and section 3 in the paper, I parameterised the runtime and memory thresholds for different layers.
How Has This Been Tested (if it applies)
circleci config process .circleci/config.yml
circleci local execute --job JOB_NAME
Checklist