Update scaling experiment with K=1. #8
Conversation
liehe
commented
Mar 11, 2019
- Update the experiment which benchmarks the linear scaling rule against the K=1 baseline.
- Delete the cost / throughput figures in the benchmark task results.
- Add speedup plots to the benchmark task results.
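For context, the linear scaling rule sets the learning rate proportional to the number of workers K, so K=1 recovers the baseline exactly. A minimal sketch of the idea (function and parameter names are illustrative, not this repo's actual config):

```python
# Linear scaling rule: multiply the base learning rate by the number of
# workers K (the global mini-batch size grows by the same factor K).
def scaled_lr(base_lr, k, warmup_epochs=None, epoch=None):
    """Learning rate under the linear scaling rule.

    base_lr: learning rate tuned for the K=1 baseline.
    k:       number of workers; k=1 returns base_lr unchanged.
    Optionally applies a gradual warmup, ramping linearly from base_lr
    to k * base_lr over the first `warmup_epochs` epochs.
    """
    lr = base_lr * k
    if warmup_epochs and epoch is not None and epoch < warmup_epochs:
        lr = base_lr + (lr - base_lr) * epoch / warmup_epochs
    return lr
```

The warmup part is optional; it is one common way to avoid the early-training instability that shows up at larger K.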
thanks a lot!
indeed
@@ -101,6 +101,7 @@ Image classification is one of the most important problems in computer vision an
#. **Training Algorithm**
We use standard synchronous SGD as the optimizer (that is, distributed mini-batch SGD with synchronous all-reduce communication after each mini-batch).

- Model: Resnet 20
add k=1
:align: center
* The second figure shows speedups of time-to-accuracy for Top-1 accuracy levels 70%, 75%, 80%, 85%, 90%, 91%. Note that a speedup of 0 means the specified accuracy is not reached within the predefined maximum number of epochs. The linear scaling rule does not outperform the baseline for accuracy <= 85%. However, in order to reach 90%+ accuracy, using linear scaling is much better than the baseline.
* The first figure shows speedups of time-to-accuracy for Top-1 accuracy levels 70%, 75%, 80%, 85%, 90%, 91%. Note that a speedup of 0 means the specified accuracy is not reached within the predefined maximum number of epochs. The same accuracy is (relatively) much slower to reach as the number of machines grows. This is known as the large-batch training problem. The linear scaling rule does not outperform the baseline for accuracy <= 85%. However, to reach 90% or higher accuracy, linear scaling is clearly better than the baseline.
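To make "speedup of time-to-accuracy" concrete, here is a hedged sketch of how it could be computed from run logs (the log format is an assumption for illustration, not this benchmark's actual output):

```python
def time_to_accuracy(log, target):
    """Wall-clock seconds at which Top-1 accuracy first reaches `target`,
    or None if the run never reaches it within its epoch budget.

    `log` is a time-ordered list of (elapsed_seconds, top1_accuracy) pairs.
    """
    for elapsed, acc in log:
        if acc >= target:
            return elapsed
    return None

def speedup(baseline_log, scaled_log, target):
    """Speedup of a scaled run over the K=1 baseline at one accuracy level.

    Returns 0.0 when either run fails to reach the target, matching the
    convention in the plots (0 speedup = accuracy not reached).
    """
    base = time_to_accuracy(baseline_log, target)
    scaled = time_to_accuracy(scaled_log, target)
    if base is None or scaled is None:
        return 0.0
    return base / scaled
```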
let's maybe just focus on two accuracy levels (easy vs hard). but comment on how the initial stepsize is chosen (and whether failing at larger K can be avoided by tuning it). precise reproducible stepsizes then need to be written into the benchmark task description
Thanks Lie. I think we can just keep the plots for accuracy levels 91% and 70% and explain the results for the levels in between. We should also explain how we chose the initial learning rate.
ok. also, instead of all the raw data, please put just the data for the main plot (as a CSV, say). @Panaetius what do you think?
@martinjaggi Makes sense. Maybe we should make a repository for just the raw data. Not so much to present to the public (i.e. maybe just a small link somewhere in the docs) but for re-use, so we don't need to re-run each experiment every time we need the data.
yes. but the plots which are part of the official task results should even
have their own csv / pickle for easier reference and export
hasn't been merged yet. might need to make a small script to generate the official results (time-to-accuracy etc.) from the raw results. needs to be automatic since it's part of the official benchmark
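Such a script could be quite small. A hedged sketch, assuming the raw results are rows with `run`, `seconds`, and `top1` columns (e.g. from `csv.DictReader`; the column names and accuracy levels are placeholders, not the repo's actual schema):

```python
# Hypothetical "official results" generation step: from raw per-epoch rows,
# emit one record per (run, accuracy level) with the time-to-accuracy,
# ready to be written out as a CSV next to each plot.
LEVELS = [0.70, 0.91]  # easy vs. hard, per the discussion above

def generate_results(raw_rows, levels=LEVELS):
    # Group the per-epoch measurements by run.
    runs = {}
    for row in raw_rows:
        runs.setdefault(row["run"], []).append(
            (float(row["seconds"]), float(row["top1"]))
        )
    # For each run and target level, record when the level is first hit
    # (None if it never is, which the plots render as a speedup of 0).
    out = []
    for run, log in runs.items():
        log.sort()
        for level in levels:
            hit = next((s for s, acc in log if acc >= level), None)
            out.append({"run": run, "level": level, "time_to_acc": hit})
    return out
```

Writing `out` with `csv.DictWriter` would then give each official plot its own CSV for easier reference and export, as suggested above.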