Paper: Better and faster hyper-parameter optimization with Dask #464

Merged
merged 49 commits into from Jul 14, 2019
Commits
62327be
Initial draft
stsievert Apr 27, 2019
b9182b9
Firming up first draft
stsievert Apr 29, 2019
701da08
Fleshing out introduction
stsievert Apr 29, 2019
c187268
Add hyperband prior work section
stsievert May 2, 2019
8dc0ff1
Add section on Dask model selection
stsievert May 2, 2019
4dbb4a2
Implements section on patience (along with more edits and imgs)
stsievert May 4, 2019
3222b58
Add images
stsievert May 4, 2019
d363d94
Edits on algorithm, patience, footnotes
stsievert May 4, 2019
9f6f519
Adds abstract, edits algorithm and makes more edits
stsievert May 4, 2019
054e19c
Adds image for IO and moves to imgs/
stsievert May 6, 2019
bbff530
Adds example problem formulation and Hyperband usage
stsievert May 6, 2019
d01afa3
Modifies images (removes title, makes smaller)
stsievert May 8, 2019
eb32aa7
Word smiths
stsievert May 8, 2019
aab2b2d
Make edits and gives URLs/DOIs
stsievert May 12, 2019
b5cac23
Edits
stsievert May 12, 2019
cce1023
Makes dask-searchcv integratation more clear
stsievert May 20, 2019
7c1265a
Adds appendix back in
stsievert Jun 11, 2019
b0a7c4a
Moves Dask section and provides model selection overview
stsievert Jun 11, 2019
7b7ce96
Explains more of Hyperband
stsievert Jun 11, 2019
236d61b
Provides more detail for hyper-parameter choices
stsievert Jun 11, 2019
540e3d4
Further edit hyperband description (and better define pseudo-code)
stsievert Jun 11, 2019
ec60f6e
Makes edits to respond to reviews
stsievert Jun 20, 2019
8d5eddd
Edits abstract and keywords
stsievert Jun 21, 2019
b76979c
model selection → hyper-parameter optimization
stsievert Jun 21, 2019
16038a8
Adds serial simulation
stsievert Jun 22, 2019
de92fcf
Edits and reorganization
stsievert Jun 22, 2019
2d4c009
Adds new graph for synthetic result
stsievert Jun 22, 2019
b1daaa9
hyper-parameter → hyperparameter
stsievert Jun 22, 2019
41a5a50
Edits with a fine-tooth comb
stsievert Jun 22, 2019
4cbaf24
Updates simulation accuracy curve
stsievert Jun 22, 2019
1a949b1
Updates simulations numbers
stsievert Jun 22, 2019
fdc0504
Adds results on priortizing fits
stsievert Jun 23, 2019
219ab7c
Adds note on prioritization
stsievert Jun 23, 2019
c38e299
Wording nit
stsievert Jun 23, 2019
6d41292
Changes language on aggressiveness and priorities
stsievert Jun 23, 2019
74300c7
Reorder figures
stsievert Jun 24, 2019
9a307fc
Edits prioritization figure
stsievert Jun 24, 2019
0bb196e
Updates figure to reflect changes in https://github.com/dask/dask-ml/…
stsievert Jun 26, 2019
c62eafc
Add :corresponding: tag and small wording change
stsievert Jun 26, 2019
35b175a
Adds note about negative losses
stsievert Jun 26, 2019
f4bd0aa
Responds to review
stsievert Jul 1, 2019
3cba3cc
Adds results from parallel experimens
stsievert Jul 3, 2019
75df0ae
Makes modification to parallel exps
stsievert Jul 5, 2019
ea54112
Small edit
stsievert Jul 5, 2019
cefd6aa
Draft; all experiments are there
stsievert Jul 8, 2019
4134271
Bug on how many better
stsievert Jul 10, 2019
2537f01
Small edits
stsievert Jul 12, 2019
ac404ab
Wording changes
stsievert Jul 13, 2019
0a2ef7f
Add ending sentence
stsievert Jul 13, 2019
950 changes: 950 additions & 0 deletions papers/scott_sievert/hyperband.rst

Binary file added papers/scott_sievert/imgs/2019-03-24-calls.png
Binary file added papers/scott_sievert/imgs/2019-03-24-time.png
Binary file added papers/scott_sievert/imgs/io+est original.png
Binary file added papers/scott_sievert/imgs/io+est.png
Binary file added papers/scott_sievert/imgs/io.png
Binary file added papers/scott_sievert/imgs/synthetic-dataset.png
Binary file added papers/scott_sievert/imgs/synthetic-priority.pdf
Binary file added papers/scott_sievert/imgs/synthetic-val-acc.pdf
347 changes: 347 additions & 0 deletions papers/scott_sievert/refs.bib
@@ -0,0 +1,347 @@
@article{bergstra2012random,
author = {Bergstra, James and Bengio, Yoshua},
title = {Random search for hyper-parameter optimization},
journal = {Journal of Machine Learning Research},
volume = {13},
number = {Feb},
pages = {281--305},
year = {2012},
url = {http://jmlr.csail.mit.edu/papers/v13/bergstra12a.html},
}

@article{pedregosa2011,
author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
title = {Scikit-learn: Machine learning in {P}ython},
journal = {Journal of Machine Learning Research},
volume = {12},
number = {Oct},
pages = {2825--2830},
year = {2011},
url = {http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html},
}

@article{li2016hyperband,
author = {Lisha Li and Kevin Jamieson and Giulia DeSalvo and Afshin Rostamizadeh and Ameet Talwalkar},
title = {{Hyperband}: A Novel Bandit-Based Approach to Hyperparameter Optimization},
journal = {Journal of Machine Learning Research},
year = {2018},
volume = {18},
number = {185},
pages = {1--52},
url = {http://jmlr.org/papers/v18/16-558.html}
}

@inproceedings{hutter2011,
author = {Hutter, Frank and Hoos, Holger H. and Leyton-Brown, Kevin},
title = {Sequential model-based optimization for general algorithm configuration},
booktitle = {Learning and Intelligent Optimization (LION 5)},
series = {Lecture Notes in Computer Science},
volume = {6683},
publisher = {Springer},
pages = {507--523},
year = {2011},
doi = {10.1007/978-3-642-25566-3_40},
}


@inproceedings{bergstra2011,
title = {Algorithms for Hyper-Parameter Optimization},
author = {Bergstra, James S. and Bardenet, R\'{e}mi and Bengio, Yoshua and K\'{e}gl, Bal\'{a}zs},
booktitle = {Advances in Neural Information Processing Systems 24},
editor = {J. Shawe-Taylor and R. S. Zemel and P. L. Bartlett and F. Pereira and K. Q. Weinberger},
pages = {2546--2554},
year = {2011},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf}
}

@inproceedings{snoek2012,
title = {Practical Bayesian Optimization of Machine Learning Algorithms},
author = {Snoek, Jasper and Larochelle, Hugo and Adams, Ryan P},
booktitle = {Advances in Neural Information Processing Systems 25},
editor = {F. Pereira and C. J. C. Burges and L. Bottou and K. Q. Weinberger},
pages = {2951--2959},
year = {2012},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf}
}



@inproceedings{kleinbayesopt17,
author = {A. Klein and S. Falkner and N. Mansur and F. Hutter},
title = {RoBO: A Flexible and Robust Bayesian Optimization Framework in Python},
booktitle = {NIPS 2017 Bayesian Optimization Workshop},
year = {2017},
month = dec,
url = {https://github.com/automl/RoBO},
}

@inproceedings{falkner2018,
title = {{BOHB}: Robust and Efficient Hyperparameter Optimization at Scale},
author = {Falkner, Stefan and Klein, Aaron and Hutter, Frank},
booktitle = {Proceedings of the 35th International Conference on Machine Learning},
pages = {1437--1446},
year = {2018},
editor = {Dy, Jennifer and Krause, Andreas},
volume = {80},
series = {Proceedings of Machine Learning Research},
address = {Stockholmsmässan, Stockholm Sweden},
month = {10--15 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf},
url = {http://proceedings.mlr.press/v80/falkner18a.html},
}

@Article{klein2016,
author = {Klein, Aaron and Falkner, Stefan and Bartels, Simon and Hennig, Philipp and Hutter, Frank},
title = {Fast {B}ayesian optimization of machine learning hyperparameters on large datasets},
journal = {arXiv preprint arXiv:1605.07079},
url = {https://arxiv.org/abs/1605.07079},
year = {2016},
}


@Article{tibshirani1996,
author = {Tibshirani, Robert},
title = {Regression shrinkage and selection via the lasso},
journal = {Journal of the Royal Statistical Society: Series B (Methodological)},
volume = {58},
number = {1},
pages = {267--288},
doi = {10.1111/j.2517-6161.1996.tb02080.x},
year = {1996}}

@Article{marquardt1975,
author = {Marquardt, Donald W. and Snee, Ronald D.},
title = {Ridge Regression in Practice},
journal = {The American Statistician},
volume = {29},
number = {1},
pages = {3--20},
year = {1975},
publisher = {Taylor \& Francis},
doi = {10.1080/00031305.1975.10479105}}

@Article{wattenberg2016,
author = {Wattenberg, Martin and Viégas, Fernanda and Johnson, Ian},
title = {How to Use t-SNE Effectively},
journal = {Distill},
year = {2016},
url = {http://distill.pub/2016/misread-tsne},
doi = {10.23915/distill.00002}
}


@Article{kaufmann2015complexity,
author = {Kaufmann, Emilie and Capp{\'e}, Olivier and Garivier, Aur{\'e}lien},
title = {On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models},
journal = {Journal of Machine Learning Research},
year = {2016},
volume = {17},
number = {1},
pages = {1--42},
url = {http://jmlr.org/papers/v17/kaufman16a.html}
}

@Incollection{bottou2012stochastic,
author = {Bottou, L\'{e}on},
title = {Stochastic Gradient Tricks},
booktitle = {Neural Networks, Tricks of the Trade, Reloaded},
pages = {430--445},
editor = {Montavon, Gr\'{e}goire and Orr, Genevieve B. and M\"{u}ller, Klaus-Robert},
series = {Lecture Notes in Computer Science (LNCS 7700)},
publisher = {Springer},
year = {2012},
url = {http://leon.bottou.org/papers/bottou-tricks-2012},
}

@InProceedings{shamir2013,
title = {Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes},
author = {Ohad Shamir and Tong Zhang},
booktitle = {Proceedings of the 30th International Conference on Machine Learning},
pages = {71--79},
year = {2013},
editor = {Sanjoy Dasgupta and David McAllester},
volume = {28},
number = {1},
series = {Proceedings of Machine Learning Research},
address = {Atlanta, Georgia, USA},
month = {17--19 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v28/shamir13.pdf},
url = {http://proceedings.mlr.press/v28/shamir13.html},
}


@article{prechelt1998automatic,
title={Automatic early stopping using cross validation: quantifying the criteria},
author={Prechelt, Lutz},
journal={Neural Networks},
volume={11},
number={4},
pages={761--767},
year={1998},
publisher={Elsevier},
doi = {10.1016/S0893-6080(98)00010-0},
}

@Incollection{bottou2010large,
author = {Bottou, L\'{e}on},
title = {Large-Scale Machine Learning with Stochastic Gradient Descent},
year = {2010},
booktitle = {Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010)},
editor = {Lechevallier, Yves and Saporta, Gilbert},
address = {Paris, France},
month = {August},
publisher = {Springer},
pages = {177--187},
url = {http://leon.bottou.org/papers/bottou-2010},
}

@inproceedings{paszke2017automatic,
title={Automatic differentiation in PyTorch},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS-W},
year={2017},
url = {https://openreview.net/pdf?id=BJJsrmfCZ},
}

@article{maaten2008visualizing,
title={Visualizing data using t-SNE},
author={Maaten, Laurens van der and Hinton, Geoffrey},
journal={Journal of Machine Learning Research},
volume={9},
number={Nov},
pages={2579--2605},
year={2008},
url={http://jmlr.csail.mit.edu/papers/v9/vandermaaten08a.html},
}

@Manual{dask,
title = {Dask: Library for dynamic task scheduling},
author = {{Dask Development Team}},
year = {2016},
url = {https://dask.org},
}

@article{gilbert1992global,
title={Global convergence properties of conjugate gradient methods for optimization},
author={Gilbert, Jean Charles and Nocedal, Jorge},
journal={SIAM Journal on Optimization},
volume={2},
number={1},
pages={21--42},
year={1992},
publisher={SIAM},
doi={10.1137/0802003},
}

@incollection{maren2015prob,
title = {Probabilistic Line Searches for Stochastic Optimization},
author = {Mahsereci, Maren and Hennig, Philipp},
booktitle = {Advances in Neural Information Processing Systems 28},
editor = {C. Cortes and N. D. Lawrence and D. D. Lee and M. Sugiyama and R. Garnett},
pages = {181--189},
year = {2015},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/5753-probabilistic-line-searches-for-stochastic-optimization.pdf}
}

@inproceedings{leaky-relu,
title={Rectifier nonlinearities improve neural network acoustic models},
author={Maas, Andrew L and Hannun, Awni Y and Ng, Andrew Y},
booktitle={Proc. ICML},
volume={30},
number={1},
pages={3},
year={2013}
}

@inproceedings{relu,
title={Rectified linear units improve restricted boltzmann machines},
author={Nair, Vinod and Hinton, Geoffrey E},
booktitle={Proceedings of the 27th international conference on machine learning (ICML-10)},
pages={807--814},
year={2010}
}

@inproceedings{prelu,
title={Delving deep into rectifiers: Surpassing human-level performance on imagenet classification},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1026--1034},
year={2015}
}


@article{elu,
title={Fast and accurate deep network learning by exponential linear units (elus)},
author={Clevert, Djork-Arn{\'e} and Unterthiner, Thomas and Hochreiter, Sepp},
journal={arXiv preprint arXiv:1511.07289},
year={2015}
}

@inproceedings{xavier,
title={Understanding the difficulty of training deep feedforward neural networks},
author={Glorot, Xavier and Bengio, Yoshua},
booktitle={Proceedings of the thirteenth international conference on artificial intelligence and statistics},
pages={249--256},
year={2010}
}

@inproceedings{kaiming,
title={Delving deep into rectifiers: Surpassing human-level performance on imagenet classification},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1026--1034},
year={2015}
}

@article{adam,
title={Adam: A method for stochastic optimization},
author={Kingma, Diederik P and Ba, Jimmy},
journal={arXiv preprint arXiv:1412.6980},
year={2014}
}

@Book{nesterov2013a,
author = {Nesterov, Yurii},
title = {Introductory lectures on convex optimization: A basic course},
series = {Applied Optimization},
volume = {87},
publisher = {Springer Science \& Business Media},
year = {2013},
doi = {10.1007/978-1-4419-8853-9},
}

@Article{bubeck2015convex,
author = {Bubeck, Sébastien},
title = {Convex optimization: Algorithms and complexity},
journal = {Foundations and Trends in Machine Learning},
volume = {8},
number = {3-4},
pages = {231--357},
year = {2015},
}

@Article{wilson2017b,
author = {Wilson, Ashia C and Roelofs, Rebecca and Stern, Mitchell and Srebro, Nathan and Recht, Benjamin},
title = {The Marginal Value of Adaptive Gradient Methods in Machine Learning},
journal = {arXiv preprint arXiv:1705.08292},
year = {2017},
}