
Paper: Better and faster hyper-parameter optimization with Dask #464

Open · wants to merge 40 commits into base: 2019
@stsievert commented May 21, 2019
Co-authors are Tom Augspurger and Matthew Rocklin. The paper will be built on http://procbuild.scipy.org/download/stsievert-dask-ml-model-selection

@stsievert (Author) commented May 21, 2019

This paper includes some experimental results. As per dask/dask-ml#221 (comment), I plan to re-run these before the conference to pick up a bug fix from dask/dask-ml#497 (rerunning very likely won't change the plots much).

@NickleDave commented Jun 11, 2019

Hey @stsievert I'm one of the reviewers for this.
Nice work! Looks like Hyperband will be a very useful addition to Dask-ML.

Some high level things:

  • Would it make sense to move the paragraph introducing Dask from the start of the related work section to the start of "Adaptive Model Selection in Dask"? It's a little confusing to explain what Dask is and not mention it again for a page, and I think it helps to start the related work section with the 10,000-foot view of hyperparameter optimization
  • Could you say a little more to give an intuitive sense of how Hyperband works when you first introduce it? You say "Hyperband trains many models in parallel and decides to stop models at particular times to preserve computation" and then move on to a sketch of the proof; but as I understand it, some models get stopped at every "iteration" of Hyperband and only the ones whose validation loss is still decreasing at some rate considered promising are kept for the next iteration (right?). I think it would help to explain that only some models stop early while others that seem to be doing better are carried on for more training.
  • I don't find Hyperband docs at the links in the footnotes ... but I'm guessing that's because the PR hasn't been merged yet, and those will work in the future?
  • I notice the plots show loss, which I understand demonstrates how the algorithm works in terms of its derivation, but would it be helpful to show plots with error on a test set? Just to know for sure that the models actually perform better. I saw they do this in the original paper (but that was for massive experiments on many datasets)
  • Would it be possible to provide one more example with a different algorithm?
  • and possibly some toy problem that someone could run somewhat easily when it's local on a laptop? I understand if both of those are a big ask, given that this an algorithm aimed at reducing really expensive hyperparameter search, I just think some more generic example (e.g. a toy convnet classifying Fashion MNIST) might help make the demo more intuitive
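The early-stopping behavior described in the second bullet (only the promising models survive each round) can be sketched as a toy successive-halving loop. Everything here is hypothetical for illustration — the model names and loss curves are made up, and this is not Dask-ML's API:

```python
def successive_halving(models, n_rungs=3, eta=3):
    """Toy sketch of one successive-halving bracket: each rung trains
    the survivors longer, then keeps only the best 1/eta of them."""
    survivors = dict(models)
    budget = 1
    for rung in range(n_rungs):
        # "train" each survivor for `budget` iterations and score it
        scores = {name: loss(budget) for name, loss in survivors.items()}
        # keep only the most promising models (lowest validation loss)
        n_keep = max(1, len(survivors) // eta)
        best = sorted(scores, key=scores.get)[:n_keep]
        survivors = {name: survivors[name] for name in best}
        budget *= eta  # survivors earn eta times more training next rung
    return survivors

# hypothetical loss curves: model i's loss decays at rate (i % 5 + 1)
models = {f"m{i}": (lambda r, i=i: 1.0 / (1 + r * (i % 5 + 1))) for i in range(9)}
final = successive_halving(models)
print(sorted(final))  # the fastest-improving model survives: ['m4']
```

The point of the sketch is the shape of the loop: most models are stopped at an early rung, and only the models whose loss is dropping fastest are carried on for more training.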

After a first pass I would say this is 90–95% of the way to meeting all the criteria.
@deniederhut I can provide a more detailed review later this week.

@stsievert stsievert force-pushed the stsievert:dask-ml-model-selection branch from 20291ee to 540e3d4 Jun 12, 2019

@stsievert (Author) commented Jun 12, 2019

Thanks for the review @NickleDave! I've pushed a couple changes that make the paper easier to interpret IMO.

Before the individual responses, here are my TODOs:

  • move Dask paragraph in "prior work" section
  • add paragraphs better explaining Hyperband
  • in experiments: make validation dataset larger; have validation score be more stable
  • in experiments: (possibly) compare with Scikit-learn's RandomizedSearchCV (or similar) on one machine.
  • (for documentation) smaller example in dask-examples
  • (prior to start of conference) get #221 merged for doc fix

> move the paragraph introducing Dask ... to start of the "Adaptive Model Selection in Dask"?

Good suggestion. Done.

> intuitive [explanation] of how Hyperband works when you first introduce it?

I've edited the paragraphs just under the "Hyperband" section title, and added some words too. I think it better introduces the algorithm. Can you give some more feedback? Thank you.
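To make that introduction concrete, the bracket bookkeeping from the Hyperband paper (Li et al.) can be sketched in a few lines of plain Python. The `max_iter` and `eta` names mirror the algorithm's parameters, but this is an illustrative sketch, not Dask-ML's implementation:

```python
import math

def hyperband_brackets(max_iter=81, eta=3):
    """Return (bracket, n_models, initial_iters) for each Hyperband bracket.

    Each bracket runs successive halving with a different tradeoff between
    the number of models and the training time each model gets."""
    # number of brackets minus one; epsilon guards against float log error
    s_max = math.floor(math.log(max_iter, eta) + 1e-9)
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # models in bracket s
        r = max_iter // eta ** s                         # iters per model at rung 0
        brackets.append((s, n, r))
    return brackets

for s, n, r in hyperband_brackets():
    print(f"bracket s={s}: start {n} models at {r} iterations each")
```

With the defaults this prints the familiar schedule from the Hyperband paper: one aggressive bracket that starts 81 models at 1 iteration each, down to a conservative bracket that trains 5 models for the full 81 iterations.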

> I don't find Hyperband docs at the links in the footnotes ... but I'm guessing that's because the PR hasn't been merged yet, and those will work in the future?

#221 should be merged shortly. I'll update the link then.

> I notice the plots show loss, ... show plots with error on a test set? Just to know for sure that the models actually perform better.

Loss is all I can show because this is a regression problem (thanks – some typos fixed). I show validation loss.

Showing test loss instead of validation loss would require significant implementation work. I'd rather make the validation set very large so it approximates the true error.

> Would it be possible to provide one more example with a different algorithm?

Are you asking to compare one more model selection algorithm alongside HyperbandSearchCV, HyperbandSearchCV(..., patience=True) and RandomizedSearchCV?

I can try an equivalent version of RandomizedSearchCV on one machine, but I'm not sure it'll work.

> I just think some more generic example (e.g. a toy convnet classifying Fashion MNIST) might help make the demo more intuitive

I think future work will be to implement that example in dask-examples. I'm seeing this as a suggestion for documentation improvements, not as an edit for the paper. Am I reading that right?

(I've also added the appendix with the complete code, so now # complete definition in Appendix means something).

@stsievert referenced this pull request Jun 14, 2019: ENH: Hyperband implementation #221 (merged)
@deniederhut (Member) commented Jun 22, 2019

Hey @stsievert ! It looks like you've been making some edits. Feel free to @ us when you feel this is ready for a second look 😄

@stsievert stsievert changed the title Paper: Better and faster model selection with Dask Paper: Better and faster hyper-parameter optimization with Dask Jun 22, 2019

@stsievert (Author) commented Jun 22, 2019

Thanks @deniederhut! I've added some simulations, re-organized a bit and edited with a fine-toothed comb. I've pushed my changes, and I think they're ready for another look. They're visible at http://procbuild.scipy.org/download/stsievert-dask-ml-model-selection.

The largest edit I've made has been adding the simulations as suggested by @NickleDave. This is a good suggestion, and presents one of my results more cleanly. The resulting figure is a good illustration of this:

[screenshot: new simulation figure comparing the model selection algorithms]

This enables cleaner presentation of my two main takeaways:

  • Hyperband is a principled early stopping scheme for RandomizedSearchCV that finds better hyperparameters with less training
  • Hyperband takes less time in the presence of serial or parallel computational resources.
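As a back-of-the-envelope illustration of the first takeaway, one can compare total training iterations under the standard Hyperband schedule (max_iter=81, eta=3) against a random search that gives every model the full budget. These numbers follow the schedule in the Hyperband paper and are purely illustrative, not the paper's experiments:

```python
import math

def hyperband_total_iters(max_iter=81, eta=3):
    """Total training iterations across all Hyperband brackets and rungs."""
    s_max = math.floor(math.log(max_iter, eta) + 1e-9)
    total = 0
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # models in bracket s
        for i in range(s + 1):                # rungs of successive halving
            n_i = n // eta ** i               # models still alive at rung i
            r_i = max_iter // eta ** (s - i)  # iterations each one trains
            total += n_i * r_i
    return total

# random search gives every model the full budget of max_iter iterations
hyperband_iters = hyperband_total_iters()
random_search_iters = 81 * 81  # e.g. 81 models, 81 iterations each
print(hyperband_iters, random_search_iters)  # 1902 vs 6561
```

Even though Hyperband's aggressive bracket alone starts 81 models, early stopping keeps the total work a small multiple of the full budget, roughly 3x less than training 81 models to completion in this toy accounting.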

I am working on improving the graphs that show best score. The graph above has been updated, and the graph of best score vs. time will be updated by tomorrow (hopefully with error bars too).

@stsievert (Author) commented Jun 23, 2019

@NickleDave some of your comments have been more useful than I initially thought. With the synthetic result I can present a new result/figure, and I have opened a pull request implementing the new feature (dask/dask-ml#527). The figure is in the paper and also in dask/dask-ml#527 (comment).

@NickleDave commented Jun 24, 2019

I'm glad the comments were helpful.
My first impression is these additional figures do help clarify things.
Please do let us know when you are ready for another review.

@deniederhut how fixed is the deadline of the 25th?
I need to finish something that day but will have a good chunk of time to review Wednesday the 26th

@deniederhut (Member) commented Jun 25, 2019

> opened a pull request implementing the new feature

Whoa! This is one of the cooler things I've seen come out of our open review process. Come find me at SciPy and I'll buy both of you a beer 😄

> how fixed is the deadline

We try not to publicize this, but we build in some wiggle room to the deadlines. If Scott is okay getting the feedback on Wed, then it's okay with me

@stsievert (Author) commented Jun 25, 2019

> If Scott is okay getting the feedback on Wed, then it's okay with me

Yup, fine by me. I'll try to reply on Wednesday night.

> Please do let us know when you are ready for another review.

I think I'm ready for another review as mentioned in #464 (comment). I'll put in some work to try and update the image denoising figure before Wednesday (though I believe only the image will change).

> Come find me at SciPy and I'll buy both of you a beer 😄

👍 🍺 plus GitHub user names will be on the badges this year!

Resolved review threads on papers/scott_sievert/hyperband.rst. One inline comment at the "Parallel Experiments" section heading:

> evaluates every hyperparameter value Hyperband evaluates.
> Parallel Experiments
> ====================

@stsievert (Author) commented Jun 26, 2019
I still need to re-run these simulations – the job prioritization needed for this is now ready in dask/dask-ml#527

@deniederhut (Member) commented Jun 27, 2019

@NickleDave did you find some time to re-review yesterday?
