GSoC 2015 Proposal: Global-optimization-based hyper-parameter optimization
Name: Christof Angermueller
Web : http://cangermueller.com
Github : cangermueller
Twitter : @cangermueller
1 About me
I am a PhD student in Bioinformatics at the University of Cambridge and the European Bioinformatics Institute (EBI/EMBL), supervised by Oliver Stegle and Zoubin Ghahramani. My research focuses on machine learning algorithms for analyzing high-dimensional biological data, including Bayesian matrix factorization methods and deep neural networks. I have been using Python regularly for more than five years, both for my research and as a general-purpose programming language, and I implemented a Python package for variational Bayesian matrix factorization (vbmfa). I aim to contribute to open-source development in the long run, starting with the following GSoC topic for scikit-learn. I decided to contribute to scikit-learn because a) I use it on a daily basis for various machine learning tasks, b) it has a very active community, and c) it allows me to combine my interests in machine learning and software engineering.
Scikit-learn is a suite of machine learning algorithms. These algorithms often have hyper-parameters, which are not learnt from the data but must be specified by the user. Optimizing hyper-parameters is critical for generalization performance, and often time-consuming, since it requires retraining the machine learning algorithm and evaluating its fitness for every hyper-parameter setting considered.
Currently, scikit-learn only provides GridSearchCV for exhaustive grid search and RandomizedSearchCV for randomized hyper-parameter optimization. While the former becomes intractable if the hyper-parameter space is large, the latter has been outperformed by more sophisticated sequential model-based optimization (SMBO) algorithms (Bergstra et al.), which take previous evaluations of the fitness function into account to choose the parameters with the highest chance of improvement.
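As a concrete baseline, the existing randomized search can be used as follows. This is a minimal sketch on a toy problem; module paths follow recent scikit-learn releases, and the parameter distributions are arbitrary choices for illustration, not recommendations:

```python
# Minimal sketch of the existing RandomizedSearchCV baseline on a toy dataset.
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sample C and gamma from continuous distributions instead of a fixed grid.
param_distributions = {"C": expon(scale=10.0), "gamma": expon(scale=0.1)}

search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=10, cv=3, random_state=0
)
search.fit(X, y)
print(search.best_params_)  # best sampled configuration
```

Each of the `n_iter` draws is evaluated independently; an SMBO method instead uses the scores of earlier draws to decide where to sample next.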
The goal of the proposed GSoC topic is to implement an SMBO algorithm based on Gaussian Processes (GPs) as described by Snoek et al., which a) explores the hyper-parameter space more efficiently and more accurately than RandomizedSearchCV, b) is well integrated into the scikit-learn infrastructure, and c) is easy to use for transparently optimizing various scikit-learn algorithms.
3.1 Basic features
GPSearchCV (tentative name) will be a new scikit-learn module for hyper-parameter optimization based on Gaussian Processes. It will be similar to the spearmint Python package, but more lightweight and consistent with the scikit-learn framework. It will have the same interface as RandomizedSearchCV, and will make it easy to optimize the hyper-parameters of different scikit-learn algorithms such as SVMs and random forests. A Gaussian Process with a Matérn 5/2 kernel will serve as the prior distribution over the fitness function, and the expected improvement acquisition function will be used to choose the next hyper-parameters. In the first version, kernel parameters will be estimated by maximum likelihood, since integrating over them is currently not supported by the scikit-learn Gaussian Process module and would also be computationally more costly. Implementing slice sampling for a Bayesian treatment of the kernel parameters is one of my later goals (see below).
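The two core ingredients can be written down in a few lines of NumPy/SciPy. The sketch below is a toy illustration of the formulas, not the eventual GPSearchCV code: the Matérn 5/2 covariance used by Snoek et al., and the expected improvement criterion for minimizing a fitness function, given the GP posterior mean and standard deviation at a candidate point:

```python
import numpy as np
from scipy.stats import norm

def matern52(r, length_scale=1.0):
    """Matern 5/2 covariance as a function of the distance r >= 0."""
    s = np.sqrt(5.0) * r / length_scale
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def expected_improvement(mu, sigma, f_best):
    """EI for minimization: E[max(f_best - f(x), 0)] with f(x) ~ N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```

In GPSearchCV, the next configuration to evaluate would be the candidate maximizing `expected_improvement` over the search space.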
3.2 Unit tests
I will write unit tests to ensure that my implementation works correctly in different scenarios.
3.3 Benchmarks
The GPSearchCV module will only become popular if it yields superior results over existing modules. I will therefore compare its performance with spearmint, evaluating both on hyper-parameter optimization of different classification and regression algorithms.
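A benchmark of this kind can be set up along the following lines. This is an illustrative sketch comparing the two existing modules under an equal evaluation budget; GPSearchCV would be added as a third contender once implemented:

```python
# Compare grid search and randomized search with an equal budget of
# 9 fitness evaluations each, on a toy classification task.
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    SVC(), {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}, cv=3
)
rand = RandomizedSearchCV(
    SVC(), {"C": expon(scale=10.0), "gamma": expon(scale=0.1)},
    n_iter=9, cv=3, random_state=0,
)

scores = {}
for name, search in [("grid", grid), ("randomized", rand)]:
    search.fit(X, y)
    scores[name] = search.best_score_  # best mean CV accuracy
print(scores)
```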
3.4 Documentation
I will follow the scikit-learn conventions to document all interfaces, and write code examples showing how to use GPSearchCV. Links to my benchmark results will allow users to judge which optimization module is most appropriate for their task.
4 Additional features
If time remains after I have accomplished all milestones described above, I will work on the following points, in the given order:
4.1 Marginalizing over kernel parameters
A Bayesian treatment of the kernel parameters, marginalizing over them instead of maximizing the likelihood, is computationally more costly but improves generalization performance. I will implement slice sampling to approximate the integral over these hyper-parameters, which might also be incorporated into the scikit-learn Gaussian Process module.
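Slice sampling itself is compact to implement. The sketch below is a univariate stepping-out sampler in the style of Neal (2003), demonstrated on a toy log-density rather than on GP kernel parameters; the function name and defaults are my own choices for illustration:

```python
import numpy as np

def slice_sample_1d(logp, x0, n_samples, width=1.0, rng=None):
    """Univariate slice sampling with stepping-out and shrinkage (Neal, 2003)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        log_y = logp(x) + np.log(rng.uniform())  # vertical level defining the slice
        # Step out until the bracket [left, right] contains the slice.
        left = x - width * rng.uniform()
        right = left + width
        while logp(left) > log_y:
            left -= width
        while logp(right) > log_y:
            right += width
        # Shrink the bracket until a point inside the slice is drawn.
        while True:
            xp = rng.uniform(left, right)
            if logp(xp) > log_y:
                x = xp
                break
            if xp < x:
                left = xp
            else:
                right = xp
        samples[i] = x
    return samples
```

Applied to the log marginal likelihood of the GP as `logp`, this yields posterior samples of the kernel parameters, over which the acquisition function can be averaged.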
4.2 Parallelization of hyper-parameter search
Instead of waiting until the fitness function has been evaluated for one hyper-parameter configuration before trying new parameters, several configurations can be evaluated in parallel to speed up the optimization. I will use all completed evaluations and integrate over pending ones to approximate the expected acquisition function, from which multiple hyper-parameter configurations can be chosen in parallel.
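One way to implement this is the Monte-Carlo "fantasizing" scheme of Snoek et al.: draw possible outcomes for the pending evaluations from the GP posterior, condition on each draw, and average the resulting acquisition values. The sketch below uses scikit-learn's current GP module; the function names and the toy data are my own assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ei(mu, sigma, f_best):
    """Expected improvement for minimization."""
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def expected_ei_over_pending(X_obs, y_obs, X_pend, X_cand, n_fantasy=20, seed=0):
    """Monte-Carlo estimate of EI at candidate points, integrating over
    the unknown outcomes of pending evaluations."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    # Draw fantasy outcomes for the pending points from the GP posterior.
    y_fant = gp.sample_y(X_pend, n_samples=n_fantasy, random_state=seed)
    acq = np.zeros(len(X_cand))
    for j in range(n_fantasy):
        # Condition on observed data plus one fantasy; keep kernel params fixed.
        gp_f = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                        normalize_y=True)
        gp_f.fit(np.vstack([X_obs, X_pend]),
                 np.concatenate([y_obs, y_fant[:, j]]))
        mu, sigma = gp_f.predict(X_cand, return_std=True)
        acq += ei(mu, sigma, np.min(np.concatenate([y_obs, y_fant[:, j]])))
    return acq / n_fantasy
```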
4.3 Additional acquisition functions
I will implement two alternative acquisition functions, which might be better suited than expected improvement for certain optimization tasks: 1) the probability of improvement, and 2) the GP upper confidence bound (Srinivas et al.), which has an additional parameter to trade off exploitation versus exploration.
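Both criteria are short expressions given the GP posterior. The sketch below writes them for a minimization problem, so the confidence-bound rule becomes a negated lower confidence bound; the kappa parameter is the exploration/exploitation trade-off mentioned above:

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best):
    """P(f(x) < f_best) under the GP posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)
    return norm.cdf((f_best - mu) / sigma)

def gp_lcb(mu, sigma, kappa=2.0):
    """Negated lower confidence bound; larger kappa favours exploration."""
    return -(mu - kappa * sigma)
```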
4.4 Modeling the cost of evaluating the fitness function
The time needed to evaluate the fitness function can vary considerably between hyper-parameter settings, e.g. with the number of hidden layers in a neural network. I will therefore use a second Gaussian Process to model the evaluation time, and compute the expected improvement per second.
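With a second GP predicting log-duration, the acquisition value is simply scaled by the expected cost, as in Snoek et al. A minimal sketch (the EI formula is repeated so the snippet stands alone; the function name is my own):

```python
import numpy as np
from scipy.stats import norm

def ei_per_second(mu_f, sigma_f, f_best, mu_log_time):
    """Expected improvement divided by the predicted evaluation time,
    where a second GP models the log of the duration."""
    sigma_f = np.maximum(sigma_f, 1e-12)
    z = (f_best - mu_f) / sigma_f
    ei = (f_best - mu_f) * norm.cdf(z) + sigma_f * norm.pdf(z)
    return ei / np.exp(mu_log_time)  # time predicted on the log scale
```

Two candidates with equal EI are thus ranked by predicted cost, steering the search toward cheap evaluations early on.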
5 Tentative Timeline
Now – 25 May: Learning period
- Submit several pull requests, if possible related to Gaussian Processes and hyper-parameter optimization
- Familiarize myself with the implementation of
- Read more about hyper-parameter optimization and Gaussian Processes
Week 1, 2: Gaussian Processes
- Implement the Matérn 5/2 kernel and maximum likelihood parameter estimation
- Extend the existing Gaussian Process module and improve its performance if necessary
Week 3, 4: Acquisition function and optimization pipeline
- Implement expected improvement acquisition function
- Implement search over hyper-parameter space
- Build optimizing pipeline from parts implemented so far
- 1 July: Submit mid-term evaluation
Week 5, 6: First working prototype
- Sanity-check the first prototype by comparing results with
- Refactor code
- Test basic features outlined in 3.1
Week 7-9: Additional features
- Slice sampling of hyper-parameters
Week 10, 11: Wrap-up
- Profile implementation and improve performance
- Perform benchmarking
Week 12: Documentation
- Document interfaces
- Write usage examples
- 28 August: Submit code examples to Google
References
- Bergstra et al., "Algorithms for Hyper-Parameter Optimization."
- Snoek, Larochelle, and Adams, "Practical Bayesian Optimization of Machine Learning Algorithms."
- Srinivas et al., "Gaussian Process Optimization in the Bandit Setting."
- Spearmint: https://github.com/JasperSnoek/spearmint