Visualization of model hyperparameter optimization curves
The goal of this project is to provide users of the mlr package with a way of visualizing what happens during the tuning process that identifies the best hyperparameters for given data. This will enable users to assess the impact of different parameters, and it will give authors of learning methods pointers on which parameters matter in practice and how to improve their approaches.
Many machine learning algorithms have lots of parameters that need to be set in order to achieve optimal performance on a given data set. Doing this manually is a tedious and error-prone task. The mlr package implements not only an interface to dozens of different learning algorithms in R, but also a set of generic hyperparameter optimisation methods: given a learner, its parameters and data, it will automatically identify the best parameter setting for the particular case.
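To make the idea of generic hyperparameter optimisation concrete, here is a minimal sketch of random search, one of the simplest such methods. It is written in plain Python rather than with mlr, and the names `random_search`, `toy_error`, `cost` and `gamma` are assumptions chosen for illustration; the toy objective merely stands in for a cross-validated error.

```python
import random


def random_search(objective, space, budget, seed=0):
    """Evaluate `budget` random configurations and return the best one.

    `space` maps a parameter name to a (low, high) range; lower scores are better.
    """
    rng = random.Random(seed)
    history = []  # every (configuration, score) pair, in evaluation order
    for _ in range(budget):
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        history.append((config, objective(config)))
    best_config, best_score = min(history, key=lambda pair: pair[1])
    return best_config, best_score, history


def toy_error(config):
    # Hypothetical stand-in for a cross-validated error; minimum at cost=1, gamma=0.1.
    return (config["cost"] - 1.0) ** 2 + (config["gamma"] - 0.1) ** 2


space = {"cost": (0.0, 4.0), "gamma": (0.0, 1.0)}
best_config, best_score, history = random_search(toy_error, space, budget=50)
```

The `history` list is what a tuning visualization would draw on: the returned best configuration alone says nothing about how the search got there.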
While good parameter settings can be determined efficiently, mlr currently provides no means of visualizing this process. The user is given a result without much explanation of how it was arrived at. Understanding what happens during the process is not only interesting from the user's point of view, but also crucial for linking the result back to the behaviour of the machine learning algorithm on the data. Such understanding can inform improvements to the particular approach.
This project will create visualizations of hyperparameter tuning for mlr. It will allow the plotting of a hyperparameter against a scoring function, showing the effect of tuning the specified hyperparameter. It will furthermore include support for plotting multiple hyperparameters and scoring functions, along with ablation analysis (a method for identifying the most important parameters).
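Ablation analysis can be sketched as a greedy walk from a default configuration to the tuned one, changing one parameter at a time and always picking the change that helps most. The sketch below is plain Python under stated assumptions: `ablation_path`, `toy_error` and the parameter names are hypothetical, and the toy error surface is constructed so that `cost` dominates.

```python
def ablation_path(objective, source, target):
    """Greedy ablation: move from `source` to `target` one parameter at a time.

    At each step, switch the single remaining parameter whose change to its
    target value gives the lowest score. Returns one (name, score) pair per step.
    """
    current = dict(source)
    remaining = [name for name in source if source[name] != target[name]]
    path = []
    while remaining:
        scored = []
        for name in remaining:
            candidate = dict(current)
            candidate[name] = target[name]
            scored.append((objective(candidate), name))
        best_score, best_name = min(scored)
        current[best_name] = target[best_name]
        remaining.remove(best_name)
        path.append((best_name, best_score))
    return path


def toy_error(config):
    # Hypothetical error surface: `cost` matters most, `tol` barely at all.
    return 10 * (config["cost"] - 1) ** 2 + (config["gamma"] - 0.1) ** 2 + 0.01 * config["tol"]


default = {"cost": 3.0, "gamma": 1.1, "tol": 1.0}
tuned = {"cost": 1.0, "gamma": 0.1, "tol": 0.0}
steps = ablation_path(toy_error, default, tuned)
```

The order of names in `steps` ranks the parameters by how much of the improvement each one accounts for, which is the information an ablation plot would show.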
The path taken from the starting parameter configuration to the end result is stored in an optimization path data structure that is part of the ParamHelpers package. The data structure should contain all the necessary information, but may need to be extended to accommodate more detail.
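As a rough illustration of what such an optimization path holds, the following sketch logs each evaluated configuration together with its score and iteration number. It is a hypothetical Python class, not the ParamHelpers API; `OptPath`, `add` and `column` are names assumed here, and the values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class OptPath:
    """Append-only log of evaluated configurations (hypothetical, not the ParamHelpers API)."""
    param_names: list
    entries: list = field(default_factory=list)

    def add(self, params, score):
        # Reject incomplete configurations so every column has one value per entry.
        missing = set(self.param_names) - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        # Store the iteration number so the evaluation order survives later sorting.
        self.entries.append({"iter": len(self.entries) + 1, "score": score, **params})

    def column(self, name):
        # One value per evaluation, ready to be used as a plot axis.
        return [entry[name] for entry in self.entries]


path = OptPath(param_names=["cost", "gamma"])
path.add({"cost": 0.5, "gamma": 0.10}, score=0.31)
path.add({"cost": 2.0, "gamma": 0.05}, score=0.24)
path.add({"cost": 8.0, "gamma": 0.01}, score=0.27)

costs = path.column("cost")     # x-axis: hyperparameter values
scores = path.column("score")   # y-axis: achieved performance
```

Extending the structure "to accommodate more detail" would, in this sketch, amount to storing extra keys per entry, for example timing or the resampling fold.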
The plotting should use ggvis, in line with the other visualizations in mlr. Providing interactive functionality, e.g. through shiny, would be desirable.
Applicants should have:
- Experience using or developing in R, and development tools such as git.
- Experience with visualization methods.
- A background in computer science or engineering will be beneficial.
Implement a simple visualization that plots the points on an optimization path with respect to the achieved performance. The mlr tutorial gives details on how to get started.
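The data behind such a plot can be sketched without any plotting library: for each point on the path, show its score and the best score seen so far. The snippet below is plain Python with made-up error values and a crude text rendering; an actual submission would draw the same two series with the R tooling named above.

```python
from itertools import accumulate

# Hypothetical error values, one per evaluated configuration, in evaluation order.
scores = [0.31, 0.24, 0.27, 0.19, 0.22]

# Best (lowest) error seen up to and including each iteration.
best_so_far = list(accumulate(scores, min))

# Crude text rendering: one row per iteration, bar length proportional to the error.
for i, (score, best) in enumerate(zip(scores, best_so_far), start=1):
    bar = "#" * round(score * 100)
    print(f"{i:>2} {bar} {score:.2f} (best so far: {best:.2f})")
```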
Bernd Bischl (firstname.lastname@example.org) is one of the primary authors of mlr and ParamHelpers and has mentored for GSoC before.
Lars Kotthoff (email@example.com) is one of the primary authors of mlr and has mentored for GSoC before.