seba-1511 edited this page Apr 11, 2018 · 42 revisions


Introduction

Randopt is a Python package that helps you manage and run experiments. It also provides tools for result analysis and hyperparameter search.

Randopt is still in its early days and we are working on adding more features while keeping the current streamlined workflow.

Overview

Randopt aims to simplify the typical machine learning workflow.



  • Code a new experiment.

  • Run the experiment and obtain results.

  • Analyse results and come up with improved parameters.


It tackles this goal by solving the following three challenges.

Manageable result data. It is common for machine learning practitioners to run dozens to thousands of experiments for a given task. Logging all these results is a hassle, especially in a distributed environment. Randopt addresses this problem by assigning each experiment a folder and dumping all result data in a dedicated JSON file.

Hyperparameter search utilities. Finding good hyperparameters is crucial to any machine learning application. But this facet of machine learning is difficult, time-consuming, and sometimes closer to art than engineering. Randopt addresses this with automated and flexible hyperparameter search tools.

Experimental analysis. While automated tools can bring you a long way, nothing beats careful human analysis. Randopt offers programmatic and Web visualization interfaces that let you rapidly sift through your results data and find out why your model is not converging.

The rest of this document provides installation instructions and describes the overall pipeline of randopt to solve a simple example. Tutorials in the sidebar provide in-depth coverage of each of randopt's features.

Installation

Randopt has no dependencies. To install it, run

pip install randopt

Or if you want to install the latest development version, follow these instructions.

  1. git clone https://github.com/seba-1511/randopt/
  2. cd randopt
  3. git checkout dev
  4. python setup.py develop

Code sources are available on the GitHub repo.

Simple Example

The randopt pipeline closely follows the machine learning workflow described above.

  1. The user annotates their current experiment to record parameters and results.
  2. Upon execution, hyperparameters, results, and additional data are stored in JSON summaries.
  3. These summaries are then available for further analysis via the programmatic API or the Web visualization utility.

To illustrate this pipeline, let us work through a simple toy example. Our goal will be to minimize the 2-dimensional quadratic

f = lambda x, y: x**2 + y**2

To record results, we import randopt and instantiate an experiment named simple.

import randopt as ro
experiment = ro.Experiment(name='simple')

We're now ready to create our first JSON summary.

x = 3
y = 4
result = f(x, y)
experiment.add_result(result, data={
    'x': x,
    'y': y
})

After calling add_result(), you should be able to find a JSON summary in the newly created directory ./randopt_results/simple/. By default, all experiments will be stored in randopt_results/. This can be changed via the directory argument of the Experiment constructor.
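To inspect the stored summaries by hand, a minimal sketch using only the standard library might look as follows. It assumes that each summary is a file ending in .json placed directly inside the experiment directory, and it makes no assumption about the keys stored inside. The helper load_summaries is hypothetical and not part of randopt.

```python
import json
from pathlib import Path


def load_summaries(directory):
    """Load every *.json file found directly inside `directory`.

    Hypothetical helper, not part of randopt. Assumes summaries are
    stored as files with a .json extension.
    """
    summaries = []
    for path in sorted(Path(directory).glob('*.json')):
        with path.open() as handle:
            summaries.append(json.load(handle))
    return summaries


if __name__ == '__main__':
    # Default location used by the 'simple' experiment above.
    for summary in load_summaries('randopt_results/simple'):
        print(summary)
```

Files without a .json extension are ignored, so notes or attachments stored alongside the summaries would not interfere with loading.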

Changing the values of x and y, and running the experiment again will create more JSON summaries. Once multiple summaries are created, you can visualize your results by calling roviz.py on the directory containing your experiments. In our case, this would be

roviz.py randopt_results/simple/

which should open your default browser with a visualization page.
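Changing x and y by hand quickly becomes tedious, so the re-run step can be automated with a small loop. The sketch below is plain Python and does not use randopt's own search utilities. The sampling range of [-5, 5], the function random_search, and its record callback are all assumptions made for illustration.

```python
import random


def f(x, y):
    # The 2-dimensional quadratic from the example above.
    return x**2 + y**2


def random_search(num_trials, record=None, seed=None):
    """Evaluate f at uniformly sampled points and return the best trial.

    `record` is a hypothetical callback invoked as record(result, data)
    after each trial. Plain-Python sketch, not randopt's search API.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        # Assumed search range; adjust to your problem.
        x = rng.uniform(-5.0, 5.0)
        y = rng.uniform(-5.0, 5.0)
        result = f(x, y)
        if record is not None:
            record(result, {'x': x, 'y': y})
        # Keep the trial with the smallest value of f.
        if best is None or result < best[0]:
            best = (result, x, y)
    return best


if __name__ == '__main__':
    print(random_search(num_trials=20, seed=0))
```

With randopt installed, passing record=lambda result, data: experiment.add_result(result, data=data) should store one JSON summary per trial, relying only on the add_result() call shown earlier.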

Further Readings

The simple example above was meant to demonstrate randopt's rudimentary pipeline. To learn more, we suggest the following resources.

Tutorials

The tutorials will walk you through specific, more advanced features of randopt.

  1. Managing Experiments explains how to work with multiple experiments. In particular, it covers creating JSON summaries, handling parallel and distributed experiments, associating experimental attachments with a JSON summary, and a way to keep your experimental results separate from your version-controlled codebase.

  2. Optimizing Hyperparams introduces the hyperparameter optimization utilities. It consists of two parts: the programmatic API and the command line interface (ropt.py). Currently, the supported algorithms are random search, grid search, and a simplified version of evolutionary search.

  3. Visualizing Results covers the visualization of experimental results. It expands on the usage of the Web interface (roviz.py) to build a shareable visualization of results. In the second part, we build a custom visualization script to consistently monitor the progress of an experiment from the command line.
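As a point of reference for the search algorithms named in the second tutorial, grid search can be sketched in a few lines of plain Python. This is a generic illustration of the technique, not the implementation behind ropt.py or randopt's programmatic API. The function grid_search and the grid values are hypothetical.

```python
import itertools


def grid_search(objective, grid):
    """Evaluate `objective` on every combination of values in `grid`.

    `grid` maps parameter names to lists of candidate values.
    Returns (best_result, best_params) for the smallest result.
    """
    names = sorted(grid)
    best = None
    # itertools.product enumerates the full Cartesian product of values.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        result = objective(**params)
        if best is None or result < best[0]:
            best = (result, params)
    return best


if __name__ == '__main__':
    quadratic = lambda x, y: x**2 + y**2
    grid = {'x': [-2, -1, 0, 1, 2], 'y': [-1, 0, 1]}
    print(grid_search(quadratic, grid))
```

The number of evaluations grows as the product of the list lengths, which is one reason random search is often preferred when there are many parameters.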

More Examples

To see some code samples, have a look at the examples folder. A non-exhaustive selection is presented below.

  • multi_params.py expands on the simple tutorial above by introducing parameter samplers and part of the programmatic API.
  • grad_descent.py provides an example of more complex samplers, as well as visualizing lists with roviz.py.
  • gs_example.py and evo_example.py demonstrate the usage of grid search and evolutionary search, respectively.

Documentation

The full API documentation is also available on the following pages.

Roadmap

To follow the development of randopt, head to the issue tracker. Contributions are welcome, especially if your favorite feature is missing.