---
toc: true
layout: post
description: Releasing OpenML deep learning libraries compatible with Keras, PyTorch, and MXNet.
categories: [openml, deep learning]
title: Reproducible deep learning with OpenML
date: 2020-05-06
---

Deep learning is facing a reproducibility crisis right now [1]. Experiments are large in scale and depend on numerous hyperparameters that affect performance, which makes it hard for authors to document everything needed to reproduce their results. The current best way to make an experiment reproducible is to upload the code. However, that is not optimal in the many situations where the codebase is huge and undocumented and someone simply wants to reproduce the model.
OpenML [2] is an online machine learning platform for sharing and organizing data, machine learning algorithms, and experiments. Until now, we only supported classical machine learning through libraries such as scikit-learn and mlr. There is now a clear need for reproducible deep learning as well. To address it, OpenML is launching deep learning plugins for popular libraries such as Keras, MXNet, and PyTorch.

Here is a short tutorial on how to use our PyTorch extension with the MNIST dataset.

**Setup**<br>
To install `openml` and the `openml_pytorch` extension, run this command in your terminal:
<br>
```pip install openml openml_pytorch```

In [None]:
!pip install openml openml_pytorch


Next, import the necessary libraries.

In [None]:
import torch.nn
import torch.optim
import openml
import openml_pytorch

import logging

Set the API key for the openml Python library. You can find your API key in your openml.org account.

In [None]:
openml.config.apikey = 'key'
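
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. Here is a minimal sketch of one alternative, assuming you export the key as an environment variable. The variable name `OPENML_APIKEY` and the helper `read_api_key` are hypothetical choices for this example, not part of the openml library.

```python
import os


def read_api_key(env_var="OPENML_APIKEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenML API key."
        )
    return key


# With the key exported in your shell, you would then write:
# openml.config.apikey = read_api_key()
```

Failing loudly when the variable is missing avoids publishing attempts that only break at the very end of a long training run.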

Define a sequential network that reshapes the input images and normalizes them.

In [None]:
processing_net = torch.nn.Sequential(
    openml_pytorch.layers.Functional(function=torch.Tensor.reshape,
                                     shape=(-1, 1, 28, 28)),
    torch.nn.BatchNorm2d(num_features=1)
)
print(processing_net)

Sequential(
  (0): Functional()
  (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


Define a sequential network that extracts the features from the image.

In [None]:
features_net = torch.nn.Sequential(
    torch.nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(kernel_size=2),
    torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(kernel_size=2),
)
print(features_net)

Sequential(
  (0): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
  (1): LeakyReLU(negative_slope=0.01)
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (4): LeakyReLU(negative_slope=0.01)
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)


Define a sequential network that flattens the features and maps them to one output score per digit, which can be turned into class probabilities with a softmax.


In [None]:
results_net = torch.nn.Sequential(
    openml_pytorch.layers.Functional(function=torch.Tensor.reshape,
                                     shape=(-1, 4 * 4 * 64)),
    torch.nn.Linear(in_features=4 * 4 * 64, out_features=256),
    torch.nn.LeakyReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(in_features=256, out_features=10),
)
print(results_net)

Sequential(
  (0): Functional()
  (1): Linear(in_features=1024, out_features=256, bias=True)
  (2): LeakyReLU(negative_slope=0.01)
  (3): Dropout(p=0.5, inplace=False)
  (4): Linear(in_features=256, out_features=10, bias=True)
)
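
The `4 * 4 * 64` in the first linear layer follows from how the convolution and pooling layers shrink a 28 x 28 image. The sketch below reproduces that arithmetic with the standard output-size formula for unpadded layers with dilation 1. The helper name `conv_out` is introduced here for illustration only.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


side = 28                                        # MNIST images are 28 x 28
side = conv_out(side, kernel_size=5)             # Conv2d(1, 32, 5)   -> 24
side = conv_out(side, kernel_size=2, stride=2)   # MaxPool2d(2)       -> 12
side = conv_out(side, kernel_size=5)             # Conv2d(32, 64, 5)  -> 8
side = conv_out(side, kernel_size=2, stride=2)   # MaxPool2d(2)       -> 4

channels = 64                                    # out_channels of the last Conv2d
flat_features = side * side * channels
print(side, flat_features)  # prints: 4 1024
```

If you change a kernel size or add a layer, rerunning this calculation gives the new `in_features` value for the first linear layer.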


The main network, composed of the networks specified above.

In [None]:
model = torch.nn.Sequential(
    processing_net,
    features_net,
    results_net
)
print(model)

Sequential(
  (0): Sequential(
    (0): Functional()
    (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
    (1): LeakyReLU(negative_slope=0.01)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
    (4): LeakyReLU(negative_slope=0.01)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (2): Sequential(
    (0): Functional()
    (1): Linear(in_features=1024, out_features=256, bias=True)
    (2): LeakyReLU(negative_slope=0.01)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)
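
Before submitting a run, it can be useful to check that the architecture accepts flattened MNIST rows and returns ten scores per image. The sketch below rebuilds the same layers in a single flat `Sequential` so that it depends only on `torch`. The `Reshape` module is a local stand-in for the `Functional` reshape layers and is assumed to behave the same way. The names `check_model` and `dummy_batch` are illustrative.

```python
import torch


class Reshape(torch.nn.Module):
    """Local stand-in for the Functional reshape layers used above."""

    def __init__(self, shape):
        super().__init__()
        self.shape = shape

    def forward(self, x):
        return x.reshape(self.shape)


check_model = torch.nn.Sequential(
    Reshape((-1, 1, 28, 28)),
    torch.nn.BatchNorm2d(num_features=1),
    torch.nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(kernel_size=2),
    torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(kernel_size=2),
    Reshape((-1, 4 * 4 * 64)),
    torch.nn.Linear(in_features=4 * 4 * 64, out_features=256),
    torch.nn.LeakyReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(in_features=256, out_features=10),
)

check_model.eval()                   # disable dropout, use running batch-norm stats
dummy_batch = torch.randn(8, 784)    # 8 fake flattened 28 x 28 images
with torch.no_grad():
    logits = check_model(dummy_batch)
    probabilities = torch.softmax(logits, dim=1)  # each row sums to 1

print(logits.shape)  # prints: torch.Size([8, 10])
```

A shape mismatch in any layer raises an error here immediately, without requiring a connection to OpenML.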


Download the OpenML task for the mnist_784 dataset.

In [None]:
task = openml.tasks.get_task(3573)

Run the model on the task and publish the results on openml.org.

In [None]:

run = openml.runs.run_model_on_task(model, task, avoid_duplicate_runs=False)

run.publish()

print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))

URL for run: https://www.openml.org/api/v1/xml/run/10452577
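
The URL printed above is built from `openml.config.server`, so it points at the XML API endpoint, not at a web page. The sketch below derives a browsable link, assuming the website is served from the part of the server URL before `/api` and exposes runs under `/r/<id>`. Both the path convention and the helper name `run_page_url` are assumptions of this example.

```python
def run_page_url(server, run_id):
    """Build a website URL for a run from the API server URL.

    Assumes the website lives at the part of the server URL before "/api"
    and serves runs under "/r/<id>".
    """
    base_url = server.split("/api")[0]
    return f"{base_url}/r/{run_id}"


print(run_page_url("https://www.openml.org/api/v1/xml", 10452577))
# prints: https://www.openml.org/r/10452577
```

In the notebook you would pass `openml.config.server` and `run.run_id` to the helper.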


By visiting the published URL, you can check the model performance and other metadata.

![](run.png)

We hope that the OpenML deep learning plugins help in reproducing deep learning experiments and provide a universal reproducibility platform for them.
Here are links to all currently supported deep learning plugins:

*   MXNet: https://github.com/openml/openml-mxnet

*   Keras: https://github.com/openml/openml-keras

*   PyTorch: https://github.com/openml/openml-pytorch

*   ONNX: https://github.com/openml/openml-onnx

Each GitHub repository contains examples of how to use these libraries. The libraries are still under development, so we would appreciate any feedback through GitHub issues on their repositories.

Links:

1.   https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/
2.   https://www.openml.org