# Tabular Data Classification with NNI in AML

This example uses the NNI NAS 2.0 (Retiarii) framework to search for the best neural architecture for a tabular data classification task on the Azure Machine Learning (AML) training platform.

Video demos are available at https://www.youtube.com/watch?v=PDVqBmm7Cro and https://www.bilibili.com/video/BV1oy4y1W7GF.

## Step 1: Prepare the dataset

The first step is to log in to Azure and prepare the dataset. Here we use the Titanic dataset as an example.

In [None]:
!az login

In [2]:
from dataset import TitanicDataset
from nni.retiarii import serialize

train_dataset = serialize(TitanicDataset, root='./data', train=True)
test_dataset = serialize(TitanicDataset, root='./data', train=False)
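
The `dataset` module that provides `TitanicDataset` is not shown in this example. As a rough illustration of the interface the later steps rely on (indexing returns a `(features, label)` pair, so the length of the features gives the network's input size), here is a minimal pure-Python sketch. The class `ToyTabularDataset` and its two made-up rows are hypothetical; this is not the actual `TitanicDataset` implementation.

```python
from typing import List, Sequence, Tuple


class ToyTabularDataset:
    """Hypothetical stand-in for a tabular dataset (not the real TitanicDataset)."""

    def __init__(self, rows: Sequence[Sequence[float]], labels: Sequence[int]):
        if len(rows) != len(labels):
            raise ValueError("rows and labels must have the same length")
        self.rows = [list(map(float, row)) for row in rows]
        self.labels = list(labels)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        # Each item is a (features, label) pair, like a map-style dataset.
        return self.rows[index], self.labels[index]


# Two made-up rows with three numeric features each.
toy_dataset = ToyTabularDataset(rows=[[3, 22.0, 7.25], [1, 38.0, 71.28]], labels=[0, 1])
features, label = toy_dataset[0]
input_size = len(features)  # number of feature columns fed to the first layer
print(input_size)
```

The same indexing pattern is what the model definitions in Step 2 use to derive `input_size` from the first training sample.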

## Step 2: Define the Model Space

A model space is defined by users to express the set of models they want to explore, which should contain potentially good-performing models. In the Retiarii (NNI NAS 2.0) framework, a model space is defined in two parts: a base model and the possible mutations on the base model.

### Step 2.1: Define the Base Model

Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. Usually, you only need to replace ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` to use the NNI-wrapped PyTorch modules. Below is a very simple example of defining a base model.

In [5]:
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn  # NNI-wrapped PyTorch modules, as described above

class Net(nn.Module):

    def __init__(self, input_size):
        super().__init__()

        self.fc1 = nn.Linear(input_size, 16)
        self.bn1 = nn.BatchNorm1d(16)
        self.dropout1 = nn.Dropout(0.0)

        self.fc2 = nn.Linear(16, 16)
        self.bn2 = nn.BatchNorm1d(16)
        self.dropout2 = nn.Dropout(0.0)

        self.fc3 = nn.Linear(16, 2)

    def forward(self, x):

        x = self.dropout1(F.relu(self.bn1(self.fc1(x))))
        x = self.dropout2(F.relu(self.bn2(self.fc2(x))))
        x = torch.sigmoid(self.fc3(x))
        return x
    
# The input size is the number of features in one training sample.
model_space = Net(len(train_dataset[0][0]))

### Step 2.2: Define the Model Mutations

A base model is only one concrete model, not a model space. NNI provides APIs and primitives for users to express how the base model can be mutated, i.e., to build a model space that includes many models. The following uses the inline mutation APIs as a simple example.

In [3]:
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn

class Net(nn.Module):

    def __init__(self, input_size):
        super().__init__()

        self.hidden_dim1 = nn.ValueChoice(
            [16, 32, 64, 128, 256, 512, 1024], label='hidden_dim1')
        self.hidden_dim2 = nn.ValueChoice(
            [16, 32, 64, 128, 256, 512, 1024], label='hidden_dim2')

        self.fc1 = nn.Linear(input_size, self.hidden_dim1)
        self.bn1 = nn.BatchNorm1d(self.hidden_dim1)
        self.dropout1 = nn.Dropout(nn.ValueChoice([0.0, 0.25, 0.5]))

        self.fc2 = nn.Linear(self.hidden_dim1, self.hidden_dim2)
        self.bn2 = nn.BatchNorm1d(self.hidden_dim2)
        self.dropout2 = nn.Dropout(nn.ValueChoice([0.0, 0.25, 0.5]))

        self.fc3 = nn.Linear(self.hidden_dim2, 2)

    def forward(self, x):

        x = self.dropout1(F.relu(self.bn1(self.fc1(x))))
        x = self.dropout2(F.relu(self.bn2(self.fc2(x))))
        x = torch.sigmoid(self.fc3(x))
        return x

# The input size is the number of features in one training sample.
model_space = Net(len(train_dataset[0][0]))

Besides inline mutations, Retiarii also provides ``mutator``, a more general approach for expressing complex model spaces.
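
To get a sense of how large the space defined by the inline mutations in this step is, the candidates can be counted in plain Python. This sketch only mirrors the four `ValueChoice` lists from the model space cell; the dictionary keys `dropout1` and `dropout2` are illustrative names for the two unlabeled dropout choices, and the count assumes the four choices are sampled independently.

```python
import itertools
import math

# Candidate values copied from the ValueChoice calls in the model space.
choices = {
    "hidden_dim1": [16, 32, 64, 128, 256, 512, 1024],
    "hidden_dim2": [16, 32, 64, 128, 256, 512, 1024],
    "dropout1": [0.0, 0.25, 0.5],  # illustrative key; unlabeled in the model
    "dropout2": [0.0, 0.25, 0.5],  # illustrative key; unlabeled in the model
}

# The space size is the product of the number of options per choice.
space_size = math.prod(len(values) for values in choices.values())
print(space_size)  # 7 * 7 * 3 * 3 = 441

# Enumerate concrete configurations lazily and peek at the first one.
all_configs = (
    dict(zip(choices, combo)) for combo in itertools.product(*choices.values())
)
first_config = next(all_configs)
print(first_config)
```

Under these assumptions, an experiment limited to 20 trials can visit at most about 4.5% of the 441 candidates.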

## Step 3: Explore the Defined Model Space

In the NAS process, the search strategy repeatedly generates new models, and the model evaluator trains and validates each generated model. The measured performance of each generated model is collected and sent back to the search strategy so that it can generate better models.

Users can choose a suitable search strategy to explore the model space, and use a built-in or user-defined model evaluator to evaluate the performance of each sampled model.
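
The interplay between strategy and evaluator described above can be sketched without any NNI dependency. In this toy version, `sample_model` plays the role of a random search strategy and `toy_evaluate` stands in for the model evaluator. Both functions, the reduced search space, and the scoring formula are hypothetical placeholders, not NNI APIs, and no real training happens.

```python
import random
from typing import Dict, List, Tuple

SearchSpace = Dict[str, List[float]]
Config = Dict[str, float]


def sample_model(space: SearchSpace, rng: random.Random) -> Config:
    """Strategy step: pick one value per choice uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def toy_evaluate(config: Config) -> float:
    """Evaluator step: return a made-up score instead of training a model."""
    return 1.0 / (1.0 + abs(config["hidden_dim"] - 64) / 64 + config["dropout"])


def random_search(space: SearchSpace, trials: int, seed: int = 0) -> Tuple[Config, float]:
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = sample_model(space, rng)  # strategy proposes a model
        score = toy_evaluate(config)       # evaluator measures it
        if score > best_score:             # keep the best result seen so far
            best_config, best_score = config, score
    return best_config, best_score


space = {"hidden_dim": [16, 32, 64, 128], "dropout": [0.0, 0.25, 0.5]}
best_config, best_score = random_search(space, trials=20)
print(best_config, round(best_score, 3))
```

A random strategy ignores the collected scores when proposing the next candidate; other strategies would use that feedback to steer the search.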

### Step 3.1: Choose a Search Strategy

In [4]:
import nni.retiarii.strategy as strategy

simple_strategy = strategy.Random()

[2021-05-10 11:53:15] INFO (hyperopt.utils/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
[2021-05-10 11:53:15] INFO (hyperopt.fmin/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.


### Step 3.2: Choose or Write a Model Evaluator

For PyTorch, Retiarii provides two built-in model evaluators designed for simple use cases: classification and regression. Both evaluators are built on the PyTorch Lightning library.

In [5]:
import nni.retiarii.evaluator.pytorch.lightning as pl

trainer = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=16),
                            val_dataloaders=pl.DataLoader(test_dataset, batch_size=16),
                            max_epochs=20)

GPU available: True, used: False
[2021-05-10 11:53:19] INFO (lightning/MainThread) GPU available: True, used: False
TPU available: None, using: 0 TPU cores
[2021-05-10 11:53:19] INFO (lightning/MainThread) TPU available: None, using: 0 TPU cores


## Step 4: Configure the Experiment

With everything above prepared, it is time to configure an experiment to run the model search. The basic experiment configuration is as follows:

In [6]:
from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment

exp = RetiariiExperiment(model_space, trainer, [], simple_strategy)

exp_config = RetiariiExeConfig('aml')
exp_config.experiment_name = 'titanic_example'
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 20
exp_config.max_experiment_duration = '2h'
exp_config.experiment_working_directory = '' # an absolute path
# exp_config.trial_gpu_number = 1
exp_config.nni_manager_ip = ''  # your nni_manager_ip
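
As a quick sanity check on these numbers, the time budget per trial can be estimated with simple arithmetic. The sketch assumes trials run in full waves of `trial_concurrency` and ignores scheduling and startup overhead on the training service, so treat the result as an optimistic upper bound rather than a measurement.

```python
import math

trial_concurrency = 2
max_trial_number = 20
max_experiment_minutes = 2 * 60  # '2h'

# Number of sequential waves needed when trials run two at a time.
waves = math.ceil(max_trial_number / trial_concurrency)

# Average wall-clock time each trial may take for all trials to finish in time.
minutes_per_trial = max_experiment_minutes / waves
print(waves, minutes_per_trial)  # 10 12.0
```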

Running NNI experiments on the AML (Azure Machine Learning) training service is also simple. You only need to configure the following additional fields:

In [7]:
exp_config.training_service.use_active_gpu = False
exp_config.training_service.subscription_id = '' # your subscription id
exp_config.training_service.resource_group = '' # your resource group
exp_config.training_service.workspace_name = '' # your workspace name
exp_config.training_service.compute_target = '' # your compute target
exp_config.training_service.docker_image = 'kvartet/nnitest:v2.1'  # your docker image
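
Rather than hard-coding identifiers in the notebook, one option is to read them from environment variables. This is only a sketch: the variable names (`AML_SUBSCRIPTION_ID` and so on) and the helper `load_aml_settings` are hypothetical, not something NNI or Azure defines.

```python
import os
from typing import Dict, Mapping

# Hypothetical mapping: training_service field -> environment variable name.
ENV_NAMES = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "resource_group": "AML_RESOURCE_GROUP",
    "workspace_name": "AML_WORKSPACE_NAME",
    "compute_target": "AML_COMPUTE_TARGET",
}


def load_aml_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the AML fields and fail early if any of them is empty."""
    settings = {field: env.get(name, "").strip() for field, name in ENV_NAMES.items()}
    missing = [ENV_NAMES[field] for field, value in settings.items() if not value]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return settings


# Demonstration with placeholder values instead of real credentials.
example_env = {name: "placeholder-" + field for field, name in ENV_NAMES.items()}
settings = load_aml_settings(example_env)
print(settings["workspace_name"])  # placeholder-workspace_name
```

Each returned value could then be assigned to the matching `exp_config.training_service` field, for example with `setattr(exp_config.training_service, field, value)`.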

## Step 5: Run and View the Experiment

You can launch the experiment now!

NNI also provides a WebUI to help users view the experiment results and perform more advanced analysis.

During the run recorded below, the following error message appeared in ``nnimanager.log``:

```
[2021-05-10 11:57:07] ERROR [ 'TypeError: Converting circular structure to JSON\n    at JSON.stringify (<anonymous>)\n    at NNIDataStore.storeTrialJobEvent (C:\\Users\\win10\\anaconda3\\envs\\nni_test\\lib\\site-packages\\nni_node\\core\\nniDataStore.js:59:105)\n    at NNIManager.requestTrialJobsStatus (C:\\Users\\win10\\anaconda3\\envs\\nni_test\\lib\\site-packages\\nni_node\\core\\nnimanager.js:409:38)' ]
```

In [8]:
exp.run(exp_config, 8745)

[2021-05-10 11:53:28] Creating experiment, Experiment ID: ws9bmlnv
[2021-05-10 11:53:28] Connecting IPC pipe...
[2021-05-10 11:53:41] Statring web server...
[2021-05-10 11:53:43] Setting up...
[2021-05-10 11:53:51] Dispatcher started
[2021-05-10 11:53:51] Web UI URLs: http://10.28.211.140:8745 http://169.254.91.194:8745 http://169.254.193.245:8745 http://192.168.30.1:8745 http://192.168.234.1:8745 http://10.28.211.140:8745 http://100.64.161.174:8745 http://127.0.0.1:8745
[2021-05-10 11:53:52] Starting strategy...
[2021-05-10 11:53:52] Random search running in fixed size mode. Dedup: on.
[2021-05-10 11:53:52] Strategy started!
GPU available: True, used: False
[2021-05-10 11:57:02] (lightning) GPU available: True, used: False
TPU available: None, using: 0 TPU cores
[2021-05-10 11:57:02] (lightning) TPU available: None, using: 0 TPU cores
[2021-05-10 11:57:17] Stopping experiment, please wait...
[2021-05-10 11:57:19] Dispatcher exiting...Exception in thread 
Thread-9:
Traceback (most rece

OSError: [Errno 22] Invalid argument

## Step 6: Export the Top Model

Exporting the code of the top model is also straightforward.

In [None]:
print('Final model:')
for model_code in exp.export_top_models():
    print(model_code)
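
To keep the result beyond the notebook session, the exported code can be written to disk. The sketch below assumes, as the loop above suggests, that each exported item can be printed as text. The helper `save_model_codes`, the output directory, and the file naming are hypothetical, and the demonstration uses a placeholder string rather than a real export.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def save_model_codes(model_codes: Iterable[object], out_dir: Path) -> List[Path]:
    """Write each exported model to out_dir/top_model_<rank>.py and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for rank, model_code in enumerate(model_codes, start=1):
        path = out_dir / f"top_model_{rank}.py"
        path.write_text(str(model_code), encoding="utf-8")
        paths.append(path)
    return paths


# Placeholder input; in the notebook this would be exp.export_top_models().
with tempfile.TemporaryDirectory() as tmp:
    saved = save_model_codes(["# exported model code would go here\n"], Path(tmp))
    saved_names = [path.name for path in saved]
    saved_text = saved[0].read_text(encoding="utf-8")
print(saved_names)
```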