# Malaria predictor: Image classification problem

# Convolutional Neural Network with Keras in Azure ML Services

## Importing libraries and configuring the Azure ML services

In [1]:
%matplotlib inline
import numpy as np
import os
import matplotlib.pyplot as plt

In [2]:
import azureml
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.33


### Defining Azure values in global variables

In [3]:
subscription_id = os.getenv("SUBSCRIPTION_ID", default="83674078-c3fc-41e3-9cf6-93f29065e2a4")
resource_group = os.getenv("RESOURCE_GROUP", default="CapstoneIA")
workspace_name = os.getenv("WORKSPACE_NAME", default="MalariaCNNKeras")
workspace_region = os.getenv("WORKSPACE_REGION", default="northeurope")

### Initialize workspace
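Initialize a Workspace object from the existing workspace you created in the Prerequisites step. The cell below connects by passing the subscription ID, resource group, and workspace name directly; if the workspace cannot be reached, it creates a new one. In both cases ws.write_config() saves the workspace details to a configuration file, so later sessions can call Workspace.from_config() instead.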

In [4]:

try:
    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
    # write the details of the workspace to a configuration file to the notebook library
    ws.write_config()
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    print("Workspace not accessible. Creating a new workspace below")
    # Create the workspace using the specified parameters
    ws = Workspace.create(name = workspace_name,
                      subscription_id = subscription_id,
                      resource_group = resource_group, 
                      location = workspace_region,
                      create_resource_group = True,
                      exist_ok = True)
    ws.get_details()

    # write the details of the workspace to a configuration file to the notebook library
    ws.write_config()

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FXQDKE5WQ to authenticate.
Interactive authentication successfully completed.
Workspace configuration succeeded. Skip the workspace creation steps below


In [5]:
#ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: MalariaCNNKeras
Azure region: northeurope
Subscription id: 83674078-c3fc-41e3-9cf6-93f29065e2a4
Resource group: CapstoneIA


In [6]:
# write the details of the workspace to a configuration file to the notebook library
ws.write_config()
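
The configuration file written by ws.write_config() is a small JSON document. The sketch below uses only the standard library to read such a file back and check that it carries the three identifiers used above. The helper name load_workspace_settings is hypothetical, the subscription ID is a placeholder, and the location the SDK writes to depends on the SDK version; in practice Workspace.from_config() performs this lookup for you.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_settings(config_path):
    """Read a workspace config file and return its three identifiers.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    settings = json.loads(Path(config_path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError("config file is missing: " + ", ".join(missing))
    return {key: settings[key] for key in REQUIRED_KEYS}


# Round-trip a sample file in a temporary directory (placeholder subscription ID).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "config.json"
    sample_path.write_text(json.dumps({
        "subscription_id": "00000000-0000-0000-0000-000000000000",
        "resource_group": "CapstoneIA",
        "workspace_name": "MalariaCNNKeras",
    }))
    print(load_workspace_settings(sample_path))
```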

## Create compute resources for your training experiments
Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.

To create a cluster, you specify a compute configuration that defines the machine type and the scaling behavior. You then choose a cluster name that is unique within the workspace; this name is used to address the cluster later.

The cluster parameters are:

- vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of VM sizes available in your region by using the CLI command az vm list-skus -o tsv
- min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0, the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 allows for faster start-up times, but you will also be billed when the cluster is not in use.
- max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and greater distributed processing of scale-out jobs.

To create a CPU cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy.

In [7]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "ML-VM-DSVM"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpucluster")
except ComputeTargetException:
    print("Creating new cpucluster")
    
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2",
                                                           min_nodes=0,
                                                           max_nodes=4)

    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    
    # Wait for the cluster to complete, show the output log
    cpu_cluster.wait_for_completion(show_output=True)

Found existing cpucluster
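
To make the effect of min_nodes and max_nodes concrete, here is a deliberately simplified model of autoscaling: the cluster targets one node per queued job, clamped to the configured range. This is an illustrative assumption, not the actual AmlCompute scheduling logic, and target_node_count is a hypothetical helper.

```python
def target_node_count(queued_jobs, min_nodes=0, max_nodes=4):
    """Return the node count a cluster would aim for under a toy model.

    Assumption: one node per queued job, never below min_nodes and
    never above max_nodes. Real autoscaling is more involved.
    """
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(queued_jobs, max_nodes))


# With the settings used above (min_nodes=0, max_nodes=4):
for queued in (0, 2, 10):
    print(queued, "queued job(s) ->", target_node_count(queued), "node(s)")
```

With min_nodes=0 an idle cluster costs nothing but each new job waits for a node to start; raising min_nodes trades idle cost for faster start-up.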


### Create/Open an Azure ML experiment
Let's create an experiment named "malaria" and a folder named "keras-malaria" to hold the training scripts. The script runs will be recorded under the experiment in Azure.

In [8]:
from azureml.core import Experiment

script_folder = './keras-malaria'
os.makedirs(script_folder, exist_ok=True)

exp = Experiment(workspace=ws, name='malaria')
print(exp.name)
print(exp.list(ws))

malaria
[Experiment(Name: malaria,
Workspace: MalariaCNNKeras)]


### Upload Malaria dataset to default datastore
A datastore is a place where data can be stored and then made accessible to a Run, either by mounting it or by copying the data to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used when the data is not already in Blob Storage or a File Share.

In [9]:
ds = ws.get_default_datastore()

In this next step, we upload the training and test sets into the workspace's default datastore, which will later be made available to the AmlCompute cluster for training.

In [9]:
ds.upload(src_dir='./data', target_path='malaria', overwrite=True, show_progress=True)

Uploading ./data/x_images_arrays_zip_1000.npz
Uploading ./data/y_infected_labels_1000.npz
Uploaded ./data/y_infected_labels_1000.npz, 1 files out of an estimated total of 2
Uploaded ./data/x_images_arrays_zip_1000.npz, 2 files out of an estimated total of 2


$AZUREML_DATAREFERENCE_b3879516944548e9b31877cc2fcb6b3c

## Get default Compute resource
You can create a new compute target for training your model, but here we use the existing AmlCompute CPU cluster as the training compute resource.

In [18]:
from azureml.core.compute import ComputeTarget
#compute_target = ws.get_default_compute_target(type="CPU")
compute_target = ComputeTarget(ws, 'cpucluster')
# use get_status() to get a detailed status for the current cluster. 
#print(compute_target.get_status().serialize())
print(compute_target.get_status())

<azureml.core.compute.amlcompute.AmlComputeStatus object at 0x7f7214f2b160>


In [17]:
compute_target = ws.get_default_compute_target(type="CPU")
print(compute_target.get_status())

AttributeError: 'NoneType' object has no attribute 'get_status'
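
The AttributeError above shows that get_default_compute_target returned None, so calling get_status() on the result fails. One way to guard against this is to fall back to a named target, as in the sketch below. The helper resolve_compute_target is hypothetical and works on any mapping of names to targets; plain strings stand in for compute target objects so the example runs without a workspace.

```python
def resolve_compute_target(default_target, named_targets, fallback_name):
    """Return default_target, or look up fallback_name if it is None.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    if default_target is not None:
        return default_target
    if fallback_name not in named_targets:
        available = ", ".join(sorted(named_targets)) or "none"
        raise LookupError(
            "no default target and no target named %r (available: %s)"
            % (fallback_name, available)
        )
    return named_targets[fallback_name]


# Stand-in values; with a live workspace the call could look like:
# resolve_compute_target(ws.get_default_compute_target(type="CPU"),
#                        ws.compute_targets, "cpucluster")
print(resolve_compute_target(None, {"cpucluster": "<AmlCompute cpucluster>"}, "cpucluster"))
```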

In [12]:
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

cpucluster AmlCompute Succeeded


In [10]:
cpu_cluster

<azureml.core.compute.dsvm.DsvmCompute at 0x7f92ee04d0f0>

## Copy the training files into the script folder
- Important: Upload the most recent .py file to the current working directory before running the next cell. The cell copies this file into script_folder, where the Azure training job picks it up.

In [11]:
import shutil

# the training logic is in the train_cnn_gen.py file.
shutil.copy('./train_cnn_gen.py', script_folder)

'./keras-malaria/train_cnn_gen.py'

In [12]:
script_folder

'./keras-malaria'

## Create TensorFlow estimator & add Keras
Next, we construct an azureml.train.dnn.TensorFlow estimator object, specify the compute target, and pass the location of the data in the datastore to the training code as a parameter. The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It automatically provides a Docker image that has TensorFlow installed. In this case, we add the keras package for the Keras framework, the matplotlib package for plotting an "Accuracy vs Loss" chart that is recorded in the run history, and the scikit-learn package.

In [13]:
from azureml.train.dnn import TensorFlow

script_params = {
    '--data-folder': ds.path('malaria').as_download(),
    '--batch-size': 32,
    '--x_filename': 'x_images_arrays_zip_21765.npz',
    '--y_filename': 'y_infected_labels_21765.npz',
    '--training_size': '21765',
    '--n_epochs': 25,
    '--learning_rate': 0.001
}

est = TensorFlow(source_directory=script_folder,
                 script_params=script_params,
                 compute_target=cpu_cluster, 
                 pip_packages=['keras', 'matplotlib'],
                 conda_packages=['scikit-learn'],
                 entry_script='train_cnn_gen.py', 
                 use_gpu=False)

framework_version is not specified, defaulting to version 1.13.
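
The keys in script_params are passed to the entry script as command-line arguments. The training script train_cnn_gen.py is not shown in this notebook, so the following is only a sketch of how a script might receive them with argparse. The function name parse_training_args, the argument types, and the default values are assumptions; the flag names are copied from script_params above.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the flags sent by the estimator (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Malaria CNN training (sketch)")
    parser.add_argument("--data-folder", type=str, required=True,
                        help="folder where the datastore files are available")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--x_filename", type=str, required=True,
                        help="name of the .npz file holding the image arrays")
    parser.add_argument("--y_filename", type=str, required=True,
                        help="name of the .npz file holding the labels")
    parser.add_argument("--training_size", type=int, default=None)
    parser.add_argument("--n_epochs", type=int, default=25)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    return parser.parse_args(argv)


# Example invocation with an explicit argument list (the data path is a placeholder).
args = parse_training_args([
    "--data-folder", "/tmp/malaria",
    "--x_filename", "x_images_arrays_zip_21765.npz",
    "--y_filename", "y_infected_labels_21765.npz",
    "--training_size", "21765",
])
print(args.data_folder, args.batch_size, args.training_size, args.learning_rate)
```

Note that argparse converts hyphens in flag names to underscores, so --data-folder and --batch-size become args.data_folder and args.batch_size.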


## Submit job to run
Submit the estimator to the Azure ML experiment to kick off the execution.

In [14]:
run = exp.submit(est)

### Monitor the Run
As the Run is executed, it will go through the following stages:

Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about 5 minutes. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.

Scaling: If the compute needs to be scaled up (i.e. the AmlCompute cluster requires more nodes to execute the run than are currently available), the cluster will attempt to scale up to make the required number of nodes available. Scaling typically takes about 5 minutes.

Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted/copied and the entry_script is executed. While the job is running, stdout and the ./logs folder are streamed to the run history and can be viewed to monitor the progress of the run.

Post-Processing: The ./outputs folder of the run is copied over to the run history.

There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget.

Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run.

In [15]:
from azureml.widgets import RunDetails
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

We can also periodically check the status of the run object, and navigate to Azure portal to monitor the run.

In [None]:
run.wait_for_completion(show_output=True)
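
Periodic checking can also be written as an explicit polling loop. The sketch below calls a status function until a terminal state is reached; with a live run you could pass run.get_status as that function. The helper name poll_until_done and the set of terminal state names are assumptions for illustration, and a stand-in status source is used so the example runs without Azure. In a notebook, run.wait_for_completion() remains the simpler choice.

```python
import time

# Assumed names of the states in which a run has stopped.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def poll_until_done(get_status, interval_seconds=30.0, max_checks=120):
    """Call get_status() until it returns a terminal state, then return it.

    Raises TimeoutError if no terminal state is seen within max_checks calls.
    """
    status = None
    for _ in range(max_checks):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("run still in state %r after %d checks" % (status, max_checks))


# Stand-in status source that walks through three states.
fake_states = iter(["Preparing", "Running", "Completed"])
print(poll_until_done(lambda: next(fake_states), interval_seconds=0))
```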

## Show some metrics from the experiment run

In [13]:
run.get_metrics()

{'Loss': [0.6947016944885254, 0.6735986909866333, 0.5722702510356903],
 'Accuracy': [0.544125, 0.573125, 0.704875],
 'Final test loss': 0.38306117510795595,
 'Final test accuracy': 0.873,
 'Training size': 10000.0,
 'Accuracy vs Loss': 'aml://artifactId/Accuracy vs Loss.png'}
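
The dictionary returned by run.get_metrics() can be summarized with plain Python. The sketch below picks out the highest logged accuracy and the final test figures; summarize_metrics is a hypothetical helper, and the metric names are the ones shown in the output above.

```python
def summarize_metrics(metrics):
    """Condense a metrics dictionary into a few headline numbers."""
    accuracies = metrics["Accuracy"]
    best_index = max(range(len(accuracies)), key=lambda i: accuracies[i])
    return {
        "values_logged": len(accuracies),
        "best_index": best_index,  # 0-based position in the logged list
        "best_accuracy": accuracies[best_index],
        "final_test_accuracy": metrics["Final test accuracy"],
        "final_test_loss": metrics["Final test loss"],
    }


# Values copied from the run.get_metrics() output above.
metrics = {
    "Loss": [0.6947016944885254, 0.6735986909866333, 0.5722702510356903],
    "Accuracy": [0.544125, 0.573125, 0.704875],
    "Final test loss": 0.38306117510795595,
    "Final test accuracy": 0.873,
    "Training size": 10000.0,
}
for key, value in summarize_metrics(metrics).items():
    print(key, "=", value)
```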

In [27]:
run.get_details()

{'runId': 'malaria_1558194666_179a3f01',
 'target': 'cpucluster',
 'status': 'Completed',
 'startTimeUtc': '2019-05-18T15:51:17.677653Z',
 'endTimeUtc': '2019-05-18T15:51:47.615513Z',
 'properties': {'azureml.runsource': 'experiment',
  'AzureML.DerivedImageName': 'azureml/azureml_04ffe2fa28e82945988e51c8d6a84351',
  'ContentSnapshotId': 'e231c4de-654f-4bd2-ab39-63e7b73f1152',
  'azureml.git.repository_uri': None,
  'azureml.git.branch': None,
  'azureml.git.commit': None,
  'azureml.git.dirty': 'False',
  'azureml.git.build_id': None,
  'azureml.git.build_uri': None,
  'mlflow.source.git.branch': None,
  'mlflow.source.git.commit': None,
  'mlflow.source.git.repoURL': None},
 'runDefinition': {'script': 'train_cnn.py',
  'arguments': ['--data-folder',
   '$AZUREML_DATAREFERENCE_01a180b961394f52819fe2db7d7fea9f',
   '--batch-size',
   '16',
   '--x_filename',
   'x_images_arrays_zip_100.npz',
   '--y_filename',
   'y_infected_labels_100.npz'],
  'sourceDirectoryDataStore': None,
  'fra

In [28]:
run.get_file_names()

['Accuracy vs Loss.png',
 'azureml-logs/55_batchai_execution.txt',
 'azureml-logs/60_control_log.txt',
 'azureml-logs/80_driver_log.txt',
 'logs/azureml/azureml.log',
 'outputs/model/model.h5',
 'outputs/model/model.json']
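
The saved model lives under the outputs/ prefix in this list. As a minimal sketch, the files to retrieve can be selected by prefix; model_artifacts is a hypothetical helper, and the download call is left as a comment because it needs a live run object.

```python
def model_artifacts(file_names, prefix="outputs/"):
    """Return the run files stored under the given prefix."""
    return [name for name in file_names if name.startswith(prefix)]


# File names copied from the run.get_file_names() output above.
file_names = [
    "Accuracy vs Loss.png",
    "azureml-logs/55_batchai_execution.txt",
    "azureml-logs/60_control_log.txt",
    "azureml-logs/80_driver_log.txt",
    "logs/azureml/azureml.log",
    "outputs/model/model.h5",
    "outputs/model/model.json",
]

for name in model_artifacts(file_names):
    print(name)
    # With a live run object, each file could then be fetched, for example:
    # run.download_file(name=name, output_file_path=name.split("/")[-1])
```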

In [15]:
run.cancel()