# DQ0 SDK Demo
## Prerequisites
* Installed DQ0 SDK. Install with `pip install dq0-sdk`
* Installed DQ0 CLI.
* Proxy running and registered from the DQ0 CLI with `dq0-cli proxy add ...`
* Valid session of DQ0. Log in with `dq0 user login`
* Running instance of DQ0 CLI server: `dq0 server start`

## Concept
The two main structures for working with the DQ0 quarantine via the DQ0 SDK are:
* Project - the current model environment, a workspace and directory the user can define models in. Project also provides access to trained models.
* Experiment - the DQ0 runtime to execute training runs in the remote quarantine.

Start by importing the core classes

In [None]:
# import dq0-sdk cli
from dq0.sdk.cli import Project, Experiment

## Create a project
Projects act as the working environment for model development.
Each project has a model directory with a .meta file containing the model UUID, attached data sources, etc.
Creating a project with `Project(name='model_1')` is equivalent to calling the DQ0 CLI command `dq0-cli project create model_1`.

In [None]:
# create a project with name 'model_1'. Automatically creates the 'model_1' directory and changes to this directory.
project = Project(name='model_1')

## Load a project
Alternatively, you can load an existing project by first changing into its directory and then calling `Project.load()`.
This reads the .meta file in that directory.

In [None]:
%cd model_1

In [None]:
# Alternative: load a project from the current model directory
project = Project.load()

## Create Experiment
To execute DQ0 training commands inside the quarantine you define experiments for your projects.
You can create as many experiments as you like for one project.

In [None]:
# Create experiment for project
experiment = Experiment(project=project, name='experiment_1')

## Get and attach data source
For new projects you need to attach a data source. Existing (loaded) projects usually already have data sources attached.

In [None]:
# first get some info about available data sources
sources = project.get_available_data_sources()

# print info about the first source
info = project.get_data_info(sources[0]['uuid'])
info

Get the dataset description:

In [None]:
# print data description
info['description']

Also, inspect the data column types including allowed values for feature generation:

In [None]:
# print information about column types and values
info['types']

And some sample data if available:

In [None]:
# get sample data
project.get_sample_data(sources[0]['uuid'])

Now, attach the dataset to our project

In [None]:
# attach the first dataset
project.attach_data_source(sources[0]['uuid'])
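
Picking `sources[0]` works for a demo, but selecting by a descriptive field is less fragile. The following is a minimal sketch, assuming each entry is a dict: only the `'uuid'` key appears in this notebook, so the `'name'` key, the helper `find_source_uuid`, and the sample entries are hypothetical.

```python
def find_source_uuid(sources, name):
    """Return the uuid of the first data source whose 'name' matches.

    The 'name' key is a hypothetical field used for illustration;
    only 'uuid' is confirmed by this notebook.
    """
    for source in sources:
        if source.get('name') == name:
            return source['uuid']
    raise KeyError(f'no data source named {name!r}')


# stand-in for the list returned by project.get_available_data_sources()
example_sources = [
    {'uuid': 'uuid-aaa', 'name': 'census'},
    {'uuid': 'uuid-bbb', 'name': 'cifar10'},
]

selected_uuid = find_source_uuid(example_sources, 'cifar10')
print(selected_uuid)
```

The returned value could then be passed to `project.attach_data_source(...)` in place of `sources[0]['uuid']`.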

## Define a convolutional neural network for the CIFAR-10 image data
Working with DQ0 is basically about defining two functions:
* setup_data() - called right before model training to prepare attached data sources
* setup_model() - actual model definition code

The easiest way to define those functions is to write them in the notebook (inline) and pass them to the project before calling deploy. Alternatively, you can write the complete user_model.py to the project's directory.

### Define functions inline
First variant with functions passed to the project instance. Note that you need to define imports inline inside the functions as only those code blocks are replaced in the source files.

In [None]:
# define functions

def setup_data(self):
    # load input data
    if self.data_source is None:
        logger.error('No data source found')
        return

    X, y = self.data_source.read()

    # check data format
    import pandas as pd
    import numpy as np
    if isinstance(X, pd.DataFrame):
        X = X.values
    else:
        if not isinstance(X, np.ndarray):
            raise Exception('X is not np.ndarray')

    if isinstance(y, pd.Series):
        y = y.values
    else:
        if not isinstance(y, np.ndarray):
            raise Exception('y is not np.ndarray')

    # prepare data
    if y.ndim == 2:
        # make one-dimensional array (just to avoid warnings from sklearn)
        y = np.ravel(y)

    # note: np.unique counts np.nan and np.inf in y as classes
    self._num_classes = len(np.unique(y))

    # encode target labels with integer values between 0 and
    # self._num_classes - 1
    from sklearn.preprocessing import LabelEncoder
    self.label_encoder = LabelEncoder()
    y = self.label_encoder.fit_transform(y)

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, 
                                                        random_state=42)

    # back to column vector: transform one-dimensional array into column vector
    y_train = y_train[:, np.newaxis]
    y_test = y_test[:, np.newaxis]
    
    # set data member variables
    self.X_train = X_train
    self.X_test = X_test
    self.y_train = y_train
    self.y_test = y_test


def setup_model(self):
    import tensorflow.compat.v1 as tf
    self.optimizer = 'Adam'
    # To set optimizer parameters, instantiate the class:
    #   self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    self.metrics = ['accuracy']
    self.loss = tf.keras.losses.SparseCategoricalCrossentropy()
    # As an alternative, define the loss function with a string
    self.epochs = 50
    self.batch_size = 250
    regularization_param = 1e-3
    regularizer_dict = {
        'kernel_regularizer': tf.keras.regularizers.l2(regularization_param)
    }

    self.model = tf.keras.Sequential()

    # generate convolutional and pooling layers
    self.model.add(tf.keras.layers.Conv2D(32, (5, 5), activation='relu',
                   input_shape=(32, 32, 3), **regularizer_dict))
    self.model.add(tf.keras.layers.MaxPooling2D((2, 2)))
    self.model.add(tf.keras.layers.Conv2D(32, (5, 5), activation='relu',
                   **regularizer_dict))
    self.model.add(tf.keras.layers.MaxPooling2D((2, 2)))

    # stack fully-connected (aka dense) layers on top
    self.model.add(tf.keras.layers.Flatten())
    self.model.add(tf.keras.layers.Dense(128, activation='tanh', **regularizer_dict))
    self.model.add(tf.keras.layers.Dense(self._num_classes, activation='softmax'))

    self.model.summary()


# set model code in project
project.set_model_code(setup_data=setup_data, setup_model=setup_model, 
                       parent_class_name='NeuralNetworkClassification')
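
To sanity-check the output of `self.model.summary()`, the layer shapes and parameter counts of the network above can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in the cell (stride 1 and no padding for `Conv2D`, non-overlapping windows for `MaxPooling2D`) and 10 classes for CIFAR-10. The helper names are illustrative and not part of the SDK.

```python
def conv2d_valid(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def max_pool(size, pool):
    """Spatial size after non-overlapping max pooling."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


num_classes = 10  # assumed here; the notebook derives it from the labels

size = 32
size = conv2d_valid(size, 5)  # 28
size = max_pool(size, 2)      # 14
size = conv2d_valid(size, 5)  # 10
size = max_pool(size, 2)      # 5
flat = size * size * 32       # 800 inputs to the first dense layer

total_params = (conv_params(5, 3, 32) + conv_params(5, 32, 32)
                + dense_params(flat, 128) + dense_params(128, num_classes))
print(size, flat, total_params)  # 5 800 131882
```

Most of the parameters sit in the first dense layer (800 x 128 weights), which is typical for small convolutional networks of this shape.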

### Define functions as source code
The second variant is to write the complete model file. A template is created at `models/user_model.py` when the project is created. You can view it with `!cat models/user_model.py`.

In [None]:
%%writefile models/user_model.py

import logging

from dq0.sdk.models.tf import NeuralNetworkClassification

logger = logging.getLogger()


class UserModel(NeuralNetworkClassification):
    """Derived from dq0.sdk.models.tf.NeuralNetworkClassification class

    Model classes provide a setup method for data and model
    definitions.

    Args:
        model_path (:obj:`str`): Path to the model save destination.
    """
    def __init__(self, model_path):
        super().__init__(model_path)

    def setup_data(self):
        """Setup data function. See code above..."""
        pass

    def setup_model(self):
        """Setup model function. See code above..."""
        pass


## Train the model
After testing the model locally in this notebook, it's time to train it inside the DQ0 quarantine. This is done by calling `experiment.train()`, which in turn calls the CLI commands `dq0-cli project deploy` and `dq0-cli model train`.

In [None]:
run = experiment.train()

`train()` is executed asynchronously. You can wait for the run to complete or query its state with `get_state()`:
(TBD: in the future there could be a Jupyter extension that shows the run progress in a widget.)

In [None]:
# wait for completion
run.wait_for_completion(verbose=True)
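
If you prefer to poll the state yourself instead of blocking, a generic polling loop is one option. The following is a minimal sketch under stated assumptions: `wait_until_done` and `FakeRun` are hypothetical helpers, and the state strings are made up, since this notebook does not show what `get_state()` returns.

```python
import time


def wait_until_done(get_state, is_done, interval=1.0, timeout=60.0):
    """Call get_state() repeatedly until is_done(state) is true.

    Returns the final state, or raises TimeoutError once `timeout`
    seconds have passed without is_done(state) becoming true.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError('run did not finish in time')
        time.sleep(interval)


class FakeRun:
    """Stand-in for a run object; the state strings are invented."""

    def __init__(self, states):
        self._states = iter(states)

    def get_state(self):
        return next(self._states)


fake_run = FakeRun(['queued', 'running', 'running', 'finished'])
final_state = wait_until_done(fake_run.get_state,
                              is_done=lambda state: state == 'finished',
                              interval=0.0)
print(final_state)  # finished
```

With a real run you would pass `run.get_state` and a predicate that matches whatever the actual state value looks like.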

When the run has completed you can retrieve the results:

In [None]:
# get training results
print(run.get_results())

After training, DQ0 runs the model checker to evaluate whether the trained model is safe and allowed for prediction. Get the state of the checker run, together with the other state information, with the `get_state()` function:

In [None]:
# get the state whenever you like
print(run.get_state())

## Predict
Finally, it's time to use the trained model to make predictions.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# get the latest model
model = project.get_latest_model()

# check DQ0 privacy clearing
if model.predict_allowed:

    # get numpy predict data
    num_images_to_plot = 10
    predict_data = model.X_test[:num_images_to_plot]

    y_actual = model.y_test[:num_images_to_plot]
    y_actual = model.label_encoder.inverse_transform(np.ravel(y_actual))

    # let's visualize the pics, five per row
    num_rows = int(np.ceil(num_images_to_plot / 5))  # subplot needs an int
    plt.figure(figsize=(10, 10))
    for i in range(num_images_to_plot):
        plt.subplot(num_rows, 5, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(predict_data[i], cmap=plt.cm.binary)
        # assumes the model's data source exposes class_names
        plt.xlabel(model.data_source.class_names[y_actual[i]])
    plt.show()
    
    # call predict
    run = model.predict(predict_data)

    # wait for completion
    run.wait_for_completion(verbose=True)
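
The `np.ravel(...)` before `inverse_transform(...)` is needed because `setup_data()` stores the encoded labels as column vectors. The round trip can be illustrated with NumPy alone: `np.unique(..., return_inverse=True)` yields the same sorted-class integer codes that scikit-learn's `LabelEncoder` assigns. The label values below are made up for the example.

```python
import numpy as np

labels = np.array(['cat', 'ship', 'cat', 'frog'])

# sorted classes plus the integer code of every label
classes, encoded = np.unique(labels, return_inverse=True)

# column vector, the layout used for y_train / y_test in setup_data()
column = encoded[:, np.newaxis]

# flatten again, then map the codes back to the original labels
decoded = classes[np.ravel(column)]

print(classes)       # ['cat' 'frog' 'ship']
print(encoded)       # [0 2 0 1]
print(column.shape)  # (4, 1)
print(decoded)       # ['cat' 'ship' 'cat' 'frog']
```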

In [None]:
# get predict results
y_pred = run.get_results()['predict']
print(y_pred)

Let us quickly assess how good the predictions of our model are by generating the confusion matrix:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from dq0.sdk.data.utils.plotting import compute_confusion_matrix

y_pred = model.label_encoder.inverse_transform(np.ravel(y_pred))

normalize = False  # set to True to have the matrix entries normalized per row
cm, labels_list = compute_confusion_matrix(y_actual, y_pred, normalize=normalize)

# plot settings (title and colormap are example choices)
title = 'Confusion matrix'
cmap = plt.cm.Blues
fmt = '.2f' if normalize else 'd'

fig, ax = plt.subplots()
if len(labels_list) > 10:
    annot_kws = {'size': 6}  # reduce font size to avoid cluttering
    xticks_rotation = 45
else:
    annot_kws = None
    xticks_rotation = 'horizontal'
sns.heatmap(cm, ax=ax, annot=True, cbar=True, fmt=fmt, cmap=cmap,
            annot_kws=annot_kws)

# labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('Actual labels')
ax.set_title(title)
ax.xaxis.set_ticklabels(labels_list)
ax.yaxis.set_ticklabels(labels_list)

ax.grid(False)

# rotate the tick labels and set their alignment
if xticks_rotation != 'horizontal':
    for tick_labels in [ax.get_xticklabels(), ax.get_yticklabels()]:
        plt.setp(tick_labels, rotation=xticks_rotation, ha="right",
                 rotation_mode="anchor")

fig.tight_layout()
plt.show()
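
For readers unfamiliar with the matrix itself, here is a small NumPy sketch of how a confusion matrix and its per-row normalization can be computed. It is an illustration only and not the implementation of `compute_confusion_matrix`. The function `confusion_matrix_sketch` and the sample labels are hypothetical.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, normalize=False):
    """Count (actual, predicted) pairs; rows are actual labels."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels.tolist())}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for actual, predicted in zip(y_true.tolist(), y_pred.tolist()):
        cm[index[actual], index[predicted]] += 1
    if normalize:
        # divide every row by its total; guard against empty rows
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = cm / np.maximum(row_sums, 1)
    return cm, labels.tolist()


y_true_demo = np.array(['cat', 'cat', 'dog', 'dog', 'dog'])
y_pred_demo = np.array(['cat', 'dog', 'dog', 'dog', 'cat'])

cm_demo, labels_demo = confusion_matrix_sketch(y_true_demo, y_pred_demo)
accuracy = np.trace(cm_demo) / cm_demo.sum()

print(labels_demo)  # ['cat', 'dog']
print(cm_demo)      # [[1 1]
                    #  [1 2]]
print(accuracy)     # 0.6
```

The diagonal holds the correct predictions, so overall accuracy is the trace divided by the total count. With `normalize=True` each row sums to 1 and the diagonal shows per-class recall.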