# DQ0 SDK / CLI Demo
## Prerequisites
* Installed DQ0 SDK. Install with `pip install dq0sdk`
* Proxy running and registered from the DQ0 CLI with `dq0-cli proxy add ...`
* Valid session of DQ0. Log in with `dq0 auth login`

To communicate with the DQ0 instance, the DQ0 CLI server needs to be running.
Change to the directory of your DQ0 CLI installation:

In [None]:
%cd /path/to/your/cli/installation/cli

Then start the server with the following command:

In [None]:
import subprocess
proc = subprocess.Popen(['./dq0', 'server', 'start'])
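
Because the server is started as a child process, it can be useful to check whether it is still alive and to shut it down when the notebook session ends. The following is a minimal sketch using only the standard `subprocess` API. The command is a stand-in (a sleeping Python process) so that the example runs anywhere; the helper names `is_running` and `stop` are hypothetical and not part of the DQ0 SDK.

```python
import subprocess
import sys

# Stand-in command for illustration only; in this notebook the command
# would be ['./dq0', 'server', 'start'] as in the cell above.
command = [sys.executable, '-c', 'import time; time.sleep(30)']
proc = subprocess.Popen(command)


def is_running(process):
    """Return True while the child process has not exited."""
    return process.poll() is None


def stop(process, timeout=5):
    """Ask the process to terminate; kill it if it does not exit in time."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
    return process.returncode


print('running before stop:', is_running(proc))
exit_code = stop(proc)
print('running after stop:', is_running(proc))
```

Calling `stop(proc)` at the end of a session avoids leaving an orphaned server process behind.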

## Concept
The two main structures for working with the DQ0 quarantine via the DQ0 SDK are:
* Project - the current model environment, a workspace and directory the user can define models in. Project also provides access to trained models.
* Experiment - the DQ0 runtime to execute training runs in the remote quarantine.

Start by importing the core classes:

In [None]:
# import dq0sdk cli
from dq0sdk.cli import Project, Experiment

## Create a project
Projects act as the working environment for model development.
Each project has a model directory with a .meta file containing the model UUID, the attached data sources, etc.
Creating a project with `Project(name='model_1')` is equivalent to calling the DQ0 CLI command `dq0-cli model create --name model_1`.

In [None]:
# create a project with name 'model_1'. Automatically creates the 'model_1' directory and changes to this directory.
project = Project(name='model_1')

## Load a project
Alternatively, you can load an existing project by first changing into its directory and then calling `Project.load()`.
This reads the .meta file in that directory.

In [None]:
%cd model_1

In [None]:
# Alternative: load a project from the current model directory
project = Project.load()

## Create Experiment
To execute DQ0 training commands inside the quarantine you define experiments for your projects.
You can create as many experiments as you like for one project.

In [None]:
# Create experiment for project
experiment = Experiment(project=project, name='experiment_1')

## Load or attach data source
For new projects you need to attach a data source. Existing (loaded) projects usually already have data sources attached.

In [None]:
# get list of available data sources
sources = project.get_available_data_sources()

# attach the first dataset
project.attach_data_source(sources[0]['UUID'])
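
Attaching `sources[0]` depends on the order in which the data sources are listed. A small lookup helper can make the choice explicit. This is a sketch under assumptions: the cell above only shows that each entry has a `'UUID'` key, so the `'Name'` key, the sample values and the helper `find_source` are hypothetical.

```python
def find_source(sources, key, value):
    """Return the first data source dict whose `key` equals `value`, else None."""
    for source in sources:
        if source.get(key) == value:
            return source
    return None


# Hypothetical sample data; real entries would come from
# project.get_available_data_sources(). Only 'UUID' is confirmed above.
sources = [
    {'UUID': 'aaaa-1111', 'Name': 'census'},
    {'UUID': 'bbbb-2222', 'Name': 'patients'},
]

selected = find_source(sources, 'Name', 'patients')
if selected is None:
    raise LookupError('no matching data source')
print(selected['UUID'])
```

The resulting `selected['UUID']` could then be passed to `project.attach_data_source(...)` in place of `sources[0]['UUID']`.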

## Define the model
Working with DQ0 essentially comes down to defining two functions:
* setup_data() - called right before model training to prepare the attached data sources
* setup_model() - the actual model definition code

The easiest way to define these functions is to write them inline in the notebook and pass them to the project before calling deploy. Alternatively, you can write the complete user_model.py to the project's directory.

### Define functions inline
First variant: the functions are passed to the project instance.

In [None]:
# define functions

def setup_data():
    # load data
    if len(self.data_sources) < 1:
        logger.error('No data source found')
        return
    source = next(iter(self.data_sources.values()))

    data = source.read()

    from sklearn.model_selection import train_test_split
    X_train_df, X_test_df, y_train_ts, y_test_ts =\
        train_test_split(data.iloc[:, :-1],
                         data.iloc[:, -1],
                         test_size=0.33,
                         random_state=42)
    self.input_dim = X_train_df.shape[1]

    # set data member variables
    self.X_train = X_train_df
    self.X_test = X_test_df
    self.y_train = y_train_ts
    self.y_test = y_test_ts
    
def setup_model():
    from tensorflow import keras
    self.learning_rate = 0.3
    self.epochs = 5
    self.num_microbatches = 1
    self.model = keras.Sequential([
        keras.layers.Input(self.input_dim),
        keras.layers.Dense(10, activation='tanh'),
        keras.layers.Dense(10, activation='tanh'),
        keras.layers.Dense(2, activation='softmax')])
    
def preprocess():
    # some data preprocessing
    pass
    
# set model code in project
project.set_model_code(setup_data=setup_data, setup_model=setup_model, parent_class_name='NeuralNetwork')

# set data code in project
project.set_data_code(preprocess=preprocess, parent_class_name='CSVSource')
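
`setup_data()` above relies on two conventions: the last column of the data is the label and all other columns are features, and 33% of the rows are held out for testing. The following plain-Python sketch illustrates both without any dependencies. The sample rows are made up, the helper `split_sizes` is hypothetical, and it assumes that scikit-learn rounds a fractional test share up to the next whole row.

```python
import math

# Made-up rows: two feature columns followed by one label column.
rows = [[5.1, 3.5, 0], [4.9, 3.0, 1], [6.2, 2.9, 1]]

# Same convention as data.iloc[:, :-1] and data.iloc[:, -1].
features = [row[:-1] for row in rows]
labels = [row[-1] for row in rows]
input_dim = len(features[0])  # corresponds to X_train_df.shape[1]


def split_sizes(n_samples, test_size=0.33):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(150)
print(input_dim, n_train, n_test)
```

With 150 rows this gives 100 training rows and 50 test rows, so a very small data source can end up with only a handful of training samples.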

### Define functions as source code
Second variant: write the complete model file. A template is generated when the project is created and can be viewed with `!cat models/user_model.py`.

In [None]:
%%writefile models/user_model.py

import logging

from dq0sdk.models.tf.neural_network import NeuralNetwork

from sklearn.model_selection import train_test_split

from tensorflow import keras

logger = logging.getLogger()


class UserModel(NeuralNetwork):
    def __init__(self, model_path):
        super().__init__(model_path)
        self.learning_rate = 0.3
        self.epochs = 5
        self.num_microbatches = 1
        self.verbose = 0
        self.metrics = ['accuracy', 'mse']
        self.input_dim = None

    def setup_data(self):
        # load data
        if len(self.data_sources) < 1:
            logger.error('No data source found')
            return
        source = next(iter(self.data_sources.values()))

        data = source.read()

        X_train_df, X_test_df, y_train_ts, y_test_ts =\
            train_test_split(data.iloc[:, :-1],
                             data.iloc[:, -1],
                             test_size=0.33,
                             random_state=42)
        self.input_dim = X_train_df.shape[1]

        # set data member variables
        self.X_train = X_train_df
        self.X_test = X_test_df
        self.y_train = y_train_ts
        self.y_test = y_test_ts

    def setup_model(self):
        self.model = keras.Sequential([
            keras.layers.Input(self.input_dim),
            keras.layers.Dense(10, activation='tanh'),
            keras.layers.Dense(10, activation='tanh'),
            keras.layers.Dense(2, activation='softmax')])
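
To get a feeling for the size of the network defined in `setup_model()`, the number of trainable parameters can be worked out by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below does this arithmetic for the 10-10-2 layout above; the helper names and the example `input_dim` of 4 are assumptions for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def count_params(input_dim, layer_units=(10, 10, 2)):
    """Sum the parameters of a stack of dense layers fed by input_dim features."""
    total = 0
    n_inputs = input_dim
    for n_units in layer_units:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # the next layer consumes this layer's outputs
    return total


# Example: 4 feature columns -> 50 + 110 + 22 parameters
print(count_params(4))  # 182
```

Only the first layer depends on `input_dim`, which is why `setup_data()` has to set it before `setup_model()` builds the network.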


Do the same for data/user_source.py to define the preprocess() function.

## Train the model
After testing the model locally in this notebook, it is time to train it inside the DQ0 quarantine. This is done by calling experiment.train(), which in turn calls the CLI commands `dq0-cli model deploy` and `dq0-cli model train`.

In [None]:
run = experiment.train()

train() is executed asynchronously. You can wait for the run to complete or query its state with get_state():
(TBD: in the future there could be a Jupyter extension that shows the run progress in a widget.)

In [None]:
# wait for completion
run.wait_for_completion(verbose=True)

# or get the state whenever you like
print(run.get_state())
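
Waiting for an asynchronous run usually amounts to polling its state until it reaches a terminal value. The sketch below shows that pattern in isolation, with a timeout so the notebook does not block forever. It is not the SDK's implementation: `wait_until_done` is a hypothetical helper, the state names are invented, and a fake state sequence stands in for `run.get_state()`.

```python
import time


def wait_until_done(get_state, done_states, interval=1.0, timeout=60.0):
    """Poll get_state() until it returns a state in done_states or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f'still in state {state!r} after {timeout} s')
        time.sleep(interval)


# Stand-in for run.get_state(); the state names are hypothetical.
fake_states = iter(['queued', 'running', 'running', 'finished'])
final_state = wait_until_done(lambda: next(fake_states),
                              done_states={'finished', 'error'},
                              interval=0.01)
print(final_state)
```

In the notebook, `get_state` could be replaced by `run.get_state` once the actual state values returned by the SDK are known.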

When the run has completed you can retrieve the results:

In [None]:
# get training results
print(run.get_results())

## Predict
Finally, it's time to use the trained model to make a prediction:

In [None]:
import numpy as np

# get the latest model
model = project.get_latest_model()

# check DQ0 privacy clearance
if model.predict_allowed:

    # call predict
    run = model.predict(np.array([1, 2, 3]))

    # wait for completion
    run.wait_for_completion(verbose=True)

    # get prediction results
    print(run.get_results())