# DQ0 SDK Demo
## Prerequisites
* Installed DQ0 SDK. Install with `pip install dq0-sdk`
* Installed DQ0 CLI.
* Proxy running and registered from the DQ0 CLI with `dq0 proxy add ...`
* Valid session of DQ0. Log in with `dq0 user login`
* Running instance of DQ0 CLI server: `dq0 server start`

## Concept
The two main structures for working with the DQ0 quarantine via the DQ0 SDK are:
* Project - the current model environment, a workspace and directory the user can define models in. Project also provides access to trained models.
* Experiment - the DQ0 runtime to execute training runs in the remote quarantine.

Start by importing the core classes:

In [None]:
# ensure we are in the dq0-sdk directory
%cd ../

In [None]:
# import dq0-sdk api
from dq0.sdk.cli import Project, Experiment

## 1a. Create a project
Projects act as the working environment for model development.
Each project has a model directory with a .meta file containing the model UUID, attached data sources, etc.
Creating a project with `Project.create(name='model_1')` is equivalent to calling the DQ0 CLI command `dq0 project create model_1`. Note that this newly created project exists only locally on your filesystem as long as you don't start a run or commit/deploy manually.

In [None]:
# create a project with name 'model_1' and type 'ml', which will provide us a 
# project template for machine learning purposes. 
# Automatically creates the 'model_1' directory and changes to this directory.
project = Project(name='model_1', project_type='ml')

We chose to create this project using the default machine learning template. This will define several entry points for executing runs in an *MLproject* file, including:

- my_model.py:
A model training demo with DQ0's Differential Privacy Module (dq0.makedp). Uses DQ0's census dataset.

- mlflow_quickstart.py:
A simple Python module to run any code with DQ0's managed mlflow.

- keras_model_simple.py:
A simple model training example without using DQ0's Differential Privacy Module (dq0.makedp).

- transform.py:
A basic data transformation module for creating new transformed datasets in DQ0. Uses DQ0's census dataset.

## 1b. Load a project
Alternatively, you can load an existing project by first changing into its directory and then calling `Project.load()`.
This reads the .meta file in that directory.

In [None]:
%cd ../dq0-cli/census

In [None]:
# Alternative: load a project from the current model directory
project = Project.load()

In [None]:
project.project_uuid

## 2. Check data sources
New projects, by default, have all available datasets already attached to them. These data sources are typically defined by the data owner.

In [None]:
# first get some info about available data sources
sources = project.get_available_data_sources()

# get info about the first source
info = project.get_data_info(sources[0])
info # make sure this is the census dataset

Get the dataset description:

In [None]:
# print data description
info['data_description']

Also, inspect the data column types including allowed values for feature generation:

In [None]:
# print information about column types and values
info['data_type']

If you want to detach a dataset, type the following:

In [None]:
# if you want to detach some data
# project.detach_data_source(sources[0])

## 3. Create Experiment
To execute DQ0 training commands inside the quarantine you define experiments for your projects.
You can create as many experiments as you like for one project.

In [None]:
# Create experiment for project
experiment = Experiment(project=project, name='experiment_1')

## 4. Create a training run using dq0-makedp
Working with DQ0 is basically about defining two functions:
* setup_data() - called right before model training to prepare attached data sources
* setup_model() - actual model definition code

The easiest way to define those functions is to write them in the notebook (inline) and pass them to the project before starting a run. Alternatively, the user can write the complete user_model.py to the project's directory.

The machine learning project template ('ml') we chose earlier when creating the project already includes these definitions for us. They can be found and adjusted in the project folder. We will continue with the template code in this tutorial.
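
As a rough illustration of the two hooks described above, the following sketch shows the general shape such a pair of functions might take. It does not use the DQ0 SDK: the class name `SketchUserModel`, its attributes, and the synthetic data are hypothetical stand-ins, and the template's actual `my_model.py` will differ.

```python
import random


class SketchUserModel:
    """Hypothetical stand-in for a user model; NOT the DQ0 SDK base class."""

    def __init__(self):
        self.X_train = None
        self.y_train = None
        self.model = None

    def setup_data(self):
        """Prepare training data (here: a tiny synthetic set, an assumption)."""
        rng = random.Random(0)
        self.X_train = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
        # label is 1 when the feature sum is positive, else 0
        self.y_train = [1 if sum(row) > 0 else 0 for row in self.X_train]

    def setup_model(self):
        """Define the model (here: a plain dict of settings as a placeholder)."""
        self.model = {'n_features': len(self.X_train[0]), 'n_classes': 2}


user_model = SketchUserModel()
user_model.setup_data()   # data preparation comes first
user_model.setup_model()  # then the model definition
print(len(user_model.X_train), user_model.model)
```

The sketch calls `setup_data()` before `setup_model()` so that the model definition can depend on the shape of the prepared data.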

To start a run, we call `experiment.run(args, datasets)`, which in turn calls the CLI commands `dq0 project commit` and `dq0 commit run`.

In [None]:
# set run parameters
args = {
    'DP-epsilon': "1",
    'entry-point': "train_dq0_makedp",
    'job-type': "commit.run.train",
    'module-path': "my_model.py",
}

# define datasets for this run - we choose to run it on only one dataset. 
datasets = [sources[4]]

run = experiment.run(args, datasets=datasets)
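
Because the run parameters are passed as a plain dictionary of strings, a typo in a key only surfaces once the run has been submitted. The sketch below checks such a dictionary locally first. The helper `check_run_args` and the set `REQUIRED_KEYS` are hypothetical and not part of the DQ0 SDK; the required keys are simply assumed to be the four used in this demo, and the positivity check reflects the usual definition of epsilon as a positive privacy budget.

```python
# assumption: the four keys used in this demo are the ones a run needs
REQUIRED_KEYS = {'DP-epsilon', 'entry-point', 'job-type', 'module-path'}


def check_run_args(run_args):
    """Return a list of problems found in a run-args dict (empty if none)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(run_args))
    if missing:
        problems.append('missing keys: ' + ', '.join(missing))
    epsilon = run_args.get('DP-epsilon')
    if epsilon is not None:
        try:
            if float(epsilon) <= 0:
                problems.append('DP-epsilon must be positive')
        except (TypeError, ValueError):
            problems.append('DP-epsilon is not a number')
    return problems


demo_args = {
    'DP-epsilon': "1",
    'entry-point': "train_dq0_makedp",
    'job-type': "commit.run.train",
    'module-path': "my_model.py",
}
print(check_run_args(demo_args))  # an empty list means no problems were found
```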

Training is executed asynchronously. You can wait for the run to complete or query its state with `get_state()`:
(TBD: in the future there could be a Jupyter extension that shows the run progress in a widget.)

In [None]:
# wait for completion
run.wait_for_completion(verbose=True)
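
For readers who prefer to poll the state themselves instead of blocking, the loop below sketches one way to do it with a timeout. It runs against a stand-in object rather than a real run: the `FakeRun` class and the state strings `'running'` and `'finished'` are assumptions made for illustration, and the value actually returned by `get_state()` in the SDK may look different.

```python
import time


class FakeRun:
    """Hypothetical stand-in for a run object; NOT the DQ0 SDK class."""

    def __init__(self, polls_until_done=3):
        self._remaining = polls_until_done

    def get_state(self):
        # assumption: the state is a plain string; the real SDK may return more
        if self._remaining > 0:
            self._remaining -= 1
            return 'running'
        return 'finished'


def poll_until_done(run_obj, interval=0.01, timeout=5.0):
    """Poll run_obj.get_state() until it stops reporting 'running'.

    Raises TimeoutError if the run is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = run_obj.get_state()
        if state != 'running':
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError('run did not finish within %.1f s' % timeout)
        time.sleep(interval)


print(poll_until_done(FakeRun()))
```

A timeout keeps a notebook cell from hanging indefinitely if the remote run never leaves the running state.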

When the run has completed you can retrieve the results:

In [None]:
# get training results
print(run.get_results())

In [None]:
# if an error occurred, you can query it with
run.get_error()

After training, DQ0 runs the model checker to evaluate whether the trained model is safe and allowed for prediction. Get the state of the checker run, together with the other state information, with the `get_state()` function:

In [None]:
# get the state whenever you like
print(run.get_state())

## 5. Predict
Finally, it's time to use the trained model to predict something.

In [None]:
import numpy as np
import pandas as pd

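# check DQ0 privacy clearing
# NOTE: `model` is not defined earlier in this notebook; it must refer to the
# trained model obtained from the project before this cell is run.
if model.predict_allowed: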

    # create predict set
    records = [
        {
            'lastname': 'some-lastname',
            'firstname': 'some-firstname',
            'age': 45,
            'workclass':'Private',
            'fnlwgt': 544091,
            'education': 'HS-grad',
            'education-num': 9,
            'marital-status': 'Married-AF-spouse',
            'occupation': 'Exec-managerial',
            'relationship': 'Wife',
            'race': 'White',
            'sex': 'Female',
            'capital-gain': 0,
            'capital-loss': 0,
            'hours-per-week': 25,
            'native-country': 'United-States',
            'income': '<=50K'
        },
        {
            'lastname': 'some-lastname',
            'firstname': 'some-firstname',
            'age': 29,
            'workclass': 'Federal-gov',
            'fnlwgt': 162298,
            'education': 'Masters',
            'education-num': 14,
            'marital-status': 'Married-civ-spouse',
            'occupation': 'Exec-managerial',
            'relationship': 'Husband',
            'race': 'White',
            'sex': 'Male',
            'capital-gain': 34084,
            'capital-loss': 0,
            'hours-per-week': 70,
            'native-country': 'United-States',
            'income': '<=50K'
        }
    ]
    dataset = pd.DataFrame.from_records(records)
    
    # drop target (included above only because of compatibility with preprocess function)
    dataset.drop(['income'], axis=1, inplace=True)

    # load or get numpy predict data
    # predict_data = np.load('X_demo_predict.npy')
    predict_data = dataset.to_numpy()

    # call predict
    run = model.predict(predict_data)

    # wait for completion
    run.wait_for_completion(verbose=True)

In [None]:
# get predict results
print(run.get_results())
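
The predict-set preparation used above (records to DataFrame, drop the target column, convert to NumPy) can be tried in isolation without a DQ0 session. The sketch below uses a shortened, made-up pair of records with only three feature columns; the variable names are illustrative and nothing here calls the DQ0 SDK.

```python
import pandas as pd

# shortened example records (assumption: three features plus the target)
demo_records = [
    {'age': 45, 'education-num': 9, 'hours-per-week': 25, 'income': '<=50K'},
    {'age': 29, 'education-num': 14, 'hours-per-week': 70, 'income': '<=50K'},
]
frame = pd.DataFrame.from_records(demo_records)

# drop the target column; drop() returns a new frame, so `frame` is untouched
features = frame.drop(columns=['income'])

# convert to a 2-D NumPy array with one row per record
predict_array = features.to_numpy()
print(predict_array.shape)  # (2, 3)
print(list(features.columns))
```

Using `drop(columns=...)` without `inplace=True` keeps the original frame available, which can help when the same records are needed again with the target column included.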