<img src="https://cybersecurity-excellence-awards.com/wp-content/uploads/2017/06/366812.png">

<h1><center>Darwin Risk Building (Draft Version) </center></h1>

Before getting started, do the following:
1. Enter your username and password and confirm that you can log in successfully.
2. Set the path to your dataset. If left unfilled, you will be testing an example dataset on the server.
3. Set the dataset path for feature importance:
  - For global feature importance, the dataset path is the same as your original dataset.
  - For feature importance of individual rows, specify a path to a dataset that contains no more than 500 rows.

Once you're up and running, here are a few things to be mindful of:
1. For every run, check the job status (i.e. requested, failed, running, completed) and wait for the job to complete before proceeding.
2. If you're not satisfied with your model and think that Darwin can benefit from extra training, use the resume function.
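
The "check the status, then wait" advice above can be sketched as a small polling loop. This is only an illustration: `wait_until_done` and its `get_status` argument are hypothetical names, not part of the Darwin SDK, and the four state strings are taken from the list above. In this notebook, `ds.wait_for_job(...)` is what actually blocks until a job finishes.

```python
import time


def wait_until_done(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                    sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or the timeout elapses.

    get_status is a hypothetical zero-argument callable that returns one of
    'requested', 'running', 'failed' or 'completed' (case-insensitive).
    """
    waited = 0.0
    while True:
        state = str(get_status()).lower()
        if state in ("completed", "failed"):
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{state}' after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated job that needs two polls before it completes
states = iter(["requested", "running", "completed"])
print(wait_until_done(lambda: next(states), poll_seconds=0.0))
```

To use something like this against a real job, `get_status` would have to extract the status string from whatever the SDK's status lookup returns; that response shape is not shown in this notebook, so it is left as an assumption.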

Risk: Given the failure datetimes, the risk index takes values from 0 (healthy) to 1 (failure), shaped by the given lead time and functional form. Note that the system has not failed until the risk index reaches 1. The risk index can be thought of as an inverse of the remaining time until the next failure.

# Import necessary libraries

In [None]:
%matplotlib inline
import pandas as pd
from amb_sdk.sdk import DarwinSdk
from time import sleep

# Set Darwin SDK

In [None]:
ds = DarwinSdk()
ds.set_url('https://darwin-api.sparkcognition.com/v1/')

# Register user [if you are not yet registered]

In [None]:
"""
api_key = ''
status, msg = ds.auth_login('password', api_key)
if not status:
    print(msg)
status, msg = ds.auth_register_user('username', 'password','email@emailaddress.com')
if not status:
    print(msg)
"""

# Log in as user

In [None]:
status, msg = ds.auth_login_user('username', 'password')
if not status:
    print(msg)

In [None]:
# Set local path to files - Please update to reflect your machine
path = '/Users/jchoi/darwin-sdk/examples/Risk/'

# Upload data (failure data and timeseries data)

In [None]:
# failure date data
(code, response) = ds.upload_dataset(path+'sets/failures.csv', 'unittest-failures-data')

# timeseries data
(code, response) = ds.upload_dataset(path+'sets/sensor_ts.csv', 'unittest-timeseries-data')

# Create riskindex
1. lead_time is given in seconds (the example uses half a week).
2. lead_time is the half width of the time period over which the risk index rises from 0 to 1.
3. shutdown_column is the datetime at which the system failed.
4. return_column is the datetime at which the system returned to a healthy state.
5. shutdown_column and return_column are both columns in the failure dataset.
6. functional_form is the shape of the risk function: sigmoid, step, linear, or exponential.
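
To build intuition for how the lead time and functional form shape the label, here is a minimal sketch of a risk curve for a single failure. It is not Darwin's implementation: the function `risk_index`, the steepness constants, the position of the step, and the assumption that the ramp spans `2 * lead_time` seconds ending at the failure are all illustrative choices based only on the descriptions above.

```python
import math
from datetime import datetime, timedelta


def risk_index(t, failure_time, lead_time_s, functional_form="sigmoid"):
    """Illustrative risk value in [0, 1] at time t for one upcoming failure.

    Assumption (not Darwin's documented formula): risk ramps from 0 to 1 over
    a window of 2 * lead_time_s seconds that ends at failure_time.
    """
    window = 2.0 * lead_time_s
    remaining = (failure_time - t).total_seconds()
    # x is 0 at the start of the window and 1 at the failure itself
    x = min(1.0, max(0.0, 1.0 - remaining / window))
    if functional_form == "linear":
        return x
    if functional_form == "step":
        # assumed to switch at the middle of the window
        return 1.0 if x >= 0.5 else 0.0
    if functional_form == "exponential":
        # growth rate 5 is an arbitrary illustrative choice
        return math.expm1(5.0 * x) / math.expm1(5.0)
    if functional_form == "sigmoid":
        # steepness 12 is arbitrary; rescaled so the ends hit exactly 0 and 1
        lo = 1.0 / (1.0 + math.exp(6.0))
        hi = 1.0 / (1.0 + math.exp(-6.0))
        s = 1.0 / (1.0 + math.exp(-12.0 * (x - 0.5)))
        return (s - lo) / (hi - lo)
    raise ValueError(f"unknown functional_form: {functional_form}")


lead_time = int(timedelta(weeks=0.5).total_seconds())  # half a week in seconds
failure = datetime(2012, 2, 10)
for form in ("step", "linear", "exponential", "sigmoid"):
    values = [risk_index(failure - timedelta(days=d), failure, lead_time, form)
              for d in (7, 3.5, 0)]
    print(form, [round(v, 3) for v in values])
```

Under these assumptions every form starts at 0 one full window before the failure and reaches 1 at the failure; they differ only in how quickly the risk rises in between.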

In [None]:
lead_time = 3600 * 24 * 7 // 2 # half week risk 
(code, response) = ds.create_risk_info('unittest-failures-data', 'unittest-timeseries-data',
                                        return_column="Date Returned to Service",
                                        shutdown_column="Shutdown Date",
                                        lead_time=lead_time, functional_form="sigmoid")

ds.wait_for_job(response['job_name'])

# Download the artifact (riskindex)

In [None]:
status, response = ds.download_artifact(response['artifact_name'])

In [None]:
df_risk = pd.read_csv(response['filename'])

# Read local datasets again for plotting

In [None]:
# read timeseries data for datetime index
df_ts = pd.read_csv(path+'sets/sensor_ts.csv')
df_ts['datetime'] = pd.to_datetime(df_ts['datetime'], errors='coerce')

# concatenate two dfs
df = pd.concat([df_ts, df_risk], axis=1)

# set datetime index
df= df.set_index('datetime')

In [None]:
df['risk'].plot()

In [None]:
# see the failures
df_failure = pd.read_csv(path+'sets/failures.csv')
df_failure

In [None]:
# zoom in on the dates around the 2012-02-10 failure
df['risk']['2012-01-25':'2012-02-12'].plot()

# Generate a full CSV file and upload it for supervised training
We have now generated a dataset that includes the risk index as a supervised label.

In [None]:
# Write the full dataset, on which a model can be built, to CSV
# (saved under `path` so that the upload below finds the same file)
df.to_csv(path+'assetrisk.csv')
df.head()

# Upload the full CSV to Darwin for training and predictions
(code, response) = ds.upload_dataset(path+'assetrisk.csv')

# Build model
The dataset is now in the same form as one used for a supervised model.
See the supervised notebook for more detail.

In [None]:
target = "risk"
model = target + "_model0"
status, job_id = ds.create_model(dataset_names='assetrisk.csv',
                                 target=target,
                                 model_name=model,
                                 max_train_time='00:10',
                                 # feature_eng='auc'
                                )
if status:
    ds.wait_for_job(job_id['job_name'])
else:
    print(job_id)

In [None]:
ds.lookup_job_status_name(job_id['job_name'])

In [None]:
#Run Predictions    
status, artifact = ds.run_model('assetrisk.csv', model)
sleep(1)
ds.wait_for_job(artifact['job_name'])

In [None]:
# Get predictions; on failure the second return value holds the error message
status, prediction = ds.download_artifact(artifact['artifact_name'])
if not status:
    print(prediction)

In [None]:
#Rename
prediction = prediction.rename(columns={target:target+'_pred' })

df = df.reset_index()

# concatenate two df
df = pd.concat([df, prediction], axis=1)

# set datetime index
df= df.set_index('datetime')

In [None]:
# Plot predictions vs actual on the same axes
ax = df[target].plot(label='Actual')
df[target+'_pred'].plot(ax=ax, label='Predicted')
ax.legend()
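
Alongside the visual comparison, a single error number can help when judging whether the model needs more training. The sketch below computes the mean absolute error between the actual and predicted risk. The helper `mean_absolute_error` and the toy values are illustrative only and are not part of the Darwin SDK or this dataset.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average |actual - predicted|, skipping pairs where either value is NaN."""
    errors = [
        abs(a - p)
        for a, p in zip(actual, predicted)
        if not (math.isnan(a) or math.isnan(p))
    ]
    if not errors:
        raise ValueError("no comparable rows")
    return sum(errors) / len(errors)


# Toy values standing in for df[target] and df[target + '_pred']
actual = [0.0, 0.1, 0.5, 0.9, 1.0]
predicted = [0.0, 0.2, 0.4, 1.0, 1.0]
print(mean_absolute_error(actual, predicted))
```

Because pandas Series are iterable, the same function should also accept `df[target]` and `df[target+'_pred']` directly.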

In [None]:
# Delete all models and datasets
ds.delete_all_datasets()
ds.delete_all_models()