<img src="https://cybersecurity-excellence-awards.com/wp-content/uploads/2017/06/366812.png">

<h1><center>Darwin Normal Behavior Modeling (NBM) Example </center></h1>

# Prior to getting started:

First, if you have just received a new API key from support, register your key and create a new user (see the Register user cell).

Second, in the Environment Variables cell: 
1. Set your username and password to ensure that you're able to log in successfully
2. If you are using your own data, set the path to the location of your datasets. The default path already points to the example datasets.

Here are a few things to be mindful of:
1. For every run, check the job status (i.e. requested, failed, running, completed) and wait for the job to complete before proceeding.
2. If you're not satisfied with your model and think that Darwin can do better by exploring a larger search space, use the resume function.
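The SDK's `ds.wait_for_job`, used later in this notebook, does the waiting for you. Purely as an illustration of what "check the status and wait" amounts to, here is a minimal polling sketch. The `poll_until_done` helper and the `get_status` callable are hypothetical and not part of the Darwin SDK, and the exact status strings returned by the service are an assumption based on the list above.

```python
import time
from typing import Callable

# Statuses named in the list above; the exact strings returned by the
# service are an assumption.
TERMINAL_STATUSES = {"failed", "completed"}


def poll_until_done(get_status: Callable[[], str],
                    interval: float = 5.0,
                    time_limit: float = 600.0) -> str:
    """Call get_status() until it reports a terminal status or time runs out.

    get_status is a hypothetical callable supplied by the caller; it is not
    a Darwin SDK function.
    """
    deadline = time.monotonic() + time_limit
    while True:
        status = get_status().lower()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {time_limit} s")
        time.sleep(interval)


# Usage with a stand-in status source that finishes on the third check.
_fake = iter(["requested", "running", "completed"])
final_status = poll_until_done(lambda: next(_fake), interval=0.0)
print(final_status)
```

A "failed" result is returned rather than raised, so the caller can decide whether to stop or retry.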

## Set Darwin SDK

In [None]:
from amb_sdk.sdk import DarwinSdk
ds = DarwinSdk()
ds.set_url('https://darwin-api.sparkcognition.com/v1/')

## Register user (if needed, read above)

In [None]:
# Use only if you have a new API key and no registered users.
# Fill in the appropriate fields, then execute.

# Enter the API key and API key password provided by support to register/create new users
api_key = ''
api_key_pw = ''
status, msg = ds.auth_login(api_key_pw, api_key)
if not status:
    print(msg)

#Create a new user
status, msg = ds.auth_register_user('username', 'password','email@emailaddress.com')
if not status:
    print(msg)

## Environment Variables

In [None]:
#Set your user id and password accordingly
USER="[your Darwin user id]"
PW="[your Darwin password]"

# Set path to datasets - The default below assumes Jupyter was started from amb-sdk/examples/Enterprise/
# Modify accordingly if you wish to use your own data
PATH_TO_DATASET = '../../sets/'
TRAIN_DATASET = 'wind_turbine.csv'

# A timestamp is used to create a unique name in the event you execute the workflow multiple times or with 
# different datasets.  File names must be unique in Darwin.
import datetime
ts = '{:%Y%m%d%H%M%S}'.format(datetime.datetime.now())

## Import necessary libraries

In [None]:
# Import necessary libraries
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image
from time import sleep
import os
import numpy as np
from sklearn.metrics import r2_score

# User Login

In [None]:
status, msg = ds.auth_login_user(USER,PW)
if not status:
    print(msg)
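Every SDK call in this notebook returns a `(status, payload)` pair, where the payload carries the error message on failure. If you would rather stop at the first failure than print and continue, a small wrapper along these lines may help. `require_ok` is a hypothetical helper written for this example, not an SDK function.

```python
from typing import Any, Tuple


def require_ok(result: Tuple[bool, Any], action: str) -> Any:
    """Return the payload of a (status, payload) pair, or raise if status is falsy."""
    status, payload = result
    if not status:
        raise RuntimeError(f"{action} failed: {payload}")
    return payload


# Stand-in result shaped like the (status, msg) pairs used in this notebook.
ok_payload = require_ok((True, {"job_name": "demo-job"}), "login")
print(ok_payload)
```

With the SDK this would read `require_ok(ds.auth_login_user(USER, PW), 'login')`, so a failed login halts the notebook instead of surfacing later as a less obvious error.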

# Data Upload

**Read dataset and view a file snippet**
<br>After setting the dataset path, preview the file locally. The upload from your local device to the server follows in the next step.

In [None]:
# Preview dataset
df = pd.read_csv(os.path.join(PATH_TO_DATASET, TRAIN_DATASET))
df.head()

**Upload dataset to Darwin**

In [None]:
# Upload dataset
status, dataset = ds.upload_dataset(os.path.join(PATH_TO_DATASET, TRAIN_DATASET))
print(status)
# On failure, `dataset` holds the error message instead of the dataset info
print(dataset)

**Clean dataset**

In [None]:
# clean dataset
status, job_id = ds.clean_data(TRAIN_DATASET)

if status:
    ds.wait_for_job(job_id['job_name'])
else:
    print(job_id)

# Create and Train Model 

We will now build a model that learns the normal behavior of an asset based on a failure date.<br> The failure date in our example dataset is 8/24/15. <br> For your own dataset, specify the corresponding failure date. <br> You can also pass `recovery_dates` to indicate when the asset comes back online.
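Before submitting your own failure date, it can be worth checking that it parses and falls inside the time span of your data. The sketch below assumes the month/day/two-digit-year format of the example (`'08/24/15'`); both helper functions are hypothetical and the timestamps are made up for illustration.

```python
from datetime import datetime
from typing import Iterable


def parse_failure_date(text: str) -> datetime:
    """Parse a date written as MM/DD/YY (assumed format, as in '08/24/15')."""
    return datetime.strptime(text, "%m/%d/%y")


def date_in_range(failure: datetime, timestamps: Iterable[datetime]) -> bool:
    """True if the failure date lies between the first and last timestamp."""
    stamps = list(timestamps)
    return min(stamps) <= failure <= max(stamps)


# Made-up timestamps standing in for the dataset's time column.
sample = [datetime(2015, 1, 1), datetime(2015, 6, 1), datetime(2015, 12, 31)]
failure = parse_failure_date("08/24/15")
print(failure.date(), date_in_range(failure, sample))
```

For the example dataset, the timestamps could come from `pd.to_datetime(df['timestamp'])`, the column this notebook uses when plotting predictions.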


In [None]:
model = "model-" + ts
status, job_id = ds.create_model(dataset_names=TRAIN_DATASET,
                                 failure_dates=['08/24/15'],
                                 model_name=model,
                                 nbm=True,
                                 max_train_time='00:10')
if status:
    ds.wait_for_job(job_id['job_name'], time_limit=720)
else:
    print(job_id)
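The cell above pairs `max_train_time = '00:10'` with `time_limit=720`. Assuming `max_train_time` is read as hours and minutes and `time_limit` is in seconds (both are assumptions about the SDK, not confirmed here), that corresponds to a ten-minute training budget plus two minutes of slack. If you change the training time, a hypothetical helper like the following keeps the two values in step.

```python
def train_time_to_seconds(max_train_time: str) -> int:
    """Convert an 'HH:MM' string to seconds (the HH:MM reading is an assumption)."""
    hours, minutes = (int(part) for part in max_train_time.split(":"))
    return hours * 3600 + minutes * 60


def wait_limit(max_train_time: str, slack_seconds: int = 120) -> int:
    """Training budget plus a hypothetical slack for job start-up and wrap-up."""
    return train_time_to_seconds(max_train_time) + slack_seconds


print(wait_limit("00:10"))
```

Under those assumptions, `ds.wait_for_job(job_id['job_name'], time_limit=wait_limit('00:10'))` would reproduce the value used in this notebook.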

## Extra Training (Optional)
Run the following cell to continue training the existing model. No additional parameters need to be specified.

In [None]:
# Train some more
status, job_id = ds.resume_training_model(dataset_names = TRAIN_DATASET,
                                          model_name = model,
                                          max_train_time = '00:10')
                                          
if status:
    ds.wait_for_job(job_id['job_name'],time_limit=720)
else:
    print(job_id)

## Analyze Model
`analyze_model` returns the feature importances as ranked by the model. <br> They give a general view of which features have the greatest impact on the model.

In [None]:
# Retrieve feature importance of built model
status, analyze_id = ds.analyze_model(job_id['model_name'],
                                      job_name='Darwin_analyze_model_job-' + ts,
                                      artifact_name='Darwin_analyze_model_artifact-' + ts)
sleep(1)
if status:
    # Wait for the analysis job, then download its artifact
    ds.wait_for_job(analyze_id['job_name'])
    status, feature_importance = ds.download_artifact(analyze_id['artifact_name'])
else:
    print(analyze_id)

In [None]:
feature_importance[:10]
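To compare features at a glance, it can help to express the top scores as shares of the total. The sketch below works on a plain name-to-score mapping; `top_features` is a hypothetical helper, and the feature names and scores are made up rather than taken from the wind turbine dataset.

```python
from typing import List, Mapping, Tuple


def top_features(importance: Mapping[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n highest-scoring features with scores rescaled to sum to 1."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(name, score / total) for name, score in ranked]


# Made-up scores; the feature names are illustrative, not from the dataset.
scores = {"sensor_a": 5.0, "sensor_b": 3.0, "sensor_c": 2.0}
for name, share in top_features(scores, n=2):
    print(f"{name:10s} {share:.0%}")
```

If `feature_importance` is a pandas Series (an assumption about what `download_artifact` returns here), `top_features(feature_importance.to_dict())` would feed it in.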

## Predictions
**Perform model prediction on the training dataset.**

In [None]:
status, artifact = ds.run_model(TRAIN_DATASET, model)
sleep(1)
if status:
    ds.wait_for_job(artifact['job_name'])
else:
    print(artifact)

**Download predictions from Darwin's server.**

In [None]:
status, prediction = ds.download_artifact(artifact['artifact_name'])

**Plot the risk index predicted by the model.**

In [None]:
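# Plot the risk predictions against the dataset's timestamps
# (assumes the prediction rows are in the same order as the rows of df)
prediction.set_index(pd.to_datetime(df['timestamp']), inplace=True)
prediction.plot()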

## Find out which machine learning model Darwin used

In [None]:
status, model_type = ds.lookup_model_name(model)
print(model_type['description']['best_genome'])