# Network Operations

In this notebook we will assemble our project.  We will explore different functions on our dataset and compile them into a workflow ready for production.

The functions we use are a mix of `hub`-based functions from our [MLRun Functions](http://github.com/mlrun/functions) repo, local notebooks, and git-based notebooks.

> The notebook should be run after generating the data in the [Generator Notebook](generator.ipynb)


We will start by setting up our environment: loading MLRun and some utilities we will need.

In [4]:
# Utils
import os
import json
import urllib
import numpy as np

# MLRun imports
from mlrun import mlconf

# Setup API Endpoint
mlconf.dbpath = 'http://mlrun-api:8080'

Now let's define our current project.

## Create a project from a git repository

In [5]:
from mlrun import new_project

# update the dir and repo to reflect real locations 
# the remote git repo must be initialized in GitHub
project_dir = os.path.abspath('./')
remote_git = 'https://github.com/mlrun/demo-network-operations.git'

# Create the project
newproj = new_project('network-operations', project_dir, init_git=True)

# We can update our project directory to the latest status by running
# newproj.pull()

Now that we have our project directory, let's forward our artifacts there to keep track of them.

In [6]:
# Define an artifact path to keep track of where our artifacts are going
ARTIFACT_PATH =  os.path.join(os.path.abspath(newproj.context), 'artifacts')
mlconf.artifact_path = ARTIFACT_PATH

In [7]:
mlconf.hub_url='/User/functions/{name}/function.yaml'
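
With this setting, `hub://` references are looked up through the `hub_url` template, which here points at a local copy of the functions repo. As a rough illustration only (`resolve_hub_url` is a hypothetical helper, not MLRun's own resolver, whose logic may differ), the `{name}` placeholder is filled with the function name:

```python
# Hypothetical helper illustrating how a hub:// reference could map onto
# the hub_url template. MLRun's real resolution logic may differ.
HUB_PREFIX = 'hub://'


def resolve_hub_url(func_url: str, hub_url: str) -> str:
    """Return the function.yaml location for a hub:// reference."""
    if not func_url.startswith(HUB_PREFIX):
        # Anything else (local path, git URL, ...) is returned unchanged.
        return func_url
    name = func_url[len(HUB_PREFIX):]
    return hub_url.format(name=name)


hub_url = '/User/functions/{name}/function.yaml'
print(resolve_hub_url('hub://describe', hub_url))
```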

## Create and run functions

When we receive a new dataset, the first thing we would like to do is explore it a bit. We can do that using the `describe` function in `mlrun/functions`.

In [8]:
from mlrun import mount_v3io, new_model_server

In [9]:
# Import the functions
# Functions From hub
newproj.set_function(func='hub://aggregate', name='aggregate')
newproj.set_function(func='hub://describe', name='describe')
newproj.set_function(func='hub://feature_selection', name="feature_selection")
newproj.set_function(func='hub://sklearn_classifier', name='train')
newproj.set_function(func='hub://test_classifier', name='test')
newproj.set_function(func='hub://model_server', name="serving")
newproj.set_function(func='hub://model_server_tester', name="model_server-tester")
newproj.set_function(func='hub://concept_drift', name="concept_drift")
newproj.set_function(func='hub://stream_to_parquet', name="s2p")
newproj.set_function(func='hub://virtual_drift', name="virtual_drift")

<mlrun.runtimes.kubejob.KubejobRuntime at 0x7fa5719402b0>

## Generate the dataset
If needed, go to the [Generator](./generator.ipynb) notebook and run the local workflow to generate the metrics dataset in `data/metrics`.

## Run the functions locally to develop the workflow

Now we can **run** the function locally on our sample data. We would like to get some details about our `raw` data.

## Register raw data as project level artifact

In [None]:
# Define base Dataset
import random
data_dir = os.path.join(os.path.abspath(newproj.context), 'data')
dataset_filename = random.choice(list(filter(lambda x: x.endswith('parquet'), os.listdir(data_dir))))
metrics_path = os.path.join(data_dir, dataset_filename)

import pandas as pd
# Drop alternate error columns
label_column = 'is_error'
raw = pd.read_parquet(metrics_path)
raw = raw.drop([col for col in raw.columns if col != label_column and col.endswith(label_column)], axis=1)

from mlrun.artifacts import DatasetArtifact
metrics_artifact = DatasetArtifact('metrics', raw, format='pq', target_path=os.path.join(data_dir, 'metrics.pq'))
newproj.log_artifact(metrics_artifact)
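
The column filter in the cell above keeps the `is_error` label itself and drops every other column whose name ends with `is_error`. A minimal sketch of that rule on plain column names follows; the column names are made up for illustration and may not match the generated dataset:

```python
label_column = 'is_error'

# Hypothetical column names, chosen only to exercise the filter.
columns = [
    'cpu_utilization',
    'cpu_utilization_is_error',
    'latency',
    'latency_is_error',
    'is_error',
]

# Same condition as in the notebook cell: not the label itself,
# but ending with the label name.
to_drop = [col for col in columns
           if col != label_column and col.endswith(label_column)]
kept = [col for col in columns if col not in to_drop]

print(to_drop)
print(kept)
```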

### Get statistics about the metrics data

In [8]:
from mlrun import NewTask, mount_v3io

In [9]:
describe_task = NewTask(
    name="describe", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": label_column, 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-metrics'},
    inputs={"table": 'store://network-operations/metrics'},
    artifact_path=ARTIFACT_PATH)

In [10]:
describe_run = newproj.func('describe').apply(mount_v3io()).run(describe_task)

[mlrun] 2020-05-24 08:08:20,681 starting run describe uid=24351772d2b44cb9b31f5c2e3b6a0e16  -> http://mlrun-api:8080
[mlrun] 2020-05-24 08:08:21,235 Job is running in the background, pod: describe-w7k5j
[mlrun] 2020-05-24 08:08:36,108 starting local run: main.py # summarize
[mlrun] 2020-05-24 08:08:40,946 Loaded dataset
[mlrun] 2020-05-24 08:08:40,947 Using 5768 samples
[mlrun] 2020-05-24 08:08:57,900 log artifact histograms at /User/demo-network-operations/artifacts/plots/hist.html, size: 273037, db: Y
[mlrun] 2020-05-24 08:09:00,754 log artifact imbalance at /User/demo-network-operations/artifacts/plots/imbalance.html, size: 16108, db: Y
[mlrun] 2020-05-24 08:09:01,623 log artifact correlation at /User/demo-network-operations/artifacts/plots/corr.html, size: 31478, db: Y

[mlrun] 2020-05-24 08:09:01,969 run executed, status=completed
final state: succeeded


project,uid,iter,start,state,name,labels,inputs,parameters,results,artifacts
network-operations,...3b6a0e16,0,May 24 08:08:40,completed,describe,"v3io_user=admin; kind=job; owner=admin; host=describe-w7k5j",table,"key=summary; label_column=is_error; class_labels=['0', '1']; plot_hist=True; plot_dest=plots-metrics",scale_pos_weight=11.27,"histograms; imbalance; correlation"


to track results use .show() or .logs() or in CLI: 
!mlrun get run 24351772d2b44cb9b31f5c2e3b6a0e16 --project network-operations , !mlrun logs 24351772d2b44cb9b31f5c2e3b6a0e16 --project network-operations
[mlrun] 2020-05-24 08:09:11,275 run executed, status=completed
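
The run reports a `scale_pos_weight` result. The notebook does not show how `describe` derives it. A common convention for a parameter of this name in gradient boosting libraries is the ratio of negative to positive samples, so the sketch below assumes that definition; the labels are made up:

```python
from collections import Counter


def scale_pos_weight(labels):
    """Ratio of negative to positive samples (assumed definition)."""
    counts = Counter(bool(y) for y in labels)
    if counts[True] == 0:
        raise ValueError('no positive samples in labels')
    return counts[False] / counts[True]


# Made-up labels: nine negatives for every positive.
labels = [False] * 9 + [True]
print(scale_pos_weight(labels))  # 9.0
```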


In [14]:
pd.read_parquet('data/metrics.pq')

timestamp,company,data_center,device,cpu_utilization,latency,packet_loss,throughput,is_error
2020-05-20 10:16:35.920,Wade-Roberts,Randy_Keys,8919132955287,77.815499,4.456389,0.000000,243.271095,False
2020-05-20 10:16:35.920,Wade-Roberts,Randy_Keys,0710764644526,65.603641,0.000000,0.631231,277.539436,False
2020-05-20 10:16:35.920,Wade-Roberts,Walters_Stream,6020184338581,76.170562,1.916736,0.000000,246.517323,False
2020-05-20 10:16:35.920,Wade-Roberts,Walters_Stream,0413547681017,87.475039,5.261735,0.000000,287.652382,False
2020-05-20 10:16:35.920,Fernandez__Santana_and_Maddox,Miguel_Harbors,1999552611743,74.220483,0.000000,0.580491,250.257507,False
...,...,...,...,...,...,...,...,...
2020-05-20 11:16:35.920,Wade-Roberts,Walters_Stream,0413547681017,87.559992,0.000000,1.170780,233.372520,False
2020-05-20 11:16:35.920,Fernandez__Santana_and_Maddox,Miguel_Harbors,1999552611743,72.517446,7.954549,0.000000,271.094488,False
2020-05-20 11:16:35.920,Fernandez__Santana_and_Maddox,Miguel_Harbors,1574258149595,66.221307,3.390972,1.589218,240.394251,False
2020-05-20 11:16:35.920,Fernandez__Santana_and_Maddox,Gomez_Path,0319937899106,92.502891,4.810873,0.157370,224.640897,False


### Create the feature vector

We will use our [Aggregate](https://github.com/mlrun/functions/blob/master/aggregate/aggregate.ipynb) function to create rolling window features for our feature vector.

By doing so, we hope to help our algorithms identify local errors using a windowed trend.
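
To make the idea of rolling window features concrete, here is a small pure-Python sketch of a trailing (non-centered) window. It is not the hub function's implementation; `rolling` is a hypothetical helper and the values are made up. The first `window - 1` positions have no full window, which is why dropping incomplete rows matters:

```python
from statistics import mean


def rolling(values, window, agg):
    """Trailing rolling aggregate; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(agg(values[i + 1 - window:i + 1]))
    return out


cpu = [70.0, 80.0, 90.0, 100.0]    # made-up cpu_utilization readings
cpu_mean = rolling(cpu, 3, mean)   # [None, None, 80.0, 90.0]
cpu_max = rolling(cpu, 3, max)     # [None, None, 90.0, 100.0]

# Append the aggregates to the raw value and drop incomplete rows.
rows = [(v, m, mx) for v, m, mx in zip(cpu, cpu_mean, cpu_max)
        if m is not None]
print(rows)
```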

In [17]:
# Define aggregate task
aggregate_task = NewTask(
    name='aggregate',
    params={'metrics': ['cpu_utilization', 'throughput', 'packet_loss', 'latency'],
            'metric_aggs': ['mean', 'sum', 'std', 'var', 'min', 'max', 'median'],
            'suffix': 'daily',
            'append_to_df': True,
            'window': 20,
            'center': False,
            'save_to': os.path.join('data', 'aggregate.pq'),
            'drop_na': True},
    inputs={'df_artifact': '/User/demo-network-operations/data/metrics.pq'},
    handler='aggregate')

In [None]:
aggregate_run = newproj.func('aggregate').apply(mount_v3io()).run(aggregate_task)

### Get statistics about the feature vector

In [84]:
aggregate_describe_task = NewTask(
    name="describe-aggregate", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": label_column, 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-aggregate',
            'sample': 0.3},
    inputs={"table": aggregate_run.outputs['aggregate']},
    artifact_path=ARTIFACT_PATH)

In [None]:
aggregate_describe_run = newproj.func('describe').apply(mount_v3io()).run(aggregate_describe_task)

In [31]:
%env V3IO_API

'v3io-webapi.default-tenant.svc:8081'

In [56]:
df = pd.read_parquet('artifacts/data/aggregate.pq')

## Create workflow to train the model
After reviewing the data and creating the feature vector we move to training our model.  
For this task we will use an **LGBM** classifier.  To control the training process we will supply a `model_config` dictionary with the following parameters:
- **CLASS**: Model-specific parameters.
- **FIT**: Training parameters (such as the number of epochs, when needed).
- **META**: Model class and package version.
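
One way such a three-part config could be consumed is sketched below. This is an assumption about the general pattern, not the hub training function's actual code: `load_class` is a hypothetical helper, and a standard-library class stands in for the estimator so the example runs without extra packages.

```python
import importlib


def load_class(dotted_path):
    """Import and return the class named by a dotted path."""
    module_name, _, class_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Toy config with the same three sections. fractions.Fraction is only a
# stand-in for a real estimator class.
config = {
    'CLASS': {'numerator': 3, 'denominator': 4},
    'FIT': {},
    'META': {'class': 'fractions.Fraction'},
}

cls = load_class(config['META']['class'])
model = cls(**config['CLASS'])   # CLASS holds the constructor arguments
# With a real estimator, FIT would be passed on at training time, e.g.
# model.fit(X, y, **config['FIT'])
print(model)
```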

### Setup model configurations

In [7]:
model_configs = {
    "CLASS" : {
        "boosting_type"      : "gbdt",
        "num_leaves"         : 300,
        "max_depth"          : 50,
        "learning_rate"      : 0.1,
        "n_estimators"       : 300,
        "objective"          : "binary",
        "scale_pos_weight"   : 1,    
        "min_split_gain"     : 0.0,
        "min_child_samples"  : 20,
        "subsample"          : 1,
        "colsample_bytree"   : 1,
        "reg_alpha"          : 0,
        "reg_lambda"         : 1,
        "n_jobs"             : 16,
        "silent"             : True,
        "importance_type"    : "split",
        "random_state"       : 1},
    "FIT" : {
        "verbose"               : False
    },
    "META" : {
        "class" : "lightgbm.sklearn.LGBMClassifier",
        "version" : "2.3.1"
    }
}
model_config_path = os.path.join(os.path.abspath(newproj.context), 'data', 'lgb_model.json')
with open(model_config_path, 'w') as f:
    f.write(json.dumps(model_configs))

In [8]:
newproj.log_artifact('lgb_configs',
                     target_path = os.path.abspath(model_config_path))

[mlrun] 2020-05-20 16:25:09,186 log artifact lgb_configs at /User/demo-network-operations/data/lgb_model.json, size: None, db: Y


### Create Pipeline Workflow

In [57]:
%%writefile src/workflow.py
from kfp import dsl
from mlrun import mount_v3io, mlconf
import os
from nuclio.triggers import V3IOStreamTrigger

funcs = {}
projdir = os.getcwd()
label_column = 'is_error'
model_inference_stream = '/bigdata/network-operations/inference_stream'
model_inference_url = f'http://v3io-webapi:8081{model_inference_stream}'


def init_functions(functions: dict, project=None, secrets=None):
    for f in functions.values():
        # Add V3IO Mount
        f.apply(mount_v3io())
        
        # Always pull images to keep updates
        f.spec.image_pull_policy = 'Always'
        
    functions['serving'].metadata.name = 'netops-server'
    functions['serving'].spec.min_replicas = 1
    functions['serving'].spec.build.baseImage = 'mlrun/mlrun:0.4.7'
    
    # Define inference-stream related triggers
    functions['serving'].set_envs({'INFERENCE_STREAM': model_inference_stream})
    functions['s2p'].add_trigger('labeled_stream', V3IOStreamTrigger(url=f'{model_inference_url}@s2p'))
    functions['s2p'].set_envs({'window': 10,
                               'features': 'cpu_utilization',
                               'save_to': '/bigdata/inference_pq/',
                               'base_dataset': 'store://network-operations/test_test_set_preds',
                               'hub_url': '/User/functions/{name}/function.yaml',
                               'mount_path': '/bigdata',
                               'mount_remote': '/bigdata'})
                
        
@dsl.pipeline(
    name='Network Operations Demo',
    description='Train a Failure Prediction LGBM Model over sensor data'
)
def kfpipeline(
        df_artifact = 'store://network-operations/metrics',
        metrics = ['cpu_utilization', 'throughput', 'packet_loss', 'latency'],
        metric_aggs = ['mean', 'sum', 'std', 'var', 'min', 'max', 'median'],
        suffix = 'daily',
        window = 20,
        describe_table = 'netops',
        describe_sample = 0.3,
        label_column = label_column,
        class_labels = [1, 0],
        SAMPLE_SIZE      = -1, # -n for random sample of n obs, -1 for entire dataset, +n for n consecutive rows
        TEST_SIZE        = 0.1,       # 10% set aside
        TRAIN_VAL_SPLIT  = 0.75,      # remainder split into train and val
    ):
    
    describe = funcs['describe'].as_step(name='describe-raw-data',
                                                handler="summarize",  
                                                params={"key": f"{describe_table}_raw", 
                                                        "label_column": label_column, 
                                                        'class_labels': ['0', '1'],
                                                        'plot_hist': True,
                                                        'plot_dest': 'plots/raw',
                                                        'sample': describe_sample},
                                                inputs={"table": df_artifact},
                                                outputs=["summary", "scale_pos_weight"])
    
    # Run preprocessing on the data
    aggregate = funcs['aggregate'].as_step(name='aggregate',
                                                  params={'metrics': metrics,
                                                          'metric_aggs': metric_aggs,
                                                          'suffix': suffix,
                                                          },
                                                  inputs={'df_artifact': df_artifact},
                                                  outputs=['aggregate'],
                                                  handler='aggregate',
                                                  image='mlrun/ml-models:0.4.7')

    describe = funcs['describe'].as_step(name='describe-aggregation',
                                                handler="summarize",  
                                                params={"key": f"{describe_table}_aggregate", 
                                                        "label_column": label_column, 
                                                        'class_labels': class_labels,
                                                        'plot_hist': True,
                                                        'plot_dest': 'plots/aggregation',
                                                        'sample': describe_sample},
                                                inputs={"table": aggregate.outputs['aggregate']},
                                                outputs=["summary", "scale_pos_weight"])
    
    feature_selection = funcs['feature_selection'].as_step(name='feature_selection',
                                                           handler='feature_selection',
                                                           params={'k': 5,
                                                                   'min_votes': 3,
                                                                   'label_column': label_column},
                                                           inputs={'df_artifact': aggregate.outputs['aggregate']},
                                                           outputs=['feature_scores', 
                                                                    'max_scaled_scores_feature_scores',
                                                                    'selected_features_count', 
                                                                    'selected_features'],
                                                           image='mlrun/ml-models:0.4.7')
    
    describe = funcs['describe'].as_step(name='describe-feature-vector',
                                            handler="summarize",  
                                            params={"key": f'{describe_table}_features', 
                                                    "label_column": label_column, 
                                                    'class_labels': class_labels,
                                                    'plot_hist': True,
                                                    'plot_dest': 'plots/feature_vector'},
                                            inputs={"table": feature_selection.outputs['selected_features']},
                                            outputs=["summary", "scale_pos_weight"])
    
    train = funcs['train'].as_step(name='train',
                                   params={"sample"          : SAMPLE_SIZE, 
                                           "label_column"    : label_column,
                                           "test_size"       : TEST_SIZE},
                                   inputs={"dataset"         : feature_selection.outputs['selected_features']},
                                   hyperparams={'model_pkg_class': ["sklearn.ensemble.RandomForestClassifier", 
                                                                    "sklearn.linear_model.LogisticRegression",
                                                                    "sklearn.ensemble.AdaBoostClassifier"]},
                                   selector='max.accuracy',
                                   outputs=['model', 'test_set'],
                                   image='mlrun/ml-models:0.4.7')
    
    test = funcs['test'].as_step(name='test',
                                 handler='test_classifier',
                                 params={'label_column': label_column},
                                 inputs={'models_path': train.outputs['model'],
                                         'test_set': train.outputs['test_set']},
                                 image='mlrun/ml-models:0.4.7')
    
    # deploy the model using nuclio functions
    deploy = funcs['serving'].deploy_step(models={'predictor': train.outputs['model']}, tag='v1')
    
    # test out new model server (via REST API calls)
    tester = funcs["model_server-tester"].as_step(name='model-tester',
                                                  params={'addr': deploy.outputs['endpoint'], 
                                                          'model': "predictor",
                                                          'label_column': label_column},
                                                  inputs={'table': train.outputs['test_set']})
    
    # Streaming demo functions
    # preprocessor = funcs['preprocessor'].deploy_step(name='preprocessor').after(deploy)
    
    # generator is chained after the disabled preprocessor step, so it is disabled as well
    # generator = funcs['generator'].deploy_step(name='generator').after(preprocessor)
    
    concept_drift = funcs['concept_drift'].as_step(name='concept_drift_deployer',
                                                   params={'models': ['ddm', 'eddm', 'pagehinkley'],
                                                           'label_col': 'is_error',
                                                           'prediction_col': 'prediction',
                                                           'hub_url': '/User/functions/{name}/function.yaml',
                                                           'output_tsdb': '/bigdata/network-operations/drift_tsdb',
                                                           'input_stream': 'http://v3io-webapi:8081/bigdata/network-operations/inference_stream@cd2',
                                                           'output_stream': '/bigdata/network-operations/drift_stream'},
                                                   inputs={'base_dataset': 'store://network-operations/test_test_set_preds'},
                                                   artifact_path=mlconf.artifact_path,
                                                   image='mlrun/ml-models:0.4.7').after(deploy)
    
    s2p = funcs['s2p'].deploy_step(project='network-operations').after(deploy)
    

Overwriting src/workflow.py
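
The `train` step passes a list of candidate classes through `hyperparams` and sets `selector='max.accuracy'`. Reading that string as `<operator>.<result name>` is an interpretation, not a confirmed description of MLRun internals. Under that assumption, picking the winning iteration could be sketched as follows; `select_best` is hypothetical and the accuracy figures are made up:

```python
def select_best(iterations, selector):
    """Pick the iteration whose result is best according to the selector."""
    op, _, metric = selector.partition('.')
    if op not in ('max', 'min'):
        raise ValueError(f'unsupported selector operator: {op!r}')
    chooser = max if op == 'max' else min
    return chooser(iterations, key=lambda it: it['results'][metric])


# Made-up results, one entry per candidate model class.
iterations = [
    {'model_pkg_class': 'sklearn.ensemble.RandomForestClassifier',
     'results': {'accuracy': 0.95}},
    {'model_pkg_class': 'sklearn.linear_model.LogisticRegression',
     'results': {'accuracy': 0.91}},
    {'model_pkg_class': 'sklearn.ensemble.AdaBoostClassifier',
     'results': {'accuracy': 0.93}},
]

best = select_best(iterations, 'max.accuracy')
print(best['model_pkg_class'])
```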


## Add workflow

In [58]:
newproj.set_workflow('main', os.path.join(os.path.abspath(newproj.context), 'src', 'workflow.py'))

## Save Project

In [59]:
newproj.save(os.path.join(newproj.context, 'project.yaml'))

### Run workflow

In [60]:
newproj.run('main', artifact_path=ARTIFACT_PATH, dirty=True)





[mlrun] 2020-06-02 17:08:21,223 Pipeline run id=2d2180b4-d609-4485-a985-cb7ec0c514f1, check UI or DB for progress


'2d2180b4-d609-4485-a985-cb7ec0c514f1'
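
Once submitted, the steps run in dependency order: a step that consumes another step's output, or is chained with `.after(...)`, waits for that step. The sketch below rebuilds that ordering from the workflow file with the standard-library `graphlib` module (Python 3.9+). It only illustrates the dependency graph and is not how Kubeflow Pipelines schedules work; `deploy` and `s2p` are labeled by their variable names:

```python
from graphlib import TopologicalSorter

# step -> set of upstream steps, read off the workflow definition
deps = {
    'describe-raw-data': set(),
    'aggregate': set(),
    'describe-aggregation': {'aggregate'},
    'feature_selection': {'aggregate'},
    'describe-feature-vector': {'feature_selection'},
    'train': {'feature_selection'},
    'test': {'train'},
    'deploy': {'train'},
    'model-tester': {'deploy', 'train'},
    'concept_drift_deployer': {'deploy'},
    's2p': {'deploy'},
}

# One valid execution order. Steps with no mutual dependency may run
# in parallel in the real pipeline.
order = list(TopologicalSorter(deps).static_order())
print(order)
```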

## Test endpoint

In [21]:
import pandas as pd
import requests
import json

In [22]:
# Set model
model_name = 'predictor'

# Load pre-processed data example
df = pd.read_parquet(os.path.join(os.path.abspath(newproj.context), 'artifacts', 'data', 'aggregate.pq'))

# Set sample
sample = df.head(1).fillna(0).drop(columns=['is_error']).values.tolist()
msg = {'instances': sample}

# Set endpoint
addr = 'http://192.168.224.209:32666'

In [23]:
# Send Request
req = requests.post(f'{addr}/{model_name}/predict', data=json.dumps(msg))
req.__dict__

{'_content': b'[false]',
 '_content_consumed': True,
 '_next': None,
 'status_code': 200,
 'headers': {'Server': 'nuclio', 'Date': 'Sun, 05 Apr 2020 14:33:42 GMT', 'Content-Type': 'application/json', 'Content-Length': '7'},
 'raw': <urllib3.response.HTTPResponse at 0x7f57e5460630>,
 'url': 'http://192.168.224.209:32666/predictor/predict',
 'encoding': None,
 'history': [],
 'reason': 'OK',
 'cookies': <RequestsCookieJar[]>,
 'elapsed': datetime.timedelta(0, 0, 54624),
 'request': <PreparedRequest [POST]>,
 'connection': <requests.adapters.HTTPAdapter at 0x7f57ec406f60>}
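
The exchange above boils down to a JSON body with an `instances` list going in and a JSON list of predictions coming back. A small sketch of both sides, without a network call; `build_payload` and `parse_predictions` are hypothetical helpers and the feature row is made up:

```python
import json


def build_payload(rows):
    """Serialize feature rows into the request body used above."""
    return json.dumps({'instances': rows})


def parse_predictions(content):
    """Decode the raw response body into a Python list."""
    return json.loads(content)


payload = build_payload([[77.8, 4.46, 0.0, 243.27]])  # made-up feature row
predictions = parse_predictions(b'[false]')           # body shown in the output above
print(payload)
print(predictions)
```

In the notebook itself, `req.json()` would return the same decoded list directly from the `requests` response object.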