# Network Operations

In this notebook we assemble our project. We run different functions on our dataset and compile them into a workflow ready for production.

The functions we use are a mix of `hub`-based functions from our [MLRun Functions](http://github.com/mlrun/functions) repo, and local and git-based notebooks.

> Run this notebook after generating the data in the [Generator Notebook](generator.ipynb).


We start by setting up our environment, loading MLRun and some utilities we will need.

In [1]:
# Utils
import os
import json
import urllib
import numpy as np

# MLRun imports
from mlrun import mlconf
import kfp

# Setup API Endpoint
mlconf.dbpath = 'http://mlrun-api:8080'

Now let's define our current project.

## Create a project from a git repository

In [2]:
from mlrun import new_project

# update the dir and repo to reflect real locations 
# the remote git repo must be initialized in GitHub
project_dir = '../'
remote_git = 'https://github.com/zilbermanor/demo-network-operations.git'

# Create the project
newproj = new_project('network-operations', project_dir, init_git=True)

# We can update our project directory to the latest status by running
# newproj.pull()

Now that we have our project directory, let's direct our artifacts there to keep track of them.

In [3]:
# Define an artifact path to keep track of where our artifacts are going
ARTIFACT_PATH = os.path.join('/User', 'demo-network-operations', 'artifacts')
mlconf.artifact_path = ARTIFACT_PATH

## Create and run functions

When we receive a new dataset, the first thing we want to do is explore it. We can do that with the `describe` function from `mlrun/functions`.

In [4]:
from mlrun import mount_v3io, new_model_server

In [5]:
# Define function versions to take
describe_tag = 'master'
aggregate_tag = 'development'
train_tag = 'master'
test_tag = 'master'

In [6]:
# Import the functions

# Nuclio function from a local notebook
newproj.set_function(func='notebooks/generator.ipynb', name='generator')

# Kubernetes job from the hub
newproj.set_function(func=f'hub://aggregate:{aggregate_tag}', name='aggregate')

# More functions from the hub
newproj.set_function(func=f'hub://describe:{describe_tag}', name='describe')

newproj.set_function(func=f'hub://sklearn_classifier:{train_tag}', name='train')

newproj.set_function(func=f'hub://test_classifier:{test_tag}', name='test')

# Nuclio-based model server from a local notebook
newproj.set_function(func='notebooks/model-server.ipynb', name="serving")

<mlrun.runtimes.function.RemoteRuntime at 0x7f2dfebb11d0>

## Generate the dataset
If needed, go to [Generator](./generator.ipynb) and run the local workflow to generate the metrics dataset in `data/metrics`.

## Run the functions locally to develop the workflow

Now we can **run** the function locally on our sample data. First, we want some details about our `raw` data.

## Register raw data as a project-level artifact

In [7]:
# Define base Dataset
metrics_path = os.path.join('/', 'User', 'demo-network-operations', 'data', 'metrics.parquet')
newproj.log_artifact('metrics', target_path = metrics_path)

[mlrun] 2020-04-02 05:13:30,917 log artifact metrics at /User/demo-network-operations/data/metrics.parquet, size: None, db: Y
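The describe and aggregate tasks below reference this artifact as `store://network-operations/metrics`, that is, the project name followed by the key passed to `log_artifact`. The helpers here are hypothetical conveniences (not part of MLRun) that build and split URIs following only the `store://<project>/<key>` pattern seen in this notebook; other MLRun versions may lay out store URIs differently.

```python
def make_store_uri(project: str, key: str) -> str:
    """Build a URI following the store://<project>/<key> pattern used in this notebook."""
    if not project or not key or '/' in project:
        raise ValueError('project and key must be non-empty, and project cannot contain "/"')
    return f'store://{project}/{key}'


def split_store_uri(uri: str) -> tuple:
    """Split a store://<project>/<key> URI back into (project, key)."""
    prefix = 'store://'
    if not uri.startswith(prefix):
        raise ValueError(f'not a store URI: {uri}')
    project, sep, key = uri[len(prefix):].partition('/')
    if not sep or not project or not key:
        raise ValueError(f'expected store://<project>/<key>, got: {uri}')
    return project, key


# Usage with the two artifacts this notebook registers
metrics_uri = make_store_uri('network-operations', 'metrics')
print(metrics_uri)  # store://network-operations/metrics
print(split_store_uri('store://network-operations/lgb_configs'))  # ('network-operations', 'lgb_configs')
```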


### Get statistics about the metrics data

In [8]:
from mlrun import NewTask

In [9]:
describe_task = NewTask(
    name="describe", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": "is_error", 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-metrics'},
    inputs={"table": 'store://network-operations/metrics'},
    artifact_path=ARTIFACT_PATH)

In [10]:
describe_run = newproj.func('describe').apply(mount_v3io()).run(describe_task)

[mlrun] 2020-04-02 05:14:20,023 starting run describe uid=b62d02ac731749309a2c3da05a45f374  -> http://mlrun-api:8080
[mlrun] 2020-04-02 05:14:20,794 Job is running in the background, pod: describe-lrwtj
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
[mlrun] 2020-04-02 05:16:36,367 log artifact histograms at /User/demo-network-operations/artifacts/plots/hist.html, size: 495513, db: Y
[mlrun] 2020-04-02 05:16:44,746 log artifact imbalance at /User/demo-network-operations/artifacts/plots/imbalance.html, size: 16524, db: Y
[mlrun] 2020-04-02 05:16:45,962 log artifact correlation at /User/demo-network-operations/artifacts/plots/corr.html, size: 43386, db: Y

[mlrun] 2020-04-02 05:16:46,344 run executed, status=completed
final state: succeeded

| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| network-operations | ...5a45f374 | 0 | Apr 02 05:15:18 | completed | describe | host=describe-lrwtj, kind=job, owner=admin, v3io_user=admin | table | class_labels=['0', '1'], key=summary, label_column=is_error, plot_dest=plots-metrics, plot_hist=True | scale_pos_weight=11.27 | histograms, imbalance, correlation |


to track results use .show() or .logs() or in CLI: 
!mlrun get run b62d02ac731749309a2c3da05a45f374 --project network-operations , !mlrun logs b62d02ac731749309a2c3da05a45f374 --project network-operations
[mlrun] 2020-04-02 05:16:55,610 run executed, status=completed
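The describe run reports `scale_pos_weight=11.27`, meaning errors are heavily outnumbered. A common definition of this value, suggested in the XGBoost and LightGBM parameter docs for imbalanced binary labels, is the number of negative samples divided by the number of positive samples. The sketch below assumes the hub `describe` function uses that ratio (not verified against its source); `class_balance` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, label_column: str = 'is_error') -> dict:
    """Count 0/1 label values and compute negatives / positives.

    Assumption: this mirrors how the hub `describe` function derives
    `scale_pos_weight`; its implementation was not checked here.
    """
    counts = df[label_column].value_counts()
    negatives = int(counts.get(0, 0))
    positives = int(counts.get(1, 0))
    if positives == 0:
        raise ValueError(f'no positive samples in {label_column!r}')
    return {'negatives': negatives,
            'positives': positives,
            'scale_pos_weight': round(negatives / positives, 2)}


# Toy data (not the real metrics set): 9 normal rows and 3 errors
toy = pd.DataFrame({'latency': range(12),
                    'is_error': [0] * 9 + [1] * 3})
print(class_balance(toy))  # {'negatives': 9, 'positives': 3, 'scale_pos_weight': 3.0}
```

A ratio like this could be fed to the classifier's `scale_pos_weight` parameter, which the model configuration later in this notebook leaves at `1`.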


### Create the feature vector

We will use our [Aggregate](https://github.com/mlrun/functions/blob/master/aggregate/aggregate.ipynb) function to create rolling window features for our feature vector.

We hope these windowed trends will help our algorithms identify local errors.
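As a rough illustration of what the aggregate task's parameters describe, here is a minimal pandas sketch: a centered rolling window of 5 rows (`window=5`, `center=True`) computes `mean` and `sum` for each metric and `max` for the label, and the results are appended to the original frame (`append_to_df=True`). The `<column>_<agg>_<suffix>` naming and the edge handling (pandas' default `min_periods`, which equals the window size) are assumptions; the hub `aggregate` function may differ. `rolling_features` is a hypothetical name.

```python
import pandas as pd


def rolling_features(df, metrics, labels, metric_aggs, label_aggs,
                     suffix='daily', window=5, center=True, append_to_df=True):
    """Add centered rolling-window aggregates of metric and label columns.

    Column names use a hypothetical '<column>_<agg>_<suffix>' scheme.
    """
    features = {}
    for columns, aggs in ((metrics, metric_aggs), (labels, label_aggs)):
        for col in columns:
            roller = df[col].rolling(window=window, center=center)
            for agg in aggs:
                # roller.agg('mean') is equivalent to roller.mean()
                features[f'{col}_{agg}_{suffix}'] = roller.agg(agg)
    feats = pd.DataFrame(features, index=df.index)
    return pd.concat([df, feats], axis=1) if append_to_df else feats


# Toy data (not the real metrics set)
toy = pd.DataFrame({'cpu_utilization': [10., 20., 30., 40., 50., 60.],
                    'is_error': [0, 0, 1, 0, 0, 0]})
out = rolling_features(toy, ['cpu_utilization'], ['is_error'], ['mean', 'sum'], ['max'])
print(out.columns.tolist())
# ['cpu_utilization', 'is_error', 'cpu_utilization_mean_daily', 'cpu_utilization_sum_daily', 'is_error_max_daily']
# Rows 2 and 3 get means of 30.0 and 40.0; the two rows at each edge are NaN.
print(out['cpu_utilization_mean_daily'].tolist())
```

Note that a centered window also uses the rows after each sample, so these features summarize the surrounding trend rather than only the past.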

In [11]:
# Define aggregate task
aggregate_task = NewTask(
    name='aggregate',
    params={'metrics': ['cpu_utilization', 'throughput', 'packet_loss', 'latency'],
            'labels': ['is_error'],
            'metric_aggs': ['mean', 'sum'],
            'label_aggs': ['max'],
            'suffix': 'daily',
            'append_to_df': True,
            'window': 5,
            'center': True,
            'save_to': os.path.join(project_dir, 'data', 'aggregate.pq')},
    inputs={'df_artifact': 'store://network-operations/metrics'},
    handler='aggregate')

In [14]:
aggregate_run = newproj.func('aggregate').apply(mount_v3io()).run(aggregate_task)

[mlrun] 2020-04-02 05:17:52,486 starting run aggregate uid=b48fccbf8b824f19b0bec2d9de6c364f  -> http://mlrun-api:8080
[mlrun] 2020-04-02 05:17:53,091 Job is running in the background, pod: aggregate-x7ckm
[mlrun] 2020-04-02 05:18:31,815 Aggregating /User/demo-network-operations/data/metrics.parquet
[mlrun] 2020-04-02 05:18:33,129 log artifact aggregate at /User/demo-network-operations/artifacts/b48fccbf8b824f19b0bec2d9de6c364f/aggregate.parquet, size: 577194, db: Y

[mlrun] 2020-04-02 05:18:33,604 run executed, status=completed
final state: succeeded


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| network-operations | ...de6c364f | 0 | Apr 02 05:18:31 | completed | aggregate | host=aggregate-x7ckm, kind=job, owner=admin, v3io_user=admin | df_artifact | append_to_df=True, center=True, label_aggs=['max'], labels=['is_error'], metric_aggs=['mean', 'sum'], metrics=['cpu_utilization', 'throughput', 'packet_loss', 'latency'], save_to=../data/aggregate.pq, suffix=daily, window=5 |  | aggregate |


to track results use .show() or .logs() or in CLI: 
!mlrun get run b48fccbf8b824f19b0bec2d9de6c364f --project network-operations , !mlrun logs b48fccbf8b824f19b0bec2d9de6c364f --project network-operations
[mlrun] 2020-04-02 05:18:42,415 run executed, status=completed


### Get statistics about the feature vector

In [15]:
aggregate_describe_task = NewTask(
    name="describe-aggregate", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": "is_error", 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-aggregate'},
    inputs={"table": aggregate_run.outputs['aggregate']},
    artifact_path=ARTIFACT_PATH)

In [16]:
aggregate_describe_run = newproj.func('describe').apply(mount_v3io()).run(aggregate_describe_task)

[mlrun] 2020-04-02 05:18:42,739 starting run describe-aggregate uid=2334c0af6ab744829c225cb9fb58f7b8  -> http://mlrun-api:8080
[mlrun] 2020-04-02 05:18:43,809 Job is running in the background, pod: describe-aggregate-f6nzm
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
[mlrun] 2020-04-02 05:25:39,697 log artifact histograms at /User/demo-network-operations/artifacts/plots/hist.html, size: 3389697, db: Y
[mlrun] 2020-04-02 05:26:15,604 log artifact imbalance at /User/demo-network-operations/artifacts/plots/imbalance.html, size: 16524, db: Y
[mlrun] 2020-04-02 05:26:17,085 log artifact correlation at /User/demo-network-operations/artifacts/plots/corr.html, size: 67690, db: Y

[mlrun] 2020-04-02 05:26:17,495 run executed, status=completed

| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| network-operations | ...fb58f7b8 | 0 | Apr 02 05:19:37 | completed | describe-aggregate | host=describe-aggregate-f6nzm, kind=job, owner=admin, v3io_user=admin | table | class_labels=['0', '1'], key=summary, label_column=is_error, plot_dest=plots-aggregate, plot_hist=True | scale_pos_weight=11.27 | histograms, imbalance, correlation |


to track results use .show() or .logs() or in CLI: 
!mlrun get run 2334c0af6ab744829c225cb9fb58f7b8 --project network-operations , !mlrun logs 2334c0af6ab744829c225cb9fb58f7b8 --project network-operations
[mlrun] 2020-04-02 05:26:25,138 run executed, status=completed


## Create workflow to train the model
After reviewing the data and creating the feature vector, we move on to training our model.  
For this task we use an **LGBM** (LightGBM) classifier. To control the training process, we supply a `model_configs` dictionary with the following sections:
- **CLASS**: model-specific constructor parameters
- **FIT**: training parameters (such as the number of epochs, when relevant)
- **META**: the model class and its package version
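One plausible reading of this layout is: import the class named in `META['class']`, construct it with the `CLASS` keyword arguments, and pass the `FIT` arguments to `fit`. The sketch below shows that reading only; it is not the hub `sklearn_classifier` function's actual code, and `build_from_config` is a hypothetical name. To stay runnable without LightGBM installed, the usage example points `META['class']` at the standard library's `datetime.timedelta`.

```python
import importlib
import json


def build_from_config(config: dict):
    """Instantiate config['META']['class'] with config['CLASS'] as keyword arguments.

    Returns (instance, fit_kwargs). META['version'] is not checked in this sketch.
    """
    module_path, _, class_name = config['META']['class'].rpartition('.')
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**config.get('CLASS', {})), config.get('FIT', {})


# Stand-in config with the same CLASS / FIT / META layout, round-tripped through JSON
demo = json.loads('{"CLASS": {"days": 1, "hours": 6}, "FIT": {"verbose": false},'
                  ' "META": {"class": "datetime.timedelta", "version": "n/a"}}')
obj, fit_kwargs = build_from_config(demo)
print(obj)         # 1 day, 6:00:00
print(fit_kwargs)  # {'verbose': False}
```

With LightGBM installed, the same helper applied to `model_configs` would return an `LGBMClassifier`, which could then be trained roughly as `estimator.fit(X, y, **fit_kwargs)`.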

### Setup model configurations

In [17]:
model_configs = {
    "CLASS" : {
        "boosting_type"      : "gbdt",
        "num_leaves"         : 300,
        "max_depth"          : 50,
        "learning_rate"      : 0.1,
        "n_estimators"       : 300,
        "objective"          : "binary",
        "scale_pos_weight"   : 1,    
        "min_split_gain"     : 0.0,
        "min_child_samples"  : 20,
        "subsample"          : 1,
        "colsample_bytree"   : 1,
        "reg_alpha"          : 0,
        "reg_lambda"         : 1,
        "n_jobs"             : 16,
        "silent"             : True,
        "importance_type"    : "split",
        "random_state"       : 1},
    "FIT" : {
        "verbose"               : False
    },
    "META" : {
        "class" : "lightgbm.sklearn.LGBMClassifier",
        "version" : "2.3.1"
    }
}
model_config_path = os.path.join(newproj.context, 'data', 'lgb_model.json')
with open(model_config_path, 'w') as f:
    json.dump(model_configs, f)

In [18]:
newproj.log_artifact('lgb_configs',
                     target_path = os.path.abspath(model_config_path))

[mlrun] 2020-04-02 05:26:25,599 log artifact lgb_configs at /User/demo-network-operations/data/lgb_model.json, size: None, db: Y


### Create Pipeline Workflow

In [40]:
%%writefile ../src/workflow.py
from kfp import dsl
from mlrun import mount_v3io
import os

funcs = {}

def init_functions(functions: dict, project=None, secrets=None):
    for f in functions.values():
        # Add V3IO Mount
        f.apply(mount_v3io())
        
        # Always pull images so that updated images are picked up
        f.spec.image_pull_policy = 'Always'
        
    functions['aggregate'].spec.image = 'mlrun/ml-models:0.4.6'
    for fn, fv in functions.items():
        print(f'Function: {fn}')
        print(fv.spec)
        
        
@dsl.pipeline(
    name='Network Operations Demo',
    description='Train a Failure Prediction LGBM Model over sensor data'
)
def kfpipeline(
        df_artifact = 'store://network-operations/metrics',
        metrics = ['cpu_utilization'],
        labels = ['is_error'],
        metric_aggs = ['mean', 'sum'],
        label_aggs = ['max'],
        suffix = 'daily',
        append_to_df = True,
        window = 5,
        center = True,
        save_to = os.path.join('data', 'aggregate.pq'),
        describe_table = 'summary',
        label_column = 'is_error',
        class_labels = [1, 0],
        SAMPLE_SIZE      = -1, # -n for random sample of n obs, -1 for entire dataset, +n for n consecutive rows
        TEST_SIZE        = 0.1,       # 10% set aside
        TRAIN_VAL_SPLIT  = 0.75,      # remainder split into train and val
        RNG              = 1,
        config_filepath = 'store://network-operations/lgb_configs',
    ):
    
    describe_raw = funcs['describe'].as_step(name='describe-raw-data',
                                             handler="summarize",
                                             params={"key": "summary",
                                                     "label_column": label_column,
                                                     'class_labels': ['0', '1'],
                                                     'plot_hist': True,
                                                     'plot_dest': 'plots'},
                                             inputs={"table": df_artifact},
                                             outputs=["summary", "scale_pos_weight"])
    
    # Run preprocessing on the data
    aggregate = funcs['aggregate'].as_step(name='aggregate',
                                           params={'metrics': metrics,
                                                   'labels': labels,
                                                   'metric_aggs': metric_aggs,
                                                   'label_aggs': label_aggs,
                                                   'suffix': suffix,
                                                   'append_to_df': append_to_df,
                                                   'window': window,
                                                   'center': center,
                                                   'save_to': save_to},
                                           # df_artifact is a data input, as in the interactive run above
                                           inputs={'df_artifact': df_artifact},
                                           outputs=['aggregate'],
                                           handler='aggregate',
                                           image='mlrun/mlrun')

    describe_fv = funcs['describe'].as_step(name='describe-feature-vector',
                                            handler="summarize",
                                            params={"key": "summary",
                                                    "label_column": label_column,
                                                    'class_labels': ['0', '1'],
                                                    'plot_hist': True,
                                                    'plot_dest': 'plots'},
                                            inputs={"table": aggregate.outputs['aggregate']},
                                            outputs=["summary", "scale_pos_weight"])
    
    train = funcs['train'].as_step(name='train',
                                   handler='train_model',
                                   params={'model_pkg_class' : config_filepath,
                                           'sample'          : SAMPLE_SIZE,
                                           'label_column'    : label_column,
                                           'test_size'       : TEST_SIZE,
                                           'train_val_split' : TRAIN_VAL_SPLIT,
                                           'rng'             : RNG},
                                   inputs={"data_key": aggregate.outputs['aggregate']},
                                   outputs=['model', 'test-set'])
    
#     test = funcs['test'].as_step()
    
    # deploy the model using nuclio functions
    deploy = funcs['serving'].deploy_step(project='nuclio-serving',
                                                 models={'predictor': train.outputs['model']})

Overwriting ../src/workflow.py
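The pipeline documents `SAMPLE_SIZE` only in a comment (`-1` for the entire dataset, `-n` for a random sample of n rows, `+n` for n consecutive rows) and describes `TEST_SIZE` and `TRAIN_VAL_SPLIT` as "10% set aside" and "remainder split into train and val". The sketch below implements that convention with pandas; the helper names are hypothetical, and whether the hub `sklearn_classifier` train function samples and rounds exactly this way is an assumption. With 1,000 rows, the defaults give 100 test rows and a 675/225 train/validation split.

```python
import pandas as pd


def select_sample(df: pd.DataFrame, sample: int = -1, rng: int = 1) -> pd.DataFrame:
    """Apply the SAMPLE_SIZE convention from the kfpipeline comment.

    -1 -> entire dataset, -n -> random sample of n rows, +n -> first n consecutive rows.
    """
    if sample == 0:
        raise ValueError('sample must be -1, a negative count, or a positive count')
    if sample == -1:
        return df
    if sample < -1:
        return df.sample(n=-sample, random_state=rng)
    return df.iloc[:sample]


def split_sizes(n_rows: int, test_size: float = 0.1, train_val_split: float = 0.75):
    """Return (train, validation, test) row counts; the rounding is an assumption."""
    n_test = round(n_rows * test_size)
    remainder = n_rows - n_test
    n_train = round(remainder * train_val_split)
    return n_train, remainder - n_train, n_test


toy = pd.DataFrame({'x': range(10)})
print(len(select_sample(toy, -1)), len(select_sample(toy, -4)), len(select_sample(toy, 3)))  # 10 4 3
print(split_sizes(1000))  # (675, 225, 100)
```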


## Add workflow

In [33]:
newproj.set_workflow('main', os.path.join('/', 'User', 'demo-network-operations', 'src', 'workflow.py'))

### Run workflow

In [36]:
newproj.run('main', dirty=True)

[mlrun] 2020-04-02 05:55:06,854 Pipeline run id=d53a2c05-8021-4ec8-b36b-79a3762e4f09, check UI or DB for progress


'd53a2c05-8021-4ec8-b36b-79a3762e4f09'

## Test endpoint

In [1]:
import pandas as pd
import requests
import json

In [2]:
# Set model
model_name = 'predictor'

# Load pre-processed data example
df = pd.read_parquet('/User/demo-network-operations/data/aggregate.pq')

# Set sample
sample = df.head(1).fillna(0).drop(columns=['is_error']).values.tolist()
msg = {'instances': sample}

# Set endpoint (replace with the address of your deployed serving function)
addr = 'http://3.136.215.154:31092'

In [16]:
# Send Request
req = requests.post(f'{addr}/{model_name}/predict', data=json.dumps(msg))
req.__dict__

{'_content': b'[0.0]',
 '_content_consumed': True,
 '_next': None,
 'status_code': 200,
 'headers': {'Server': 'nuclio', 'Date': 'Fri, 13 Mar 2020 10:25:37 GMT', 'Content-Type': 'application/json', 'Content-Length': '5'},
 'raw': <urllib3.response.HTTPResponse at 0x7fb5137cd128>,
 'url': 'http://3.136.215.154:31092/predictor/predict',
 'encoding': None,
 'history': [],
 'reason': 'OK',
 'cookies': <RequestsCookieJar[]>,
 'elapsed': datetime.timedelta(0, 1, 156261),
 'request': <PreparedRequest [POST]>,
 'connection': <requests.adapters.HTTPAdapter at 0x7fb51332c908>}
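`req.__dict__` dumps the whole `requests` response object, while the parts that matter here are the status code and the body (`b'[0.0]'`). The following standard-library sketch sends the same `{'instances': [...]}` payload to `<addr>/<model_name>/predict` with a timeout and decodes the JSON reply. The helper names are hypothetical and not part of MLRun, and `localhost` in the usage stands in for your serving function's address.

```python
import json
import urllib.request


def build_predict_request(addr: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build a JSON POST with body {'instances': [...]} to <addr>/<model_name>/predict."""
    body = json.dumps({'instances': instances}).encode('utf-8')
    return urllib.request.Request(
        f"{addr.rstrip('/')}/{model_name}/predict",
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST')


def predict(addr: str, model_name: str, instances: list, timeout: float = 10.0):
    """Send the request and decode the JSON reply (e.g. b'[0.0]' -> [0.0]).

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    req = build_predict_request(addr, model_name, instances)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


# Build (but do not send) a request for one feature row
req = build_predict_request('http://localhost:31092', 'predictor', [[0.1, 0.2]])
print(req.full_url)      # http://localhost:31092/predictor/predict
print(req.get_method())  # POST
```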

## Save Project

In [35]:
newproj.save(os.path.join(newproj.context, 'project.yaml'))