# Network Operations

In this notebook we will assemble our project.  We will explore different functions on our dataset and compile them into a workflow ready for production.

The functions we will use are a mix of `hub` functions from our [MLRun Functions](http://github.com/mlrun/functions) repo and notebooks loaded from local paths or git.

> The notebook should be run after generating the data in the [Generator Notebook](./notebooks/generator.ipynb)


We will start by setting up our environment: loading MLRun and a few utilities we will need.

In [1]:
# Utils
import os
import json
import urllib
import numpy as np

# MLRun imports
from mlrun import mlconf

> If you are using a different version of our function `hub://`, set its URL in the following cell.
* The URL is a template: `{name}` and `{tag}` are replaced with the requested function's name and tag.
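
As a rough illustration of the `{name}` and `{tag}` placeholders, the template can be thought of as a Python format string. The helper `expand_hub_url` below is hypothetical and is not MLRun's actual resolver; it only sketches how a `hub://name:tag` reference could map to a concrete location, assuming `master` as the default tag.

```python
def expand_hub_url(reference, template, default_tag="master"):
    """Expand a 'hub://name:tag' reference with a {name}/{tag} template.

    Hypothetical helper for illustration; MLRun's own resolution may differ.
    """
    prefix = "hub://"
    if not reference.startswith(prefix):
        raise ValueError(f"not a hub reference: {reference}")
    name, _, tag = reference[len(prefix):].partition(":")
    # Placeholders missing from the template are simply ignored by str.format
    return template.format(name=name, tag=tag or default_tag)


remote = "https://raw.githubusercontent.com/mlrun/functions/{tag}/{name}/function.yaml"
local = "/User/functions/{name}/function.yaml"

print(expand_hub_url("hub://describe:master", remote))
print(expand_hub_url("hub://describe", local))
```

The local template has no `{tag}` placeholder, so the tag has no effect there.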

In [2]:
# Set Hub URL address if using a local version
# mlconf.hub_url = '/User/functions/{name}/function.yaml'

Now let's define our current project.

## Create a project from a git repository

In [3]:
from mlrun import new_project, set_environment

# Setup project definitions including name, base path and
# artifacts path
project_name_base = 'network-operations'
project_dir = os.path.abspath('./')
project_name, artifact_path = set_environment(artifact_path=os.path.abspath('./artifacts'), project=project_name_base, user_project=True)

# Create the project
newproj = new_project(project_name, project_dir, init_git=False)

# We can update our project directory to the latest status by running
# newproj.pull()

## Create and run functions

When we receive a new dataset, the first thing we would like to do is explore it a bit. We can do that with the `describe` function from `mlrun/functions`.

In [4]:
from mlrun import mount_v3io, new_model_server

In [5]:
# Import the functions
# Functions From hub
tag = 'master'
newproj.set_function(func=f'hub://aggregate:{tag}', name='aggregate')
newproj.set_function(func=f'hub://describe:{tag}', name='describe')
newproj.set_function(func=f'hub://feature_selection:{tag}', name="feature_selection")
newproj.set_function(func=f'hub://sklearn_classifier:{tag}', name='train')
newproj.set_function(func=f'hub://test_classifier:{tag}', name='test')
newproj.set_function(func=f'hub://model_server_tester:{tag}', name="model_server-tester")
newproj.set_function(func=f'hub://concept_drift:{tag}', name="concept_drift")
newproj.set_function(func=f'hub://stream_to_parquet:{tag}', name="s2p")
newproj.set_function(func=f'hub://virtual_drift:{tag}', name="virtual_drift")

# Streaming
src_path = os.path.abspath('notebooks/')
newproj.set_function(func=os.path.join(src_path, 'generator.ipynb'), name='generator')
newproj.set_function(func=os.path.join(src_path, 'preprocessor.ipynb'), name='create_feature_vector')
newproj.set_function(func=os.path.join(src_path, 'server.ipynb'), name="serving")
newproj.set_function(func=os.path.join(src_path, 'labeled_stream_creator.ipynb'), name="labeled_stream")

<mlrun.runtimes.function.RemoteRuntime at 0x7fd58cf78c50>

## Generate the dataset
If needed, go to [Generator](notebooks/generator.ipynb) and run the local workflow to generate the metrics dataset in `data/metrics`.

## Run the functions locally to develop the workflow

Now we can **run** the functions locally on our sample data. We would like to get some details on our `raw` data.

## Register raw data as project level artifact

In [6]:
# Define base Dataset
import random
data_dir = os.path.join(os.path.abspath(newproj.context), 'data')
dataset_filename = random.choice(list(filter(lambda x: (x.endswith('pq') or x.endswith('parquet')), os.listdir(data_dir))))
metrics_path = os.path.join(data_dir, dataset_filename)
print(f'Selected {metrics_path} as base dataset, Preparing dataset')

import pandas as pd
# Drop alternate error columns (every column ending in the label name except the label itself)
label_column = 'is_error'
raw = pd.read_parquet(metrics_path)
raw = raw.drop([col for col in raw.columns if col != label_column and col.endswith(label_column)], axis=1)
dataset_path = os.path.join(data_dir, 'metrics.pq')
raw.to_parquet(dataset_path)
print(f'Finished preparing dataset {raw.shape}, logging artifact to store://{newproj.name}/netops-project_metrics')

# Add to the project as a Dataset Artifact
from mlrun.artifacts import DatasetArtifact
from mlrun import get_or_create_ctx
mlctx = get_or_create_ctx('netops-project')
mlctx._project = project_name
mlctx.log_dataset(key='metrics', df=raw, format='parquet', target_path=dataset_path)

Selected /User/mlrun-demos/demos/network-operations/data/metrics.pq as base dataset, Preparing dataset
Finished preparing dataset (5768, 5), logging artifact to store://network-operations-orz/netops-project_metrics
> 2021-01-26 12:47:55,229 [info] logging run results to: http://mlrun-api:8080




<mlrun.artifacts.dataset.DatasetArtifact at 0x7fd58cf46510>
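
To make the column filter in the cell above concrete, the sketch below applies the same rule to a plain list of column names. The names are made up for illustration (the real dataset's columns may differ): every column that ends with `is_error` is dropped, except the label column itself.

```python
label_column = "is_error"

# Hypothetical column names, for illustration only
columns = [
    "cpu_utilization",
    "cpu_utilization_is_error",
    "latency",
    "latency_is_error",
    "is_error",
]

# Same rule as the notebook cell: per-metric error flags go, the label stays
to_drop = [c for c in columns if c != label_column and c.endswith(label_column)]
kept = [c for c in columns if c not in to_drop]

print(to_drop)
print(kept)
```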

### Get statistics about the metrics data

In [7]:
from mlrun import NewTask
from mlrun.platforms import auto_mount

In [8]:
describe_task = NewTask(
    name="describe", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": label_column, 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-metrics'},
    inputs={"table": metrics_path},
    artifact_path=artifact_path)

In [9]:
describe_run = newproj.func('describe').apply(auto_mount()).run(describe_task)

> 2021-01-26 12:48:48,391 [info] starting run describe uid=1a9b40d927de42bba0faeaa4aef3d737 DB=http://mlrun-api:8080
> 2021-01-26 12:48:48,586 [info] Job is running in the background, pod: describe-jrqdk
> 2021-01-26 12:48:58,239 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| network-operations-orz | ...aef3d737 | 0 | Jan 26 12:48:53 | completed | describe | v3io_user=orz; kind=job; owner=orz; host=describe-jrqdk | table | key=summary; label_column=is_error; class_labels=['0', '1']; plot_hist=True; plot_dest=plots-metrics | | histograms; violin; imbalance; imbalance-weights-vec; correlation-matrix; correlation |


to track results use .show() or .logs() or in CLI: 
!mlrun get run 1a9b40d927de42bba0faeaa4aef3d737 --project network-operations-orz , !mlrun logs 1a9b40d927de42bba0faeaa4aef3d737 --project network-operations-orz
> 2021-01-26 12:49:07,793 [info] run executed, status=completed
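
The `describe` run logs class-imbalance artifacts such as `imbalance` and `imbalance-weights-vec` for the label column. As a minimal sketch of what such a summary captures, assuming binary `0`/`1` labels and the common negatives-over-positives convention for a positive-class weight (the hub function's exact computation is not shown in this notebook and may differ):

```python
from collections import Counter


def class_balance(labels):
    """Return per-class fractions and a negatives/positives weight."""
    counts = Counter(labels)
    total = sum(counts.values())
    fractions = {cls: n / total for cls, n in counts.items()}
    # Assumed convention: weight of the positive class (1) is n_negative / n_positive
    weight = counts[0] / counts[1] if counts[1] else float("inf")
    return fractions, weight


# Hypothetical labels: 18 normal samples and 2 errors
labels = [0] * 18 + [1] * 2
fractions, weight = class_balance(labels)
print(fractions)
print(weight)
```

A weight well above 1 signals that errors are rare, which is worth knowing before choosing a value for a parameter such as `scale_pos_weight`.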


### Create the feature vector

We will use our [Aggregate](https://github.com/mlrun/functions/blob/master/aggregate/aggregate.ipynb) function to create rolling window features for our feature vector.

By adding trends computed over a rolling window, we hope to help our algorithms identify local errors.
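
To illustrate what a trailing (non-centered) rolling window produces, here is a small pure-Python sketch. It is not the hub `aggregate` function, and the `<metric>_<agg>_<suffix>` column naming is only an assumption for the example. Rows without a full window behind them are dropped, mirroring the intent of `drop_na`.

```python
from statistics import mean


def rolling_features(values, window, aggs, metric="cpu_utilization", suffix="daily"):
    """Compute trailing-window aggregates for one metric.

    Returns one dict per row that has a full window of observations.
    """
    rows = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]  # the last `window` observations
        row = {"value": values[end - 1]}
        for agg_name, agg_fn in aggs.items():
            # Assumed column naming, for illustration only
            row[f"{metric}_{agg_name}_{suffix}"] = agg_fn(chunk)
        rows.append(row)
    return rows


# Hypothetical readings with a single spike
cpu = [10, 12, 11, 50, 13]
features = rolling_features(cpu, window=3, aggs={"mean": mean, "max": max, "min": min})
for row in features:
    print(row)
```

The spike of 50 remains visible in the `max` and `mean` of every window that contains it, which is the kind of local trend the feature vector is meant to expose.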

In [10]:
# Define aggregate task
from mlrun import NewTask
aggregate_task = NewTask(
    name='aggregate',
    params={'metrics': ['cpu_utilization', 'throughput', 'packet_loss', 'latency'],
            'metric_aggs': ['mean', 'sum', 'std', 'var', 'min', 'max', 'median'],
            'suffix': 'daily',
            'append_to_df': True,
            'window': 20,
            'center': False,
            'save_to': os.path.join('data', 'aggregate.pq'),
            'drop_na': True},
    inputs={'df_artifact': f'store://{newproj.name}/netops-project_metrics'},
    handler='aggregate')

In [11]:
aggregate_run = newproj.func('aggregate').apply(mount_v3io()).run(aggregate_task)

> 2021-01-26 12:49:07,813 [info] starting run aggregate uid=a385f89578e24f9fb600f86944f615c0 DB=http://mlrun-api:8080
> 2021-01-26 12:49:07,994 [info] Job is running in the background, pod: aggregate-wnplh
> 2021-01-26 12:49:11,628 [info] Aggregating /User/mlrun-demos/demos/network-operations/data/metrics.pq
> 2021-01-26 12:49:11,731 [info] Logging artifact
> 2021-01-26 12:49:12,014 [info] run executed, status=completed
final state: completed


project,uid,iter,start,state,name,labels,inputs,parameters,results,artifacts
network-operations-orz,...44f615c0,0,Jan 26 12:49:11,completed,aggregate,v3io_user=orz; kind=job; owner=orz; host=aggregate-wnplh,df_artifact,"metrics=['cpu_utilization', 'throughput', 'packet_loss', 'latency']; metric_aggs=['mean', 'sum', 'std', 'var', 'min', 'max', 'median']; suffix=daily; append_to_df=True; window=20; center=False; save_to=data/aggregate.pq; drop_na=True",,aggregate


to track results use .show() or .logs() or in CLI: 
!mlrun get run a385f89578e24f9fb600f86944f615c0 --project network-operations-orz , !mlrun logs a385f89578e24f9fb600f86944f615c0 --project network-operations-orz
> 2021-01-26 12:49:14,125 [info] run executed, status=completed


### Get statistics about the feature vector

In [12]:
aggregate_describe_task = NewTask(
    name="describe-aggregate", 
    handler="summarize",  
    params={"key": "summary", 
            "label_column": label_column, 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots-aggregate',
            'sample': 0.3},
    inputs={"table": aggregate_run.outputs['aggregate']},
    artifact_path=artifact_path)

In [13]:
aggregate_describe_run = newproj.func('describe').apply(mount_v3io()).run(aggregate_describe_task)

> 2021-01-26 12:49:14,143 [info] starting run describe-aggregate uid=d5264a29331041d1b44d12a3e5098d01 DB=http://mlrun-api:8080
> 2021-01-26 12:49:14,301 [info] Job is running in the background, pod: describe-aggregate-5zr2l
> 2021-01-26 12:54:02,013 [info] run executed, status=completed
final state: completed


project,uid,iter,start,state,name,labels,inputs,parameters,results,artifacts
network-operations-orz,...e5098d01,0,Jan 26 12:49:18,completed,describe-aggregate,v3io_user=orz; kind=job; owner=orz; host=describe-aggregate-5zr2l,table,"key=summary; label_column=is_error; class_labels=['0', '1']; plot_hist=True; plot_dest=plots-aggregate; sample=0.3",,histograms; violin; imbalance; imbalance-weights-vec; correlation-matrix; correlation


to track results use .show() or .logs() or in CLI: 
!mlrun get run d5264a29331041d1b44d12a3e5098d01 --project network-operations-orz , !mlrun logs d5264a29331041d1b44d12a3e5098d01 --project network-operations-orz
> 2021-01-26 12:54:04,896 [info] run executed, status=completed


## Create workflow to train the model
After reviewing the data and creating the feature vector, we move on to training our model.  
For this task we will use an **LGBM** classifier.  To control the training process we will supply a `model_configs` dictionary with the following sections:
- **CLASS**: Model-specific parameters, passed to the model's constructor.
- **FIT**: Training parameters (such as the number of epochs, when relevant).
- **META**: The model class and package version.
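
One way a training function could consume such a dictionary is sketched below: resolve the dotted path in `META`, instantiate the class with the `CLASS` parameters, and pass `FIT` to `fit`. This is an assumption about the general pattern, not the hub function's code. `load_class`, `build_and_fit` and `DummyModel` are hypothetical names, and `DummyModel` is injected as a stand-in so the example runs without LightGBM installed.

```python
import collections
import importlib


def load_class(dotted_path):
    """Import and return the class named by a 'package.module.Class' path."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


class DummyModel:
    """Hypothetical stand-in for an estimator such as LGBMClassifier."""

    def __init__(self, **params):
        self.params = params
        self.fit_kwargs = None

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs = fit_kwargs
        return self


def build_and_fit(config, X, y, resolve=load_class):
    """Instantiate META['class'] with CLASS params, then fit with FIT params."""
    model_cls = resolve(config["META"]["class"])
    model = model_cls(**config["CLASS"])
    return model.fit(X, y, **config["FIT"])


config = {
    "CLASS": {"num_leaves": 300, "learning_rate": 0.1},
    "FIT": {"verbose": False},
    "META": {"class": "lightgbm.sklearn.LGBMClassifier", "version": "2.3.1"},
}

# The resolver works on any importable dotted path
print(load_class("collections.OrderedDict") is collections.OrderedDict)

# Inject the stand-in so the example runs without LightGBM installed
model = build_and_fit(config, [[0.0], [1.0]], [0, 1], resolve=lambda path: DummyModel)
print(model.params, model.fit_kwargs)
```

With LightGBM available, the default `resolve=load_class` would import the class named in `META` instead of the stand-in.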

### Setup model configurations

In [None]:
model_configs = {
    "CLASS" : {
        "boosting_type"      : "gbdt",
        "num_leaves"         : 300,
        "max_depth"          : 50,
        "learning_rate"      : 0.1,
        "n_estimators"       : 300,
        "objective"          : "binary",
        "scale_pos_weight"   : 1,    
        "min_split_gain"     : 0.0,
        "min_child_samples"  : 20,
        "subsample"          : 1,
        "colsample_bytree"   : 1,
        "reg_alpha"          : 0,
        "reg_lambda"         : 1,
        "n_jobs"             : 16,
        "silent"             : True,
        "importance_type"    : "split",
        "random_state"       : 1},
    "FIT" : {
        "verbose"               : False
    },
    "META" : {
        "class" : "lightgbm.sklearn.LGBMClassifier",
        "version" : "2.3.1"
    }
}

config_dir = os.path.join(os.path.abspath(newproj.context), 'config')
model_config_path = os.path.join(config_dir, 'lgb_model.json')
os.makedirs(config_dir, exist_ok=True)
with open(model_config_path, 'w') as f:
    json.dump(model_configs, f)

In [None]:
newproj.log_artifact('lgb_configs',
                     target_path = os.path.abspath(model_config_path))

### Create Pipeline Workflow

In [None]:
%%writefile src/workflow.py
from kfp import dsl
from mlrun import mount_v3io, mlconf
import os
from nuclio.triggers import V3IOStreamTrigger, CronTrigger

funcs = {}
projdir = os.getcwd()
projdir_path = f"/{os.environ['V3IO_USERNAME']}{projdir[len('/User'):]}"
labeled_stream_path = os.path.join(projdir_path, 'streaming', 'labeled_stream')
container = 'users'
full_path_projdir = os.path.join('/', container, os.environ["V3IO_USERNAME"], projdir[6:])

# Define a specific hub url?
# mlconf.hub_url = 'https://raw.githubusercontent.com/mlrun/functions/{tag}/{name}/function.yaml'
# mlconf.hub_url = '/User/functions/{name}/function.yaml'

model_inference_stream = os.path.join(full_path_projdir, 'streaming', 'predictions')
labeled_stream = os.path.join(full_path_projdir, 'streaming', 'labeled_stream')

webapi_url = 'http://v3io-webapi:8081'
model_inference_url = f'{webapi_url}{model_inference_stream}'
labeled_stream_url = f'{webapi_url}{labeled_stream}'

def init_functions(functions: dict, project=None, secrets=None):
    for f in functions.values():
        # Add V3IO Mount
        f.apply(mount_v3io())
        
        # Always pull images to keep updates
        f.spec.image_pull_policy = 'Always'
    
    # Define inference-stream related triggers
    functions['s2p'].add_trigger('labeled_stream', V3IOStreamTrigger(container=container,
                                                                     path=labeled_stream_path,
                                                                     seekTo='earliest',
                                                                     partitions=[0],
                                                                     consumerGroup='s2p',
                                                                     name='labeled_stream'))
    functions['generator'].add_trigger('cron', CronTrigger(interval='1m'))
    functions['labeled_stream'].add_trigger('cron', CronTrigger(interval='1m'))
    functions['create_feature_vector'].add_trigger('cron', CronTrigger(interval='1m'))
    functions['serving'].add_trigger('cron', CronTrigger(interval='1m'))
                
        
@dsl.pipeline(
    name='Network Operations Demo',
    description='Train a Failure Prediction LGBM Model over sensor data'
)
def kfpipeline(
        # aggregate
        df_artifact = os.path.join(projdir, 'data', 'metrics.pq'),
        metrics: list = ['cpu_utilization', 'throughput', 'packet_loss', 'latency'],
        metric_aggs: list = ['mean', 'sum', 'std', 'var', 'min', 'max', 'median'],
        suffix = 'daily',
        window: int = 10,

        # describe
        describe_table = 'netops',
        describe_sample: float = 0.3,
        label_column = 'is_error',
        class_labels: list = [1, 0],
        plot_hist: bool = True,
    
        # Feature selection
        k: int = 5,
        min_votes: int = 3,
    
        # Train
        sample_size: int      = -1,        # -n for random sample of n obs, -1 for entire dataset, +n for n consecutive rows
        test_size: float        = 0.1,       # 10% set aside
        train_val_split: float  = 0.75,      # remainder split into train and val
    
        # Test
        predictions_col = 'predictions',
    
        # Deploy
        deploy_streaming: bool = True,
        aggregate_fn_url = 'hub://aggregate',
        streaming_features_table = os.path.join(projdir, 'streaming', 'features'),
        streaming_predictions_table = os.path.join(projdir, 'streaming', 'predictions'),
    
        # Streaming
        streaming_metrics_table = os.path.join(projdir, 'streaming', 'metrics'),
        generator_metrics_configuration = os.path.join(projdir, 'src', 'metric_configurations.yaml'),
        batches_to_generate = 20, # Number of batches the streaming pipeline will run for (-1 keeps it running indefinitely)
    
        # labeled stream creator
        streaming_labeled_table = labeled_stream,        
        
        # Concept drift
        deploy_concept_drift: bool = True,
        secs_to_generate: int = 10,
        concept_drift_models: list = ['ddm', 'eddm', 'pagehinkley'],
        output_tsdb = os.path.join(projdir, 'streaming', 'drift_tsdb'),
        input_stream = labeled_stream_url,
        output_stream = os.path.join(projdir, 'streaming', 'drift_stream'),
        streaming_parquet_table =  os.path.join(projdir, 'streaming', 'inference_pq'),
    
        # Virtual drift
        results_tsdb_container = 'users',
        results_tsdb_table = os.path.join(full_path_projdir[7:], 'streaming', 'drift_magnitude')
    ):
    
    # Run preprocessing on the data
    aggregate = funcs['aggregate'].as_step(name='aggregate',
                                                  params={'metrics': metrics,
                                                          'metric_aggs': metric_aggs,
                                                          'suffix': suffix,
                                                          'window': window},
                                                  inputs={'df_artifact': df_artifact},
                                                  outputs=['aggregate'],
                                                  handler='aggregate',
                                                  image='mlrun/ml-models')

    describe = funcs['describe'].as_step(name='describe-aggregation',
                                        handler="summarize",  
                                        params={"key": f"{describe_table}_aggregate", 
                                                "label_column": label_column, 
                                                'class_labels': class_labels,
                                                'plot_hist': plot_hist,
                                                'plot_dest': 'plots/aggregation',
                                                'sample': describe_sample},
                                        inputs={"table": aggregate.outputs['aggregate']},
                                        outputs=["summary", "scale_pos_weight"])
    
    feature_selection = funcs['feature_selection'].as_step(name='feature_selection',
                                                           handler='feature_selection',
                                                           params={'k': k,
                                                                   'min_votes': min_votes,
                                                                   'label_column': label_column},
                                                           inputs={'df_artifact': aggregate.outputs['aggregate']},
                                                           outputs=['feature_scores', 
                                                                    'max_scaled_scores_feature_scores',
                                                                    'selected_features_count', 
                                                                    'selected_features'],
                                                           image='mlrun/ml-models')
    
    describe = funcs['describe'].as_step(name='describe-feature-vector',
                                            handler="summarize",  
                                            params={"key": f'{describe_table}_features', 
                                                    "label_column": label_column, 
                                                    'class_labels': class_labels,
                                                    'plot_hist': plot_hist,
                                                    'plot_dest': 'plots/feature_vector'},
                                            inputs={"table": feature_selection.outputs['selected_features']},
                                            outputs=["summary", "scale_pos_weight"])
    
    train = funcs['train'].as_step(name='train',
                                   params={"sample"          : sample_size, 
                                           "label_column"    : label_column,
                                           "test_size"       : test_size,
                                           "train_val_split" : train_val_split},
                                   inputs={"dataset"         : feature_selection.outputs['selected_features']},
                                   hyperparams={'model_pkg_class': ["sklearn.ensemble.RandomForestClassifier", 
                                                                    "sklearn.linear_model.LogisticRegression",
                                                                    "sklearn.ensemble.AdaBoostClassifier"]},
                                   selector='max.accuracy',
                                   outputs=['model', 'test_set'],
                                   image='mlrun/ml-models')
    
    test = funcs['test'].as_step(name='test',
                                 handler='test_classifier',
                                 params={'label_column': label_column,
                                         'predictions_column': predictions_col},
                                 inputs={'models_path': train.outputs['model'],
                                         'test_set': train.outputs['test_set']},
                                 outputs=['test_set_preds'],
                                 image='mlrun/ml-models')

    
    with dsl.Condition(deploy_streaming == True):
        
        # deploy the model using nuclio functions
        deploy = funcs['serving'].deploy_step(env={'model_path': train.outputs['model'],
                                                   'FEATURES_TABLE': streaming_features_table,
                                                   'PREDICTIONS_TABLE': streaming_predictions_table,
                                                   'prediction_col': predictions_col,
                                                   'BATCHES_TO_GENERATE': batches_to_generate}, 
                                              tag='v1')

        # test out new model server (via REST API calls)
        tester = funcs["model_server-tester"].as_step(name='model-tester',
                                                      params={'addr': deploy.outputs['endpoint'], 
                                                              'model': "predictor",
                                                              'label_column': label_column},
                                                      inputs={'table': train.outputs['test_set']},
                                                      outputs=['test_set_preds'])
    
        # Streaming demo functions
        preprocessor = funcs['create_feature_vector'].deploy_step(env={'aggregate_fn_url': aggregate_fn_url,
                                                                       'METRICS_TABLE': streaming_metrics_table,
                                                                       'FEATURES_TABLE': streaming_features_table,
                                                                       'metrics': metrics,
                                                                       'metric_aggs': metric_aggs,
                                                                       'suffix': suffix,
                                                                       'base_dataset': train.outputs['test_set'],
                                                                       'label_col': label_column,
                                                                       'BATCHES_TO_GENERATE': batches_to_generate}).after(tester)

        labeled_stream_creator = funcs['labeled_stream'].deploy_step(env={'METRICS_TABLE': streaming_metrics_table,
                                                                          'PREDICTIONS_TABLE': streaming_predictions_table,
                                                                          'OUTPUT_STREAM': streaming_labeled_table,
                                                                          'label_col': label_column,
                                                                          'prediction_col': predictions_col,
                                                                          'BATCHES_TO_GENERATE': batches_to_generate}).after(tester)

        generator = funcs['generator'].deploy_step(env={'SAVE_TO': streaming_metrics_table,
                                                        'SECS_TO_GENERATE': secs_to_generate,
                                                        'METRICS_CONFIGURATION_FILEPATH': generator_metrics_configuration,
                                                        'BATCHES_TO_GENERATE': batches_to_generate}).after(preprocessor)
        
        with dsl.Condition(deploy_concept_drift == True):

            concept_builder = funcs['concept_drift'].deploy_step(skip_deployed=True)

            concept_drift = funcs['concept_drift'].as_step(name='concept_drift_deployer',
                                                           params={'models': concept_drift_models,
                                                                   'label_col': label_column,
                                                                   'prediction_col': predictions_col,
                                                                   'output_tsdb': output_tsdb,
                                                                   'input_stream': f'{input_stream}@cds',
                                                                   'output_stream': output_stream},
                                                           inputs={'base_dataset': test.outputs['test_set_preds']},
                                                           artifact_path=mlconf.artifact_path,
                                                           image=concept_builder.outputs['image']).after(labeled_stream_creator)

            s2p = funcs['s2p'].deploy_step(env={'window': 100,
                                                'features': metrics,
                                                'save_to': streaming_parquet_table,
                                                'base_dataset': test.outputs['test_set_preds'],
                                                'results_tsdb_container': 'users',
                                                'results_tsdb_table': results_tsdb_table,
                                                'mount_path': f"/users/{os.environ['V3IO_USERNAME']}",
                                                'mount_remote': '/User'}).after(labeled_stream_creator)
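
The path variables at the top of `src/workflow.py` translate a project directory under Jupyter's `/User` mount into V3IO container paths. The sketch below mirrors that arithmetic in a standalone function so the resulting strings are easy to inspect. `v3io_paths` is a hypothetical helper, and the example directory and user name are taken from the logged output earlier in this notebook.

```python
import posixpath


def v3io_paths(projdir, username, container="users",
               webapi_url="http://v3io-webapi:8081"):
    """Mirror the path arithmetic of workflow.py for a given user.

    Assumes `projdir` lives under the Jupyter '/User/' mount.
    """
    jupyter_mount = "/User/"
    if not projdir.startswith(jupyter_mount):
        raise ValueError(f"expected a path under {jupyter_mount}: {projdir}")
    relative = projdir[len(jupyter_mount):]

    full_path = posixpath.join("/", container, username, relative)
    labeled_stream = posixpath.join(full_path, "streaming", "labeled_stream")
    return {
        # Path relative to the container, as used by the stream trigger
        "projdir_path": posixpath.join("/", username, relative),
        # Path including the container name
        "full_path_projdir": full_path,
        # Web API URL of the labeled stream
        "labeled_stream_url": f"{webapi_url}{labeled_stream}",
        # TSDB table path with the leading '/<container>/' stripped
        "results_tsdb_table": posixpath.join(
            full_path[len(f"/{container}/"):], "streaming", "drift_magnitude"
        ),
    }


paths = v3io_paths("/User/mlrun-demos/demos/network-operations", "orz")
for key, value in paths.items():
    print(f"{key}: {value}")
```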
    

## Add workflow

In [None]:
newproj.set_workflow('main', os.path.join(os.path.abspath(newproj.context), 'src', 'workflow.py'))

## Save Project

In [None]:
newproj.save(os.path.join(newproj.context, 'project.yaml'))

## Run the pipeline
In this cell we will run the `main` workflow via Kubeflow Pipelines (KFP) on top of our cluster.  
Running the pipeline may take some time. Because the Jupyter session may time out, it's best to track the pipeline's progress in the KFP or MLRun UI.

In [None]:
newproj.run('main', artifact_path=artifact_path, dirty=True)
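
While the pipeline runs, it helps to know the order in which steps can start. Kubeflow Pipelines derives that order from data dependencies between steps and from explicit `.after()` calls. The sketch below encodes a simplified subset of the dependencies declared in `src/workflow.py` as a plain dictionary and orders them with a small topological sort (Kahn's algorithm). It only illustrates the ordering idea and does not use the KFP API; the step labels are informal.

```python
def topological_order(dependencies):
    """Order steps so that each one follows all of its dependencies."""
    remaining = {step: set(deps) for step, deps in dependencies.items()}
    order = []
    while remaining:
        # Steps whose dependencies have all been scheduled can run now
        ready = sorted(step for step, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        for step in ready:
            order.append(step)
            del remaining[step]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order


# Read off src/workflow.py: data inputs plus explicit .after() calls
dependencies = {
    "aggregate": [],
    "describe-aggregation": ["aggregate"],
    "feature_selection": ["aggregate"],
    "describe-feature-vector": ["feature_selection"],
    "train": ["feature_selection"],
    "test": ["train"],
    "serving": ["train"],
    "model-tester": ["serving", "train"],
    "create_feature_vector": ["model-tester", "train"],
    "labeled_stream": ["model-tester"],
    "generator": ["create_feature_vector"],
    "concept_drift": ["labeled_stream", "test"],
    "s2p": ["labeled_stream", "test"],
}

order = topological_order(dependencies)
print(" -> ".join(order))
```

Steps that become ready in the same round (for example `train` and `describe-feature-vector`) have no dependency on each other and may run in parallel on the cluster.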