# Notebook for Splunk Machine Learning Toolkit Container for TensorFlow

This notebook contains an example workflow how to work on custom containerized code that seamlessly interfaces with the Splunk Machine Learning Toolkit (MLTK) Container for TensorFlow. As an example we use a custom classifier built on keras and tensorflow.
Note: All code cells below have metadata attached to be reusable in the underlying MLTK Container for TensorFlow extension and are marked as non deletable and follow the naming convention mltkc_*. Feel free to add your own cells for expermentation but make sure every production ready code should live in the existing staging cells. By default every time you save this notebook the cells are exported into a python module which is then used for running your custom model invoked by Splunk MLTK commands. 

## Stage 0 - import libraries
At stage 0 we define all imports necessary to run our subsequent code depending on various libraries.

In [2]:
# mltkc_import
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import datetime
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

In [3]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)
print("TensorFlow version: " + tf.__version__)
print("Keras version: " + keras.__version__)

numpy version: 1.16.2
pandas version: 0.24.2
TensorFlow version: 2.0.0-alpha0
Keras version: 2.2.4-tf


## Stage 1 - get a data sample from Splunk
In Splunk run a search to pipe a prepared dataset into this environment.

| inputlookup server_power.csv<br>| fit MLTKContainer mode=stage algo=linear_regressor epochs=10 batch_size=32 ac_power from total* into app:server_power_regression

After you run this search your data set sample is available as a csv inside the container to develop your model. The name is taken from the into keyword ("my_model" in the example above) or set to "default" if no into keyword is present. This step is intended to work with a subset of your data to create your custom model.

In [4]:
# mltkc_stage
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param

In [5]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing purposes
df, param = stage("server_power_regression")
print(df[0:1])
print(df.shape)
print(str(param))

   ac_power  total-unhalted_core_cycles  total-instructions_retired  \
0     220.0                   4708152.0                   3924639.0   

   total-last_level_cache_references  total-memory_bus_transactions  \
0                            75140.0                         5130.0   

   total-cpu-utilization  total-disk-accesses  total-disk-blocks  \
0                   0.99                  0.0                0.0   

   total-disk-utilization  
0                     0.0  
(31271, 9)
{'options': {'args': ['ac_power', 'total*'], 'model_name': 'server_power_regression', 'algo_name': 'MLTKContainer', 'params': {'batch_size': '32', 'mode': 'stage', 'epochs': '10', 'algo': 'linear_regressor'}, 'feature_variables': ['total*'], 'mlspl_limits': {'handle_new_cat': 'default', 'use_sampling': 'true', 'streaming_apply': 'false', 'max_model_size_mb': '1500', 'max_distinct_cat_values_for_scoring': '1000', 'max_score_time': '60000', 'max_memory_usage_mb': '12000', 'max_inputs': '1000000000', 'max_fi

## Stage 2 - create and initialize a model

In [6]:
# mltkc_init
# initialize the model
# params: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    X = df[param['feature_variables']]
    print("FIT build model with input shape " + str(X.shape))
    learning_rate = 0.1
    model_name = "default_linear_regressor"
    if 'options' in param:
        if 'model_name' in param['options']:
            model_name = param['options']['model_name']
        if 'params' in param['options']:
            if 'learning_rate' in param['options']['params']:
                learning_rate = int(param['options']['params']['learning_rate'])

    feature_columns = []
    for feature_name in param['feature_variables']:
        feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
    
    model = tf.estimator.LinearRegressor(
        feature_columns=feature_columns,
        model_dir=MODEL_DIRECTORY + model_name + "/",
    )
    return model

In [7]:
# test mltkc_stage_create_model
model = init(df,param)
print(model)

W0930 15:00:21.337473 140563149854464 estimator.py:1799] Using temporary folder as model directory: /tmp/tmp3cvxrv6b


FIT build model with input shape (31271, 8)
<tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 object at 0x7fd70b118f60>


## Stage 3 - fit the model

In [8]:
# mltkc_stage_create_model_fit
# returns a fit info json object
def make_input_fn(df, param, n_epochs=None, batch_size=None, shuffle=True):
    def input_fn():
        dataset = tf.data.Dataset.from_tensor_slices((df[param['feature_variables']].to_dict(orient='list'), df[param['target_variables']].values))
        if shuffle:
            dataset = dataset.shuffle(buffer_size=len(df))
        return dataset.repeat(n_epochs).batch(batch_size)
    return input_fn

def fit(model,df,param):
    returns = {}
    X = df[param['feature_variables']]
    model_epochs = 100
    model_batch_size = 32
    if 'options' in param:
        if 'params' in param['options']:
            if 'epochs' in param['options']['params']:
                model_epochs = int(param['options']['params']['epochs'])
            if 'batch_size' in param['options']['params']:
                model_batch_size = int(param['options']['params']['batch_size'])
    # connect model training to tensorboard
    log_dir="/srv/notebooks/logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    # tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
    # run the training
    input_fn_train = make_input_fn(df,param,model_epochs,model_batch_size)
    model.train(input_fn=input_fn_train, max_steps=model_epochs)
    # memorize parameters
    returns['model_epochs'] = model_epochs
    returns['model_batch_size'] = model_batch_size
    returns['model_loss_acc'] = model.evaluate(input_fn=input_fn_train)
    return returns

In [9]:
returns = fit(model,df,param)
print(returns['model_loss_acc'])

W0930 15:00:24.546078 140563149854464 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/training_util.py:238: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0930 15:00:25.221930 140563149854464 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/feature_column/feature_column_v2.py:2758: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0930 15:00:25.671239 140563149854464 deprecation.py:506] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/slot_creator.py:187: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a fu

{'prediction/mean': 76.48488, 'loss': 33587.402, 'average_loss': 33587.27, 'label/mean': 239.80342, 'global_step': 10}


## Stage 4 - apply the model

In [10]:
# mltkc_stage_create_model_apply
def apply(model,df,param):
    X = df[param['feature_variables']]
    model_epochs = 1
    model_batch_size = 32
    if 'options' in param:
        if 'params' in param['options']:
            if 'batch_size' in param['options']['params']:
                model_batch_size = int(param['options']['params']['batch_size'])
    output_fn_train = make_input_fn(df,param,model_epochs,model_batch_size)
    y_hat = pd.DataFrame([p['predictions'] for p in list(model.predict(output_fn_train))])
    return y_hat

In [11]:
# test mltkc_stage_create_model_apply
y_hat = apply(model,df,param)
print(y_hat)

                0
0       69.204559
1       33.916595
2       39.320969
3      269.523132
4      197.207748
5       25.040310
6       28.348602
7      326.383362
8       10.051042
9      138.244644
10     302.008362
11     217.782639
12      37.107815
13      51.944801
14     346.025787
15     115.518867
16      54.119675
17     338.388031
18      15.495632
19     261.154114
20     140.740417
21      15.510792
22      17.191122
23      24.691265
24      20.562876
25       0.680564
26      34.411102
27       5.885104
28       7.791601
29      45.855240
...           ...
31241   44.834221
31242  114.942581
31243   33.703789
31244    8.083603
31245   45.863754
31246   69.659668
31247   45.358166
31248  141.835159
31249  189.336761
31250   38.920429
31251   33.252640
31252   30.828377
31253  483.440491
31254  330.580627
31255   23.038954
31256   25.550045
31257   39.766743
31258    8.223471
31259    7.910411
31260    0.752985
31261  303.735443
31262    0.148260
31263   42.335648
31264   33

## Stage 5 - save the model

In [17]:
# save model to name in expected convention "<algo_name>_<model_name>.h5"
def save(model,name):
    # model.save(MODEL_DIRECTORY + name + ".h5")
    # serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(tf.feature_column.make_parse_example_spec([input_column]))
    # export_path = model.export_saved_model(MODEL_DIRECTORY + name +"/", serving_input_fn)
    return model

## Stage 6 - load the model

In [None]:
# load model from name in expected convention "<algo_name>_<model_name>.h5"
def load(name):
    # model = keras.models.load_model(MODEL_DIRECTORY + name + ".h5")
    return model

## Stage 7 - provide a summary of the model

In [None]:
# return model summary
def summary(model=None):
    returns = {"version": {"tensorflow": tf.__version__, "keras": keras.__version__} }
    if model is not None:
        returns["summary"] = "linear regressor"
    return returns

## End of Stages
All subsequent cells are not tagged and can be used for further freeform code