# Deep Learning Toolkit for Splunk - UMAP

This notebook contains an example of UMAP (Uniform Manifold Approximation and Projection) dimensionality reduction that interfaces with the Deep Learning Toolkit for Splunk.<br>
API reference: <a href="https://umap-learn.readthedocs.io/en/latest/api.html">https://umap-learn.readthedocs.io/en/latest/api.html</a>

Note: By default, every time you save this notebook, its cells are exported into a Python module, which is then invoked by Splunk MLTK commands such as <code> | fit ... | apply ... | summary </code>. Please read the Model Development Guide in the Deep Learning Toolkit app for more information.

## Stage 0 - import libraries
Stage 0 defines all the imports and global constants that the subsequent code depends on.

In [9]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import numpy as np
import pandas as pd
import umap
# ...
# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

In [10]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)

numpy version: 1.18.1
pandas version: 1.0.1


## Stage 1 - get a data sample from Splunk
In Splunk, run a search to pipe a dataset into your notebook environment. Note: the mode=stage option of the | fit command does this.

| inputlookup diabetes.csv <br>
| fit MLTKContainer mode=stage algo=umap n_components=3 BMI age blood_pressure diabetes_pedigree glucose_concentration number_pregnant serum_insulin skin_thickness into app:diabetes_umap as umap

After you run this search, your dataset sample is available as a CSV file inside the container, together with a JSON file holding the search parameters, so that you can develop your model. The name is taken from the into keyword ("diabetes_umap" in the example above) or set to "default" if no into keyword is present. This step is intended to work with a subset of your data to create your custom model.

In [11]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param
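
The staging round trip can be tried without Splunk. The sketch below is a minimal illustration, not part of the toolkit: `stage_from` is a hypothetical variant of `stage` that takes the data directory as an argument, and the two-row sample and its `feature_variables` entry are made-up stand-ins for what Splunk would write.

```python
import json
import os
import tempfile

import pandas as pd


def stage_from(directory, name):
    """Same logic as stage(), but with the data directory passed in explicitly."""
    df = pd.read_csv(os.path.join(directory, name + ".csv"))
    with open(os.path.join(directory, name + ".json"), "r") as f:
        param = json.load(f)
    return df, param


# Hypothetical stand-in for staged data: two rows, two features.
sample_df = pd.DataFrame({"BMI": [33.6, 26.6], "age": [50, 31]})
sample_param = {"feature_variables": ["BMI", "age"]}

with tempfile.TemporaryDirectory() as tmp:
    # Write the CSV/JSON pair the way a staging step is assumed to lay it out.
    sample_df.to_csv(os.path.join(tmp, "demo.csv"), index=False)
    with open(os.path.join(tmp, "demo.json"), "w") as f:
        json.dump(sample_param, f)
    staged_df, staged_param = stage_from(tmp, "demo")

print(staged_df.shape)
print(staged_param["feature_variables"])
```

Passing the directory in keeps the example independent of the container's `data/` folder.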

In [12]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
df, param = stage("diabetes_umap")
print(df.describe())
print(param)

       number_pregnant  glucose_concentration  blood_pressure  skin_thickness  \
count       768.000000             768.000000      768.000000      768.000000   
mean          3.845052             120.894531       69.105469       20.536458   
std           3.369578              31.972618       19.355807       15.952218   
min           0.000000               0.000000        0.000000        0.000000   
25%           1.000000              99.000000       62.000000        0.000000   
50%           3.000000             117.000000       72.000000       23.000000   
75%           6.000000             140.250000       80.000000       32.000000   
max          17.000000             199.000000      122.000000       99.000000   

       serum_insulin         BMI  diabetes_pedigree         age    response  
count     768.000000  768.000000         768.000000  768.000000  768.000000  
mean       79.799479   31.992578           0.471876   33.240885    0.348958  
std       115.244002    7.884160    

## Stage 2 - create and initialize a model

In [13]:
# initialize your model
# available inputs: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    model = {}
    return model

In [17]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
model = init(df,param)
print(model)

{}


## Stage 3 - fit the model

In [18]:
# train your model
# returns a fit info json object and may modify the model object
def fit(model,df,param):
    # nothing is trained here: in this example apply() fits UMAP and embeds the data in one step
    info = {"message": "model trained"}
    return info
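
If the embedding should be learned once and reused, fitting could move into this stage. The following is a rough sketch under stated assumptions, not the notebook's method: `fit_reducer`, `apply_reducer`, and `MeanCenterReducer` are hypothetical names, and `MeanCenterReducer` is a deliberately trivial stand-in (it is not UMAP) so that the block runs without umap-learn installed.

```python
import numpy as np
import pandas as pd


class MeanCenterReducer:
    """Stand-in reducer (NOT UMAP): subtracts the column means learned in fit()."""

    def fit(self, X):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


def fit_reducer(model, df, param, make_reducer=MeanCenterReducer):
    # Learn the reducer once and keep it in the model dict for later calls.
    X = df[param["feature_variables"]]
    model["reducer"] = make_reducer().fit(X)
    return {"message": "model trained", "rows": len(X)}


def apply_reducer(model, df, param):
    # Reuse the stored reducer instead of refitting on every apply.
    X = df[param["feature_variables"]]
    return pd.DataFrame(model["reducer"].transform(X))


demo_df = pd.DataFrame({"a": [1.0, 3.0], "b": [10.0, 30.0]})
demo_param = {"feature_variables": ["a", "b"]}
demo_model = {}
info = fit_reducer(demo_model, demo_df, demo_param)
embedded = apply_reducer(demo_model, demo_df, demo_param)
print(info)
print(embedded)
```

With umap-learn available, passing `make_reducer=lambda: umap.UMAP(random_state=42)` should slot in the real reducer, since `umap.UMAP` follows the same `fit` / `transform` convention.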

In [19]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(fit(model,df,param))

{'message': 'model trained'}


## Stage 4 - apply the model

In [20]:
# apply your model
# returns the calculated results
def apply(model,df,param):
    X = df[param['feature_variables']]
    p = {
        "n_neighbors": 15,
        "n_components": 2
    }
    # override the defaults with any matching parameters passed from the Splunk search
    if 'options' in param:
        if 'params' in param['options']:
            for k in p.keys():
                if k in param['options']['params']:
                    p[k] = param['options']['params'][k]

    # a fixed random_state keeps the embedding reproducible across runs
    reducer = umap.UMAP(
        random_state=42, 
        n_neighbors=int(p['n_neighbors']),
        n_components=int(p['n_components'])
    )

    embedding = reducer.fit_transform(X)
    result = pd.DataFrame(embedding)
    return result
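
The parameter handling in `apply` can be isolated and tested on its own. The sketch below assumes a staged `param` shaped the way the code above reads it (`param['options']['params']`). The helper name `resolve_params` and the example values are hypothetical, and the string-typed `"3"` is an assumption suggested by the `int()` casts in `apply`.

```python
def resolve_params(param, defaults):
    """Return a copy of defaults, overridden by matching keys in param['options']['params']."""
    resolved = dict(defaults)
    overrides = param.get("options", {}).get("params", {})
    for key in resolved:
        if key in overrides:
            resolved[key] = overrides[key]
    return resolved


defaults = {"n_neighbors": 15, "n_components": 2}

# Hypothetical staged parameters; search options are assumed to arrive as strings.
example_param = {
    "feature_variables": ["BMI", "age"],
    "options": {"params": {"n_components": "3", "algo": "umap"}},
}

resolved = resolve_params(example_param, defaults)
n_neighbors = int(resolved["n_neighbors"])
n_components = int(resolved["n_components"])
print(n_neighbors, n_components)  # prints: 15 3
```

Only keys already present in the defaults are overridden, so unrelated options such as `algo` never reach the UMAP constructor.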

In [21]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(apply(model,df,param))

             0          1
0    15.441371   9.171429
1    12.556065   7.874014
2    15.691984  11.179146
3    -0.318175   4.113552
4    -3.303875   6.644968
..         ...        ...
763  -4.314896   7.316275
764  14.517242   7.850934
765  -1.093813   5.739352
766  13.944445  12.655777
767  12.764308   7.787235

[768 rows x 2 columns]


## Stage 5 - save the model

In [None]:
# save model to name in expected convention "<algo_name>_<model_name>"
# placeholder: the model in this example is an empty dict, so nothing is written to disk
def save(model,name):
    return model

## Stage 6 - load the model

In [None]:
# load model from name in expected convention "<algo_name>_<model_name>"
def load(name):
    model = {}
    return model
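
The `save` and `load` stages above are placeholders. If a model with real state needed to persist, one option is pickling it to a file named after the model. This is a minimal sketch under assumptions: `save_model`, `load_model`, the `.pkl` extension, and the use of `pickle` are illustrative choices rather than the toolkit's convention, and a temporary directory stands in for `MODEL_DIRECTORY`.

```python
import os
import pickle
import tempfile


def save_model(model, name, directory):
    """Pickle the model object to <directory>/<name>.pkl and return it unchanged."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, name + ".pkl"), "wb") as f:
        pickle.dump(model, f)
    return model


def load_model(name, directory):
    """Load the model object previously written by save_model()."""
    with open(os.path.join(directory, name + ".pkl"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    save_model({"n_components": 2}, "umap_demo", tmp)
    restored = load_model("umap_demo", tmp)

print(restored)
```

Whether a particular fitted object pickles cleanly should be verified for the library version in the container before relying on this.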

## Stage 7 - provide a summary of the model

In [None]:
# return a model summary
def summary(model=None):
    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__} }
    return returns

## End of Stages
All subsequent cells are untagged and can be used for further freeform code.