# Deep Learning Toolkit for Splunk - Forecasting with Prophet

This notebook contains an example of how to use the Prophet library for forecasting with the Deep Learning Toolkit for Splunk.

Note: By default, every time you save this notebook, the cells are exported into a Python module, which is then invoked by Splunk MLTK commands like <code> | fit ... | apply ... | summary </code>. Please read the Model Development Guide in the Deep Learning Toolkit app for more information.

## Stage 0 - import libraries
At stage 0 we define all the library imports that the subsequent code depends on.

In [1]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import numpy as np
import pandas as pd
from fbprophet import Prophet
# ...
# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

Importing plotly failed. Interactive plots will not work.


In [8]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)
print("Prophet: " + str(Prophet))

numpy version: 1.18.1
pandas version: 1.0.1
Prophet: <class 'fbprophet.forecaster.Prophet'>


## Stage 1 - get a data sample from Splunk
In Splunk run a search to pipe a dataset into your notebook environment. Note: mode=stage is used in the | fit command to do this.

| inputlookup bluetooth.csv
| where probe="AxisBoard-5"
| timechart dc(address) as distinct_addresses span=1h
| eval ds=strftime(_time, "%Y-%m-%d"), y=distinct_addresses
| fit MLTKContainer mode=stage algo=prophet_forecast y from ds into app:prophet_forecast

After you run this search, your dataset sample is available as a CSV file inside the container so that you can develop your model. The name is taken from the into keyword ("prophet_forecast" in the example above) or set to "default" if no into keyword is present. This step is intended for working with a subset of your data while you create your custom model.
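
As a rough sketch of the naming rule described above, the helper below maps a model name to the two files that the `stage` function in this notebook reads. The helper name `staged_paths` and its `data_dir` parameter are hypothetical and only for illustration; the `data/<name>.csv` and `data/<name>.json` layout is taken from the `stage` cell, and the fallback to "default" follows the description above.

```python
import os

def staged_paths(name=None, data_dir="data"):
    """Return the (csv, json) paths at which a staged sample is expected."""
    # Fall back to "default" when no model name was given via the into keyword
    name = name or "default"
    return (os.path.join(data_dir, name + ".csv"),
            os.path.join(data_dir, name + ".json"))

csv_path, json_path = staged_paths("prophet_forecast")
print(csv_path, json_path)
# True only once the staging search has written the sample into the container
print(os.path.exists(csv_path))
```

Checking for the files before calling `stage` can give a clearer error than a failed `open` when the staging search has not been run yet.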

In [35]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param

In [36]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
df, param = stage("prophet_forecast")
print(df)
print(param)

                       ds  y
0     2006-01-11 21:00:00  5
1     2006-01-11 22:00:00  8
2     2006-01-11 23:00:00  7
3     2006-01-12 00:00:00  6
4     2006-01-12 01:00:00  2
...                   ... ..
2479  2006-04-30 19:00:00  0
2480  2006-04-30 20:00:00  0
2481  2006-04-30 21:00:00  0
2482  2006-04-30 22:00:00  0
2483  2006-04-30 23:00:00  0

[2484 rows x 2 columns]
{'options': {'params': {'mode': 'stage', 'algo': 'prophet_forecast', 'fit_range_start': '0', 'fit_range_end': '1981'}, 'args': ['y', 'ds'], 'target_variable': ['y'], 'feature_variables': ['ds'], 'model_name': 'prophet_forecast', 'algo_name': 'MLTKContainer', 'mlspl_limits': {'disabled': False, 'handle_new_cat': 'default', 'max_distinct_cat_values': '1000', 'max_distinct_cat_values_for_classifiers': '1000', 'max_distinct_cat_values_for_scoring': '1000', 'max_fit_time': '6000', 'max_inputs': '100000000', 'max_memory_usage_mb': '4000', 'max_model_size_mb': '150', 'max_score_time': '6000', 'streaming_apply': '0', 'use_sampl

## Stage 2 - create and initialize a model

In [37]:
# initialize your model
# available inputs: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    #X = df[param['feature_variables']]
    #Y = df[param['target_variables']]
    model = Prophet()
    return model

In [38]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
model = init(df,param)

## Stage 3 - fit the model

In [45]:
# train your model
# returns a fit info json object and may modify the model object
def fit(model,df,param):
    # the range parameters arrive as strings and may carry surrounding double quotes
    fit_range_start = int(param['options']['params']['fit_range_start'].strip('"'))
    fit_range_end = int(param['options']['params']['fit_range_end'].strip('"'))
    df_fit = df[fit_range_start:fit_range_end]
    model.fit(df_fit)
    info = {"message": "model trained on range " + str(fit_range_start)+":"+str(fit_range_end) }
    return info
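
The `fit` function above assumes that both `fit_range_start` and `fit_range_end` are present in the parameters. A minimal sketch of a more defensive variant follows; the helper name `parse_fit_range` is hypothetical, and defaulting to the full data range and clamping to the number of rows are assumptions made for illustration, not documented MLTK behavior.

```python
def parse_fit_range(param, n_rows):
    """Read fit_range_start/fit_range_end from param, clamped to [0, n_rows]."""
    params = param.get("options", {}).get("params", {})
    # Values arrive as strings and may carry surrounding double quotes
    start = int(str(params.get("fit_range_start", 0)).strip('"'))
    end = int(str(params.get("fit_range_end", n_rows)).strip('"'))
    # Clamp so that the slice df[start:end] always stays inside the data
    start = max(0, min(start, n_rows))
    end = max(start, min(end, n_rows))
    return start, end

param = {"options": {"params": {"fit_range_start": "0", "fit_range_end": "1981"}}}
print(parse_fit_range(param, 2484))                          # (0, 1981)
print(parse_fit_range({"options": {"params": {}}}, 2484))    # (0, 2484)
```

Inside `fit`, the returned pair could replace the two `int(...)` lines, with `len(df)` passed as `n_rows`.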

In [46]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(fit(model,df,param))

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.


{'message': 'model trained on range 0:1981'}


## Stage 4 - apply the model

In [47]:
# apply your model
# returns the calculated results
def apply(model,df,param):
    #future = model.make_future_dataframe(periods=365)
    forecast = model.predict(df)
    return forecast
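
The commented-out `make_future_dataframe` call hints at forecasting beyond the staged data. As a minimal sketch of that idea, the hypothetical helper `future_timestamps` below builds the timestamps that follow the last observed one, using only the standard library; the hourly default step mirrors the `span=1h` of the staging search. Wrapping the result in a DataFrame with a `ds` column and passing it to `model.predict` is an assumption to verify in your own environment.

```python
from datetime import datetime, timedelta

def future_timestamps(last_ds, periods, step=timedelta(hours=1),
                      fmt="%Y-%m-%d %H:%M:%S"):
    """Return `periods` timestamp strings that follow `last_ds` at a fixed step."""
    last = datetime.strptime(last_ds, fmt)
    return [(last + step * i).strftime(fmt) for i in range(1, periods + 1)]

stamps = future_timestamps("2006-04-30 23:00:00", 3)
print(stamps)
# ['2006-05-01 00:00:00', '2006-05-01 01:00:00', '2006-05-01 02:00:00']

# Assumed usage inside the notebook (requires pandas and a fitted model):
# future = pd.DataFrame({"ds": stamps})
# forecast = model.predict(future)
```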

In [48]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(apply(model,df,param))

                      ds     trend  yhat_lower  yhat_upper  trend_lower  \
0    2006-01-11 21:00:00  1.943216    2.987323    7.730210     1.943216   
1    2006-01-11 22:00:00  1.942276    2.500524    7.375041     1.942276   
2    2006-01-11 23:00:00  1.941337    1.952633    6.871047     1.941337   
3    2006-01-12 00:00:00  1.940398    1.914755    6.701869     1.940398   
4    2006-01-12 01:00:00  1.939458    1.913381    6.658374     1.939458   
...                  ...       ...         ...         ...          ...   
2479 2006-04-30 19:00:00  2.379284   -0.060304    4.983714     2.234225   
2480 2006-04-30 20:00:00  2.379181    0.825049    5.713774     2.233819   
2481 2006-04-30 21:00:00  2.379078    0.971199    5.733541     2.233412   
2482 2006-04-30 22:00:00  2.378976    0.473327    5.370917     2.233070   
2483 2006-04-30 23:00:00  2.378873    0.041037    4.795607     2.232487   

      trend_upper  additive_terms  additive_terms_lower  additive_terms_upper  \
0        1.943216 

## Stage 5 - save the model

In [None]:
# save model to name in expected convention "<algo_name>_<model_name>"
# note: this is a placeholder that persists nothing and returns an empty dict
def save(model,name):
    model = {}
    return model

## Stage 6 - load the model

In [None]:
# load model from name in expected convention "<algo_name>_<model_name>"
# note: this is a placeholder that loads nothing and returns an empty dict
def load(name):
    model = {}
    return model
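
The `save` and `load` cells above return empty dictionaries, so nothing is persisted between commands. One possible way to fill them in is sketched below with Python's `pickle` module. The helper names `save_pickle` and `load_pickle` and the `.pkl` suffix are hypothetical. Whether a fitted Prophet object pickles cleanly depends on the installed Prophet and Stan backend versions, so treat that part as an assumption; the demo therefore round-trips a plain dictionary as a stand-in.

```python
import os
import pickle
import tempfile

def save_pickle(model, name, directory):
    """Pickle `model` to <directory>/<name>.pkl and return the file path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

def load_pickle(name, directory):
    """Load the object previously stored under <directory>/<name>.pkl."""
    with open(os.path.join(directory, name + ".pkl"), "rb") as f:
        return pickle.load(f)

# Round trip with a stand-in object; a temporary directory keeps the demo self-contained
demo_dir = tempfile.mkdtemp()
save_pickle({"weights": [1, 2, 3]}, "MLTKContainer_prophet_forecast", demo_dir)
print(load_pickle("MLTKContainer_prophet_forecast", demo_dir))
```

In the notebook, `MODEL_DIRECTORY` from stage 0 would be the natural value for `directory`.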

## Stage 7 - provide a summary of the model

In [None]:
# return a model summary
def summary(model=None):
    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__} }
    return returns
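
The `summary` function above reports library versions only and ignores the model it receives. A small sketch of a summary that also describes the model is shown below. The helper name `describe_model` is hypothetical, and using a non-None `history` attribute as the sign of a fitted model is an assumption about the model object; the demo uses a stand-in object rather than a real Prophet instance.

```python
import sys
from types import SimpleNamespace

def describe_model(model=None):
    """Return a JSON-serializable summary that tolerates a missing model."""
    info = {"python": sys.version.split()[0], "model_type": None, "fitted": False}
    if model is not None:
        info["model_type"] = type(model).__name__
        # Assumption: a fitted model exposes a non-None `history` attribute
        info["fitted"] = getattr(model, "history", None) is not None
    return info

print(describe_model())
print(describe_model(SimpleNamespace(history=[1, 2, 3])))
```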

## End of Stages
None of the subsequent cells are tagged, so they can be used for further freeform code.