# Splunk App for Data Science and Deep Learning - Forecasting with Prophet

This notebook contains an example of how to use the Prophet library for forecasting with the Splunk App for Data Science and Deep Learning.

Note: By default, every time you save this notebook, the cells are exported into a Python module, which is then invoked by Splunk MLTK commands such as <code> | fit ... | apply ... | summary </code>. See the Model Development Guide in the Deep Learning Toolkit app for more information.

## Stage 0 - import libraries
In stage 0 we define all the imports that the subsequent code depends on.

In [3]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import numpy as np
import pandas as pd
import prophet
# ...
# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

Importing plotly failed. Interactive plots will not work.


In [4]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)
print("Prophet: " + prophet.__version__)

numpy version: 1.25.2
pandas version: 2.1.1
Prophet: 1.1.5


## Stage 1 - get a data sample from Splunk
In Splunk, run a search to pipe a dataset into your notebook environment. Note: setting mode=stage in the | fit command is what sends the data to the notebook instead of training a model.

| inputlookup bluetooth.csv
| where probe="AxisBoard-5"
| timechart dc(address) as distinct_addresses span=1h
| eval ds=strftime(_time, "%Y-%m-%d"), y=distinct_addresses
| fit MLTKContainer mode=stage algo=prophet_forecast fit_range_start=0 fit_range_end=1981 y from ds into app:prophet_forecast

After you run this search, your data sample is available as a CSV file inside the container, alongside a JSON file with the search parameters, so that you can develop your model. The file name is taken from the into keyword ("prophet_forecast" in the example above) or set to "default" if no into keyword is present. This step is intended for working with a subset of your data while you create your custom model.

In [14]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param

In [15]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
df, param = stage("prophet_forecast")
print(df)
print(param)

              ds  y
0     2006-01-11  5
1     2006-01-11  8
2     2006-01-11  7
3     2006-01-12  6
4     2006-01-12  2
...          ... ..
2116  2006-04-10  1
2117  2006-04-10  1
2118  2006-04-10  1
2119  2006-04-10  1
2120  2006-04-10  1

[2121 rows x 2 columns]
{'options': {'params': {'mode': 'stage', 'algo': 'prophet_forecast', 'fit_range_start': '0', 'fit_range_end': '1981'}, 'args': ['y', 'ds'], 'target_variable': ['y'], 'feature_variables': ['ds'], 'model_name': 'prophet_forecast', 'algo_name': 'MLTKContainer', 'mlspl_limits': {'disabled': False, 'handle_new_cat': 'default', 'max_distinct_cat_values': '10000', 'max_distinct_cat_values_for_classifiers': '10000', 'max_distinct_cat_values_for_scoring': '10000', 'max_fit_time': '6000', 'max_inputs': '10000000', 'max_memory_usage_mb': '16000', 'max_model_size_mb': '3000', 'max_score_time': '6000', 'use_sampling': '1'}, 'kfold_cv': None}, 'feature_variables': ['ds'], 'target_variables': ['y']}
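
The printed `param` dictionary carries the options of the `| fit` command. The following is a minimal sketch of reading the fields a model might need, using a trimmed copy of the output above. The helper name `describe_param` and the `"default"` fallback for a missing model name are assumptions for illustration and not part of the app.

```python
# Trimmed copy of the staged parameters printed above.
param = {
    "options": {
        "params": {
            "mode": "stage",
            "algo": "prophet_forecast",
            "fit_range_start": "0",
            "fit_range_end": "1981",
        },
        "model_name": "prophet_forecast",
    },
    "feature_variables": ["ds"],
    "target_variables": ["y"],
}


def describe_param(param):
    """Hypothetical helper: collect the fields used later in fit and apply."""
    options = param.get("options", {})
    return {
        # Assumed fallback when no model name is present.
        "model_name": options.get("model_name", "default"),
        "target": param.get("target_variables", []),
        "features": param.get("feature_variables", []),
        # Search options arrive as strings, even when they look numeric.
        "search_options": dict(options.get("params", {})),
    }


info = describe_param(param)
print(info["target"], info["features"], info["search_options"]["fit_range_end"])
```

Because the search options are strings, numeric values such as `fit_range_end` need an explicit conversion before they can be used as row positions.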


## Stage 2 - create and initialize a model

In [16]:
# initialize your model
# available inputs: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    model = prophet.Prophet()
    return model

In [17]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
model = init(df,param)

## Stage 3 - fit the model

In [18]:
# train your model
# returns a fit info json object and may modify the model object
def fit(model,df,param):
    # search options arrive as strings and may be wrapped in double quotes
    fit_range_start = int(param['options']['params']['fit_range_start'].strip('"'))
    fit_range_end = int(param['options']['params']['fit_range_end'].strip('"'))
    df_fit = df[fit_range_start:fit_range_end]
    model.fit(df_fit)
    info = {"message": "model trained on range " + str(fit_range_start)+":"+str(fit_range_end) }
    return info
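
`fit` converts the string options `fit_range_start` and `fit_range_end` into integers and slices the DataFrame by row position. The following is a minimal sketch of a more defensive parser, assuming that a missing option should fall back to the full range and that out-of-range values should be clamped. `parse_fit_range` is a hypothetical helper, not part of the app.

```python
def parse_fit_range(param, n_rows):
    """Hypothetical helper: return (start, end) row positions for training.

    Assumes missing options mean "use all rows" and clamps to [0, n_rows].
    """
    params = param.get("options", {}).get("params", {})

    def to_int(key, default):
        raw = params.get(key)
        if raw is None:
            return default
        # Options arrive as strings and may be wrapped in double quotes.
        return int(str(raw).strip('"'))

    start = max(0, min(to_int("fit_range_start", 0), n_rows))
    end = max(start, min(to_int("fit_range_end", n_rows), n_rows))
    return start, end


staged = {"options": {"params": {"fit_range_start": "0", "fit_range_end": '"1981"'}}}
print(parse_fit_range(staged, 2121))  # (0, 1981)
print(parse_fit_range({}, 2121))      # (0, 2121)
```

Inside `fit`, the result could then be used as `start, end = parse_fit_range(param, len(df))` followed by `df.iloc[start:end]`.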

In [19]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(fit(model,df,param))

13:54:46 - cmdstanpy - INFO - Chain [1] start processing
13:54:46 - cmdstanpy - INFO - Chain [1] done processing


{'message': 'model trained on range 0:1981'}


## Stage 4 - apply the model

In [20]:
# apply your model
# returns the calculated results
def apply(model,df,param):
    #future = model.make_future_dataframe(periods=365)
    forecast = model.predict(df)
    changepoints = pd.DataFrame(model.changepoints)
    changepoints['changepoint'] = 1
    result = pd.concat([forecast, changepoints], axis=1)
    return result
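
`apply` predicts only for the timestamps it receives; the commented-out `make_future_dataframe` call hints at forecasting beyond them. The following is a minimal sketch of that variant, assuming a fitted Prophet model. Daily frequency is assumed because the staged `ds` column holds dates only. `forecast_future` and its default arguments are hypothetical.

```python
def forecast_future(model, periods=7, freq="D"):
    """Hypothetical helper: forecast `periods` steps past the training data.

    `model` is expected to be a fitted prophet.Prophet instance.
    """
    # Build a frame of future timestamps only (history excluded).
    future = model.make_future_dataframe(
        periods=periods, freq=freq, include_history=False
    )
    forecast = model.predict(future)
    # Keep the point forecast and its uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

After `fit(model, df, param)` has run, a call such as `forecast_future(model, periods=14)` would return one row per future day.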

In [21]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
apply(model,df,param)

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat,ds.1,changepoint
0,2006-01-11,1.190388,-1.450200,5.037910,1.190388,1.190388,0.529852,0.529852,0.529852,0.529852,0.529852,0.529852,0.0,0.0,0.0,1.720240,NaT,
1,2006-01-11,1.190388,-1.672855,4.968082,1.190388,1.190388,0.529852,0.529852,0.529852,0.529852,0.529852,0.529852,0.0,0.0,0.0,1.720240,NaT,
2,2006-01-11,1.190388,-1.560725,4.967689,1.190388,1.190388,0.529852,0.529852,0.529852,0.529852,0.529852,0.529852,0.0,0.0,0.0,1.720240,NaT,
3,2006-01-12,1.206666,-1.218421,5.187648,1.206666,1.206666,0.772286,0.772286,0.772286,0.772286,0.772286,0.772286,0.0,0.0,0.0,1.978952,NaT,
4,2006-01-12,1.206666,-1.271108,5.226345,1.206666,1.206666,0.772286,0.772286,0.772286,0.772286,0.772286,0.772286,0.0,0.0,0.0,1.978952,NaT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2116,2006-04-10,2.744948,-0.865130,5.780842,2.744510,2.745344,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,0.0,0.0,0.0,2.266916,NaT,
2117,2006-04-10,2.744948,-0.890093,5.692588,2.744504,2.745349,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,0.0,0.0,0.0,2.266916,NaT,
2118,2006-04-10,2.744948,-0.977771,5.483853,2.744499,2.745354,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,0.0,0.0,0.0,2.266916,NaT,
2119,2006-04-10,2.744948,-0.976668,5.550365,2.744496,2.745358,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,-0.478032,0.0,0.0,0.0,2.266916,NaT,


## Stage 5 - save the model

In [22]:
# save model to name in expected convention "<algo_name>_<model_name>"
def save(model,name):
    model = {}
    return model

## Stage 6 - load the model

In [23]:
# load model from name in expected convention "<algo_name>_<model_name>"
def load(name):
    model = {}
    return model
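
The `save` and `load` functions above return an empty dictionary, so a fitted model is not persisted between `| fit` and `| apply`. The following is a minimal sketch of persisting it, assuming Prophet's JSON serialization helpers in `prophet.serialize` and one JSON file per model name under `MODEL_DIRECTORY`. The `model_path` helper and the `.json` file naming are assumptions for illustration.

```python
import os

MODEL_DIRECTORY = "/srv/app/model/data/"


def model_path(name, directory=MODEL_DIRECTORY):
    """Hypothetical helper: file path for a model name (assumed naming)."""
    return os.path.join(directory, name + ".json")


def save(model, name):
    # Imported here so the module still loads where Prophet is unavailable.
    from prophet.serialize import model_to_json

    # Only fitted Prophet models can be serialized.
    with open(model_path(name), "w") as f:
        f.write(model_to_json(model))
    return model


def load(name):
    from prophet.serialize import model_from_json

    with open(model_path(name), "r") as f:
        return model_from_json(f.read())


print(model_path("prophet_forecast"))
```

With this in place, `load("prophet_forecast")` would return a model that `apply` can call `predict` on without refitting.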

## Stage 7 - provide a summary of the model

In [24]:
# return a model summary (library versions used by this notebook)
def summary(model=None):
    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__, "prophet": prophet.__version__} }
    return returns

## End of Stages
None of the subsequent cells are tagged, so they can be used for further freeform code.