# Cisco Time Series Model 

This notebook implements a time series forecasting algorithm based on the Cisco Time Series Model.

The model checkpoint (available on Hugging Face) is downloaded automatically the first time this algorithm runs.
After the initial download, the model is always loaded from the local path /srv/app/model/data.

To update the model version, pass the hf_repo parameter in the Fit command to download the updated model.

Example Fit command:

\<Search of time series data\> | table _time Number | sort _time **| fit MLTKContainer algo=tsfm_forecast hf_repo="repo_name" local_path="/srv/app/model/data/model.pt" value_field="Number" forecast_steps=128 * into app:tsfm_forecast**

* MODEL LOADING PARAMETER OPTIONS
  * hf_repo: (OPTIONAL) The Hugging Face repo to download the model from. Falls back to the default repo if not specified
  * local_path: (OPTIONAL) If you have already downloaded the model checkpoint and would like to load it directly from disk, set this to the path of the model file
  * If neither is specified, the model is downloaded from the default Hugging Face repo
* value_field: The name of the field that holds the time series values in the search result
* forecast_steps: Number of future steps to predict (minutes). Defaults to 128 if not specified

The command returns the predicted values for each future step: the mean and the P10 to P90 quantiles.

**NOTE**:
* The forecast is produced for timestamps that already exist in the input dataset: the last forecast_steps rows of the time series are treated as the prediction window.
* To predict future values of the time series, pad the current data with future timestamps and set the padded values to 0.
* Refer to the example SPL below to see how the time series is padded:

Example SPL with timestamp padding:

| inputlookup internet_traffic.csv | head 10000 | timechart span=5min avg("bits_transferred") as bits_transferred
| eval bits_transferred = bits_transferred / 1024 / 1024
| sort _time

```
Adding data point padding to continue the timeseries for forecasting
```
| append [| makeresults count=128 | eval bits_transferred=0, _time = 0 | streamstats count as pad ]
| eventstats latest(_time) as latest_timestamp
| eval _time=if(pad>0, latest_timestamp + pad*300, _time)
| table _time bits_transferred

```
Forecasting the padded time series with TSFM
```
| fit MLTKContainer algo=tsfm_forecast value_field="bits_transferred" forecast_steps=128 * into app:tsfm_forecast
| tail 5000
| table _time bits_transferred predicted_*
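
The same padding idea can be sketched outside SPL. The following is a minimal pandas sketch, assuming epoch-second timestamps at a fixed interval; `pad_future` is a hypothetical helper for illustration and is not part of MLTK or this notebook.

```python
import pandas as pd

def pad_future(df, value_field, steps, time_field="_time"):
    """Append `steps` zero-valued rows with evenly spaced future timestamps."""
    df = df.sort_values(time_field).reset_index(drop=True)
    # Infer the sampling interval from the last two timestamps (epoch seconds)
    interval = int(df[time_field].iloc[-1] - df[time_field].iloc[-2])
    last = int(df[time_field].iloc[-1])
    pad = pd.DataFrame({
        time_field: [last + interval * i for i in range(1, steps + 1)],
        value_field: [0.0] * steps,  # placeholder values, replaced by the forecast
    })
    return pd.concat([df, pad], ignore_index=True)

history = pd.DataFrame({
    "_time": [1118152800, 1118153100, 1118153400],
    "bits_transferred": [3397.25, 3538.34, 3697.84],
})
padded = pad_future(history, "bits_transferred", steps=2)
print(padded)
```

Because the algorithm holds out the last forecast_steps rows as the prediction window, the number of padded rows should match forecast_steps so that only the placeholder zeros are replaced.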

## Stage 0 - import libraries
Stage 0 defines all the imports that the subsequent code depends on.

In [3]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import numpy as np
import pandas as pd
import os
from tqdm import tqdm, trange
from typing import List, Tuple
import torch
from datasets import Dataset as HFDataset
from huggingface_hub import snapshot_download
from app.model.patched_decoder_multi_resolution import PatchedTSMultiResolutionDecoder, TimesfmMRConfig
from app.model.timesfm_multi_resolution import TimesFmMRTorch, TimesFmTorch
from timesfm.pytorch_patched_decoder import create_quantiles
from timesfm import TimesFmHparams, TimesFmCheckpoint
# ...
# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

## Stage 1 - get a data sample from Splunk

In [8]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param

In [9]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
df, param = stage("tsfm_forecast")

In [10]:
df

Unnamed: 0,_time,bits_transferred
0,1118152800,3397.254111
1,1118153100,3538.337298
2,1118153400,3697.843268
3,1118153700,3696.780082
4,1118154000,4370.253163
...,...,...
9995,1121151300,8144.962705
9996,1121151600,8074.665832
9997,1121151900,8217.367633
9998,1121152200,7817.648371


In [11]:
param

{'options': {'params': {'mode': 'stage',
   'algo': 'tsfm_forecast',
   'value_field': '"bits_transferred"',
   'forecast_steps': '128',
   'local_path': '"/srv/app/model/data/splunk--timeseries_foundation_model_v1/model.pt"'},
  'args': ['*'],
  'feature_variables': ['*'],
  'model_name': 'tsfm_forecast',
  'algo_name': 'MLTKContainer',
  'mlspl_limits': {'disabled': False,
   'handle_new_cat': 'default',
   'max_distinct_cat_values': '100',
   'max_distinct_cat_values_for_classifiers': '100',
   'max_distinct_cat_values_for_scoring': '100',
   'max_fit_time': '600',
   'max_inputs': '100000',
   'max_memory_usage_mb': '4000',
   'max_model_size_mb': '30',
   'max_score_time': '600',
   'use_sampling': '1'},
  'kfold_cv': None},
 'feature_variables': ['_time', 'bits_transferred']}
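
Note that string options such as value_field arrive with their literal double quotes, which is why the stages below call `.strip("\"")` on them. A minimal sketch of reading such options in one place follows; `get_param` is a hypothetical helper, not an MLTK function, and the dictionary is a shortened copy of the output above.

```python
def get_param(param, name, default=None, cast=str):
    """Read an option from the MLTK parameter dict and drop the literal quotes."""
    raw = param.get("options", {}).get("params", {}).get(name)
    if raw is None:
        return default
    return cast(str(raw).strip('"'))

# Shortened copy of the staged parameters shown above
param = {"options": {"params": {
    "algo": "tsfm_forecast",
    "value_field": '"bits_transferred"',
    "forecast_steps": "128",
}}}

value_field = get_param(param, "value_field")
forecast_steps = get_param(param, "forecast_steps", default=128, cast=int)
# hf_repo is absent here, so the default is returned unchanged
hf_repo = get_param(param, "hf_repo", default="cisco-ai/cisco-time-series-model-1.0-preview")
print(value_field, forecast_steps, hf_repo)
```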

## Stage 2 - create and initialize a model

In [25]:
# initialize your model
# available inputs: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):

    hp = TimesFmHparams(
        context_len=512,
        horizon_len=128,
        num_layers=50,
        use_positional_embedding=False,
        backend="gpu" if torch.cuda.is_available() else "cpu",
    )

    params = param.get('options', {}).get('params', {})
    # Need to change to correct default path
    hf_repo = params.get('hf_repo', "cisco-ai/cisco-time-series-model-1.0-preview").strip("\"")
    # local_path is optional; an empty value is treated as not set
    local_path = params.get('local_path', "").strip("\"") or None

    def build(ckpt):
        # the model is configured identically for local and Hugging Face checkpoints
        return TimesFmMRTorch(
            hparams=hp,
            checkpoint=ckpt,
            use_multi_resolution=True,
            use_special_token_s=True,
        )

    if local_path:
        try:
            print("Loading model from local path:", local_path)
            return build(TimesFmCheckpoint(path=local_path))
        except Exception as e:
            # Load from Huggingface instead
            print("Local load failed, falling back to Hugging Face:", e)

    return build(TimesFmCheckpoint(huggingface_repo_id=hf_repo))

In [None]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
model = init(df,param)
print(model._model)

## Stage 3 - fit the model

In [15]:
# train your model
# returns a fit info json object and may modify the model object
def fit(model,df,param):
    # model.fit()
    info = {"message": "No model training required"}
    return info

In [16]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print(fit(model,df,param))

{'message': 'No model training required'}


## Stage 4 - apply the model

In [21]:
# apply your model
# returns the calculated results
def apply(model,df,param):
    try:
        PREDICTION_LENGTH = int(param['options']['params']['forecast_steps'].strip("\""))
    except (KeyError, ValueError, AttributeError):
        # fall back to the default when forecast_steps is missing or not an integer
        PREDICTION_LENGTH = 128
    try:
        value_field = param['options']['params']['value_field'].strip("\"")
    except (KeyError, AttributeError):
        cols={'Message': ["ERROR: Please input parameter 'value_field' indicating the value field of the time series data"]}
        returns=pd.DataFrame(data=cols)
        return returns
    try:
        # hold out the last PREDICTION_LENGTH rows: they form the window to forecast
        series_list = [df[value_field].values.tolist()[:-PREDICTION_LENGTH]]
    except KeyError:
        cols={'Message': ["ERROR: Failed to load time series data. Make sure your value_field name is correct."]}
        returns=pd.DataFrame(data=cols)
        return returns
        
    # Aggregation factor for low-resolution (i.e. 1-min -> 60-min)
    agg_factor = 60

    # Inference for forecasting
    mean, full = model.forecast(series_list, agg_factor=agg_factor)

    # Obtain mean and quantiles of the forecasted series, each appended to the input history
    history = series_list[0]
    cols = {'mean': history + mean.tolist()[0]}
    # indices 1..9 of the last axis of `full` hold the P10..P90 quantiles
    for i in range(1, 10):
        cols[f'p{i * 10}'] = history + full[0,:,i].tolist()

    result = pd.DataFrame(cols)
    
    return result
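
The agg_factor comment in apply refers to a lower-resolution view of the series (1-min to 60-min). How the model builds that view internally is not shown in this notebook. The sketch below only illustrates one plausible reading, assuming the mean over non-overlapping blocks; `aggregate` is a hypothetical helper and not the model's API.

```python
import numpy as np

def aggregate(series, agg_factor):
    """Average non-overlapping blocks of `agg_factor` points (assumed scheme)."""
    values = np.asarray(series, dtype=float)
    usable = (len(values) // agg_factor) * agg_factor
    # Drop the oldest points so that the most recent block is complete
    trimmed = values[len(values) - usable:]
    return trimmed.reshape(-1, agg_factor).mean(axis=1)

fine = list(range(130))        # 130 one-minute points: 0, 1, ..., 129
coarse = aggregate(fine, 60)   # two 60-minute points built from 10..69 and 70..129
print(coarse)
```

Under this reading, a context of 512 coarse points at agg_factor=60 would span 30720 fine-resolution points, so a longer input history gives the low-resolution view more to work with.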

In [22]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
cols = apply(model,df,param)
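
To see the shape of the result without loading the model, the column assembly in apply can be reproduced with synthetic arrays. This is a sketch under the assumption, taken from the indexing in apply, that the second forecast output has shape (batch, horizon, 10) with the quantiles P10 to P90 at indices 1 to 9; the values below are made up and `assemble` is a hypothetical helper.

```python
import numpy as np

history = [1.0, 2.0, 3.0]
horizon = 4

# Synthetic stand-ins for the two outputs of model.forecast
mean = np.full((1, horizon), 5.0)
full = np.stack(
    [mean] + [np.full((1, horizon), float(q)) for q in range(10, 100, 10)],
    axis=-1,
)  # shape (1, horizon, 10): index 0 mirrors the mean, 1..9 are P10..P90

def assemble(history, mean, full):
    """Append each forecast track to the observed history, one column per track."""
    cols = {"mean": history + mean[0].tolist()}
    for i in range(1, full.shape[-1]):
        cols[f"p{i * 10}"] = history + full[0, :, i].tolist()
    return cols

cols_demo = assemble(history, mean, full)
print(list(cols_demo))
print(cols_demo["p10"])
```

Every column has len(history) + horizon entries, so the frame lines up row by row with the padded input when the horizon equals forecast_steps.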

## Stage 5 - save the model

In [16]:
# save model to name in expected convention "<algo_name>_<model_name>"
def save(model,name):
    model = {}
    return model

## Stage 6 - load the model

In [17]:
# load model from name in expected convention "<algo_name>_<model_name>"
def load(name):
    model = {}
    return model

## Stage 7 - provide a summary of the model

In [18]:
# return a model summary
def summary(model=None):
    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__} }
    return returns

## End of Stages
All subsequent cells are not tagged and can be used for further freeform code