# Time Series Classification with Flow Forecast (FF)

[Flow Forecast is a deep learning library for time series forecasting, classification, and anomaly detection, built in PyTorch](https://github.com/AIStream-Peelout/flow-forecast). In this notebook we go over how to use Flow Forecast for time series classification, working with the Human Activity Recognition dataset on Kaggle.

In [None]:
import os
from kaggle_secrets import UserSecretsClient

# Clone the classification branch of Flow Forecast and install it in development mode.
!git clone https://github.com/AIStream-Peelout/flow-forecast.git -b 1_classification_support_full
os.chdir('flow-forecast')
!pip install -r requirements.txt
!python setup.py develop

# Read the Weights & Biases API key from Kaggle secrets and expose it to wandb.
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("WANDB_KEY")
os.environ["WANDB_API_KEY"] = secret_value_0


In [None]:
from flood_forecast.trainer import train_function

## Data Preprocessing

As with most machine learning work, some basic preprocessing is needed to get the data into the Flow Forecast (FF) format. Here we simply add a column called `labels`, which holds the integer-encoded activity labels from the dataset.

In [None]:
import pandas as pd
df = pd.read_csv("../../input/human-activity-recognition-with-smartphones/train.csv")
df

In [None]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
labels = le.fit_transform(df["Activity"])
df["labels"] = labels
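
To see which integer each activity was mapped to, the fitted encoder can be inspected. The snippet below is a minimal sketch on a toy frame; the activity values in it are made up for illustration and are not read from the Kaggle files.

```python
import pandas as pd
from sklearn import preprocessing

# Toy stand-in for the "Activity" column (illustrative values only).
toy = pd.DataFrame({"Activity": ["WALKING", "SITTING", "LAYING", "SITTING"]})

encoder = preprocessing.LabelEncoder()
toy["labels"] = encoder.fit_transform(toy["Activity"])

# classes_ is sorted, and the position of each class is its integer code.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
n_distinct = len(encoder.classes_)

# inverse_transform turns the integer codes back into the original strings.
decoded = encoder.inverse_transform(toy["labels"]).tolist()
print(mapping, n_distinct, decoded)
```

On the real data, `len(le.classes_)` gives the number of distinct activities, which is a useful number to have at hand when setting the class count in the configuration later on.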

In [None]:
df.to_csv("train.csv")

In [None]:
# Load the held-out test split and reuse the encoder fitted on the training data.
df = pd.read_csv("../../input/human-activity-recognition-with-smartphones/test.csv")
labels = le.transform(df["Activity"])
df["labels"] = labels
df.to_csv("test.csv")

## Training the Model

These parameters are very similar to the forecasting parameters. The main difference is the set of classification-specific parameters:

- `dataset_params.n_classes`: The number of classes in your multi-class classification problem.
- `model_params.output_dim`: This should match the `n_classes` parameter supplied to the data loader.
- `dataset_params.sequence_length`: The length of the time series sequences that you pass to the model.
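
To make `sequence_length` concrete, here is a rough sketch of slicing a table of readings into fixed-length windows. How FF's loader builds its sequences internally is not shown in this notebook, so `make_windows` is a hypothetical helper, and the rule that each window takes the label of its last row is an assumption made only for this illustration.

```python
import numpy as np

def make_windows(features, labels, sequence_length):
    """Hypothetical helper: slice a (rows, n_features) array into sliding windows."""
    n_windows = len(features) - sequence_length + 1
    if n_windows <= 0:
        raise ValueError("need at least sequence_length rows")
    # Stack overlapping slices into shape (n_windows, sequence_length, n_features).
    windows = np.stack([features[i:i + sequence_length] for i in range(n_windows)])
    # Assumption for this sketch: each window is labeled by its last row.
    window_labels = np.asarray(labels)[sequence_length - 1:]
    return windows, window_labels

features = np.arange(10 * 3, dtype=float).reshape(10, 3)  # 10 time steps, 3 features
labels = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
windows, window_labels = make_windows(features, labels, sequence_length=4)
print(windows.shape, window_labels.shape)
```

Each window has `sequence_length` rows, so a longer sequence gives the model more context per example but yields fewer windows from the same number of rows.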

The rest of the parameters should follow the normal FF format:

In [None]:
the_config = {
    "model_name": "CustomTransformerDecoder",
    "n_targets": 12,
    "model_type": "PyTorch",
    "model_params": {
        "n_time_series": 19,
        "seq_length": 26,
        "output_seq_length": 1,
        "output_dim": 12,
        "n_layers_encoder": 6
    },
    "dataset_params": {
        "class": "GeneralClassificationLoader",
        "n_classes": 9,
        "training_path": "train.csv",
        "validation_path": "train.csv",
        "test_path": "test.csv",
        "sequence_length": 26,
        "batch_size": 4,
        "forecast_history": 26,
        "train_end": 4500,
        "valid_start": 4501,
        "valid_end": 7000,
        "target_col": ["labels"],
        "relevant_cols": ["labels"] + df.columns.tolist()[:19],
        "scaler": "StandardScaler",
        "interpolate": False
    },
    "training_params": {
        "criterion": "CrossEntropyLoss",
        "optimizer": "Adam",
        "optim_params": {},
        "lr": 0.3,
        "epochs": 4,
        "batch_size": 4
    },
    "GCS": False,
    "wandb": {
        "name": "flood_forecast_circleci",
        "tags": ["dummy_run", "circleci", "multi_head", "classification"],
        "project": "repo-flood_forecast"
    },
    "forward_params": {},
    "metrics": ["CrossEntropyLoss"]
}
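
Before training, it can help to confirm that the classification-specific parameters agree with each other. The check below is a small sketch, not part of FF: `check_classification_config` is a hypothetical helper. The `output_dim` versus `n_classes` comparison follows the rule stated above, while the comparison of `seq_length` with `sequence_length` is an assumption based only on both being set to the same value in this notebook.

```python
def check_classification_config(config):
    """Hypothetical helper: return a list of human-readable inconsistencies."""
    problems = []
    model = config["model_params"]
    data = config["dataset_params"]
    # Stated in the text: output_dim should match n_classes.
    if model["output_dim"] != data["n_classes"]:
        problems.append(
            f"output_dim ({model['output_dim']}) != n_classes ({data['n_classes']})"
        )
    # Assumption: the model and the loader should agree on the window length.
    if model["seq_length"] != data["sequence_length"]:
        problems.append(
            f"seq_length ({model['seq_length']}) != "
            f"sequence_length ({data['sequence_length']})"
        )
    return problems

consistent = {
    "model_params": {"output_dim": 6, "seq_length": 26},
    "dataset_params": {"n_classes": 6, "sequence_length": 26},
}
mismatched = {
    "model_params": {"output_dim": 12, "seq_length": 26},
    "dataset_params": {"n_classes": 9, "sequence_length": 26},
}
print(check_classification_config(consistent))
print(check_classification_config(mismatched))
```

Applied to `the_config` above, this check would report `output_dim` 12 against `n_classes` 9, so it is worth confirming which value matches the number of encoded activities before starting a run.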
    


In [None]:
train_function("PyTorch", the_config)

In [None]:
the_config

In [None]:
df.columns.tolist()[:19]

In [None]:
len(df)