# Create a trained model

This notebook shows how to train and save a model when you already know the most appropriate parameters.  
At the end of the notebook, you will have a trained model which can be delivered to the AI Inference Server in a properly formed _'Edge configuration package'_.

In [None]:
import sys
sys.path.insert(0, "../src")

### Load the training data

The archive `../data/raw/example.zip` contains an example dataset for training a state identifier model. It holds `json` files with labeled time series data in batches of 300.

The `../data/processed/` folder is a convenient place to unpack the dataset before loading it into a list of dataframes.

In [None]:
import shutil
import zipfile

from pathlib import Path

data_path = Path('../data/processed/example')

shutil.rmtree(data_path, ignore_errors=True)
with zipfile.ZipFile("../data/raw/example.zip", 'r') as zip_file:
    zip_file.extractall(data_path)
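
A quick sanity check after extraction can catch an empty or unexpectedly nested archive early. The sketch below builds a small stand-in archive in a temporary directory and counts the extracted `json` files. The file names, the helper `count_json_files`, and the record contents are made up for illustration; in this notebook you would call the helper on `data_path` instead.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def count_json_files(folder: Path) -> int:
    """Return the number of *.json files directly inside `folder`."""
    return len(list(folder.glob("*.json")))


# Build a small stand-in archive; names and contents are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "example.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for i in range(3):
            payload = {"class": "idle", "measurements": []}
            zf.writestr(f"batch_{i}.json", json.dumps(payload))

    # Extract it the same way the notebook cell does.
    target = tmp_path / "extracted"
    with zipfile.ZipFile(archive, "r") as zf:
        zf.extractall(target)

    n_files = count_json_files(target)
    print("Extracted json files:", n_files)
```

The glob is not recursive, so a count of zero may simply mean that the archive wraps its files in a subfolder.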

Because the model works with NumPy arrays, the input data and the expected labels held in the `DataFrame`s must be converted to arrays.

In [None]:
import json
import pandas

import numpy as np

from pathlib import Path

data_path = Path('../data/processed/example')

dataframes = []
for json_file in data_path.glob('*.json'):
    with open(json_file) as f:    
        data = json.load(f)
        dataframe = pandas.json_normalize(data, "measurements", ["class"])
        dataframes.append(dataframe)

X = np.array([x[["ph1","ph2","ph3"]].values for x in dataframes])
Y = np.array([y["class"].values[0] for y in dataframes])
print("Shape:", X.shape)
print("Dimensions:", X.ndim)
print("Labels:", Y)
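
To see what the loading code does to a single file, the sketch below runs the same `json_normalize` call on one hand-written record. The field names follow the loading code; the values and the label `"idle"` are invented, and the real files may carry additional fields.

```python
import numpy as np
import pandas

# Hypothetical record: field names follow the loading code, values are made up.
record = {
    "class": "idle",
    "measurements": [
        {"ph1": 1.0, "ph2": 2.0, "ph3": 3.0},
        {"ph1": 1.5, "ph2": 2.5, "ph3": 3.5},
    ],
}

# One row per measurement; the record-level "class" is repeated on every row.
frame = pandas.json_normalize(record, "measurements", ["class"])
print(frame)

# Stacking equally long windows yields (n_windows, n_samples, n_channels).
windows = [frame, frame]
X_demo = np.array([w[["ph1", "ph2", "ph3"]].values for w in windows])
Y_demo = np.array([w["class"].values[0] for w in windows])
print("Shape:", X_demo.shape)
print("Labels:", Y_demo)
```

Only the first `class` value of each window is used as the label, which assumes that every file holds exactly one class. Windows of unequal length will not stack into a regular three-dimensional array.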

Use a list of feature extractors to preprocess the input data and extract the features required by the `KNeighborsClassifier`.

The parameters may need to be tuned for the current use case.

In [None]:
import tsfresh.feature_extraction.feature_calculators as fc
from si.preprocessing import positive_sum_of_changes, negative_sum_of_changes, FillMissingValues, SumColumnsTransformer

weighted_feature_list = [
    (2, [ fc.maximum, fc.minimum, fc.mean ]),
    (1, [ fc.variance, fc.standard_deviation ]),
    (1, [ fc.sum_values ]),
    (1, [ fc.absolute_sum_of_changes ]),
    (1, [ positive_sum_of_changes, negative_sum_of_changes ]),
    (1, [ fc.count_above_mean, fc.longest_strike_above_mean,  fc.longest_strike_below_mean ])
]
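
The list pairs a weight with a group of feature functions. The implementations of `positive_sum_of_changes` and `negative_sum_of_changes` live in `si.preprocessing` and are not shown here, so the definitions below are an assumption made by analogy with tsfresh's `absolute_sum_of_changes`. How `FeatureTransformer` uses the weight is not shown either; this sketch only carries it along next to each value. All names ending in `_sketch` and the helper `evaluate_feature_groups` are hypothetical.

```python
import numpy as np


def positive_sum_of_changes_sketch(x):
    """Sum of all increases between consecutive samples (assumed definition)."""
    diffs = np.diff(x)
    return float(diffs[diffs > 0].sum())


def negative_sum_of_changes_sketch(x):
    """Sum of all decreases between consecutive samples (assumed definition)."""
    diffs = np.diff(x)
    return float(diffs[diffs < 0].sum())


def evaluate_feature_groups(x, weighted_features):
    """Return (function name, group weight, value) for every listed function."""
    return [
        (fn.__name__, weight, float(fn(x)))
        for weight, fns in weighted_features
        for fn in fns
    ]


# Same (weight, [functions]) layout as the list above, with stand-in functions.
demo_features = [
    (2, [np.mean]),
    (1, [positive_sum_of_changes_sketch, negative_sum_of_changes_sketch]),
]

signal = np.array([1.0, 3.0, 2.0, 5.0])
rows = evaluate_feature_groups(signal, demo_features)
for name, weight, value in rows:
    print(f"{name}: weight={weight}, value={value}")
```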

With the feature extractors, create a `scikit-learn` pipeline with two steps:
- `preprocess`:
  - fills missing values
  - sums the three input variables into one
  - extracts the required features from each time window
  - normalizes the extracted features
- `classify`:
  - classifies the incoming data with the trained model

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from si.pipeline import FeatureTransformer

model = Pipeline([
    ('preprocess', Pipeline([
        ('fillmissing', FillMissingValues('ffill')),
        ('summarize', SumColumnsTransformer()), # sums the input variables into one variable
        ('featurize', FeatureTransformer(function_list=weighted_feature_list)),
        ('scale', MinMaxScaler(feature_range=(0, 1)))])),
    ('classify', KNeighborsClassifier(n_neighbors=3)),
])
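
The custom steps come from the `si` package, whose source is not shown in this notebook. As a rough illustration of the transformer interface such steps need, here is a minimal sketch with two hypothetical stand-ins (`SumChannelsSketch` and `WindowStatsSketch`) chained with a real `MinMaxScaler`. It assumes input of shape `(windows, samples, channels)` and is not the actual `si` implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class SumChannelsSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: add the channels of each window into one signal."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        # (windows, samples, channels) -> (windows, samples)
        return np.asarray(X).sum(axis=2)


class WindowStatsSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: reduce each window to [max, min, mean]."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.column_stack([X.max(axis=1), X.min(axis=1), X.mean(axis=1)])


preprocess_demo = Pipeline([
    ("summarize", SumChannelsSketch()),
    ("featurize", WindowStatsSketch()),
    ("scale", MinMaxScaler(feature_range=(0, 1))),
])

# Synthetic data: 6 windows, 300 samples, 3 channels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 300, 3))
features = preprocess_demo.fit_transform(X_demo)
print("Feature matrix shape:", features.shape)
```

Each step only needs `fit` and `transform`, which is what lets the nested `preprocess` pipeline be fitted and reused independently of the classifier.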

Train the two pipeline steps.

In [None]:
# Fit the preprocessing step, then train the classifier on the extracted features.
model["preprocess"].fit(X)
features = model["preprocess"].transform(X)
model["classify"].fit(features, Y)
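
For transformers that follow the usual `scikit-learn` conventions, fitting the steps one by one should give the same result as calling `fit` on the whole pipeline. The sketch below checks this on synthetic data with a simplified two-step pipeline; whether it also holds for the custom `si` transformers is an assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 20 samples, 4 features, two alternating labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
Y_demo = np.array([0, 1] * 10)


def build():
    """Create a fresh, unfitted two-step pipeline."""
    return Pipeline([
        ("preprocess", MinMaxScaler()),
        ("classify", KNeighborsClassifier(n_neighbors=3)),
    ])


# Variant 1: fit each step by hand.
stepwise = build()
stepwise["preprocess"].fit(X_demo)
stepwise["classify"].fit(stepwise["preprocess"].transform(X_demo), Y_demo)

# Variant 2: let the pipeline chain fit and transform itself.
whole = build().fit(X_demo, Y_demo)

same = np.array_equal(
    stepwise["classify"].predict(stepwise["preprocess"].transform(X_demo)),
    whole.predict(X_demo),
)
print("Identical predictions:", same)
```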

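Let's try out the classification! Note that this predicts on the training data itself, so it only confirms that the pipeline runs end to end; it does not measure how well the model generalizes.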

In [None]:
prediction = model["classify"].predict(model["preprocess"].transform(X))
print("Training labels:")
print(Y)
print("Prediction:")
print(prediction)
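
To estimate how the classifier behaves on unseen data, one common approach is to hold out part of the dataset before training. The following is a minimal sketch on synthetic features with a simplified pipeline; the data, the split ratio, and the variable names are assumptions and not part of this notebook's workflow.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic two-class data standing in for the extracted features.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 4)),
    rng.normal(3.0, 1.0, size=(30, 4)),
])
Y_demo = np.array([0] * 30 + [1] * 30)

# Keep 25 % of the samples aside; stratify to preserve the class balance.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_demo, Y_demo, test_size=0.25, stratify=Y_demo, random_state=0
)

clf = Pipeline([
    ("scale", MinMaxScaler()),
    ("classify", KNeighborsClassifier(n_neighbors=3)),
]).fit(X_train, Y_train)

# Accuracy on samples the classifier has not seen during training.
test_accuracy = clf.score(X_test, Y_test)
print("Held-out accuracy:", test_accuracy)
```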

### Save the model

If you are satisfied with the result, you can save the trained model as a joblib file.  
You will need this later on to create a pipeline configuration package.

In [None]:
import joblib

model_path = "../models/bsi-model.joblib"
# joblib.dump opens and writes the file itself, so no separate file handle is needed.
joblib.dump(model, model_path, compress=9)
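
Before packaging, it can be worth confirming that the saved file loads again. The sketch below shows the round trip with `joblib.dump` and `joblib.load` on a small stand-in object in a temporary directory; the dictionary and the file name `demo-model.joblib` are placeholders, and in this notebook you would load `model_path` and compare predictions instead.

```python
import tempfile
from pathlib import Path

import joblib

# Placeholder object standing in for the trained pipeline.
stand_in_model = {"n_neighbors": 3, "labels": ["idle", "running"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo-model.joblib"
    joblib.dump(stand_in_model, path, compress=9)  # same compression level as above
    restored = joblib.load(path)

print("Round trip intact:", restored == stand_in_model)
```

Files written by joblib are pickle-based, so only load files from sources you trust, and load them in an environment with library versions compatible with those used for saving.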

### Subsequent notebooks

Notebook [20-CreateInferenceWrapper](20-CreateInferenceWrapper.ipynb) shows you how to create a Python wrapper around the model.  
Notebook [30-CreatePipelinePackage](30-CreatePipelinePackage.ipynb) demonstrates how to create the pipeline configuration package. 