```
     _                                     ____             _       _   ____                                    
    / \   _ __ ___   __ _ _______  _ __   / ___|  ___   ___(_) __ _| | |  _ \ _ __ ___   __ _ _ __ ___  ___ ___ 
   / _ \ | '_ ` _ \ / _` |_  / _ \| '_ \  \___ \ / _ \ / __| |/ _` | | | |_) | '__/ _ \ / _` | '__/ _ \/ __/ __|
  / ___ \| | | | | | (_| |/ / (_) | | | |  ___) | (_) | (__| | (_| | | |  __/| | | (_) | (_| | | |  __/\__ \__ \
 /_/   \_\_| |_| |_|\__,_/___\___/|_| |_| |____/ \___/ \___|_|\__,_|_| |_|   |_|  \___/ \__, |_|  \___||___/___/
                                                                                        |___/                   
 Supervised Support vector machines (SVMs)
```

### Module
__SVC__ (`sklearn.svm.SVC`) implements C-support vector classification. It looks for the decision boundary that maximizes the margin between classes, can use kernel functions (RBF by default) to model non-linear boundaries, and handles multi-class problems by training one-vs-one classifiers.

### Goal
Create a model that uses a support vector machine to predict the class of each sample.

### Tools
1. Pandas
2. scikit-learn
3. SVM

### Requirement
1. File Definition
2. Data Preparation
3. hotspot_spi.csv generated
 
### Data Source
__${WORKDIR}__/data/ouptut/hotspot_spi.csv

In [1]:
import pandas as pd

import functions as func
from load_dataset import LoadDataset

from sklearn import svm

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

## Get the data

In [2]:
load_dataset = LoadDataset()
X, y = load_dataset.return_X_y()

### Split dataset into train and test sets

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

print("X_train.shape:", X_train.shape, "y_train.shape:", y_train.shape)
print("X_test.shape:", X_test.shape, "y_test.shape:", y_test.shape)

X_train.shape: (1737, 49) y_train.shape: (1737,)
X_test.shape: (579, 49) y_test.shape: (579,)
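
The classification report at the end of this notebook shows uneven class support in the test set (from 65 to 167 samples per class). One option worth considering is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below is only an illustration: `X_demo` and `y_demo` are synthetic stand-ins generated with `make_classification`, since `LoadDataset` is not shown here, and the class weights are made-up values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for LoadDataset().return_X_y() (assumption):
# 2316 rows, 49 features and 5 imbalanced classes, like the shapes printed above.
X_demo, y_demo = make_classification(
    n_samples=2316, n_features=49, n_informative=10, n_classes=5,
    weights=[0.29, 0.17, 0.29, 0.11, 0.14], random_state=42,
)

# stratify=y_demo keeps each class's share nearly identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
print("train class shares:", train_share.round(3))
print("test class shares: ", test_share.round(3))
```

With the real data, the same effect would come from adding `stratify=y` to the existing `train_test_split` call.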


## Modeling

### Finding the Best Hyperparameters

*Note: Running the code below may take anywhere from a few minutes to several hours.*

*Uncomment and run it only when you need to optimize the hyperparameters.*

In [4]:
# space = dict()
# space['kernel'] = ['linear', 'poly', 'rbf', 'sigmoid']  # 'precomputed' omitted: it expects a kernel matrix as X
# space['C'] = [0.1, 1, 10, 100]      # example values, adjust as needed
# space['gamma'] = ['scale', 'auto']  # ignored by the 'linear' kernel
# space['degree'] = [2, 3, 4]         # used only by the 'poly' kernel

# func.show_best_hyperparameter_optimization(
#     svm.SVC(), 
#     space, 
#     X_train, 
#     y_train
# )
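
The helper `func.show_best_hyperparameter_optimization` is not shown in this notebook, so its internals are unknown. As a rough sketch of what such a search could look like, the example below uses scikit-learn's `GridSearchCV` over a scaler plus `SVC` pipeline. The data (`X_demo`, `y_demo`) is synthetic and the grid values are illustrative assumptions, not tuned recommendations.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in for X_train, y_train (assumption) so the sketch runs quickly.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# make_pipeline names each step after its lowercased class name, hence the "svc__" prefix.
search_pipeline = make_pipeline(StandardScaler(), svm.SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", "auto"],
}

# 5-fold cross-validation for every combination in the grid.
search = GridSearchCV(search_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy: %.3f" % search.best_score_)
```

Searching over the whole pipeline means the scaler is refitted inside each fold, so the validation folds do not leak into the scaling statistics.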

### Build and train the model

In [5]:
classifier = svm.SVC(decision_function_shape="ovo")
pipeline = make_pipeline(
    StandardScaler(),
    classifier
)

_ = pipeline.fit(X_train, y_train)
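
As a side note on `decision_function_shape="ovo"`: `SVC` always trains one-vs-one classifiers internally, and this parameter only changes the shape of what `decision_function` returns. With 5 classes, `"ovo"` yields one column per pair of classes (5 * 4 / 2 = 10), while `"ovr"` yields one column per class. The sketch below illustrates this on synthetic data (`X_demo`, `y_demo` are assumptions, not the notebook's dataset).

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 5-class problem (assumption), mirroring the number of classes in this notebook.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=10, n_classes=5, random_state=0
)

ovo = make_pipeline(StandardScaler(), svm.SVC(decision_function_shape="ovo")).fit(X_demo, y_demo)
ovr = make_pipeline(StandardScaler(), svm.SVC(decision_function_shape="ovr")).fit(X_demo, y_demo)

ovo_scores = ovo.decision_function(X_demo)  # one column per pair of classes
ovr_scores = ovr.decision_function(X_demo)  # one column per class
print(ovo_scores.shape)  # (500, 10)
print(ovr_scores.shape)  # (500, 5)

# With the default break_ties=False, the predicted labels do not depend on the shape setting.
same_predictions = bool((ovo.predict(X_demo) == ovr.predict(X_demo)).all())
print(same_predictions)
```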

### Check the most relevant features for the trained model

In [6]:
# func.get_feature_importances(classifier, X_train)
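
An `SVC` with the default RBF kernel exposes neither `feature_importances_` nor `coef_` (the latter exists only for the linear kernel), which may be why the cell above is commented out. A model-agnostic alternative is permutation importance from `sklearn.inspection`: it shuffles one feature at a time and measures how much the score drops. The following is a minimal sketch on synthetic data (`X_demo`, `y_demo` are assumptions), not a replacement for the `func` helper.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 8 features, of which only 3 are informative.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

model = make_pipeline(StandardScaler(), svm.SVC()).fit(X_tr, y_tr)

# Shuffle each feature 10 times on held-out data and record the drop in accuracy.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Rank features from most to least important.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}")
```

Computing the importances on the test split rather than the training split reflects how much the model relies on each feature for generalization.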

### Predict and show the model results

In [7]:
y_predict = pipeline.predict(X_test)
func.show_model_result(pipeline, X, y, y_test, y_predict)


Computing cross-validated metrics
----------------------------------------------------------------------
Scores: [0.4375     0.47300216 0.46868251 0.47300216 0.44492441]
Mean = 0.46 / Standard Deviation = 0.02

Confusion Matrix
----------------------------------------------------------------------
[[100   0  67   0   0]
 [  5   2  81   0  13]
 [ 45   0 115   0   5]
 [ 48   0   4  12   1]
 [  7   6  33   0  35]]

Classification Report
----------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.49      0.60      0.54       167
           1       0.25      0.02      0.04       101
           2       0.38      0.70      0.49       165
           3       1.00      0.18      0.31        65
           4       0.65      0.43      0.52        81

    accuracy                           0.46       579
   macro avg       0.55      0.39      0.38       579
weighted avg       0.50      0.46      0.41       579
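
The per-class figures in the report can be recomputed directly from the confusion matrix above, which helps when reading it: rows are true classes and columns are predicted classes, so 81 of the 101 class 1 samples were predicted as class 2, which explains the recall of 0.02 for class 1. The sketch below uses only NumPy and the numbers printed above.

```python
import numpy as np

# Confusion matrix copied from the output above: rows = true class, columns = predicted class.
cm = np.array([
    [100, 0,  67,  0,  0],
    [  5, 2,  81,  0, 13],
    [ 45, 0, 115,  0,  5],
    [ 48, 0,   4, 12,  1],
    [  7, 6,  33,  0, 35],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for label in range(len(cm)):
    print(f"class {label}: precision={precision[label]:.2f} "
          f"recall={recall[label]:.2f} f1={f1[label]:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The row sums (167, 101, 165, 65, 81) match the `support` column, and the diagonal sum divided by the total (264 / 579) gives the reported accuracy of 0.46.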

