```
     _                                     ____             _       _   ____                                    
    / \   _ __ ___   __ _ _______  _ __   / ___|  ___   ___(_) __ _| | |  _ \ _ __ ___   __ _ _ __ ___  ___ ___ 
   / _ \ | '_ ` _ \ / _` |_  / _ \| '_ \  \___ \ / _ \ / __| |/ _` | | | |_) | '__/ _ \ / _` | '__/ _ \/ __/ __|
  / ___ \| | | | | | (_| |/ / (_) | | | |  ___) | (_) | (__| | (_| | | |  __/| | | (_) | (_| | | |  __/\__ \__ \
 /_/   \_\_| |_| |_|\__,_/___\___/|_| |_| |____/ \___/ \___|_|\__,_|_| |_|   |_|  \___/ \__, |_|  \___||___/___/
                                                                                        |___/                   
 Supervised Ensemble method - Extremely Randomized Trees
```

### Module
__ExtraTreesClassifier__ implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
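
The averaging step described above can be checked directly. The following is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the chosen sizes are illustrative assumptions, not the project's dataset. It compares the mean of the per-tree class probabilities with what the ensemble itself reports.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data (assumption): 300 samples, 10 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

forest = ExtraTreesClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted tree is exposed in estimators_; the ensemble's class
# probabilities are the mean of the per-tree probabilities.
per_tree = [tree.predict_proba(X_demo[:5]) for tree in forest.estimators_]
averaged = np.mean(per_tree, axis=0)

print(np.allclose(averaged, forest.predict_proba(X_demo[:5])))
```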

### Goal
Create a model that combines the predictions of several decision tree estimators.

### Tools
1. Pandas
2. scikit-learn
3. ExtraTreesClassifier ensemble method

### Requirement
1. File Definition
2. Data Preparation
3. hotspot_spi.csv generated
 
### Data Source
__${WORKDIR}__/data/ouptut/hotspot_spi.csv

In [1]:
import os
import pandas as pd

import functions as func
from load_dataset import LoadDataset

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

## Get the data

In [2]:
load_dataset = LoadDataset()
X, y = load_dataset.return_X_y()

### Split dataset into train and test sets

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

print("X_train.shape:", X_train.shape, "y_train.shape:", y_train.shape)
print("X_test.shape:", X_test.shape, "y_test.shape:", y_test.shape)

X_train.shape: (1621, 49) y_train.shape: (1621,)
X_test.shape: (695, 49) y_test.shape: (695,)
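
The split above uses no fixed seed, so each run produces a different partition. If reproducibility and matching class proportions matter, one option is to pass `random_state` and `stratify`. This is a hedged sketch on synthetic data; `X_demo`, `y_demo`, and the class probabilities are assumptions made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (assumption): 5 imbalanced classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))
y_demo = rng.choice(5, size=1000, p=[0.3, 0.15, 0.3, 0.1, 0.15])

# random_state makes the split repeatable; stratify keeps class shares aligned.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Class proportions in the two subsets should closely match.
train_share = np.bincount(y_tr, minlength=5) / len(y_tr)
test_share = np.bincount(y_te, minlength=5) / len(y_te)
print(np.round(train_share, 3))
print(np.round(test_share, 3))
```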


## Modeling

### Finding the Best Hyperparameters

*Note: Running the code below may take anywhere from a few minutes to several hours.*

*Uncomment and run it only when you need to optimize the hyperparameters.*

In [4]:
# space = dict()
# # space['criterion'] = ["gini", "entropy"]
# # space['splitter'] = ["best", "random"]  # not an ExtraTreesClassifier parameter
# # space['n_estimators'] = [n for n in range(1, 100)]  # must be >= 1
# space['random_state'] = [n for n in range(10)]
# space['max_depth'] = [n for n in range(1, 20)]  # must be >= 1 (or None)

# func.show_best_hyperparameter_optimization(
#     ExtraTreesClassifier(), 
#     space, 
#     X_train, 
#     y_train
# )
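
The implementation of `func.show_best_hyperparameter_optimization` is not shown here. As a rough illustration of the same idea, the sketch below uses scikit-learn's `GridSearchCV` as a stand-in, on synthetic data with a deliberately small grid. The names `X_demo`, `y_demo`, and the grid values are assumptions chosen so the search runs in seconds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the project's dataset.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Small illustrative grid: 3 depths x 2 criteria = 6 candidates.
space = {"max_depth": [2, 4, 8], "criterion": ["gini", "entropy"]}

search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=20, random_state=0),
    param_grid=space,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 2))
```

An exhaustive grid grows multiplicatively with each added parameter, which is why the full search space above can take hours.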

### Build and train the model

In [13]:
classifier = ExtraTreesClassifier(random_state=0)
pipeline = make_pipeline(
    StandardScaler(),
    classifier
)

_ = pipeline.fit(X_train, y_train)



### Check the most relevant features for the trained model

In [6]:
# func.get_feature_importances(classifier, X_train)
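
The helper `func.get_feature_importances` is project code that is not shown here. A minimal sketch of one way to rank features, assuming a fitted `ExtraTreesClassifier` and a DataFrame with named columns, reads the estimator's `feature_importances_` attribute. The data and the `feature_i` column names below are synthetic placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data with hypothetical column names (assumption).
X_arr, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(6)])

model = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1, sorted from most to least relevant.
importances = pd.Series(
    model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)

print(importances)
```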

### Predict and show model result

In [14]:
y_predict = pipeline.predict(X_test)
func.show_model_result(pipeline, X, y, y_test, y_predict)


Computing cross-validated metrics
----------------------------------------------------------------------




Scores: [0.44827586 0.45788337 0.49460043 0.52051836 0.46868251]
Mean = 0.48 / Standard Deviation = 0.03
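
How `func.show_model_result` computes these scores is not shown. One plausible way to obtain five fold scores with their mean and standard deviation is scikit-learn's `cross_validate`. The sketch below runs on synthetic data; `X_demo`, `y_demo`, and `demo_pipeline` are illustrative names, not the notebook's variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), not the project's dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

demo_pipeline = make_pipeline(
    StandardScaler(),
    ExtraTreesClassifier(n_estimators=50, random_state=0),
)

# 5-fold cross-validation; the pipeline is re-fitted on each training fold.
results = cross_validate(demo_pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
scores = results["test_score"]

print("Scores:", scores)
print(f"Mean = {scores.mean():.2f} / Standard Deviation = {scores.std():.2f}")
```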

Confusion Matrix
----------------------------------------------------------------------
[[116   5  62   7   3]
 [ 16  21  54   0  17]
 [ 75  28 103   1  13]
 [ 40   0   7  23   1]
 [  7  11  30   0  55]]

Classification Report
----------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.46      0.60      0.52       193
           1       0.32      0.19      0.24       108
           2       0.40      0.47      0.43       220
           3       0.74      0.32      0.45        71
           4       0.62      0.53      0.57       103

    accuracy                           0.46       695
   macro avg       0.51      0.42      0.44       695
weighted avg       0.47      0.46      0.45       695
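
The per-class figures in the report follow directly from the confusion matrix printed earlier. As a worked check, the sketch below recomputes recall, precision, and accuracy from that matrix, assuming scikit-learn's convention that rows are true classes and columns are predicted classes (the row sums match the support column).

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([
    [116,  5,  62,  7,  3],
    [ 16, 21,  54,  0, 17],
    [ 75, 28, 103,  1, 13],
    [ 40,  0,   7, 23,  1],
    [  7, 11,  30,  0, 55],
])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)     # row sums are the per-class support
precision = true_positives / cm.sum(axis=0)  # column sums are the per-class prediction counts
accuracy = true_positives.sum() / cm.sum()

print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
print("accuracy: ", round(accuracy, 2))
```

For example, class 0 recall is 116 / 193 ≈ 0.60, and overall accuracy is 318 / 695 ≈ 0.46.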

----------------------------------------------------------------------
Accuracy: 0.46
Precision: 0.46
Sens

