```
     _                                     ____             _       _   ____                                    
    / \   _ __ ___   __ _ _______  _ __   / ___|  ___   ___(_) __ _| | |  _ \ _ __ ___   __ _ _ __ ___  ___ ___ 
   / _ \ | '_ ` _ \ / _` |_  / _ \| '_ \  \___ \ / _ \ / __| |/ _` | | | |_) | '__/ _ \ / _` | '__/ _ \/ __/ __|
  / ___ \| | | | | | (_| |/ / (_) | | | |  ___) | (_) | (__| | (_| | | |  __/| | | (_) | (_| | | |  __/\__ \__ \
 /_/   \_\_| |_| |_|\__,_/___\___/|_| |_| |____/ \___/ \___|_|\__,_|_| |_|   |_|  \___/ \__, |_|  \___||___/___/
                                                                                        |___/                   
```

### Module
__VotingClassifier__ Soft Voting/Majority Rule classifier for unfitted estimators.
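
The difference between majority-rule (hard) and soft voting can be illustrated with a minimal sketch. It assumes synthetic four-class data from `make_classification` and small default-ish estimators; none of the names below come from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data stands in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=4, random_state=0
)


def make_voter(voting):
    """Build an unfitted VotingClassifier with the given voting mode."""
    return VotingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(random_state=0)),
            # probability=True lets soft voting call predict_proba on the SVC.
            ("svc", SVC(probability=True, random_state=0)),
            ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ],
        voting=voting,
    )


# Hard voting: each estimator casts one vote, the most frequent label wins.
hard_voter = make_voter("hard").fit(X_demo, y_demo)
hard_pred = hard_voter.predict(X_demo)

# Soft voting: class probabilities are averaged, the highest average wins.
soft_voter = make_voter("soft").fit(X_demo, y_demo)
soft_pred = soft_voter.predict(X_demo)
soft_proba = soft_voter.predict_proba(X_demo)
```

Soft voting requires every base estimator to provide `predict_proba`, which is why the `SVC` in this sketch sets `probability=True`. The notebook below uses `voting='hard'`, where that is not needed.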

### Goal
Investigate the relationship between the independent variables (features) and a dependent variable (outcome).

### Tools
1. Pandas
2. scikit-learn
3. VotingClassifier

### Requirements
1. File Definition
2. Data Preparation
3. hotspot_spi.csv generated
 
### Data Source
__${WORKDIR}__/data/ouptut/hotspot_spi.csv

In [1]:
import os
import sys

supervised_dir = os.path.normpath(os.getcwd() + os.sep + os.pardir)
sys.path.append(supervised_dir)
sys.path

['/home/fausto/Development/workspace/amazon-social-progress/ml_models/supervised/classifier',
 '/opt/anaconda3/lib/python39.zip',
 '/opt/anaconda3/lib/python3.9',
 '/opt/anaconda3/lib/python3.9/lib-dynload',
 '',
 '/opt/anaconda3/lib/python3.9/site-packages',
 '/home/fausto/Development/workspace/amazon-social-progress/ml_models/supervised']

In [2]:
import pandas as pd
import numpy as np

import functions_classifier as func
from load_dataset import LoadDataset, SpiType

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale
from sklearn.model_selection import train_test_split

## Get the data

In [3]:
load_dataset = LoadDataset()
X, y = load_dataset.return_X_y_clf()

X = scale(X)
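
Note that `scale(X)` standardizes with the mean and variance of every row, including rows that are later held out for testing. A minimal sketch of the alternative, fitting the scaler on the training rows only, is shown below. It uses synthetic data, and the names `X_raw`, `y_raw`, and `scaler` are illustrative assumptions rather than part of this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(200, 4))  # synthetic features
y_raw = rng.integers(0, 4, size=200)                    # synthetic 4-class labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=42
)

# Learn the mean and variance from the training rows only.
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
# The held-out rows reuse the training statistics, so nothing leaks from them.
X_te_scaled = scaler.transform(X_te)
```

Placing a `StandardScaler` inside a pipeline, as the modeling cell further down does, gives the same train-only fitting automatically during `fit` and cross-validation.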

In [4]:
print("X.shape:", X.shape, "y.shape:", y.shape)

X.shape: (2313, 49) y.shape: (2313,)


### Split dataset into train and test sets

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=42)

print("X_train.shape:", X_train.shape, "y_train.shape:", y_train.shape)
print("X_test.shape:", X_test.shape, "y_test.shape:", y_test.shape)

X_train.shape: (1619, 49) y_train.shape: (1619,)
X_test.shape: (694, 49) y_test.shape: (694,)


## Modeling

### Build, train, and predict with the model

In [6]:
tree_params = {
    "criterion": "entropy", 
    "max_depth": 100, 
    "min_samples_leaf": 40, 
    "min_samples_split": 2, 
    "splitter": "best"
}
tree = DecisionTreeClassifier(**tree_params)


svc_params = {
    "C": 3.0, 
    "degree": 3, 
    "gamma": "auto", 
    "tol": 1e-3
}
svc = SVC(**svc_params)

random_params = {
    "criterion": "gini", 
    "max_depth": 100, 
    "min_samples_leaf": 20, 
    "min_samples_split": 3, 
    "n_estimators": 200
}
random = RandomForestClassifier(**random_params)

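classifier = VotingClassifier(
    estimators=[
        ('tree', tree), 
        ('svc', svc), 
        ('random_forest', random)
    ],
    voting='hard')

# Note: X was already standardized with scale() in cell In [3], so the
# StandardScaler step here is largely redundant.
# Because `classifier` is not the last step, the pipeline uses it as a
# transformer: with voting='hard' it outputs each estimator's predicted labels,
# and the final `random` forest is trained on those predictions.
pipeline = make_pipeline(
    StandardScaler(),
    classifier,
    random
)
_ = pipeline.fit(X_train, y_train)
y_predict = pipeline.predict(X_test)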
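
A `VotingClassifier` behaves differently depending on where it sits in a pipeline. The sketch below, on synthetic data with illustrative estimators (assumptions, not this project's tuned models), contrasts the two roles: as an intermediate step it is used through `transform`, and as the final step it returns the majority vote directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data stands in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=4, random_state=0
)


def build_voter():
    """Return a fresh, unfitted hard-voting ensemble."""
    return VotingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("svc", SVC(gamma="auto")),
            ("forest", RandomForestClassifier(n_estimators=25, random_state=0)),
        ],
        voting="hard",
    )


# Intermediate-step role: transform() yields one column of predicted
# labels per base estimator, which a later step would receive as features.
votes = build_voter().fit(X_demo, y_demo).transform(X_demo)
print(votes.shape)  # (n_samples, n_estimators)

# Final-step role: the pipeline's predict() is the majority vote itself.
voting_pipeline = make_pipeline(StandardScaler(), build_voter())
voting_pipeline.fit(X_demo, y_demo)
majority_pred = voting_pipeline.predict(X_demo)
```

If a plain voting ensemble is the intent, the second form is the one to use. The first form is closer to stacking, since a downstream model learns from the base estimators' predictions.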

### Predict and show model result

In [7]:
func.show_model_result(pipeline, X, y, y_test, y_predict)


Computing cross-validated metrics
----------------------------------------------------------------------
Scores: [0.58315335 0.59827214 0.59179266 0.53679654 0.53246753]
Mean = 0.57 / Standard Deviation = 0.03

Confusion Matrix
----------------------------------------------------------------------
[[ 91  62  12   3]
 [ 28  77  45  10]
 [  4  59  86  29]
 [  0  19  34 135]]

Classification Report
----------------------------------------------------------------------
              precision    recall  f1-score   support

          Q1       0.74      0.54      0.63       168
          Q2       0.35      0.48      0.41       160
          Q3       0.49      0.48      0.48       178
          Q4       0.76      0.72      0.74       188

    accuracy                           0.56       694
   macro avg       0.59      0.56      0.56       694
weighted avg       0.59      0.56      0.57       694

----------------------------------------------------------------------
Accuracy: 0.56
Precici
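
The per-class figures in the classification report follow directly from the confusion matrix. As a check, the sketch below recomputes them with NumPy from the matrix printed above, assuming scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true Q1..Q4, columns: predicted Q1..Q4).
cm = np.array([
    [91, 62, 12, 3],
    [28, 77, 45, 10],
    [4, 59, 86, 29],
    [0, 19, 34, 135],
])

correct = np.diag(cm)                    # correctly classified samples per class
accuracy = correct.sum() / cm.sum()      # 389 / 694
recall = correct / cm.sum(axis=1)        # divide by row sums (true counts)
precision = correct / cm.sum(axis=0)     # divide by column sums (predicted counts)
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", round(accuracy, 2))  # 0.56
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1-score :", np.round(f1, 2))
```

The row sums (168, 160, 178, 188) match the `support` column, and most of the errors fall in cells adjacent to the diagonal, meaning neighboring quartiles such as Q2 and Q3 are confused most often.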

### Show Individual Model Accuracy

In [8]:
models = [tree, svc, random, pipeline]
models_names = ['DecisionTreeClassifier', 'SupportVectorClassification', 'RandomForestClassifier', 'Ensemble']
func.get_ensemble_model_accuracy(models, models_names, X, y)

Accuracy: 0.48 (+/- 0.04) [DecisionTreeClassifier]
Accuracy: 0.57 (+/- 0.02) [SupportVectorClassification]
Accuracy: 0.54 (+/- 0.02) [RandomForestClassifier]
Accuracy: 0.57 (+/- 0.03) [Ensemble]
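
The helper `func.get_ensemble_model_accuracy` is not shown in this notebook. A minimal sketch of how such a comparison might be produced with `cross_val_score` follows. The function name `report_cv_accuracy`, the 5-fold setting, and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def report_cv_accuracy(models, names, X, y, cv=5):
    """Hypothetical stand-in: print mean and std of cross-validated accuracy."""
    results = {}
    for model, name in zip(models, names):
        # cross_val_score clones the estimator for each fold, so the
        # objects passed in are left unfitted.
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
        print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), name))
    return results


# Synthetic 4-class data stands in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=4, random_state=0
)
demo_results = report_cv_accuracy(
    [DecisionTreeClassifier(random_state=0), SVC(gamma="auto")],
    ["DecisionTreeClassifier", "SupportVectorClassification"],
    X_demo,
    y_demo,
)
```

Reading the results above with this in mind, the ensemble's 0.57 matches the SVC alone, so the combination shows no measurable gain over the best single model on this data.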
