<a href="https://colab.research.google.com/github/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/projeto_final_hiperparametros.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/img/Banner_Bootcamp.png?raw=true">

#**Final Course Project**
###Bootcamp Data Science Aplicada 2 by [Alura](https://www.alura.com.br/)
####Author: Helton Cordeiro
e-mail: heltoncordeiro@gmail.com

June-August 2021.



---

# **Purpose of this notebook**

The data provided by Hospital Sírio-Libanês were cleaned in the [projeto_final_tratamento_dados](https://github.com/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/projeto_final_tratamento_dados.ipynb) notebook, and several Machine Learning models were then evaluated with their default values. This notebook explores the hyperparameters of those models.

A dictionary with the ranges of values and parameters is passed to **RandomizedSearchCV** through the function *executa_modelos_RandomizedSearchCV*, which is defined in funcoes.py.
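
As a rough illustration of the idea, the sketch below runs scikit-learn's `RandomizedSearchCV` directly on one model. It is not the code in funcoes.py: the synthetic data, the small search space, and the values of `n_iter` and `cv` are assumptions chosen only to keep the example fast.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (a stand-in, not the hospital dataset).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative search space: one distribution and one list of choices.
param_distributions = {
    "n_neighbors": randint(1, 30),        # integers from 1 to 29
    "weights": ["uniform", "distance"],   # sampled uniformly from the list
}

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions=param_distributions,
    n_iter=5,              # number of sampled parameter combinations
    cv=3,                  # 3-fold cross-validation for each combination
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Only `n_iter` combinations are evaluated, so the cost is controlled by that value rather than by the size of the search space.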

This work is kept separate from the [analysis notebook](https://github.com/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/Bootcamp_DataScience_projeto_final.ipynb) because of the processing time needed to test all the models, and because the separation makes future adjustments easier.

---


##What is a hyperparameter?

Hyperparameters contain the data that govern the training process itself.

Your training application handles three categories of data while it trains the model:

* The *input data*, also called training data, is a collection of individual records (instances) containing the features that matter for the machine learning problem. This data configures the model during training so that it can make accurate predictions about new instances of similar data. However, the values in the input data never become part of the model directly.

* The model's *parameters* are the variables that the chosen machine learning technique uses to adjust to the data. For example, a deep neural network (DNN) is composed of processing nodes (neurons), each performing an operation on the data as it travels through the network. When the DNN is trained, each node has a weight value that tells the model how much impact it has on the final prediction. These weights are an example of the model's parameters. In many ways, these parameters are the model: they are what distinguishes your particular model from other models of the same type working on similar data.

* The *hyperparameters* are the variables that govern the training process itself. For example, part of setting up a deep neural network is deciding how many hidden layers of nodes to use between the input layer and the output layer, and how many nodes each layer should use. These variables are not directly related to the training data. They are configuration variables. Parameters change during a training job, while hyperparameters usually remain constant during a job.


The model's parameters are optimized (that is, "tuned") by the training process. You run the data through the model's operations, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best values. Hyperparameters are tuned by running the whole training job, looking at the aggregate accuracy, and adjusting. In both cases you are modifying the composition of the model in an attempt to find the best combination for the problem.

Source: [Overview of hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview?hl=pt-br)
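
To make the distinction concrete in scikit-learn terms, here is a minimal sketch using `LogisticRegression` on synthetic data (the data and the value `C=0.5` are assumptions for illustration only). The hyperparameter `C` is set before training and read back with `get_params()`, while the parameters `coef_` and `intercept_` only exist after `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data with 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: chosen before training and left unchanged by fit().
model = LogisticRegression(C=0.5, max_iter=1000)
print(model.get_params()["C"])

# Parameters: learned from the data during fit().
model.fit(X, y)
print(model.coef_.shape)       # one weight per feature
print(model.intercept_.shape)
```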


##Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import random
import pickle
from scipy.stats import randint

#import warnings
#warnings.simplefilter(action='ignore')

In [2]:
rstate = 73246
np.random.seed(rstate)

## Importing the helper functions

* The file funcoes.py contains the Python functions used by the project.

In [17]:
import funcoes
from importlib import reload
reload(funcoes)

<module 'funcoes' from '/content/funcoes.py'>

In [18]:
from funcoes import executa_modelos_RandomizedSearchCV, listar_parametros

In [5]:
help(executa_modelos_RandomizedSearchCV)

Help on function executa_modelos_RandomizedSearchCV in module funcoes:

executa_modelos_RandomizedSearchCV(names, models, dados, n_splits, n_repeats, param_distributions, n_iter, showMsg=True)
    Function that receives the parameters for running hyperparameter tests on the models.
    Uses the function roda_modelo_RandomizedSearchCV()
    
    Parameters
    ----------
      names: array with the model names. Used as the index of the returned DataFrame.
               e.g. names = [ "KNeighbors", "Gaussian" ]
      models: array with the instances of the models to be tested.
               e.g. classes = [ KNeighborsClassifier(), 
                                GaussianProcessClassifier() ]
      dados: DataFrame with the data
      n_splits: parameter used by roda_modelo_RandomizedSearchCV()
      n_repeats: parameter used by roda_modelo_RandomizedSearchCV()
      param_distributions: dictionary with the parameters to be tested
              e.g. hiperparams = { 
          

## Loading the data
We load the cleaned data file, from which the highly correlated columns have already been removed.

The data cleaning process is described in the notebook [projeto_final_tratamento_dados.ipynb](https://github.com/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/projeto_final_tratamento_dados.ipynb).


### Performing the data load

In [6]:
url='https://github.com/hc0rd31r0/Bootcamp_Data_Science/blob/main/projeto-final/dados/Kaggle_Sirio_Libanes_ICU_Prediction-tratado-sem-corr.xls?raw=true'
dados_raw = pd.read_excel(url, index_col=0)
dados = dados_raw.copy()



---


#Machine Learning Models
The models tested in this project are listed below. They were selected because they are supervised learning methods that solve binary classification problems.


1. [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)

2. [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

3. [GaussianProcessClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html)

4. [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)

5. [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

6. [Neural Net (MLP)](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)

7. [AdaBoostClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)

8. [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)

9. [ExtraTreesClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html)

10. [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)


---



##Models

Creating the model instances to be processed.

In [7]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

names = ["KNeighbors", "SVC", "Gaussian",
         "DecisionTree", "RandomForest", "NeuralMLP", "AdaBoost",
         "Gradient", "ExtraTrees", "LogisticRegression" ]

classes = [
    KNeighborsClassifier(),
    SVC(probability=True, random_state=rstate),
    GaussianProcessClassifier(random_state=rstate, warm_start=True),
    DecisionTreeClassifier(random_state=rstate),
    RandomForestClassifier(random_state=rstate, warm_start=True),
    MLPClassifier(random_state=rstate, warm_start=True),
    AdaBoostClassifier(random_state=rstate),
    GradientBoostingClassifier(random_state=rstate, warm_start=True),
    ExtraTreesClassifier(random_state=rstate, warm_start=True),
    LogisticRegression(random_state=rstate, warm_start=True)
]
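
For context, scikit-learn's search classes do not modify the instances in the list above. They work on clones and apply each candidate combination with `set_params`. The sketch below imitates that step by hand with `clone`. The candidate values are hypothetical and chosen only for illustration.

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

# Base instance with default hyperparameters (apart from the seed).
base = DecisionTreeClassifier(random_state=0)

# A search works on clones, so the base instance keeps its own settings.
candidate = clone(base).set_params(max_depth=5, criterion="entropy")

print(base.get_params()["max_depth"])       # default value, unchanged
print(candidate.get_params()["max_depth"])  # value applied to the clone
```

Any name accepted by `set_params` can be used as a key in the search dictionary, which is why the keys must match the constructor arguments of each model.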

##Testing Hyperparameters

Defining the dictionary with the parameters to be tested.

In [8]:
hiperparams = {
    "KNeighbors" : {
        "n_neighbors" : randint(1, 30),
        "weights" : ["uniform", "distance"],
        "leaf_size" : randint(30, 300),
        "algorithm" : ["auto", "ball_tree", "kd_tree", "brute"]
    },
    "SVC" : {
        "C": np.random.uniform(low=0.8, high=2, size=5),
        "kernel" : ["linear", "poly", "rbf", "sigmoid"],
        "gamma" : ["scale", "auto"]
    },
    "Gaussian" : {
        "n_restarts_optimizer" : randint(0, 100),
        "max_iter_predict" : randint(50,2000),
    },
    "DecisionTree" : {
        "criterion" : ["gini", "entropy"],
        "max_depth" : randint(1, 50),
        "min_samples_leaf" : randint(1, 20),
        "max_features": ["sqrt", "log2"]
    },
    "RandomForest" : {
        "n_estimators" :randint(90, 500),
        "criterion" : ["gini", "entropy"],
        "max_depth" : randint(1, 100),
        "min_samples_leaf" : randint(1, 20),
        "max_features": ["sqrt", "log2"]
    },
    "NeuralMLP" : {
        "activation" : ["identity", "logistic", "tanh", "relu"],
        "solver" : ["lbfgs", "sgd", "adam"],
        "alpha" : np.random.uniform(low=0.0001, high=0.01, size=10),
        "learning_rate" : ["constant", "invscaling", "adaptive"],
        "learning_rate_init" : np.random.uniform(low=0.001, high=0.1, size=10),
    },
    "AdaBoost" : {
        "n_estimators" : randint(50, 500),
        "learning_rate": np.random.uniform(low=0.8, high=5, size=10),
        "algorithm": ["SAMME", "SAMME.R"]
    },
    "Gradient" : {
        "loss": ["deviance", "exponential"],
        "n_estimators" : randint(100, 1000),
        "subsample":  np.random.uniform(low=0.1, high=1, size=10),
        "criterion" : ["friedman_mse", "mse"],
        "min_samples_leaf" : randint(1, 20),
        "max_depth" : randint(2, 6)        
    },
    "ExtraTrees" : {
        "n_estimators" :randint(100, 1000),
        "criterion" : ["gini", "entropy"],
        "max_depth" : randint(1, 20),
        "min_samples_leaf" : randint(1, 20),
    },
    "LogisticRegression" : {
        "C": np.random.uniform(low=0.1, high=3, size=10),
        "tol": np.random.uniform(low=0.0001, high=0.01, size=10),
        "solver"   : ["newton-cg", "lbfgs", "liblinear"], 
        "max_iter" : randint(1000,5000)
    }
}
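
Note that the dictionary mixes two kinds of entries. `randint(...)` is a frozen SciPy distribution that is sampled anew for every candidate, whereas `np.random.uniform(..., size=n)` draws `n` numbers once and leaves the search with a fixed list of choices. The sketch below shows one possible alternative for continuous ranges, `scipy.stats.uniform`, using `ParameterSampler` (the same sampling utility `RandomizedSearchCV` relies on). The ranges and seeds here are assumptions for illustration.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Fixed list: 5 values are drawn once; a search can only pick among these 5.
rng = np.random.RandomState(0)
fixed_values = rng.uniform(low=0.8, high=2, size=5)

# Continuous distribution: uniform(loc, scale) covers [loc, loc + scale],
# so this samples C anywhere in [0.8, 2.0] for each candidate.
space = {
    "C": uniform(loc=0.8, scale=1.2),
    "kernel": ["linear", "rbf"],
}
candidates = list(ParameterSampler(space, n_iter=20, random_state=0))
sampled_c = [c["C"] for c in candidates]

print(len(fixed_values), "fixed choices")
print(len(set(sampled_c)), "distinct sampled values of C")
```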

## Processing

We process the models using 5 data splits, 10 repetitions, and 20 iterations.

In [9]:
dfmodelosRand = executa_modelos_RandomizedSearchCV(names, classes, dados, 5, 10, hiperparams, 20)

Model: KNeighbors 	 time: 13 seconds
Model: SVC 	 time: 35 seconds
Model: Gaussian 	 time: 26 seconds
Model: DecisionTree 	 time: 5 seconds
Model: RandomForest 	 time: 482 seconds
Model: NeuralMLP 	 time: 233 seconds
Model: AdaBoost 	 time: 467 seconds
Model: Gradient 	 time: 588 seconds
Model: ExtraTrees 	 time: 686 seconds
Model: LogisticRegression 	 time: 18 seconds
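
The run times are easier to interpret once the number of model fits is counted. The stored search objects are shown as `RandomizedSearchCV(cv=RepeatedStratifiedKFold(...`, so the sketch below assumes that `n_splits` and `n_repeats` are forwarded to `RepeatedStratifiedKFold` and that `n_iter` is forwarded to `RandomizedSearchCV`. The internals of funcoes.py are not shown here, so treat this as an estimate.

```python
from sklearn.model_selection import RepeatedStratifiedKFold

# Same settings as the call above: 5 splits, 10 repetitions, 20 iterations.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

n_folds = cv.get_n_splits()      # 5 splits x 10 repetitions
n_iter = 20                      # sampled parameter combinations per model
total_fits = n_folds * n_iter    # cross-validation fits per model

print(n_folds, total_fits)
```

Under these assumptions each model is fitted about a thousand times, which is consistent with the ensemble models taking several minutes each.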


In [10]:
dfmodelosRand

| Nome | Modelo | AUC | Train AUC | Std AUC | Best Params | Tempo | objRandomizedSearchCV |
|---|---|---|---|---|---|---|---|
| KNeighbors | KNeighborsClassifier(algorithm='kd_tree', leaf... | 0.700238 | 1.0 | 0.061331 | {'algorithm': 'kd_tree', 'leaf_size': 292, 'n_... | 13 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| SVC | SVC(C=1.4048681047470488, break_ties=False, ca... | 0.735974 | 0.851874 | 0.059672 | {'kernel': 'linear', 'gamma': 'scale', 'C': 1.... | 35 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| Gaussian | GaussianProcessClassifier(copy_X_train=True, k... | 0.714948 | 0.950495 | 0.066657 | {'max_iter_predict': 1173, 'n_restarts_optimiz... | 26 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| DecisionTree | DecisionTreeClassifier(ccp_alpha=0.0, class_we... | 0.674503 | 0.846614 | 0.074437 | {'criterion': 'gini', 'max_depth': 17, 'max_fe... | 5 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| RandomForest | (DecisionTreeClassifier(ccp_alpha=0.0, class_w... | 0.778388 | 0.89729 | 0.058186 | {'criterion': 'gini', 'max_depth': 99, 'max_fe... | 482 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| NeuralMLP | MLPClassifier(activation='identity', alpha=0.0... | 0.743558 | 0.837538 | 0.058227 | {'solver': 'sgd', 'learning_rate_init': 0.0338... | 233 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| AdaBoost | (DecisionTreeClassifier(ccp_alpha=0.0, class_w... | 0.717203 | 0.998454 | 0.063172 | {'algorithm': 'SAMME', 'learning_rate': 0.8438... | 467 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| Gradient | ([DecisionTreeRegressor(ccp_alpha=0.0, criteri... | 0.761938 | 1.0 | 0.060551 | {'criterion': 'friedman_mse', 'loss': 'devianc... | 588 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| ExtraTrees | (ExtraTreeClassifier(ccp_alpha=0.0, class_weig... | 0.755712 | 0.934751 | 0.067471 | {'criterion': 'gini', 'max_depth': 6, 'min_sam... | 686 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |
| LogisticRegression | LogisticRegression(C=0.5633575507490969, class... | 0.75013 | 0.834809 | 0.059367 | {'C': 0.5633575507490969, 'max_iter': 3043, 's... | 18 | RandomizedSearchCV(cv=RepeatedStratifiedKFold(... |


The 'Modelo' column holds the `best_estimator_` result, and the 'objRandomizedSearchCV' field holds a copy of the object produced by the run, that is, the fitted **RandomizedSearchCV** object.
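
The following sketch shows how values like the ones in this table can be read from a fitted `RandomizedSearchCV` object. It is an assumption about how the columns map to scikit-learn attributes, not the code from funcoes.py. The data, model, and search space are placeholders.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": randint(1, 10), "min_samples_leaf": randint(1, 20)},
    n_iter=5,
    cv=cv,
    scoring="roc_auc",
    return_train_score=True,   # needed for the mean_train_score entries
    random_state=0,
)
search.fit(X, y)

# Assumed mapping from search attributes to the table's column names.
best = search.best_index_
summary = {
    "Modelo": search.best_estimator_,
    "AUC": search.cv_results_["mean_test_score"][best],
    "Train AUC": search.cv_results_["mean_train_score"][best],
    "Std AUC": search.cv_results_["std_test_score"][best],
    "Best Params": search.best_params_,
}
print(summary["AUC"], summary["Train AUC"], summary["Std AUC"])
```

Comparing the train and test AUC side by side is a quick way to spot overfitting: a train AUC of 1.0 next to a much lower test AUC means the model memorized the training folds.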

In [11]:
# Processing time is stored in seconds; dividing the total by 60 gives minutes.
dfmodelosRand['Tempo'].sum() / 60

42.55

# Results

After processing, we have the list of the best models and parameters according to the configurations given in the ```hiperparams``` dictionary.

## ```best_params_```

In [12]:
for indice in names:
  print(100*'_')
  print(indice)
  print(len(indice)*'¨')
  display(dfmodelosRand.at[indice,'Best Params'])
print(100*'_')

____________________________________________________________________________________________________
KNeighbors
¨¨¨¨¨¨¨¨¨¨


{'algorithm': 'kd_tree',
 'leaf_size': 292,
 'n_neighbors': 24,
 'weights': 'distance'}

____________________________________________________________________________________________________
SVC
¨¨¨


{'C': 1.4048681047470488, 'gamma': 'scale', 'kernel': 'linear'}

____________________________________________________________________________________________________
Gaussian
¨¨¨¨¨¨¨¨


{'max_iter_predict': 1173, 'n_restarts_optimizer': 97}

____________________________________________________________________________________________________
DecisionTree
¨¨¨¨¨¨¨¨¨¨¨¨


{'criterion': 'gini',
 'max_depth': 17,
 'max_features': 'sqrt',
 'min_samples_leaf': 12}

____________________________________________________________________________________________________
RandomForest
¨¨¨¨¨¨¨¨¨¨¨¨


{'criterion': 'gini',
 'max_depth': 99,
 'max_features': 'sqrt',
 'min_samples_leaf': 18,
 'n_estimators': 443}

____________________________________________________________________________________________________
NeuralMLP
¨¨¨¨¨¨¨¨¨


{'activation': 'identity',
 'alpha': 0.0028123770434647292,
 'learning_rate': 'adaptive',
 'learning_rate_init': 0.03386779256083157,
 'solver': 'sgd'}

____________________________________________________________________________________________________
AdaBoost
¨¨¨¨¨¨¨¨


{'algorithm': 'SAMME',
 'learning_rate': 0.8438427789494884,
 'n_estimators': 266}

____________________________________________________________________________________________________
Gradient
¨¨¨¨¨¨¨¨


{'criterion': 'friedman_mse',
 'loss': 'deviance',
 'max_depth': 5,
 'min_samples_leaf': 6,
 'n_estimators': 160,
 'subsample': 0.49677434214589056}

____________________________________________________________________________________________________
ExtraTrees
¨¨¨¨¨¨¨¨¨¨


{'criterion': 'gini',
 'max_depth': 6,
 'min_samples_leaf': 4,
 'n_estimators': 917}

____________________________________________________________________________________________________
LogisticRegression
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨


{'C': 0.5633575507490969,
 'max_iter': 3043,
 'solver': 'lbfgs',
 'tol': 0.00983631058928578}

____________________________________________________________________________________________________


## ```best_estimator_```

In [13]:
for indice in names:
  print(100*'_')
  print(indice)
  print(len(indice)*'¨')
  display(dfmodelosRand.at[indice,'Modelo'])
print(100*'_')

____________________________________________________________________________________________________
KNeighbors
¨¨¨¨¨¨¨¨¨¨


KNeighborsClassifier(algorithm='kd_tree', leaf_size=292, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=24, p=2,
                     weights='distance')

____________________________________________________________________________________________________
SVC
¨¨¨


SVC(C=1.4048681047470488, break_ties=False, cache_size=200, class_weight=None,
    coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale',
    kernel='linear', max_iter=-1, probability=True, random_state=73246,
    shrinking=True, tol=0.001, verbose=False)

____________________________________________________________________________________________________
Gaussian
¨¨¨¨¨¨¨¨


GaussianProcessClassifier(copy_X_train=True, kernel=None, max_iter_predict=1173,
                          multi_class='one_vs_rest', n_jobs=None,
                          n_restarts_optimizer=97, optimizer='fmin_l_bfgs_b',
                          random_state=73246, warm_start=True)

____________________________________________________________________________________________________
DecisionTree
¨¨¨¨¨¨¨¨¨¨¨¨


DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=17, max_features='sqrt', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=12, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=73246, splitter='best')

____________________________________________________________________________________________________
RandomForest
¨¨¨¨¨¨¨¨¨¨¨¨


RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=99, max_features='sqrt',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=18, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=443,
                       n_jobs=None, oob_score=False, random_state=73246,
                       verbose=0, warm_start=True)

____________________________________________________________________________________________________
NeuralMLP
¨¨¨¨¨¨¨¨¨


MLPClassifier(activation='identity', alpha=0.0028123770434647292,
              batch_size='auto', beta_1=0.9, beta_2=0.999, early_stopping=False,
              epsilon=1e-08, hidden_layer_sizes=(100,),
              learning_rate='adaptive', learning_rate_init=0.03386779256083157,
              max_fun=15000, max_iter=200, momentum=0.9, n_iter_no_change=10,
              nesterovs_momentum=True, power_t=0.5, random_state=73246,
              shuffle=True, solver='sgd', tol=0.0001, validation_fraction=0.1,
              verbose=False, warm_start=True)

____________________________________________________________________________________________________
AdaBoost
¨¨¨¨¨¨¨¨


AdaBoostClassifier(algorithm='SAMME', base_estimator=None,
                   learning_rate=0.8438427789494884, n_estimators=266,
                   random_state=73246)

____________________________________________________________________________________________________
Gradient
¨¨¨¨¨¨¨¨


GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=5,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=6, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=160,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=73246, subsample=0.49677434214589056,
                           tol=0.0001, validation_fraction=0.1, verbose=0,
                           warm_start=True)

____________________________________________________________________________________________________
ExtraTrees
¨¨¨¨¨¨¨¨¨¨


ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=6, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=4, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=917,
                     n_jobs=None, oob_score=False, random_state=73246,
                     verbose=0, warm_start=True)

____________________________________________________________________________________________________
LogisticRegression
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨


LogisticRegression(C=0.5633575507490969, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=3043, multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=73246, solver='lbfgs', tol=0.00983631058928578,
                   verbose=0, warm_start=True)

____________________________________________________________________________________________________


##List of tested parameters

List of the parameters in the order they were tested.

In [14]:
help(listar_parametros)

Help on function listar_parametros in module funcoes:

listar_parametros(names, models)
    Function that builds a table with the parameters resulting from the model
    optimization run, highlighting rank_test_score = 1.
    
    Parameters
    ----------
      names: array with the model names
      models: array with the instances of the Machine Learning models.
    
    Returns
    -------
      No return value
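
A table of this kind can be built from the `cv_results_` attribute of a fitted search. The sketch below is a hypothetical reconstruction, not the actual `listar_parametros` code: it reuses the column names `media_AUC_teste` and `media_AUC_treino` seen in the output, and simply filters the best row instead of highlighting it.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": randint(1, 20), "weights": ["uniform", "distance"]},
    n_iter=6,
    cv=3,
    scoring="roc_auc",
    return_train_score=True,
    random_state=0,
)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per tested combination.
results = pd.DataFrame(search.cv_results_)
param_cols = [c for c in results.columns if c.startswith("param_")]
cols = ["rank_test_score", "mean_test_score", "mean_train_score"] + param_cols

# Rename the score columns and drop the "param_" prefix.
rename = {"mean_test_score": "media_AUC_teste",
          "mean_train_score": "media_AUC_treino"}
rename.update({c: c[len("param_"):] for c in param_cols})
table = results[cols].rename(columns=rename)

# Rows tied for the best mean test score share rank 1.
best_rows = table[table["rank_test_score"] == 1]
print(table)
```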



In [19]:
listar_parametros(names, dfmodelosRand)

________________________________________________________________________________________________________________________________________________________________________________________________________
KNeighbors
¨¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | algorithm | leaf_size | n_neighbors | weights |
|---|---|---|---|---|---|---|---|
| 0 | 14 | 0.679854 | 0.821291 | brute | 133 | 7 | uniform |
| 1 | 7 | 0.693589 | 0.746585 | ball_tree | 299 | 22 | uniform |
| 2 | 2 | 0.695407 | 0.794489 | kd_tree | 254 | 11 | uniform |
| 3 | 17 | 0.602504 | 1.0 | kd_tree | 175 | 2 | distance |
| 4 | 15 | 0.664438 | 1.0 | kd_tree | 174 | 5 | distance |
| 5 | 1 | 0.700238 | 1.0 | kd_tree | 292 | 24 | distance |
| 6 | 19 | 0.541158 | 1.0 | ball_tree | 90 | 1 | uniform |
| 7 | 10 | 0.692173 | 0.753016 | kd_tree | 45 | 19 | uniform |
| 8 | 6 | 0.693686 | 0.800566 | auto | 160 | 10 | uniform |
| 9 | 7 | 0.693589 | 0.746585 | ball_tree | 93 | 22 | uniform |


________________________________________________________________________________________________________________________________________________________________________________________________________
SVC
¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | kernel | gamma | C |
|---|---|---|---|---|---|---|
| 0 | 4 | 0.732537 | 0.853318 | linear | auto | 1.712323 |
| 1 | 14 | 0.638923 | 0.670771 | sigmoid | auto | 0.873736 |
| 2 | 6 | 0.729724 | 0.791484 | rbf | scale | 1.718927 |
| 3 | 9 | 0.727531 | 0.785101 | rbf | scale | 1.404868 |
| 4 | 5 | 0.732438 | 0.853428 | linear | auto | 1.718927 |
| 5 | 10 | 0.727332 | 0.812007 | poly | scale | 0.873736 |
| 6 | 19 | 0.605388 | 0.627737 | sigmoid | scale | 1.404868 |
| 7 | 1 | 0.735974 | 0.851874 | linear | scale | 1.404868 |
| 8 | 15 | 0.638922 | 0.67079 | sigmoid | auto | 1.712323 |
| 9 | 17 | 0.605463 | 0.62774 | sigmoid | scale | 1.561812 |


________________________________________________________________________________________________________________________________________________________________________________________________________
Gaussian
¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | max_iter_predict | n_restarts_optimizer |
|---|---|---|---|---|---|
| 0 | 1 | 0.714948 | 0.950495 | 1173 | 97 |
| 1 | 1 | 0.714948 | 0.950495 | 1689 | 38 |
| 2 | 1 | 0.714948 | 0.950495 | 1424 | 89 |
| 3 | 1 | 0.714948 | 0.950495 | 1343 | 85 |
| 4 | 1 | 0.714948 | 0.950495 | 1460 | 96 |
| 5 | 1 | 0.714948 | 0.950495 | 1308 | 98 |
| 6 | 1 | 0.714948 | 0.950495 | 1412 | 17 |
| 7 | 1 | 0.714948 | 0.950495 | 915 | 31 |
| 8 | 1 | 0.714948 | 0.950495 | 560 | 14 |
| 9 | 1 | 0.714948 | 0.950495 | 1946 | 2 |


________________________________________________________________________________________________________________________________________________________________________________________________________
DecisionTree
¨¨¨¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | criterion | max_depth | max_features | min_samples_leaf |
|---|---|---|---|---|---|---|---|
| 0 | 15 | 0.613118 | 0.856645 | entropy | 34 | log2 | 7 |
| 1 | 8 | 0.64637 | 0.783659 | gini | 26 | log2 | 15 |
| 2 | 11 | 0.633804 | 0.963201 | gini | 33 | sqrt | 3 |
| 3 | 8 | 0.64637 | 0.783659 | gini | 18 | log2 | 15 |
| 4 | 10 | 0.637365 | 0.735816 | gini | 3 | sqrt | 13 |
| 5 | 1 | 0.674503 | 0.846614 | gini | 17 | sqrt | 12 |
| 6 | 6 | 0.664329 | 0.902284 | gini | 11 | sqrt | 6 |
| 7 | 18 | 0.566118 | 0.606144 | entropy | 1 | sqrt | 16 |
| 8 | 5 | 0.666502 | 0.901849 | gini | 19 | sqrt | 6 |
| 9 | 3 | 0.668956 | 0.86957 | entropy | 33 | sqrt | 10 |


________________________________________________________________________________________________________________________________________________________________________________________________________
RandomForest
¨¨¨¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | criterion | max_depth | max_features | min_samples_leaf | n_estimators |
|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 0.775906 | 0.963301 | entropy | 98 | log2 | 7 | 440 |
| 1 | 11 | 0.775317 | 0.906695 | entropy | 14 | log2 | 15 | 476 |
| 2 | 1 | 0.778388 | 0.89729 | gini | 99 | sqrt | 18 | 443 |
| 3 | 15 | 0.771844 | 0.999348 | entropy | 15 | sqrt | 3 | 486 |
| 4 | 2 | 0.778177 | 0.924127 | gini | 17 | sqrt | 12 | 416 |
| 5 | 14 | 0.774051 | 0.960644 | gini | 7 | log2 | 6 | 150 |
| 6 | 3 | 0.777514 | 0.904179 | gini | 97 | sqrt | 16 | 332 |
| 7 | 13 | 0.774822 | 0.968828 | gini | 89 | log2 | 6 | 395 |
| 8 | 18 | 0.768276 | 0.912106 | gini | 3 | log2 | 7 | 99 |
| 9 | 8 | 0.776123 | 0.917112 | entropy | 65 | log2 | 13 | 483 |


________________________________________________________________________________________________________________________________________________________________________________________________________
NeuralMLP
¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | solver | learning_rate_init | learning_rate | alpha | activation |
|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 0.623357 | 0.973589 | lbfgs | 0.094845 | constant | 0.004704 | relu |
| 1 | 10 | 0.670329 | 0.675869 | sgd | 0.010498 | invscaling | 0.002812 | identity |
| 2 | 13 | 0.629382 | 1.0 | lbfgs | 0.010498 | constant | 0.004704 | logistic |
| 3 | 16 | 0.625211 | 0.976463 | lbfgs | 0.068171 | adaptive | 0.006185 | relu |
| 4 | 7 | 0.726346 | 0.858934 | adam | 0.014358 | constant | 0.003231 | relu |
| 5 | 12 | 0.633093 | 1.0 | lbfgs | 0.094845 | adaptive | 0.00366 | logistic |
| 6 | 19 | 0.405881 | 0.420473 | adam | 0.083122 | invscaling | 0.004231 | tanh |
| 7 | 13 | 0.629382 | 1.0 | lbfgs | 0.006816 | adaptive | 0.004704 | logistic |
| 8 | 4 | 0.736191 | 0.796302 | sgd | 0.076961 | constant | 0.000688 | identity |
| 9 | 2 | 0.742678 | 0.805349 | sgd | 0.033868 | adaptive | 0.006185 | logistic |


________________________________________________________________________________________________________________________________________________________________________________________________________
AdaBoost
¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | algorithm | learning_rate | n_estimators |
|---|---|---|---|---|---|---|
| 0 | 13 | 0.601838 | 0.68477 | SAMME.R | 3.863665 | 153 |
| 1 | 19 | | | SAMME | 4.164368 | 319 |
| 2 | 7 | 0.673451 | 1.0 | SAMME.R | 0.843843 | 274 |
| 3 | 3 | 0.709728 | 1.0 | SAMME | 1.877586 | 388 |
| 4 | 12 | 0.602014 | 0.685169 | SAMME.R | 3.863665 | 465 |
| 5 | 17 | | | SAMME | 3.553974 | 436 |
| 6 | 20 | | | SAMME | 3.553974 | 194 |
| 7 | 4 | 0.70509 | 1.0 | SAMME | 1.877586 | 312 |
| 8 | 10 | 0.636731 | 0.745642 | SAMME.R | 2.318862 | 495 |
| 9 | 8 | 0.652098 | 0.999902 | SAMME | 1.969554 | 274 |


________________________________________________________________________________________________________________________________________________________________________________________________________
Gradient
¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | criterion | loss | max_depth | min_samples_leaf | n_estimators | subsample |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | 0.747365 | 0.999883 | mse | exponential | 5 | 7 | 450 | 0.288813 |
| 1 | 3 | 0.754421 | 1.0 | mse | exponential | 4 | 3 | 836 | 0.702668 |
| 2 | 7 | 0.753423 | 1.0 | friedman_mse | deviance | 3 | 2 | 515 | 0.680104 |
| 3 | 17 | 0.743107 | 1.0 | friedman_mse | deviance | 2 | 17 | 872 | 0.702668 |
| 4 | 1 | 0.761938 | 1.0 | friedman_mse | deviance | 5 | 6 | 160 | 0.496774 |
| 5 | 10 | 0.751809 | 1.0 | friedman_mse | deviance | 5 | 16 | 854 | 0.7746 |
| 6 | 9 | 0.752524 | 1.0 | friedman_mse | exponential | 3 | 4 | 917 | 0.496774 |
| 7 | 19 | 0.735997 | 0.999986 | friedman_mse | exponential | 4 | 10 | 675 | 0.300687 |
| 8 | 6 | 0.753677 | 1.0 | friedman_mse | exponential | 2 | 10 | 845 | 0.490013 |
| 9 | 5 | 0.753888 | 1.0 | friedman_mse | deviance | 4 | 3 | 599 | 0.57897 |


________________________________________________________________________________________________________________________________________________________________________________________________________
ExtraTrees
¨¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | criterion | max_depth | min_samples_leaf | n_estimators |
|---|---|---|---|---|---|---|---|
| 0 | 13 | 0.74619 | 0.828596 | entropy | 2 | 8 | 778 |
| 1 | 11 | 0.747495 | 0.835663 | gini | 14 | 15 | 486 |
| 2 | 3 | 0.75259 | 0.985091 | gini | 11 | 3 | 438 |
| 3 | 16 | 0.743443 | 0.811067 | entropy | 2 | 15 | 972 |
| 4 | 15 | 0.744267 | 0.826459 | gini | 13 | 17 | 872 |
| 5 | 12 | 0.747145 | 0.859308 | entropy | 7 | 11 | 874 |
| 6 | 2 | 0.755502 | 0.967882 | entropy | 6 | 1 | 836 |
| 7 | 18 | 0.741983 | 0.816029 | gini | 16 | 19 | 118 |
| 8 | 1 | 0.755712 | 0.934751 | gini | 6 | 4 | 917 |
| 9 | 8 | 0.748531 | 0.839676 | gini | 3 | 10 | 778 |


________________________________________________________________________________________________________________________________________________________________________________________________________
LogisticRegression
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨


| | rank_test_score | media_AUC_teste | media_AUC_treino | C | max_iter | solver | tol |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.749676 | 0.835375 | 0.596783 | 1865 | liblinear | 0.000454 |
| 1 | 6 | 0.744415 | 0.846414 | 1.156744 | 3030 | liblinear | 0.008709 |
| 2 | 9 | 0.741451 | 0.851384 | 1.56459 | 3530 | liblinear | 0.002733 |
| 3 | 18 | 0.737657 | 0.85659 | 2.321059 | 2439 | liblinear | 0.008328 |
| 4 | 10 | 0.740837 | 0.853353 | 1.779216 | 2980 | newton-cg | 0.008328 |
| 5 | 13 | 0.739245 | 0.854764 | 1.996132 | 4844 | liblinear | 0.00027 |
| 6 | 5 | 0.747076 | 0.84249 | 0.872431 | 4005 | newton-cg | 0.008709 |
| 7 | 15 | 0.739093 | 0.854665 | 1.996132 | 1606 | liblinear | 0.00346 |
| 8 | 19 | 0.735185 | 0.859551 | 2.908518 | 4507 | lbfgs | 0.003578 |
| 9 | 17 | 0.737757 | 0.856668 | 2.321059 | 1992 | liblinear | 0.000454 |


________________________________________________________________________________________________________________________________________________________________________________________________________


#Saving our DataFrame

In [16]:
# Serialize the results DataFrame; the context manager closes the file even on error.
with open('/content/drive/MyDrive/databases/dfmodelosHP', 'wb') as output:
    pickle.dump(dfmodelosRand, output)
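
To reuse the saved results in another notebook, the file can be read back with `pickle.load`. The sketch below shows the round trip on a toy DataFrame written to a temporary path. The values and the path are placeholders, not the results above.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-in for the results DataFrame (values are placeholders).
df = pd.DataFrame({"AUC": [0.5, 0.6]}, index=["model_a", "model_b"])

# Temporary location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "dfmodelosHP")

with open(path, "wb") as f:
    pickle.dump(df, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.equals(df))
```

Because the real DataFrame stores fitted scikit-learn objects, it should be loaded with the same library versions used to save it, and pickle files should only be opened when they come from a trusted source.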

---

# References

* [Kaggle - COVID-19 - Clinical Data to assess diagnosis - Sírio Libanês](https://www.kaggle.com/S%C3%ADrio-Libanes/covid19)
* [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html)
* [SciKit Learn](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model)
* [Set up AutoML training with Python](https://docs.microsoft.com/pt-br/azure/machine-learning/how-to-configure-auto-train)
* [Overview of hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview?hl=pt-br)
* [Sklearn.RandomizedSearchCV()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)
* [willianrocha notebook](https://github.com/willianrocha/COVID-19_clinical_data_assess_diagnosis)