# 4 - Model optimization

In the modeling phase, we saw that two models, logistic regression and XGBoost, adequately predict COVID test results.

We will now try to optimize them.

In this phase, we search for the best hyperparameters for these two models in order to choose the final model that will be used to predict new data.
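As a rough illustration of what such a parameter search can look like, the sketch below runs a small cross-validated grid search over the regularization strength `C` of a logistic regression. The synthetic data from `make_classification`, the candidate values of `C`, the F1 scoring and the 3-fold split are all assumptions made for this example; they are not the notebook's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the COVID data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical candidate values for the regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Every candidate is scored by 3-fold cross-validation on the F1 metric.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

# Best candidate and its mean cross-validated score.
print(search.best_params_, round(search.best_score_, 3))
```

By default `GridSearchCV` refits the best candidate on the full training data, so `search.best_estimator_` can be evaluated on a held-out test set afterwards.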

In [2]:
# to load the cleaned dataset
import pickle
from typing import Tuple, List

import numpy as np
import pandas as pd
from scipy import stats as spst
from tqdm import tqdm

# plots
import seaborn as sns
from matplotlib import rcParams, pyplot as plt

# matplotlib parameters
# essentially, to make the plots larger by default
rcParams['figure.dpi'] = 120
rcParams['figure.figsize'] = (10, 8)

# warnings
import warnings
warnings.filterwarnings("ignore")

# package with analysis functions for this project
import os
cwd = os.getcwd()
os.chdir("../")
import scripts.plots as splt, scripts.metrics as smetrics
os.chdir(cwd)

## Import and integrity check

We will import the cleaned data and verify its integrity.
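A minimal sketch of what a basic integrity check could look like is given below. The helper `check_integrity` is hypothetical (it is not part of the project's `scripts` package), and the three-row frame is a stand-in for the cleaned table; only the column names `sex`, `age` and `covid_res` and the `id` index come from the data shown in this notebook.

```python
import pandas as pd


def check_integrity(df: pd.DataFrame, target: str = "covid_res") -> dict:
    """Return a few basic integrity indicators for a cleaned DataFrame."""
    return {
        "n_rows": len(df),
        # total number of missing cells across all columns
        "n_missing": int(df.isna().sum().sum()),
        # number of index labels that repeat an earlier one
        "n_duplicate_ids": int(df.index.duplicated().sum()),
        # whether the target column is present
        "has_target": target in df.columns,
    }


# Tiny stand-in frame (assumption); the notebook would pass the cleaned table.
toy = pd.DataFrame(
    {"sex": [1, 0, 1], "age": [57, 87, 28], "covid_res": [0, 0, 0]},
    index=pd.Index([485731, 278436, 372264], name="id"),
)

report = check_integrity(toy)
print(report)
```

Collecting the indicators in a dictionary makes it easy to assert on them, for example failing early if `n_missing` is not zero.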

## Importing the fitted models

We will import the models that were already fitted, though only with default parameters.

In [8]:
with open(r'../models/modelo_default.model', 'rb') as modelfile:
    pickler = pickle.Unpickler(file = modelfile)
    modelos_import = pickler.load()

In [10]:
modelos = modelos_import['modelo']

X_train, X_test, y_train, y_test = modelos_import['train_test_split']
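The cells around this point read three keys from the unpickled object: `'modelo'`, `'train_test_split'` and `'base'`. The sketch below shows, under that assumption about its layout, how such a bundle round-trips through `pickle`. The values are placeholders, and the inner names `logistic_regression` and `xgboost` are hypothetical; the real file stores fitted models, the split arrays and the DataFrame.

```python
import io
import pickle

# Placeholder bundle with the same top-level keys the notebook reads.
bundle = {
    "modelo": {
        "logistic_regression": "fitted model placeholder",  # hypothetical key
        "xgboost": "fitted model placeholder",              # hypothetical key
    },
    "train_test_split": ([[0, 1]], [[1, 0]], [0], [1]),
    "base": "DataFrame placeholder",
}

# Write to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.Pickler(buffer).dump(bundle)

# Rewind and load the bundle back, as the notebook does with Unpickler.
buffer.seek(0)
restored = pickle.Unpickler(buffer).load()

# The four-element tuple unpacks in the same order it was stored.
X_train, X_test, y_train, y_test = restored["train_test_split"]
```

Because unpickling can execute arbitrary code, a file like this should only be loaded when it comes from a trusted source.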

In [11]:
covid = modelos_import['base']

covid.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50000 entries, 485731 to 53605
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype   
---  ------               --------------  -----   
 0   sex                  50000 non-null  category
 1   patient_type         50000 non-null  category
 2   pneumonia            50000 non-null  category
 3   age                  50000 non-null  int8    
 4   pregnancy            50000 non-null  category
 5   diabetes             50000 non-null  category
 6   copd                 50000 non-null  category
 7   asthma               50000 non-null  category
 8   inmsupr              50000 non-null  category
 9   hypertension         50000 non-null  category
 10  other_disease        50000 non-null  category
 11  cardiovascular       50000 non-null  category
 12  obesity              50000 non-null  category
 13  renal_chronic        50000 non-null  category
 14  tobacco              50000 non-null  category
 15  contact_other_

In [12]:
covid.head()

| id | sex | patient_type | pneumonia | age | pregnancy | diabetes | copd | asthma | inmsupr | hypertension | other_disease | cardiovascular | obesity | renal_chronic | tobacco | contact_other_covid | covid_res |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 485731 | 1 | 1 | 0 | 57 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 278436 | 0 | 0 | 0 | 87 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | -1 | 0 |
| 372264 | 1 | 1 | 0 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 321682 | 1 | 1 | 0 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 247228 | 1 | 1 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |


In [None]:
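# xgboost parameters
# note: criterion and max_features are named as in scikit-learn's
# GradientBoostingClassifier; they are not XGBoost parameters
# criterion: friedman_mse, mae
# max_features: log2, sqrt
# learning_rate: 0.01, 0.05, 0.1, 0.5, 1
# max_depth: 3, 4, 5
# n_estimators: 5, 10, 15, 20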
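
To get a feel for the size of the search these notes imply, the sketch below expands the candidate values into explicit parameter combinations using only the standard library. It covers just `learning_rate`, `max_depth` and `n_estimators`; leaving out `criterion` and `max_features` is a choice made to keep the example short, and the helper `expand_grid` is hypothetical.

```python
from itertools import product

# Candidate values copied from the notes above.
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5, 1],
    "max_depth": [3, 4, 5],
    "n_estimators": [5, 10, 15, 20],
}


def expand_grid(grid: dict) -> list:
    """Return one dict per combination of the candidate values."""
    keys = list(grid)
    return [
        dict(zip(keys, values))
        for values in product(*(grid[key] for key in keys))
    ]


candidates = expand_grid(param_grid)

# 5 learning rates * 3 depths * 4 estimator counts = 60 combinations
print(len(candidates))
```

With k-fold cross-validation each combination is fitted k times, so an exhaustive search over this grid with 5 folds would train 300 models.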