# Gradient Boosting Decision Trees

In this notebook we implement the GBDT (Gradient Boosting Decision Trees) algorithm for nodule classification.


Import the libraries and load the data needed to build the model:

In [2]:
from sklearn.ensemble import GradientBoostingClassifier 
from sklearn.model_selection import GridSearchCV
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('final_features.csv')

# define the target variable (y) and the feature matrix (X)
y = data["malignancy"]
X = data.drop(columns=['malignancy', 'case_id'])
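
The cells in this notebook evaluate the model with cross-validation on the full dataset. If a separate hold-out test set is also wanted, a minimal sketch could look like the following. It assumes a binary target and uses synthetic data from `make_classification` as a stand-in for `final_features.csv`; the names `X_demo` and `y_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature table (assumption: binary target)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Stratify so both splits keep the same class proportions as the full data
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

print(X_train.shape, X_test.shape)
```

With the real data, `X` and `y` would take the place of `X_demo` and `y_demo`.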

## Hyperparameters

Use GridSearchCV to find the best hyperparameters for training the model:

In [None]:
gbdt_model = GradientBoostingClassifier()
# candidate hyperparameter values
param_grid_gbdt = {
    'n_estimators': [100, 200],
    'max_depth': [3, 6, 10],
    'learning_rate': [0.1, 0.2],
    'subsample': [0.8, 1.0]
}

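# grid search with cross-validation
# note: the recorded output below reports 5 folds, so it comes from a run with cv=5, not the cv=10 set here
grid_search_gbdt = GridSearchCV(estimator=gbdt_model, param_grid=param_grid_gbdt, cv=10, scoring='accuracy', verbose=2, n_jobs=-1)
grid_search_gbdt.fit(X, y)
print(f"Best Hyperparameters: {grid_search_gbdt.best_params_}")
print(f"Best CV Score: {grid_search_gbdt.best_score_}")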

Fitting 5 folds for each of 24 candidates, totalling 120 fits
Best Hyperparameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}
Best CV Score: 0.8817539901866788
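
The candidate and fit counts in the log above follow directly from the grid: the number of candidates is the product of the list lengths, and each candidate is fitted once per fold. A small sketch that reproduces the arithmetic with `ParameterGrid` (the variable names here are illustrative, and the fold count is taken from the log):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search cell
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [3, 6, 10],
    'learning_rate': [0.1, 0.2],
    'subsample': [0.8, 1.0],
}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 3 * 2 * 2 combinations
n_folds = 5  # fold count reported in the log above

print(f"{n_candidates} candidates, {n_candidates * n_folds} fits")
```

Note that `GridSearchCV` with the default `refit=True` performs one additional fit of the best candidate on the full data, which the log total does not count.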


## Model Evaluation

10-fold cross-validation to evaluate our model in terms of precision, F1, ROC AUC, and accuracy:

In [None]:
from sklearn.model_selection import cross_val_score
import numpy as np

# model with the best hyperparameters found by the grid search
model = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, n_estimators=100, subsample=1.0) # alternative: grid_search_gbdt.best_estimator_

scores = [0, 0, 0, 0]
# [0] -> precision
# [1] -> f1
# [2] -> roc_auc
# [3] -> accuracy

# evaluation using 10-fold cross-validation (each call refits the model on every fold)
scores[0] = np.mean(cross_val_score(model, X, y, cv=10, scoring="precision"))
scores[1] = np.mean(cross_val_score(model, X, y, cv=10, scoring="f1"))
scores[2] = np.mean(cross_val_score(model, X, y, cv=10, scoring="roc_auc"))
scores[3] = np.mean(cross_val_score(model, X, y, cv=10, scoring="accuracy"))


print(f'Precision Score: {scores[0]:.2f}')
print(f'F1 Score: {scores[1]:.2f}')
print(f'ROC_AUC: {scores[2]:.2f}')
print(f'Accuracy Score: {scores[3]:.2f}')

KeyboardInterrupt: 
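
The four `cross_val_score` calls above each refit the model on all ten folds, for 40 fits in total. One possible way to reduce the run time is `cross_validate`, which accepts a list of metrics and scores all of them from a single fit per fold. The sketch below is a suggestion rather than the notebook's original method. To stay self-contained and quick, it uses synthetic data and a smaller model (`X_demo`, `y_demo`, `model_demo`, 5 folds and 20 estimators are all assumptions for the demo).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the real features (assumption: binary target)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
model_demo = GradientBoostingClassifier(n_estimators=20, random_state=0)

metrics = ["precision", "f1", "roc_auc", "accuracy"]

# One fit per fold; every metric is computed on that same fitted model
results = cross_validate(model_demo, X_demo, y_demo, cv=5, scoring=metrics)

for metric in metrics:
    print(f"{metric}: {results[f'test_{metric}'].mean():.2f}")
```

For the notebook's setting, `model`, `X`, `y` and `cv=10` would replace the demo values.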