<a href="https://colab.research.google.com/github/mfavaits/YouTube-Series-on-Machine-Learning/blob/master/GB_Breast_Cancer_Coding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In the model the building part, you can use the cancer dataset, which is a very famous multi-class classification problem. This dataset is computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The dataset comprises 30 features (mean radius, mean texture, mean perimeter, mean area, mean smoothness, mean compactness, mean concavity, mean concave points, mean symmetry, mean fractal dimension, radius error, texture error, perimeter error, area error, smoothness error, compactness error, concavity error, concave points error, symmetry error, fractal dimension error, worst radius, worst texture, worst perimeter, worst area, worst smoothness, worst compactness, worst concavity, worst concave points, worst symmetry, and worst fractal dimension) and a target (type of cancer).
This data has two types of cancer classes: malignant (harmful) and benign (not harmful). Here, you can build a model to classify the type of cancer. The dataset is available in the scikit-learn library or you can also download it from the UCI Machine Learning Library.


In [0]:
import numpy as np #linear algebra library of Python
from sklearn import datasets


In [0]:
cancer = datasets.load_breast_cancer()

In [0]:
cancer.data.shape

(569, 30)

In [0]:
print(cancer.data[0:5])

In [0]:
from sklearn.model_selection import train_test_split #method to split training and testing data sets
X_train, X_test, y_train, y_test=train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)

In [0]:
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
import matplotlib.pyplot as plt

In [0]:
pip install bayesian-optimization

In [0]:
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
import bayes_opt
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score

In [0]:
pbounds = {'n_estimators': (50, 1000), 'eta': (0.01, 3), 'max_depth': (1,32), 'gamma':(0,5), 'min_child_weight': (2,20), 'subsample':(0.5,1), 'colsample_bytree':(0.5,1)}

In [0]:
model_tuning = XGBClassifier(n_jobs=-1)

In [0]:
def xgboostcv(eta, n_estimators, max_depth, gamma, min_child_weight, subsample, colsample_bytree):
    return np.mean(cross_val_score(model_tuning, X_train, y_train, cv=5, scoring='accuracy'))

In [0]:
optimizer = BayesianOptimization(
    f=xgboostcv,
    pbounds=pbounds,
    random_state=1)

In [0]:
optimizer.maximize(
    init_points=2,
    n_iter=3)
print(optimizer.max)

|   iter    |  target   | colsam... |    eta    |   gamma   | max_depth | min_ch... | n_esti... | subsample |
-------------------------------------------------------------------------------------------------------------
| [0m 1       [0m | [0m 0.9622  [0m | [0m 0.7085  [0m | [0m 2.164   [0m | [0m 0.000571[0m | [0m 10.37   [0m | [0m 4.642   [0m | [0m 137.7   [0m | [0m 0.5931  [0m |
| [0m 2       [0m | [0m 0.9622  [0m | [0m 0.6728  [0m | [0m 1.196   [0m | [0m 2.694   [0m | [0m 14.0    [0m | [0m 14.33   [0m | [0m 244.2   [0m | [0m 0.9391  [0m |
| [0m 3       [0m | [0m 0.9622  [0m | [0m 0.824   [0m | [0m 1.385   [0m | [0m 1.402   [0m | [0m 9.156   [0m | [0m 2.095   [0m | [0m 999.8   [0m | [0m 0.8966  [0m |
| [0m 4       [0m | [0m 0.9622  [0m | [0m 0.51    [0m | [0m 1.916   [0m | [0m 1.044   [0m | [0m 31.53   [0m | [0m 19.96   [0m | [0m 998.3   [0m | [0m 0.5124  [0m |
| [0m 5       [0m | [0m 0.9622  [0m | [0m 0.609

In [0]:
model = XGBClassifier(eta=2, n_estimators=137, max_depth=10, min_child_weight=5, gamma=0, subsample=0.6, colsample_bytree=0.7 )
model.fit(X_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.7, eta=2, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=10,
              min_child_weight=5, missing=None, n_estimators=137, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=0.6, verbosity=1)

In [0]:
y_pred = model.predict(X_test)
from sklearn import metrics
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9824561403508771


In [0]:
from sklearn.metrics import confusion_matrix
y_pred=model.predict(X_test)
confusion_matrix(y_test,y_pred)

array([[ 60,   3],
       [  0, 108]])