### XGBoost



In this notebook, I will be looking at the well-known Breast Cancer Wisconsin dataset. This dataset is a binary classification problem: for each observation, I need to predict whether the tumor is malignant or benign from 30 numeric measurements of the cell nuclei. I will attempt to predict the correct class with XGBoost. I often reuse this dataset across my tree-based notebooks, because working with the same data makes it easy to compare the performance of different tree-based models while keeping the trees a reasonable size.

**Dataset**

Breast Cancer Dataset: https://www.kaggle.com/hdza1991/breast-cancer-wisconsin-data-set
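The notebook loads this data through scikit-learn's `load_breast_cancer`, which appears to bundle the same Wisconsin Diagnostic data as the Kaggle page. A quick look at the labels confirms that there are exactly two classes, so XGBoost is solving a binary problem here. The counts in the comments are for scikit-learn's copy of the data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

# Class names, in label order (0, 1)
print(breast_cancer.target_names)          # ['malignant' 'benign']
# Number of samples in each class
print(np.bincount(breast_cancer.target))   # [212 357]
# 569 observations, 30 features
print(breast_cancer.data.shape)            # (569, 30)
```

The classes are somewhat imbalanced (roughly 37% malignant), which is worth keeping in mind when reading plain accuracy scores later on.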

### Import Preliminaries

```python
%matplotlib inline
%config InlineBackend.figure_format='retina'

# Import modules
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
import seaborn
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from xgboost import XGBClassifier

# Set pandas options (full option names; the short forms are ambiguous in newer pandas)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 30)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Set plotting options
mpl.rcParams['figure.figsize'] = (9.0, 3.0)

# Set warning options
warnings.filterwarnings('ignore')
```

### Import Data

```python
# Import Breast Cancer data
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Conduct a train-test split on the data
train_x, test_x, train_y, test_y = train_test_split(X, y)

# View the training dataframe
pd.DataFrame(train_x, columns=breast_cancer['feature_names']).head(5)
```

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | radius error | texture error | perimeter error | area error | smoothness error | compactness error | concavity error | concave points error | symmetry error | fractal dimension error | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.34 | 22.22 | 79.85 | 464.5 | 0.101 | 0.102 | 0.054 | 0.028 | 0.155 | 0.068 | 0.295 | 1.656 | 1.955 | 21.55 | 0.011 | 0.032 | 0.031 | 0.011 | 0.019 | 0.005 | 13.58 | 28.68 | 87.36 | 553.0 | 0.145 | 0.234 | 0.169 | 0.082 | 0.227 | 0.091 |
| 1 | 12.67 | 17.3 | 81.25 | 489.9 | 0.103 | 0.077 | 0.032 | 0.021 | 0.171 | 0.06 | 0.21 | 0.951 | 1.566 | 17.61 | 0.007 | 0.01 | 0.013 | 0.006 | 0.021 | 0.002 | 13.71 | 21.1 | 88.7 | 574.4 | 0.138 | 0.121 | 0.102 | 0.056 | 0.269 | 0.069 |
| 2 | 13.28 | 20.28 | 87.32 | 545.2 | 0.104 | 0.144 | 0.098 | 0.062 | 0.197 | 0.068 | 0.37 | 0.825 | 2.427 | 31.33 | 0.005 | 0.021 | 0.022 | 0.01 | 0.017 | 0.003 | 17.38 | 28.0 | 113.1 | 907.2 | 0.153 | 0.372 | 0.366 | 0.149 | 0.374 | 0.103 |
| 3 | 12.98 | 19.35 | 84.52 | 514.0 | 0.096 | 0.113 | 0.071 | 0.029 | 0.176 | 0.065 | 0.268 | 0.566 | 2.465 | 20.65 | 0.006 | 0.033 | 0.044 | 0.01 | 0.028 | 0.005 | 14.42 | 21.95 | 99.21 | 634.3 | 0.129 | 0.325 | 0.344 | 0.099 | 0.36 | 0.092 |
| 4 | 10.18 | 17.53 | 65.12 | 313.1 | 0.106 | 0.085 | 0.018 | 0.019 | 0.191 | 0.069 | 0.247 | 1.217 | 1.641 | 15.05 | 0.008 | 0.014 | 0.009 | 0.008 | 0.026 | 0.004 | 11.17 | 22.84 | 71.94 | 375.6 | 0.141 | 0.144 | 0.066 | 0.056 | 0.305 | 0.088 |


### General Notes

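- You build many trees and add their predictions together, each scaled by a learning rate; unlike a random forest, the trees' outputs are summed, not averaged
- Trees are built sequentially, each one fitted to correct the errors of the ensemble built so far, so the trees cannot be trained in parallel with each other and training can take longer than for a random forest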
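The two notes above can be illustrated with a hand-rolled boosting loop. This is a minimal sketch of plain first-order gradient boosting for log loss, built on scikit-learn's `DecisionTreeRegressor`. It is not XGBoost's actual algorithm, which also uses second-order gradients and regularization. The helper names `fit_boosted_trees` and `predict_proba` and all parameter values are hypothetical, chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Hypothetical helper: fit trees one after another on the current residuals."""
    # Start every sample at the log-odds of the positive class
    p0 = y.mean()
    base_score = np.log(p0 / (1 - p0))
    raw = np.full(len(y), base_score)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of log loss w.r.t. the raw score is y - p;
        # it depends on all earlier trees, so the loop is inherently sequential
        residual = y - sigmoid(raw)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # Add (not average) the shrunken tree output to the running score
        raw += learning_rate * tree.predict(X)
        trees.append(tree)
    return base_score, trees


def predict_proba(X, base_score, trees, learning_rate=0.1):
    """Hypothetical helper: sum the scaled outputs of every tree, then squash."""
    raw = base_score + learning_rate * sum(tree.predict(X) for tree in trees)
    return sigmoid(raw)


X, y = load_breast_cancer(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

base_score, trees = fit_boosted_trees(train_x, train_y)
probs = predict_proba(test_x, base_score, trees)
accuracy = np.mean((probs >= 0.5) == test_y)
print(f"hand-rolled boosting accuracy: {accuracy:.3f}")
```

Each new tree is fitted to residuals that only exist once the previous trees are in place, which is why boosting cannot train its trees in parallel the way a random forest can.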