# Evaluating XGBoost models

In [1]:
from xgboost import XGBClassifier
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import accuracy_score
import pandas as pd



In [4]:
data = pd.read_csv("pima-indians-diabetes.data.csv", header=None)
features = data.loc[:,0:7]
labels = data.loc[:,8]

K-fold cross validation averages the score over several train/test partitions, which gives a more reliable estimate of model performance than a single split:

In [5]:
model = XGBClassifier()

In [7]:
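# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

# n_splits replaces n_folds, and the dataset size is no longer passed in.
# random_state only applies with shuffle=True, so it is dropped to keep the original unshuffled folds.
kfold = KFold(n_splits=10)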

In [10]:
results = cross_val_score(model, features, labels, cv=kfold)
print("Accuracy: %.2f%%, Standard deviation: (%.2f%%)" % (results.mean()*100, results.std()*100))

Accuracy: 76.69%, Standard deviation: (7.11%)


"If you have many classes for a classification type predictive modeling problem or the classes are imbalanced (there are a lot more instances for one class than another), it can be a good idea to create stratified folds when performing cross validation.

This has the effect of enforcing the same distribution of classes in each fold as in the whole training dataset when performing the cross validation evaluation. The scikit-learn library provides this capability in the StratifiedKFold class."

In [13]:
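from sklearn.model_selection import StratifiedKFold

# The labels are no longer passed to the constructor; cross_val_score hands them to the splitter.
# random_state only applies with shuffle=True, so it is dropped to keep the original unshuffled folds.
kfold = StratifiedKFold(n_splits=10)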
results = cross_val_score(model, features, labels, cv=kfold)
print("Accuracy: %.2f%%, Standard deviation: (%.2f%%)" % (results.mean()*100, results.std()*100))

Accuracy: 76.95%, Standard deviation: (5.88%)
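To make the effect of stratification concrete, here is a minimal sketch that compares the share of positive labels in each test fold for `KFold` and `StratifiedKFold`. The labels are synthetic and deliberately imbalanced (an assumption for illustration, not the diabetes data), and `positive_rate_per_fold` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic, imbalanced labels (illustrative assumption): 90 negatives followed by 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # placeholder features; only the labels matter here


def positive_rate_per_fold(splitter):
    """Return the fraction of positive labels in each test fold."""
    return [float(y[test_idx].mean()) for _, test_idx in splitter.split(X, y)]


plain_rates = positive_rate_per_fold(KFold(n_splits=5))
stratified_rates = positive_rate_per_fold(StratifiedKFold(n_splits=5))

print("KFold:          ", plain_rates)
print("StratifiedKFold:", stratified_rates)
```

Because the labels are sorted and the folds are unshuffled, plain `KFold` puts every positive example in the last fold, while `StratifiedKFold` gives each fold the same 10% positive rate as the full label set.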


## Which technique to use when

* K-fold cross validation, with k set to 3, 5, or 10, is generally the gold standard for estimating the performance of a machine learning algorithm on unseen data.
* Use stratified cross validation to preserve the class distribution in every fold when there are many classes or the classes are imbalanced.
* A train/test split is fast, which helps when the algorithm is slow, and it produces performance estimates with lower bias when the dataset is large.