In [20]:
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import xgboost as xgb

In [2]:
iris = load_iris()
len(iris['data'])

150

In [3]:
num_samples, num_features = iris.data.shape
num_samples, num_features

(150, 4)

In [4]:
print(iris.target_names)

['setosa' 'versicolor' 'virginica']


Let's reserve 20% of our data for testing the model and use the remaining 80% to train it. By withholding the test data, we make sure we're evaluating the model on flowers it hasn't seen before. Typically we refer to our features (in this case, the sepal and petal measurements) as X, and the labels (in this case, the species) as y.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.2,
                                                    random_state=0)
len(X_train), len(X_test), len(y_train), len(y_test)

(120, 30, 120, 30)
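With only 150 flowers, a purely random split can leave the species slightly unbalanced between the two sets. As an optional variation (not what the cell above does), `train_test_split` accepts a `stratify` argument that preserves the class proportions in both splits. A minimal sketch, with variable names chosen here for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target keeps the class proportions of the full dataset
# in both the training and the test split.
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0,
    stratify=iris.target)

# Count how many samples of each species landed in each split.
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```

Because iris has exactly 50 samples per species, each species contributes 40 training and 10 test samples under this split.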

Now we'll load up XGBoost and convert our data into the DMatrix format it expects: one DMatrix for the training data, and one for the test data.

In [13]:
train = xgb.DMatrix(X_train, label=y_train)
test = xgb.DMatrix(X_test, label=y_test)
train, test

(<xgboost.core.DMatrix at 0x163e3f070>, <xgboost.core.DMatrix at 0x106e153d0>)

Now we'll define our hyperparameters. We're choosing softmax since this is a multi-class classification problem, but the other parameters should ideally be tuned through experimentation.

In [16]:
param = {
    'max_depth': 4,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': 3
}

epochs = 10

Let's go ahead and train our model using these parameters as a first guess.

In [17]:
model = xgb.train(param, train, epochs)
model

<xgboost.core.Booster at 0x106e15a30>

Now we'll use the trained model to predict classifications for the data we set aside for testing.  Each classification number we get back corresponds to a specific species of Iris.

In [18]:
predictions = model.predict(test)
predictions

array([2., 1., 0., 2., 0., 2., 0., 1., 1., 1., 2., 1., 1., 1., 1., 0., 1.,
       1., 0., 0., 2., 1., 0., 0., 2., 0., 0., 1., 1., 0.], dtype=float32)
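To read these numbers as species, the class indices can be used to look up the names printed earlier. A small sketch using the first five predictions from the output above; note the cast to `int`, since `multi:softmax` returns floats:

```python
import numpy as np

# Species names in label order, as printed by iris.target_names above.
target_names = np.array(['setosa', 'versicolor', 'virginica'])

# First five predictions from the output above (floats from multi:softmax).
predictions = np.array([2., 1., 0., 2., 0.], dtype=np.float32)

# Cast to int so the class numbers can index the names array.
species = target_names[predictions.astype(int)]
print(species)
```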

In [21]:
accuracy_score(y_test, predictions)

1.0
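For reference, `accuracy_score` is the fraction of predictions that match the true labels. The sketch below checks this by hand on a small made-up example (the labels are hypothetical, chosen so that one of five predictions is wrong):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, chosen so that one of five predictions is wrong.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0., 1., 2., 1., 1.], dtype=np.float32)

# Accuracy by hand: the mean of the element-wise matches.
manual = float(np.mean(y_true == y_pred))
# The same quantity from scikit-learn.
library = accuracy_score(y_true, y_pred)
print(manual, library)
```

With a test set of only 30 flowers, each prediction is worth about 3.3 percentage points, so a score of 1.0 here is encouraging rather than conclusive.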