# XGBoost Classification in Python

## Setup

If you are running Databricks Runtime, uncomment the appropriate line in Cmd 3 to install the xgboost library.  
If you are running Databricks Runtime ML, xgboost is already installed. Skip to Cmd 4.

In [0]:
# If you are running Databricks Runtime 7.1 or above, uncomment this line and run this cell:
#%pip install xgboost

# If you are running Databricks Runtime 6.4 to 7.0, uncomment this line and run this cell:
#dbutils.library.installPyPI("xgboost")

## Prepare data

In [0]:
import pandas as pd
import xgboost as xgb

In [0]:
raw_input = pd.read_csv(
    "/dbfs/databricks-datasets/Rdatasets/data-001/csv/datasets/iris.csv",
    header=0,
    names=["item", "sepal length", "sepal width", "petal length", "petal width", "class"],
)
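# Drop the row-number column; it carries no predictive information.
new_input = raw_input.drop(columns=["item"])
# Encode the species name as an integer label, which XGBoost requires.
new_input["class"] = new_input["class"].astype('category')
new_input["classIndex"] = new_input["class"].cat.codes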
print(new_input)
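
To see what the `classIndex` column holds, here is a minimal sketch of how pandas assigns category codes. The four-row frame and its species names are made-up stand-ins for the iris data, not values read from the CSV file, and `code_to_label` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for the "class" column of the iris data.
demo = pd.DataFrame({"class": ["virginica", "setosa", "versicolor", "setosa"]})
demo["class"] = demo["class"].astype("category")

# Inferred categories are sorted, so the codes follow the alphabetical
# order of the labels rather than their order of appearance.
demo["classIndex"] = demo["class"].cat.codes

# Keep the mapping so predicted codes can be turned back into names.
code_to_label = dict(enumerate(demo["class"].cat.categories))

print(demo)
print(code_to_label)
```

Keeping a mapping like `code_to_label` around is optional, but it makes it easy to report predictions as species names instead of integers.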

In [0]:
from sklearn.model_selection import train_test_split
# Split into training and test sets (by default 75% train, 25% test, shuffled randomly)
training_df, test_df = train_test_split(new_input)
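
Because `train_test_split` shuffles the rows randomly, each run of the cell above yields a different split and therefore a slightly different score. The sketch below shows one way to make the split repeatable and class-balanced, assuming that is what you want. The toy frame, the seed `42`, and the use of `stratify` are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the iris data: 12 rows, 3 balanced classes.
toy = pd.DataFrame({"x": range(12), "classIndex": [0, 1, 2] * 4})

# A fixed random_state makes the shuffle repeatable; stratify keeps the
# class proportions the same in the training and test sets.
train_a, test_a = train_test_split(toy, random_state=42, stratify=toy["classIndex"])
train_b, test_b = train_test_split(toy, random_state=42, stratify=toy["classIndex"])

print(len(train_a), len(test_a))
print(test_a.index.equals(test_b.index))
```

With a data set as small as iris, stratifying helps avoid a test set in which one species is nearly absent.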

## Train XGBoost Model with Pandas DataFrames

In [0]:
dtrain = xgb.DMatrix(training_df[["sepal length","sepal width", "petal length", "petal width"]], label=training_df["classIndex"])

In [0]:
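# verbosity 0 suppresses log output ('silent' is the deprecated spelling of this option)
param = {'max_depth': 2, 'eta': 1, 'verbosity': 0, 'objective': 'multi:softmax'}
param['nthread'] = 4
# merror is the multiclass classification error rate
param['eval_metric'] = 'merror'
# Iris has three classes
param['num_class'] = 3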

In [0]:
num_round = 10
bst = xgb.train(param, dtrain, num_round)

## Prediction

In [0]:
dtest = xgb.DMatrix(test_df[["sepal length","sepal width", "petal length", "petal width"]])
# With multi:softmax, predict() returns the predicted class index for each row (as floats)
ypred = bst.predict(dtest)

In [0]:
from sklearn.metrics import precision_score

pre_score = precision_score(test_df["classIndex"], ypred, average='micro')

print("xgb_pre_score:",pre_score)
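
When every row has exactly one label, micro-averaged precision is the same number as plain accuracy, so the score printed above can be read as the fraction of test rows classified correctly. The sketch below illustrates this on made-up labels (`y_true` and `y_pred` are hypothetical, not model output) and contrasts it with macro averaging, which weights each class equally.

```python
from sklearn.metrics import accuracy_score, precision_score

# Made-up labels for illustration only.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

# Micro averaging pools true and false positives across all classes.
micro = precision_score(y_true, y_pred, average="micro")
# Macro averaging computes precision per class, then takes the mean.
macro = precision_score(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)

print("micro precision:", micro)
print("macro precision:", macro)
print("accuracy:", acc)
```

If per-class behavior matters, `average=None` returns one precision value per class instead of a single number.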