# Crush Rig - XGBoost
Written by Matt MacDonald for CIGITI at the Hospital for Sick Children, Toronto
***

All data-manipulation tools come from the crush_plot.py file. The objective of this notebook is to predict the histological targets from the force/position crush data using XGBoost.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib notebook

In [None]:
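# Wildcard import: supplies PATH and the helper functions used below
# (study_outline, study_targets, study_data, split, modify, calculate, prep)
from crush_read import *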

The crush data must be collected using the crush rig and crush.py and stored in the expected folder structure at the root directory indicated by PATH.

In [None]:
# Uncomment to override the default PATH set in crush_plot.py
# PATH = Path('')
PATH

Load all data and modify as needed.

In [None]:
study = study_outline(PATH)
targets = study_targets(PATH)
crushes = study_data(study)
crushes = split(crushes)
crushes = modify(crushes)
crushes = calculate(crushes)

Prepare the data for XGBoost.

In [None]:
X, y, legend = prep(crushes, targets)
print('Reference for categorical features:')
legend

In [None]:
X.shape

Build the XGBoost model.

In [None]:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X.values, y.values, test_size=0.2, random_state=42)
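
The split above is a plain shuffled 80/20 split. If the histological classes turn out to be imbalanced, a stratified split keeps the class ratio similar in both sets; scikit-learn exposes this through the `stratify` argument of `train_test_split`. The sketch below only illustrates the idea in plain Python, assuming a single 1-D label vector. The function name `stratified_split` and the toy labels are hypothetical and not part of the crush tooling.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in roughly the same ratio."""
    rng = random.Random(seed)

    # Group sample indices by their class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        # At least one test sample per class; a class with a single
        # sample therefore ends up only in the test set
        n_test = max(1, round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 16 samples of class 0 and 4 of class 1
toy_y = [0] * 16 + [1] * 4
train_idx, test_idx = stratified_split(toy_y)
print(len(train_idx), len(test_idx))
```

Both index lists can then be used to select rows from `X` and `y`. Whether stratification is needed here depends on the class balance of the targets, which this notebook does not show.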

In [None]:
X_train.shape

In [None]:
clf = XGBClassifier()
clf.fit(X_train, y_train)

In [None]:
y_pred = clf.predict(X_test)
y_corr = y_pred == y_test

In [None]:
y_pred_train = clf.predict(X_train)
y_corr_train = y_pred_train == y_train

In [None]:
print(f"test acc = {sum(y_corr) / len(y_corr)}")
print(f"train acc = {sum(y_corr_train) / len(y_corr_train)}")
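
A single accuracy number can hide poor performance on a rare class. As a minimal sketch, assuming one 1-D label vector per sample, the helpers below count (true, predicted) pairs and compute the recall for each class. The names `confusion_counts` and `per_class_recall` and the toy labels are hypothetical, introduced only for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Toy labels (hypothetical), not results from the crush data
toy_true = [0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 1, 0, 1]

counts = confusion_counts(toy_true, toy_pred)
recall = per_class_recall(toy_true, toy_pred)
print(counts)
print(recall)
```

In the notebook the same helpers could be called with `y_test.tolist()` and `y_pred.tolist()`, provided the labels are one-dimensional.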

In [None]:
from xgboost import plot_tree
# Plot the tree at index 3 (zero-based) with a left-to-right layout; requires graphviz
plot_tree(clf, rankdir='LR', num_trees=3)

In [None]:
# Legend: the model was fit on X.values, so the tree plot labels features f0, f1, ...
for i, feat in enumerate(X.columns):
    print(f"f{i} = {feat}")
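
When reading a tree plot it can be handier to look up a single `f<i>` label than to scan the printed list. The sketch below builds that lookup as a dictionary. The helper `feature_legend` and the toy column names are hypothetical; in the notebook the argument would be `X.columns`.

```python
def feature_legend(columns):
    """Map default feature labels ('f0', 'f1', ...) to column names."""
    return {f"f{i}": name for i, name in enumerate(columns)}


# Toy column names (hypothetical), standing in for X.columns
toy_columns = ["force_max", "position_min", "duration"]
legend_map = feature_legend(toy_columns)
print(legend_map["f1"])
```

A different variable name than `legend` is used here so that the categorical-feature reference returned by `prep` is not overwritten.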