## Training (XGBoost edition)
After loading and preprocessing the data, we can now train the model.

### First things first
Import the libraries first. Make sure you have them installed (see the instructions in `README.md`).
Then split the data into training and test sets and train the model.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Load the processed data
df = pd.read_csv('data/train_processed.csv')

# Split features and labels
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

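# Configure a gradient-boosted tree classifier for binary classification
model = xgb.XGBClassifier(
    objective='binary:logistic',
    eval_metric='auc',
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    random_state=42
)

# Train, printing the AUC on the evaluation set after every boosting round.
# Note: the test set doubles as the evaluation set here, so these per-round
# scores are for monitoring only.
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=True
)

# Predict hard labels and positive-class probabilities on the test set
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print('Accuracy:', accuracy_score(y_test, y_pred))
print('ROC AUC:', roc_auc_score(y_test, y_proba))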



[0]	validation_0-auc:0.86671
[1]	validation_0-auc:0.86793
[2]	validation_0-auc:0.86901
[3]	validation_0-auc:0.87074
[4]	validation_0-auc:0.86990
[5]	validation_0-auc:0.87480
[6]	validation_0-auc:0.87561
[7]	validation_0-auc:0.87772
[8]	validation_0-auc:0.87888
[9]	validation_0-auc:0.87926
[10]	validation_0-auc:0.87920
[11]	validation_0-auc:0.87996
[12]	validation_0-auc:0.88076
[13]	validation_0-auc:0.88142
[14]	validation_0-auc:0.88273
[15]	validation_0-auc:0.88368
[16]	validation_0-auc:0.88343
[17]	validation_0-auc:0.88343
[18]	validation_0-auc:0.88297
[19]	validation_0-auc:0.88456
[20]	validation_0-auc:0.88509
[21]	validation_0-auc:0.88700
[22]	validation_0-auc:0.88753
[23]	validation_0-auc:0.88809
[24]	validation_0-auc:0.88827
[25]	validation_0-auc:0.88917
[26]	validation_0-auc:0.88923
[27]	validation_0-auc:0.89018
[28]	validation_0-auc:0.89070
[29]	validation_0-auc:0.89136
[30]	validation_0-auc:0.89171
[31]	validation_0-auc:0.89157
[32]	validation_0-auc:0.89209
...
[96]	validation_0-auc:0.89393
[97]	validation_0-auc:0.89396
[98]	validation_0-auc:0.89396
[99]	validation_0-auc:0.89419
Accuracy: 0.8021851638872916
ROC AUC: 0.8941868124890873


An accuracy of ~0.80 (with a ROC AUC of ~0.89) on the held-out test set is really good! XGBoost clearly beats the MLP, which only reached ~0.55 after optimizations.
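
How impressive an accuracy figure is depends on the class balance, so it can help to compare it against a trivial baseline that always predicts the most frequent class. The sketch below is one possible way to compute that baseline. The helper `majority_baseline_accuracy` and the toy labels are hypothetical and only for illustration; they are not part of the notebook or its data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    # Count of the single most frequent label
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy labels for illustration only (NOT the notebook's data):
# 7 negatives and 3 positives
toy_labels = [0] * 7 + [1] * 3
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

In the notebook, the same helper could be called as `majority_baseline_accuracy(y_test)` and its result compared with the accuracy printed above. The further the model sits above that baseline, the more the accuracy figure actually tells us.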