## Simple XGBoost Example

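In this notebook, we show a very simple usage pattern for XGBoost: train a classifier on a small tabular dataset and measure its accuracy on held-out test data. To run it, install XGBoost into your Python environment with `pip install xgboost`. The notebook also uses scikit-learn and NumPy.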

author: Keith Chugg (chugg@usc.edu)

ChatGPT was used in the generation of this code.

In [2]:
import numpy as np  # used below for class counts and percentages
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Breast Cancer Dataset

* Source: Wisconsin Diagnostic Breast Cancer (WDBC) dataset
* Task: Binary classification of breast masses as malignant or benign
* Features: 30 numerical features computed from digitized images of fine needle aspirates (FNA) of breast masses
* Samples: 569 instances
* Classes:
  - 0 = Malignant (cancerous)
  - 1 = Benign (non-cancerous)

More details: https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-dataset
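You can confirm the label encoding listed above by inspecting the `Bunch` object that `load_breast_cancer()` returns. The short sketch below is not part of the original notebook. `target_names` gives the class name for each label index, and `np.bincount` counts the samples per label.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# target_names[i] is the class name for label i; str() strips NumPy string wrappers
label_names = {i: str(name) for i, name in enumerate(data.target_names)}
print(label_names)

print(X.shape)          # (number of samples, number of features)
print(np.bincount(y))   # number of samples with label 0 and with label 1
print([str(f) for f in data.feature_names[:3]])  # names of the first three features
```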

In [3]:
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
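The split above is purely random. With `random_state=42` it happens to land close to the overall class ratio, but that is not guaranteed for other seeds. A common variant, which the original notebook does not use, passes `stratify=y` so that `train_test_split` preserves the benign/malignant proportions in both subsets. The sketch below shows that variant.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the fraction of each class (about 63% benign) nearly equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"benign fraction, full:  {y.mean():.4f}")
print(f"benign fraction, train: {y_train.mean():.4f}")
print(f"benign fraction, test:  {y_test.mean():.4f}")
```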


In [None]:
print(f'X_train: shape: {X_train.shape}')
print(f'X_test: shape: {X_test.shape}\n')

print(f'y_train: shape: {y_train.shape}')
print(f'y_test: shape: {y_test.shape}\n')

# .tolist() converts NumPy integers to plain Python ints for a clean printout
print(f'Classes: {set(y_train.tolist())}')
# Labels are 0/1, so the sum counts the benign examples and the mean is their fraction
print(f'Class 1 (Benign) examples in train: {np.sum(y_train)} or {100 * np.mean(y_train):.2f}%')
print(f'Class 1 (Benign) examples in test: {np.sum(y_test)} or {100 * np.mean(y_test):.2f}%')

X_train: shape: (455, 30)
X_test: shape: (114, 30)

y_train: shape: (455,)
y_test: shape: (114,)

Classes: {0, 1}
Class 1 (Benign) examples in train: 286 or 62.86%
Class 1 (Benign) examples in test: 71 or 62.28%
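About 62% of the test set is benign, so a classifier that always answers "benign" already scores roughly that accuracy. The sketch below is an addition to the original notebook. It uses scikit-learn's `DummyClassifier` to compute this majority-class baseline on the same split, which gives a reference point for the XGBoost accuracy reported at the end.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Always predicts the most frequent label seen in training (1 = benign here)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)
baseline_acc = accuracy_score(y_test, baseline_pred)
print(f"Majority-class baseline accuracy: {baseline_acc:.4f}")
```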


In [None]:
# Create and train an XGBoost classifier
model = xgb.XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)


In [None]:
# Make predictions
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
