# Tutorial 01: Basic GBDT Training

🟢 **Beginner**: No prior boosting experience needed

In this tutorial, you'll learn how to train your first Gradient Boosted Decision Tree (GBDT) model with boosters.

## What you'll learn

1. Create a dataset from NumPy arrays
2. Configure and train a GBDT model
3. Make predictions
4. Evaluate model performance

## Setup

First, import the required packages (this assumes boosters, NumPy, and scikit-learn are already installed):

In [None]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

import boosters

## Generate Sample Data

We'll use scikit-learn to generate a synthetic regression dataset:

In [None]:
# Generate synthetic regression data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Features: {X_train.shape[1]}")
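
To make `test_size=0.2` concrete, here is a minimal NumPy-only sketch of a shuffled hold-out split. The helper `shuffle_split` is hypothetical and written for illustration; it is not how scikit-learn implements `train_test_split`, and it will not reproduce the same row order.

```python
import numpy as np

def shuffle_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices, then hold out a fraction."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same indices so rows stay aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i of X is [2i, 2i + 1] and y[i] = i
X_demo = np.arange(2000, dtype=float).reshape(1000, 2)
y_demo = np.arange(1000, dtype=float)

X_tr, X_te, y_tr, y_te = shuffle_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (800, 2) (200, 2)
```

With 1000 samples, an 80/20 split leaves 800 rows for training and 200 for testing, which matches the counts printed by the cell above.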

## Create a Dataset

The boosters library wraps your data in a `Dataset` object for efficient training:

In [None]:
# Create boosters Dataset objects
train_data = boosters.Dataset(X_train, y_train)
test_data = boosters.Dataset(X_test, y_test)

print(f"Train dataset: {train_data}")
print(f"Test dataset: {test_data}")
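
Before wrapping arrays, it can help to confirm that the features are 2-D and that the labels line up with them. The sketch below uses a hypothetical `check_xy` helper built on NumPy only; it is not part of boosters, and the cast to 64-bit floats is an assumption rather than a documented requirement of `Dataset`.

```python
import numpy as np

def check_xy(X, y):
    """Hypothetical pre-flight check; not part of the boosters API."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()  # accept (n,) or (n, 1) labels
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got shape {X.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]} values")
    if not np.isfinite(y).all():
        raise ValueError("y contains NaN or infinite values")
    return X, y

X_ok, y_ok = check_xy(np.ones((4, 3)), np.ones((4, 1)))
print(X_ok.shape, y_ok.shape)  # (4, 3) (4,)
```

Running a check like this before constructing a dataset turns a shape mismatch into a readable error instead of a failure deep inside training.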

## Configure the Model

Create a configuration for your GBDT model:

In [None]:
# Configure the GBDT model
config = boosters.GBDTConfig(
    n_estimators=100,      # Number of trees
    max_depth=6,           # Maximum tree depth
    learning_rate=0.1,     # Learning rate (shrinkage)
    objective=boosters.Objective.squared(),  # Regression objective (L2 loss)
)

print("Configuration created!")
print(config)
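
For intuition about how these hyperparameters interact, here is the textbook formulation of gradient boosting with squared loss. This is the standard derivation, stated as an assumption about what the objective does; boosters may scale the loss or regularize the trees differently.

$$
L(y, F) = \tfrac{1}{2}\,(y - F)^2, \qquad -\frac{\partial L}{\partial F} = y - F
$$

$$
F_m(x) = F_{m-1}(x) + \eta\, h_m(x), \qquad m = 1, \dots, M
$$

Here $M$ corresponds to `n_estimators` (100), $\eta$ to `learning_rate` (0.1), and each $h_m$ is a tree of depth at most `max_depth` (6) fitted to the current residuals $y - F_{m-1}(x)$. A smaller $\eta$ shrinks each tree's contribution, so it usually needs a larger $M$ to reach the same training error.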

## Train the Model

Train the model using the `GBDTModel.train()` method:

In [None]:
# Train the model
model = boosters.GBDTModel.train(train_data, config=config)

print("Model trained!")
print(f"Number of trees: {model.n_trees}")
print(f"Number of features: {model.n_features}")
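
To see roughly what happens during training, the following from-scratch sketch boosts depth-1 trees (stumps) on squared loss using NumPy only. It illustrates the general technique under simplifying assumptions (exhaustive split search, no regularization, no subsampling) and is not how boosters is implemented. The names `fit_stump`, `predict_stump`, and `boost` are hypothetical.

```python
import numpy as np

def fit_stump(X, residuals):
    """Find the (feature, threshold) split that minimizes squared error."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        # Skip the largest value so both sides of the split are non-empty
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            left_mean = residuals[left].mean()
            right_mean = residuals[~left].mean()
            sse = ((residuals[left] - left_mean) ** 2).sum() \
                + ((residuals[~left] - right_mean) ** 2).sum()
            if sse < best_sse:
                best_sse = sse
                best = (j, t, left_mean, right_mean)
    return best

def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)

def boost(X, y, n_estimators=50, learning_rate=0.1):
    base = y.mean()                      # start from the mean prediction
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred             # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        pred = pred + learning_rate * predict_stump(stump, X)  # shrinkage
        stumps.append(stump)
    return base, stumps, pred

rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(60, 2))
y_toy = 3.0 * X_toy[:, 0] + np.sin(4.0 * X_toy[:, 1])

base, stumps, fitted = boost(X_toy, y_toy)
mse_start = np.mean((y_toy - base) ** 2)
mse_end = np.mean((y_toy - fitted) ** 2)
print(f"Training MSE: {mse_start:.3f} -> {mse_end:.3f}")
```

Each round fits a small tree to what the current ensemble still gets wrong, then adds a scaled-down copy of it. The training error cannot increase from one round to the next in this setup, which is why it is worth evaluating on held-out data as well.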

## Make Predictions

Use the trained model to predict on the test set:

In [None]:
# Make predictions; the core API expects the features wrapped in a Dataset
y_pred = model.predict(boosters.Dataset(X_test))

print(f"Predictions shape: {y_pred.shape}")
print(f"First 5 predictions: {y_pred[:5].flatten()}")
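
The `.flatten()` call above suggests that predictions come back as a 2-D array, presumably of shape `(n_samples, 1)`; that shape is an assumption here, so check the printed shape in your own run. The NumPy-only sketch below shows why flattening matters before comparing against 1-D targets: subtracting the two shapes directly broadcasts to a matrix instead of producing element-wise errors.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])            # shape (3,)
y_pred_2d = np.array([[1.5], [2.0], [2.5]])   # stand-in for shape (n_samples, 1)

wrong = y_true - y_pred_2d           # broadcasts to a (3, 3) matrix
right = y_true - y_pred_2d.ravel()   # element-wise, shape (3,)

print(wrong.shape, right.shape)  # (3, 3) (3,)
```

Both `ravel()` and `flatten()` produce a 1-D array; `flatten()` always copies, while `ravel()` returns a view when it can.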

## Evaluate Performance

Calculate standard regression metrics:

In [None]:
# Calculate metrics - flatten predictions for sklearn metrics
y_pred_flat = y_pred.flatten()
mse = mean_squared_error(y_test, y_pred_flat)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_flat)

print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
print(f"R² Score: {r2:.4f}")
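
For reference, here is what these metrics compute, written out with NumPy. The `regression_metrics` helper is hypothetical and only meant to expose the formulas. The scikit-learn functions used above remain the more robust choice.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper that spells out MSE, RMSE, and R^2."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))                   # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total variation
    return {"mse": mse, "rmse": mse ** 0.5, "r2": 1.0 - ss_res / ss_tot}

y_true = np.array([3.0, 5.0, 7.0, 9.0])
perfect = regression_metrics(y_true, y_true)
baseline = regression_metrics(y_true, np.full(4, y_true.mean()))

print(perfect["r2"], baseline["r2"])  # 1.0 0.0
```

An R² of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of the targets. RMSE is in the same units as the target, which often makes it easier to interpret than MSE.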

## Summary

In this tutorial, you learned how to:

1. ✅ Create datasets from NumPy arrays
2. ✅ Configure a GBDT model with basic hyperparameters
3. ✅ Train the model
4. ✅ Make predictions and evaluate performance

## Next Steps

- [Tutorial 02: sklearn Integration](02-sklearn-integration.ipynb): Use boosters with sklearn pipelines
- [Tutorial 03: Classification](03-classification.ipynb): Train classification models