### Gradient Boosting Machine (GBM)

`GBM` is an ensemble machine learning technique that builds models sequentially. Each new model (usually a shallow decision tree) is fit to the residual errors of the ensemble built so far, so it corrects the mistakes made by the previous models.
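
To make the residual-fitting idea concrete, here is a minimal from-scratch sketch for squared-error loss. It assumes scikit-learn's `DecisionTreeRegressor` as the weak learner and synthetic data from `make_regression`. The helpers `fit_boosting` and `predict_boosting` are hypothetical names for illustration, not library functions. scikit-learn's `GradientBoostingRegressor` adds refinements (other losses, subsampling, early stopping) that this sketch omits.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Hypothetical helper: gradient boosting with squared-error loss."""
    init = y.mean()                  # initial model: the mean minimizes squared error
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred         # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)       # weak learner models the current errors
        pred += learning_rate * tree.predict(X)  # shrunken correction step
        trees.append(tree)
    return init, learning_rate, trees


def predict_boosting(model, X):
    """Hypothetical helper: initial constant plus all shrunken tree outputs."""
    init, learning_rate, trees = model
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
model = fit_boosting(X, y)
train_pred = predict_boosting(model, X)

baseline_mse = np.mean((y - y.mean()) ** 2)  # error of the constant-only model
boosted_mse = np.mean((y - train_pred) ** 2)
print(f"Constant-prediction MSE: {baseline_mse:.1f}")
print(f"Boosted training MSE:    {boosted_mse:.1f}")
```

The learning rate is stored with the trees so that prediction always uses the same shrinkage as training.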

- Key features:
  - Combines weak learners (typically decision trees) sequentially
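  - Minimizes a differentiable loss function by gradient descent in function space: each new tree approximates the negative gradient of the loss
  - Effective for both regression and classification tasks
  - Often more accurate than a single decision tree and frequently competitive with random forests, although results depend on the data and on tuning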
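
Written out, and assuming the standard formulation of gradient boosting, the procedure builds an ensemble $F_m$ stage by stage, where $L$ is the loss, $h_m$ is the $m$-th regression tree, $\nu$ is the learning rate (`learning_rate` in scikit-learn), and $M$ is the number of trees (`n_estimators`):

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
h_m &= \text{regression tree fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \quad m = 1, \dots, M
\end{aligned}
$$

For squared error, $L(y, F) = \tfrac{1}{2}(y - F)^2$, the pseudo-residual is $r_{im} = y_i - F_{m-1}(x_i)$, the ordinary residual, and $F_0$ is the mean of the targets. This is why "gradient descent" and "correcting residual errors" describe the same procedure.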

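Below, scikit-learn's `GradientBoostingRegressor` is trained to predict earthquake `Magnitude` from the preprocessed earthquake dataset and evaluated with MAE, MSE, and 5-fold cross-validation.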

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load data (machine-specific path; adjust it to your own copy of the CSV)
dataset = 'C:/Users/win10/Desktop/ImpactSense-Intern-project/data/preprocessed_earthquake_data.csv'
data = pd.read_csv(dataset)

# Define target and categorical features
target = 'Magnitude'
categorical_cols = ['Type', 'Magnitude Type', 'Source', 'Status']

# Prepare features and target variable
X = data.drop(columns=[target])
y = data[target]

# One-hot encode categorical columns
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize GBM model
gbm = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)

# Train model
gbm.fit(X_train, y_train)

# Predict on test data
y_pred = gbm.predict(X_test)

# Evaluate performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Mean Squared Error (MSE): {mse:.4f}")

# 5-fold cross-validation on the full dataset for a more robust error estimate
# (for regressors, cv=5 uses unshuffled KFold, so folds follow the row order)
cv_scores = cross_val_score(gbm, X, y, cv=5, scoring='neg_mean_absolute_error')
# Scores are negated MAE, so flip the sign back
print(f"Cross-validated MAE: {-np.mean(cv_scores):.4f}")
```

```text
Mean Absolute Error (MAE): 0.6810
Mean Squared Error (MSE): 0.8941
Cross-validated MAE: 0.7183
```
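
The settings `n_estimators=100` and `learning_rate=0.1` in the regression cell were fixed rather than tuned. One way to check how many trees are actually useful is `staged_predict`, which yields the ensemble's predictions after each boosting stage. This is a minimal sketch on synthetic data from `make_regression`, an assumption standing in for the earthquake CSV; in practice the validation split should be carved out of the training data, not taken from the final test set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the earthquake features (assumption)
X, y = make_regression(n_samples=1000, n_features=10, noise=20.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

gbm = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3, random_state=42
)
gbm.fit(X_tr, y_tr)

# Validation MAE after 1, 2, ..., n_estimators trees
val_mae = [mean_absolute_error(y_val, pred) for pred in gbm.staged_predict(X_val)]
best_n = int(np.argmin(val_mae)) + 1  # stages are 1-indexed

print(f"MAE with all {gbm.n_estimators} trees: {val_mae[-1]:.3f}")
print(f"Best MAE {min(val_mae):.3f} at {best_n} trees")
```

scikit-learn can also stop automatically: setting `n_iter_no_change` (together with `validation_fraction`) holds out part of the training data and stops adding trees once the validation score stops improving, and the fitted `n_estimators_` attribute records how many trees were kept.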


- **Task:** Apply a Gradient Boosting Machine to predict a categorical target variable, evaluate its performance, interpret the results, and note down your observations.

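```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Target: the categorical 'Status' column of the original data.
# Dummy names such as 'Status_Reviewed' only appear after one-hot encoding,
# which the regression cell applied to X, not to `data`.
target = 'Status'
# Categorical features to encode (the target itself is excluded)
categorical_cols = ['Type', 'Magnitude Type', 'Source']

X = data.drop(columns=[target])
y = data[target]  # 1-D Series; the classifier accepts string class labels

# Encode each categorical feature as integer codes (trees can split on these)
for col in categorical_cols:
    if X[col].dtype == 'object':
        le = LabelEncoder()
        X[col] = le.fit_transform(X[col])

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)

gbm.fit(X_train, y_train)

y_pred = gbm.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
# Per-class precision and recall reveal problems that overall accuracy can hide
print(classification_report(y_test, y_pred))
```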
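
Accuracy on its own can be misleading when the classes are imbalanced, because always predicting the most frequent class already scores well. A hedged way to interpret the classifier is to compare it with a majority-class baseline (`DummyClassifier`) using both accuracy and balanced accuracy, and to inspect `feature_importances_`. The sketch below uses synthetic, deliberately imbalanced data from `make_classification` as an assumption; the feature names `feature_0`, `feature_1`, … are hypothetical. Substituting `X_train`, `X_test`, `y_train`, `y_test` from the classification cell applies the same check to the earthquake data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): roughly 90% / 10% classes
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Majority-class baseline: always predicts the most frequent training label
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=42).fit(X_train, y_train)

results = {}
for name, model in [("baseline", baseline), ("GBM", gbm)]:
    y_pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, y_pred),
                     balanced_accuracy_score(y_test, y_pred))
    print(f"{name:8s} accuracy={results[name][0]:.3f} "
          f"balanced accuracy={results[name][1]:.3f}")

# Impurity-based importances (normalized to sum to 1): features the trees rely on most
importances = pd.Series(gbm.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head())
```

A majority-class baseline always has a balanced accuracy of 0.5 on a binary problem, so the gap between the two balanced-accuracy figures is a more honest measure of what the GBM learned than raw accuracy.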
