# Data Preprocessing

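This notebook handles train-test splitting, feature scaling, and class imbalance treatment using SMOTE. To avoid data leakage, the scaler is fit on the training set only and SMOTE is applied only to the training set; the test set is never used to fit any transformation.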

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

In [None]:
df = pd.read_csv('../data/raw/creditcard.csv')
# 'Class' is the target; every other column is a feature
X = df.drop('Class', axis=1)
y = df['Class']

In [None]:
# stratify=y keeps the class ratio the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
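As a sanity check on the stratified split, the positive-class rate can be compared across the two splits. The sketch below uses a small synthetic dataset rather than `creditcard.csv`, so the names `X_split_demo` and `y_split_demo` and the 1% positive rate are illustrative assumptions, not values from this project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10,000 rows with a 1% positive class
rng = np.random.default_rng(42)
X_split_demo = rng.normal(size=(10_000, 5))
y_split_demo = np.zeros(10_000, dtype=int)
y_split_demo[:100] = 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X_split_demo, y_split_demo, test_size=0.2,
    stratify=y_split_demo, random_state=42
)

# With stratify set, both splits should keep roughly the original positive rate
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.4f}")
print(f"test positive rate:  {test_rate:.4f}")
```

The same comparison can be run on the real `y_train` and `y_test` with `y_train.mean()` and `y_test.mean()`, since `Class` is a 0/1 column.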

## Feature Scaling

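Standardization rescales each feature to zero mean and unit variance so that features with large numeric ranges do not dominate scale-sensitive models. The scaler is fit on the training set only, and the test set is transformed with the training statistics.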

In [None]:
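scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test set (no refitting)
X_test_scaled = scaler.transform(X_test)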
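One way to confirm the scaler behaves as intended is to inspect the per-feature mean and standard deviation after scaling. The following is a minimal sketch on synthetic data; the arrays `train_demo` and `test_demo` and their chosen scales are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=[0.0, 500.0], scale=[1.0, 200.0], size=(1000, 2))
test_demo = rng.normal(loc=[0.0, 500.0], scale=[1.0, 200.0], size=(250, 2))

demo_scaler = StandardScaler()
train_demo_scaled = demo_scaler.fit_transform(train_demo)  # fit on train only
test_demo_scaled = demo_scaler.transform(test_demo)        # reuse train statistics

# Training features end up with mean ~0 and standard deviation ~1
train_means = train_demo_scaled.mean(axis=0)
train_stds = train_demo_scaled.std(axis=0)

# Test features are close to, but not exactly, zero mean,
# because they were scaled with the training statistics
test_means = test_demo_scaled.mean(axis=0)

print("train means:", train_means)
print("train stds: ", train_stds)
print("test means: ", test_means)
```

Small non-zero test means are expected here. If the test set showed exactly zero mean, that would suggest the scaler had been refit on it.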

## Handling Class Imbalance with SMOTE

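SMOTE is applied **only to the training set** to prevent information leakage. The test set keeps its original class distribution, so evaluation reflects the imbalance the model will see in practice.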

In [None]:
smote = SMOTE(random_state=42)
# Oversample the minority class in the scaled training data; the test set is left untouched
X_train_resampled, y_train_resampled = smote.fit_resample(X_train_scaled, y_train)