## Feature Engineering

**Objective:**

- Enhance the dataset with new features.
- Transform existing features to improve model performance.
- Prepare the data for modeling.

### Import Libraries and Load Cleaned Data

```python
import pandas as pd

df = pd.read_csv('../data/processed/cleaned_data.csv')
```

### Create New Features

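**Total Number of Past Due Events:**

Sum the three delinquency counters (30-59 days, 60-89 days, and 90+ days late) into a single count.

```python
df['TotalPastDue'] = (df['NumberOfTime30-59DaysPastDueNotWorse'] +
                      df['NumberOfTime60-89DaysPastDueNotWorse'] +
                      df['NumberOfTimes90DaysLate'])
```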
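
The same sum can be checked on a small toy frame. The column names match the notebook, but the counts below are invented for illustration, and the row-wise `sum(axis=1)` is an alternative spelling of the chained `+`, not the notebook's own code.

```python
import pandas as pd

past_due_cols = [
    "NumberOfTime30-59DaysPastDueNotWorse",
    "NumberOfTime60-89DaysPastDueNotWorse",
    "NumberOfTimes90DaysLate",
]

# Invented counts for three borrowers; column names follow the notebook
toy = pd.DataFrame({
    "NumberOfTime30-59DaysPastDueNotWorse": [0, 2, 1],
    "NumberOfTime60-89DaysPastDueNotWorse": [0, 1, 0],
    "NumberOfTimes90DaysLate": [0, 0, 3],
})

# Row-wise sum over the three delinquency counters
toy["TotalPastDue"] = toy[past_due_cols].sum(axis=1)
print(toy["TotalPastDue"].tolist())  # [0, 3, 4]
```

One difference worth knowing: `sum(axis=1)` skips missing values by default (`skipna=True`), while chaining `+` propagates `NaN`, so the two forms only disagree when a counter is missing.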

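**Age Groups:**

Bin `age` into three groups. `pd.cut` uses right-closed intervals by default, so the groups are (0, 30], (30, 50] and (50, 120].

```python
bins = [0, 30, 50, 120]
labels = ['Young', 'Adult', 'Senior']
df['AgeGroup'] = pd.cut(df['age'], bins=bins, labels=labels)
```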
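
To see where the bin edges fall, the sketch below applies the same `bins` and `labels` to a handful of invented ages sitting on and around each edge. It also shows `include_lowest=True`, an option the notebook does not use, which keeps an age of exactly 0 from becoming `NaN`.

```python
import pandas as pd

bins = [0, 30, 50, 120]
labels = ["Young", "Adult", "Senior"]

# Invented ages chosen to sit on and around the bin edges
ages = pd.Series([0, 21, 30, 31, 50, 51, 120])

# Default right=True: intervals are (0, 30], (30, 50], (50, 120], so 0 falls outside
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())  # [nan, 'Young', 'Young', 'Adult', 'Adult', 'Senior', 'Senior']

# include_lowest=True makes the first interval also contain its left edge (0)
inclusive = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(inclusive.iloc[0])  # Young
```

Whether this matters depends on the cleaning step: if no zero ages remain in `cleaned_data.csv`, the default behaviour is harmless.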

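### Encode Categorical Variables

One-hot encode `AgeGroup`. With `drop_first=True`, the first category (`Young`) gets no column and acts as the reference level.

```python
df = pd.get_dummies(df, columns=['AgeGroup'], drop_first=True)
```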
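
A toy frame makes the resulting columns explicit. The three ages are invented, and `dtype=int` is an addition not present in the notebook: since pandas 2.0 the dummy columns default to `bool`, and passing `dtype=int` gives 0/1 integers instead.

```python
import pandas as pd

# Invented ages, one per age group
toy = pd.DataFrame({"age": [25, 40, 70]})
toy["AgeGroup"] = pd.cut(
    toy["age"], bins=[0, 30, 50, 120], labels=["Young", "Adult", "Senior"]
)

# drop_first=True drops the 'Young' dummy; a row of all zeros means 'Young'
encoded = pd.get_dummies(toy, columns=["AgeGroup"], drop_first=True, dtype=int)
print(encoded.columns.tolist())  # ['age', 'AgeGroup_Adult', 'AgeGroup_Senior']
print(encoded[["AgeGroup_Adult", "AgeGroup_Senior"]].values.tolist())  # [[0, 0], [1, 0], [0, 1]]
```

Dropping one level avoids perfectly collinear dummy columns, which matters for linear models such as logistic regression.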

### Feature Scaling

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Continuous and count features to standardize; the AgeGroup dummy columns are left as-is
numerical_features = [
    'RevolvingUtilizationOfUnsecuredLines',
    'age',
    'DebtRatio',
    'MonthlyIncome',
    'NumberOfTime30-59DaysPastDueNotWorse',
    'NumberOfOpenCreditLinesAndLoans',
    'NumberOfTimes90DaysLate',
    'NumberRealEstateLoansOrLines',
    'NumberOfTime60-89DaysPastDueNotWorse',
    'NumberOfDependents',
    'TotalPastDue'
]

# Note: fitting on the full dataset means a later train/test split leaks test-set
# means and variances into the scaler; fitting on the training split alone avoids this.
df[numerical_features] = scaler.fit_transform(df[numerical_features])
```
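
If the data is split into training and test sets downstream, a leakage-free alternative is to split first and fit the scaler on the training rows only. The sketch below assumes synthetic data standing in for two of the notebook's columns and a hypothetical 80/20 stratified split; it is one common pattern, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for two numerical columns and the binary target
X = pd.DataFrame({
    "age": rng.integers(21, 90, size=200).astype(float),
    "DebtRatio": rng.random(200),
})
y = pd.Series(rng.integers(0, 2, size=200), name="SeriousDlqin2yrs")

# Hypothetical 80/20 split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
# mean_ and scale_ are learned from the training rows only
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X.columns, index=X_train.index
)
# The test rows are transformed with the training statistics, never refitted
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X.columns, index=X_test.index
)
```

The training columns end up with mean 0 and (population) standard deviation 1, while the test columns are only approximately centred, which is the expected behaviour when no test information reaches the scaler.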

### Prepare Final Dataset

```python
# Separate the features from the target variable
X = df.drop('SeriousDlqin2yrs', axis=1)
y = df['SeriousDlqin2yrs']

# Save features and target to separate CSV files
X.to_csv('../data/processed/X_features.csv', index=False)
y.to_csv('../data/processed/y_target.csv', index=False)
```
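
When the target is loaded again in a later notebook, `read_csv` returns a one-column DataFrame rather than a Series. The sketch below round-trips an invented three-row target through an in-memory buffer (standing in for `y_target.csv`) and squeezes it back to a Series.

```python
import io

import pandas as pd

# Invented target values; the name matches the notebook's target column
y = pd.Series([0, 1, 0], name="SeriousDlqin2yrs")

buffer = io.StringIO()
y.to_csv(buffer, index=False)  # header row with the Series name, then one value per line
buffer.seek(0)

# read_csv always returns a DataFrame; squeeze the single column back into a Series
y_loaded = pd.read_csv(buffer).squeeze("columns")
print(type(y_loaded).__name__, y_loaded.name)  # Series SeriousDlqin2yrs
```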

### Save the Scaler

```python
import joblib

# Save the fitted scaler so new data can be transformed the same way
joblib.dump(scaler, '../models/scaler.pkl')  # returns ['../models/scaler.pkl']
```