## Data Preprocessing


This section covers the following preprocessing steps:

* **Importing Libraries:** Load the scikit-learn utilities used in this section.
* **Missing Values:** Replace missing entries before modeling.
* **Encoding:** Convert categorical variables to numeric form.
* **Feature Engineering:** Add new features or remove redundant ones.
* **Scaling:** Standardize the numerical features.

### Importing Libraries

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


### Handle Missing Values

In [None]:
data_credit_card.fillna(0, inplace=True)  # Replace every NaN with 0; a placeholder strategy, adjust per column as needed
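
Filling every gap with 0 is a blunt default. As a minimal sketch of a per-column alternative, the `SimpleImputer` imported above could be used instead. The toy frame, its values, and the `LIMIT_BAL` column are assumptions made for illustration and are not taken from the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for data_credit_card; all values are made up.
toy = pd.DataFrame({
    "LIMIT_BAL": [20000.0, np.nan, 90000.0, 50000.0],
    "EDUCATION": [2.0, 2.0, np.nan, 1.0],
})

# Median for a continuous column, most frequent value for a categorical code.
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

toy[["LIMIT_BAL"]] = num_imputer.fit_transform(toy[["LIMIT_BAL"]])
toy[["EDUCATION"]] = cat_imputer.fit_transform(toy[["EDUCATION"]])

print(toy)
```

Unlike a constant fill, these strategies learn statistics from the data, so in a real workflow they would normally be fitted on the training split only.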

### Separate Features and Target

In [None]:
X = data_credit_card.drop('default.payment.next.month', axis=1)
y = data_credit_card['default.payment.next.month']
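
A quick sanity check after separating features and target can confirm that the target column no longer appears among the features and show how imbalanced the classes are. The sketch below uses a made-up four-row frame; only the target column name comes from the cell above, and `LIMIT_BAL` is an assumed feature name.

```python
import pandas as pd

# Toy frame standing in for data_credit_card; all values are made up.
toy = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000, 50000],
    "SEX": [2, 2, 1, 1],
    "default.payment.next.month": [1, 0, 0, 0],
})

target = "default.payment.next.month"
X_toy = toy.drop(columns=[target])  # features only
y_toy = toy[target]                 # binary target

print(X_toy.shape, y_toy.shape)
print(y_toy.value_counts(normalize=True))  # share of each class
```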


### Encode Categorical Variables

In [None]:
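# Columns to one-hot encode
categorical_features = ['SEX', 'EDUCATION', 'MARRIAGE']
# Every remaining column is treated as numerical and standardized
numerical_features = [col for col in X.columns if col not in categorical_features]

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        # handle_unknown='ignore' encodes categories unseen during fit as all zeros instead of raising
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)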

### Split Data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
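
Passing `stratify=y` keeps the class proportions roughly equal in both splits, which matters when defaults are the minority class. A minimal sketch on synthetic labels (the 22% positive rate is an assumption chosen for illustration, not a figure from the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 220 positives, 780 negatives.
y_toy = np.array([1] * 220 + [0] * 780)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42
)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())  # positive rate in each split
```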

### Feature Preprocessing

In [None]:
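# Fit on the training split only, so no test-set statistics leak into the transform
X_train = preprocessor.fit_transform(X_train)
# Reuse the statistics and categories learned from the training split
X_test = preprocessor.transform(X_test)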
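
The fit-then-transform pattern above means the test split is scaled with statistics learned from the training split. The sketch below illustrates this with a bare `StandardScaler` on made-up numbers; the arrays are hypothetical and unrelated to the credit card data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up one-column train and test sets.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [5.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

print(scaler.mean_)
print(train_scaled.mean(), train_scaled.std())
print(test_scaled.ravel())
```

The scaled training data has mean 0 and unit standard deviation by construction, while the test data generally does not, because it is measured against the training distribution.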