# 02 - Feature Engineering

Transform the cleaned data into model-ready features.

## Objectives
- Separate features and target
- Build preprocessing pipelines
- Save processed datasets or fitted transformers

> **Learner task:** Add domain-specific feature logic (aggregations, date features, interactions).
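
As a starting point for the learner task, the sketch below shows one possible shape for date features, a per-group aggregation, and a simple interaction. The toy data, the column names (`customer_id`, `order_date`, `amount`, `quantity`), and the helper `add_example_features` are all hypothetical and are not part of this notebook's dataset; adapt them to your own columns.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative only.
toy = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2, 2],
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-02-10", "2024-01-20", "2024-03-04", "2024-03-10"]
        ),
        "amount": [20.0, 35.0, 10.0, 15.0, 50.0],
        "quantity": [2, 1, 1, 3, 5],
    }
)


def add_example_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a few illustrative engineered columns."""
    out = frame.copy()
    # Date features: calendar parts extracted from a datetime column.
    out["order_month"] = out["order_date"].dt.month
    out["order_dayofweek"] = out["order_date"].dt.dayofweek  # Monday=0
    out["is_weekend"] = out["order_dayofweek"] >= 5
    # Aggregation: per-customer mean, broadcast back to each row.
    out["customer_mean_amount"] = out.groupby("customer_id")["amount"].transform("mean")
    # Interaction: ratio of two numeric columns.
    out["amount_per_unit"] = out["amount"] / out["quantity"]
    return out


toy_features = add_example_features(toy)
print(toy_features)
```

Group-level aggregations computed over the full dataset can leak information across a later train/test split, so it may be safer to compute them on the training rows only.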

In [None]:
# Imports
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

In [None]:
# Load cleaned data
# TODO: Replace with your processed/cleaned dataset
df = pd.read_csv("data/processed/sample_clean.csv")

target_col = "target"  # TODO: update target column
X = df.drop(columns=[target_col])
y = df[target_col]

X.head()

In [None]:
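# Identify feature types
# "number" matches int and float columns only. Bool, datetime, and string
# columns all land in the non-numeric list, so review it before encoding.
numeric_features = X.select_dtypes(include=["number"]).columns.tolist()
categorical_features = X.select_dtypes(exclude=["number"]).columns.tolist()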

print("Numeric features:", numeric_features)
print("Categorical features:", categorical_features)
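
The split above treats every non-numeric column as categorical. If your data contains boolean or datetime columns, a finer split along the following lines may be useful. This is a minimal sketch on hypothetical toy data (the `toy` frame and its column names are assumptions), not a replacement for inspecting your own dtypes.

```python
import pandas as pd

# Hypothetical toy frame with one column of each common dtype.
toy = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "is_member": [True, False, True],
        "city": ["Oslo", "Lima", "Oslo"],
        "signup_date": pd.to_datetime(["2023-05-01", "2023-06-15", "2023-07-30"]),
    }
)

# Select each dtype family explicitly instead of "everything that is not a number".
numeric_cols = toy.select_dtypes(include=["number"]).columns.tolist()
bool_cols = toy.select_dtypes(include=["bool"]).columns.tolist()
datetime_cols = toy.select_dtypes(include=["datetime"]).columns.tolist()
categorical_cols = toy.select_dtypes(
    exclude=["number", "bool", "datetime"]
).columns.tolist()

print("Numeric:", numeric_cols)
print("Boolean:", bool_cols)
print("Datetime:", datetime_cols)
print("Categorical:", categorical_cols)
```

Datetime columns are usually better expanded into calendar features than passed to a one-hot encoder, where each distinct timestamp would become its own column.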

In [None]:
# Build preprocessing pipeline
numeric_pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)

categorical_pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_pipeline, numeric_features),
        ("cat", categorical_pipeline, categorical_features),
    ]
)

print(preprocessor)

In [None]:
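# Fit-transform features
# Note: this fits the imputers, scaler, and encoder on every row of X.
# If you evaluate on a held-out split, fit on the training rows only
# and call transform() on the rest.
X_transformed = preprocessor.fit_transform(X)
print("Transformed shape:", X_transformed.shape)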

# TODO: Save the fitted preprocessor (and the transformed features) if needed
# Example (the "models/" directory must already exist):
# import joblib
# joblib.dump(preprocessor, "models/preprocessor.joblib")
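
The sketch below illustrates one way to combine a train/test split with saving and reloading a fitted transformer. It uses hypothetical toy data and a plain `StandardScaler` as a stand-in for the notebook's `preprocessor`, and it writes to a temporary directory rather than `models/`; all of these are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data.
toy_X = pd.DataFrame(
    {"x1": np.arange(10, dtype=float), "x2": np.arange(10, dtype=float) ** 2}
)
toy_y = pd.Series([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=0
)

# Stand-in for `preprocessor`: fit on training rows only to avoid leakage.
scaler = StandardScaler()
X_train_t = scaler.fit_transform(X_train)
X_test_t = scaler.transform(X_test)

# Persist the fitted transformer, then reload it as a later notebook would.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "preprocessor.joblib"
    joblib.dump(scaler, path)
    reloaded = joblib.load(path)

X_test_reloaded = reloaded.transform(X_test)
print("Round trip matches:", np.allclose(X_test_t, X_test_reloaded))
```

Saving the fitted transformer rather than only the transformed arrays lets the same statistics be applied to new data at inference time.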

## Feature Engineering Notes
- New engineered features:
- Features dropped and why:
- Assumptions made: