# Data Preprocessing

## Project: E-commerce Product Delivery Prediction

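The preprocessing pipeline is designed to be consistent, reusable, and
safe from data leakage: the data is split before any transformation is
fitted, so imputation, encoding, and scaling parameters are learned from
the training set only.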

### Preprocessing Steps

- Separate the target variable from input features
- Perform a stratified train-test split to preserve the class distribution of the target
- Handle missing values using appropriate imputation strategies
- Encode categorical features using One-Hot Encoding
- Scale numerical features using StandardScaler
- Apply all transformations using a unified preprocessing pipeline
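The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame and its column names (`weight_g`, `discount_pct`, `shipment_mode`, `reached_on_time`) are hypothetical, the median / most-frequent imputation strategies are assumptions, and `test_size=0.2` / `random_state=42` merely stand in for the `TEST_SIZE` and `RANDOM_STATE` values defined in `src.config`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names are illustrative, not the real schema.
df = pd.DataFrame({
    "weight_g": [1200, 3400, np.nan, 2100, 5000, 800, 4300, 2600, 3900, 1500],
    "discount_pct": [10, 5, 20, np.nan, 0, 15, 5, 10, 25, 0],
    "shipment_mode": ["Ship", "Flight", "Road", "Ship", np.nan,
                      "Flight", "Ship", "Road", "Ship", "Flight"],
    "reached_on_time": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# 1. Separate the target from the input features.
TARGET = "reached_on_time"  # hypothetical; the project uses TARGET_COL
X = df.drop(columns=[TARGET])
y = df[TARGET]

# 2. Stratified split: both subsets keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

# 3-5. Impute missing values, then scale numeric / one-hot encode categorical.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# 6. A single ColumnTransformer routes each column group to its sub-pipeline.
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Fit on the training split only.
X_train_prepared = preprocessor.fit_transform(X_train)
print(numeric_cols, categorical_cols, X_train_prepared.shape)
```

Nesting the imputer inside each sub-pipeline means numeric and categorical columns can use different strategies while still being fitted in one call.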

These steps turn the raw dataset into a model-ready format. Because all
transformations live in one pipeline object, the same preprocessing logic
is applied consistently during model training and evaluation.
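To illustrate why fitting on the training split keeps evaluation leak-free, here is a small hedged sketch. The two tiny frames are hypothetical stand-ins for the real train and test splits; the point is that the scaler's mean and the encoder's category vocabulary come from `X_train` alone, and the fitted transformer is reused unchanged on `X_test`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical train/test frames standing in for the real split.
X_train = pd.DataFrame({"weight_g": [1000.0, 2000.0, 3000.0],
                        "shipment_mode": ["Ship", "Flight", "Ship"]})
X_test = pd.DataFrame({"weight_g": [4000.0],
                       "shipment_mode": ["Road"]})  # category unseen in training

preprocessor = ColumnTransformer(
    [("num", StandardScaler(), ["weight_g"]),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["shipment_mode"])],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit on the training split only: statistics and categories come from X_train.
X_train_t = preprocessor.fit_transform(X_train)

# Reuse the fitted transformer as-is on the test split (no refitting).
X_test_t = preprocessor.transform(X_test)

scaler = preprocessor.named_transformers_["num"]
print(scaler.mean_)  # [2000.]
print(np.round(X_test_t, 3))
```

The unseen `"Road"` category is encoded as all zeros because of `handle_unknown="ignore"`, and the test weight is scaled with the training mean and standard deviation rather than its own.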


### Import Libraries

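```python
# Standard library
import sys
from pathlib import Path

# Core libraries
import pandas as pd

# Scikit-learn utilities
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer

# Locate the project root: the nearest directory, starting from the current
# working directory, that contains src/config.py. A fixed parents[...] index
# breaks as soon as the notebook is launched from a different folder.
PROJECT_ROOT = next(
    (p for p in [Path.cwd(), *Path.cwd().parents]
     if (p / "src" / "config.py").is_file()),
    None,
)
if PROJECT_ROOT is None:
    raise FileNotFoundError("Could not find src/config.py in any parent directory")

# Prepend so the project's src package takes priority on the import path.
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))
```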


```python
# Project configuration
from src.config import DATA_FILE, TARGET_COL, TEST_SIZE, RANDOM_STATE
```