# Clothes Size Predictor 🧥

## Feature Engineering

In [1]:
# Import necessary libraries
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')
logger = logging.getLogger(__name__)

# Get the current working directory
current_dir = os.getcwd()

# Navigate to the project root
project_root = os.path.abspath(os.path.join(current_dir, '..'))

# Add the project root to sys.path so that the `src` package can be imported
sys.path.append(project_root)
logger.info("✅ Libraries loaded")

[INFO] ✅ Libraries loaded


In [2]:
# --- Import from /src/pipeline
from src.pipeline.feature_engineering import FeatureEngineer
logger.info("✅ FeatureEngineer imported")

[INFO] ✅ FeatureEngineer imported


## ☛ Load the Processed Dataset

In [3]:
# Build the path to the processed (cleaned) dataset
file_path = os.path.abspath(os.path.join(project_root, 'data', 'processed', 'clothes_processed.csv'))

# Load the CSV into a DataFrame
try:
    clothes_df = pd.read_csv(file_path)
    logger.info(f"✅ Data successfully loaded: {clothes_df.shape[0]} rows, {clothes_df.shape[1]} columns.")
except Exception as e:
    logger.error(f"❌ Error loading data: {e}")
    raise  # Stop here: later cells need clothes_df to exist

[INFO] ✅ Data successfully loaded: 26351 rows, 4 columns.


In [4]:
clothes_df.head()

|   | weight | age | height | size |
|---|--------|------|--------|------|
| 0 | 62 | 28.0 | 172.72 | XL |
| 1 | 59 | 36.0 | 167.64 | L |
| 2 | 61 | 34.0 | 165.1 | M |
| 3 | 65 | 27.0 | 175.26 | L |
| 4 | 62 | 45.0 | 172.72 | M |


## Initialize the Feature Engineering Pipeline

The pipeline runs the following steps ✎

1. Generate new derived variables (e.g. BMI, ratios, interactions).
2. Encode the target column `size` as numeric or one-hot values, as needed.
3. Normalize or standardize the numeric variables (optional).
4. Save the processed dataset with the new columns.
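Steps 1 and 3 can be sketched in plain pandas. This is only an illustration, not the actual `FeatureEngineer` code, whose source is not shown in this notebook. The helper names `add_derived_features` and `standardize` are hypothetical, and the formulas are assumptions: BMI as weight in kg divided by height in metres squared (with height stored in cm), and the two ratios as simple divisions by age.

```python
import pandas as pd

# First five rows of the processed dataset, copied from the preview above
sample = pd.DataFrame({
    "weight": [62, 59, 61, 65, 62],
    "age": [28.0, 36.0, 34.0, 27.0, 45.0],
    "height": [172.72, 167.64, 165.1, 175.26, 172.72],
    "size": ["XL", "L", "M", "L", "M"],
})


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with BMI and ratio columns (assumed formulas)."""
    out = df.copy()
    # Assumption: weight is in kg and height is in cm
    out["BMI"] = out["weight"] / (out["height"] / 100) ** 2
    out["weight_age_ratio"] = out["weight"] / out["age"]
    out["height_age_ratio"] = out["height"] / out["age"]
    return out


def standardize(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return a copy with each column in `cols` rescaled to mean 0 and std 1."""
    out = df.copy()
    for col in cols:
        # ddof=0 uses the population standard deviation
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


numeric_cols = ["weight", "age", "height", "BMI",
                "weight_age_ratio", "height_age_ratio"]
derived = add_derived_features(sample)
scaled = standardize(derived, numeric_cols)
print(scaled.round(3))
```

The scaled values printed here will not match the notebook output, because the mean and standard deviation are computed from five rows only rather than from the full dataset.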

In [5]:
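# Initialize
# Note: the column in clothes_df is lowercase "size", so target_col="Size" is not
# found (see the warning in the pipeline output) and the target stays unencoded.
fe = FeatureEngineer(clothes_df, target_col="Size")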

In [6]:
# Run full pipeline
features_df = fe.run_all(scale=True, interactions=True)

🚀 Running full Feature Engineering pipeline...
🧮 Feature 'BMI' created.
⚙️ Interaction features created: weight_age_ratio, height_age_ratio.
⚠️ Target column 'Size' not found.
📏 Scaled numeric features: ['weight', 'age', 'height', 'BMI', 'weight_age_ratio', 'height_age_ratio']
✅ Feature engineering pipeline completed.
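
Since the run above reports that the target column was not found, `size` remains a string column. A minimal sketch of the two encodings mentioned in step 2 follows, using plain pandas rather than `FeatureEngineer`. The ordering in `size_order` is an assumption that covers only the three labels visible in the preview; the full dataset may contain other sizes.

```python
import pandas as pd

# Target values from the first five rows of the preview
sizes = pd.Series(["XL", "L", "M", "L", "M"], name="size")

# Ordinal encoding: assumed ordering, smallest to largest (hypothetical, incomplete)
size_order = ["M", "L", "XL"]
ordinal = pd.Series(
    pd.Categorical(sizes, categories=size_order, ordered=True).codes,
    name="size_encoded",
)

# One-hot encoding: one indicator column per label
one_hot = pd.get_dummies(sizes, prefix="size")

print(ordinal.tolist())
print(one_hot.columns.tolist())
```

Ordinal codes keep the natural order of sizes and suit tree-based models or regression-style targets. One-hot columns avoid implying equal spacing between sizes. A label missing from `size_order` is encoded as -1, so the list should be checked against the labels actually present in the data.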


In [7]:
# Save result
fe.save_features("clothes_features.csv")

✅ Saved feature-engineered dataset to results/features\clothes_features.csv


'results/features\\clothes_features.csv'

In [8]:
# Check preview
features_df.head()

|   | weight | age | height | size | BMI | weight_age_ratio | height_age_ratio |
|---|--------|-----|--------|------|-----|------------------|------------------|
| 0 | -0.084191 | -0.78447 | 0.860731 | XL | -0.550174 | 0.523921 | 0.794193 |
| 1 | -0.355933 | -0.013125 | 0.229692 | L | -0.495748 | -0.386032 | -0.205802 |
| 2 | -0.174771 | -0.205961 | -0.085828 | M | -0.138852 | -0.137729 | -0.075048 |
| 3 | 0.187552 | -0.880888 | 1.176251 | L | -0.452578 | 0.831234 | 1.007114 |
| 4 | -0.084191 | 0.854639 | 0.860731 | M | -0.550174 | -0.799308 | -0.750362 |


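### Quick explanation ⚑

After scaling, each numeric value is expressed relative to its column's average. Assuming the pipeline standardizes to zero mean and unit variance (z-scores), the first row reads as follows:

- Weight = -0.08 → almost exactly average weight (marginally below the mean)
- Height = 0.86 → above average height (about 0.9 standard deviations above the mean)
- Age = -0.78 → younger than average (about 0.8 standard deviations below the mean)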
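
To make these scaled values easier to read, a z-score can be converted to an approximate percentile. The sketch below assumes the feature is roughly normally distributed, which has not been checked for this dataset; `normal_percentile` is a hypothetical helper built on the standard library only.

```python
import math


def normal_percentile(z: float) -> float:
    """Percentile (0 to 100) of a z-score under a standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Scaled values of the first row, copied from the preview above
first_row = {"weight": -0.084191, "height": 0.860731, "age": -0.78447}

for name, z in first_row.items():
    print(f"{name}: z = {z:+.2f}, percentile ≈ {normal_percentile(z):.0f}")
```

Under that assumption, the first person sits at roughly the 47th percentile for weight, the 81st for height and the 22nd for age.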