# 02 — Feature Engineering

This notebook applies the project featurizer to the raw CO₂RR data, producing machine-learning–ready features, and saves the scaled train/test splits as NumPy `.npy` files for model training.

Goals:
- Construct physically meaningful features
- Standardize features, with the scaler fitted on the training split only
- Generate train/test datasets for modeling


```python
# ---- Project path setup (DO NOT REMOVE) ----
import sys
from pathlib import Path

# Assumes the kernel's working directory is one level below the project root (e.g. notebooks/)
PROJECT_ROOT = Path().resolve().parents[0]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print("Project root:", PROJECT_ROOT)
```

```
Project root: C:\Users\mhendy\Desktop\ML_projects\project1\ml-surrogate-co2rr-binding-energies
```
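`Path().resolve().parents[0]` only gives the right answer when the kernel starts exactly one level below the project root. A minimal sketch of a more forgiving lookup, assuming the root is the first ancestor that contains a `src/` directory; the helper `find_project_root` and the `src` marker are hypothetical and not part of the project:

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "src") -> Path:
    """Walk up from `start` (default: cwd) to the first directory containing `marker/`."""
    current = (start or Path.cwd()).resolve()
    # Check the starting directory itself, then each ancestor in turn
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {current} contains '{marker}/'")
```

In the setup cell this would replace the fixed `parents[0]` with `PROJECT_ROOT = find_project_root()`, so the notebook keeps working if it is moved deeper into the tree.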


```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from src.features.build_features import featurize
```


```python
data_path = PROJECT_ROOT / "data" / "01_raw" / "dataset.csv"
df = pd.read_csv(data_path)

df.head()
```

| Unnamed: 0 | catalyst | facet | adsorbate | d_band_center | pauling_en | atomic_radius | valence_electrons | adsorption_energy |
|---|---|---|---|---|---|---|---|---|
| 0 | Cu | 111 | CO* | -1.81 | 1.9 | 128 | 11 | -0.67 |
| 1 | Cu | 111 | COOH* | -1.65 | 1.9 | 128 | 11 | -0.42 |
| 2 | Cu | 111 | OCHO* | -1.72 | 1.9 | 128 | 11 | -0.55 |
| 3 | Cu | 100 | CO* | -1.92 | 1.9 | 128 | 11 | -0.71 |
| 4 | Cu | 100 | COOH* | -1.74 | 1.9 | 128 | 11 | -0.48 |
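The `Unnamed: 0` column is what pandas produces when a CSV's first header cell is empty, typically because the file was written with `DataFrame.to_csv` and its default `index=True`. Whether `featurize` already drops this column isn't visible here, so treat the following as an assumption to check: a sketch using the five rows shown above, showing that `index_col=0` turns that column back into the index, plus a quick missing-value check before featurizing.

```python
import io

import pandas as pd

# The first rows of dataset.csv as displayed above, assuming an empty first header cell
csv_text = """,catalyst,facet,adsorbate,d_band_center,pauling_en,atomic_radius,valence_electrons,adsorption_energy
0,Cu,111,CO*,-1.81,1.9,128,11,-0.67
1,Cu,111,COOH*,-1.65,1.9,128,11,-0.42
2,Cu,111,OCHO*,-1.72,1.9,128,11,-0.55
3,Cu,100,CO*,-1.92,1.9,128,11,-0.71
4,Cu,100,COOH*,-1.74,1.9,128,11,-0.48
"""

# Default read: the saved index becomes a data column named "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
# index_col=0: the same values become the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw.columns[0])
print(clean.shape)
# Total count of missing values across all columns
print(clean.isna().sum().sum())
```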


```python
X, y = featurize(df, target_col="adsorption_energy")

print("Feature matrix shape:", X.shape)
print("Target shape:", y.shape)
```

```
Feature matrix shape: (78, 6)
Target shape: (78,)
```
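`featurize` lives in `src/features/build_features.py` and isn't shown in this notebook, so beyond the resulting 6 columns its exact feature set is not documented here. Purely as an illustration of the call shape, a hypothetical `featurize_sketch` might keep the numeric descriptors and one-hot encode the adsorbate; this is not the project's implementation and yields a different column count.

```python
from typing import Tuple

import pandas as pd


def featurize_sketch(df: pd.DataFrame, target_col: str) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical stand-in for src.features.build_features.featurize."""
    # Element-level descriptors already present in the raw table
    numeric_cols = ["d_band_center", "pauling_en", "atomic_radius", "valence_electrons"]
    X = df[numeric_cols].astype(float)
    # One indicator column per adsorbate species (columns sorted: ads_CO*, ads_COOH*, ads_OCHO*)
    adsorbate_dummies = pd.get_dummies(df["adsorbate"], prefix="ads", dtype=float)
    X = pd.concat([X, adsorbate_dummies], axis=1)
    y = df[target_col].astype(float)
    return X, y


# The five rows from df.head() above
rows = pd.DataFrame({
    "catalyst": ["Cu"] * 5,
    "facet": [111, 111, 111, 100, 100],
    "adsorbate": ["CO*", "COOH*", "OCHO*", "CO*", "COOH*"],
    "d_band_center": [-1.81, -1.65, -1.72, -1.92, -1.74],
    "pauling_en": [1.9] * 5,
    "atomic_radius": [128] * 5,
    "valence_electrons": [11] * 5,
    "adsorption_energy": [-0.67, -0.42, -0.55, -0.71, -0.48],
})
X_demo, y_demo = featurize_sketch(rows, target_col="adsorption_energy")
print(X_demo.shape, y_demo.shape)
```

The catalyst name is left out in this sketch on the assumption that `pauling_en`, `atomic_radius` and `valence_electrons` already describe the metal numerically.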


```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_test.shape
```

```
((62, 6), (16, 6))
```
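With 78 samples and `test_size=0.2`, the test set should hold 0.2 × 78 = 15.6 samples; scikit-learn rounds a fractional test size up, giving 16 test and 62 training rows, and `random_state=42` makes the shuffle reproducible. A quick check of that arithmetic on placeholder arrays (zeros, not the real features):

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples, test_size = 78, 0.2

# Fractional test sizes are rounded up: ceil(0.2 * 78) = ceil(15.6) = 16
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Placeholder arrays with the notebook's shapes, only to confirm the split sizes
X_placeholder = np.zeros((n_samples, 6))
y_placeholder = np.zeros(n_samples)
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_placeholder, y_placeholder, test_size=test_size, random_state=42
)

print(n_train, n_test, X_tr_demo.shape, X_te_demo.shape)
```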

```python
scaler = StandardScaler()

# Fit on the training split only so test-set statistics do not leak into the scaling
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
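`StandardScaler` maps each feature column to z = (x − μ) / σ, where μ and σ (the population standard deviation, ddof = 0) are estimated on the training split and then reused unchanged on the test split. A self-contained check on random data (not the project's features) that the scaled training columns have mean 0 and standard deviation 1, and that `transform` on the test split matches the manual formula:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Random stand-ins with the same shapes as the real split (62 train, 16 test, 6 features)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(62, 6))
X_te = rng.normal(loc=5.0, scale=2.0, size=(16, 6))

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

# Manual z-score using training statistics only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # ddof=0, as StandardScaler uses
manual_te = (X_te - mu) / sigma

print(np.allclose(X_tr_scaled.mean(axis=0), 0.0))
print(np.allclose(X_tr_scaled.std(axis=0), 1.0))
print(np.allclose(X_te_scaled, manual_te))
```

The test columns generally do not come out with exactly mean 0 and standard deviation 1; that is expected, since they are scaled with the training statistics.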


```python
output_dir = PROJECT_ROOT / "data" / "features"
output_dir.mkdir(parents=True, exist_ok=True)

# Scaled feature arrays plus unscaled targets (pandas Series converted to NumPy arrays)
np.save(output_dir / "X_train.npy", X_train_scaled)
np.save(output_dir / "X_test.npy", X_test_scaled)
np.save(output_dir / "y_train.npy", y_train.to_numpy())
np.save(output_dir / "y_test.npy", y_test.to_numpy())

print("Saved features to:", output_dir)
```

```
Saved features to: C:\Users\mhendy\Desktop\ML_projects\project1\ml-surrogate-co2rr-binding-energies\data\features
```
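Only the scaled arrays are written here, so anything needed to featurize new catalysts at prediction time (the fitted scaler and, if `X` is a DataFrame, its column order) would have to be persisted separately. A sketch of saving and reloading under that assumption, using `joblib` (installed with scikit-learn), a temporary directory and random stand-in data; the file names `scaler.joblib` and `feature_names.txt` and the placeholder feature names are hypothetical:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train_demo = rng.normal(size=(62, 6))
feature_names = [f"feature_{i}" for i in range(6)]  # hypothetical placeholder names

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    fitted = StandardScaler().fit(X_train_demo)

    # Save scaled features, the fitted scaler, and the column order
    np.save(out / "X_train.npy", fitted.transform(X_train_demo))
    joblib.dump(fitted, str(out / "scaler.joblib"))
    (out / "feature_names.txt").write_text("\n".join(feature_names))

    # What a downstream notebook could do: reload everything and check consistency
    X_loaded = np.load(out / "X_train.npy")
    scaler_loaded = joblib.load(str(out / "scaler.joblib"))
    names_loaded = (out / "feature_names.txt").read_text().splitlines()
    roundtrip_ok = np.allclose(scaler_loaded.transform(X_train_demo), X_loaded)

print(X_loaded.shape, len(names_loaded), roundtrip_ok)
```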


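## Summary

- Features built with `featurize`: 78 samples × 6 features, target `adsorption_energy`
- 80/20 train/test split: 62 training and 16 test samples (`random_state=42`)
- Features standardized with `StandardScaler` fitted on the training split only, then saved with the targets as `.npy` arrays in `data/features/`

Next notebook: **03_Modeling.ipynb**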
