**What is Feature Engineering?**
 
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques like feature extraction, transformation, encoding, and scaling to make data more useful for predictions.

**Why Do We Need Feature Engineering?**

1. **Improves Model Performance** – Features that expose the underlying patterns help models make better predictions.

2. **Reduces Overfitting** – Removing noisy or irrelevant features gives the model less to memorize.

3. **Handles Missing Data** – Creates meaningful replacements for missing values.

4. **Enables Better Interpretability** – Makes features easier to understand and explain.

5. **Reduces Dimensionality** – Removes redundant or unnecessary features, making the model more efficient.
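
The "Handles Missing Data" point can be made concrete with a minimal sketch. The data, the column names `AmountWasMissing` and `TransactionAmountFilled`, and the choice of median imputation are all assumptions made for illustration; other strategies may suit a given dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one transaction amount is missing
df_missing = pd.DataFrame({"TransactionAmount": [500.0, np.nan, 700.0, 1000.0, 400.0]})

# Record where values were missing before filling them in,
# since the fact that a value was absent can itself be informative
df_missing["AmountWasMissing"] = df_missing["TransactionAmount"].isna().astype(int)

# Fill gaps with the median, which is less sensitive to outliers than the mean
median_amount = df_missing["TransactionAmount"].median()
df_missing["TransactionAmountFilled"] = df_missing["TransactionAmount"].fillna(median_amount)
```

Keeping the indicator column alongside the filled column lets a model distinguish real values from imputed ones.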

In [17]:
# Extract date and time features
import pandas as pd

# Sample dataset
df = pd.DataFrame({'TransactionDate': pd.to_datetime(['2025-02-05 14:30:00', '2025-02-06 18:45:00'])})

# Extract date-related features
df['DayOfWeek'] = df['TransactionDate'].dt.dayofweek  # Monday=0, Sunday=6
df['Hour'] = df['TransactionDate'].dt.hour  # Hour of day (0-23)
df['IsWeekend'] = (df['DayOfWeek'] >= 5).astype(int)  # 1 for Saturday/Sunday, else 0
df
# Why? Helps capture behavioral trends (e.g., shopping habits on weekends vs. weekdays).

| | TransactionDate | DayOfWeek | Hour | IsWeekend |
|---|---|---|---|---|
| 0 | 2025-02-05 14:30:00 | 2 | 14 | 0 |
| 1 | 2025-02-06 18:45:00 | 3 | 18 | 0 |
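
One limitation of a raw `Hour` column is that 23 and 0 look far apart numerically even though they are adjacent on the clock. A common optional extension, not part of the cell above, is to encode the hour as a sine/cosine pair. The sketch below uses the same sample timestamps; `df_time`, `HourSin`, and `HourCos` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same sample timestamps as in the date-feature example
df_time = pd.DataFrame(
    {"TransactionDate": pd.to_datetime(["2025-02-05 14:30:00", "2025-02-06 18:45:00"])}
)

hour = df_time["TransactionDate"].dt.hour

# Map the hour onto a circle so that 23:00 and 00:00 end up close together
df_time["HourSin"] = np.sin(2 * np.pi * hour / 24)
df_time["HourCos"] = np.cos(2 * np.pi * hour / 24)
```

Whether this helps depends on the model: tree-based models often cope well with the raw hour, while linear and distance-based models tend to benefit more.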


In [16]:
# Aggregated features
# Find the average transaction amount per user
df_transactions = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102],
    'TransactionAmount': [500, 300, 700, 1000, 400],
})
df_user_avg = df_transactions.groupby('UserID')['TransactionAmount'].mean().reset_index()
df_user_avg = df_user_avg.rename(columns={'TransactionAmount': 'AvgTransactionAmount'})
df_user_avg
# Why? Helps identify high-value customers and spending patterns.

| | UserID | AvgTransactionAmount |
|---|---|---|
| 0 | 101 | 600.0 |
| 1 | 102 | 350.0 |
| 2 | 103 | 1000.0 |
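
The per-user table has one row per user, but a model trained on individual transactions needs the average attached to every transaction row. A minimal sketch of one way to do that uses `groupby(...).transform`; the column names `UserAvgAmount` and `AmountVsUserAvg` are hypothetical and introduced only for this example.

```python
import pandas as pd

df_transactions = pd.DataFrame({
    "UserID": [101, 102, 101, 103, 102],
    "TransactionAmount": [500, 300, 700, 1000, 400],
})

# transform("mean") returns one value per original row, aligned to the index,
# so the user's average can be assigned directly as a new column
df_transactions["UserAvgAmount"] = (
    df_transactions.groupby("UserID")["TransactionAmount"].transform("mean")
)

# Hypothetical derived feature: how this transaction compares to the user's average
df_transactions["AmountVsUserAvg"] = (
    df_transactions["TransactionAmount"] / df_transactions["UserAvgAmount"]
)
```

Merging `df_user_avg` back with `DataFrame.merge` on `UserID` would give the same column; `transform` simply avoids the intermediate table.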


In [15]:
# Encoding categorical variables
# Convert ProductCategory (Electronics, Clothing, Grocery) into numerical form
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'ProductCategory': ['Electronics', 'Clothing', 'Clothing', 'Grocery']})
encoder = OneHotEncoder(sparse_output=False)  # sparse_output requires scikit-learn >= 1.2
encoded_features = encoder.fit_transform(df[['ProductCategory']])
df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out())
df_encoded
# Why? Converts non-numeric categories into a numeric format suitable for ML models without implying an order among them.

| | ProductCategory_Clothing | ProductCategory_Electronics | ProductCategory_Grocery |
|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 |
| 1 | 1.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 1.0 |


In [25]:
# Log transformation for skewed data
# If TransactionAmount is right-skewed (a few very large values), apply a log transformation
import numpy as np

df = pd.DataFrame({'TransactionAmount': [100, 200, 5000, 10000, 20000]})
df['LogTransactionAmount'] = np.log1p(df['TransactionAmount'])  # log1p(x) = log(1 + x), so x = 0 is safe
df
# Why? Reduces skewness and the impact of outliers.

| | TransactionAmount | LogTransactionAmount |
|---|---|---|
| 0 | 100 | 4.615121 |
| 1 | 200 | 5.303305 |
| 2 | 5000 | 8.517393 |
| 3 | 10000 | 9.21044 |
| 4 | 20000 | 9.903538 |
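
To check that the transformation actually reduces skewness, and to recover the original units later (for example, when a model predicts the log amount), a minimal sketch might look like this. `df_log`, `raw_skew`, `log_skew`, and `recovered` are illustrative names; `Series.skew` and `np.expm1` are the standard pandas and NumPy functions.

```python
import numpy as np
import pandas as pd

df_log = pd.DataFrame({"TransactionAmount": [100, 200, 5000, 10000, 20000]})
df_log["LogTransactionAmount"] = np.log1p(df_log["TransactionAmount"])

# Sample skewness before and after; a value closer to 0 means a more symmetric distribution
raw_skew = df_log["TransactionAmount"].skew()
log_skew = df_log["LogTransactionAmount"].skew()

# expm1 is the exact inverse of log1p: expm1(log1p(x)) == x
recovered = np.expm1(df_log["LogTransactionAmount"])
```

With only five rows the skewness estimate is rough, so treat the comparison as a sanity check rather than a measurement.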


In [28]:
# Feature scaling
from sklearn.preprocessing import MinMaxScaler, StandardScaler

scaler = MinMaxScaler()  # Rescales values to the [0, 1] range
df['Normalized TransactionAmount'] = scaler.fit_transform(df[['TransactionAmount']]).ravel()
standard_scaler = StandardScaler()  # Rescales values to zero mean and unit variance
df['Standardized TransactionAmount'] = standard_scaler.fit_transform(df[['TransactionAmount']]).ravel()
df
# Why? Puts features on a comparable scale so that large-valued features do not dominate scale-sensitive models.

| | TransactionAmount | LogTransactionAmount | Normalized TransactionAmount | Standardized TransactionAmount |
|---|---|---|---|---|
| 0 | 100 | 4.615121 | 0.0 | -0.93707 |
| 1 | 200 | 5.303305 | 0.005025 | -0.923606 |
| 2 | 5000 | 8.517393 | 0.246231 | -0.277351 |
| 3 | 10000 | 9.21044 | 0.497487 | 0.395831 |
| 4 | 20000 | 9.903538 | 1.0 | 1.742196 |
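
The cell above fits the scalers on the full column, which is fine for a demonstration. When data is split into training and test sets, a common practice is to fit the scaler on the training portion only and reuse its statistics on the test portion, so that no information from the test set leaks into preprocessing. The split below is hypothetical and chosen by hand for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical split of the sample amounts into training and test rows
train = pd.DataFrame({"TransactionAmount": [100, 200, 5000, 10000]})
test = pd.DataFrame({"TransactionAmount": [20000]})

scaler = StandardScaler()

# Learn the mean and standard deviation from the training data only
train_scaled = scaler.fit_transform(train[["TransactionAmount"]])

# Apply the same training statistics to the test data (no refitting)
test_scaled = scaler.transform(test[["TransactionAmount"]])
```

Because the test value is scaled with the training mean and standard deviation, it can fall outside the range seen in training, which is expected.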
