Objective:

Apply common feature engineering techniques to a real dataset and prepare it for analysis or modeling.



1️⃣ Business Understanding
Goal: Identify high-value customers, profitable products, and seasonal trends
to improve revenue and optimize marketing/discount strategies.


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)



2️⃣ Load Dataset


In [None]:
df = pd.read_csv("retail_sales_cleaned.csv")



3️⃣ Feature Creation


In [None]:
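# Assumes 'Sales' is a per-unit amount; if it is already the order total, use df['Sales'] directly
df['Total_Revenue'] = df['Sales'] * df['Quantity']
# Rows with Sales == 0 would produce inf; store NaN for them instead
df['Profit_Margin'] = (df['Profit'] / df['Sales']).replace([np.inf, -np.inf], np.nan)
# Flag orders with Sales strictly above 1000
df['High_Value_Order'] = (df['Sales'] > 1000).astype(int)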
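
To see how these three features behave on edge cases, here is a minimal sketch on a hypothetical four-row frame. The values are made up for illustration and are not taken from retail_sales_cleaned.csv. It assumes `Sales` is a per-unit amount and guards the margin against zero sales.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset's values will differ.
toy = pd.DataFrame({
    "Sales": [1200.0, 50.0, 0.0, 1000.0],
    "Quantity": [2, 3, 1, 1],
    "Profit": [300.0, -5.0, 10.0, 100.0],
})

toy["Total_Revenue"] = toy["Sales"] * toy["Quantity"]
# Division by zero yields inf; replace it with NaN so later statistics stay finite.
toy["Profit_Margin"] = (toy["Profit"] / toy["Sales"]).replace([np.inf, -np.inf], np.nan)
# Strictly greater than 1000, so an order of exactly 1000 is not flagged.
toy["High_Value_Order"] = (toy["Sales"] > 1000).astype(int)

print(toy)
```

The third row (zero sales) ends up with a missing margin rather than an infinite one, and the fourth row (sales of exactly 1000) is not flagged as high value.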



4️⃣ Categorical Binning


In [None]:
# Keep the three most frequent categories and group the rest under 'Other'
top_categories = df['Product_Category'].value_counts().nlargest(3).index
df['Product_Category_Binned'] = df['Product_Category'].where(df['Product_Category'].isin(top_categories), 'Other')
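
The same binning idea can be wrapped in a small reusable helper. The sketch below is one possible way to do it; the function name `bin_top_n`, its parameters, and the sample category values other than Technology are hypothetical and chosen only for illustration.

```python
import pandas as pd

def bin_top_n(series: pd.Series, n: int = 3, other_label: str = "Other") -> pd.Series:
    """Keep the n most frequent values of `series`; replace the rest with `other_label`."""
    top = series.value_counts().nlargest(n).index
    # where() keeps values where the condition is True and substitutes other_label elsewhere.
    return series.where(series.isin(top), other_label)

# Hypothetical sample: four categories with counts 4, 3, 2, and 1.
cats = pd.Series(
    ["Technology"] * 4 + ["Furniture"] * 3 + ["Office Supplies"] * 2 + ["Toys"]
)
binned = bin_top_n(cats, n=3)
print(binned.value_counts())
```

Only the least frequent category is relabeled here. Note that when two categories tie for the last top slot, which one is kept depends on how `nlargest` breaks the tie.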



5️⃣ Date-Time Features


In [None]:
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Order_Month'] = df['Order_Date'].dt.month
# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
df['Is_Weekend'] = (df['Order_Date'].dt.dayofweek >= 5).astype(int)
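
As a quick sanity check of the weekend flag, the sketch below runs the same logic on four hypothetical dates spanning a Friday through a Monday. The dates are illustrative and not drawn from the dataset.

```python
import pandas as pd

# Hypothetical dates: Friday, Saturday, Sunday, Monday (January 2024).
dates = pd.to_datetime(pd.Series(["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]))

features = pd.DataFrame({
    "Order_Date": dates,
    "Order_Month": dates.dt.month,
    # dayofweek is 0 for Monday through 6 for Sunday.
    "Is_Weekend": (dates.dt.dayofweek >= 5).astype(int),
})
print(features)
```

Only the Saturday and Sunday rows receive `Is_Weekend = 1`.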



6️⃣ Aggregated Customer Features


In [None]:
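# Monetary value: total revenue per customer, repeated on each of that customer's rows
df['Customer_Total_Revenue'] = df.groupby('Customer_ID')['Total_Revenue'].transform('sum')
# Frequency: distinct orders per customer ('count' would count rows, inflating multi-line orders)
df['Customer_Frequency'] = df.groupby('Customer_ID')['Order_ID'].transform('nunique')
# Recency: days between the latest order in the dataset and the customer's latest order
df['Customer_Recency'] = (df['Order_Date'].max() - df.groupby('Customer_ID')['Order_Date'].transform('max')).dt.days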
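
The sketch below works through these three customer-level features on a hypothetical four-row order table. It counts distinct orders with `nunique`, on the assumption that an order may span several rows. If each row is exactly one order, `count` gives the same result.

```python
import pandas as pd

# Hypothetical orders; O1 has two line items for customer C1.
orders = pd.DataFrame({
    "Customer_ID": ["C1", "C1", "C2", "C1"],
    "Order_ID": ["O1", "O1", "O2", "O3"],
    "Order_Date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-10", "2024-01-20"]),
    "Total_Revenue": [100.0, 50.0, 200.0, 25.0],
})

# transform() returns a value per original row, so results align with `orders`.
total = orders.groupby("Customer_ID")["Total_Revenue"].transform("sum")
frequency = orders.groupby("Customer_ID")["Order_ID"].transform("nunique")
# The snapshot date is the most recent order in the data.
snapshot = orders["Order_Date"].max()
recency = (snapshot - orders.groupby("Customer_ID")["Order_Date"].transform("max")).dt.days

orders["Customer_Total_Revenue"] = total
orders["Customer_Frequency"] = frequency
orders["Customer_Recency"] = recency
print(orders)
```

Customer C1 has three rows but only two distinct orders, so the frequency is 2 rather than 3. C1 placed the most recent order in the data, so its recency is 0 days, while C2's is 10 days.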



7️⃣ Visual Checks


In [None]:
sns.histplot(df['Total_Revenue'], bins=30)
plt.title("Total Revenue Distribution")
plt.show()

# data=df is required so seaborn can resolve the column names
sns.boxplot(data=df, x='Product_Category_Binned', y='Profit')
plt.title("Profit by Product Category")
plt.show()



8️⃣ Insights & Recommendations
1. Customers with both high order frequency and high total revenue are the top targets for promotions.
2. The Technology category generates the highest profit and revenue.
3. Discounts during holidays reduce profit margins.
4. Recency and frequency features can help in customer segmentation.
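
As one possible illustration of point 4, the sketch below splits customers into four rule-based segments using median cut-offs on frequency and recency. The customer values, the segment labels, and the choice of the median as the threshold are all assumptions made for this example, not results from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row-per-customer table.
customers = pd.DataFrame({
    "Customer_ID": ["C1", "C2", "C3", "C4"],
    "Customer_Frequency": [8, 1, 6, 2],
    "Customer_Recency": [5, 200, 150, 10],  # days since last order
})

# Assumed thresholds: the median of each feature.
frequent = customers["Customer_Frequency"] > customers["Customer_Frequency"].median()
recent = customers["Customer_Recency"] <= customers["Customer_Recency"].median()

# np.select picks the label of the first condition that is True for each row.
customers["Segment"] = np.select(
    [frequent & recent, frequent & ~recent, ~frequent & recent],
    ["Loyal", "At risk", "New or occasional"],
    default="Lapsed",
)
print(customers)
```

Frequent buyers who ordered recently land in the "Loyal" segment, while frequent buyers who have gone quiet are marked "At risk". On real data, quantile-based scores or a clustering method may separate customers better than a single median split.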