# Feature Engineering for Shopping Trends

This notebook demonstrates feature engineering techniques to enhance customer segmentation using the `shopping_trends.csv` dataset.

## Feature Engineering Process Outline

- Create new features to capture customer value, engagement, demographics, price sensitivity, preferences, and satisfaction.
- Transform categorical variables into meaningful numerical or grouped features.

## Step 1: Import Required Libraries

We use pandas for data manipulation and numpy for numerical operations.

In [1]:
import pandas as pd
import numpy as np

## Step 2: Load the Dataset

Read the CSV file and display the first few rows.

In [2]:
df = pd.read_csv('shopping_trends.csv')
df.head()

Unnamed: 0,Customer ID,Age,Gender,Item Purchased,Category,Purchase Amount (USD),Location,Size,Color,Season,Review Rating,Subscription Status,Payment Method,Shipping Type,Discount Applied,Promo Code Used,Previous Purchases,Preferred Payment Method,Frequency of Purchases
0,1,55,Male,Blouse,Clothing,53,Kentucky,L,Gray,Winter,3.1,Yes,Credit Card,Express,Yes,Yes,14,Venmo,Fortnightly
1,2,19,Male,Sweater,Clothing,64,Maine,L,Maroon,Winter,3.1,Yes,Bank Transfer,Express,Yes,Yes,2,Cash,Fortnightly
2,3,50,Male,Jeans,Clothing,73,Massachusetts,S,Maroon,Spring,3.1,Yes,Cash,Free Shipping,Yes,Yes,23,Credit Card,Weekly
3,4,21,Male,Sandals,Footwear,90,Rhode Island,M,Maroon,Spring,3.5,Yes,PayPal,Next Day Air,Yes,Yes,49,PayPal,Weekly
4,5,45,Male,Blouse,Clothing,49,Oregon,M,Turquoise,Spring,2.7,Yes,Cash,Free Shipping,Yes,Yes,31,PayPal,Annually


## Step 3: Drop Irrelevant Columns

Remove `Customer ID`, an identifier column that does not contribute to feature engineering.

In [3]:
df = df.drop(columns=['Customer ID'], errors='ignore')

## Step 4: Engineer Customer Lifetime Value (CLV) Proxy

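Estimate total customer value by multiplying `Purchase Amount (USD)` by `Previous Purchases`. This treats the current purchase amount as typical of every earlier purchase, so the result is a rough proxy rather than a measured lifetime value.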

In [4]:
df['CLV'] = df['Purchase Amount (USD)'] * df['Previous Purchases']
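
As a quick sanity check of the proxy, the sketch below recomputes it on a toy frame (the name `toy` is only illustrative) that copies the first three rows shown by `df.head()`. For the first row, 53 USD times 14 previous purchases gives 742.

```python
import pandas as pd

# Toy frame copying the first three rows of the df.head() output.
toy = pd.DataFrame({
    "Purchase Amount (USD)": [53, 64, 73],
    "Previous Purchases": [14, 2, 23],
})

# Same element-wise product as the CLV proxy in this step.
toy["CLV"] = toy["Purchase Amount (USD)"] * toy["Previous Purchases"]
print(toy["CLV"].tolist())  # [742, 128, 1679]
```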

## Step 5: Engineer Purchase Frequency Score

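Map the categorical `Frequency of Purchases` column to a numerical score that quantifies engagement, with higher scores for more frequent buyers. Note that the mapping assigns different scores to `Fortnightly` and `Bi-Weekly`, and to `Every 3 Months` and `Quarterly`, although each pair usually describes the same cadence; merging them would change the scores reported in the outputs of this notebook.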

In [5]:
freq_mapping = {
    'Weekly': 7,
    'Fortnightly': 6,
    'Bi-Weekly': 5,
    'Monthly': 4,
    'Every 3 Months': 3,
    'Quarterly': 2,
    'Annually': 1
}
df['Purchase_Frequency_Score'] = df['Frequency of Purchases'].map(freq_mapping)
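
`Series.map` with a dict returns `NaN` for any label that has no entry, so it may be worth checking for unmapped values after this step. A minimal sketch, assuming a hypothetical toy column in which `'Daily'` is deliberately missing from the mapping:

```python
import pandas as pd

# Hypothetical toy column; 'Daily' has no entry in the mapping.
freq = pd.Series(["Weekly", "Annually", "Daily"], name="Frequency of Purchases")
freq_mapping = {
    "Weekly": 7, "Fortnightly": 6, "Bi-Weekly": 5, "Monthly": 4,
    "Every 3 Months": 3, "Quarterly": 2, "Annually": 1,
}

score = freq.map(freq_mapping)

# Labels without a dict entry come back as NaN; list them for inspection.
unmapped = sorted(freq[score.isna()].unique())
print(unmapped)  # ['Daily']
```

An empty list here would indicate that every label in the column was covered by the mapping.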

## Step 6: Create Age Group Feature

Bin the `Age` column into three groups: Young (18–30), Middle (31–50), and Senior (51–100). Ages outside 18–100 would be left as missing values.

In [6]:
bins = [18, 30, 50, 100]
labels = ['Young', 'Middle', 'Senior']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, include_lowest=True)
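
Because `pd.cut` uses right-closed intervals by default, the bin edges 30 and 50 belong to the lower group. The sketch below runs the same bins on a few hand-picked boundary ages (the `ages` series is made up for illustration):

```python
import pandas as pd

# Hand-picked ages on and around the bin edges.
ages = pd.Series([18, 30, 31, 50, 51, 70])
bins = [18, 30, 50, 100]
labels = ["Young", "Middle", "Senior"]

# include_lowest=True keeps age 18 inside the first bin.
groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(groups.tolist())
# ['Young', 'Young', 'Middle', 'Middle', 'Senior', 'Senior']
```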

## Step 7: Engineer Discount Sensitivity Feature

Create a binary feature indicating if both `Discount Applied` and `Promo Code Used` are 'Yes'.

In [7]:
df['Discount_Sensitivity'] = ((df['Discount Applied'] == 'Yes') & (df['Promo Code Used'] == 'Yes')).astype(int)
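
The flag is 1 only when both columns are `'Yes'`. The sketch below applies the same expression to hypothetical toy rows covering all four combinations, then uses `pd.crosstab` to show how the two columns co-occur. If the two columns always agreed in the real data, the combined flag would carry the same information as either column alone.

```python
import pandas as pd

# Hypothetical rows covering every Yes/No combination.
toy = pd.DataFrame({
    "Discount Applied": ["Yes", "Yes", "No", "No"],
    "Promo Code Used": ["Yes", "No", "Yes", "No"],
})

# Same logical AND as in this step.
flag = ((toy["Discount Applied"] == "Yes") & (toy["Promo Code Used"] == "Yes")).astype(int)
print(flag.tolist())  # [1, 0, 0, 0]

# Contingency table of the two source columns.
table = pd.crosstab(toy["Discount Applied"], toy["Promo Code Used"])
print(table)
```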

## Step 8: Category Preference Feature

Each row is treated as a single purchase, so its `Category` is copied as the dominant category. If a customer had multiple purchases, the categories would need to be aggregated per customer first.

In [8]:
df['Dominant_Category'] = df['Category']
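
For data with repeat customers, the aggregation mentioned above could look roughly like the following. This is a sketch on a hypothetical transaction log (`tx` and its rows are invented for illustration), and it would have to run before `Customer ID` is dropped in Step 3.

```python
import pandas as pd

# Hypothetical transaction log with repeat customers.
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Category": ["Clothing", "Footwear", "Clothing", "Accessories", "Accessories"],
})

# Most frequent category per customer; mode() sorts its result,
# so ties resolve to the alphabetically first category.
dominant = tx.groupby("Customer ID")["Category"].agg(lambda s: s.mode().iloc[0])
print(dominant.to_dict())  # {1: 'Clothing', 2: 'Accessories'}
```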

## Step 9: Engineer Seasonal Spending Feature

Create a binary feature for purchases made in Winter or Spring.

In [9]:
df['Winter_Spring_Buyer'] = df['Season'].isin(['Winter', 'Spring']).astype(int)

## Step 10: Review Rating Category Feature

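Bin `Review Rating` into Low (0 to 3), Medium (above 3, up to 4), and High (above 4, up to 5) categories. The intervals are closed on the right, so a rating of exactly 3.0 falls in Low and exactly 4.0 falls in Medium.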

In [10]:
rating_bins = [0, 3, 4, 5]
rating_labels = ['Low', 'Medium', 'High']
df['Review_Rating_Category'] = pd.cut(df['Review Rating'], bins=rating_bins, labels=rating_labels, include_lowest=True)
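
With the right-closed intervals that `pd.cut` uses by default, a rating of exactly 3.0 lands in Low and 4.0 in Medium. The sketch below checks this on a few hand-picked ratings (the `ratings` series is made up for illustration):

```python
import pandas as pd

# Hand-picked ratings on and around the bin edges.
ratings = pd.Series([2.7, 3.0, 3.1, 4.0, 4.1, 5.0])
rating_bins = [0, 3, 4, 5]
rating_labels = ["Low", "Medium", "High"]

cats = pd.cut(ratings, bins=rating_bins, labels=rating_labels, include_lowest=True)
print(cats.tolist())
# ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```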

## Step 11: Validate New Features

Display the first few rows of the new features to verify correctness.

In [11]:
print("New Features Added:\n")
print(df[['CLV', 'Purchase_Frequency_Score', 'Age_Group', 'Discount_Sensitivity', 
         'Dominant_Category', 'Winter_Spring_Buyer', 'Review_Rating_Category']].head())

New Features Added:

    CLV  Purchase_Frequency_Score Age_Group  Discount_Sensitivity  \
0   742                         6    Senior                     1   
1   128                         6     Young                     1   
2  1679                         7    Middle                     1   
3  4410                         7     Young                     1   
4  1519                         1    Middle                     1   

  Dominant_Category  Winter_Spring_Buyer Review_Rating_Category  
0          Clothing                    1                 Medium  
1          Clothing                    1                 Medium  
2          Clothing                    1                 Medium  
3          Footwear                    1                 Medium  
4          Clothing                    1                    Low  
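
Looking at five rows cannot reveal missing values or out-of-range flags further down the table. A hedged sketch of two extra checks, run here on a hypothetical frame (`check`) in which one frequency score is missing:

```python
import pandas as pd

# Hypothetical engineered rows; None stands in for an unmapped frequency label.
check = pd.DataFrame({
    "Purchase_Frequency_Score": [6, 7, None],
    "Discount_Sensitivity": [1, 0, 1],
    "Winter_Spring_Buyer": [1, 1, 0],
})

# Count missing values per column and keep only the columns that have any.
missing = check.isna().sum()
cols_with_missing = missing[missing > 0].index.tolist()
print(cols_with_missing)  # ['Purchase_Frequency_Score']

# Confirm that the binary flags contain only 0 and 1.
binary_ok = bool(check[["Discount_Sensitivity", "Winter_Spring_Buyer"]].isin([0, 1]).all().all())
print(binary_ok)  # True
```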


## Step 12: Save Dataset with Engineered Features

Export the enhanced dataset for further analysis.

In [12]:
df.to_csv('shopping_trends_engineered.csv', index=False)
print("Dataset with engineered features saved to 'shopping_trends_engineered.csv'")

Dataset with engineered features saved to 'shopping_trends_engineered.csv'
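
One caveat when reloading the exported file: CSV stores no dtype information, so categorical columns such as `Age_Group` come back as plain strings. The sketch below illustrates this with an in-memory buffer instead of a real file, and restores an ordered categorical; the category order used is an assumption based on the labels in Step 6.

```python
import io
import pandas as pd

# Small stand-in frame with one categorical column built as in Step 6.
df = pd.DataFrame({"Age": [19, 45, 55]})
df["Age_Group"] = pd.cut(df["Age"], bins=[18, 30, 50, 100],
                         labels=["Young", "Middle", "Senior"], include_lowest=True)

# Round trip through CSV using an in-memory buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)

# The reloaded column is no longer categorical.
was_categorical = isinstance(reloaded["Age_Group"].dtype, pd.CategoricalDtype)
print(was_categorical)  # False

# Restore an ordered categorical (order assumed from the Step 6 labels).
order = pd.CategoricalDtype(["Young", "Middle", "Senior"], ordered=True)
reloaded["Age_Group"] = reloaded["Age_Group"].astype(order)
print(reloaded["Age_Group"].cat.categories.tolist())  # ['Young', 'Middle', 'Senior']
```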


## Step 13: Summary of New Features

Show summary statistics and distributions for the engineered features.

In [13]:
print("\nSummary of New Features:\n")
print(df[['CLV', 'Purchase_Frequency_Score', 'Discount_Sensitivity', 'Winter_Spring_Buyer']].describe())
print("\nAge Group Distribution:\n", df['Age_Group'].value_counts())
print("\nReview Rating Category Distribution:\n", df['Review_Rating_Category'].value_counts())
print("\nDominant Category Distribution:\n", df['Dominant_Category'].value_counts())


Summary of New Features:

               CLV  Purchase_Frequency_Score  Discount_Sensitivity  \
count  3900.000000               3900.000000           3900.000000   
mean   1517.876923                  3.954359              0.430000   
std    1116.943053                  1.996527              0.495139   
min      21.000000                  1.000000              0.000000   
25%     619.000000                  2.000000              0.000000   
50%    1278.000000                  4.000000              0.000000   
75%    2211.750000                  6.000000              1.000000   
max    5000.000000                  7.000000              1.000000   

       Winter_Spring_Buyer  
count          3900.000000  
mean              0.505128  
std               0.500038  
min               0.000000  
25%               0.000000  
50%               1.000000  
75%               1.000000  
max               1.000000  

Age Group Distribution:
 Age_Group
Senior    1476
Middle    1475
Young      949


## Explanation of Feature Engineering Steps

- **CLV Proxy:** Estimates total customer value by multiplying purchase amount by previous purchases.
- **Purchase Frequency Score:** Converts purchase frequency to a numerical scale (1 for `Annually` up to 7 for `Weekly`) to quantify engagement.
- **Age Group:** Bins age into Young, Middle, Senior for demographic analysis.
- **Discount Sensitivity:** Flags customers who use both discounts and promo codes.
- **Category Preference:** Uses the purchase category as a proxy for customer preference.
- **Seasonal Spending:** Identifies customers who buy in Winter/Spring.
- **Review Rating Category:** Groups review ratings into Low, Medium, and High satisfaction levels.

The enhanced dataset is saved for use in clustering and further analysis.
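
Distance-based clustering methods generally expect numeric inputs on comparable scales. As one possible follow-up, and not something the notebook itself does, the sketch below one-hot encodes a grouped feature and standardizes the numeric ones using pandas only. The `feat` frame is a hypothetical subset of the engineered columns.

```python
import pandas as pd

# Hypothetical subset of the engineered columns.
feat = pd.DataFrame({
    "CLV": [742, 128, 1679, 4410],
    "Purchase_Frequency_Score": [6, 6, 7, 7],
    "Age_Group": ["Senior", "Young", "Middle", "Young"],
})

# One-hot encode the grouped feature into 0/1 indicator columns.
encoded = pd.get_dummies(feat, columns=["Age_Group"], dtype=int)

# Standardize numeric columns to zero mean and unit (sample) standard deviation.
num_cols = ["CLV", "Purchase_Frequency_Score"]
encoded[num_cols] = (encoded[num_cols] - encoded[num_cols].mean()) / encoded[num_cols].std()

print(encoded.columns.tolist())
# ['CLV', 'Purchase_Frequency_Score', 'Age_Group_Middle', 'Age_Group_Senior', 'Age_Group_Young']
```

Without scaling, a feature like `CLV`, which ranges into the thousands, would dominate distances over the 0/1 flags.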