# 📊 Customer Spending Pattern Analysis & Credit Card Recommendation

**Objective**:  
This notebook identifies specific customer segments based on their historical spending patterns to recommend targeted cashback credit cards.

**Workflow Overview**:
- Start with transaction data for customers identified as suitable prospects (**new addition**).
- Categorize customer transactions using Merchant Category Codes (**MCCs**).
- Create distinct customer segments using unsupervised clustering (KMeans).
- Determine dominant spending categories per segment.
- Recommend tailored cashback credit cards based on spending patterns.
- Apply a threshold to ensure recommendations target customers with meaningful affinity.

**🔑 Key Change from Earlier Version**:  
Instead of using all customers, this notebook analyzes only those customers that our preliminary binary classification model has already flagged (`recommend_card`) as suitable for credit card recommendations.


## 🗂️ Phase 0: Loading Transaction Data and Filtering Prospects

**Purpose**:  
Load raw transaction data and filter it down to include **only customers flagged** by our previous supervised binary model as strong candidates for credit card offerings.

**Steps**:
- Load raw transaction dataset (`transaction_data.csv`).
- Load results from binary classification model (`customer_recommendation_flags.csv`), which identifies customers as either `1` (recommended for a credit card) or `0` (not recommended).
- Filter the transaction dataset to retain only transactions related to customers flagged as suitable (`recommend_card == 1`).

**Reasoning**:  
Focusing only on recommended customers improves targeting effectiveness, reducing resource wastage on unlikely prospects.
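The filtering step above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names `customer_id`, `mcc`, and `amount` are assumptions, and small in-memory frames stand in for `transaction_data.csv` and `customer_recommendation_flags.csv`.

```python
import pandas as pd


def filter_prospect_transactions(transactions, flags):
    """Keep only transactions of customers flagged with recommend_card == 1."""
    prospect_ids = flags.loc[flags["recommend_card"] == 1, "customer_id"]
    return transactions[transactions["customer_id"].isin(prospect_ids)].copy()


# Toy stand-ins for the two CSV files (column names are assumed).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "mcc": [5541, 5812, 5411, 4511],
    "amount": [40.0, 25.0, 60.0, 300.0],
})
flags = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "recommend_card": [1, 0, 1],
})

prospects = filter_prospect_transactions(transactions, flags)
print(prospects)
```

Using `isin` rather than a merge keeps the transaction table's columns unchanged and avoids duplicating rows if a customer appears more than once in the flags file.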


## 🏷️ Phase 1: Mapping MCC Codes to Spend Categories

**Purpose**:  
Translate detailed Merchant Category Codes (**MCCs**) from transaction data into broader, meaningful spending categories (e.g., Fuel, Travel, Entertainment).

**Steps**:
- Define a clear mapping (`credit_card_categories`) from specific MCC codes to high-level spending categories.
- Create an inverse mapping (`mcc_to_cat`) for quick MCC-to-category lookups.
- Map each individual transaction's MCC code to the corresponding spending category. Transactions with unknown MCCs are labeled as `UNMAPPED`.

**Reasoning**:  
Grouping individual MCCs into a handful of broad categories makes customer spending habits much easier to interpret.
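A rough sketch of the mapping step, assuming `credit_card_categories` is a dictionary from category name to a list of MCCs. The codes listed here are a small illustrative subset, not the notebook's full mapping, and the `spend_category` column name is an assumption.

```python
import pandas as pd

# Illustrative, abbreviated mapping: category -> list of MCCs (assumed shape).
credit_card_categories = {
    "Fuel": [5541, 5542],
    "Travel": [4511, 7011],
    "Entertainment": [7832, 7922],
}

# Inverse mapping for direct MCC -> category lookups.
mcc_to_cat = {
    mcc: category
    for category, mccs in credit_card_categories.items()
    for mcc in mccs
}

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "mcc": [5541, 7832, 9999],  # 9999 is deliberately absent from the mapping
    "amount": [40.0, 15.0, 22.5],
})

# Unknown MCCs become NaN after map(); label them UNMAPPED.
transactions["spend_category"] = (
    transactions["mcc"].map(mcc_to_cat).fillna("UNMAPPED")
)
print(transactions)
```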


## 📈 Phase 2: Aggregating Spending Patterns by Customer

**Purpose**:  
Summarize each customer's transactions to understand their spending patterns clearly across different categories.

**Steps**:
- Aggregate transaction amounts per customer, grouped by each spend category.
- Transform the aggregated data into a customer-level wide format (`spend_<CATEGORY>`) showing total spend per category.
- Calculate the percentage (`spend_<CATEGORY>_pct`) of each category relative to a customer's total spending.

**Reasoning**:  
Relative percentages reveal each customer's most important spending category and make customers directly comparable regardless of their absolute spending levels.
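One way to build the wide table and the percentage columns is a pivot followed by a row-wise division. This sketch assumes a `spend_category` column produced by the mapping phase and stores percentages as fractions between 0 and 1; the notebook may use a 0 to 100 scale instead.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "spend_category": ["Fuel", "Fuel", "Travel", "Travel", "Entertainment"],
    "amount": [30.0, 20.0, 50.0, 80.0, 20.0],
})

# Total spend per customer and category, one row per customer.
spend = transactions.pivot_table(
    index="customer_id",
    columns="spend_category",
    values="amount",
    aggfunc="sum",
    fill_value=0.0,
).add_prefix("spend_")
spend.columns.name = None

# Share of each category in the customer's total spend (fractions in [0, 1]).
total_spend = spend.sum(axis=1)
spend_pct = spend.div(total_spend, axis=0).add_suffix("_pct")

customer_features = spend.join(spend_pct)
print(customer_features)
```

A customer whose total spend is zero would get `NaN` shares from the division, so such rows need to be dropped or filled before clustering.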


## 🧮 Phase 3: Feature Selection and Data Standardization

**Purpose**:  
Prepare data for clustering by selecting relevant features and standardizing them to ensure each category has equal influence.

**Steps**:
- Select spending percentage columns (`spend_<CATEGORY>_pct`) as features.
- Standardize these features (mean = 0, variance = 1) using StandardScaler.

**Reasoning**:  
Standardization puts every category's spend share on the same scale, so categories whose shares vary more widely across customers do not dominate the distance calculations that KMeans relies on.
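A minimal sketch of feature selection and scaling with scikit-learn's `StandardScaler`. The toy `customer_features` frame and its values are made up for illustration; only the `spend_<CATEGORY>_pct` naming pattern comes from the text above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy customer-level table; values are illustrative only.
customer_features = pd.DataFrame({
    "spend_Fuel": [50.0, 10.0, 5.0, 40.0],
    "spend_Fuel_pct": [0.50, 0.10, 0.05, 0.40],
    "spend_Travel_pct": [0.30, 0.70, 0.15, 0.20],
    "spend_Entertainment_pct": [0.20, 0.20, 0.80, 0.40],
})

# Select only the percentage columns as clustering features.
pct_cols = [
    col for col in customer_features.columns
    if col.startswith("spend_") and col.endswith("_pct")
]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(customer_features[pct_cols])

print(pct_cols)
print(np.round(X_scaled, 3))
```

Keeping the fitted `scaler` object around allows the same transformation to be applied to new customers later.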


## 🎯 Phase 4: Dimensionality Reduction (PCA) and Optimal Cluster Determination

**Purpose**:  
Visualize and determine the optimal number of customer segments.

**Steps**:
- Perform Principal Component Analysis (**PCA**) for 2-dimensional visualization of customer segments.
- Calculate clustering quality metrics (Silhouette Score and Davies–Bouldin Index) for different numbers of clusters (k = 2 to 8).
- Choose the number of clusters (**k**) that combines a high silhouette score (higher = better) with a low Davies–Bouldin Index (lower = better).

**Reasoning**:  
Choosing the right number of clusters ensures meaningful segments, balancing interpretability and statistical performance.
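The metric sweep described above might look roughly like this. Synthetic data from `make_blobs` stands in for the standardized spend-share matrix, and the `n_init` and `random_state` values are assumptions chosen for reproducibility, not settings taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for the standardized feature matrix.
X_scaled, _ = make_blobs(
    n_samples=300, centers=4, n_features=6, cluster_std=0.5, random_state=0
)

# 2-D projection, used only for plotting the segments.
coords_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Evaluate both quality metrics for k = 2 .. 8.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = {
        "silhouette": silhouette_score(X_scaled, labels),
        "davies_bouldin": davies_bouldin_score(X_scaled, labels),
    }

best_k_silhouette = max(scores, key=lambda k: scores[k]["silhouette"])
best_k_davies_bouldin = min(scores, key=lambda k: scores[k]["davies_bouldin"])

for k, s in scores.items():
    print(f"k={k}: silhouette={s['silhouette']:.3f}, DB={s['davies_bouldin']:.3f}")
print("best by silhouette:", best_k_silhouette)
print("best by Davies-Bouldin:", best_k_davies_bouldin)
```

The two metrics do not always select the same k. When they disagree, inspecting the PCA scatter plot and the interpretability of the resulting segments can help settle the choice.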



## 🔍 Phase 5: Final Customer Clustering and Segment Profiling

**Purpose**:  
Cluster customers into segments using the optimal number of clusters and identify each segment's key spending characteristics.

**Steps**:
- Apply KMeans clustering with the chosen optimal number of clusters (**k=7**).
- Label each customer with their assigned cluster.
- Calculate the average spending percentage for each category within every cluster to create meaningful segment profiles.

**Reasoning**:  
Profiling each segment helps clarify which customers share similar spending habits and allows clear targeting for tailored marketing strategies.
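A small sketch of the final clustering and profiling steps. The toy data contains three artificial spending groups, so the example uses `K = 3`; the notebook itself uses k=7. The Dirichlet-generated shares and the `random_state` values are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

pct_cols = ["spend_Fuel_pct", "spend_Travel_pct", "spend_Entertainment_pct"]

# Toy spend shares: three groups, each dominated by a different category.
rng = np.random.default_rng(0)
shares = np.vstack([
    rng.dirichlet([20, 2, 2], size=30),
    rng.dirichlet([2, 20, 2], size=30),
    rng.dirichlet([2, 2, 20], size=30),
])
customers = pd.DataFrame(shares, columns=pct_cols)
customers.index.name = "customer_id"

K = 3  # the notebook uses k=7; 3 matches this toy data
X_scaled = StandardScaler().fit_transform(customers[pct_cols])

# Assign each customer to a cluster.
kmeans = KMeans(n_clusters=K, n_init=10, random_state=42)
customers["cluster"] = kmeans.fit_predict(X_scaled)

# Segment profile: mean spend share per category within each cluster.
segment_profile = customers.groupby("cluster")[pct_cols].mean()
print(segment_profile.round(3))
```

The profile is computed on the original percentage columns rather than the standardized values, so each cell reads directly as an average share of spend.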


## 💳 Phase 6: Credit Card Recommendation Mapping and Customer Affinity Scoring

**Purpose**:  
Match each customer segment's dominant spending category to a suitable cashback credit card product and measure individual customer affinity toward that recommended product.

**Steps**:
- Identify each cluster's dominant spending category (the category with the highest average spend percentage).
- Map these dominant categories to specific cashback card products (e.g., "Fuel Rewards Card" for the "Fuel" category).
- Calculate an affinity score (`recommendation_score`) for each customer, representing their individual percentage spend in the segment's dominant category.

**Reasoning**:  
Affinity scores quantify the strength of a customer's spending alignment with recommended credit cards, enabling prioritized marketing efforts.
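The mapping and scoring logic can be sketched as below. Only "Fuel Rewards Card" is named in the text; the other product name, the helper columns `dominant_col` and `dominant_category`, and the toy values are hypothetical.

```python
import pandas as pd

# Category -> card product. "Travel Cashback Card" is a hypothetical name.
card_for_category = {
    "Fuel": "Fuel Rewards Card",
    "Travel": "Travel Cashback Card",
}

pct_cols = ["spend_Fuel_pct", "spend_Travel_pct"]
customers = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1],
        "spend_Fuel_pct": [0.6, 0.5, 0.1, 0.2],
        "spend_Travel_pct": [0.4, 0.5, 0.9, 0.8],
    },
    index=pd.Index([101, 102, 103, 104], name="customer_id"),
)

# Dominant category per cluster: column with the highest mean spend share.
segment_profile = customers.groupby("cluster")[pct_cols].mean()
dominant_col_by_cluster = segment_profile.idxmax(axis=1)

customers["dominant_col"] = customers["cluster"].map(dominant_col_by_cluster)
customers["dominant_category"] = customers["dominant_col"].str.replace(
    r"^spend_|_pct$", "", regex=True
)
customers["recommended_card"] = customers["dominant_category"].map(card_for_category)

# Affinity: the customer's own spend share in the segment's dominant category.
customers["recommendation_score"] = customers.apply(
    lambda row: row[row["dominant_col"]], axis=1
).astype(float)

print(customers[["cluster", "recommended_card", "recommendation_score"]])
```

Note that the score uses the segment's dominant category, not the customer's own top category, so a customer can sit in a Fuel segment while having a modest fuel share.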


## 🚦 Phase 7: Applying an Affinity Threshold to Recommendations

**Purpose**:  
Ensure the final credit card recommendations target only those customers with a strong spending alignment to the recommended card product.

**Steps**:
- Define a minimum affinity threshold (**30%**).
- Recommend the card product to customers meeting or exceeding this threshold.
- Label customers below the threshold as "No Targeted Offer" to indicate they should not receive a specialized cashback offer.

**Reasoning**:  
Applying a threshold is intended to improve targeting efficiency and conversion rates by focusing marketing resources on customers whose spending most strongly matches the recommended card.
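The thresholding rule can be expressed in a few lines. This sketch takes the 30% value from the Steps list and exposes it as a parameter so it can be tuned; the `final_offer` column name and the function name are assumptions.

```python
import numpy as np
import pandas as pd

AFFINITY_THRESHOLD = 0.30  # from the Steps list; shares stored as fractions


def apply_affinity_threshold(scored, threshold=AFFINITY_THRESHOLD):
    """Keep the recommended card only where the affinity score meets the threshold."""
    result = scored.copy()
    result["final_offer"] = np.where(
        result["recommendation_score"] >= threshold,
        result["recommended_card"],
        "No Targeted Offer",
    )
    return result


scored = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "recommended_card": ["Fuel Rewards Card"] * 3,
    "recommendation_score": [0.45, 0.30, 0.29],
})

final = apply_affinity_threshold(scored)
print(final)
```

The comparison is inclusive, matching "meeting or exceeding" in the steps, so a customer at exactly the threshold still receives the offer.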
