### **Customer Segmentation using Clustering methods**

---

#### **1. Problem Formulation**

The goal of this task is to use unsupervised learning algorithms to perform **market segmentation** on a dataset of roughly 9,000 active credit card users (8,950 records). The aim is to group customers by their usage behavior and recommend financial products such as **saving plans, loans, and wealth management**. Each group will be targeted differently according to its behavior pattern.

The dataset consists of **18 behavioral variables**, covering information such as balance, purchase frequency, cash advances, and payment history. These features are used to cluster customers into distinct groups, which supports targeted marketing and personalized recommendations.

#### **2. Data Preprocessing and Sampling Approach**

The dataset consists of 8,950 records with 18 attributes, including numeric variables such as balances, payment history, and purchase frequencies. The following preprocessing steps were applied before training the models:

- **Handling Missing Values**: Records containing missing values were removed.
  
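- **Normalization**: The dataset was normalized so that each variable contributes comparably to the distance calculations used in clustering. This was done by **scaling** each attribute by its range (for example, min-max scaling to the interval [0, 1]), which prevents variables with large magnitudes, such as balances, from dominating variables measured on a 0 to 1 scale, such as frequencies.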

- **Data Splitting**: Because the task is unsupervised, there was no training and test split. Both **K-Means clustering** and **Hierarchical clustering** were fitted on the entire preprocessed dataset.
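
The first two preprocessing steps can be sketched in a few lines. This is a minimal, dependency-free illustration, assuming that "scaling by the range" means min-max scaling; the helper names `drop_incomplete` and `min_max_scale` and the toy records are hypothetical and not taken from the original analysis.

```python
def drop_incomplete(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column has zero range; map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical toy records: (balance, purchase frequency). Not real data.
records = [
    [1000.0, 0.25],
    [3000.0, None],   # incomplete record, removed before scaling
    [3000.0, 0.75],
    [5000.0, 0.5],
]
complete_records = drop_incomplete(records)
scaled_records = min_max_scale(complete_records)
print(scaled_records)
```

After scaling, both columns span the same interval, so neither dominates the Euclidean distances used by the clustering algorithms.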

#### **3. Approach and Model Selection**

##### **Clustering Algorithms Used**

The primary algorithms used for market segmentation are:

- **K-Means Clustering**:
  - K-Means is a centroid-based clustering algorithm. It is efficient and widely used for segmenting a large dataset into `k` clusters. The number of clusters, `k`, is a key hyperparameter that influences the quality of the segmentation.
  
  - **Hyperparameters**:
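    - **Number of clusters (k)**: We evaluated **k=2** and **k=3**, candidate values chosen from domain knowledge with the aim of segmenting the data into two or three distinct groups (e.g., high spenders vs. low spenders).
    - **Initialization method**: Random initialization with **nstart=1** (a single random start) was used for simplicity. Because K-Means can converge to a local optimum that depends on the initial centroids, increasing `nstart` runs the algorithm from several random starts and keeps the solution with the lowest total within-cluster sum of squares.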

- **Hierarchical Clustering**:
  - Hierarchical clustering builds a tree of clusters by either successively merging (agglomerative) or splitting (divisive) them. For this task, agglomerative clustering with **Ward’s method** (minimizing within-cluster variance) was used.
  
  - **Hyperparameters**:
    - **Linkage Method**: We used **Ward’s method**, which at each step merges the pair of clusters whose union gives the smallest increase in total within-cluster variance. It tends to produce compact, roughly spherical clusters.
    - **Number of clusters**: Determined by inspecting the **dendrogram** and cutting the tree at an appropriate level.
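
To make the role of `k` and of multiple random starts concrete, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the implementation used in the analysis: the function names, the `n_starts` parameter (intended to mirror the idea behind `nstart`), the seed, and the toy points are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start; returns (labels, total WSS)."""
    centroids = rng.sample(points, k)  # pick k distinct records as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss


def kmeans(points, k, n_starts=1, seed=0):
    """Run K-Means n_starts times and keep the run with the lowest total WSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


# Hypothetical, already-scaled toy points forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
labels, wss = kmeans(points, k=2, n_starts=10)
print(labels, round(wss, 4))
```

With `n_starts=1` the result depends on a single random initialization; larger values trade extra computation for a lower risk of returning a poor local optimum.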

##### **Model Selection**

The **elbow method** and **silhouette analysis** were used to find the optimal number of clusters (k):

- **Elbow Method**: This method involves plotting the **within-cluster sum of squares** (WSS) for different values of `k` and identifying the point where the reduction in WSS starts to flatten, indicating the optimal number of clusters.

- **Silhouette Score**: The silhouette score measures how similar an object is to its own cluster compared to other clusters. It ranges from -1 to 1, and a higher average score indicates better-defined clusters.
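
Both criteria can be written out explicitly. The notation here is assumed for illustration and does not come from the original analysis: $C_1, \dots, C_k$ are the clusters, $\mu_j$ is the centroid of $C_j$, $a(i)$ is the mean distance from point $i$ to the other points in its own cluster, and $b(i)$ is the smallest mean distance from point $i$ to the points of any other cluster.

$$
\mathrm{WSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$

WSS never increases as $k$ grows, which is why the elbow method looks for the point of diminishing returns rather than the minimum. The silhouette score reported for a clustering is the mean of $s(i)$ over all points.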

#### **4. Hyperparameter Selection and Justification**

- **K-Means Clustering**: The value of `k` was selected using both the **elbow method** and **silhouette analysis**, which pointed to `k=2` and `k=3` as the most suitable cluster counts. These choices were then checked visually using silhouette plots and plots of the resulting clusters.

- **Hierarchical Clustering**: The number of clusters was determined from the dendrogram and the silhouette score. Initially, the data was visualized using the dendrogram, and then clusters were cut at `k=2` and `k=3` levels to assess the outcomes.
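
Cutting the tree at `k` clusters is equivalent to stopping the agglomerative merging once `k` clusters remain. The sketch below illustrates this with a naive implementation of Ward's criterion, where the cost of merging clusters A and B is |A||B| / (|A| + |B|) times the squared distance between their centroids. The function name `ward_clusters` and the toy points are hypothetical, and this brute-force version is only practical for small inputs; it is not the routine used in the analysis.

```python
def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(len(points[0]))]

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * gap

    while len(clusters) > k:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical, already-scaled toy points forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
labels_k2 = ward_clusters(points, k=2)
labels_k3 = ward_clusters(points, k=3)
print(labels_k2, labels_k3)
```

Comparing the `k=2` and `k=3` cuts of the same tree shows which group is split next, which is the information the dendrogram conveys visually.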

#### **5. Evaluation Metrics**

The following evaluation metrics were used for assessing the clustering performance:

- **Silhouette Score**: This metric evaluates how well-separated the clusters are. A higher silhouette score means that the data points are well clustered. It is especially useful when the ground truth is unknown.

- **Cluster Cohesion and Separation**: Cohesion was measured through the **within-cluster sum of squares (WSS)**, and separation was assessed by visual inspection of the clustering results (e.g., 2D visualizations using PCA).

- **Cluster Distribution**: The size of each cluster was checked using the **cluster indicator vector**, i.e., the vector of cluster labels assigned to each customer (from both K-Means and Hierarchical clustering).
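
As an illustration of how the silhouette score and the cluster sizes can be obtained from a label vector, here is a minimal sketch in plain Python. The function name `mean_silhouette`, the toy points, and both label vectors are hypothetical; the scores it produces are not results from the analysis.

```python
import math
from collections import Counter


def mean_silhouette(points, labels):
    """Average silhouette score; requires at least two distinct cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, already-scaled toy points forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
poor_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

good_score = mean_silhouette(points, good_labels)
poor_score = mean_silhouette(points, poor_labels)
cluster_sizes = Counter(good_labels)  # cluster distribution from the label vector
print(round(good_score, 3), round(poor_score, 3), dict(cluster_sizes))
```

The labeling that matches the visible grouping scores higher than the mixed one, which is the comparison used when choosing between candidate values of `k`.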

#### **6. Results and Summary**

- **K-Means Clustering Results**: 
  - For `k=2`, the silhouette score was relatively high, indicating that the data could be divided into two well-separated clusters.
  - The cluster distribution for `k=2` showed that one cluster was composed of customers with high credit limits, and the other of customers with low credit limits.

  - For `k=3`, the clusters had a lower silhouette score but still provided meaningful segmentation. This model split the customers into three groups: high spenders, moderate spenders, and low spenders.

- **Hierarchical Clustering Results**: 
  - The dendrogram and silhouette analysis both suggested `k=2` as the most appropriate choice, where customers were divided into two groups: one with high usage and the other with low usage.

#### **7. Discussion and Limitations**

- **Strengths**:
  - Both **K-Means** and **Hierarchical clustering** performed well for segmenting the data. The silhouette scores were high for `k=2`, suggesting that these models formed meaningful and distinct clusters.
  
- **Limitations**:
  - **Choice of `k`**: The quality of the segmentation depends heavily on the choice of `k`, and the ideal value can vary with domain knowledge and data characteristics. While the elbow method and silhouette score are helpful, they are not foolproof.
  
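  - **Cluster shape and scalability**: Although K-Means is efficient, it implicitly assumes roughly spherical clusters and may perform poorly on non-spherical clusters or on data with outliers. Hierarchical clustering, while more informative, may not scale well to large datasets because it typically works from the pairwise distances between all records.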

  - **Handling of Outliers**: Neither method handles outliers robustly. Future improvements might involve implementing algorithms that can deal better with anomalies.

---