# AllLife Bank Customer Segmentation

The bank is running a new marketing campaign next year. The marketing team has suggested segmenting customers so that campaigns can be targeted at each customer group, reaching new customers as well as existing ones. The operations team also wants to ensure that customer queries are resolved faster.

### Objective
Identify distinct segments among the existing customers, based on their spending patterns and past interactions with the bank:
* Segment the customers using clustering algorithms.
* Provide recommendations to the bank on how to better market to and service these customers.

## Key Questions

- How many different types (clusters/segments) of bank customers can be found from the data?
- How do these groups of customers differ from each other?
- How can clustering be performed using the components obtained from PCA?
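
The last question above can be illustrated with a minimal sketch. It assumes synthetic data from `make_blobs` in place of the bank data, a 90% explained-variance threshold, and three clusters; all of these are illustrative choices, not values taken from this analysis.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption: 5 numeric columns)
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=1)

# PCA is sensitive to scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain at least 90% of the variance
pca = PCA(n_components=0.90, svd_solver="full")
components = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.cumsum()

# Cluster on the reduced components instead of the original features
pca_labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(components)

print("components kept:", components.shape[1])
print("cumulative explained variance:", explained.round(3))
```

Clustering on the components rather than the raw features can reduce noise from correlated variables, at the cost of cluster profiles that are harder to interpret directly.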

## Data Description
Financial attributes of bank customers: credit limit, total number of credit cards, and how often each customer uses the different contact channels.

**Data Dictionary**
- Sl_No: Primary key of the records
- Customer Key: Customer identification number
- Average Credit Limit: Average credit limit of each customer for all credit cards
- Total credit cards: Total number of credit cards possessed by the customer
- Total visits bank: Total number of in-person visits the customer made to the bank (yearly)
- Total visits online: Total number of visits or online logins made by the customer (yearly)
- Total calls made: Total number of calls made by the customer to the bank or its customer service department (yearly)

In [2]:
# auto-format each code cell with Black (requires the nb_black extension)
%load_ext nb_black
# Import data manipulation libraries
import pandas as pd
import numpy as np

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# to scale the data using z-score
from sklearn.preprocessing import StandardScaler

# to compute distances
from scipy.spatial.distance import cdist

# to perform k-means clustering and compute silhouette scores
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# to visualize the elbow curve and silhouette scores
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

# to compute condensed pairwise distances (needed for cophenetic correlation)
from scipy.spatial.distance import pdist

# to perform hierarchical clustering, compute cophenetic correlation, and create dendrograms
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet

# to perform PCA
from sklearn.decomposition import PCA

In [8]:
# Read the data; Sl_No can be used as the index column
data = pd.read_excel("CreditCardCustomerData.xlsx", index_col="Sl_No")

## Define the Problem and Perform Exploratory Data Analysis
- Problem definition and questions to be answered
- Data background and contents
- Univariate analysis
- Bivariate analysis
- Insights based on EDA
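
As a rough sketch of the univariate and bivariate steps, the example below summarizes each feature and then computes pairwise correlations. It runs on a small synthetic frame so that it is self-contained; the column names and value ranges are assumptions for illustration and are not read from `CreditCardCustomerData.xlsx`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the bank data (column names and ranges are assumed)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "Avg_Credit_Limit": rng.integers(3000, 200000, size=n),
        "Total_Credit_Cards": rng.integers(1, 11, size=n),
        "Total_visits_bank": rng.integers(0, 6, size=n),
        "Total_visits_online": rng.integers(0, 16, size=n),
        "Total_calls_made": rng.integers(0, 11, size=n),
    }
)

# Univariate: location, spread, and skewness of each feature
summary = demo.describe().T
summary["skew"] = demo.skew()
print(summary[["mean", "std", "min", "max", "skew"]])

# Bivariate: pairwise Pearson correlations between the features
corr = demo.corr()
print(corr.round(2))
```

On the real data, the same two tables would typically be paired with histograms, box plots, and a correlation heatmap.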

## Data Preprocessing
- Prepare the data for analysis
- Feature engineering
- Missing value treatment
- Outlier treatment
- Duplicate observations check
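
The checks listed above might look like the following minimal sketch. The tiny frame, its column names, and the 1.5 × IQR outlier rule are illustrative assumptions; the appropriate treatment for the real data depends on what the EDA shows.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny hand-made frame (assumed column names) with one duplicated row
demo = pd.DataFrame(
    {
        "Customer Key": [101, 102, 103, 103, 105, 106],
        "Avg_Credit_Limit": [10000, 12000, 9000, 9000, 11000, 200000],
        "Total_Credit_Cards": [2, 3, 2, 2, 4, 9],
    }
)

# Missing values per column
missing_per_column = demo.isnull().sum()

# Fully duplicated rows, and customer keys that appear more than once
n_duplicate_rows = int(demo.duplicated().sum())
n_repeated_keys = int(demo["Customer Key"].duplicated().sum())

# Drop duplicated rows and the identifier, which carries no segment information
features = demo.drop_duplicates().drop(columns="Customer Key")

# Flag outliers with the 1.5 * IQR rule (flagging only, nothing is removed here)
q1 = features.quantile(0.25)
q3 = features.quantile(0.75)
iqr = q3 - q1
outlier_mask = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
n_outliers = outlier_mask.sum()

# Z-score scaling so that no single feature dominates the distance computations
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)

print(missing_per_column, n_duplicate_rows, n_repeated_keys, n_outliers, sep="\n")
```

Scaling matters here because credit limits are several orders of magnitude larger than visit or call counts, and both K-means and hierarchical clustering rely on distances.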

## Applying K-means Clustering
- Apply K-means clustering
- Plot the elbow curve
- Check silhouette scores
- Figure out the appropriate number of clusters
- Cluster profiling
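
A minimal sketch of these steps follows, using plain scikit-learn rather than the yellowbrick visualizers imported earlier. The data is synthetic (`make_blobs`), and the candidate range of k and the feature names are assumptions; the inertia values are what an elbow curve would plot.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for a range of k and record inertia (elbow) and silhouette score
inertias, silhouettes = {}, {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=1)
    labels = model.fit_predict(X_scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_scaled, labels)

# One simple selection rule: take the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)

# Cluster profiling: mean of each (hypothetically named) feature per cluster
final_labels = KMeans(n_clusters=best_k, n_init=10, random_state=1).fit_predict(X_scaled)
columns = [f"feature_{i}" for i in range(X_scaled.shape[1])]
profile = pd.DataFrame(X_scaled, columns=columns).assign(cluster=final_labels).groupby("cluster").mean()

print(pd.DataFrame({"inertia": inertias, "silhouette": silhouettes}).round(3))
print(profile.round(2))
```

In practice the elbow and silhouette criteria do not always agree, so the final choice of k is usually made by also checking whether the resulting profiles make business sense.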


## Applying Hierarchical Clustering
- Apply hierarchical clustering with different linkage methods
- Plot dendrograms for each linkage method
- Check cophenetic correlation for each linkage method
- Figure out the appropriate number of clusters
- Cluster profiling
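
The linkage comparison could be sketched as below, using SciPy's `linkage`, `cophenet`, and `fcluster`. The data is synthetic, and the list of linkage methods, the Euclidean metric, and the cut into three clusters are assumptions for illustration. A cophenetic correlation closer to 1 means the dendrogram preserves the original pairwise distances more faithfully.

```python
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features
X, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Condensed matrix of original pairwise Euclidean distances
pairwise = pdist(X_scaled, metric="euclidean")

# Cophenetic correlation for each linkage method
coph = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X_scaled, method=method, metric="euclidean")
    c, _ = cophenet(Z, pairwise)
    coph[method] = c

best_method = max(coph, key=coph.get)

# Cut the best tree into at most 3 flat clusters (3 is an assumed value)
Z_best = linkage(X_scaled, method=best_method, metric="euclidean")
hc_labels = fcluster(Z_best, t=3, criterion="maxclust")

for method, c in coph.items():
    print(f"{method:>8}: {c:.3f}")
print("best linkage:", best_method)
```

A high cophenetic correlation alone does not guarantee useful segments, so the dendrogram for each method is still worth inspecting before fixing the number of clusters.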

## K-means vs Hierarchical Clustering
Compare the clusters obtained from the K-means and hierarchical clustering techniques.
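
One possible way to make this comparison concrete is to cross-tabulate the two sets of labels and compute the adjusted Rand index (ARI), which does not depend on how the clusters are numbered. The sketch below assumes synthetic data, three clusters for both methods, and Ward linkage; none of these are results from this analysis.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Cluster the same data with both techniques
km_labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_scaled)
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_scaled)

# Contingency table: how the K-means clusters map onto the hierarchical ones
overlap = pd.crosstab(km_labels, hc_labels, rownames=["kmeans"], colnames=["hierarchical"])

# ARI is 1.0 for identical partitions and close to 0 for unrelated ones
ari = adjusted_rand_score(km_labels, hc_labels)

print(overlap)
print(f"adjusted Rand index: {ari:.3f}")
```

If each row of the table has a single dominant column, the two techniques are recovering essentially the same segments under different label numbers.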

## Actionable Insights & Recommendations
- Conclude with the key takeaways for the business
- What would be your recommendations to the business?