**In this project we develop a customer segmentation based on credit card usage and see whether it can inform a marketing strategy.
The data set presents the usage behavior of about 9000 active credit card holders during the last 6 months.
The file contains 18 columns.**

Column explanation:

1. CUST_ID : Identification of Credit Card holder (Categorical)
2. BALANCE : Balance amount left in their account to make purchases
3. BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
4. PURCHASES : Amount of purchases made from account
5. ONEOFF_PURCHASES : Maximum purchase amount done in one-go
6. INSTALLMENTS_PURCHASES : Amount of purchase done in installment
7. CASH_ADVANCE : Cash in advance given by the user
8. PURCHASES_FREQUENCY : How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
9. ONEOFF_PURCHASES_FREQUENCY : How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
10. PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
11. CASH_ADVANCE_FREQUENCY : How frequently the cash in advance is being paid
12. CASH_ADVANCE_TRX : Number of transactions made with "Cash in Advance"
13. PURCHASES_TRX : Number of purchase transactions made
14. CREDIT_LIMIT : Limit of Credit Card for user
15. PAYMENTS : Amount of Payment done by user
16. MINIMUM_PAYMENTS : Minimum amount of payments made by user
17. PRC_FULL_PAYMENT : Percent of full payment paid by user
18. TENURE : Tenure of credit card service for user

**Importing libraries to start the project**

In [1]:
import pandas as pd
import numpy as np
import math
import xlrd
import matplotlib.pyplot as plt
import scipy.stats
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.cluster import KMeans

**Loading the data set**

In [1]:
data=pd.read_csv("../input/CC GENE.csv")

**Looking at the head of the data set**

In [1]:
data.head()

**Checking the dimensions**

In [1]:
data.shape

**Looking at the summary of the data set**

In [1]:
data.describe()

**Checking for missing values**

In [1]:
data.isnull().sum()

**Filling the missing MINIMUM_PAYMENTS values with the column mean**

In [1]:
data.MINIMUM_PAYMENTS=data.MINIMUM_PAYMENTS.fillna(data.MINIMUM_PAYMENTS.mean())

**Filling the missing CREDIT_LIMIT values with the column mean**

In [1]:
data.CREDIT_LIMIT=data.CREDIT_LIMIT.fillna(data.CREDIT_LIMIT.mean())

**Checking again for missing values**

In [1]:
data.isnull().sum()
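
The two fill steps above can also be written once for every column that has gaps. The following is a minimal sketch on a small made-up frame (the `toy` values are invented for illustration and are not taken from the real file); it assumes mean imputation, as in the cells above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real file; the values are made up for illustration.
toy = pd.DataFrame({
    "MINIMUM_PAYMENTS": [100.0, np.nan, 300.0, 5000.0],
    "CREDIT_LIMIT": [1000.0, 2000.0, np.nan, 3000.0],
    "TENURE": [12, 12, 10, 6],
})

# Names of the columns that actually contain missing values.
cols_with_na = toy.columns[toy.isnull().any()].tolist()

# Fill each of those columns with its own mean (fillna aligns on column names).
filled = toy.copy()
filled[cols_with_na] = filled[cols_with_na].fillna(filled[cols_with_na].mean())

print(cols_with_na)
print(filled.isnull().sum())
```

For strongly skewed columns, swapping `.mean()` for `.median()` is a common alternative, since the median is less affected by a few very large values.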

**Removing the CUST_ID variable**

In [1]:
data.drop("CUST_ID", axis=1, inplace=True)

**Checking the dimensions again**

In [1]:
data.shape

**Plotting the correlation heatmap**

In [1]:
plt.figure(figsize=(9,7))
sns.heatmap(data.corr(),cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
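
A heatmap shows the overall pattern, but the strongest relationships are easier to read as a ranked list. The sketch below is one possible way to do that; the `toy` frame is synthetic (PAYMENTS is deliberately built from PURCHASES) and only stands in for the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: PAYMENTS is constructed to follow PURCHASES closely.
rng = np.random.default_rng(0)
n = 200
purchases = rng.gamma(2.0, 500.0, n)
toy = pd.DataFrame({
    "PURCHASES": purchases,
    "PAYMENTS": purchases * 0.9 + rng.normal(0, 50, n),
    "TENURE": rng.integers(6, 13, n),
})

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Rank the pairs by absolute correlation, strongest first.
pairs = pairs.sort_values(key=np.abs, ascending=False)
print(pairs)
```

On the real data the same three steps would be applied to `data.corr()`.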

**Looking at the pair plot for the variables in the data set (commented out)**

In [1]:
#sns.pairplot(data)
#plt.show()

**Computing the within-cluster sum of squares (WCSS) for 1 to 29 clusters**

In [1]:
wcss = []
K = range(1,30)
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(data)
    wcss.append(kmeanModel.inertia_)

**Using the elbow method to choose the number of clusters**

In [1]:
plt.figure(figsize=(16,8))
plt.plot(K, wcss, 'bx-')
plt.xlabel('Number of k')
plt.ylabel('WCSS')
plt.title('The Elbow Method showing the optimal number of k')
plt.show()
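
The elbow in a WCSS curve is often hard to pin down by eye, so a second criterion can help. The sketch below uses the silhouette score from scikit-learn as such a cross-check. It is not part of the original analysis, and the blob data with three hand-placed centres is synthetic, chosen only so the expected answer is known.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustration only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=0,
)

# The silhouette score needs at least 2 clusters, so start at k=2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

On real customer data the peak is usually much less clear than on this toy example, so the score is best read together with the elbow plot rather than instead of it.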

**Making the cluster model with 8 clusters**

In [1]:
Kmeans=KMeans(n_clusters=8)
Kmeans.fit(data)
y_Kmeans=Kmeans.predict(data)
data["Cluster"] = y_Kmeans
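
The notebook imports `StandardScaler` but fits KMeans on the raw values. Because KMeans relies on Euclidean distances, columns with large values (such as BALANCE) can dominate columns with small ranges (such as TENURE). The sketch below shows an optional variation, not the approach used above: standardise first, and fix `random_state` so the labels repeat between runs. The `toy` frame is synthetic and the choice of 3 clusters is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with two columns on very different scales.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "BALANCE": rng.gamma(2.0, 1000.0, 300),
    "TENURE": rng.integers(6, 13, 300),
})

# Standardise every column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(toy)

# With a fixed random_state, two fits on the same data give identical labels.
labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
print((labels_a == labels_b).all())
```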

**Selecting the most important columns for the final model**

In [1]:
# Columns kept for the final pair plot; "Cluster" was already added to data in the previous cell
data_model=["BALANCE", "PURCHASES", "CASH_ADVANCE","CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS", "TENURE"]
data_model.append("Cluster")
data[data_model].head()

**Looking at the pair plot for the final model**

In [1]:
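# pairplot creates its own figure, so a preceding plt.figure() only opens an empty one;
# the plot size is controlled through height (inches per panel) instead
sns.pairplot(data[data_model], hue="Cluster", height=3.5)
plt.show()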

![image.png](attachment:image.png)

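After running the model, the cluster colours change on every run, because KMeans starts from a random initialisation when no random_state is set, so the cluster numbering is not stable. Even so, some cluster groups can be recognised: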
 1. minimum biggest payments & lowest credit limit 
 2. big spenders with large payments  
 3. cash advance & large payments 
 4. group with high credit limit 
 5. group with cash advance & low payments 
 6. small spenders & low credit limit 
 7. group with largest minimum payments
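
Group descriptions like these can be backed up with a per-cluster summary table in addition to the pair plot. The sketch below shows the idea on a tiny made-up frame; the values and the three cluster labels are invented for illustration. On the real data the same calls would be applied to `data[data_model]`.

```python
import pandas as pd

# Made-up example rows with an already assigned cluster label.
toy = pd.DataFrame({
    "BALANCE": [100, 150, 5000, 5200, 900, 1000],
    "CREDIT_LIMIT": [1000, 1200, 9000, 9500, 3000, 3200],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Mean of every column per cluster: one row per cluster, one column per feature.
profile = toy.groupby("Cluster").mean()

# Number of customers in each cluster, ordered by cluster label.
sizes = toy["Cluster"].value_counts().sort_index()

print(profile)
print(sizes)
```

Reading the rows of `profile` side by side makes it easier to name each group, and `sizes` shows whether a group is large enough to matter for a marketing strategy.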