## **Credit Card Customer Segmentation**

### **A. Introduction**

- **Name**  : Livia Amanda Annafiah
- **Dataset** : Credit Card Information

---------------------

**Problem Statement**

A bank struggles to understand and effectively target its diverse customer base because it lacks insight into customers' specific needs and preferences. It therefore wants to segment its customers using credit card data. The data describes each credit card user's activity over the last 6 months, including balance, spending patterns, credit limits, payment behavior, and other details.

To address this, the bank needs a model that can cluster customers based on their credit-related activities and characteristics.

**Objective**

This project aims to develop a clustering model using the `K-Means` and `K-Prototypes` algorithms to segment customers based on their credit-related activities and characteristics. The goal is to identify distinct customer groups with similar behaviors and preferences, so that the bank can tailor its services and marketing strategies more effectively.

This notebook covers inference only: it assigns new customer data to a cluster using the previously trained K-Means model.

### **B. Libraries**

The following libraries are used to run inference with the model:

In [1]:
# Import Library
import pandas as pd
import pickle
import json

**Library Functions**
- pandas: data manipulation
- pickle: loading the saved model and preprocessing objects
- json: reading JSON files

### **C. Data Loading**

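The first step is to load the trained model and its preprocessing objects (the scaler, the PCA transformer, and the list of numerical columns), all of which were saved from the model training file. A single inference record is then created as a DataFrame.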

In [2]:
# Load model and related files
with open('km.pkl', 'rb') as model_file:
    km = pickle.load(model_file)

with open('scaler.pkl', 'rb') as minmax_file:
    scaler = pickle.load(minmax_file)

with open('pca.pkl', 'rb') as pca_file:
    pca = pickle.load(pca_file)

with open('num.json', 'r') as num_std_file:
    num = json.load(num_std_file)
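
The four loads above follow one pattern, so they could be wrapped in a small helper. The following is a minimal sketch, assuming the artifact file names used in this notebook; `load_artifacts` and its `directory` parameter are hypothetical names introduced for illustration, and the demo writes stand-in dictionaries rather than real models. Note that `pickle.load` can execute arbitrary code, so only files from a trusted source should be loaded.

```python
import json
import pickle
import tempfile
from pathlib import Path


def load_artifacts(directory):
    """Load the pickled objects and the JSON column list from one folder.

    Hypothetical helper: the file names mirror the ones used in this notebook.
    """
    directory = Path(directory)
    artifacts = {}
    for name in ('km', 'scaler', 'pca'):
        with open(directory / f'{name}.pkl', 'rb') as f:
            artifacts[name] = pickle.load(f)
    with open(directory / 'num.json', 'r') as f:
        artifacts['num'] = json.load(f)
    return artifacts


# Round-trip demo with stand-in objects (plain dicts, not real models)
with tempfile.TemporaryDirectory() as tmp:
    for name in ('km', 'scaler', 'pca'):
        with open(Path(tmp) / f'{name}.pkl', 'wb') as f:
            pickle.dump({'stand_in_for': name}, f)
    with open(Path(tmp) / 'num.json', 'w') as f:
        json.dump(['BALANCE', 'PURCHASES'], f)

    demo = load_artifacts(tmp)

print(sorted(demo))   # ['km', 'num', 'pca', 'scaler']
print(demo['num'])    # ['BALANCE', 'PURCHASES']
```

Keeping all artifacts in one dictionary makes it harder to pair a model with a scaler or PCA object from a different training run.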

In [3]:
# Create a dictionary with the data
data = {'CUST_ID': [1],
        'BALANCE': [5000],
        'BALANCE_FREQUENCY': [0.95],
        'PURCHASES': [1200],
        'ONEOFF_PURCHASES': [800],
        'INSTALLMENTS_PURCHASES': [400],
        'CASH_ADVANCE': [300],
        'PURCHASES_FREQUENCY': [0.6],
        'ONEOFF_PURCHASES_FREQUENCY': [0.4],
        'PURCHASES_INSTALLMENTS_FREQUENCY': [0.2],
        'CASH_ADVANCE_FREQUENCY': [0.1],
        'CASH_ADVANCE_TRX': [1],
        'PURCHASES_TRX': [12],
        'CREDIT_LIMIT': [3000],
        'PAYMENTS': [1500],
        'MINIMUM_PAYMENTS': [1000],
        'PRC_FULL_PAYMENT': [0.3],
        'TENURE': [8]}

# Create the DataFrame
df_inf = pd.DataFrame(data)

# Print the DataFrame
df_inf

| | CUST_ID | BALANCE | BALANCE_FREQUENCY | PURCHASES | ONEOFF_PURCHASES | INSTALLMENTS_PURCHASES | CASH_ADVANCE | PURCHASES_FREQUENCY | ONEOFF_PURCHASES_FREQUENCY | PURCHASES_INSTALLMENTS_FREQUENCY | CASH_ADVANCE_FREQUENCY | CASH_ADVANCE_TRX | PURCHASES_TRX | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | PRC_FULL_PAYMENT | TENURE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5000 | 0.95 | 1200 | 800 | 400 | 300 | 0.6 | 0.4 | 0.2 | 0.1 | 1 | 12 | 3000 | 1500 | 1000 | 0.3 | 8 |


### **D. Data Splitting**

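After loading the data, only the columns listed in `num` (read from `num.json`) are selected and saved to a variable, since K-Means only processes numerical data. `CUST_ID` and the five frequency columns are not in this list, so they are dropped at this step.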

In [4]:
# Select the numerical columns used by the model
df_inf_num = df_inf[num]
df_inf_num

| | BALANCE | PURCHASES | ONEOFF_PURCHASES | INSTALLMENTS_PURCHASES | CASH_ADVANCE | CASH_ADVANCE_TRX | PURCHASES_TRX | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | PRC_FULL_PAYMENT | TENURE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5000 | 1200 | 800 | 400 | 300 | 1 | 12 | 3000 | 1500 | 1000 | 0.3 | 8 |
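
Selecting columns with `df_inf[num]` raises a `KeyError` if any listed column is absent from the inference data. The following sketch shows one way to check for that beforehand and report a clearer message; `missing_columns` is a hypothetical helper, and the column lists are made up for illustration.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in their given order."""
    available = set(available)
    return [col for col in required if col not in available]


# Hypothetical column lists for illustration
required = ['BALANCE', 'PURCHASES', 'TENURE']
available = ['CUST_ID', 'BALANCE', 'TENURE']

print(missing_columns(available, required))  # ['PURCHASES']
```

In this notebook the call would be `missing_columns(df_inf.columns, num)`. Selecting with `df_inf[num]` also returns the columns in the order of `num`, which matters because the scaler expects the same column order it saw during fitting.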


### **E. Feature Engineering**

Before predicting the cluster, the data is preprocessed with the same fitted objects that were saved during training: the scaler rescales each feature to a common range, and PCA then reduces the 12 scaled features to 6 components.

In [5]:
# Feature scaling
df_inf_scaled = scaler.transform(df_inf_num)
df_inf_scaled

array([[0.63963283, 0.27965509, 0.34769306, 0.21935357, 0.06585372,
        0.0625    , 0.18461538, 0.13947991, 0.23021505, 0.12792657,
        0.6       , 0.33333333]])
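
The scaled values lie between 0 and 1, and the file handle is named `minmax_file`, which suggests a min-max scaler; this is an inference, not something the notebook states. Under that assumption, each feature is rescaled as `(value - min) / (max - min)` using the minimum and maximum seen during fitting. The sketch below applies the formula to `TENURE`, assuming a training range of 6 to 12 (an illustrative range, not read from `scaler.pkl`); with that range the result agrees with the last value in the output above.

```python
def min_max_scale(value, data_min, data_max):
    """Rescale value relative to the [data_min, data_max] range seen at fit time."""
    return (value - data_min) / (data_max - data_min)


# Assumed training range for TENURE (illustrative, not read from scaler.pkl)
tenure_min, tenure_max = 6, 12

scaled_tenure = min_max_scale(8, tenure_min, tenure_max)
print(round(scaled_tenure, 8))  # 0.33333333
```

Because the range comes from the training data, an inference value outside it would map below 0 or above 1; scikit-learn's `MinMaxScaler` does not clip such values unless `clip=True` is set.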

In [6]:
# Dimensionality reduction using PCA
data_inf_scaled_pca = pca.transform(df_inf_scaled)
data_inf_scaled_pca

array([[ 0.21598301, -0.0891054 ,  0.23058979, -0.10808656,  0.2235182 ,
         0.63001161]])
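
For a fitted scikit-learn `PCA` without whitening, `transform` centers each row on the training mean and projects it onto the principal axes, which is equivalent to `(X - pca.mean_) @ pca.components_.T`. The following sketch reproduces that arithmetic in plain Python on toy 2-D values; the mean and components are hypothetical and are not taken from `pca.pkl`.

```python
def pca_project(x, mean, components):
    """Center x on the training mean, then project onto each component."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * v for c, v in zip(component, centered))
            for component in components]


# Toy 2-D values (hypothetical, not taken from pca.pkl)
mean = [0.5, 0.5]
components = [[0.6, 0.8],    # first principal axis (unit length)
              [-0.8, 0.6]]   # second axis, orthogonal to the first

projected = pca_project([1.0, 1.0], mean, components)
print([round(p, 6) for p in projected])  # [0.7, -0.1]
```

In this notebook the same operation maps the 12 scaled features to the 6 components shown in the output above.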

### **F. Cluster Prediction**

Finally, the prepared model can be applied to the processed inference data to generate predictions.

In [7]:
# Predict cluster (predict returns an array; take the label of the single row)
cluster_id = int(km.predict(data_inf_scaled_pca)[0])

# Map each cluster index to its segment name
cluster_names = {
    0: 'Credit-Reliant Users',
    1: 'Minimalist Users',
    2: 'High Rollers Users',
    3: 'Financially Disciplined Users',
}
cluster_df_inf = cluster_names[cluster_id]

# Show result
print('Cluster:', cluster_df_inf)

Cluster: Financially Disciplined Users
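
K-Means prediction assigns each row to the cluster whose centroid is nearest in Euclidean distance. The sketch below illustrates that rule in plain Python and applies the segment names from this notebook to several rows at once; the centroids and points are toy 2-D values, not taken from `km.pkl`, and `nearest_centroid` is a hypothetical helper.

```python
CLUSTER_NAMES = {
    0: 'Credit-Reliant Users',
    1: 'Minimalist Users',
    2: 'High Rollers Users',
    3: 'Financially Disciplined Users',
}


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))


# Toy 2-D centroids and points (hypothetical, not taken from km.pkl)
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
points = [[0.9, 0.8], [0.1, 0.2]]

labels = [CLUSTER_NAMES[nearest_centroid(p, centroids)] for p in points]
print(labels)  # ['Financially Disciplined Users', 'Credit-Reliant Users']
```

Mapping labels through a dictionary extends naturally to a batch of customers, for example by applying it to every element of the array returned by `km.predict`.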


### **G. Conclusion**

The model assigns this customer to Cluster 3, "Financially Disciplined Users": customers who exhibit financial discipline by paying off balances quickly or using credit cards sparingly to avoid debt. This shows that the saved scaler, PCA, and K-Means model can be applied end to end to new customer data to produce a segment label.