# 1. Introduction: Business Goal & Problem Definition

IF YOU LIKE IT OR IF IT HELPS YOU SOMEHOW, COULD YOU PLEASE UPVOTE? THANK YOU VERY MUCH!!!

The goal of this project is to identify, study and analyze clusters of credit card holders, so the business can better understand its customer segments and adapt a different marketing strategy to each of them, increasing its revenue and market share. For that we'll use the Credit Card Dataset for Clustering, available on Kaggle, which contains about 9,000 active credit card holders. Each customer has the following attributes:

* CUST_ID : Identification of Credit Card holder (Categorical)
* BALANCE : Balance amount left in their account to make purchases
* BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
* PURCHASES : Amount of purchases made from account
* ONEOFF_PURCHASES : Maximum purchase amount done in one-go
* INSTALLMENTS_PURCHASES : Amount of purchase done in installment
* CASH_ADVANCE : Cash in advance given by the user
* PURCHASES_FREQUENCY : How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
* ONEOFF_PURCHASES_FREQUENCY : How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
* PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
* CASH_ADVANCE_FREQUENCY : How frequently the cash in advance is being paid
* CASH_ADVANCE_TRX : Number of transactions made with "Cash in Advance"
* PURCHASES_TRX : Number of purchase transactions made
* CREDIT_LIMIT : Limit of Credit Card for user
* PAYMENTS : Amount of Payment done by user
* MINIMUM_PAYMENTS : Minimum amount of payments made by user
* PRC_FULL_PAYMENT : Percent of full payment paid by user
* TENURE : Tenure of credit card service for user
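Since every later cell refers to these attributes by their exact CSV column names (with underscores), it can help to keep the names in one list and check the loaded frame against it. A minimal sketch; `EXPECTED_COLUMNS` and `schema_diff` are hypothetical helpers, not part of the notebook:

```python
# Column names as the code cells in this notebook refer to them (hypothetical constant)
EXPECTED_COLUMNS = [
    "CUST_ID", "BALANCE", "BALANCE_FREQUENCY", "PURCHASES", "ONEOFF_PURCHASES",
    "INSTALLMENTS_PURCHASES", "CASH_ADVANCE", "PURCHASES_FREQUENCY",
    "ONEOFF_PURCHASES_FREQUENCY", "PURCHASES_INSTALLMENTS_FREQUENCY",
    "CASH_ADVANCE_FREQUENCY", "CASH_ADVANCE_TRX", "PURCHASES_TRX",
    "CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS", "PRC_FULL_PAYMENT", "TENURE",
]

def schema_diff(columns):
    """Return (missing, unexpected) column names compared with EXPECTED_COLUMNS."""
    actual = set(columns)
    expected = set(EXPECTED_COLUMNS)
    missing = sorted(expected - actual)      # expected but not present
    unexpected = sorted(actual - expected)   # present but not expected
    return missing, unexpected

print(schema_diff(EXPECTED_COLUMNS))       # ([], [])
print(schema_diff(["CUST_ID", "EXTRA"])[1])  # ['EXTRA']
```

After loading the CSV, `schema_diff(cc_ds.columns)` would return two empty lists if the file has exactly these columns.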

# 2. Importing Basic Libraries

In [None]:
import io
import openpyxl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 3. Data Collection

In [None]:
cc_ds = pd.read_csv('../input/ccdata/CC GENERAL.csv', encoding='latin1', sep=",")

cc_ds

# 4. Preliminary Data Exploration

In [None]:
#Checking a dataset sample

pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)
pd.options.display.float_format="{:,.2f}".format
cc_ds.sample(n=10, random_state=0)

In [None]:
#Checking dataset info by feature

cc_ds.info(verbose=True, show_counts=True)

In [None]:
#Counting the zeros in each feature (column)

(cc_ds==0).sum(axis=0).to_excel("zeros_per_feature.xlsx")
(cc_ds==0).sum(axis=0)

In [None]:
#Checking the existence of duplicated rows

cc_ds.duplicated().sum()

In [None]:
#Checking basic statistical data by feature

cc_ds.describe(include="all")

# 5. Data Cleaning

    We'll perform the following:


    1. Treat missing values:
        1.1 CREDIT_LIMIT: replace with mean
        1.2 MINIMUM_PAYMENTS: replace with mean


    2. Treat outliers (the value is replaced, the row is kept):
        2.1 PURCHASES: 49039.57 (index 550): replace with mean
        2.2 INSTALLMENTS_PURCHASES: 22500 (index 5260): replace with mean
        2.3 CASH_ADVANCE: 47137.21176 (index 2159): replace with mean
        2.4 MINIMUM_PAYMENTS: 76406.20752 (index 4376): replace with mean
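The plan above hard-codes the row index of each outlier. A minimal pandas sketch of the same two steps, assuming (hypothetically) that each listed outlier is simply its column's maximum, which the quoted values suggest but the notebook does not state. `impute_and_cap` is an illustrative name and the toy frame is made up:

```python
import numpy as np
import pandas as pd

def impute_and_cap(df, impute_cols, outlier_cols):
    """Mean-impute missing values, then replace each column's largest value with the column mean.

    Assumes (hypothetically) that the outlier to treat is each column's maximum.
    Works on a copy, so the caller's DataFrame is left unchanged.
    """
    out = df.copy()
    for col in impute_cols:
        out[col] = out[col].fillna(out[col].mean())
    for col in outlier_cols:
        idx = out[col].idxmax()              # row label of the largest value
        out.loc[idx, col] = out[col].mean()  # the mean still includes the outlier itself
    return out

# Made-up frame: one missing credit limit and one extreme purchase amount
toy = pd.DataFrame({
    "CREDIT_LIMIT": [1000.0, np.nan, 3000.0],
    "PURCHASES": [10.0, 20.0, 900.0],
})
clean = impute_and_cap(toy, ["CREDIT_LIMIT"], ["PURCHASES"])
print(clean)
```

On the real data the call would be `impute_and_cap(cc_ds, ["CREDIT_LIMIT", "MINIMUM_PAYMENTS"], ["PURCHASES", "INSTALLMENTS_PURCHASES", "CASH_ADVANCE", "MINIMUM_PAYMENTS"])`, and comparing `cc_ds[col].idxmax()` with the indices listed above would confirm or refute the maximum assumption.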

In [None]:
#1

cc_ds["CREDIT_LIMIT"] = cc_ds["CREDIT_LIMIT"].fillna(cc_ds["CREDIT_LIMIT"].mean())
cc_ds["MINIMUM_PAYMENTS"] = cc_ds["MINIMUM_PAYMENTS"].fillna(cc_ds["MINIMUM_PAYMENTS"].mean())

#2

cc_ds.loc[550, "PURCHASES"] = cc_ds["PURCHASES"].mean()
cc_ds.loc[5260, "INSTALLMENTS_PURCHASES"] = cc_ds["INSTALLMENTS_PURCHASES"].mean()
cc_ds.loc[2159, "CASH_ADVANCE"] = cc_ds["CASH_ADVANCE"].mean()
cc_ds.loc[4376, "MINIMUM_PAYMENTS"] = cc_ds["MINIMUM_PAYMENTS"].mean()

cc_ds.to_excel("cc_ds_clean.xlsx")

# 6. Data Exploration

In [None]:
#Plotting Numerical Variables: histogram with KDE, boxplot and violin plot for each feature
#(sns.distplot is deprecated in recent seaborn releases; histplot(kde=True) replaces it)
#CUST_ID is an identifier and TENURE is not plotted here

plot_features = [c for c in cc_ds.columns if c not in ("CUST_ID", "TENURE")]
for feature in plot_features:
    fig, ax = plt.subplots(1, 3)
    fig.suptitle(f"{feature} Distribution", fontsize=15)
    sns.histplot(cc_ds[feature], kde=True, ax=ax[0])
    sns.boxplot(y=cc_ds[feature], ax=ax[1])
    sns.violinplot(y=cc_ds[feature], ax=ax[2])

# 7. Correlation Analysis & Feature Selection

In [None]:
#Dropping the customer ID, which is an identifier rather than a feature

cc_ds2 = cc_ds.drop(["CUST_ID"], axis=1)

# #Plotting a Heatmap

# fig, ax = plt.subplots(1, figsize=(25,25))
# sns.heatmap(cc_ds2.corr(), annot=True, fmt=",.2f")
# plt.title("Heatmap Correlation", fontsize=20)
# plt.tick_params(labelsize=12)
# plt.xticks(rotation=90)
# plt.yticks(rotation=45)

# #Plotting a Pairplot

# sns.pairplot(cc_ds2)
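The heatmap above is commented out and every feature is kept for modelling. If the correlations were used to select features, one common rule is to drop one column from each highly correlated pair. A minimal sketch of that rule, with a hypothetical helper name and an arbitrary threshold of 0.9, shown on a made-up frame:

```python
import numpy as np
import pandas as pd

def correlated_pairs_to_drop(df, threshold=0.9):
    """Return the columns to drop, one per pair whose |Pearson r| exceeds `threshold`.

    Only the upper triangle of the correlation matrix is scanned, so for each
    offending pair (earlier column, later column) the later column is dropped.
    """
    corr = df.corr().abs()
    # Boolean mask of the strict upper triangle (k=1 excludes the diagonal)
    upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(upper_mask)  # everything outside the mask becomes NaN
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Made-up data: "b" is an exact multiple of "a", "c" is only moderately related
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})
print(correlated_pairs_to_drop(toy))                 # ['b']
print(correlated_pairs_to_drop(toy, threshold=0.7))  # ['b', 'c']
```

On this notebook's data it would be called as `correlated_pairs_to_drop(cc_ds2)`. The threshold is a judgment call, and this greedy scan drops the later column of a pair even when the earlier one is itself dropped because of a third column.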

# 8. Data Modelling

In [None]:
#Defining Xs

X_orig = cc_ds.copy()  #a copy, so adding cluster labels later does not modify cc_ds
X = cc_ds2

#Scaling all features

from sklearn.preprocessing import MinMaxScaler
sc_X = MinMaxScaler()
X_scaled = sc_X.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)
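`MinMaxScaler` maps every feature to [0, 1] with x' = (x - min) / (max - min), using each column's own minimum and maximum. A minimal NumPy sketch of that formula (the helper name is made up), handy for checking what the scaled values mean:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1] with (x - min) / (max - min).

    Constant columns (max == min) end up as 0, which is also what
    scikit-learn's MinMaxScaler produces for them.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero; (x - min) is already 0 there
    return (X - col_min) / col_range

X_demo = np.array([[1.0, 10.0, 5.0],
                   [3.0, 20.0, 5.0],
                   [5.0, 40.0, 5.0]])
X_demo_scaled = min_max_scale(X_demo)
print(X_demo_scaled.round(3))
```

Because each column is stretched by its own extremes, a single very large value squeezes the rest of that column towards 0, which is one reason to treat outliers before this step.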

# 9. Machine Learning Algorithms Implementation & Assessment

# 9.1.1 K-means

In [None]:
#Creating a K-means model and checking its Metrics

from sklearn.cluster import KMeans

#Applying the Elbow Method: computing the distortion (inertia) for a range of cluster counts

distortions = []
for i in range(1, 21):
    km = KMeans(n_clusters=i, init="random", n_init=10, max_iter=300, tol=1e-04, random_state=0)
    km.fit(X_scaled)
    distortions.append(km.inertia_)

#Plotting

plt.plot(range(1, 21), distortions, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Distortion")
plt.show()

#Applying the Silhouette Method to assess how well each point fits its own cluster compared with the nearest other cluster

from sklearn.metrics import silhouette_score
silhouette_coefficients = []
for j in range(2, 21):
    km = KMeans(n_clusters=j, init="random", n_init=10, max_iter=300, tol=1e-04, random_state=0)
    km.fit(X_scaled)
    score = silhouette_score(X_scaled, km.labels_)
    silhouette_coefficients.append(score)

#Plotting

plt.style.use("fivethirtyeight")
plt.plot(range(2, 21), silhouette_coefficients)
plt.xticks(range(2, 21))
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Coefficient")
plt.show()

#Choosing number of clusters

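n_clusters = 2
print('Chosen number of clusters: %d' % n_clusters)
#Fixed random_state so the labels are reproducible; fitting once keeps labels and score consistent
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
km.fit(X_scaled)
print("Silhouette Coefficient: %0.3f" % silhouette_score(X_scaled, km.labels_))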

#Plotting chosen number of clusters

from yellowbrick.cluster import silhouette_visualizer
silhouette_visualizer(KMeans(n_clusters=n_clusters, random_state=0), X_scaled)

#Attaching the K-means labels to the original data and exporting them
X_orig = pd.DataFrame(X_orig)
X_orig["cluster"] = km.labels_
X_orig.to_excel("model_km.xlsx")
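The "Distortion" in the elbow plot is the within-cluster sum of squared distances (`KMeans.inertia_`), and the silhouette plot shows the mean silhouette coefficient s(i) = (b - a) / max(a, b). A minimal NumPy sketch of both, for small arrays only; the helper names are illustrative and not scikit-learn API:

```python
import numpy as np

def inertia(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned centre."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diffs = X - centers[labels]  # centre of each point's cluster, row by row
    return float((diffs ** 2).sum())

def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b). Needs at least two clusters.

    a: mean distance from i to the other members of its own cluster
    b: smallest mean distance from i to the members of any other cluster
    Points in single-member clusters get s(i) = 0. Builds the full pairwise
    distance matrix, so it is only meant for small arrays.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) stays 0
        a = dist[i, same].sum() / (n_same - 1)  # self-distance is 0, so divide by n - 1
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

# Two well-separated toy clusters
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels_toy = np.array([0, 0, 1, 1])
centers_toy = np.array([[0.0, 0.5], [10.0, 0.5]])
print(inertia(X_toy, labels_toy, centers_toy))  # 1.0
print(round(silhouette(X_toy, labels_toy), 4))  # 0.9002
```

On the notebook's data, `inertia(X_scaled.values, km.labels_, km.cluster_centers_)` should closely match `km.inertia_`; for the full dataset `silhouette_score` remains the practical choice, since this sketch builds an n x n distance matrix.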

# 9.1.2 Cluster exploration

In [None]:
print("Cluster 0")
X_orig.query("cluster == 0").describe(include="all")

In [None]:
print("Cluster 1")
X_orig.query("cluster == 1").describe(include="all")

In [None]:
#Plotting Numerical Variables per cluster (one panel per cluster)
#The KDE curve is only drawn when a cluster has more than one distinct value,
#since a density cannot be estimated for a constant column

cluster_ids = sorted(X_orig["cluster"].unique())
plot_features = [c for c in X_orig.columns if c not in ("CUST_ID", "cluster")]
for feature in plot_features:
    fig, ax = plt.subplots(1, len(cluster_ids), squeeze=False)
    fig.suptitle(f"{feature} Distribution", fontsize=15)
    for k, cluster_id in enumerate(cluster_ids):
        values = X_orig.loc[X_orig["cluster"] == cluster_id, feature]
        sns.histplot(values, kde=values.nunique() > 1, ax=ax[0, k])
        ax[0, k].set_title(f"Cluster {cluster_id}")

In [None]:
# #Plotting scatter graph per pair features

# #Mapping every individual cluster to a color

# colors = ['goldenrod', 'olive', 'navy']

# vectorizer = np.vectorize(lambda x: colors[x % len(colors)])

# #Plotting each pair of features once, colored by K-means cluster

# for i in range(0, X_scaled.shape[1]):
#     for j in range(i + 1, X_scaled.shape[1]):
#         plt.scatter(X_scaled.iloc[:,i], X_scaled.iloc[:,j], c=vectorizer(km.labels_))
#         plt.xlabel(X.columns[i])
#         plt.ylabel(X.columns[j])
#         plt.show()

# 9.2 DBSCAN

In [None]:
#Creating a DBSCAN model and checking its Metrics
#OBS: we're exploring DBSCAN only as a study exercise in this project; we'll adopt K-Means

from sklearn.neighbors import NearestNeighbors

#We can calculate the distance from each point to its closest neighbour using the NearestNeighbors. The point itself is included in n_neighbors. The kneighbors method returns two arrays, one which contains the distance to the closest n_neighbors points and the other which contains the index for each of those points

neigh = NearestNeighbors(n_neighbors=2)
nbrs = neigh.fit(X_scaled)
distances, indices = nbrs.kneighbors(X_scaled)

#Sorting and plotting results

distances = np.sort(distances, axis=0)
distances = distances[:,1]
plt.plot(distances)
plt.xlabel("Points sorted by nearest-neighbour distance")
plt.ylabel("Distance to nearest neighbour (candidate eps)")
plt.show()

from sklearn.cluster import DBSCAN

#Selecting the best eps (the optimal value for epsilon will be found at the point of maximum curvature)

dbs = DBSCAN(eps=0.3)
dbs.fit(X_scaled)

#The labels_ property contains the list of clusters and their respective points

clusters = dbs.labels_

from sklearn import metrics

#Number of clusters in labels, ignoring noise (outlier) (-1) if present

n_clusters = len(set(clusters)) - (1 if -1 in clusters else 0)
n_noise_ = list(clusters).count(-1)
print('Estimated number of clusters: %d' % n_clusters)
print('Estimated number of noise points: %d' % n_noise_)
#Note: silhouette_score treats the noise label (-1) as if it were one more cluster
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X_scaled, clusters))

#Attaching the DBSCAN labels to a separate copy, so the K-means labels in X_orig are not overwritten
X_dbs = X_orig.copy()
X_dbs["cluster"] = dbs.labels_
X_dbs.to_excel("model_dbs.xlsx")
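The cell above builds the sorted nearest-neighbour distance curve and then picks eps by eye at the point of maximum curvature. A minimal NumPy sketch of both steps; `k_distance` and `knee_value` are hypothetical helpers, and the knee rule is a simplified distance-to-chord heuristic (similar in spirit to the Kneedle method), not what the notebook used to choose eps=0.3:

```python
import numpy as np

def k_distance(X, k=1):
    """Sorted distances from each point to its k-th nearest other point.

    Same values as NearestNeighbors(n_neighbors=k + 1).kneighbors(X), column k,
    sorted ascending (k=1 matches the n_neighbors=2 call above).
    Builds the full pairwise distance matrix, so it is only meant for small arrays.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    dist_sorted = np.sort(dist, axis=1)  # column 0 holds each point's zero self-distance
    return np.sort(dist_sorted[:, k])

def knee_value(curve):
    """Hypothetical knee rule: the curve value farthest from the chord joining its ends.

    Both axes are rescaled to [0, 1], so the chord is the diagonal y = x and the
    distance to it is proportional to |x - y|.
    """
    y = np.asarray(curve, dtype=float)
    if len(y) < 3 or y[-1] == y[0]:
        return float(y[-1])  # no interior point or flat curve: nothing to detect
    x = np.linspace(0.0, 1.0, len(y))
    y_norm = (y - y[0]) / (y[-1] - y[0])
    return float(y[np.argmax(np.abs(x - y_norm))])

# Four close points and one far-away point (a likely noise point for DBSCAN)
X_toy = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])
kd = k_distance(X_toy)
print(kd.round(3))               # [0.1 0.1 0.1 0.1 4.7]
print(round(knee_value(kd), 3))  # 0.1
```

On the real data, the sorted `distances` array from the cell above could be passed straight to `knee_value(distances)` as one way to pick eps instead of reading it off the plot; whether that agrees with eps=0.3 has not been checked.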

# 10. Conclusions


In this exercise we went through the whole process: collecting data, exploring features and distributions, treating data, understanding correlations, selecting relevant features, modelling the data and presenting a clustering model. The model indicates groups of customers with similar characteristics to be explored, as explained below, so the credit card company can better understand its customer segments and adapt a different marketing strategy to each of them, bringing more revenue and market share to the business.

First group of clients: the first group is composed of the more conservative customers. Compared with the second group, these customers have a 20% higher balance, an 82% lower purchase amount, a 74% lower one-off purchase amount and a 93% lower installment purchase amount. At the same time, they have a 138% higher cash advance amount. Although these clients are the minority, they need to be kept: they are good payers and have their finances under control, and they also have the potential to grow when the right products are offered to them.

Second group of clients: the second group accounts for 85% of the purchases. Unlike the first group, these customers have a lower balance but much higher purchase amounts and lower cash advances. This group is more dynamic and less consistent in its behavior; it is the core of the business and can be continually encouraged to adopt new products, while keeping an eye on its payment capacity.
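The notebook does not show how the percentages above were derived. Assuming they compare per-cluster feature means with the second group as the baseline, the arithmetic is (group mean - baseline mean) / baseline mean x 100, so a value of -82 reads as "82% lower". A minimal sketch; `relative_difference` is a hypothetical helper and the toy numbers are invented:

```python
import pandas as pd

def relative_difference(profile, group, baseline):
    """Percent difference of one cluster's feature means relative to another's: (g - b) / b * 100."""
    return (profile.loc[group] - profile.loc[baseline]) / profile.loc[baseline] * 100

# Made-up numbers, only to show the arithmetic
toy = pd.DataFrame({
    "cluster":   [0, 0, 1, 1],
    "BALANCE":   [1400.0, 1600.0, 900.0, 1100.0],
    "PURCHASES": [200.0, 300.0, 900.0, 1100.0],
})
profile = toy.groupby("cluster").mean()  # one row of feature means per cluster
diff = relative_difference(profile, group=0, baseline=1)
print(diff.round(1))  # BALANCE 50.0 (higher), PURCHASES -75.0 (lower)
```

With the notebook's frame, `X_orig.drop(columns="CUST_ID").groupby("cluster").mean()` would give the per-cluster means; dropping the text ID column avoids a type error in recent pandas versions.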