# Segmenting Customers

You have been asked to review how the customer ratings data looks when modeled with 3 and 4 clusters.

Using the information contained in this notebook, apply the K-means algorithm to the `service_ratings` data with both 3 and 4 clusters to segment the customers.
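
K-means alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The toy NumPy sketch below illustrates that loop (Lloyd's algorithm). It is not scikit-learn's `KMeans`, which additionally uses k-means++ initialization and several restarts (`n_init`); the function names and sample ratings are made up for illustration.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return the index of the closest centroid (squared Euclidean distance) for each row of X."""
    sq_dist = ((X[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def kmeans_lloyd(X, k, init=None, max_iter=100, seed=1):
    """Toy Lloyd's-algorithm K-means (hypothetical helper, not scikit-learn's implementation)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        # Start from k distinct data points chosen at random
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = assign_to_nearest(X, centroids)
        # Update step: each centroid moves to the mean of its points (kept in place if it has none)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    # Final assignment so the returned labels always match the returned centroids
    return assign_to_nearest(X, centroids), centroids


# Usage with made-up ratings that form two obvious groups
ratings = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
                    [4.0, 4.1], [4.2, 3.9], [3.9, 4.0]])
labels, centroids = kmeans_lloyd(ratings, k=2, init=[[1.0, 1.0], [4.0, 4.0]])
print(labels)  # [0 0 0 1 1 1]
```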

In [9]:
# Import the modules
import pandas as pd
from pathlib import Path
import hvplot.pandas

# Import the K-means algorithm
from sklearn.cluster import KMeans

In [10]:
# Read in the CSV file as a Pandas DataFrame
service_ratings_df = pd.read_csv(
    Path("../Resources/service_ratings.csv")
)

# Review the DataFrame
service_ratings_df.head()

|   | mobile_app_rating | personal_banker_rating |
|---|---|---|
| 0 | 3.5 | 2.4 |
| 1 | 3.65 | 3.14 |
| 2 | 2.9 | 2.75 |
| 3 | 2.93 | 3.36 |
| 4 | 2.89 | 2.62 |


In [11]:
# Visualize a scatter plot of the data
service_ratings_df.hvplot.scatter(
    x="mobile_app_rating", 
    y="personal_banker_rating"
)

## Run the k-means model with 3 clusters

In [12]:
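# Create and initialize the K-means model instance for 3 clusters
# Set the random_state parameter to 1 for reproducible results, and set
# n_init=10 explicitly to keep the previous default and silence the warning
# about its default value changing
k_model = KMeans(n_clusters=3, n_init=10, random_state=1)

# Print the model
print(k_model)

KMeans(n_clusters=3, n_init=10, random_state=1)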


In [13]:
# Fit the data to the instance of the model
k_model.fit(service_ratings_df)



In [14]:
# Make predictions about the data clusters using the trained model
cluster_segment_3 = k_model.predict(service_ratings_df)

# Print the predictions
print(cluster_segment_3)

[0 2 1 1 1 1 2 2 1 1 2 0 2 2 0 2 1 2 1 1 2 2 1 2 1 0 1 2 1 0 2 1 2 0 2 1 1
 1 2 1 1 2 2 2 2 1 1 2 2 2 1 2 2 2 2 2 1 0 1 2 1 2 1 2 1 1 0 0 1 2 0 1 1 2
 1 2 0 1 2 2 2 0 1 2 2 2 1 2 0 1 2 1 2 2 0 2 0 2 1 2 0 1 2 1 2 2 1 1 2 2 1
 2 2 2 0 2 1 1 1 2 0 2 1 1 1 1 2 2 1 1 1 2 2 2 1 0 1 2 2 2 2 1 1 2 2 2 1 0
 2 1 1 1 0 2 2 2 0 2 2 2 1 2 0 1 2 1 1 1 1 0 2 1 0 2 2 1 2 2 2 1 2 0 2]

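After fitting, scikit-learn also stores the training-set assignments in the `labels_` attribute, and `fit_predict()` combines the fit and predict steps in one call. The sketch below checks on made-up data that these routes agree; `demo_df` is synthetic and only stands in for `service_ratings_df`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical ratings standing in for service_ratings.csv: three well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[1.5, 1.5], [3.0, 3.0], [4.5, 1.5]])
ratings = np.vstack([c + rng.normal(scale=0.2, size=(20, 2)) for c in centers])
demo_df = pd.DataFrame(ratings, columns=["mobile_app_rating", "personal_banker_rating"])

model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(demo_df)

# On the training data, predict() reproduces the assignments that fit() stored in labels_
predicted = model.predict(demo_df)
print((predicted == model.labels_).all())

# fit_predict() fits a fresh model and returns its labels in one call;
# the adjusted Rand index is 1.0 when two labelings group the points identically
one_step = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(demo_df)
print(adjusted_rand_score(predicted, one_step))
```
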

In [15]:
# Create a copy of the DataFrame and name it service_ratings_predictions_df
service_ratings_predictions_df = service_ratings_df.copy()

# Add a 'segment 3' column that contains the 3-cluster segment labels
service_ratings_predictions_df['segment 3'] = cluster_segment_3

# Review the DataFrame
service_ratings_predictions_df.head()

|   | mobile_app_rating | personal_banker_rating | segment 3 |
|---|---|---|---|
| 0 | 3.5 | 2.4 | 0 |
| 1 | 3.65 | 3.14 | 2 |
| 2 | 2.9 | 2.75 | 1 |
| 3 | 2.93 | 3.36 | 1 |
| 4 | 2.89 | 2.62 | 1 |

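A quick way to describe what each segment represents is to average the ratings per segment and count its customers. The sketch below does this with `groupby` on synthetic data (hypothetical values, not the course CSV); for a converged K-means fit, those per-segment means should match `cluster_centers_` up to floating-point error.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for service_ratings_df (synthetic, not the course data)
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "mobile_app_rating": np.concatenate([rng.normal(m, 0.15, 15) for m in (1.5, 3.0, 4.5)]),
    "personal_banker_rating": np.concatenate([rng.normal(m, 0.15, 15) for m in (4.0, 2.0, 4.0)]),
})

model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(demo_df)
profile_df = demo_df.copy()
profile_df["segment 3"] = model.labels_

# Average ratings and size of each segment
summary = profile_df.groupby("segment 3").agg(
    mobile_app_rating=("mobile_app_rating", "mean"),
    personal_banker_rating=("personal_banker_rating", "mean"),
    customers=("mobile_app_rating", "size"),
)
print(summary)

# The per-segment means should match the fitted centroids (row i = segment i)
print(model.cluster_centers_)
```
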

In [16]:
# Plot the customer ratings, colored by their 3-cluster segment
service_ratings_predictions_df.hvplot.scatter(
    x='mobile_app_rating', 
    y='personal_banker_rating',
    by='segment 3'
)

## Run the k-means model with 4 clusters

In [17]:
# Create and initialize the K-means model instance for 4 clusters
# (n_init=10 set explicitly, as in the 3-cluster model)
k_model = KMeans(n_clusters=4, n_init=10, random_state=1)

# Print the model
print(k_model)

KMeans(n_clusters=4, n_init=10, random_state=1)


In [18]:
# Fit the data to the instance of the model
k_model.fit(service_ratings_df)



In [19]:
# Make predictions about the data clusters using the trained model
cluster_segment_4 = k_model.predict(service_ratings_df)

# Print the predictions
print(cluster_segment_4)

[1 3 2 2 2 2 3 3 2 0 2 2 3 2 1 3 2 3 2 2 3 3 2 3 2 1 0 3 0 1 3 2 3 1 3 2 2
 2 3 0 2 3 3 2 3 2 2 3 3 3 2 3 3 3 2 3 2 1 2 2 2 3 2 3 2 2 1 1 2 2 1 0 2 3
 0 3 1 2 3 3 3 1 0 3 3 2 0 3 1 0 3 2 2 3 1 3 1 3 2 2 1 2 3 2 3 3 0 0 3 3 2
 2 3 3 1 3 0 0 2 3 1 2 2 2 2 2 3 3 2 2 2 3 3 3 2 1 2 3 3 2 3 2 2 3 3 3 2 2
 3 2 0 2 1 3 3 3 1 3 3 3 2 3 1 2 3 2 2 2 0 1 3 2 1 2 3 0 3 3 3 2 2 1 3]


In [20]:
# Add a 'segment 4' column to service_ratings_predictions_df that contains the 4-cluster segment labels
service_ratings_predictions_df['segment 4'] = cluster_segment_4

# Review the DataFrame
service_ratings_predictions_df.head()

|   | mobile_app_rating | personal_banker_rating | segment 3 | segment 4 |
|---|---|---|---|---|
| 0 | 3.5 | 2.4 | 0 | 1 |
| 1 | 3.65 | 3.14 | 2 | 3 |
| 2 | 2.9 | 2.75 | 1 | 2 |
| 3 | 2.93 | 3.36 | 1 | 2 |
| 4 | 2.89 | 2.62 | 1 | 2 |

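The label numbers are arbitrary in each model (cluster 0 of the 3-cluster model has no relation to cluster 0 of the 4-cluster model), so comparing the columns row by row is hard to read. A cross-tabulation shows which 3-cluster segments each 4-cluster segment draws its customers from. The sketch below uses `pd.crosstab` on synthetic uniform ratings (hypothetical data, not the course CSV).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic ratings standing in for service_ratings_df (hypothetical data)
rng = np.random.default_rng(7)
demo_df = pd.DataFrame(
    rng.uniform(1, 5, size=(200, 2)),
    columns=["mobile_app_rating", "personal_banker_rating"],
)

# Fit both models on the rating columns only, then store their labels side by side
segments = demo_df.copy()
for k in (3, 4):
    segments[f"segment {k}"] = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(demo_df)

# Rows: 3-cluster labels; columns: 4-cluster labels; cells: number of customers
movement = pd.crosstab(segments["segment 3"], segments["segment 4"])
print(movement)
```
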

In [21]:
# Plot the customer ratings, colored by their 4-cluster segment
service_ratings_predictions_df.hvplot.scatter(
    x='mobile_app_rating', 
    y='personal_banker_rating',
    by='segment 4'
)
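
The scatter plots compare 3 and 4 clusters visually. One way to back that comparison with numbers is to score several values of k: the "elbow" in `inertia_` and the peak `silhouette_score` are common heuristics. The sketch below runs them on synthetic data with four loose groups (made-up values; its results say nothing about `service_ratings.csv`).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for service_ratings_df (hypothetical data with four loose groups)
rng = np.random.default_rng(3)
group_centers = [(1.5, 1.5), (1.5, 4.0), (4.0, 1.5), (4.0, 4.0)]
demo_df = pd.DataFrame(
    np.vstack([rng.normal(c, 0.3, size=(25, 2)) for c in group_centers]),
    columns=["mobile_app_rating", "personal_banker_rating"],
)

rows = []
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(demo_df)
    rows.append({
        "k": k,
        # Inertia: sum of squared distances to the nearest centroid (generally drops as k grows)
        "inertia": model.inertia_,
        # Silhouette: mean score in [-1, 1] comparing cohesion with separation; higher is better
        "silhouette": silhouette_score(demo_df, model.labels_),
    })

scores_df = pd.DataFrame(rows)
print(scores_df)
```
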

## Answer the following question

**Question:** Can any additional information be gleaned from the customer segmentation data when clusters of 3 and 4 are applied?

**Answer:** Yes. Adding clusters separates customers into more narrowly defined groups, which can reveal additional opportunities to tailor services to what each group of customers would like to receive.