# Segmenting Customers

You have been asked to review how the customer ratings data looks when modeled with 3 and 4 clusters.

Using the information contained in this notebook, apply the K-means algorithm to the `service_ratings` data twice, once with 3 clusters and once with 4, to segment the customers.

In [48]:

import pandas as pd
from pathlib import Path
# Importing hvplot.pandas registers the .hvplot accessor on DataFrames
import hvplot.pandas

# Import the K-means algorithm
from sklearn.cluster import KMeans

In [49]:
# Read the service ratings data into a DataFrame
service_ratings_df = pd.read_csv(
    Path("../Resources/service_ratings.csv"))

# Preview the DataFrame
service_ratings_df

,mobile_app_rating,personal_banker_rating
0,3.50,2.40
1,3.65,3.14
2,2.90,2.75
3,2.93,3.36
4,2.89,2.62
...,...,...
178,3.44,3.00
179,2.40,2.80
180,3.25,2.88
181,3.50,2.40


In [50]:
# Plot the two ratings against each other before clustering
service_ratings_df.hvplot.scatter(
    x="mobile_app_rating",
    y="personal_banker_rating",
)

## Run the K-means model with 3 clusters

In [51]:
# Create a K-means model with 3 clusters; random_state makes the run reproducible
model = KMeans(n_clusters=3, random_state=1)

model

In [52]:
# Fit the model to the ratings data
model.fit(service_ratings_df)



In [53]:
# Assign each customer to one of the 3 segments
customer_segment_3 = model.predict(service_ratings_df)

customer_segment_3

array([0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 0, 1, 1, 0, 1, 2, 1, 2, 2, 1, 1,
       2, 1, 2, 0, 2, 1, 2, 0, 1, 2, 1, 0, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 0, 2, 1, 2, 1, 2, 1, 2, 2,
       0, 0, 2, 1, 0, 2, 2, 1, 2, 1, 0, 2, 1, 1, 1, 0, 2, 1, 1, 1, 2, 1,
       0, 2, 1, 2, 1, 1, 0, 1, 0, 1, 2, 1, 0, 2, 1, 2, 1, 1, 2, 2, 1, 1,
       2, 1, 1, 1, 0, 1, 2, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1,
       1, 1, 2, 0, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 0, 1, 2, 2, 2, 0, 1,
       1, 1, 0, 1, 1, 1, 2, 1, 0, 2, 1, 2, 2, 2, 2, 0, 1, 2, 0, 1, 1, 2,
       1, 1, 1, 2, 1, 0, 1], dtype=int32)
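
A raw label array is hard to read at a glance. As a minimal sketch, the segment sizes can be tallied with NumPy. The `labels` array here is a stand-in holding only the first 22 labels from the output above; in the notebook you would pass `customer_segment_3` instead.

```python
import numpy as np

# Stand-in: the first 22 labels from the output above.
# In the notebook, use customer_segment_3 instead.
labels = np.array(
    [0, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 0, 1, 1, 0, 1, 2, 1, 2, 2, 1, 1]
)

# Count how many customers fall into each segment.
segments, counts = np.unique(labels, return_counts=True)
segment_sizes = dict(zip(segments.tolist(), counts.tolist()))
print(segment_sizes)  # {0: 3, 1: 10, 2: 9}
```

The label numbers themselves carry no meaning; only the grouping does.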

In [56]:
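# Copy the original DataFrame so the source data stays unchanged
service_ratings_prediction_df = service_ratings_df.copy()

# Add the 3-cluster segment labels as a new column
service_ratings_prediction_df['customer_segment_3'] = customer_segment_3

service_ratings_prediction_df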

,mobile_app_rating,personal_banker_rating,customer_segment_3
0,3.50,2.40,0
1,3.65,3.14,1
2,2.90,2.75,2
3,2.93,3.36,2
4,2.89,2.62,2
...,...,...,...
178,3.44,3.00,1
179,2.40,2.80,2
180,3.25,2.88,1
181,3.50,2.40,0


In [59]:
# Plot the ratings again, colored by the 3-cluster segment
service_ratings_prediction_df.hvplot.scatter(
    x="mobile_app_rating",
    y="personal_banker_rating",
    by="customer_segment_3"
)
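
The scatter plot shows where each segment sits; the fitted centroids give the same picture as numbers. The sketch below is a stand-in, not the notebook's result: it fits `KMeans` on randomly generated ratings (`demo_df` and `demo_model` are hypothetical names, and the data is not the real `service_ratings.csv`) so that it runs on its own. In the notebook, the equivalent attribute would be `model.cluster_centers_` on the fitted 3-cluster model.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in data: random ratings, NOT the real service_ratings.csv.
rng = np.random.default_rng(1)
demo_df = pd.DataFrame({
    "mobile_app_rating": rng.uniform(2.0, 4.0, size=60),
    "personal_banker_rating": rng.uniform(2.0, 4.0, size=60),
})

# n_init is set explicitly because its default differs across scikit-learn versions.
demo_model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(demo_df)

# One row per segment, one column per rating: the "average customer" of each segment.
centers_df = pd.DataFrame(demo_model.cluster_centers_, columns=demo_df.columns)
centers_df.index.name = "customer_segment_3"
print(centers_df.round(2))
```

Reading the centroids side by side makes it easier to describe a segment in words, for example "rates the app high but the personal banker low".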

## Run the K-means model with 4 clusters

In [63]:
# Create a K-means model with 4 clusters, using the same random_state as before
model = KMeans(n_clusters=4, random_state=1)

model

In [64]:
# Fit the 4-cluster model to the ratings data
model.fit(service_ratings_df)



In [66]:
# Assign each customer to one of the 4 segments
customer_segment_4 = model.predict(service_ratings_df)

customer_segment_4

array([3, 1, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 3, 1, 0, 1, 0, 0, 1, 1,
       0, 1, 0, 3, 2, 1, 2, 3, 1, 0, 1, 3, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0,
       1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0,
       3, 3, 0, 0, 3, 2, 0, 1, 2, 1, 3, 0, 1, 1, 1, 3, 2, 1, 1, 0, 2, 1,
       3, 2, 1, 0, 0, 1, 3, 1, 3, 1, 0, 0, 3, 0, 1, 0, 1, 1, 2, 2, 1, 1,
       0, 0, 1, 1, 3, 1, 2, 2, 0, 1, 3, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       1, 1, 0, 3, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 2, 0, 3, 1,
       1, 1, 3, 1, 1, 1, 0, 1, 3, 0, 1, 0, 0, 0, 2, 3, 1, 0, 3, 0, 0, 2,
       1, 3, 1, 0, 0, 3, 1], dtype=int32)

In [68]:
# Add the 4-cluster segment labels alongside the 3-cluster labels
service_ratings_prediction_df['customer_segment_4'] = customer_segment_4

service_ratings_prediction_df

,mobile_app_rating,personal_banker_rating,customer_segment_3,customer_segment_4
0,3.50,2.40,0,3
1,3.65,3.14,1,1
2,2.90,2.75,2,0
3,2.93,3.36,2,0
4,2.89,2.62,2,0
...,...,...,...,...
178,3.44,3.00,1,1
179,2.40,2.80,2,0
180,3.25,2.88,1,0
181,3.50,2.40,0,3
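
With both label columns in one DataFrame, a cross-tabulation is one way to see how the 3-cluster segments map onto the 4-cluster segments. The sketch below uses only the nine rows visible in the table above (`preview_df` is a hypothetical name); in the notebook you would pass the two columns of `service_ratings_prediction_df` instead.

```python
import pandas as pd

# The nine rows visible in the table above (the elided rows are not included).
preview_df = pd.DataFrame({
    "customer_segment_3": [0, 1, 2, 2, 2, 1, 2, 1, 0],
    "customer_segment_4": [3, 1, 0, 0, 0, 1, 0, 0, 3],
})

# Rows: 3-cluster label; columns: 4-cluster label; cells: number of customers.
overlap = pd.crosstab(
    preview_df["customer_segment_3"],
    preview_df["customer_segment_4"],
)
print(overlap)
```

Label numbers are assigned arbitrarily by each fit, so segment 0 in one model need not correspond to segment 0 in the other. The crosstab shows which groups stay together and which are split or merged.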


In [69]:
# Plot the ratings again, colored by the 4-cluster segment
service_ratings_prediction_df.hvplot.scatter(
    x="mobile_app_rating",
    y="personal_banker_rating",
    by="customer_segment_4"
)
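
Beyond comparing the two plots by eye, a numeric comparison can help when deciding between 3 and 4 clusters. This notebook does not prescribe a metric; the sketch below assumes two common ones, inertia (within-cluster sum of squared distances) and the silhouette score, and runs on randomly generated stand-in data (`demo_df` is hypothetical, not the real `service_ratings.csv`).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data: random ratings, NOT the real service_ratings.csv.
rng = np.random.default_rng(1)
demo_df = pd.DataFrame({
    "mobile_app_rating": rng.uniform(2.0, 4.0, size=120),
    "personal_banker_rating": rng.uniform(2.0, 4.0, size=120),
})

scores = {}
for k in (3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=1)
    labels = km.fit_predict(demo_df)
    scores[k] = {
        "inertia": km.inertia_,                           # lower means tighter clusters
        "silhouette": silhouette_score(demo_df, labels),  # closer to 1 means better separated
    }

# One row per value of k, one column per metric.
print(pd.DataFrame(scores).T)
```

Inertia generally falls as clusters are added, so it is most useful alongside the silhouette score and the plots rather than on its own.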

## Answer the following question

**Question:** Can any additional information be gleaned from the customer segmentation data when it is modeled with 3 clusters and with 4 clusters?

**Answer:** # YOUR ANSWER HERE