# Segmenting Customers

You have been asked to review how the customer ratings data looks when modeled with 3 and 4 clusters.

Using the information in this notebook, apply the K-means algorithm to the service_ratings data with both 3 and 4 clusters to segment the customers.

In [15]:
# Import the modules
import pandas as pd
from pathlib import Path
import hvplot.pandas

# Import the K-means algorithm
from sklearn.cluster import KMeans

In [16]:
# Read in the CSV file as a Pandas DataFrame
service_ratings_df = pd.read_csv(
    Path("../Resources/service_ratings.csv")
)

# Review the DataFrame
service_ratings_df.head()

,mobile_app_rating,personal_banker_rating
0,3.5,2.4
1,3.65,3.14
2,2.9,2.75
3,2.93,3.36
4,2.89,2.62


In [17]:
# Visualize a scatter plot of the data
service_ratings_df.hvplot.scatter(
    x="mobile_app_rating", 
    y="personal_banker_rating"
)

## Run the K-means model with 3 clusters

In [18]:
# Create and initialize the K-means model instance for 3 clusters
# Set the random_state parameter to 1 so the results are reproducible
model = KMeans(n_clusters=3, random_state=1)

# Print the model
model

KMeans(n_clusters=3, random_state=1)

In [19]:
# Fit the data to the instance of the model
model.fit(service_ratings_df)

KMeans(n_clusters=3, random_state=1)
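
Once a model is fitted, its cluster centers give a quick read on what each segment represents. The sketch below is illustrative only: it builds a small DataFrame (`sample_df`, a hypothetical name) from the nine rows displayed in this notebook instead of loading the full CSV, and it sets `n_init=10` explicitly, an assumption added here so the behavior is consistent across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Nine rows copied from the DataFrame previews in this notebook (not the full data set)
sample_df = pd.DataFrame({
    "mobile_app_rating": [3.50, 3.65, 2.90, 2.93, 2.89, 3.44, 2.40, 3.25, 3.50],
    "personal_banker_rating": [2.40, 3.14, 2.75, 3.36, 2.62, 3.00, 2.80, 2.88, 2.40],
})

# Fit a 3-cluster model; n_init is set explicitly (an assumption, not in the original cell)
sample_model = KMeans(n_clusters=3, random_state=1, n_init=10).fit(sample_df)

# Each row is one cluster center, expressed in the original rating units
centers_df = pd.DataFrame(sample_model.cluster_centers_, columns=sample_df.columns)
print(centers_df.round(2))
```

With the full data, the same two lines applied to `model` and `service_ratings_df` would show the average ratings that define each segment.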

In [22]:
# Make predictions about the data clusters using the trained model
customer_segment3 = model.predict(service_ratings_df)

# Print the predictions
print(customer_segment3)

[0 1 2 2 2 2 1 1 2 2 1 0 1 1 0 1 2 1 2 2 1 1 2 1 2 0 2 1 2 0 1 2 1 0 1 2 2
 2 1 2 2 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 2 0 2 1 2 1 2 1 2 2 0 0 2 1 0 2 2 1
 2 1 0 2 1 1 1 0 2 1 1 1 2 1 0 2 1 2 1 1 0 1 0 1 2 1 0 2 1 2 1 1 2 2 1 1 2
 1 1 1 0 1 2 2 2 1 0 1 2 2 2 2 1 1 2 2 2 1 1 1 2 0 2 1 1 1 1 2 2 1 1 1 2 0
 1 2 2 2 0 1 1 1 0 1 1 1 2 1 0 2 1 2 2 2 2 0 1 2 0 1 1 2 1 1 1 2 1 0 1]
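
A long array of labels is hard to read at a glance, so counting the members of each segment can help. The following is a minimal sketch that uses only the first ten labels from the printed output above; `sample_labels` and `segment_counts` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First ten labels copied from the printed predictions above (illustration only)
sample_labels = np.array([0, 1, 2, 2, 2, 2, 1, 1, 2, 2])

# Count how many customers fall into each segment, ordered by segment label
segment_counts = (
    pd.Series(sample_labels, name="customer_segment3")
    .value_counts()
    .sort_index()
)
print(segment_counts)
```

In the notebook itself, passing `customer_segment3` in place of `sample_labels` should give the counts for all customers.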


In [24]:
# Create a copy of the DataFrame named service_ratings_predictions_df
service_ratings_predictions_df = service_ratings_df.copy()

# Add a column to the DataFrame that contains the customer_segment information
service_ratings_predictions_df['customer_segment3'] = customer_segment3

# Review the DataFrame
service_ratings_predictions_df

,mobile_app_rating,personal_banker_rating,customer_segment3
0,3.50,2.40,0
1,3.65,3.14,1
2,2.90,2.75,2
3,2.93,3.36,2
4,2.89,2.62,2
...,...,...,...
178,3.44,3.00,1
179,2.40,2.80,2
180,3.25,2.88,1
181,3.50,2.40,0


In [25]:
# Confirm the column names
service_ratings_predictions_df.columns

Index(['mobile_app_rating', 'personal_banker_rating', 'customer_segment3'], dtype='object')

In [27]:
# Plot the data points, colored by the 3-cluster customer segment
service_ratings_predictions_df.hvplot.scatter(
    x='mobile_app_rating',
    y='personal_banker_rating',
    by='customer_segment3'
)

## Run the K-means model with 4 clusters

In [28]:
# Create and initialize the K-means model instance for 4 clusters
model = KMeans(n_clusters=4, random_state=1)

# Print the model
model

KMeans(n_clusters=4, random_state=1)

In [29]:
# Fit the data to the instance of the model
model.fit(service_ratings_df)

KMeans(n_clusters=4, random_state=1)
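
One possible numeric complement to the plots is the fitted model's `inertia_` attribute, the within-cluster sum of squared distances. The sketch below compares it for 3 and 4 clusters on the nine rows displayed in this notebook, not the full data set; `sample_df` and `inertia_by_k` are hypothetical names, and `n_init=10` is an assumption added for consistent behavior across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Nine rows copied from the DataFrame previews in this notebook (not the full data set)
sample_df = pd.DataFrame({
    "mobile_app_rating": [3.50, 3.65, 2.90, 2.93, 2.89, 3.44, 2.40, 3.25, 3.50],
    "personal_banker_rating": [2.40, 3.14, 2.75, 3.36, 2.62, 3.00, 2.80, 2.88, 2.40],
})

# Fit one model per cluster count and record its inertia
inertia_by_k = {}
for k in (3, 4):
    fitted = KMeans(n_clusters=k, random_state=1, n_init=10).fit(sample_df)
    inertia_by_k[k] = fitted.inertia_

print(inertia_by_k)
```

Inertia tends to fall as clusters are added, so a lower value for 4 clusters does not by itself mean 4 is the better choice; it is best read alongside the scatter plots.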

In [30]:
# Make predictions about the data clusters using the trained model
customer_segment4 = model.predict(service_ratings_df)

# Print the predictions
print(customer_segment4)

[3 1 0 0 0 0 0 1 0 2 0 0 1 0 3 1 0 1 0 0 1 1 0 1 0 3 2 1 2 3 1 0 1 3 1 0 0
 0 1 0 0 1 1 0 1 0 0 1 1 1 0 1 1 1 0 1 0 3 0 0 0 1 0 0 0 0 3 3 0 0 3 2 0 1
 2 1 3 0 1 1 1 3 2 1 1 0 2 1 3 2 1 0 0 1 3 1 3 1 0 0 3 0 1 0 1 1 2 2 1 1 0
 0 1 1 3 1 2 2 0 1 3 0 0 0 0 0 0 1 0 0 0 1 1 1 0 3 0 1 1 0 1 0 0 1 1 0 0 0
 1 0 2 0 3 1 1 1 3 1 1 1 0 1 3 0 1 0 0 0 2 3 1 0 3 0 0 2 1 3 1 0 0 3 1]


In [31]:
# Add a column to the service_ratings_predictions_df DataFrame that contains the customer_segment information
service_ratings_predictions_df['customer_segment4'] = customer_segment4

# Review the DataFrame
service_ratings_predictions_df

,mobile_app_rating,personal_banker_rating,customer_segment3,customer_segment4
0,3.50,2.40,0,3
1,3.65,3.14,1,1
2,2.90,2.75,2,0
3,2.93,3.36,2,0
4,2.89,2.62,2,0
...,...,...,...,...
178,3.44,3.00,1,1
179,2.40,2.80,2,0
180,3.25,2.88,1,0
181,3.50,2.40,0,3
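
With both label columns in one DataFrame, a cross-tabulation can show how the 3-cluster segments are redistributed when a fourth cluster is added. This is a minimal sketch using only the nine rows displayed in the table above; `sample_segments` and `segment_overlap` are illustrative names.

```python
import pandas as pd

# Segment labels for the nine rows displayed in the table above (not the full data set)
sample_segments = pd.DataFrame({
    "customer_segment3": [0, 1, 2, 2, 2, 1, 2, 1, 0],
    "customer_segment4": [3, 1, 0, 0, 0, 1, 0, 0, 3],
})

# Rows are 3-cluster segments, columns are 4-cluster segments, cells are customer counts
segment_overlap = pd.crosstab(
    sample_segments["customer_segment3"],
    sample_segments["customer_segment4"],
)
print(segment_overlap)
```

Running the same `pd.crosstab` call on `service_ratings_predictions_df` should show which 3-cluster segment is split to form the new fourth segment.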


In [32]:
# Confirm the column names
service_ratings_predictions_df.columns

Index(['mobile_app_rating', 'personal_banker_rating', 'customer_segment3',
       'customer_segment4'],
      dtype='object')

In [35]:
# Plot the data points, colored by the 3-cluster customer segment
service_ratings_predictions_df.hvplot.scatter(
    x='mobile_app_rating',
    y='personal_banker_rating',
    by='customer_segment3'
)

In [36]:
# Plot the data points, colored by the 4-cluster customer segment
service_ratings_predictions_df.hvplot.scatter(
    x='mobile_app_rating',
    y='personal_banker_rating',
    by='customer_segment4'
)

## Answer the following question

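**Question:** Can any additional information be gleaned from the customer segmentation data when 3 and 4 clusters are applied?

**Answer:** Yes. The 4-cluster model separates out a small group in the top-left quadrant of the plot: customers who rate the personal banker highly but rate the mobile app poorly. In the 3-cluster model, these customers are merged into a larger segment, so this preference is harder to see.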

In [47]:
# Summarize the ratings within each of the 4 customer segments
service_ratings_predictions_df.groupby('customer_segment4').describe()

,mobile_app_rating,mobile_app_rating,mobile_app_rating,mobile_app_rating,mobile_app_rating,mobile_app_rating,mobile_app_rating,mobile_app_rating,personal_banker_rating,personal_banker_rating,personal_banker_rating,personal_banker_rating,personal_banker_rating,customer_segment3,customer_segment3,customer_segment3,customer_segment3,customer_segment3,customer_segment3,customer_segment3,customer_segment3
customer_segment4,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
0,76.0,2.952368,0.288078,2.0,2.83,3.0,3.2025,3.38,76.0,2.813026,...,3.0,3.59,76.0,1.697368,0.516907,0.0,1.0,2.0,2.0,2.0
1,67.0,3.757761,0.353488,3.34,3.495,3.67,3.915,4.67,67.0,2.950448,...,3.09,3.8,67.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
2,15.0,2.106667,0.615347,0.88,1.875,2.17,2.58,2.89,15.0,3.694,...,4.03,4.83,15.0,2.0,0.0,2.0,2.0,2.0,2.0,2.0
3,25.0,3.6796,0.312976,3.19,3.46,3.67,3.96,4.27,25.0,1.944,...,2.33,2.45,25.0,0.04,0.2,0.0,0.0,0.0,0.0,1.0
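
The `describe()` output is wide, elides some columns, and includes statistics for the `customer_segment3` label column. A narrower summary limited to the two rating columns may be easier to read. The sketch below assumes a small stand-in DataFrame (`sample_predictions_df`, a hypothetical name) built from the nine rows displayed earlier, so its numbers will differ from the full-data table above.

```python
import pandas as pd

# Nine rows copied from the DataFrame previews in this notebook (not the full data set)
sample_predictions_df = pd.DataFrame({
    "mobile_app_rating": [3.50, 3.65, 2.90, 2.93, 2.89, 3.44, 2.40, 3.25, 3.50],
    "personal_banker_rating": [2.40, 3.14, 2.75, 3.36, 2.62, 3.00, 2.80, 2.88, 2.40],
    "customer_segment4": [3, 1, 0, 0, 0, 1, 0, 0, 3],
})

# Restrict the summary to the rating columns and to two statistics per column
rating_columns = ["mobile_app_rating", "personal_banker_rating"]
segment_summary = (
    sample_predictions_df
    .groupby("customer_segment4")[rating_columns]
    .agg(["count", "mean"])
    .round(2)
)
print(segment_summary)
```

Selecting the rating columns before aggregating keeps the label column from the 3-cluster model out of the summary.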
