# Exploring Customer Segmentation

In this activity, you are tasked with profiling customer groups for a large telecommunications company. The data provided contains information on customers' purchasing and usage behavior with the telecom products. Your goal is to use PCA and clustering to segment these customers into meaningful groups and report back your findings.

Because these results need to be interpretable, it is important to keep the number of clusters reasonable.  Think about how you might represent some of the non-numeric features so that they can be included in your segmentation models.  You are to report back your approach and findings to the class.  Be specific about what features were used and how you interpret the resulting clusters.
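One common way to represent non-numeric features is one-hot encoding. The sketch below uses a small toy frame rather than the real data; the column names come from the dataset, but the category values other than those visible in the output below (for example "One Year", "Two Year", "Mailed Check") are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for a few rows of the telco data (values are illustrative only).
toy = pd.DataFrame(
    {
        "Contract": ["Month-to-Month", "One Year", "Two Year", "Month-to-Month"],
        "Payment Method": ["Bank Withdrawal", "Credit Card", "Credit Card", "Mailed Check"],
        "Monthly Charge": [41.2, 83.9, 99.3, 79.6],
    }
)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["Contract", "Payment Method"], dtype="int8")
print(encoded.columns.to_list())
```

Each category becomes its own 0/1 column, so the encoded frame can be scaled and passed to PCA alongside the numeric features. High-cardinality columns such as `City` would add many columns this way and may be better left out or grouped first.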

## Imports

In [168]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import sklearn.cluster as cluster
from sklearn.decomposition import PCA

In [169]:
pd.set_option("display.max_columns", None)

## Data Load and Initial Display

In [170]:
df_in = pd.read_csv("./data/telco_churn_data.csv")

In [171]:
df_in.head()

Unnamed: 0,Customer ID,Referred a Friend,Number of Referrals,Tenure in Months,Offer,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Internet Type,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Contract,Paperless Billing,Payment Method,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Gender,Age,Under 30,Senior Citizen,Married,Dependents,Number of Dependents,City,Zip Code,Latitude,Longitude,Population,Churn Value,CLTV,Churn Category,Churn Reason,Total Customer Svc Requests,Product/Service Issues Reported,Customer Satisfaction
0,8779-QRDMV,No,0,1,,No,0.0,No,Yes,Fiber Optic,9,No,No,Yes,No,No,Yes,No,No,Month-to-Month,Yes,Bank Withdrawal,41.236,39.65,0.0,0.0,0.0,Male,78,No,Yes,No,No,0,Los Angeles,90022,34.02381,-118.156582,68701,1,5433,Competitor,Competitor offered more data,5,0,
1,7495-OOKFY,Yes,1,8,Offer E,Yes,48.85,Yes,Yes,Cable,19,No,Yes,No,No,No,No,No,No,Month-to-Month,Yes,Credit Card,83.876,633.3,0.0,120.0,390.8,Female,74,No,Yes,Yes,Yes,1,Los Angeles,90063,34.044271,-118.185237,55668,1,5302,Competitor,Competitor made better offer,5,0,
2,1658-BYGOY,No,0,18,Offer D,Yes,11.33,Yes,Yes,Fiber Optic,57,No,No,No,No,Yes,Yes,Yes,Yes,Month-to-Month,Yes,Bank Withdrawal,99.268,1752.55,45.61,0.0,203.94,Male,71,No,Yes,No,Yes,3,Los Angeles,90065,34.108833,-118.229715,47534,1,3179,Competitor,Competitor made better offer,1,0,
3,4598-XLKNJ,Yes,1,25,Offer C,Yes,19.76,No,Yes,Fiber Optic,13,No,Yes,Yes,No,Yes,Yes,No,No,Month-to-Month,Yes,Bank Withdrawal,102.44,2514.5,13.43,327.0,494.0,Female,78,No,Yes,Yes,Yes,1,Inglewood,90303,33.936291,-118.332639,27778,1,5337,Dissatisfaction,Limited range of services,1,1,2.0
4,4846-WHAFZ,Yes,1,37,Offer C,Yes,6.33,Yes,Yes,Cable,15,No,No,No,No,No,No,No,No,Month-to-Month,Yes,Bank Withdrawal,79.56,2868.15,0.0,430.0,234.21,Female,80,No,Yes,Yes,Yes,1,Whittier,90602,33.972119,-118.020188,26265,1,2793,Price,Extra data charges,1,0,2.0


In [172]:
df_in.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 46 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Customer ID                        7043 non-null   object 
 1   Referred a Friend                  7043 non-null   object 
 2   Number of Referrals                7043 non-null   int64  
 3   Tenure in Months                   7043 non-null   int64  
 4   Offer                              3166 non-null   object 
 5   Phone Service                      7043 non-null   object 
 6   Avg Monthly Long Distance Charges  7043 non-null   float64
 7   Multiple Lines                     7043 non-null   object 
 8   Internet Service                   7043 non-null   object 
 9   Internet Type                      5517 non-null   object 
 10  Avg Monthly GB Download            7043 non-null   int64  
 11  Online Security                    7043 non-null   objec

In [173]:
df_in.describe()

Unnamed: 0,Number of Referrals,Tenure in Months,Avg Monthly Long Distance Charges,Avg Monthly GB Download,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Age,Number of Dependents,Zip Code,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported,Customer Satisfaction
count,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,1834.0
mean,1.951867,32.386767,22.958954,21.11089,65.5388,2280.381264,1.962182,278.499225,749.099262,46.509726,0.468692,93486.070567,36.197455,-119.756684,22139.603294,0.26537,4400.295755,1.338776,0.308107,3.005453
std,3.001199,24.542061,15.448113,20.948471,30.606805,2266.220462,7.902614,685.039625,846.660055,16.750352,0.962802,1856.767505,2.468929,2.154425,21152.392837,0.441561,1183.057152,1.430471,0.717514,1.256938
min,0.0,1.0,0.0,0.0,18.25,18.8,0.0,0.0,0.0,19.0,0.0,90001.0,32.555828,-124.301372,11.0,0.0,2003.0,0.0,0.0,1.0
25%,0.0,9.0,9.21,3.0,35.89,400.15,0.0,0.0,70.545,32.0,0.0,92101.0,33.990646,-121.78809,2344.0,0.0,3469.0,0.0,0.0,2.0
50%,0.0,29.0,22.89,17.0,71.968,1394.55,0.0,0.0,401.44,46.0,0.0,93518.0,36.205465,-119.595293,17554.0,0.0,4527.0,1.0,0.0,3.0
75%,3.0,55.0,36.395,28.0,90.65,3786.6,0.0,182.62,1191.1,60.0,0.0,95329.0,38.161321,-117.969795,36125.0,1.0,5380.5,2.0,0.0,4.0
max,11.0,72.0,49.99,94.0,123.084,8684.8,49.79,6477.0,3564.72,80.0,9.0,96150.0,41.962127,-114.192901,105285.0,1.0,6500.0,9.0,6.0,5.0


## Cleanup

### Drop Mostly Null and Redundant Columns

In [174]:
# Percentage of missing values in each column.
null_pct = df_in.isnull().mean() * 100.0
# Despite the name, the cutoff is 10% missing, not a majority.
mostly_null_columns = null_pct[null_pct > 10.0].index.to_list()
# Columns whose information is already carried by other columns.
redundant_columns = ["Under 30", "Senior Citizen", "Dependents", "Zip Code"]
df = df_in.drop(columns=mostly_null_columns + redundant_columns)
assert not df.isnull().any().any(), "Some Nulls Remain"
# df.info()
df.head()

Unnamed: 0,Customer ID,Referred a Friend,Number of Referrals,Tenure in Months,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Contract,Paperless Billing,Payment Method,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Gender,Age,Married,Number of Dependents,City,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported
0,8779-QRDMV,No,0,1,No,0.0,No,Yes,9,No,No,Yes,No,No,Yes,No,No,Month-to-Month,Yes,Bank Withdrawal,41.236,39.65,0.0,0.0,0.0,Male,78,No,0,Los Angeles,34.02381,-118.156582,68701,1,5433,5,0
1,7495-OOKFY,Yes,1,8,Yes,48.85,Yes,Yes,19,No,Yes,No,No,No,No,No,No,Month-to-Month,Yes,Credit Card,83.876,633.3,0.0,120.0,390.8,Female,74,Yes,1,Los Angeles,34.044271,-118.185237,55668,1,5302,5,0
2,1658-BYGOY,No,0,18,Yes,11.33,Yes,Yes,57,No,No,No,No,Yes,Yes,Yes,Yes,Month-to-Month,Yes,Bank Withdrawal,99.268,1752.55,45.61,0.0,203.94,Male,71,No,3,Los Angeles,34.108833,-118.229715,47534,1,3179,1,0
3,4598-XLKNJ,Yes,1,25,Yes,19.76,No,Yes,13,No,Yes,Yes,No,Yes,Yes,No,No,Month-to-Month,Yes,Bank Withdrawal,102.44,2514.5,13.43,327.0,494.0,Female,78,Yes,1,Inglewood,33.936291,-118.332639,27778,1,5337,1,1
4,4846-WHAFZ,Yes,1,37,Yes,6.33,Yes,Yes,15,No,No,No,No,No,No,No,No,Month-to-Month,Yes,Bank Withdrawal,79.56,2868.15,0.0,430.0,234.21,Female,80,Yes,1,Whittier,33.972119,-118.020188,26265,1,2793,1,0


### Convert Logical Strings to Logical Values

In [175]:
def to_numeric_bool(series: pd.Series) -> pd.Series:
    """Map a column containing only "Yes"/"No" to 1/0; return other columns unchanged."""
    if series.isin(["No", "Yes"]).all():
        return (series == "Yes").astype("int8")
    return series

In [176]:
df = df.apply(to_numeric_bool)

In [177]:
# df.info()
df.head()

Unnamed: 0,Customer ID,Referred a Friend,Number of Referrals,Tenure in Months,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Contract,Paperless Billing,Payment Method,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Gender,Age,Married,Number of Dependents,City,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported
0,8779-QRDMV,0,0,1,0,0.0,0,1,9,0,0,1,0,0,1,0,0,Month-to-Month,1,Bank Withdrawal,41.236,39.65,0.0,0.0,0.0,Male,78,0,0,Los Angeles,34.02381,-118.156582,68701,1,5433,5,0
1,7495-OOKFY,1,1,8,1,48.85,1,1,19,0,1,0,0,0,0,0,0,Month-to-Month,1,Credit Card,83.876,633.3,0.0,120.0,390.8,Female,74,1,1,Los Angeles,34.044271,-118.185237,55668,1,5302,5,0
2,1658-BYGOY,0,0,18,1,11.33,1,1,57,0,0,0,0,1,1,1,1,Month-to-Month,1,Bank Withdrawal,99.268,1752.55,45.61,0.0,203.94,Male,71,0,3,Los Angeles,34.108833,-118.229715,47534,1,3179,1,0
3,4598-XLKNJ,1,1,25,1,19.76,0,1,13,0,1,1,0,1,1,0,0,Month-to-Month,1,Bank Withdrawal,102.44,2514.5,13.43,327.0,494.0,Female,78,1,1,Inglewood,33.936291,-118.332639,27778,1,5337,1,1
4,4846-WHAFZ,1,1,37,1,6.33,1,1,15,0,0,0,0,0,0,0,0,Month-to-Month,1,Bank Withdrawal,79.56,2868.15,0.0,430.0,234.21,Female,80,1,1,Whittier,33.972119,-118.020188,26265,1,2793,1,0


In [178]:
df.describe()

Unnamed: 0,Referred a Friend,Number of Referrals,Tenure in Months,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Paperless Billing,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Age,Married,Number of Dependents,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported
count,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0
mean,0.457476,1.951867,32.386767,0.903166,22.958954,0.421837,0.783331,21.11089,0.286668,0.344881,0.343888,0.290217,0.384353,0.387903,0.353259,0.383927,0.592219,65.5388,2280.381264,1.962182,278.499225,749.099262,46.509726,0.483033,0.468692,36.197455,-119.756684,22139.603294,0.26537,4400.295755,1.338776,0.308107
std,0.498224,3.001199,24.542061,0.295752,15.448113,0.493888,0.412004,20.948471,0.452237,0.475363,0.475038,0.453895,0.486477,0.487307,0.478016,0.486375,0.491457,30.606805,2266.220462,7.902614,685.039625,846.660055,16.750352,0.499748,0.962802,2.468929,2.154425,21152.392837,0.441561,1183.057152,1.430471,0.717514
min,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.25,18.8,0.0,0.0,0.0,19.0,0.0,0.0,32.555828,-124.301372,11.0,0.0,2003.0,0.0,0.0
25%,0.0,0.0,9.0,1.0,9.21,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35.89,400.15,0.0,0.0,70.545,32.0,0.0,0.0,33.990646,-121.78809,2344.0,0.0,3469.0,0.0,0.0
50%,0.0,0.0,29.0,1.0,22.89,0.0,1.0,17.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,71.968,1394.55,0.0,0.0,401.44,46.0,0.0,0.0,36.205465,-119.595293,17554.0,0.0,4527.0,1.0,0.0
75%,1.0,3.0,55.0,1.0,36.395,1.0,1.0,28.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,90.65,3786.6,0.0,182.62,1191.1,60.0,1.0,0.0,38.161321,-117.969795,36125.0,1.0,5380.5,2.0,0.0
max,1.0,11.0,72.0,1.0,49.99,1.0,1.0,94.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,123.084,8684.8,49.79,6477.0,3564.72,80.0,1.0,9.0,41.962127,-114.192901,105285.0,1.0,6500.0,9.0,6.0


## PCA

### Select Numeric Columns

In [179]:
object_cols = df.columns[df.dtypes == "object"].to_list()
df_numeric = df.drop(columns=object_cols)
# df_numeric.head()
df_numeric.describe()

Unnamed: 0,Referred a Friend,Number of Referrals,Tenure in Months,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Paperless Billing,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Age,Married,Number of Dependents,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported
count,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0
mean,0.457476,1.951867,32.386767,0.903166,22.958954,0.421837,0.783331,21.11089,0.286668,0.344881,0.343888,0.290217,0.384353,0.387903,0.353259,0.383927,0.592219,65.5388,2280.381264,1.962182,278.499225,749.099262,46.509726,0.483033,0.468692,36.197455,-119.756684,22139.603294,0.26537,4400.295755,1.338776,0.308107
std,0.498224,3.001199,24.542061,0.295752,15.448113,0.493888,0.412004,20.948471,0.452237,0.475363,0.475038,0.453895,0.486477,0.487307,0.478016,0.486375,0.491457,30.606805,2266.220462,7.902614,685.039625,846.660055,16.750352,0.499748,0.962802,2.468929,2.154425,21152.392837,0.441561,1183.057152,1.430471,0.717514
min,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.25,18.8,0.0,0.0,0.0,19.0,0.0,0.0,32.555828,-124.301372,11.0,0.0,2003.0,0.0,0.0
25%,0.0,0.0,9.0,1.0,9.21,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35.89,400.15,0.0,0.0,70.545,32.0,0.0,0.0,33.990646,-121.78809,2344.0,0.0,3469.0,0.0,0.0
50%,0.0,0.0,29.0,1.0,22.89,0.0,1.0,17.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,71.968,1394.55,0.0,0.0,401.44,46.0,0.0,0.0,36.205465,-119.595293,17554.0,0.0,4527.0,1.0,0.0
75%,1.0,3.0,55.0,1.0,36.395,1.0,1.0,28.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,90.65,3786.6,0.0,182.62,1191.1,60.0,1.0,0.0,38.161321,-117.969795,36125.0,1.0,5380.5,2.0,0.0
max,1.0,11.0,72.0,1.0,49.99,1.0,1.0,94.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,123.084,8684.8,49.79,6477.0,3564.72,80.0,1.0,9.0,41.962127,-114.192901,105285.0,1.0,6500.0,9.0,6.0


### Scale

In [180]:
df_scaled = (df_numeric - df_numeric.mean()) / df_numeric.std()
df_scaled.head()
# df_scaled.describe()

Unnamed: 0,Referred a Friend,Number of Referrals,Tenure in Months,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Avg Monthly GB Download,Online Security,Online Backup,Device Protection Plan,Premium Tech Support,Streaming TV,Streaming Movies,Streaming Music,Unlimited Data,Paperless Billing,Monthly Charge,Total Regular Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Age,Married,Number of Dependents,Latitude,Longitude,Population,Churn Value,CLTV,Total Customer Svc Requests,Product/Service Issues Reported
0,-0.918213,-0.650362,-1.278897,-3.053794,-1.486198,-0.854116,0.52589,-0.578128,-0.633888,-0.725511,1.381179,-0.639393,-0.790076,1.256082,-0.73901,-0.789365,0.829739,-0.794033,-0.988753,-0.248295,-0.406545,-0.88477,1.879977,-0.966554,-0.4868,-0.8804,0.742705,2.201235,1.66371,0.872912,2.559453,-0.42941
1,1.088917,-0.317162,-0.993672,0.327415,1.676001,1.170636,0.52589,-0.100766,-0.633888,1.378143,-0.723916,-0.639393,-0.790076,-0.796014,-0.73901,-0.789365,0.829739,0.599122,-0.726797,-0.248295,-0.231372,-0.423191,1.641176,1.034457,0.551835,-0.872113,0.729404,1.585088,1.66371,0.762181,2.559453,-0.42941
2,-0.918213,-0.650362,-0.586209,0.327415,-0.752775,1.170636,0.52589,1.713209,-0.633888,-0.725511,-0.723916,-0.639393,1.265522,1.256082,1.35297,1.266662,0.829739,1.102016,-0.232913,5.523212,-0.406545,-0.643894,1.462075,-0.966554,2.629105,-0.845963,0.708759,1.200545,1.66371,-1.032322,-0.236828,-0.42941
3,1.088917,-0.317162,-0.300984,0.327415,-0.207077,-0.854116,0.52589,-0.387183,-0.633888,1.378143,1.381179,-0.639393,1.265522,1.256082,-0.73901,-0.789365,0.829739,1.205653,0.103308,1.451142,0.0708,-0.301301,1.879977,1.034457,0.551835,-0.915848,0.660986,0.266561,1.66371,0.791766,-0.236828,0.964292
4,1.088917,-0.317162,0.187973,0.327415,-1.076439,1.170636,0.52589,-0.291711,-0.633888,-0.725511,-0.723916,-0.639393,-0.790076,-0.796014,-0.73901,-0.789365,0.829739,0.458107,0.259361,-0.248295,0.221156,-0.608142,1.999377,1.034457,0.551835,-0.901337,0.806013,0.195032,1.66371,-1.358595,-0.236828,-0.42941
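The manual z-score above uses pandas' default sample standard deviation (`ddof=1`), whereas scikit-learn's `StandardScaler` divides by the population standard deviation (`ddof=0`). The following sketch, on synthetic data with hypothetical column names, shows that the two differ only by a constant factor, so either choice should give the same PCA directions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric features (not the telco data).
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(loc=50, scale=10, size=(100, 3)), columns=["a", "b", "c"]
)

# Manual z-score as in the cell above: pandas std() defaults to ddof=1.
manual = (toy - toy.mean()) / toy.std()

# StandardScaler uses the population standard deviation (ddof=0).
sk = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

# The two results differ by the constant factor sqrt((n - 1) / n).
ratio = np.sqrt((len(toy) - 1) / len(toy))
print(np.allclose(manual, sk * ratio))
```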


### Fit

In [181]:
pca = PCA(n_components=3)
X = pca.fit_transform(df_scaled)
X.shape

(7043, 3)
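Three components are convenient for plotting, but it can help to check how much variance they retain. A minimal sketch on synthetic data (the array `data` is a made-up stand-in for the scaled features) fits a full PCA and inspects the cumulative `explained_variance_ratio_`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 10 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
data = latent @ mixing + 0.1 * rng.normal(size=(500, 10))
data = (data - data.mean(axis=0)) / data.std(axis=0)

# Fit with all components kept, then accumulate the variance ratios.
pca_full = PCA().fit(data)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 90%.
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print(cumulative.round(3), n_for_90)
```

On the real data, the same check would be `pca.explained_variance_ratio_.sum()` for the fitted three-component model; the 90% threshold here is an arbitrary illustration, not a rule.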

### DataFrame of Fit

In [182]:
df_pca = pd.DataFrame(
    X, columns=[f"Component{k + 1}" for k in range(pca.n_components_)]
)
df_pca.head()

Unnamed: 0,Component1,Component2,Component3
0,-2.015967,-2.990827,0.510086
1,-0.538588,-0.825153,-2.791033
2,0.779219,-2.669588,-0.421893
3,1.401822,-1.000219,-1.043681
4,-0.653874,-0.143944,-1.216665
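To interpret the components, one option is to look at the loadings in `pca.components_`, which give each original feature's weight in each component. The sketch below uses random toy data with a few column names borrowed from the dataset; the values and the resulting loadings are not meaningful, only the mechanics are.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy data with column names borrowed from the dataset (values are random).
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["Tenure in Months", "Monthly Charge", "Age", "CLTV"],
)
toy_scaled = (toy - toy.mean()) / toy.std()

toy_pca = PCA(n_components=2).fit(toy_scaled)

# Rows are original features, columns are components.
loadings = pd.DataFrame(
    toy_pca.components_.T,
    index=toy.columns,
    columns=["Component1", "Component2"],
)

# Features ranked by the magnitude of their weight in the first component.
top = loadings["Component1"].abs().sort_values(ascending=False)
print(top)
```

Features with large absolute loadings on a component are the ones that drive it, which gives a starting point for naming the axes of the scatter plot.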


## Clustering with KMeans

### Cluster

In [188]:
kmeans = cluster.KMeans(
    n_clusters=5, random_state=123, init="k-means++", verbose=True
).fit(X)

Initialization complete
Iteration 0, inertia 32912.32048041821.
Iteration 1, inertia 27728.43082241868.
Iteration 2, inertia 26985.51472335834.
Iteration 3, inertia 26778.520249164205.
Iteration 4, inertia 26712.973901267433.
Iteration 5, inertia 26649.32288945416.
Iteration 6, inertia 26581.23675635492.
Iteration 7, inertia 26516.34061081727.
Iteration 8, inertia 26460.086853130717.
Iteration 9, inertia 26418.033040755075.
Iteration 10, inertia 26379.239366805556.
Iteration 11, inertia 26342.91427260285.
Iteration 12, inertia 26301.001503312942.
Iteration 13, inertia 26265.180064510976.
Iteration 14, inertia 26230.87417099029.
Iteration 15, inertia 26206.96173080649.
Iteration 16, inertia 26191.057954362674.
Iteration 17, inertia 26179.46560775416.
Iteration 18, inertia 26169.154090297372.
Iteration 19, inertia 26160.203662509364.
Iteration 20, inertia 26150.083097006493.
Iteration 21, inertia 26138.483286835108.
Iteration 22, inertia 26128.336753981574.
Iteration 23, inertia 26121.421598416197.
Iteration 24, inertia 26117.469081101113.
Iteration 25, inertia 26114.39895583011.
Iteration
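The choice of `n_clusters=5` above can be sanity-checked by comparing several values of k. The following sketch runs on synthetic blobs from `make_blobs` (not the PCA output) and records inertia and silhouette score for each k; the range 2 to 8 is an assumption chosen to keep the number of clusters interpretable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 3-D points standing in for the three PCA components.
points, _ = make_blobs(
    n_samples=600, centers=4, cluster_std=0.6, n_features=3, random_state=0
)

# For each candidate k, record inertia (lower is tighter) and
# silhouette score (higher means better-separated clusters).
scores = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=123).fit(points)
    scores[k] = (model.inertia_, silhouette_score(points, model.labels_))

best_k = max(scores, key=lambda k: scores[k][1])
for k, (inertia, sil) in scores.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")
print("highest silhouette at k =", best_k)
```

Inertia usually keeps falling as k grows, so it is typically read for an "elbow" rather than a minimum, while the silhouette score offers a complementary view of cluster separation.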

#### Add KMeans Labels to DataFrame

In [184]:
df_pca_labeled = df_pca.copy(deep=True)
df_pca_labeled["KMeans Label"] = kmeans.labels_
df_pca_labeled.head()

Unnamed: 0,Component1,Component2,Component3,KMeans Label
0,-2.015967,-2.990827,0.510086,4
1,-0.538588,-0.825153,-2.791033,0
2,0.779219,-2.669588,-0.421893,0
3,1.401822,-1.000219,-1.043681,0
4,-0.653874,-0.143944,-1.216665,4
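Because the clusters live in PCA space, interpreting them usually means going back to the original features. One possible approach, sketched here with toy values and hypothetical labels, is to attach the labels to the unscaled data and compare per-cluster means and sizes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two original (unscaled) features.
rng = np.random.default_rng(2)
toy = pd.DataFrame(
    {
        "Tenure in Months": rng.integers(1, 73, size=12),
        "Monthly Charge": rng.uniform(18, 123, size=12).round(2),
    }
)
# Hypothetical cluster labels; in the notebook these would be kmeans.labels_.
toy_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])

labeled = toy.copy()
labeled["KMeans Label"] = toy_labels

# Mean of each feature per cluster, plus the number of customers in each.
profile = labeled.groupby("KMeans Label").mean()
profile["Customers"] = labeled["KMeans Label"].value_counts().sort_index()
print(profile)
```

On the real data, the same pattern applied to `df_numeric` with `kmeans.labels_` would give a table that can be read row by row to describe each segment.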


#### Scatter Plot with Color from KMeans Labels

In [185]:
fig = px.scatter_3d(
    data_frame=df_pca_labeled,
    x="Component1",
    y="Component2",
    z="Component3",
    color="KMeans Label",
)

In [186]:
fig.update_layout(autosize=False, width=1200, height=800)
fig.show()

## Clustering with DBSCAN