### Self-study Colab Activity 6.2: Interpreting the Results of K-Means and PCA




In this activity, you will profile customer groups for a large telecommunications company. The data describes customers' purchasing and usage behavior with telecom products. Your goal is to use PCA and clustering to segment these customers into meaningful groups and report your findings.

Because the results need to be interpretable, keep the number of clusters small. Think about how you might represent some of the non-numeric features so that they can be included in your segmentation models. Report your approach and findings to the class, and be specific about which features you used and how you interpret the resulting clusters.
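
One common way to bring non-numeric features into a distance-based model is one-hot encoding. The sketch below uses a small made-up frame; the column names (`monthly_charge`, `contract`, `paperless_billing`) are hypothetical stand-ins and may not match the columns in the telco data.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative only.
toy = pd.DataFrame({
    "monthly_charge": [20.0, 55.5, 80.0, 35.0],
    "contract": ["Month-to-Month", "One Year", "Two Year", "Month-to-Month"],
    "paperless_billing": ["Yes", "No", "Yes", "Yes"],
})

# Expand each categorical column into one 0/1 indicator column per category.
# Numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["contract", "paperless_billing"], dtype=float)
print(encoded.columns.tolist())
```

On the real data, it is probably worth encoding only categorical columns with a handful of distinct values, since identifier-like columns would add one indicator per row.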

In [None]:

import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


In [None]:
df = pd.read_csv('module 6/colab_activity6_2_starter/data/telco_churn_data.csv')

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df_numeric = df.select_dtypes('number')
df_numeric.info()

In [None]:
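# Drop rows with missing values; copy so a column can be added later without a SettingWithCopyWarning
df_clean = df_numeric.dropna().copy()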
df_clean.info()
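
Both PCA and K-Means rely on variances and Euclidean distances, so columns measured in large units (for example, total charges) can dominate columns measured in small ones. One option, which the cells that follow do not apply, is to standardize each column first. This is a minimal sketch on made-up data; `demo_scale` and its column names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two columns on very different scales.
demo_scale = pd.DataFrame({
    "tenure_months": [1, 12, 24, 60],
    "total_charges": [50.0, 900.0, 2100.0, 6500.0],
})

# Rescale each column to zero mean and unit variance, keeping the labels.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(demo_scale),
    columns=demo_scale.columns,
    index=demo_scale.index,
)
print(scaled.round(2))
```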

In [None]:
pca = PCA().fit(df_clean)

In [None]:
# Reuse the PCA fitted above instead of fitting a second, identical model
pca_transformed = pca.transform(df_clean)


In [None]:
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)
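
The cumulative explained variance can be used to decide how many components to keep. The sketch below runs on synthetic data, and the 90% threshold is an assumed cutoff chosen for illustration, not a value prescribed by the activity.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 5 observed columns driven by 2 latent factors plus small noise.
latent = rng.normal(size=(200, 2))
X_demo = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

cum_demo = np.cumsum(PCA().fit(X_demo).explained_variance_ratio_)

threshold = 0.90  # assumed cutoff; adjust to taste
# Index of the first component at which the cumulative ratio reaches the threshold, plus one.
n_keep = int(np.searchsorted(cum_demo, threshold) + 1)
print(n_keep, cum_demo.round(3))
```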

In [None]:
fig = px.line(explained_variance, labels={'index': 'Principal Component', 'value': 'Explained Variance Ratio'},
              title='Scree Plot', markers=True)
fig.update_layout(showlegend=False)
fig.show()
fig.write_image('module 6/colab_activity6_2_starter/images/pca.png')
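
To interpret what each principal component represents, it can help to inspect the loadings stored in `components_`. The sketch below uses synthetic data with hypothetical column names (`feature_a`, `feature_b`, `feature_c`); on the telco data the same table would be indexed by the numeric column names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=200)
# Hypothetical data: feature_a and feature_b move together, feature_c is independent.
demo_load = pd.DataFrame({
    "feature_a": base,
    "feature_b": base + 0.1 * rng.normal(size=200),
    "feature_c": rng.normal(size=200),
})

demo_pca = PCA().fit(demo_load)

# Rows are original features, columns are components; large absolute values
# show which features drive each component.
loadings = pd.DataFrame(
    demo_pca.components_.T,
    index=demo_load.columns,
    columns=[f"PC{i + 1}" for i in range(demo_pca.n_components_)],
)
print(loadings.round(2))
```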

In [None]:
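# Fix n_init and random_state so the cluster labels are reproducible across runs
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(df_clean)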
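
The choice of three clusters above is one reasonable starting point. A possible way to compare a few small values of k is the silhouette score, sketched here on synthetic blobs whose centers are chosen by hand for illustration; this is not part of the original activity.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated groups (centers are assumptions for the demo).
X_blobs, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Score each candidate k; a higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)

best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real customer data the scores are usually much closer together, so the final choice should also weigh how easy the resulting groups are to describe.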

In [None]:
kmeans.labels_

In [None]:
df_clean['cluster'] = kmeans.labels_

In [None]:
px.scatter(df_clean, x='Total Regular Charges', y='Total Extra Data Charges', color='cluster')

In [None]:
df_pca = pd.DataFrame(pca_transformed)

In [None]:
df_pca['cluster'] = kmeans.labels_


In [None]:
fig = px.scatter(df_pca, x=0, y=1, color='cluster', title='PCA Clustering - component 1 vs. component 2')
fig.show()
fig.write_image('module 6/colab_activity6_2_starter/images/component1_vs_component2.png')
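
To turn cluster labels into a description of each customer group, one simple approach is to compare per-cluster averages and sizes. The sketch below uses a made-up frame with hypothetical column names; applied to the activity, the same `groupby` would run on `df_clean` after the `cluster` column has been added.

```python
import pandas as pd

# Hypothetical labeled data; values and column names are illustrative only.
demo_clusters = pd.DataFrame({
    "monthly_charge": [20.0, 25.0, 90.0, 95.0],
    "tenure_months": [2, 4, 60, 70],
    "cluster": [0, 0, 1, 1],
})

# Mean of every feature within each cluster, plus the number of customers in it.
profile = demo_clusters.groupby("cluster").mean()
profile["n_customers"] = demo_clusters.groupby("cluster").size()
print(profile)
```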