# K-means Clustering

This notebook runs a clustering analysis on customer shopping data. The dataset's features are customer ID, gender, age, annual income, spending score, profession, work experience, and family size. The k-means algorithm is used to group customers based on the relevant features.

### Import Libraries

In [1]:
import pandas as pd
import seaborn as sns

### Import Dataset(s)

In [2]:
df = pd.read_csv('../data/customers.csv')

In [3]:
df.head()

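| | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |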


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              2000 non-null   int64 
 1   Gender                  2000 non-null   object
 2   Age                     2000 non-null   int64 
 3   Annual Income ($)       2000 non-null   int64 
 4   Spending Score (1-100)  2000 non-null   int64 
 5   Profession              1965 non-null   object
 6   Work Experience         2000 non-null   int64 
 7   Family Size             2000 non-null   int64 
dtypes: int64(6), object(2)
memory usage: 125.1+ KB


In [5]:
df.nunique()

CustomerID                2000
Gender                       2
Age                        100
Annual Income ($)         1786
Spending Score (1-100)     101
Profession                   9
Work Experience             18
Family Size                  9
dtype: int64

### Data Preprocessing

In [6]:
# CustomerID is a unique row identifier (2000 distinct values), not a feature
df.drop('CustomerID', axis=1, inplace=True)

In [7]:
# One-hot encode the categorical columns; drop_first drops one level per column
df = pd.get_dummies(data=df, columns=['Gender','Profession'], drop_first=True)
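`df.info()` reports 1965 non-null values for `Profession`, so 35 rows have no profession. By default `pd.get_dummies` encodes a missing value as zeros in every `Profession_*` column, which, combined with `drop_first=True`, makes those rows indistinguishable from the dropped first category. The sketch below illustrates this on a small made-up frame (the values, the `Unknown` label, and the variable names are assumptions, not part of the dataset) and shows two possible ways to keep missing values visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Profession (row 2)
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23],
    "Profession": ["Artist", "Engineer", np.nan, "Lawyer"],
})

# Default behaviour: the NaN row gets 0 in every Profession_* column,
# exactly like the row whose category ("Artist") was dropped by drop_first
default = pd.get_dummies(toy, columns=["Profession"], drop_first=True)

# Option 1: give missing values an explicit label before encoding
filled = toy.fillna({"Profession": "Unknown"})
explicit = pd.get_dummies(filled, columns=["Profession"], drop_first=True)

# Option 2: ask get_dummies for a dedicated missing-value indicator column
with_na = pd.get_dummies(toy, columns=["Profession"], dummy_na=True)

print(default)
print(explicit)
print(with_na)
```

Whether to label, impute, or drop the 35 rows is a modelling decision that this notebook does not settle; the sketch only makes the default behaviour visible.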

### Clustering

In [8]:
from sklearn.cluster import KMeans

In [9]:
num_clusters = 5
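The notebook fixes the number of clusters at 5 without showing how that value was chosen. One common way to sanity-check such a choice is to fit k-means for a range of `k` and compare the inertia (within-cluster sum of squares) and the silhouette score. The following is a minimal sketch on synthetic data, not on the customer dataset; the blob layout, the range of `k`, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a feature matrix: three well-separated 2-D blobs
rng = np.random.default_rng(0)
centers_true = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers_true])

inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                         # lower is tighter
    silhouettes[k] = silhouette_score(X, model.labels_)  # higher is better

# Pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

On this toy data the silhouette score should peak at the true number of blobs. Real customer data rarely gives such a clean answer, so the scores are better read as a guide than as a rule.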

In [10]:
# No random_state is set, so the fitted centers and their order can differ between runs
kmeans = KMeans(n_clusters=num_clusters)

In [11]:
kmeans.fit(df)
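After fitting, the estimator's `labels_` attribute holds one cluster index per row, which is usually the first thing to inspect. The sketch below shows one way to attach those labels to a frame and summarize each cluster; it runs on synthetic data, and the column values and the `cluster` column name are assumptions. Note that `labels_` is zero-based, whereas the centers table in this notebook is indexed from 1.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic data: two clearly separated groups of 30 rows each
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": np.concatenate([rng.normal(25, 2, 30), rng.normal(60, 2, 30)]),
    "Spending Score (1-100)": np.concatenate(
        [rng.normal(80, 3, 30), rng.normal(20, 3, 30)]
    ),
})

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)

# labels_ holds one cluster index (0..k-1) per row, in row order
labelled = toy.assign(cluster=model.labels_)

# Cluster sizes and per-cluster feature means
sizes = labelled["cluster"].value_counts().sort_index()
means = labelled.groupby("cluster").mean()
print(sizes)
print(means)
```

For a converged fit, the per-cluster means computed this way should closely match `cluster_centers_`, which gives a quick consistency check on the centers.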



In [12]:
kmeans.cluster_centers_

array([[4.85120192e+01, 1.73022524e+05, 5.10985577e+01, 4.44230769e+00,
        3.80288462e+00, 4.06250000e-01, 9.61538462e-02, 1.03365385e-01,
        1.10576923e-01, 8.41346154e-02, 1.77884615e-01, 2.40384615e-02,
        7.69230769e-02, 3.36538462e-02],
       [5.07535354e+01, 1.00026022e+05, 5.13595960e+01, 4.18989899e+00,
        3.79191919e+00, 4.32323232e-01, 8.28282828e-02, 8.68686869e-02,
        1.31313131e-01, 6.66666667e-02, 1.77777778e-01, 3.43434343e-02,
        6.06060606e-02, 2.42424242e-02],
       [4.42016129e+01, 2.24677419e+04, 4.96935484e+01, 3.08870968e+00,
        2.83870968e+00, 4.11290323e-01, 1.04838710e-01, 1.29032258e-01,
        1.04838710e-01, 6.45161290e-02, 1.29032258e-01, 1.61290323e-02,
        8.06451613e-02, 6.45161290e-02],
       [4.97424893e+01, 1.37666912e+05, 5.23819742e+01, 4.28111588e+00,
        4.05793991e+00, 4.09871245e-01, 6.43776824e-02, 8.15450644e-02,
        1.09442060e-01, 8.15450644e-02, 1.69527897e-01, 3.00429185e-02,
        7.081

In [13]:
# Label the centers 1..num_clusters and name the columns after the features
pd.DataFrame(data=kmeans.cluster_centers_, index=list(range(1,num_clusters+1)), columns=df.columns)

| | Age | Annual Income ($) | Spending Score (1-100) | Work Experience | Family Size | Gender_Male | Profession_Doctor | Profession_Engineer | Profession_Entertainment | Profession_Executive | Profession_Healthcare | Profession_Homemaker | Profession_Lawyer | Profession_Marketing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 48.512019 | 173022.524038 | 51.098558 | 4.442308 | 3.802885 | 0.40625 | 0.096154 | 0.103365 | 0.110577 | 0.084135 | 0.177885 | 0.024038 | 0.076923 | 0.033654 |
| 2 | 50.753535 | 100026.022222 | 51.359596 | 4.189899 | 3.791919 | 0.432323 | 0.082828 | 0.086869 | 0.131313 | 0.066667 | 0.177778 | 0.034343 | 0.060606 | 0.024242 |
| 3 | 44.201613 | 22467.741935 | 49.693548 | 3.08871 | 2.83871 | 0.41129 | 0.104839 | 0.129032 | 0.104839 | 0.064516 | 0.129032 | 0.016129 | 0.080645 | 0.064516 |
| 4 | 49.742489 | 137666.912017 | 52.381974 | 4.281116 | 4.05794 | 0.409871 | 0.064378 | 0.081545 | 0.109442 | 0.081545 | 0.169528 | 0.030043 | 0.070815 | 0.062232 |
| 5 | 48.006012 | 66201.625251 | 49.44489 | 3.817635 | 3.677355 | 0.378758 | 0.074148 | 0.078156 | 0.118236 | 0.078156 | 0.164329 | 0.034068 | 0.074148 | 0.044088 |
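The five centers differ mainly in `Annual Income ($)` (roughly 22k, 66k, 100k, 138k, and 173k), while age, spending score, and the other features are nearly the same in every cluster. This is what k-means tends to produce when features are left on their raw scales: Euclidean distance is dominated by the column with the largest numeric range. The sketch below reproduces the effect on a made-up eight-row frame and shows standardization with `StandardScaler` as one possible remedy; the toy values and variable names are assumptions, and the notebook itself does not scale its features.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: income in dollars dwarfs the 1-100 spending score.
# Rows 0-3 are low spenders, rows 4-7 are high spenders with the same incomes.
toy = pd.DataFrame({
    "Annual Income ($)": [30000, 40000, 80000, 90000,
                          30000, 40000, 80000, 90000],
    "Spending Score (1-100)": [10, 12, 9, 11, 90, 88, 91, 89],
})

# Raw features: clusters follow the four income levels and ignore the score
raw_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(toy)

# Standardized features (zero mean, unit variance per column): both columns
# now contribute comparably, so clusters reflect income AND spending score
scaled = StandardScaler().fit_transform(toy)
scaled_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

print(pd.DataFrame({"raw": raw_labels, "scaled": scaled_labels}, index=toy.index))
```

When clustering on standardized data, the centers are expressed in standardized units; `StandardScaler.inverse_transform` can map them back to the original units for reporting.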
