# Customer Segmentation - Top-Up

## Problem statement

###### “What are the identifiable customer segments, based on top-up behavior, for targeted marketing?”

## Hypothesis generation

Features considered for clustering:
* Denomination
* Day of the week
* Time of the day
* Mode of the Recharge

## Data exploration / transformation

In [19]:
%pylab inline

import pandas as pd
import numpy as np

from sklearn.cluster import KMeans

Populating the interactive namespace from numpy and matplotlib


In [20]:
# reading the sample data
df = pd.read_csv('TopUp.csv')

In [22]:
df_backup = df.copy()
df.head(n=2)

Unnamed: 0,ChargingPartyNumber,frequency,amt,0to50,51to100,100more,wkdayamnt,wkendamnt,OfcTime,ngtTime,trvlTime,card,recharge
0,135413311,18,991.0,199.0,792.0,0.0,942.0,49.0,792.0,49.0,150.0,0.0,991.0
1,120799811,23,700.0,500.0,200.0,0.0,450.0,250.0,400.0,0.0,300.0,50.0,650.0


In [23]:
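# keep customers whose total top-up amount satisfies 0 < amt < 20000
df = df.query('amt < 20000 and amt > 0').copy()
df_orig = df.copy()  # un-normalised copy that later collects the cluster labels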

In [24]:
# convert every amount column from '0to50' onwards into its share of the customer's total amount
df.loc[:,'0to50':] = df.loc[:,'0to50':].div(df["amt"],axis=0)
df.head(n=2)

Unnamed: 0,ChargingPartyNumber,frequency,amt,0to50,51to100,100more,wkdayamnt,wkendamnt,OfcTime,ngtTime,trvlTime,card,recharge
0,135413311,18,991.0,0.200807,0.799193,0.0,0.950555,0.049445,0.799193,0.049445,0.151362,0.0,1.0
1,120799811,23,700.0,0.714286,0.285714,0.0,0.642857,0.357143,0.571429,0.0,0.428571,0.071429,0.928571
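
As a sanity check on the normalisation in In [24], the sketch below rebuilds the two sample rows shown in the `df.head()` output and checks that, after dividing by `amt`, the shares within each feature group add up to 1. The `toy` frame and the `groups` mapping are illustrative assumptions, not part of the original notebook; the column grouping follows the feature list in the hypothesis section.

```python
import numpy as np
import pandas as pd

# Two sample rows copied from the df.head() output (raw amounts, before normalisation).
toy = pd.DataFrame({
    "amt": [991.0, 700.0],
    "0to50": [199.0, 500.0],
    "51to100": [792.0, 200.0],
    "100more": [0.0, 0.0],
    "wkdayamnt": [942.0, 450.0],
    "wkendamnt": [49.0, 250.0],
    "OfcTime": [792.0, 400.0],
    "ngtTime": [49.0, 0.0],
    "trvlTime": [150.0, 300.0],
    "card": [0.0, 50.0],
    "recharge": [991.0, 650.0],
})

# Assumed grouping of the share columns (hypothetical helper mapping).
groups = {
    "denomination": ["0to50", "51to100", "100more"],
    "day_of_week": ["wkdayamnt", "wkendamnt"],
    "time_of_day": ["OfcTime", "ngtTime", "trvlTime"],
    "mode": ["card", "recharge"],
}

# Same operation as the notebook: divide each amount column by the row total.
share_cols = [c for cols in groups.values() for c in cols]
shares = toy[share_cols].div(toy["amt"], axis=0)

# Within each group the shares of a customer should sum to 1.
group_sums = pd.DataFrame(
    {name: shares[cols].sum(axis=1) for name, cols in groups.items()}
)
all_sum_to_one = bool(np.allclose(group_sums.to_numpy(), 1.0))
print(group_sums)
print("all groups sum to 1:", all_sum_to_one)
```

If a group does not sum to 1 on the real data, that would suggest the amount columns do not partition `amt` for that customer and the shares need a closer look before clustering.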


## Clustering 

### Clusters based on denomination

In [25]:
# Assign clusters
dfD = df[['0to50','51to100','100more']].copy()  # explicit copy avoids SettingWithCopyWarning
np.random.seed(123)  # seeds NumPy's global RNG, which KMeans uses when random_state is not set
km = KMeans(n_clusters=5).fit(dfD)

# cluster representation: mean share per cluster, plus cluster size in the last column
dfD['cluster'] = km.labels_
clustergrp = pd.concat([dfD.groupby('cluster').mean().round(2), dfD.groupby('cluster')['cluster'].count()], axis=1)
print(clustergrp)

         0to50  51to100  100more  cluster
cluster                                  
0         0.15     0.82     0.03    85597
1         0.68     0.31     0.01    94390
2         0.09     0.31     0.59    14676
3         0.93     0.07     0.00   102663
4         0.42     0.56     0.02    92110


In [26]:
df_orig.loc[:,'cluster_denominator'] =  km.labels_
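
The fit, label and summarise pattern used for the denomination features recurs for every feature group in this notebook. A minimal sketch of a reusable helper is given here, assuming scikit-learn's `KMeans` with an explicit `random_state` in place of a global seed. The function name `cluster_and_summarise`, the `size` column name and the synthetic `toy` data are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_and_summarise(frame, columns, n_clusters, seed=123):
    """Fit k-means on the given columns; return labels and a per-cluster summary."""
    features = frame[columns].copy()  # work on a copy, never on a slice of `frame`
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    features["cluster"] = km.labels_
    summary = features.groupby("cluster")[columns].mean().round(2)
    summary["size"] = features.groupby("cluster").size()
    return km.labels_, summary


# Synthetic data (an assumption for the demo): 50 weekday-heavy customers
# and 50 customers who split their top-ups evenly across the week.
rng = np.random.default_rng(0)
wkday = np.concatenate([rng.uniform(0.9, 1.0, 50), rng.uniform(0.4, 0.6, 50)])
toy = pd.DataFrame({"wkdayamnt": wkday, "wkendamnt": 1.0 - wkday})

labels, summary = cluster_and_summarise(toy, ["wkdayamnt", "wkendamnt"], n_clusters=2)
print(summary)
```

Passing `random_state` makes a single call reproducible without touching NumPy's global random state, so the order in which cells are run no longer affects the labels.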

### Clusters based on day-of-week behavior

In [27]:
dfw = df[['wkdayamnt','wkendamnt']].copy()  # explicit copy avoids SettingWithCopyWarning

In [28]:
# Assign clusters
np.random.seed(123)  # seeds NumPy's global RNG, which KMeans uses when random_state is not set
km = KMeans(n_clusters=4).fit(dfw)

# cluster representation: mean share per cluster, plus cluster size in the last column
dfw['cluster'] = km.labels_
clustergrp = pd.concat([dfw.groupby('cluster').mean().round(2), dfw.groupby('cluster')['cluster'].count()], axis=1)
print(clustergrp)

         wkdayamnt  wkendamnt  cluster
cluster                               
0             0.51       0.49    27827
1             0.78       0.22   164554
2             0.68       0.32   150148
3             0.94       0.06    46907


In [29]:
df_orig.loc[:,'cluster_day_of_week'] =  km.labels_

### Clusters based on time of the day

In [30]:
dfT = df[['OfcTime','ngtTime','trvlTime']].copy()  # explicit copy avoids SettingWithCopyWarning

In [31]:
# Assign clusters
np.random.seed(123)  # seeds NumPy's global RNG, which KMeans uses when random_state is not set
km = KMeans(n_clusters=7).fit(dfT)
# cluster representation: mean share per cluster, plus cluster size in the last column
dfT['cluster'] = km.labels_
clustergrp = pd.concat([dfT.groupby('cluster').mean().round(2), dfT.groupby('cluster')['cluster'].count()], axis=1)
print(clustergrp)

         OfcTime  ngtTime  trvlTime  cluster
cluster                                     
0           0.44     0.08      0.48    91048
1           0.29     0.27      0.45    53485
2           0.48     0.25      0.27    65227
3           0.24     0.53      0.24    22044
4           0.61     0.07      0.31    89347
5           0.21     0.08      0.71    33752
6           0.81     0.05      0.14    34533


In [32]:
df_orig.loc[:,'cluster_time_of_day'] =  km.labels_

### Clusters based on the mode of top-up

In [33]:
dfM = df[['card','recharge']].copy()  # explicit copy avoids SettingWithCopyWarning

In [34]:
# Assign clusters
np.random.seed(123)  # seeds NumPy's global RNG, which KMeans uses when random_state is not set
km = KMeans(n_clusters=4).fit(dfM)
# cluster representation: mean share per cluster, plus cluster size in the last column
dfM['cluster'] = km.labels_
clustergrp = pd.concat([dfM.groupby('cluster').mean().round(2), dfM.groupby('cluster')['cluster'].count()], axis=1)
print(clustergrp)

         card  recharge  cluster
cluster                         
0        0.02      0.98   258932
1        0.84      0.16    22499
2        0.21      0.79    71393
3        0.48      0.52    36612


In [35]:
df_orig.loc[:,'cluster_mode_of_topup'] =  km.labels_

In [36]:
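# round to whole numbers and save the labelled data (on a case-insensitive filesystem this overwrites the input 'TopUp.csv')
df_orig.round(0).to_csv('topup.csv',index=False)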
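
Once the four label columns are saved, one possible next step toward the problem statement is to count how many customers share each combination of labels. The sketch below illustrates this on a small hand-made frame; the `labelled` rows are invented for the example and the `customers` column name is hypothetical. Only the four `cluster_*` column names come from the notebook.

```python
import pandas as pd

# Hand-made stand-in for the labelled output; the label values are invented.
labelled = pd.DataFrame({
    "cluster_denominator": [3, 3, 0, 3, 2],
    "cluster_day_of_week": [1, 1, 3, 1, 0],
    "cluster_time_of_day": [4, 4, 6, 4, 3],
    "cluster_mode_of_topup": [0, 0, 0, 0, 1],
})

label_cols = list(labelled.columns)

# Count customers per combination of the four cluster labels, largest first.
segment_sizes = (
    labelled.groupby(label_cols)
    .size()
    .sort_values(ascending=False)
    .rename("customers")
    .reset_index()
)
print(segment_sizes)
```

With 5, 4, 7 and 4 clusters per feature group there can be up to 560 combinations, so in practice only the largest combined segments are likely to be useful for targeting.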