<H2>KEY METRICS FOR MARKETING AND SALES</H2>

Marketing and sales are important drivers for any company. How do we, as marketers, understand how our product is doing? In this notebook we will tackle a few key metrics that will help you gauge the performance of your product.

1. Monthly Revenue
2. Monthly Revenue Growth Rate
3. Monthly active customers
4. Monthly Order Count
5. Average Revenue per Order
6. New Customer Ratio
7. Monthly Retention Rate

<h2> Importing relevant packages and libraries </h2>

In [None]:

#import libraries
from __future__ import division

from datetime import datetime, timedelta,date
import pandas as pd
%matplotlib inline
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans


import plotly as py
import plotly.offline as pyoff
import plotly.graph_objs as go

import xgboost as xgb
from sklearn.model_selection import KFold, cross_val_score, train_test_split



In [None]:
#Read data
tx_data = pd.read_csv('../input/customer_segmentation/customer_segmentation.csv', encoding='cp1252')

In [None]:
#initiate plotly
pyoff.init_notebook_mode()

#preview the first rows of the data
tx_data.head()

We have all the crucial information we need:

* Customer ID
* Unit Price
* Quantity
* Invoice Date

Revenue = Active Customer Count * Order Count * Average Revenue per Order
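As a quick sanity check of the decomposition, here is a minimal sketch on made-up data. It reads "Order Count" as orders per active customer, which is an assumption about what the formula intends, and it uses a hypothetical `InvoiceNo` column to identify orders.

```python
import pandas as pd

# Toy transactions (made-up values, not taken from the dataset).
# InvoiceNo is an assumed order identifier.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C1", "C2"],
    "Revenue": [10.0, 20.0, 5.0, 8.0, 2.0, 15.0],
})

active_customers = toy["CustomerID"].nunique()           # distinct buyers
order_count = toy["InvoiceNo"].nunique()                 # distinct invoices
orders_per_customer = order_count / active_customers     # assumed reading of "Order Count"
avg_revenue_per_order = toy["Revenue"].sum() / order_count

# Multiplying the three factors recovers total revenue
revenue_from_factors = active_customers * orders_per_customer * avg_revenue_per_order
print(revenue_from_factors, toy["Revenue"].sum())
```

Under this reading, a drop in revenue can be traced to fewer active customers, fewer orders per customer, or a lower average order value.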


<h3> Feature Engineering </h3>

In [None]:
#converting the type of Invoice Date Field from string to datetime.
tx_data['InvoiceDate'] = pd.to_datetime(tx_data['InvoiceDate'])

In [None]:
#creating YearMonth field for the ease of reporting and visualization
tx_data['InvoiceYearMonth'] = tx_data['InvoiceDate'].map(lambda date: 100*date.year + date.month)
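To make the `100*year + month` encoding concrete, here is a small sketch on made-up dates. It uses the vectorized `.dt` accessor instead of `map` with a lambda; that is an alternative way to build the same key, not the method used in the cell above.

```python
import pandas as pd

# Made-up dates for illustration only
dates = pd.to_datetime(pd.Series(["2010-12-01", "2011-01-15", "2011-11-30"]))

# year * 100 + month gives a sortable integer key such as 201012
year_month = dates.dt.year * 100 + dates.dt.month
print(year_month.tolist())
```

Because the key is an integer that sorts chronologically, it can be used directly in comparisons such as `InvoiceYearMonth < 201112`.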

<h3> 1. Monthly Revenue </h3>

In [None]:
#calculate Revenue for each row and create a new dataframe with YearMonth - Revenue columns
tx_data['Revenue'] = tx_data['UnitPrice'] * tx_data['Quantity']
tx_revenue = tx_data.groupby(['InvoiceYearMonth'])['Revenue'].sum().reset_index()
tx_revenue

In [None]:
#Visualisation
plot_data = [
    go.Scatter(
        x=tx_revenue['InvoiceYearMonth'],
        y=tx_revenue['Revenue'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Monthly Revenue',
        xaxis_title="InvoiceYearMonth",
        yaxis_title="Revenue",
)


fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

This clearly shows that our revenue is growing, especially from August 2011 onwards (our December data is incomplete). Absolute numbers are fine, but let’s also work out our Monthly Revenue Growth Rate:


<H3>2. Monthly Revenue Growth Rate</H3>

In [None]:
#using pct_change() function to see monthly percentage change
tx_revenue['MonthlyGrowth'] = tx_revenue['Revenue'].pct_change()

#showing first 5 rows
tx_revenue.head()
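As a small worked example of what `pct_change()` returns, the sketch below uses made-up revenue figures and compares the result with the manual formula (current / previous - 1).

```python
import pandas as pd

# Made-up monthly revenue, indexed by year-month
revenue = pd.Series([100.0, 120.0, 90.0], index=[201101, 201102, 201103])

# pct_change compares each value with the one before it
growth = revenue.pct_change()

# The same calculation written out by hand
manual = revenue / revenue.shift(1) - 1

print(growth)
```

The first month has no predecessor, so its growth is NaN; the second month grows by 20% and the third shrinks by 25%.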

In [None]:
#visualization - line graph for monthly revenue growth

plot_data = [
    go.Scatter(
        x=tx_revenue.query("InvoiceYearMonth < 201112")['InvoiceYearMonth'], # since dec data is incomplete
        y=tx_revenue.query("InvoiceYearMonth < 201112")['MonthlyGrowth'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Monthly Growth Rate'
    )

fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)


Everything looks good: we saw 36.5% growth in the previous month (December is excluded in the code since it hasn’t been completed yet). But we need to identify what exactly happened in April. Was it due to fewer active customers, or did our customers place fewer orders? Maybe they just started to buy cheaper products? We can’t say anything without a deep-dive analysis.


In [None]:
tx_data.describe()

In [None]:
tx_data['Country'].value_counts()

To see the details of Monthly Active Customers, we will follow exactly the steps we used for Monthly Revenue. From this part on, we will focus on UK data only (which has the most records). We can get the monthly active customers by counting unique CustomerIDs. The same analysis can be carried out for customers of other countries as well.

<h3> 3. Monthly Active Customers </h3>

In [None]:
#creating a new dataframe with UK customers only
tx_uk = tx_data.query("Country=='United Kingdom'").reset_index(drop=True)

#creating monthly active customers dataframe by counting unique Customer IDs
tx_monthly_active = tx_uk.groupby('InvoiceYearMonth')['CustomerID'].nunique().reset_index()

In [None]:
print('tx_uk first 2 rows\n',tx_uk.head(2),'\n')
print('tx_monthly_active first 2 rows\n',tx_monthly_active.head(2),'\n')

In [None]:
#plotting the number of unique customer IDs by year and month
plot_data = [
    go.Bar(
        x=tx_monthly_active['InvoiceYearMonth'],
        y=tx_monthly_active['CustomerID'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        xaxis_title='InvoiceYearMonth',
        yaxis_title='Number of unique CustomerIDs',
        title='Monthly Active Customers'
    )

fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)


In April, the number of Monthly Active Customers dropped from 923 to 817 (-11.5%).


<h3> 4. Monthly Order Count </h3>

In [None]:
#create a new dataframe for order volume by summing the Quantity field (units sold per month)
tx_monthly_sales = tx_uk.groupby('InvoiceYearMonth')['Quantity'].sum().reset_index()

#print the dataframe
tx_monthly_sales

In [None]:
#plot
plot_data = [
    go.Bar(
        x=tx_monthly_sales['InvoiceYearMonth'],
        y=tx_monthly_sales['Quantity'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        xaxis_title='InvoiceYearMonth',
        yaxis_title='Quantity sold',
        title='Monthly Total # of Orders'
    )

fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

As we expected, Order Count also declined in April (279k to 257k, -8%). The drop in Active Customer Count directly contributed to the decrease in Order Count. Finally, we should check our Average Revenue per Order as well.
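Note that the cells above sum `Quantity`, which measures units sold rather than the number of distinct orders. If the data has an invoice identifier, orders can be counted directly. The sketch below runs on made-up data and assumes a column named `InvoiceNo`; that column name is an assumption and is not used elsewhere in this notebook.

```python
import pandas as pd

# Made-up line items; InvoiceNo is an assumed order identifier
toy = pd.DataFrame({
    "InvoiceYearMonth": [201103, 201103, 201103, 201104, 201104],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B1"],
    "Quantity": [10, 5, 2, 7, 1],
})

# Distinct invoices per month = number of orders
monthly_orders = (
    toy.groupby("InvoiceYearMonth")["InvoiceNo"]
    .nunique()
    .reset_index(name="OrderCount")
)

# Units sold per month, as computed in the notebook
monthly_units = toy.groupby("InvoiceYearMonth")["Quantity"].sum().reset_index()

print(monthly_orders)
print(monthly_units)
```

The two measures can move differently: a month with a few large orders has high units sold but a low order count.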

<H3>5. Average Revenue per Order</H3> 


In [None]:
# create a new dataframe for average revenue by taking the mean of Revenue over all rows in each month
tx_monthly_order_avg = tx_uk.groupby('InvoiceYearMonth')['Revenue'].mean().reset_index()

#print the dataframe
tx_monthly_order_avg

In [None]:
#plot the bar chart
plot_data = [
    go.Bar(
        x=tx_monthly_order_avg['InvoiceYearMonth'],
        y=tx_monthly_order_avg['Revenue'],
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        xaxis_title='InvoiceYearMonth',
        yaxis_title='Average Revenue',
        title='Monthly Order Average'
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)


Even the monthly order average dropped in April (16.7 to 15.8). We observed a slowdown in every metric affecting our Revenue.
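The mean above is taken over rows, and each row may be a single line item rather than a whole order. A minimal sketch of a per-order average, on made-up data and assuming a hypothetical `InvoiceNo` column that groups line items into orders, could look like this:

```python
import pandas as pd

# Made-up line items; InvoiceNo is an assumed order identifier
toy = pd.DataFrame({
    "InvoiceYearMonth": [201103, 201103, 201103, 201104],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "Revenue": [10.0, 20.0, 30.0, 8.0],
})

# Step 1: total revenue of each order
order_totals = (
    toy.groupby(["InvoiceYearMonth", "InvoiceNo"])["Revenue"].sum().reset_index()
)

# Step 2: average order total per month
avg_per_order = order_totals.groupby("InvoiceYearMonth")["Revenue"].mean()

# For comparison: the row-level mean used in the notebook
avg_per_line = toy.groupby("InvoiceYearMonth")["Revenue"].mean()

print(avg_per_order)
print(avg_per_line)
```

In March the two orders are worth 30 each, so the per-order average is 30, while the row-level mean is 20 because one order is split across two rows.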

We have looked at our major metrics. Of course there are many more, and they vary across industries. Let’s continue investigating some other important metrics:

* New Customer Ratio: a good indicator of whether we are losing our existing customers or are unable to attract new ones
* Retention Rate: king of the metrics. Indicates how many customers we retain over a specific time window. We will be showing examples for monthly retention rate and cohort-based retention rate.

<H3> 6. New Customer Ratio </H3> 


First we should define what a new customer is. In our dataset, we can assume a new customer is whoever made their first purchase in the time window we defined. We will do it monthly for this example.

We will use the .min() function to find the first purchase date for each customer and define new customers based on that. The code below applies this function and shows us the monthly revenue breakdown for each group.

In [None]:
tx_min_purchase = tx_uk.groupby('CustomerID').InvoiceDate.min().reset_index()
tx_min_purchase.columns = ['CustomerID','MinPurchaseDate']
tx_min_purchase['MinPurchaseYearMonth'] = tx_min_purchase['MinPurchaseDate'].map(lambda date: 100*date.year + date.month)

#merge first purchase date column to our main dataframe (tx_uk)
tx_uk = pd.merge(tx_uk, tx_min_purchase, on='CustomerID')

tx_uk.head()

In [None]:
# create a column called UserType and assign 'New' as default. Then compare each row's invoice year-month
# with the customer's first-purchase year-month.
# For every row where the invoice year-month is later than the first-purchase year-month, set UserType to 'Existing'.
tx_uk['UserType'] = 'New'
tx_uk.loc[tx_uk['InvoiceYearMonth']>tx_uk['MinPurchaseYearMonth'],'UserType'] = 'Existing'
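To see the labelling rule on a small case, here is a sketch on made-up data. It derives the first-purchase month with `groupby(...).transform("min")` instead of a separate merge; this is an alternative shortcut for illustration, not the approach used in the cells above.

```python
import pandas as pd

# Made-up purchases: customer 1 buys in Jan and Feb, customer 2 buys twice in Feb
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceYearMonth": [201101, 201102, 201102, 201102],
})

# First purchase month per customer, broadcast back to every row
toy["MinPurchaseYearMonth"] = (
    toy.groupby("CustomerID")["InvoiceYearMonth"].transform("min")
)

# Default to 'New'; later months than the first purchase are 'Existing'
toy["UserType"] = "New"
toy.loc[toy["InvoiceYearMonth"] > toy["MinPurchaseYearMonth"], "UserType"] = "Existing"

print(toy)
```

A customer stays 'New' for every purchase made within their first month, so customer 2's two February purchases are both labelled 'New'.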

In [None]:
#calculate the Revenue per month for each user type
tx_user_type_revenue = tx_uk.groupby(['InvoiceYearMonth','UserType'])['Revenue'].sum().reset_index()

In [None]:
tx_user_type_revenue.head()

In [None]:
#filtering the dates and plot the result
tx_user_type_revenue = tx_user_type_revenue.query('InvoiceYearMonth != 201012 and InvoiceYearMonth != 201112')
plot_data = [
    go.Scatter(
        x=tx_user_type_revenue.query("UserType == 'Existing'")['InvoiceYearMonth'],
        y=tx_user_type_revenue.query("UserType == 'Existing'")['Revenue'],
        name = 'Existing'
    ),
    go.Scatter(
        x=tx_user_type_revenue.query("UserType == 'New'")['InvoiceYearMonth'],
        y=tx_user_type_revenue.query("UserType == 'New'")['Revenue'],
        name = 'New'
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='New vs Existing',
        xaxis_title='InvoiceYearMonth',
        yaxis_title='Revenue'
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)


Existing customers are showing a positive trend, which tells us that our customer base is growing, but new customers have a slight negative trend.

In [None]:
#create a dataframe that shows the new user ratio (new customers / existing customers per month)
#the first month has no existing customers, so its ratio is NaN and is dropped later
new_customers = tx_uk.query("UserType == 'New'").groupby(['InvoiceYearMonth'])['CustomerID'].nunique()
existing_customers = tx_uk.query("UserType == 'Existing'").groupby(['InvoiceYearMonth'])['CustomerID'].nunique()
tx_user_ratio = new_customers/existing_customers
tx_user_ratio = tx_user_ratio.reset_index()

#print the dataframe
tx_user_ratio

In [None]:
tx_user_ratio = tx_user_ratio.dropna()

In [None]:
#plot the result
plot_data = [
    go.Bar(
        x=tx_user_ratio.query("InvoiceYearMonth>201101 and InvoiceYearMonth<201112")['InvoiceYearMonth'],
        y=tx_user_ratio.query("InvoiceYearMonth>201101 and InvoiceYearMonth<201112")['CustomerID'],
    )
]
#early months are excluded because the first month's ratio is NaN, and December 2011 has incomplete data
plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='New Customer Ratio'
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

New Customer Ratio has declined as expected (we assumed that in February all customers were New) and is running at around 20%.
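The ratio computed above divides new customers by existing customers. Another common reading of "new customer ratio" is new customers as a share of all active customers. The sketch below, on made-up data, shows both side by side so the difference is visible; which definition to report is a choice, and the variable names here are illustrative.

```python
import pandas as pd

# Made-up month with 1 new and 4 existing customers
toy = pd.DataFrame({
    "InvoiceYearMonth": [201102] * 5,
    "CustomerID": [1, 2, 3, 4, 5],
    "UserType": ["New", "Existing", "Existing", "Existing", "Existing"],
})

by_month = "InvoiceYearMonth"
new_count = toy[toy["UserType"] == "New"].groupby(by_month)["CustomerID"].nunique()
existing_count = toy[toy["UserType"] == "Existing"].groupby(by_month)["CustomerID"].nunique()
total_count = toy.groupby(by_month)["CustomerID"].nunique()

new_to_existing = new_count / existing_count   # definition used in the notebook
new_share = new_count / total_count            # share of all active customers

print(new_to_existing)
print(new_share)
```

With 1 new and 4 existing customers, the first definition gives 0.25 and the second gives 0.20.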


<h3>7. Monthly Retention Rate </h3>

Retention rate should be monitored very closely because it indicates how sticky your service is and how well your product fits the market. To visualize Monthly Retention Rate, we need to calculate how many customers were retained from the previous month.

**Monthly Retention Rate** = Retained Customers From Prev. Month / Active Customers Total
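As a worked example of this formula, here is a minimal sketch on made-up data: three customers are active in January, and two of them return in February alongside one newcomer.

```python
import pandas as pd

# Made-up activity: customers 1, 2, 3 buy in Jan; customers 1, 2, 4 buy in Feb
toy = pd.DataFrame({
    "CustomerID": [1, 2, 3, 1, 2, 4],
    "InvoiceYearMonth": [201101, 201101, 201101, 201102, 201102, 201102],
})

# Customer x month activity table (counts; > 0 means active)
active = pd.crosstab(toy["CustomerID"], toy["InvoiceYearMonth"])

prev_month, selected_month = 201101, 201102

# Retained: active in the selected month AND in the previous month
retained = int(((active[selected_month] > 0) & (active[prev_month] > 0)).sum())

# Total: everyone active in the selected month
total = int((active[selected_month] > 0).sum())

retention_rate = retained / total
print(retained, total, retention_rate)
```

Two of the three customers active in February were also active in January, so the retention rate is 2/3.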

In [None]:
#identify which users are active by looking at their revenue per month
tx_user_purchase = tx_uk.groupby(['CustomerID','InvoiceYearMonth'])['Revenue'].sum().reset_index()
tx_user_purchase.head()

In [None]:
#create retention matrix with crosstab
tx_retention = pd.crosstab(tx_user_purchase['CustomerID'], tx_user_purchase['InvoiceYearMonth']).reset_index()

tx_retention.head()

The retention table shows us which customers are active in each month (1 stands for active).


In [None]:
#select the month columns used in the retention loop below (skips CustomerID and the first month)
months = tx_retention.columns[2:]
months

In [None]:
# For each selected month, TotalUserCount is the number of customers active in that month: every customer has
# a 1 in the month's column if active and a 0 otherwise, so the column sum gives the count.
# RetainedUserCount is the number of customers who are active in the selected month and were also active
# in the previous month. We filter to those rows and sum the previous month's column.
# The result is a list of dictionaries holding the Retained and Total user counts for each month.

retention_array = []
for i in range(len(months)-1):
    retention_data = {}
    selected_month = months[i+1]
    prev_month = months[i]
    retention_data['InvoiceYearMonth'] = int(selected_month)
    retention_data['TotalUserCount'] = tx_retention[selected_month].sum()
    retention_data['RetainedUserCount'] = tx_retention[(tx_retention[selected_month]>0) & (tx_retention[prev_month]>0)][prev_month].sum()
    retention_array.append(retention_data)    

In [None]:
retention_array

In [None]:
#convert the array to dataframe and calculate Retention Rate
tx_retention = pd.DataFrame(retention_array)
tx_retention['RetentionRate'] = tx_retention['RetainedUserCount']/tx_retention['TotalUserCount']
tx_retention.head()

In [None]:
#plot the retention rate graph
plot_data = [go.Scatter(
        x=tx_retention.query("InvoiceYearMonth<201112")['InvoiceYearMonth'],
        y=tx_retention.query("InvoiceYearMonth<201112")['RetentionRate'],
        name="organic") ]
plot_layout = go.Layout(xaxis={"type": "category"},
        title='Monthly Retention Rate', xaxis_title = 'Invoice Year Month', yaxis_title = 'Retention Rate')
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

Monthly Retention Rate jumped significantly from June to August and went back to previous levels afterwards.
