## Online Retail Dataset 

##### Here we will check the monthly revenue of the online retail company based on the available dataset

#### Import Libraries

In [1]:
import numpy as np
import pandas as pd

from datetime import datetime , timedelta

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# Read dataset from CSV
customer_transactions = pd.read_csv("OnlineRetail.csv", encoding = 'unicode_escape')

customer_transactions.head(5)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 8:26,3.39,17850.0,United Kingdom


##### Let's check the shape of the dataset

In [2]:
customer_transactions.shape

(541909, 8)

In [3]:
customer_transactions.dtypes

InvoiceNo       object
StockCode       object
Description     object
Quantity         int64
InvoiceDate     object
UnitPrice      float64
CustomerID     float64
Country         object
dtype: object

##### In order to calculate the monthly revenue we need the following
1. CustomerID
2. UnitPrice
3. Quantity 
4. InvoiceDate

***Revenue*** = *Active Customer Count* × *Order Count* × *Average Revenue per Order*
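
As a quick sanity check on this decomposition, here is a minimal sketch on a small made-up table. The `toy` dataframe and its values are illustrative assumptions, not taken from the dataset. It reads *Order Count* as orders per active customer, which is one interpretation under which the three factors multiply back to total revenue.

```python
import pandas as pd

# Hypothetical toy transactions; the values are made up for illustration only.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceNo":  ["A1", "A2", "B1", "B1", "C1"],
    "Quantity":   [2, 1, 4, 1, 3],
    "UnitPrice":  [5.0, 10.0, 2.5, 10.0, 4.0],
})
toy["Revenue"] = toy["Quantity"] * toy["UnitPrice"]

total_revenue = toy["Revenue"].sum()
active_customers = toy["CustomerID"].nunique()
order_count = toy["InvoiceNo"].nunique()  # one order = one distinct invoice (assumption)

orders_per_customer = order_count / active_customers
avg_revenue_per_order = total_revenue / order_count

# The three factors multiply back to the total revenue.
decomposed = active_customers * orders_per_customer * avg_revenue_per_order
print(total_revenue, decomposed)
```

If any one factor falls while the others stay flat, revenue falls with it, which is why the sections below inspect each factor separately.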

#### Let's engineer the data to get the month-wise revenue 

In [4]:
customer_transactions['InvoiceDate'] = pd.to_datetime(customer_transactions['InvoiceDate'])
customer_transactions['InvoiceYearMonth'] = customer_transactions['InvoiceDate'].dt.strftime('%Y%m')

# Calculate revenue column
customer_transactions['Revenue'] = customer_transactions.Quantity * customer_transactions.UnitPrice

# Create a new dataframe with total revenue per year-month
company_revenue = customer_transactions.groupby(['InvoiceYearMonth'])['Revenue'].sum().reset_index()
company_revenue

Unnamed: 0,InvoiceYearMonth,Revenue
0,201012,748957.02
1,201101,560000.26
2,201102,498062.65
3,201103,683267.08
4,201104,493207.121
5,201105,723333.51
6,201106,691123.12
7,201107,681300.111
8,201108,682680.51
9,201109,1019687.622


##### Now that we have our dataframe of monthly revenues assembled, let's plot it!

In [5]:
import matplotlib.pyplot as plt
import seaborn as sns

import chart_studio.plotly as py
import plotly.graph_objs as go
import plotly.io as pio
# from plotly.offline import plot , iplot ,init_notebook_mode
#initiate visualization library for jupyter notebook 
# init_notebook_mode(connected=True)

In [53]:
#X and Y axis inputs for Plotly graph. We use Scatter for line graphs

figure = go.Figure()

figure.add_trace(go.Scatter(
                            x= company_revenue['InvoiceYearMonth'] ,
                            y= company_revenue['Revenue']
                ))

figure.update_layout(title='Total Revenue per month',
                   xaxis_title='Year/Month',
                
                   yaxis_title='Revenue')
# py.iplot(figure, filename='basic-scatter')
# pio.show(figure)
figure.update_xaxes(type = 'category')
# pio.show(figure)
pio.write_html(figure, file='company_revenue.html', auto_open=True)

![Total revenue per month](company_revenue.png "Total Revenue per month")


As the plot shows, revenue grows considerably after August 2011.

#### Monthly Revenue Growth Rate:

In [11]:
# pct_change() returns the month-over-month change as a fraction (0.37 means 37%)
company_revenue['MonthlyGrowthIn%'] = company_revenue.Revenue.pct_change()

company_revenue

Unnamed: 0,InvoiceYearMonth,Revenue,MonthlyGrowthIn%
0,201012,748957.02,
1,201101,560000.26,-0.252293
2,201102,498062.65,-0.110603
3,201103,683267.08,0.37185
4,201104,493207.121,-0.278163
5,201105,723333.51,0.466592
6,201106,691123.12,-0.04453
7,201107,681300.111,-0.014213
8,201108,682680.51,0.002026
9,201109,1019687.622,0.493653
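
To make the growth-rate column easier to read, the sketch below recomputes it for the first four months using the revenue figures from the table above. It shows that `pct_change()` is equivalent to `(current - previous) / previous`, and that the result is a fraction which needs multiplying by 100 to read as a percentage. The variable names are illustrative.

```python
import pandas as pd

# First four monthly revenue figures, copied from the table above.
revenue = pd.Series(
    [748957.02, 560000.26, 498062.65, 683267.08],
    index=["201012", "201101", "201102", "201103"],
)

# pct_change() returns (current - previous) / previous as a fraction.
growth_fraction = revenue.pct_change()
manual = (revenue - revenue.shift(1)) / revenue.shift(1)

# Multiply by 100 to express the growth as a percentage.
growth_percent = (growth_fraction * 100).round(2)
print(growth_percent)
```

The first month has no predecessor, so its growth is `NaN`, matching the empty cell in the table.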


In [10]:
# X and Y axis inputs for Plotly graph. We use Scatter for line graphs.
# 201112 is filtered out below; the data ends on 2011-12-09, so that month is incomplete.

figure = go.Figure()

figure.add_trace(go.Scatter(
                            x= company_revenue.loc[company_revenue['InvoiceYearMonth'] < '201112']['InvoiceYearMonth'] ,
                            y= company_revenue.loc[company_revenue['InvoiceYearMonth'] < '201112']['MonthlyGrowthIn%']
                ))

figure.update_layout(title='Monthly Revenue Growth Rate',
                   xaxis_title='Year/Month',
                
                   yaxis_title='MonthlyGrowthIn%')
# py.iplot(figure, filename='basic-scatter')
# pio.show(figure)
figure.update_xaxes(type = 'category')
# pio.show(figure)
pio.write_html(figure, file='company_revenue_growth_rate.html', auto_open=True)

![Monthly revenue growth rate](plot_revenue_growth_rate.png "Monthly Revenue Growth Rate")

There was a **36.5%** growth.
But to identify why revenue growth dipped in April 2011, we need some further analysis.

#### Monthly Active Customers

In [21]:
customer_transactions.Country.value_counts()

United Kingdom          495478
Germany                   9495
France                    8557
EIRE                      8196
Spain                     2533
Netherlands               2371
Belgium                   2069
Switzerland               2002
Portugal                  1519
Australia                 1259
Norway                    1086
Italy                      803
Channel Islands            758
Finland                    695
Cyprus                     622
Sweden                     462
Unspecified                446
Austria                    401
Denmark                    389
Japan                      358
Poland                     341
Israel                     297
USA                        291
Hong Kong                  288
Singapore                  229
Iceland                    182
Canada                     151
Greece                     146
Malta                      127
United Arab Emirates        68
European Community          61
RSA                         58
Lebanon 

#### Let's concentrate on the UK data

In [18]:
customers_uk = customer_transactions.loc[customer_transactions["Country"] == 'United Kingdom']

customers_uk.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,InvoiceYearMonth,Revenue
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom,201012,15.3
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,201012,20.34
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom,201012,22.0
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,201012,20.34
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,201012,20.34


Monthly active customers in the UK:

In [36]:
customers_uk_mnth_active = customers_uk.groupby('InvoiceYearMonth').CustomerID.nunique().reset_index()
customers_uk_mnth_active = customers_uk_mnth_active.rename(columns = {'CustomerID' : 'ActiveCustomers'})
customers_uk_mnth_active

Unnamed: 0,InvoiceYearMonth,ActiveCustomers
0,201012,871
1,201101,684
2,201102,714
3,201103,923
4,201104,817
5,201105,985
6,201106,943
7,201107,899
8,201108,867
9,201109,1177


In [46]:
figure = go.Figure()

figure.add_trace(
                go.Bar(
                            x= customers_uk_mnth_active['InvoiceYearMonth'] ,
                            y= customers_uk_mnth_active['ActiveCustomers'] 
                            )
                )

figure.update_layout( title='Monthly Active Customers',
                       xaxis_title='Year/Month',
                       yaxis_title='No. of Customers')
# py.iplot(figure, filename='basic-scatter')
# pio.show(figure)
figure.update_xaxes(type = 'category')
# pio.show(figure)
pio.write_html(figure, file='monthly_Active_uk_customers.html', auto_open=True)

![Monthly active customers](monthly_Active_uk_customers.png "Monthly Active Customers")

***Looking at the data above, the April 2011 dip in revenue is explicable: the number of active customers dropped from 923 in March to 817 in April.***

Now we can check whether the order count dropped as well. The cell below sums `Quantity`, so it measures the total number of units ordered per month.

In [50]:
customers_uk_order_count = customers_uk.groupby('InvoiceYearMonth').Quantity.sum().reset_index()
customers_uk_order_count = customers_uk_order_count.rename(columns = {'Quantity' : 'TotalQuantityOrdered'})
customers_uk_order_count

Unnamed: 0,InvoiceYearMonth,TotalQuantityOrdered
0,201012,298101
1,201101,237381
2,201102,225641
3,201103,279843
4,201104,257666
5,201105,306452
6,201106,258522
7,201107,324129
8,201108,319804
9,201109,458490


In [51]:
figure = go.Figure()

figure.add_trace(
                go.Bar(
                            x= customers_uk_order_count['InvoiceYearMonth'] ,
                            y= customers_uk_order_count['TotalQuantityOrdered'] 
                            )
                )

figure.update_layout( title='Monthly Quantity Ordered',
                       xaxis_title='Year/Month',
                       yaxis_title='Total Quantity Ordered')
# py.iplot(figure, filename='basic-scatter')
# pio.show(figure)
figure.update_xaxes(type = 'category')
# pio.show(figure)
pio.write_html(figure, file='monthly_quantity_ordered_uk_customers.html', auto_open=True)

![Monthly quantity ordered](monthly_quantity_ordered_uk_customers.png "Monthly Quantity Ordered")


***As expected, the total quantity ordered declined in April (257666) compared to March (279843).***
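
The cell above sums `Quantity`, which measures units ordered rather than the number of orders. If one order is taken to be one distinct `InvoiceNo` (an assumption; the notebook does not define it), the two measures can be computed side by side. The sketch below uses a made-up `toy` table to show that they can move in opposite directions.

```python
import pandas as pd

# Hypothetical toy data: three invoices in 201103, one large invoice in 201104.
toy = pd.DataFrame({
    "InvoiceYearMonth": ["201103", "201103", "201103", "201103", "201104", "201104"],
    "InvoiceNo":        ["536365", "536365", "536366", "536367", "536400", "536400"],
    "Quantity":         [6, 8, 2, 1, 50, 40],
})

monthly = toy.groupby("InvoiceYearMonth").agg(
    OrderCount=("InvoiceNo", "nunique"),       # distinct invoices per month
    TotalQuantityOrdered=("Quantity", "sum"),  # units ordered per month
).reset_index()
print(monthly)
```

In this toy example the number of orders falls while the quantity rises, so it may be worth checking both on the real data before drawing conclusions about order count.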

#### Average Revenue Per Order

In [60]:
# Note: this is the mean of Revenue over invoice lines (rows), not over whole invoices
comp_uk_avg_mnthly_revnu = customers_uk.groupby(['InvoiceYearMonth'])['Revenue'].mean().reset_index()
comp_uk_avg_mnthly_revnu = comp_uk_avg_mnthly_revnu.rename(columns = {'Revenue' : 'AvgRevenue'} )
comp_uk_avg_mnthly_revnu

Unnamed: 0,InvoiceYearMonth,AvgRevenue
0,201012,16.86586
1,201101,13.61468
2,201102,16.093027
3,201103,16.716166
4,201104,15.77338
5,201105,17.713823
6,201106,16.714748
7,201107,15.723497
8,201108,17.315899
9,201109,18.931723


In [61]:
figure = go.Figure()

figure.add_trace(
                go.Bar(
                            x= comp_uk_avg_mnthly_revnu['InvoiceYearMonth'] ,
                            y= comp_uk_avg_mnthly_revnu['AvgRevenue'] 
                            )
                )

figure.update_layout( title='Average Revenue Per Month',
                       xaxis_title='Year/Month',
                       yaxis_title='Average Revenue')
# py.iplot(figure, filename='basic-scatter')
# pio.show(figure)
figure.update_xaxes(type = 'category')
# pio.show(figure)
pio.write_html(figure, file='monthly_averge_revenue_uk_customers.html', auto_open=True)

![Average revenue per month](monthly_averge_revenue_uk_customers.png "Average Revenue Per Month")


The average revenue dipped from March (16.72) to April (15.77).
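
Note that the cell above averages `Revenue` over rows, and each row is one invoice line, so the figure is an average per line item. A minimal sketch of a per-order average, assuming one order corresponds to one `InvoiceNo` and using a made-up `toy` table, would first total each invoice and then average those totals:

```python
import pandas as pd

# Hypothetical toy data: invoice 'A' has two lines, invoice 'B' has one.
toy = pd.DataFrame({
    "InvoiceYearMonth": ["201103", "201103", "201103"],
    "InvoiceNo":        ["A", "A", "B"],
    "Revenue":          [10.0, 20.0, 60.0],
})

# Mean over rows: average revenue per invoice line.
avg_per_line = toy.groupby("InvoiceYearMonth")["Revenue"].mean()

# Sum per invoice first, then average the invoice totals per month.
order_totals = toy.groupby(["InvoiceYearMonth", "InvoiceNo"])["Revenue"].sum()
avg_per_order = order_totals.groupby(level="InvoiceYearMonth").mean()

print(avg_per_line["201103"], avg_per_order["201103"])
```

The two averages differ whenever invoices contain more than one line, so the choice depends on which question is being asked.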

We can also look at other metrics like
1. **New Customer Ratio:** A good indicator of whether we are losing existing customers or failing to attract new ones
2. **Retention Rate:** Indicates how many customers we retain over a specific time window.
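
As a rough illustration of the second metric, the sketch below computes a monthly retention rate on made-up data. The definition used here (the share of the previous month's active customers who purchase again in the current month) is an assumption, as is the `toy` table; the notebook does not fix a particular formula.

```python
import pandas as pd

# Hypothetical toy data: which customers purchased in which month.
toy = pd.DataFrame({
    "InvoiceYearMonth": ["201101", "201101", "201101", "201102", "201102", "201103"],
    "CustomerID":       [1, 2, 3, 1, 2, 2],
})

# Set of active customers per month, in chronological order.
active = toy.groupby("InvoiceYearMonth")["CustomerID"].apply(set).sort_index()

# Retention for a month = customers active in both this and the previous month,
# divided by the number of customers active in the previous month.
retention = {}
months = list(active.index)
for prev, cur in zip(months, months[1:]):
    retention[cur] = len(active[cur] & active[prev]) / len(active[prev])

print(retention)
```

Other windows (quarterly, or cohort-based retention) follow the same pattern with a different grouping key.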

#### New Customer Ratio

A customer who makes their first purchase within the predefined time window is termed a New Customer.

Here we will separate the New Customers from the existing ones.
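
Before applying this to the full dataset, the idea can be sketched on a small made-up table (the `toy` dataframe is hypothetical). Both month columns are kept as integers in `YYYYMM` form here, a choice made in this sketch so that the comparison never mixes strings and numbers.

```python
import pandas as pd

# Hypothetical toy data; customer 1 first buys in Jan 2011, customer 2 in Feb 2011.
toy = pd.DataFrame({
    "CustomerID":  [1, 1, 2, 1, 2],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-02-10", "2011-02-15", "2011-03-01", "2011-03-20",
    ]),
})

# Keep both month columns as integers (YYYYMM) so they compare cleanly.
toy["InvoiceYearMonth"] = toy["InvoiceDate"].dt.year * 100 + toy["InvoiceDate"].dt.month
first_purchase = toy.groupby("CustomerID")["InvoiceDate"].transform("min")
toy["FirstInvoiceYearMonth"] = first_purchase.dt.year * 100 + first_purchase.dt.month

# A customer is 'New' in the month of their first purchase, 'Existing' afterwards.
toy["UserType"] = "New"
toy.loc[toy["InvoiceYearMonth"] > toy["FirstInvoiceYearMonth"], "UserType"] = "Existing"

# Unique customers per month and user type.
counts = (
    toy.groupby(["InvoiceYearMonth", "UserType"])["CustomerID"]
    .nunique()
    .unstack(fill_value=0)
)
print(counts)
```

Dividing the `New` column by the `Existing` column (for months where the latter is non-zero) would give one possible New Customer Ratio.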

In [88]:
cust_uk_w_first_purchase = customers_uk.sort_values('InvoiceDate')
cust_uk_w_first_purchase['FirstInvoiceDate'] = cust_uk_w_first_purchase.groupby('CustomerID').InvoiceDate.transform('first')
cust_uk_w_first_purchase.tail()


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,InvoiceYearMonth,Revenue,FirstInvoiceDate
541879,581585,22726,ALARM CLOCK BAKELIKE GREEN,8,2011-12-09 12:31:00,3.75,15804.0,United Kingdom,201112,30.0,2011-05-25 10:12:00
541892,581586,21217,RED RETROSPOT ROUND CAKE TINS,24,2011-12-09 12:49:00,8.95,13113.0,United Kingdom,201112,214.8,2010-12-08 13:07:00
541890,581586,22061,LARGE CAKE STAND HANGING STRAWBERY,8,2011-12-09 12:49:00,2.95,13113.0,United Kingdom,201112,23.6,2010-12-08 13:07:00
541891,581586,23275,SET OF 3 HANGING OWLS OLLIE BEAK,24,2011-12-09 12:49:00,1.25,13113.0,United Kingdom,201112,30.0,2010-12-08 13:07:00
541893,581586,20685,DOORMAT RED RETROSPOT,10,2011-12-09 12:49:00,7.08,13113.0,United Kingdom,201112,70.8,2010-12-08 13:07:00


In [107]:
# Create column FirstInvoiceYearMonth
cust_uk_w_first_purchase['FirstInvoiceDate'] = pd.to_datetime(cust_uk_w_first_purchase['FirstInvoiceDate'])
# cust_uk_w_first_purchase['FirstInvoiceDate'].isnull().sum()
cust_uk_w_first_purchase['FirstInvoiceYearMonth'] = cust_uk_w_first_purchase['FirstInvoiceDate'] \
                                                    .map(lambda date: 100*date.year + date.month)
cust_uk_w_first_purchase['FirstInvoiceYearMonth'] = pd.to_numeric(cust_uk_w_first_purchase['FirstInvoiceYearMonth'], downcast='integer')
cust_uk_w_first_purchase.tail()


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,InvoiceYearMonth,Revenue,FirstInvoiceDate,FirstInvoiceYearMonth
541879,581585,22726,ALARM CLOCK BAKELIKE GREEN,8,2011-12-09 12:31:00,3.75,15804.0,United Kingdom,201112,30.0,2011-05-25 10:12:00,201105.0
541892,581586,21217,RED RETROSPOT ROUND CAKE TINS,24,2011-12-09 12:49:00,8.95,13113.0,United Kingdom,201112,214.8,2010-12-08 13:07:00,201012.0
541890,581586,22061,LARGE CAKE STAND HANGING STRAWBERY,8,2011-12-09 12:49:00,2.95,13113.0,United Kingdom,201112,23.6,2010-12-08 13:07:00,201012.0
541891,581586,23275,SET OF 3 HANGING OWLS OLLIE BEAK,24,2011-12-09 12:49:00,1.25,13113.0,United Kingdom,201112,30.0,2010-12-08 13:07:00,201012.0
541893,581586,20685,DOORMAT RED RETROSPOT,10,2011-12-09 12:49:00,7.08,13113.0,United Kingdom,201112,70.8,2010-12-08 13:07:00,201012.0


In [108]:
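# Create column UserType:
# 'Existing' if the first purchase year/month is before the invoice year/month, otherwise 'New'.
# InvoiceYearMonth is a string (from strftime) and FirstInvoiceYearMonth is a float, so comparing
# them directly raises: TypeError: '>' not supported between instances of 'str' and 'float'.
# Casting InvoiceYearMonth to int avoids the error.
cust_uk_w_first_purchase['UserType'] = 'New'
is_existing = cust_uk_w_first_purchase.InvoiceYearMonth.astype(int) > cust_uk_w_first_purchase.FirstInvoiceYearMonth
cust_uk_w_first_purchase.loc[is_existing, 'UserType'] = 'Existing'
cust_uk_w_first_purchase.tail()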