In [1]:
import pandas as pd
import pandasql as ps
import dash
from dash import dcc, html, dash_table
import plotly.express as px

An online retail store has hired you as a consultant to review its data and provide insights that would be valuable to the CEO and CMO. The business has been performing well, and management wants to understand the major factors contributing to revenue so that it can plan strategically for next year.

# 0. Dataset

In [2]:
tata_data = pd.read_excel('online_retail.xlsx')

In [3]:
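# Keep only rows with at least one unit sold and a non-negative unit price.
# .copy() avoids SettingWithCopyWarning when columns are added in the next cell.
tata_data = tata_data.loc[(tata_data['Quantity'] >= 1) & (tata_data['UnitPrice'] >= 0)].copy()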

In [4]:
tata_data['StockCode'] = tata_data['StockCode'].astype(str)
tata_data['TotalPrice'] = tata_data['Quantity'] * tata_data['UnitPrice']
tata_data['InvoiceMonth'] = tata_data['InvoiceDate'].dt.month
tata_data['InvoiceYear'] = tata_data['InvoiceDate'].dt.year
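
The cleaning and derived-column steps can be sanity-checked on a tiny frame before running them on the full file. The sketch below is only an illustration: the rows are made up (not taken from `online_retail.xlsx`), and `prepare` is a hypothetical helper that assumes the same column names as the notebook.

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the filter and derived-column cells."""
    # Keep rows with at least one unit sold and a non-negative price.
    out = df.loc[(df['Quantity'] >= 1) & (df['UnitPrice'] >= 0)].copy()
    out['StockCode'] = out['StockCode'].astype(str)
    out['TotalPrice'] = out['Quantity'] * out['UnitPrice']
    out['InvoiceMonth'] = out['InvoiceDate'].dt.month
    out['InvoiceYear'] = out['InvoiceDate'].dt.year
    return out

# Made-up rows for illustration only; the second row has a negative quantity.
toy = pd.DataFrame({
    'StockCode': [71053, '85123A', 22423],
    'Quantity': [6, -2, 3],
    'UnitPrice': [3.39, 2.55, 10.0],
    'InvoiceDate': pd.to_datetime(
        ['2010-12-01 08:26', '2011-01-04 10:00', '2011-03-15 12:30']),
})
clean = prepare(toy)
print(clean[['StockCode', 'TotalPrice', 'InvoiceMonth', 'InvoiceYear']])
```

The row with a negative quantity is dropped, and the remaining rows gain the three derived columns used by the queries later in the notebook.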

In [5]:
print(tata_data.shape)
tata_data.head(3)

(531283, 11)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,TotalPrice,InvoiceMonth,InvoiceYear
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom,15.3,12,2010
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34,12,2010
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom,22.0,12,2010


### 1. The CEO of the retail store wants to view the revenue time series for the year 2011 only, broken down by month. He is interested in the seasonal trends and wants to dig deeper into why these trends occur. This analysis will help the CEO forecast for the next year.

In [6]:
q1 = ps.sqldf('''
SELECT
    InvoiceMonth,
    ROUND(SUM(TotalPrice), 2) AS MonthTotalPrice
FROM
    tata_data
WHERE
    InvoiceYear = 2011
GROUP BY
    InvoiceMonth
ORDER BY
    InvoiceMonth ASC
''')
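
Since `pandasql` runs its queries in SQLite by default, the monthly aggregation can be tried on a small in-memory table using only the standard library. This is a sketch under assumptions: the rows are made up, and `monthly_revenue` is a hypothetical helper, not part of the notebook.

```python
import sqlite3

def monthly_revenue(rows, year):
    """Hypothetical helper: total revenue per month for a single year."""
    con = sqlite3.connect(':memory:')
    con.execute(
        'CREATE TABLE tata_data '
        '(InvoiceYear INTEGER, InvoiceMonth INTEGER, TotalPrice REAL)')
    con.executemany('INSERT INTO tata_data VALUES (?, ?, ?)', rows)
    result = con.execute('''
        SELECT InvoiceMonth, ROUND(SUM(TotalPrice), 2) AS MonthTotalPrice
        FROM tata_data
        WHERE InvoiceYear = ?
        GROUP BY InvoiceMonth
        ORDER BY InvoiceMonth ASC
    ''', (year,)).fetchall()
    con.close()
    return result

# Made-up rows: (InvoiceYear, InvoiceMonth, TotalPrice).
rows = [
    (2010, 12, 15.30),
    (2011, 1, 20.34),
    (2011, 1, 22.00),
    (2011, 3, 10.00),
]
print(monthly_revenue(rows, 2011))
```

The 2010 row is excluded by the `WHERE` clause, and the two January rows are summed into a single month total.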

In [21]:
app = dash.Dash(__name__)
app.layout = html.Div([
                html.H1('Q1: Revenue for the year 2011', style={'textAlign': 'center'}), 
                html.Div(children = [
                        dcc.Graph(id = 'example-graph', 
                                figure = {'data': [{'x': q1['InvoiceMonth'], 'y': q1['MonthTotalPrice'], 
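                                        # 'numeric' is not a Plotly trace type; draw the time series as a line.
                                        'type': 'scatter', 'mode': 'lines+markers',
                                        'name': 'MonthTotalPrice'}],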
                                        'layout': {'height': 500, 'xaxis': {'title': "InvoiceMonth"}, 'yaxis': {'title': 'MonthTotalPrice'}}})], 
                                style = {'width': '80%',
                                        'display': 'inline-block', 
                                        'verticalAlign': 'top'}),
                html.Div(children = [
                        dash_table.DataTable(id = 'table', 
                                columns = [{'name': i, 'id': i} for i in q1.columns], 
                                data = q1.to_dict('records'),
                                style_table = {'height': 500},
                                style_cell = {'text_align': 'center'})],
                                style = {'width': '20%', 'display': 'inline-block', 'verticalAlign': 'middle'})],
                style={'backgroundColor': 'white'}
                                )
app.run()

### 2. The CMO wants to view the top 10 countries generating the highest revenue, together with the quantity sold in each. The CMO does not want the United Kingdom included in this visual.

In [8]:
q2 = ps.sqldf('''
SELECT
    *
FROM
(SELECT
    Country,
    ROUND(SUM(TotalPrice), 2) AS TotalSum,
    SUM(Quantity) AS TotalQuantity
FROM
    tata_data 
GROUP BY
    Country) AS Countries
WHERE
    Country != 'United Kingdom'
ORDER BY
    2 DESC LIMIT 10
''')
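
As a cross-check on the SQL, the same ranking with revenue and quantity side by side can be sketched in pandas. The data below is made up, and `top_countries` with its `n` and `exclude` parameters is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

def top_countries(df, n=10, exclude=('United Kingdom',)):
    """Hypothetical helper: top-n countries by revenue, with units sold."""
    kept = df.loc[~df['Country'].isin(exclude)]
    summary = (
        kept.groupby('Country', as_index=False)
            .agg(TotalSum=('TotalPrice', 'sum'),
                 TotalQuantity=('Quantity', 'sum'))
            .sort_values('TotalSum', ascending=False)
            .head(n)
            .reset_index(drop=True)
    )
    summary['TotalSum'] = summary['TotalSum'].round(2)
    return summary

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Country': ['France', 'Germany', 'France', 'United Kingdom', 'EIRE'],
    'Quantity': [4, 10, 6, 100, 1],
    'TotalPrice': [40.0, 55.5, 30.0, 900.0, 5.25],
})
top2 = top_countries(toy, n=2)
print(top2)
```

Filtering before grouping keeps the United Kingdom out of the aggregation entirely, which gives the same result as filtering the grouped rows afterwards.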

In [9]:
app = dash.Dash(__name__)
app.layout = html.Div([
                html.H1('Q2: Top 10 countries with highest revenue', style={'textAlign': 'center'}), 
                html.Div(children = [
                        dcc.Graph(id = 'example-graph', 
                                figure = {'data': [{'x': q2['Country'], 'y': q2['TotalSum'], 
                                        'type': 'bar',
                                        'name': 'TotalSum'}],
                                        'layout': {'height': 500, 'xaxis': {'title': "Country"}, 'yaxis': {'title': 'TotalSum'}}})], 
                                style = {'width': '80%',
                                        'display': 'inline-block', 
                                        'verticalAlign': 'top'}),
                html.Div(children = [
                        dash_table.DataTable(id = 'table_q2', 
                                columns = [{'name': i, 'id': i} for i in q2.columns], 
                                data = q2.to_dict('records'),
                                style_table = {'height': 500},
                                style_cell = {'text_align': 'center'})],
                                style = {'width': '20%', 'display': 'inline-block', 'verticalAlign': 'middle'})],
                style={'backgroundColor': 'white'}
                                )
app.run()

### 3. The CMO of the online retail store wants to view the top 10 customers by revenue. He is interested in a visual that starts with the highest revenue-generating customer and gradually declines to the lower revenue-generating customers. The CMO wants to target the higher revenue-generating customers and ensure that they remain satisfied with their products.

In [10]:
q3 = ps.sqldf('''
SELECT
    CustomerID,
    ROUND(SUM(TotalPrice), 2) AS CustomerPrice
FROM
    tata_data
WHERE
    CustomerID IS NOT NULL
GROUP BY
    CustomerID
ORDER BY
    2 DESC LIMIT 10
''')

In [11]:
# Turn float IDs such as 17850.0 into labels such as 'id17850' so the x-axis is treated as categorical.
q3['CustomerID'] = 'id' + q3['CustomerID'].astype(int).astype(str)

In [12]:
app = dash.Dash(__name__)
app.layout = html.Div([
                html.H1('Q3: Top 10 customers with highest revenue', style={'textAlign': 'center'}), 
                html.Div(children = [
                        dcc.Graph(id = 'example-graph', 
                                figure = {'data': [{'x': q3['CustomerID'], 'y': q3['CustomerPrice'], 
                                        'type': 'bar',
                                        'name': 'TotalSum'}],
                                        'layout': {'height': 500, 'xaxis': {'title': "CustomerID"}, 'yaxis': {'title': 'CustomerPrice'}}})], 
                                style = {'width': '80%',
                                        'display': 'inline-block', 
                                        'verticalAlign': 'top'}),
                html.Div(children = [
                        dash_table.DataTable(id = 'table_q3', 
                                columns = [{'name': i, 'id': i} for i in q3.columns], 
                                data = q3.to_dict('records'),
                                style_table = {'height': 500},
                                style_cell = {'text_align': 'center'})],
                                style = {'width': '20%', 'display': 'inline-block', 'verticalAlign': 'middle'})],
                style={'backgroundColor': 'white'}
                                )
app.run()

### 4. The CEO is looking to gain insights on the demand for their products. He wants to look at all countries and see which regions have the greatest demand for their products. Once the CEO has an idea of the regions with high demand, he will initiate an expansion strategy that allows the company to target these areas and generate more business from them. He wants to see all of the data in a single view, without needing to scroll or hover over the data points to identify the demand. There is no need to show data for the United Kingdom, as the CEO is more interested in the countries that offer expansion opportunities.

In [13]:
q4 = ps.sqldf('''
SELECT
    Country,
    ROUND(SUM(TotalPrice), 2) as TotalSum
FROM
    tata_data
WHERE
    Country != 'United Kingdom'
GROUP BY
    Country
ORDER BY
    2 DESC
''')

In [20]:
app = dash.Dash(__name__)
app.layout = html.Div([
                html.H1('Q4: Demand of all the countries', style={'textAlign': 'center'}), 
                html.Div(children = [
                        dcc.Graph(id = 'example-graph', 
                                figure = {'data': [{'x': q4['Country'], 'y': q4['TotalSum'], 
                                        'type': 'bar',
                                        'name': 'TotalSum'}],
                                        'layout': {'height': 500, 'xaxis': {'title': "Country"}, 'yaxis': {'title': 'TotalSum'}}})], 
                                style = {'width': '100%',
                                        'display': 'inline-block', 
                                        'verticalAlign': 'top'}),
                ],
                style={'backgroundColor': 'white'}
                                )
app.run()
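
The brief for question 4 asks for a view that can be read without hovering. One possible approach, sketched here under assumptions, is to print each value on its bar using the bar trace's `text` and `textposition` attributes. `labelled_bar_figure` is a hypothetical helper and the numbers are made up; the returned dict has the same shape as the `figure` dicts used in the cells above.

```python
def labelled_bar_figure(categories, values, x_title, y_title, height=500):
    """Hypothetical helper: bar-chart figure dict with a value label on each bar."""
    values = list(values)
    return {
        'data': [{
            'x': list(categories),
            'y': values,
            'type': 'bar',
            'name': y_title,
            # Print the value above each bar so it can be read without hovering.
            'text': [f'{v:,.0f}' for v in values],
            'textposition': 'outside',
        }],
        'layout': {
            'height': height,
            # Tilt the country names so that many bars fit in one view.
            'xaxis': {'title': x_title, 'tickangle': -45},
            'yaxis': {'title': y_title},
        },
    }

# Made-up figures for illustration only.
fig = labelled_bar_figure(
    ['Netherlands', 'EIRE', 'Germany'], [1200.4, 980.0, 450.25],
    'Country', 'TotalSum')
print(fig['data'][0]['text'])
```

In the notebook, the result could be passed as `figure=labelled_bar_figure(q4['Country'], q4['TotalSum'], 'Country', 'TotalSum')` to `dcc.Graph`. The same helper would also remove the duplicated figure dicts in the other three dashboards.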