# e-Commerce EDA

In [29]:
# Imports
import pandas as pd
import sqlite3
import plotly.express as px

In [30]:
# Connect SQLite database.
db_conn = sqlite3.connect("SuperstoreDB/superstore.db")

This week you'll work on the e-commerce EDA using Python. **You will use a Python SQL connector to perform queries on your newly created database.**

This is your third EDA, so you should already be comfortable with the workflow. As before, you will finish the project with a report to be presented to the management team of 'VS Group.' They want to see how well you understand the e-commerce business and how your data analysis addresses the relevant KPIs.



## Epic 1: Exploratory Data Analysis
There are plenty of insights you can deduce from this dataset. Try to look for some of these insights yourself. As a starting point, here are some ideas that you could present to the team:

- What customer purchasing patterns can you deduce, for example by day of the week, week, month, quarter, or year?

- Are there specific days/months/quarters when the sales have been unusually high/low, and what could be the possible reasons? How about the profit and loss margin?

- Which states and which customers made the highest number of orders? Are they the same as the highest spending states and customers?

- Can you make a map showing the 5 states generating the most and least sales revenue?

- Can we see the quarterly revenue behavior?

- Can you create a plot showing the growth rate of new customers over the months?

- What do you think about the customers? Are they individuals or wholesalers? Why would you say so?

- Are there any issues with the dataset?

**You don't have to create all these plots. Choose the most relevant ones for your analysis.**

**Optional: make a small selection of the plots and try to present them as Plotly-Express animations. Please remember that the most important thing is to make a good analysis, independent of which library you use to create your graphs.**


In [31]:
# What customer purchasing patterns can you deduce? Such as during the days of the week, weekly, monthly, quarterly, yearly, etc.

In [32]:
# Unique customers per year.
pd.read_sql(
    """
    SELECT Year, COUNT (DISTINCT CustomerID) AS CustomerCount
    FROM (
        SELECT
            *,
            SUBSTR(OrderDate, 1, 4) AS Year
        FROM Orders
    )
    GROUP BY Year

    """, db_conn)

Unnamed: 0,Year,CustomerCount
0,2014,595
1,2015,573
2,2016,638
3,2017,693
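The yearly counts above include both new and returning customers. For the question about new-customer growth over the months, one possible approach is to take each customer's first order date and count customers by the month of that first order. The sketch below is self-contained and runs against a small in-memory table with made-up rows (the `O-…` and `C-…` identifiers are hypothetical); it assumes only the `OrderDate` and `CustomerID` columns seen in the `Orders` table.

```python
import sqlite3

# Hypothetical in-memory stand-in for the Orders table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, OrderDate TEXT, CustomerID TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        ("O-1", "2014-01-05", "C-1"),
        ("O-2", "2014-01-20", "C-2"),
        ("O-3", "2014-02-03", "C-1"),  # returning customer, not counted again
        ("O-4", "2014-02-10", "C-3"),
    ],
)

query = """
    SELECT SUBSTR(FirstOrder, 1, 7) AS YearMonth, COUNT(*) AS NewCustomers
    FROM (
        -- First order date per customer.
        SELECT CustomerID, MIN(OrderDate) AS FirstOrder
        FROM Orders
        GROUP BY CustomerID
    )
    GROUP BY YearMonth
    ORDER BY YearMonth
"""
new_customers = conn.execute(query).fetchall()
print(new_customers)
```

In the notebook, the same query string could be passed to `pd.read_sql(query, db_conn)` to get a DataFrame that is ready for plotting.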


In [33]:
# Orders per year.
pd.read_sql(
    """
    SELECT Year, COUNT (DISTINCT OrderID) AS OrderCount
    FROM (
        SELECT
            *,
            SUBSTR(OrderDate, 1, 4) AS Year
        FROM Orders
    )
    GROUP BY Year

    """, db_conn)

Unnamed: 0,Year,OrderCount
0,2014,969
1,2015,1038
2,2016,1315
3,2017,1687


In [34]:
# Orders per calendar month (all years combined).
pd.read_sql(
    """
    SELECT Month, COUNT (DISTINCT OrderID) AS OrderCount
    FROM (
        SELECT
            *,
            SUBSTR(OrderDate, 6, 2) AS Month
        FROM Orders
    )
    GROUP BY Month

    """, db_conn)

Unnamed: 0,Month,OrderCount
0,1,178
1,2,162
2,3,354
3,4,343
4,5,369
5,6,364
6,7,338
7,8,341
8,9,688
9,10,417
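The monthly counts can also be rolled up into quarters, which may help with the question about quarterly behavior. A minimal sketch, assuming `OrderDate` is stored as `YYYY-MM-DD` text as in the queries above: integer arithmetic on the month number maps months 1 to 3 to quarter 1, months 4 to 6 to quarter 2, and so on. The in-memory table is a stand-in for the real database and reuses a few rows from the `Orders` preview.

```python
import sqlite3

# In-memory stand-in for the Orders table, seeded with a few rows from the preview.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        ("CA-2016-152156", "2016-11-08"),
        ("CA-2016-138688", "2016-06-12"),
        ("US-2015-108966", "2015-10-11"),
        ("CA-2014-115812", "2014-06-09"),
        ("CA-2017-114412", "2017-04-15"),
    ],
)

query = """
    SELECT
        -- (month + 2) / 3 with integer division gives the quarter 1..4.
        (CAST(SUBSTR(OrderDate, 6, 2) AS INTEGER) + 2) / 3 AS Quarter,
        COUNT(DISTINCT OrderID) AS OrderCount
    FROM Orders
    GROUP BY Quarter
    ORDER BY Quarter
"""
orders_per_quarter = conn.execute(query).fetchall()
print(orders_per_quarter)
```

Like the monthly query, this combines all years; adding `SUBSTR(OrderDate, 1, 4)` to the `SELECT` and `GROUP BY` clauses would split the quarters by year.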


In [35]:
# Orders per day of the month (all months and years combined).
pd.read_sql(
    """
    SELECT Day, COUNT (DISTINCT OrderID) AS OrderCount
    FROM (
        SELECT
            *,
            SUBSTR(OrderDate, 9, 2) AS Day
        FROM Orders
    )
    GROUP BY Day

    """, db_conn)

Unnamed: 0,Day,OrderCount
0,1,165
1,2,179
2,3,191
3,4,159
4,5,179
5,6,147
6,7,159
7,8,172
8,9,167
9,10,163


In [36]:
# Load all orders, with Year, Month, and Day derived from OrderDate.
df = pd.read_sql(
    """
    SELECT
        *,
        SUBSTR(OrderDate, 1, 4) AS Year,
        SUBSTR(OrderDate, 6, 2) AS Month,
        SUBSTR(OrderDate, 9, 2) AS Day
    FROM Orders

    """, db_conn)

In [40]:
df

Unnamed: 0,OrderID,OrderDate,ShipDate,ShipMode,CustomerID,AddressID,Year,Month,Day
0,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,42420-000001,2016,11,08
1,CA-2016-138688,2016-06-12,2016-06-16,Second Class,DV-13045,90036-000001,2016,06,12
2,US-2015-108966,2015-10-11,2015-10-18,Standard Class,SO-20335,33311-000001,2015,10,11
3,CA-2014-115812,2014-06-09,2014-06-14,Standard Class,BH-11710,90032-000001,2014,06,09
4,CA-2017-114412,2017-04-15,2017-04-20,Standard Class,AA-10480,28027-000001,2017,04,15
...,...,...,...,...,...,...,...,...,...
5004,CA-2016-125794,2016-09-29,2016-10-03,Standard Class,ML-17410,90008-000001,2016,09,29
5005,CA-2017-163629,2017-11-17,2017-11-21,Standard Class,RA-19885,30605-000001,2017,11,17
5006,CA-2014-110422,2014-01-21,2014-01-23,Second Class,TB-21400,33180-000001,2014,01,21
5007,CA-2017-121258,2017-02-26,2017-03-03,Standard Class,DB-13060,92627-000001,2017,02,26
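With both `OrderDate` and `ShipDate` available, two quick sanity checks could feed into the question about dataset issues: orders that appear to ship before they were placed, and order IDs that occur more than once. The sketch below is illustrative only. It runs on an in-memory table where the last two rows are deliberately made up to trigger each check, and it assumes both dates are ISO-formatted text, so plain string comparison orders them correctly.

```python
import sqlite3

# In-memory stand-in for the Orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, OrderDate TEXT, ShipDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        ("CA-2016-152156", "2016-11-08", "2016-11-11"),
        ("CA-2016-138688", "2016-06-12", "2016-06-16"),
        ("XX-0000-000001", "2016-03-10", "2016-03-08"),  # made up: ships before order
        ("CA-2016-138688", "2016-06-12", "2016-06-16"),  # made up: duplicate order ID
    ],
)

# Check 1: ship date earlier than order date (ISO dates compare correctly as text).
shipped_before_ordered = conn.execute(
    "SELECT OrderID FROM Orders WHERE ShipDate < OrderDate"
).fetchall()

# Check 2: order IDs that appear on more than one row.
duplicate_order_ids = conn.execute(
    """
    SELECT OrderID, COUNT(*) AS RowCount
    FROM Orders
    GROUP BY OrderID
    HAVING COUNT(*) > 1
    """
).fetchall()

print(shipped_before_ordered)
print(duplicate_order_ids)
```

Empty results from both queries on the real database would not prove the data is clean, but non-empty results would be worth mentioning in the report.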


In [42]:
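# Histogram of order counts per year, with the years in ascending order.
px.histogram(df, x='Year').update_xaxes(categoryorder='category ascending')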

In [43]:
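# Histogram of order counts per calendar month, with the months in ascending order.
px.histogram(df, x='Month').update_xaxes(categoryorder='category ascending')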

In [44]:
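# Histogram of order counts per day of the month, with the days in ascending order.
px.histogram(df, x='Day').update_xaxes(categoryorder='category ascending')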
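The cells above cover year, month, and day of the month, but not the day of the week mentioned in the first question. One way to get it directly in SQL is SQLite's `strftime('%w', ...)`, which returns the weekday as `0` (Sunday) through `6` (Saturday). The sketch below assumes `OrderDate` is ISO-formatted text and runs against an in-memory stand-in table seeded with a few rows from the `Orders` preview.

```python
import sqlite3

# In-memory stand-in for the Orders table, seeded with rows from the preview.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        ("CA-2016-152156", "2016-11-08"),  # Tuesday
        ("CA-2016-138688", "2016-06-12"),  # Sunday
        ("US-2015-108966", "2015-10-11"),  # Sunday
        ("CA-2014-115812", "2014-06-09"),  # Monday
        ("CA-2017-114412", "2017-04-15"),  # Saturday
    ],
)

query = """
    SELECT
        -- strftime('%w') yields '0' (Sunday) through '6' (Saturday).
        CAST(strftime('%w', OrderDate) AS INTEGER) AS Weekday,
        COUNT(DISTINCT OrderID) AS OrderCount
    FROM Orders
    GROUP BY Weekday
    ORDER BY Weekday
"""
weekday_counts = conn.execute(query).fetchall()
print(weekday_counts)
```

Against the real database, the same query string could be passed to `pd.read_sql(query, db_conn)`, and the `Weekday` numbers mapped to day names before plotting.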