<a href="https://colab.research.google.com/github/popo169/Portfolio-Project/blob/main/Copy_of_Pizza_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Introduction

Business intelligence helps you understand your company: how sales have been going recently, which items are about to run out of stock and, most importantly, how much profit the business is making.

In this project, I demonstrate how to use data from a pizza shop, together with SQL, to build three dashboards for business intelligence.

##Data content

There are 10 tables in the dataset.

Table name | Content | Columns
--- | --- | ---
Address | Customers' address and zip code | 'add_id', 'delivery_address1', 'delivery_address2', 'delivery_city', 'delivery_zipcode'
Customers | First and last name | 'cust_id', 'cust_firstname', 'cust_lastname'
Ingredient | Ingredient name, weight, unit and price | 'ing_id', 'ing_name', 'ing_weight', 'ing_meas', 'ing_price'
Inventory | Quantity of inventory | 'inv_id', 'item_id', 'quantity'
Item | Item name, code, category, size and price | 'item_id', 'sku', 'item_name', 'item_cat', 'item_size', 'item_price'
Orders | Order create time, item ordered, quantity, customer ID, delivery/pick up, address ID | 'row_id', 'order_id', 'created_at', 'item_id', 'quantity', 'cust_id', 'delivery', 'add_id'
Recipe | Recipe ID, ingredient ID, required quantity | 'row_id', 'recipe_id', 'ing_id', 'quantity'
Rota | Duty date, shift ID, staff ID | 'row_id', 'rota_id', 'date', 'shift_id', 'staff_id'
Shift | Day of week, start time and end time | 'shift_id', 'day_of_week', 'start_time', 'end_time'
Staff | Staff first and last name, position, hourly rate | 'staff_id', 'first_name', 'last_name', 'position', 'hourly_rate'

##Dashboard 1

This dashboard contains the basic sales information about the pizza shop and gives a first indication of whether it is earning or losing money.
1.   Total number of orders
2.   Total sales
3.   Total items sold
4.   Average order value
5.   Sales by category
6.   Top-selling item
7.   Orders by hour
8.   Sales by hour
9.   Orders by address
10.  Orders by delivery/pick up

##Dashboard 2

The second dashboard covers material costs and stock levels, the information needed to keep the pizza shop running.
1.   Total quantity required for orders by ingredient
2.   Total cost of ingredients for orders
3.   Calculated cost of pizza
4.   Percentage stock remaining by ingredient
5.   List of ingredients to re-order based on remaining inventory

##Dashboard 3

The last dashboard captures the staffing cost of the pizza shop.
1.   Hours worked by staff member
2.   Total hours worked
3.   Cost per staff member
4.   Total staff cost

##Import data

In [None]:
# connect google drive
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
# import library
import sqlite3
import pandas as pd
import numpy as np

In [None]:
# define functions

def pd_to_sqlDB(input_df: pd.DataFrame, table_name: str, db_name: str = 'default.db') -> None:

    '''Take a Pandas dataframe `input_df` and upload it to `table_name` SQLITE table
    Args:
        input_df (pd.DataFrame): Dataframe containing data to upload to SQLITE
        table_name (str): Name of the SQLITE table to upload to
        db_name (str, optional): Name of the SQLITE Database in which the table is created. 
                                 Defaults to 'default.db'.
    '''

    # Step 1: Find columns in the dataframe
    cols = input_df.columns
    cols_string = ','.join(cols)
    val_wildcard_string = ','.join(['?'] * len(cols))

    # Step 2: Connect to a DB file if it exists, else create a new file
    con = sqlite3.connect(db_name)
    cur = con.cursor()

    # Step 3: Create Table
    sql_string = f"""CREATE TABLE IF NOT EXISTS {table_name} ({cols_string});"""
    cur.execute(sql_string)

    # Step 4: Upload the dataframe
    rows_to_upload = input_df.to_dict(orient='split')['data']
    sql_string = f"""INSERT INTO {table_name} ({cols_string}) VALUES ({val_wildcard_string});"""    
    cur.executemany(sql_string, rows_to_upload)
  
    # Step 5: Commit the changes and close the connection
    con.commit()
    con.close()



def sql_query_to_pd(sql_query_string: str, db_name: str = 'default.db') -> pd.DataFrame:
    '''Execute an SQL query and return the results as a pandas dataframe
    Args:
        sql_query_string (str): SQL query string to execute
        db_name (str, optional): Name of the SQLITE Database to execute the query in.
                                 Defaults to 'default.db'.
    Returns:
        pd.DataFrame: Results of the SQL query in a pandas dataframe
    '''    
    # Step 1: Connect to the SQL DB
    con = sqlite3.connect(db_name)

    # Step 2: Execute the SQL query
    cursor = con.execute(sql_query_string)

    # Step 3: Fetch the data and column names
    result_data = cursor.fetchall()
    cols = [description[0] for description in cursor.description]

    # Step 4: Close the connection
    con.close()

    # Step 5: Return as a dataframe
    return pd.DataFrame(result_data, columns=cols)
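
As a point of comparison, pandas can run a query against a `sqlite3` connection directly through `pd.read_sql_query`. The sketch below performs the same query-to-DataFrame round trip as `sql_query_to_pd`. The in-memory database, the `demo` table and the `query_to_df` helper are assumptions made for illustration and are not part of the project data.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def query_to_df(sql: str, con: sqlite3.Connection) -> pd.DataFrame:
    """Hypothetical helper: run `sql` on an open connection and return a DataFrame."""
    return pd.read_sql_query(sql, con)


# In-memory database with a toy table (illustrative data only)
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE demo (item_id TEXT, quantity INTEGER)")
    con.executemany("INSERT INTO demo VALUES (?, ?)", [("it_001", 2), ("it_002", 5)])
    demo_df = query_to_df("SELECT item_id, quantity FROM demo ORDER BY item_id", con)

print(demo_df)
```

Either approach returns the column names from the query, so aliases such as `AS order_sales` carry through to the DataFrame.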

In [None]:
# import data
address=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/address.csv')
customers=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/customers.csv')
ingredient=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/ingredient.csv')
inventory=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/inventory.csv')
item=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/item.csv')
orders=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/orders.csv')
recipe=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/recipe.csv')
rota=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/rota.csv')
shift=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/shift.csv')
staff=pd.read_csv('/content/gdrive/My Drive/dataset/SQL/pizza_proj/staff.csv')

In [None]:
# import data to sql tables
pd_to_sqlDB(address,"address","pizza_db")
pd_to_sqlDB(customers,"customers","pizza_db")
pd_to_sqlDB(ingredient,"ingredient","pizza_db")
pd_to_sqlDB(inventory,"inventory","pizza_db")
pd_to_sqlDB(item,"item","pizza_db")
pd_to_sqlDB(orders,"orders","pizza_db")
pd_to_sqlDB(recipe,"recipe","pizza_db")
pd_to_sqlDB(rota,"rota","pizza_db")
pd_to_sqlDB(shift,"shift","pizza_db")
pd_to_sqlDB(staff,"staff","pizza_db")
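
One caveat with `pd_to_sqlDB`: because it pairs `CREATE TABLE IF NOT EXISTS` with `INSERT`, running the import cell a second time appends every row again, which inflates any totals computed later. A minimal sketch of one way around this, assuming it is acceptable to rebuild each table on every run, is `DataFrame.to_sql` with `if_exists='replace'`. The `load_table` helper and the toy frame are hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_table(df: pd.DataFrame, table_name: str, con: sqlite3.Connection) -> None:
    """Hypothetical helper: (re)create `table_name` from `df`, dropping any old copy."""
    df.to_sql(table_name, con, if_exists="replace", index=False)


# Toy data standing in for one of the CSV files (illustrative values only)
toy = pd.DataFrame({"staff_id": ["s1", "s2"], "hourly_rate": [15.0, 18.5]})

with closing(sqlite3.connect(":memory:")) as con:
    load_table(toy, "staff", con)
    load_table(toy, "staff", con)  # second run replaces the table instead of appending
    row_count = con.execute("SELECT COUNT(*) FROM staff").fetchone()[0]

print(row_count)
```

Unlike the untyped columns created by `pd_to_sqlDB`, `to_sql` infers a column type from each pandas dtype, so this is a different loading strategy rather than a drop-in copy of the original function.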

#Query for 1st dashboard

In [None]:
# Query table for the 1st dashboard
query="""
SELECT
  o.order_id,
  o.quantity*i.item_price AS order_sales,
  o.quantity,
  i.item_cat,
  i.item_name,
  o.created_at,
  a.delivery_address1,
  -- a.delivery_address2, --no data in address 2
  a.delivery_city,
  a.delivery_zipcode,
  o.delivery
FROM orders o
LEFT JOIN item i ON o.item_id=i.item_id
LEFT JOIN address a ON o.add_id=a.add_id;
"""
result1=sql_query_to_pd(query,"pizza_db")

In [None]:
# Convert to appropriate data type
result1['created_at']=pd.to_datetime(result1['created_at'],format="%d/%m/%Y %H:%M")
result1['delivery_zipcode']=result1['delivery_zipcode'].astype('str')
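
The explicit `format` string matters because the timestamps are written day first. A small sketch on toy strings (assumed to follow the same `%d/%m/%Y %H:%M` layout as `created_at`) shows that the month and hour come out as intended once the format is given:

```python
import pandas as pd

# Toy timestamps in day-first layout (illustrative values only)
raw = pd.Series(["10/08/2022 13:22", "10/08/2022 19:05"])

parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")
hours = parsed.dt.hour.tolist()    # used later for the by-hour breakdowns
months = parsed.dt.month.tolist()  # August, not October

print(hours, months)
```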

In [None]:
# Table for 1st dashboard 
result1

index | order_id | order_sales | quantity | item_cat | item_name | created_at | delivery_address1 | delivery_city | delivery_zipcode | delivery
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0 | 109 | 24 | 2 | Pizza | Pizza Margherita Reg | 2022-08-10 13:22:00 | 607 Trails End Road | Manchester | 6042 | 1
1 | 109 | 24 | 2 | Pizza | Pizza Margherita Reg | 2022-08-10 13:22:00 | 607 Trails End Road | Manchester | 6042 | 1
2 | 109 | 24 | 2 | Pizza | Pizza Margherita Reg | 2022-08-10 13:22:00 | 607 Trails End Road | Manchester | 6042 | 1
3 | 109 | 24 | 2 | Pizza | Pizza Margherita Reg | 2022-08-10 13:22:00 | 607 Trails End Road | Manchester | 6042 | 1
4 | 110 | 16 | 1 | Pizza | Pizza Diavola (hot) Reg | 2022-08-10 13:53:00 | 25 Cliffside Drive | Manchester | 6042 | 1
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
2507 | 166 | 5 | 1 | Dessert | Chocolate Brownie | 2022-08-10 14:22:00 | 91 Eldridge Street | Manchester | 6040 | 1
2508 | 166 | 6 | 1 | Drink | Coca Cola Regular 1.5l | 2022-08-10 14:22:00 | 91 Eldridge Street | Manchester | 6040 | 1
2509 | 166 | 6 | 1 | Drink | Coca Cola Regular 1.5l | 2022-08-10 14:22:00 | 91 Eldridge Street | Manchester | 6040 | 1
2510 | 166 | 6 | 1 | Drink | Coca Cola Regular 1.5l | 2022-08-10 14:22:00 | 91 Eldridge Street | Manchester | 6040 | 1


##Dashboard 1

##1. Total orders

In [None]:
ttl_no_order=len(result1['order_id'].unique())
print(f'The total number of orders is {ttl_no_order}')

The total number of orders is 58


##2. Total sales

In [None]:
ttl_sales=sum(result1['order_sales'])
print(f'The total sales is ${ttl_sales}')

The total sales is $47736


##3. Total items sold

In [None]:
ttl_item_sold=sum(result1['quantity'])
print(f'The total number of items sold is {ttl_item_sold}')

The total number of items sold is 5008


##4. Average order value

In [None]:
avg_order_val=result1.groupby(['order_id'])['order_sales'].sum().mean()
print(f'The average order value is ${avg_order_val:.2f}')

The average order value is $823.03
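
To make the four headline metrics concrete, here is a minimal sketch on a toy table of order lines. The values are invented for illustration; only the column names follow `result1`.

```python
import pandas as pd

# Toy order lines (illustrative values, not the project data)
lines = pd.DataFrame({
    "order_id":    [1, 1, 2, 3, 3],
    "order_sales": [24, 6, 16, 5, 12],
    "quantity":    [2, 1, 1, 1, 2],
})

total_orders = lines["order_id"].nunique()   # distinct orders
total_sales = lines["order_sales"].sum()     # revenue over all order lines
total_items = lines["quantity"].sum()        # units sold
# Sum the lines of each order first, then average across orders
avg_order_value = lines.groupby("order_id")["order_sales"].sum().mean()

print(total_orders, total_sales, total_items, round(avg_order_value, 2))
```

Because every line belongs to exactly one order, the average order value equals total sales divided by the number of distinct orders.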


##5. Sales by category

In [None]:
sales_by_cat=result1.groupby(['item_cat'])['order_sales'].sum().to_frame().sort_values('order_sales',ascending=False)
print('The sales by category are\n',sales_by_cat)

The sales by category are 
           order_sales
item_cat             
Pizza           30528
Dessert          7320
Side             6408
Drink            3480


##6. Top selling item

In [None]:
top_item=result1.groupby(['item_name'])['order_sales'].sum().idxmax()
print(f'The top selling item is {top_item}')

The top selling item is Pizza Quattro Formaggi Large


##7. Orders by hour

In [None]:
order_by_hour=result1.groupby(result1['created_at'].dt.hour)['order_id'].count().to_frame('count').rename_axis('hour')
print('The orders by hour are\n',order_by_hour)

The orders by hour are
       count
hour       
12      240
13      768
14       88
18      120
19      536
20      296
21      224
22      240
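
Note that `count()` on `order_id` counts rows of `result1`, that is order lines, so an order with several items contributes several times to its hour. If the intent is the number of distinct orders per hour, `nunique()` is one option. A minimal sketch on toy data (invented values, same column names):

```python
import pandas as pd

# Toy order lines: order 109 has two lines, orders 110 and 111 have one each
lines = pd.DataFrame({
    "order_id": [109, 109, 110, 111],
    "created_at": pd.to_datetime(
        ["2022-08-10 13:22", "2022-08-10 13:22", "2022-08-10 13:53", "2022-08-10 19:10"]
    ),
})

hour = lines["created_at"].dt.hour.rename("hour")
lines_per_hour = lines.groupby(hour)["order_id"].count()     # order lines per hour
orders_per_hour = lines.groupby(hour)["order_id"].nunique()  # distinct orders per hour

print(lines_per_hour.to_dict(), orders_per_hour.to_dict())
```

The same distinction applies to the address and delivery/pick-up breakdowns, which use `value_counts()` on order lines.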


##8. Sales by hour

In [None]:
sales_by_hour=result1.groupby(result1['created_at'].dt.hour)['order_sales'].sum().to_frame('sales').rename_axis('hour')
print('The sales by hour are\n',sales_by_hour)

The sales by hour are
       sales
hour       
12     3896
13    11192
14      728
18     3352
19    12416
20     4536
21     3760
22     7856


##9. Orders by address

In [None]:
order_by_address=result1['delivery_address1'].value_counts().to_frame('count').rename_axis('address')
print('The orders by address are\n',order_by_address)

The orders by address are
                             count
address                          
150 Carter Street             136
68 Princeton Street            96
18 Cambridge Street            96
125 Summer Street              88
425 Middle Turnpike East       72
95 Briarwood Drive             72
60 Desousa Drive               72
61 Plymouth Lane               72
697 Parker Street              72
29 Lucian Street               64
89 High Ledge Circle           56
44 Downey Drive                56
123 Elizabeth Drive            56
211 Oak Street                 56
145 Saint John Street          56
86 Highland Street             48
22 Star Farms Drive            48
61 Hills Street                48
225 Kennedy Road               48
184 Woodland Street            48
126 Garth Road                 48
65 Arcellia Drive              48
34 Holyoke Road                40
4 Orchard Street               40
25 Edwards Street              40
117 Adelaide Road              40
310 Timrod Road        

##10. Orders by delivery/pick up

In [None]:
order_by_delivery=result1['delivery'].value_counts().to_frame('count').rename_axis('delivery(1)/pick up(0)')
print('The orders by delivery/pick up are\n',order_by_delivery)

The orders by delivery/pick up are
                         count
delivery(1)/pick up(0)       
1                        1912
0                         600


#Query for 2nd dashboard

In [None]:
# Query table for the 2nd dashboard
query="""
SELECT
  sub1.item_name,
  sub1.ing_id,
  sub1.ing_name,
  sub1.ing_weight,
  sub1.ing_price,
  sub1.order_quan,
  sub1.rep_quan,
  sub1.order_quan*sub1.rep_quan AS quantity_required,
  sub1.ing_price/sub1.ing_weight AS unit_cost,
  (sub1.order_quan*sub1.rep_quan)*(sub1.ing_price/sub1.ing_weight) AS ingredient_cost
FROM
    (SELECT
      o.item_id,
      rep.recipe_id,
      i.item_name,
      rep.ing_id,
      ing.ing_name,
      SUM(o.quantity)  AS order_quan,
      rep.quantity AS rep_quan,
      ing.ing_price,
      ing.ing_weight
    FROM orders o
    LEFT JOIN item i ON o.item_id=i.item_id
    LEFT JOIN recipe rep ON i.sku=rep.recipe_id
    LEFT JOIN ingredient ing ON rep.ing_id=ing.ing_id
    GROUP BY
      o.item_id, 
      rep.recipe_id, 
      i.item_name,
      rep.ing_id,
      rep.quantity,
      ing.ing_name,
      ing.ing_weight,
      ing.ing_price) sub1;
"""
result2=sql_query_to_pd(query,"pizza_db")
result2['item_cost']=result2['rep_quan']*result2['unit_cost']
result2

index | item_name | ing_id | ing_name | ing_weight | ing_price | order_quan | rep_quan | quantity_required | unit_cost | ingredient_cost | item_cost
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0 | Pizza Margherita Reg | ING001 | Pizza dough ball (8 pack) | 2000 | 4.22 | 288 | 250 | 72000 | 0.002110 | 151.9200 | 0.527500
1 | Pizza Margherita Reg | ING002 | Tomato sauce | 4500 | 3.89 | 288 | 80 | 23040 | 0.000864 | 19.9168 | 0.069156
2 | Pizza Margherita Reg | ING003 | Mozzarella cheese | 2500 | 14.45 | 288 | 170 | 48960 | 0.005780 | 282.9888 | 0.982600
3 | Pizza Margherita Reg | ING004 | Dried oregano | 500 | 5.99 | 288 | 5 | 1440 | 0.011980 | 17.2512 | 0.059900
4 | Pizza Margherita Large | ING001 | Pizza dough ball (8 pack) | 2000 | 4.22 | 112 | 300 | 33600 | 0.002110 | 70.8960 | 0.633000
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
104 | Fanta Diet 1.5l | ING043 | Fanta Diet 1.5l | 1 | 0.96 | 32 | 1 | 32 | 0.960000 | 30.7200 | 0.960000
105 | San Pelligrino 33cl | ING044 | San Pelligrino 33cl | 1 | 0.36 | 64 | 1 | 64 | 0.360000 | 23.0400 | 0.360000
106 | San Pelligrino 1.5l | ING045 | San Pelligrino 1.5l | 1 | 0.86 | 144 | 1 | 144 | 0.860000 | 123.8400 | 0.860000
107 | Perrier 33cl | ING046 | Perrier 33cl | 1 | 0.36 | 48 | 1 | 48 | 0.360000 | 17.2800 | 0.360000
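
As a worked check of the derived columns, the sketch below recomputes them for the first row of the table (Pizza Margherita Reg, pizza dough) using the same formulas as the query and the `item_cost` line. The input values are copied from that row. The variable names mirror the column names.

```python
import math

# Inputs taken from row 0 of result2
ing_weight = 2000   # weight of one purchased pack
ing_price = 4.22    # price of one purchased pack
order_quan = 288    # units of the item ordered
rep_quan = 250      # recipe quantity of this ingredient per item

unit_cost = ing_price / ing_weight               # cost per unit of weight
quantity_required = order_quan * rep_quan        # total weight needed for all orders
ingredient_cost = quantity_required * unit_cost  # cost of this ingredient for all orders
item_cost = rep_quan * unit_cost                 # cost of this ingredient in one item

print(quantity_required, round(unit_cost, 6), round(ingredient_cost, 4), round(item_cost, 4))
```

Summing `item_cost` over all ingredients of an item gives the cost of making one unit of it, which is what the "Calculated cost of pizza" step relies on.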


##Dashboard 2

##1. Total quantity required for orders by ingredient

In [None]:
quantity_by_ingredient=result2.groupby('ing_name')['quantity_required'].sum().to_frame()
print(f'The total quantity required for orders by ingredient is\n',quantity_by_ingredient)

The total quantity required for orders by ingredient is
                            quantity_required
ing_name                                    
Anchovies                              29600
Banoffee pie                           67200
Caesar dressing                        11840
Calamari                               20000
Capers                                  2128
Chicken wings                          57600
Chilli pepper                          11280
Chocolate brownie                      51600
Chocolate ice cream                    59200
Coca Cola Diet 1.5l                       16
Coca Cola Diet 33cl                      128
Coca Cola Regular 1.5l                   160
Coca Cola Regular 33cl                   224
Croutons                               29600
Dried oregano                           2336
Eggplant                               41280
Fanta Diet 1.5l                           32
Fanta Regular 1.5l                        96
Fanta Regular 33cl                       46

##2. Total cost of ingredients for orders

In [None]:
cost_by_ingredient=result2.groupby('ing_name')['ingredient_cost'].sum().to_frame()
print(f'The total cost of ingredients for orders is\n',cost_by_ingredient.round(2))

The total cost of ingredients for orders is
                            ingredient_cost
ing_name                                  
Anchovies                           325.30
Banoffee pie                         64.40
Caesar dressing                      56.02
Calamari                            230.16
Capers                                8.85
Chicken wings                       670.37
Chilli pepper                        73.21
Chocolate brownie                    91.85
Chocolate ice cream                 203.25
Coca Cola Diet 1.5l                  15.36
Coca Cola Diet 33cl                  52.48
Coca Cola Regular 1.5l              153.60
Coca Cola Regular 33cl               91.84
Croutons                            124.32
Dried oregano                        27.99
Eggplant                             78.43
Fanta Diet 1.5l                      30.72
Fanta Regular 1.5l                   92.16
Fanta Regular 33cl                  190.24
Fruit salad                          64.93
Garlic an

##3. Calculated cost of pizza

In [None]:
cost_of_pizza=result2.groupby('item_name')[['item_cost']].sum()
filt=(cost_of_pizza.index.str.contains("Pizza"))
print('The calculated cost of each pizza is\n',cost_of_pizza[filt].round(2))

The calculated cost of each pizza is
                               item_cost
item_name                              
Pizza Diavola (hot) Large          2.73
Pizza Diavola (hot) Reg            2.18
Pizza Hawaiian Large               3.00
Pizza Hawaiian Reg                 2.55
Pizza Margherita Large             1.97
Pizza Margherita Reg               1.64
Pizza Napolitana Large             2.45
Pizza Napolitana Reg               2.70
Pizza Parmigiana Large             3.66
Pizza Parmigiana Reg               3.08
Pizza Pepperoni Large              4.20
Pizza Pepperoni Reg                3.51
Pizza Quattro Formaggi Large       5.13
Pizza Quattro Formaggi Reg         4.29
Pizza Seafood Large                6.13
Pizza Seafood Reg                  5.23


In [None]:
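# save result2 as a SQL table so the remaining questions can query it
# note: pd_to_sqlDB appends rows, so re-running this cell duplicates the contents of the result2 table
pd_to_sqlDB(result2,"result2","pizza_db")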

# query to answer remaining question in dashboard2
query="""
SELECT
  sub2.ing_name,
  sub2.ordered_weight,
  (ing.ing_weight*inv.quantity) AS total_inv_weight
FROM
  (SELECT
    ing_id,
    ing_name,
    SUM(quantity_required) AS ordered_weight
  FROM result2
  GROUP BY 
  ing_name, ing_id) sub2
LEFT JOIN inventory inv ON inv.item_id = sub2.ing_id
LEFT JOIN ingredient ing ON ing.ing_id = sub2.ing_id;
"""
result3=sql_query_to_pd(query,"pizza_db")
result3

index | ing_name | ordered_weight | total_inv_weight
--- | --- | --- | ---
0 | Anchovies | 31450 | 2000
1 | Anchovies | 31450 | 2000
2 | Anchovies | 31450 | 2000
3 | Anchovies | 31450 | 2000
4 | Banoffee pie | 71400 | 2400
... | ... | ... | ...
175 | Tuna | 21250 | 6000
176 | Vanilla ice cream | 13600 | 9000
177 | Vanilla ice cream | 13600 | 9000
178 | Vanilla ice cream | 13600 | 9000
