In [2]:
%load_ext sql

In [5]:
%sql postgresql://postgres:password@localhost:5432/pizza_runner

### Data Cleaning 1 - on `customer_orders` table

Some of the tables we loaded into the database need cleaning before we can start querying. To fix this, we'll create new tables containing the cleaned data.

We'll start with the `customer_orders` table by cleaning up the `exclusions` and `extras` columns:

- Convert 'null' text and blanks in the `exclusions` and `extras` columns into NULL

In [28]:
%%sql 
Drop table if exists customer_order_cleaned;

SELECT ORDER_ID,
	CUSTOMER_ID,
	PIZZA_ID,
	CASE WHEN EXCLUSIONS = '' OR EXCLUSIONS = 'null' THEN NULL ELSE EXCLUSIONS END AS EXCLUSIONS,
	CASE WHEN EXTRAS = '' OR EXTRAS = 'null' THEN NULL ELSE EXTRAS END AS EXTRAS,
	ORDER_TIME
INTO customer_order_cleaned
FROM pizza_runner.CUSTOMER_ORDERS;

select * from customer_order_cleaned;

 * postgresql://postgres:***@localhost:5432/pizza_runner
Done.
14 rows affected.
14 rows affected.


order_id,customer_id,pizza_id,exclusions,extras,order_time
1,101,1,,,2020-01-01 18:05:02
2,101,1,,,2020-01-01 19:00:52
3,102,1,,,2020-01-02 23:51:23
3,102,2,,,2020-01-02 23:51:23
4,103,1,4,,2020-01-04 13:23:46
4,103,1,4,,2020-01-04 13:23:46
4,103,2,4,,2020-01-04 13:23:46
5,104,1,,1,2020-01-08 21:00:29
6,101,2,,,2020-01-08 21:03:13
7,105,2,,1,2020-01-08 21:20:29
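
The two `CASE` expressions in the cleaning query can also be written with nested `NULLIF` calls. The sketch below only illustrates that rule and is not the notebook's actual query: it assumes Python's built-in `sqlite3` module with an in-memory table as a stand-in for PostgreSQL, and the sample rows and the helper name `clean_customer_orders` are made up for the demonstration.

```python
import sqlite3

def clean_customer_orders(rows):
    """Return rows with '' and 'null' in exclusions/extras turned into None."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
    conn.execute(
        "CREATE TABLE customer_orders (order_id INTEGER, exclusions TEXT, extras TEXT)"
    )
    conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)", rows)
    cleaned = conn.execute(
        """
        SELECT order_id,
               NULLIF(NULLIF(exclusions, ''), 'null') AS exclusions,
               NULLIF(NULLIF(extras, ''), 'null') AS extras
        FROM customer_orders
        ORDER BY order_id
        """
    ).fetchall()
    conn.close()
    return cleaned

# Hypothetical sample rows, not the real data set.
sample = [(1, "", ""), (2, "4", "null"), (3, "null", "1"), (4, None, "1, 5")]
print(clean_customer_orders(sample))
```

`NULLIF(a, b)` returns NULL when `a = b` and `a` otherwise, so nesting two calls covers both the blank and the 'null' text case while leaving real values and existing NULLs untouched.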


### Data Cleaning 2 - on `runner_orders` table
- Create a clean table `runner_orders_cleaned` from `runner_orders` table:
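- Convert 'null' text values in the pickup_time, distance, duration and cancellation columns into NULL values.
- Cast pickup_time to TIMESTAMP.
- Strip the `km` suffix from distance and cast it to FLOAT.
- Strip the minutes suffix (`min`, `mins`, `minute`, `minutes`) from duration and cast it to INT.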

In [6]:
%%sql 

Drop Table if exists runner_orders_cleaned;

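-- TRIM(chars FROM col) strips any of the listed characters from both ends of col,
-- which removes unit suffixes such as 'km', 'mins' and 'minutes'.
select order_id, runner_id, 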
case when pickup_time = 'null' then null else cast (pickup_time as timestamp) end as pickup_time,
cast(case when distance = 'null' then null
	 when distance like '%km' then TRIM('km' from distance)
	 else distance end as float) as distance, 
cast(case when duration = 'null' then null 
	when duration like '%min' then TRIM ('mins' from duration)
	 when duration like '%minutes' then TRIM ('minutes' from duration)
	 when duration like '%mins' then TRIM ('mins' from duration)
	 when duration like '%minute' then TRIM ('minute' from duration)
	else duration end as int) as duration, 
case when cancellation in ('null', 'Nan', '') then null else cancellation end as cancellation
into runner_orders_cleaned
from pizza_runner.runner_orders;

select * from runner_orders_cleaned;

 * postgresql://postgres:***@localhost:5432/pizza_runner
Done.
20 rows affected.
20 rows affected.


order_id,runner_id,pickup_time,distance,duration,cancellation
1,1,2020-01-01 18:15:34,20.0,32.0,
2,1,2020-01-01 19:10:54,20.0,27.0,
3,1,2020-01-03 00:12:37,13.4,20.0,
4,2,2020-01-04 13:53:03,23.4,40.0,
5,3,2020-01-08 21:10:57,10.0,15.0,
6,3,,,,Restaurant Cancellation
7,2,2020-01-08 21:30:45,25.0,25.0,
8,2,2020-01-10 00:15:02,23.4,15.0,
9,2,,,,Customer Cancellation
10,1,2020-01-11 18:50:20,10.0,10.0,
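
The query above handles each unit spelling with its own `LIKE`/`TRIM` branch. Another way to think about the same cleaning step is "keep only the leading number". The sketch below shows that idea in plain Python; `parse_number` is a hypothetical helper, and the input strings are made-up examples in the style of the raw `distance` and `duration` values, not the actual table contents.

```python
import re

def parse_number(raw):
    """Extract the leading number from strings like '20km' or '32 minutes'.

    Returns None for missing values ('', 'null', 'nan' or None).
    """
    if raw is None or raw.strip().lower() in ("", "null", "nan"):
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)", raw)
    return float(match.group(1)) if match else None

# Hypothetical raw values, one per spelling the cleaning step has to handle.
for raw in ["20km", "23.4 km", "32 minutes", "15 mins", "null", None]:
    print(repr(raw), "->", parse_number(raw))
```

A regex-based approach like this does not need a new branch for every unit spelling, at the cost of silently ignoring whatever follows the number.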


### Data Cleaning 3 - on `pizza_recipes` table

- Unnest and expand the comma-separated string in the `pizza_recipes` table

Luckily, `PostgreSQL` has built-in functions for this use case, so no need to go all crazy ;)

In [12]:
%%sql

Drop table if exists pizza_recipes_cleaned;

SELECT pizza_id, UNNEST(STRING_TO_ARRAY(toppings, ',')) AS toppings
INTO pizza_recipes_cleaned
FROM pizza_runner.pizza_recipes;

Select * from pizza_recipes_cleaned;

 * postgresql://postgres:***@localhost:5432/pizza_runner
Done.
14 rows affected.
14 rows affected.


pizza_id,toppings
1,1
1,2
1,3
1,4
1,5
1,6
1,8
1,10
2,4
2,6
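
To make the unnesting step concrete, here is a minimal Python sketch of what `STRING_TO_ARRAY` followed by `UNNEST` does conceptually: split each string on commas and emit one row per element. The function name `unnest_toppings` and the input rows are hypothetical. The sketch also strips whitespace and converts to `int`, which the SQL above does not do; that matters if the source strings use `', '` rather than `','` as the separator.

```python
def unnest_toppings(recipes):
    """Expand (pizza_id, 'a, b, c') pairs into one (pizza_id, topping) row per item."""
    rows = []
    for pizza_id, toppings in recipes:
        for topping in toppings.split(","):
            # strip() removes the space left behind by a ', ' separator
            rows.append((pizza_id, int(topping.strip())))
    return rows

# Hypothetical input in the same shape as the pizza_recipes table.
print(unnest_toppings([(1, "1, 2, 3"), (2, "4, 6")]))
```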


- **We are done cleaning.** Now we can begin solving the **case study questions**.

1. How many pizzas were ordered?

In [13]:
%%sql
SELECT COUNT(pizza_id)
FROM CUSTOMER_ORDER_CLEANED

 * postgresql://postgres:***@localhost:5432/pizza_runner
1 rows affected.


count
14


2. How many unique customer orders were made?

In [14]:
%%sql

select
	COUNT(distinct ORDER_ID)
from
	CUSTOMER_ORDER_CLEANED

 * postgresql://postgres:***@localhost:5432/pizza_runner
1 rows affected.


count
10


3. How many successful orders were delivered by each runner?

In [17]:
%%sql

SELECT runner_id , count(*) AS successful_orders
FROM runner_orders_cleaned
WHERE cancellation IS NULL
GROUP BY runner_id

 * postgresql://postgres:***@localhost:5432/pizza_runner
3 rows affected.


runner_id,successful_orders
1,4
2,3
3,1


4. How many of each type of pizza was delivered?

In [25]:
%%sql

SELECT pizza_id , pizza_name, count(*) AS successful_orders
FROM runner_orders_cleaned
JOIN customer_order_cleaned
	USING(order_id)
JOIN pizza_names
	USING (pizza_id)
WHERE cancellation IS NULL
GROUP BY 1, 2

 * postgresql://postgres:***@localhost:5432/pizza_runner
2 rows affected.


pizza_id,pizza_name,successful_orders
1,Meatlovers,9
2,Vegetarian,3


5. How many Vegetarian and Meatlovers were ordered by each customer?

In [29]:
%%sql
SELECT customer_id,
sum(CASE WHEN pizza_id = 1 THEN 1 ELSE 0 END) AS MeatLovers,
sum(CASE WHEN pizza_id = 2 THEN 1 ELSE 0 END) AS Vegetarians
FROM customer_order_cleaned
GROUP BY 1 

 * postgresql://postgres:***@localhost:5432/pizza_runner
5 rows affected.


customer_id,meatlovers,vegetarians
101,2,1
103,3,1
104,3,0
105,0,1
102,2,1


In [30]:
%%sql 
-- Alternative answer to question 5: one row per customer and pizza name. The query above is more finely tuned.

SELECT customer_id , pizza_name, count(*)
FROM runner_orders_cleaned
JOIN customer_order_cleaned
	USING(order_id)
JOIN pizza_names
	USING (pizza_id)
GROUP BY 1,2

 * postgresql://postgres:***@localhost:5432/pizza_runner
8 rows affected.


customer_id,pizza_name,count
103,Vegetarian,1
101,Meatlovers,2
105,Vegetarian,1
103,Meatlovers,3
101,Vegetarian,1
104,Meatlovers,3
102,Vegetarian,1
102,Meatlovers,2


6. What was the maximum number of pizzas delivered in a single order?

In [33]:
%%sql 
SELECT max(count) as max_pizza
FROM(SELECT order_id , count(*)
FROM customer_order_cleaned
JOIN runner_orders_cleaned
	USING (order_id)
WHERE cancellation IS NULL
GROUP BY 1) AS orders

 * postgresql://postgres:***@localhost:5432/pizza_runner
1 rows affected.


max_pizza
3


7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

In [36]:
%%sql

SELECT customer_id,
SUM(CASE WHEN exclusions IS NULL AND extras IS NULL THEN 1 ELSE 0 END) AS no_changes, 
SUM(CASE WHEN exclusions IS NOT NULL OR extras IS NOT NULL THEN 1 ELSE 0 END) AS has_changes
FROM customer_order_cleaned
JOIN runner_orders_cleaned
	USING(order_id)
WHERE cancellation IS NULL
GROUP BY customer_id
ORDER BY customer_id

 * postgresql://postgres:***@localhost:5432/pizza_runner
5 rows affected.


customer_id,no_changes,has_changes
101,2,0
102,3,0
103,0,3
104,1,2
105,0,1
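
The query for question 7 relies on conditional aggregation: each `SUM(CASE ...)` adds 1 for every row that satisfies its condition. Because the two conditions are exact complements, `no_changes + has_changes` always equals the number of pizzas per customer. The sketch below reproduces that pattern on a toy table, assuming Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table name `pizzas`, the helper `change_counts` and the sample rows are all made up.

```python
import sqlite3

def change_counts(pizzas):
    """Count unchanged vs changed pizzas per customer.

    `pizzas` is a list of (customer_id, exclusions, extras) tuples,
    with None standing for "no exclusions" or "no extras".
    """
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
    conn.execute("CREATE TABLE pizzas (customer_id INTEGER, exclusions TEXT, extras TEXT)")
    conn.executemany("INSERT INTO pizzas VALUES (?, ?, ?)", pizzas)
    result = conn.execute(
        """
        SELECT customer_id,
               SUM(CASE WHEN exclusions IS NULL AND extras IS NULL THEN 1 ELSE 0 END) AS no_changes,
               SUM(CASE WHEN exclusions IS NOT NULL OR extras IS NOT NULL THEN 1 ELSE 0 END) AS has_changes
        FROM pizzas
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows: customer 101 has one plain and one changed pizza,
# customer 102 has two changed pizzas.
sample = [(101, None, None), (101, "4", None), (102, None, "1"), (102, "2", "1")]
print(change_counts(sample))
```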


8. How many pizzas were delivered that had both exclusions and extras?

In [37]:
%%sql

SELECT sum(CASE WHEN exclusions IS NOT NULL
AND extras IS NOT NULL THEN 1 ELSE 0 END) AS changes_in_both
FROM customer_order_cleaned
JOIN runner_orders_cleaned
	USING(order_id)
WHERE cancellation IS NULL

 * postgresql://postgres:***@localhost:5432/pizza_runner
1 rows affected.


changes_in_both
1


9. What was the total volume of pizzas ordered for each hour of the day?

In [38]:
%%sql
SELECT EXTRACT(HOUR
FROM order_time) AS each_hour, count(*)
FROM customer_order_cleaned
GROUP BY 1
ORDER BY 1

 * postgresql://postgres:***@localhost:5432/pizza_runner
6 rows affected.


each_hour,count
11,1
13,3
18,3
19,1
21,3
23,3


10. What was the volume of orders for each day of the week?

In [5]:
%%sql

SELECT to_char(order_time, 'Day') AS day_of_the_week, count(order_id)
FROM customer_order_cleaned
GROUP BY 1
ORDER BY 2 DESC

 * postgresql://postgres:***@localhost:5432/pizza_runner
4 rows affected.


day_of_the_week,count
Saturday,5
Wednesday,5
Thursday,3
Friday,1


[Click here](./runner_customer.ipynb) to see the solutions for Runner and Customer Experience.