# Randomly generated order table

The name of the database for this exercise is `ex_orders_random`.

This dataset is randomly generated, using the Faker package to produce realistic names for people and products. Because it is random, do not expect to see:

* Normal trading patterns: I did not program a "spike" at Christmas or Black Friday
* Reasonable product names
* Reasonable prices. Someone buying 10 handcrafted goat cheese pizzas for $699.00 each could occur in this dataset.

Think of these silly prices as fun easter eggs.

The script for generating the data has not been included, to ensure that the CSVs are not accidentally overwritten.

This is a medium-sized data exercise: it is too large to work the problems by hand, but not so large that it will stress-test your queries.

If you think you have found an error in one of the results, please post a GitHub issue.

## Tables

There are 4 tables in this database:

- **customer**: all the details about each customer
- **customer_order**: for each order_id, lists the customer, the order date, and the delivery date. We assume that all items in the same order are delivered on the same day. This table would be called `order`, except that `order` is a reserved word in PostgreSQL.
- **order_product**: tells us which products, and how many of each, are in each order
- **product**: contains the product_id, the name of the product, and its price. We assume that the price of a particular product does not change.
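
To make the relationships between the four tables concrete, here is a minimal sketch of a schema that would support the queries in this notebook. The column names are taken from the queries below, but the types, keys, and constraints are assumptions, and the sketch uses Python's built-in `sqlite3` module rather than PostgreSQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: column names come from the queries in this notebook;
# the types and constraints are assumptions, not the real table definitions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    address     TEXT,
    state       TEXT
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    price        REAL
);
CREATE TABLE customer_order (
    order_id       INTEGER PRIMARY KEY,
    customer_id    INTEGER REFERENCES customer (customer_id),
    date_ordered   TEXT,
    date_delivered TEXT
);
CREATE TABLE order_product (
    order_id   INTEGER REFERENCES customer_order (order_id),
    product_id INTEGER REFERENCES product (product_id),
    qty        INTEGER
);
""")

# List the tables that were created, in alphabetical order.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
```

In this sketch, `order_product` is the link table between `customer_order` and `product`, which is why most of the spending questions below join all three.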

In [1]:
%load_ext sql
%sql postgresql://localhost/ex_orders_random

'Connected: @ex_orders_random'

1. **List the 10 most expensive products for sale, and their prices**

In [2]:
%%sql
SELECT 
    *
FROM 
    product
ORDER BY 
    price DESC
LIMIT
    10;

 * postgresql://localhost/ex_orders_random
10 rows affected.


product_id,product_name,price
8028,Incredible Granite Keyboard,115.0
8112,Ergonomic Concrete Cheese,114.0
8002,Ergonomic Granite Soap,113.0
8009,Generic Metal Hat,113.0
8091,Sleek Frozen Shirt,113.0
8134,Fantastic Steel Towels,113.0
8035,Generic Concrete Soap,113.0
8130,Small Metal Bike,112.0
8057,Awesome Metal Salad,111.0
8051,Handcrafted Frozen Pants,110.0


2. **Which states have more than 5 customers? Use the state column on the `customer` table. Count each customer on the table, regardless of whether they have ever bought anything.**

In [3]:
%%sql
SELECT
    state, count(state) as num_customers
FROM
    customer
GROUP BY
    state
HAVING
    count(state) > 5
ORDER BY 
    num_customers DESC
;

 * postgresql://localhost/ex_orders_random
7 rows affected.


state,num_customers
AL,9
WY,8
IL,7
WV,7
ME,6
FL,6
MS,6
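
One subtlety in the query above: `count(state)` counts only rows where `state` is not NULL, whereas `count(*)` counts every row. Whether the real `customer` table contains any NULL states is not known here; the sketch below uses made-up rows and Python's `sqlite3` module (standing in for PostgreSQL) to show how the two can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, state TEXT)")
# Made-up rows: two customers in AL, one with no recorded state.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "AL"), (2, "AL"), (3, None)],
)

rows = conn.execute("""
    SELECT state, COUNT(state) AS by_column, COUNT(*) AS by_row
    FROM customer
    GROUP BY state
    ORDER BY by_row DESC, state
""").fetchall()
print(rows)  # [('AL', 2, 2), (None, 0, 1)]
```

If every customer has a state, both forms give the same answer, so the choice only matters when the column is nullable.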


3. **Get the 17 customers that have made the largest number of orders. Include the name, address, state, and number of orders made.**

In [4]:
%%sql
WITH top_orders AS (
    SELECT
        customer_id
        ,COUNT(order_id) AS num_orders
    FROM
        customer_order
    GROUP BY 
        customer_id
    ORDER BY 
        count(order_id) DESC
    LIMIT
        17
)
SELECT 
    name
    ,address
    ,state
    ,num_orders
FROM
        top_orders
    LEFT JOIN
        customer
    ON 
        top_orders.customer_id = customer.customer_id
ORDER BY 
    num_orders DESC
;        

 * postgresql://localhost/ex_orders_random
17 rows affected.


name,address,state,num_orders
Joseph Ponce,93874 Esparza Mountain,KS,19
Andrew Fischer,7764 Brown Divide,ME,18
Sabrina Foster,5075 Mullins Drive Apt. 298,MD,17
George Davis MD,439 Chan Route,IL,16
Benjamin Brown,598 Moore Ports,TN,16
Edgar Perry,333 Jenna Bridge,AL,15
Eric Erickson,7751 Clark Lane,VA,15
Emily Fritz,918 Renee Lights,AL,15
Johnathan Charles,22678 Hartman Mission,HI,15
Beth Rivera,7813 Ingram Junction Apt. 318,AK,15
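
Several customers in the result above share the same number of orders. When rows tie on the sort key, `LIMIT` keeps whichever of them the database happens to return first, so a second sort key is one way to make the cut-off repeatable. A minimal sketch with made-up orders, using Python's `sqlite3` module in place of PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_order (order_id INTEGER, customer_id INTEGER)")
# Made-up orders: customers 1 and 2 tie on two orders each, customer 3 has one.
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?)",
    [(10, 2), (11, 1), (12, 2), (13, 1), (14, 3)],
)

top = conn.execute("""
    SELECT customer_id, COUNT(order_id) AS num_orders
    FROM customer_order
    GROUP BY customer_id
    ORDER BY num_orders DESC, customer_id   -- second key breaks ties
    LIMIT 1
""").fetchall()
print(top)  # [(1, 2)]
```

Using `customer_id` as the tie-breaker is an arbitrary choice for illustration; any column that is unique per row would do.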


4. **Get all orders by customer 1026. Include the amount spent in each order, the order id, and the total number of distinct products purchased.**

In [5]:
%%sql
WITH 
order_contents AS (
    SELECT 
        order_id,
        product.product_id,
        price,
        qty * price AS subtotal
    FROM
            order_product 
        JOIN
            product
        ON
            order_product.product_id = product.product_id
    WHERE
        order_id IN (
            SELECT order_id 
            FROM customer_order
            WHERE customer_id = 1026
        )
)
SELECT 
    order_id
    ,count(product_id) num_products
    ,sum(subtotal) total
FROM
    order_contents
GROUP BY 
    order_id
ORDER BY
    order_id
;

 * postgresql://localhost/ex_orders_random
7 rows affected.


order_id,num_products,total
59,5,1086.0
274,4,912.0
387,1,190.0
622,2,1148.0
844,1,870.0
1795,2,317.0
1992,1,285.0
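
The question asks for distinct products, while the query uses `count(product_id)`, which counts rows. The two agree only if a product appears at most once per order in `order_product`, which is an assumption about the data rather than something stated above. The sketch below uses made-up rows and Python's `sqlite3` module (in place of PostgreSQL) to show the difference when that assumption fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_product (order_id INTEGER, product_id INTEGER, qty INTEGER)"
)
# Made-up rows: product 8002 is listed twice in order 59.
conn.executemany(
    "INSERT INTO order_product VALUES (?, ?, ?)",
    [(59, 8002, 1), (59, 8002, 3), (59, 8009, 2)],
)

row = conn.execute("""
    SELECT COUNT(product_id)          AS num_rows,
           COUNT(DISTINCT product_id) AS num_products,
           SUM(qty)                   AS num_units
    FROM order_product
    WHERE order_id = 59
""").fetchone()
print(row)  # (3, 2, 6)
```

`COUNT(DISTINCT product_id)` is the safer form when the uniqueness of (order_id, product_id) pairs is not guaranteed.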


5. **Get the 10 customers that have spent the most. Give the customer_id and amount spent**

In [6]:
%%sql
SELECT 
    co.customer_id
    ,sum(qty * price) as total
FROM
        customer_order co
    JOIN
        order_product op
    ON
        co.order_id = op.order_id
    JOIN
        product p
    ON
        p.product_id = op.product_id
    
GROUP BY
    co.customer_id
ORDER BY 
    sum(qty * price) DESC
LIMIT 10

 * postgresql://localhost/ex_orders_random
10 rows affected.


customer_id,total
1087,22632.0
1178,21972.0
1013,20568.0
1139,19881.0
1153,19791.0
1106,19182.0
1140,18979.0
1042,18091.0
1190,17990.0
1029,17958.0


6. **Repeat the previous question, but include the customer's name, address, and state, in addition to the customer id and total amount spent**

In [7]:
%%sql
WITH previous_result AS ( 
    SELECT 
        co.customer_id
        ,sum(qty * price) as total
    FROM
            customer_order co
        JOIN
            order_product op
        ON
            co.order_id = op.order_id
        JOIN
            product p
        ON
            p.product_id = op.product_id
    
    GROUP BY
        co.customer_id
    ORDER BY 
        sum(qty * price) DESC
    LIMIT 10
)
SELECT 
    pr.*
    ,name
    ,address
    ,state
FROM
        previous_result pr
    LEFT JOIN
        customer c
    ON
        pr.customer_id = c.customer_id
ORDER BY 
    total DESC

 * postgresql://localhost/ex_orders_random
10 rows affected.


customer_id,total,name,address,state
1087,22632.0,Allison Hoffman,55218 Lam Key,KY
1178,21972.0,Jacqueline Frazier,85471 Davis Viaduct Suite 294,AK
1013,20568.0,Timothy Robertson,72067 Bridget Loaf Apt. 580,PA
1139,19881.0,Joseph Ponce,93874 Esparza Mountain,KS
1153,19791.0,Johnathan Charles,22678 Hartman Mission,HI
1106,19182.0,Andrew Fischer,7764 Brown Divide,ME
1140,18979.0,Jennifer Blake,9201 Andrea Courts Apt. 332,MI
1042,18091.0,Jessica Burke,68160 Amanda Pike,NM
1190,17990.0,Michelle Austin,856 Mills Lakes,MI
1029,17958.0,Jordan Rose,0537 Joel Ferry,MT


7. **Find the 10 customers that spent the most in 2017. Give the name and amount spent. Take the date to be the order date (not the delivery date)**

In [8]:
%%sql
WITH
order_amts AS (
    SELECT
        co.order_id 
        ,co.customer_id
        ,op.product_id
        ,op.qty * p.price as subtotal
    FROM
            customer_order co
        JOIN
            order_product op
        ON
            co.order_id = op.order_id
        JOIN
            product p
        ON
            op.product_id = p.product_id
    WHERE
        DATE_PART('year', co.date_ordered)::INT = 2017
)
SELECT
    a.customer_id
    ,c.name
    ,sum(subtotal) as total
FROM
        order_amts a
    JOIN
        customer c
    ON
        a.customer_id = c.customer_id
GROUP BY 
    a.customer_id, c.name
ORDER BY
    sum(subtotal) DESC
LIMIT
    10
;

 * postgresql://localhost/ex_orders_random
10 rows affected.


customer_id,name,total
1120,Sabrina Foster,14986.0
1115,Emily Nelson,13480.0
1014,Timothy Marks,13266.0
1087,Allison Hoffman,11928.0
1181,Jeanne Casey,11789.0
1143,Dana Kline,11312.0
1103,Kristen Davies,11125.0
1106,Andrew Fischer,10659.0
1135,Emily Fritz,10628.0
1139,Joseph Ponce,10439.0
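
An alternative to extracting the year with `DATE_PART` is to filter on a half-open date range, which selects the same orders and leaves the column unwrapped. The sketch below uses made-up orders and Python's `sqlite3` module, storing dates as ISO-8601 text (an assumption made for this example; the real column type is not shown in this notebook).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_order (order_id INTEGER, date_ordered TEXT)")
# Made-up orders on either side of the 2017 boundaries.
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?)",
    [(1, "2016-12-31"), (2, "2017-01-01"), (3, "2017-12-31"), (4, "2018-01-01")],
)

# Half-open range: includes 1 January 2017, excludes 1 January 2018.
ids = [r[0] for r in conn.execute("""
    SELECT order_id
    FROM customer_order
    WHERE date_ordered >= '2017-01-01' AND date_ordered < '2018-01-01'
    ORDER BY order_id
""")]
print(ids)  # [2, 3]
```

The same predicate should also work on a PostgreSQL date or timestamp column, where it is generally easier for the planner to use a plain index on `date_ordered` than with a function applied to the column.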


8. **Which three products have we sold the most of, i.e. the greatest number of units?**

In [9]:
%%sql
SELECT
    o.product_id
    ,p.product_name
    ,SUM(qty) units_sold
FROM
        product p
    JOIN
        order_product o
    ON
        p.product_id = o.product_id
GROUP BY 
    o.product_id, p.product_name
ORDER BY 
    sum(qty) DESC
LIMIT 
    3
;

 * postgresql://localhost/ex_orders_random
3 rows affected.


product_id,product_name,units_sold
8020,Ergonomic Concrete Bike,344
8070,Handmade Metal Sausages,315
8009,Generic Metal Hat,311


9. **What is the average number of days between order and delivery?**

In [10]:
%%sql
SELECT 
    AVG(date_delivered - date_ordered) AS avg_deliv_time
FROM
    customer_order

 * postgresql://localhost/ex_orders_random
1 rows affected.


avg_deliv_time
"5 days, 21:48:14.400000"

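The result above is displayed in the format Python uses for `timedelta` values, not as a single number of days. Assuming the value does arrive in Python as a `datetime.timedelta`, it can be converted to fractional days as sketched below; the variable names are chosen for illustration.

```python
from datetime import timedelta

# The average from the result above, re-entered by hand.
avg_deliv_time = timedelta(
    days=5, hours=21, minutes=48, seconds=14, microseconds=400000
)

# 86400 seconds per day.
avg_days = avg_deliv_time.total_seconds() / 86400
print(round(avg_days, 4))  # 5.9085
```

So the average delivery time is a little under 6 days.
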

10. **What is the average number of days between order and delivery for each year? Take the year from the order date.**

In [11]:
%%sql
SELECT 
    DATE_PART('year', date_ordered)::int order_year
    ,AVG(date_delivered - date_ordered) 
FROM
    customer_order
GROUP BY 
    DATE_PART('year', date_ordered)::int 
ORDER BY 
    DATE_PART('year', date_ordered)::int  ASC

 * postgresql://localhost/ex_orders_random
3 rows affected.


order_year,avg
2016,"5 days, 19:12:00"
2017,"5 days, 21:05:17.025440"
2018,"5 days, 23:15:26.732673"
