# 4. Hard SQL questions

In this section we work through some harder SQL questions to sharpen our query-writing skills.

## Configure SQL connection
Make sure your database server is up and running.


In [3]:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autolimit=20
%config SqlMagic.displaylimit=20
%sql postgresql://pliu:northwind@127.0.0.1:5432/northwind

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## 4.1 Question 1 High-value customers

We want to send all of our high-value customers a special VIP gift. We define high-value customers as those
who have made at least one order with a total value (not including the discount) of $10,000 or more. We
only want to consider orders made in the year 1997.


Your result rows should look like:

```text
 customer_id |        company_name        | order_id | total_order_amount 
-------------+----------------------------+----------+--------------------
 QUICK       | QUICK-Stop                 |    10691 |           10164.80
 QUICK       | QUICK-Stop                 |    10540 |           10191.70
 RATTC       | Rattlesnake Canyon Grocery |    10479 |           10495.60
 QUICK       | QUICK-Stop                 |    10515 |           10588.50
 SIMOB       | Simons bistro              |    10417 |           11283.20
 MEREP       | Mère Paillarde             |    10424 |           11493.20

```

### Hint

First, get the necessary fields for all orders made in the year 1997. Don't bother grouping yet; just work on
the `where` clause. You'll need the **customer_id and company_name from customers; order_id from orders; and quantity and unit_price from order_details**. Once that works, group by order and sum the line amounts. The expected result above is sorted by the total amount of the order, in ascending order.
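
As a starting point, the ungrouped step described in the hint might look like the sketch below. It assumes the Northwind tables used throughout this section; the alias `line_amount` and the half-open date range (used here instead of `extract(year ...)`) are illustrative choices, not part of the original exercise.

```sql
-- One row per order line for orders placed in 1997; no grouping yet.
select o.customer_id,
       c.company_name,
       o.order_id,
       od.unit_price * od.quantity as line_amount  -- illustrative alias
from orders o
inner join order_details od on o.order_id = od.order_id
inner join customers c on o.customer_id = c.customer_id
-- Half-open range: from 1 January 1997 up to, but not including, 1 January 1998.
where o.order_date >= date '1997-01-01'
  and o.order_date <  date '1998-01-01'
order by o.order_id;
```

Once the rows look right, wrap the line amount in `sum(...)`, add the `group by`, and move the threshold into a `having` clause.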



In [24]:
%%sql

select o.customer_id, c.company_name, o.order_id, 
round(cast(sum(od.unit_price*od.quantity) as numeric),2) as total_order_amount
from orders o
inner join order_details od
on o.order_id=od.order_id
inner join customers c
on o.customer_id=c.customer_id
where (extract(year from o.order_date)=1997)
group by o.customer_id, c.company_name, o.order_id
having sum(od.unit_price*od.quantity) >= 10000
order by total_order_amount;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
6 rows affected.


customer_id,company_name,order_id,total_order_amount
QUICK,QUICK-Stop,10691,10164.8
QUICK,QUICK-Stop,10540,10191.7
RATTC,Rattlesnake Canyon Grocery,10479,10495.6
QUICK,QUICK-Stop,10515,10588.5
SIMOB,Simons bistro,10417,11283.2
MEREP,Mère Paillarde,10424,11493.2


## 4.2 Question 2 High-value customers - total orders

The manager has changed his mind. Instead of requiring that customers have at least one individual order totaling
$10,000 or more, he wants to define high-value customers as those whose orders in 1997 total $15,000 or more.

How would you change the answer to the problem above? Sort the result by total_order_amount in descending order.


Your result rows should look like:

```text
 customer_id |         company_name         | total_order_amount 
-------------+------------------------------+--------------------
 QUICK       | QUICK-Stop                   |           64238.00
 SAVEA       | Save-a-lot Markets           |           60672.64
 ERNSH       | Ernst Handel                 |           53467.38
 MEREP       | Mère Paillarde               |           26087.10
 HUNGO       | Hungry Owl All-Night Grocers |           23959.05
 RATTC       | Rattlesnake Canyon Grocery   |           19658.70
 SIMOB       | Simons bistro                |           17482.15


```

### Hint

This query is almost identical to the one above; there are just a few lines you need to delete or comment
out in order to group at a different level.
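
To illustrate what "grouping at a different level" means, here is a small self-contained sketch. The `line_items` data is hypothetical toy data, not taken from Northwind, and the threshold of 300 is chosen only to fit it.

```sql
-- Hypothetical toy data: customer A has two orders, customer B has one.
with line_items(customer_id, order_id, amount) as (
    values ('A', 1, 100),
           ('A', 1,  50),
           ('A', 2, 200),
           ('B', 3,  75)
)
-- Grouping by customer only (order_id removed from SELECT and GROUP BY)
-- yields one row per customer instead of one row per order.
select customer_id, sum(amount) as total_order_amount
from line_items
group by customer_id
having sum(amount) >= 300
order by total_order_amount desc;
```

With this toy data only customer A is returned, with a total of 350. Grouping by `customer_id, order_id` with the same threshold would return no rows, because no single order reaches 300.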

In [25]:
%%sql

select o.customer_id, c.company_name, 
round(cast(sum(od.unit_price*od.quantity) as numeric),2) as total_order_amount
from orders o
inner join order_details od
on o.order_id=od.order_id
inner join customers c
on o.customer_id=c.customer_id
where (extract(year from o.order_date)=1997)
group by o.customer_id, c.company_name
having sum(od.unit_price*od.quantity) >= 15000
order by total_order_amount desc;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
7 rows affected.


customer_id,company_name,total_order_amount
QUICK,QUICK-Stop,64238.0
SAVEA,Save-a-lot Markets,60672.64
ERNSH,Ernst Handel,53467.38
MEREP,Mère Paillarde,26087.1
HUNGO,Hungry Owl All-Night Grocers,23959.05
RATTC,Rattlesnake Canyon Grocery,19658.7
SIMOB,Simons bistro,17482.15


## 4.3 Question 3 High-value customers - with discount

Change the above query to use the discount when calculating high-value customers. Order by the total amount which includes the discount.


Your result rows should look like:

```text
 customer_id |         company_name         | total_order_amount_without_discount | total_order_amount_with_discount 
-------------+------------------------------+-------------------------------------+----------------------------------
 QUICK       | QUICK-Stop                   |                            64238.00 |                         61109.91
 SAVEA       | Save-a-lot Markets           |                            60672.64 |                         57713.57
 ERNSH       | Ernst Handel                 |                            53467.38 |                         48096.26
 MEREP       | Mère Paillarde               |                            26087.10 |                         23332.31
 HUNGO       | Hungry Owl All-Night Grocers |                            23959.05 |                         20454.40
 RATTC       | Rattlesnake Canyon Grocery   |                            19658.70 |                         19383.75
 SIMOB       | Simons bistro                |                            17482.15 |                         16232.41


```

### Hint

To start out, use only the order_details table. You'll need to figure out how the **discount column** is structured.
Then include the discount in the total order amount calculation.
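
The two steps in this hint could be explored with queries along the following lines. This is a sketch: it assumes the discount is stored as a fraction (for example 0.05 for 5%), which is what the `(1 - discount)` factor in the solution implies, and the column aliases are illustrative.

```sql
-- Step 1: inspect the distinct discount values to see how they are stored.
select distinct discount
from order_details
order by discount;

-- Step 2: compare the line amount with and without the discount.
-- Assumes discount is a fraction between 0 and 1.
select order_id,
       unit_price * quantity                  as amount_without_discount,
       unit_price * quantity * (1 - discount) as amount_with_discount
from order_details
where discount > 0
limit 5;
```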

In [29]:
%%sql


select o.customer_id, c.company_name,
round(cast(sum(od.unit_price*od.quantity) as numeric),2) as total_order_amount_without_discount,
round(cast(sum((od.unit_price*od.quantity)*(1-discount)) as numeric),2) as total_order_amount_with_discount
from orders o
inner join order_details od
on o.order_id=od.order_id
inner join customers c
on o.customer_id=c.customer_id
where (extract(year from o.order_date)=1997)
group by o.customer_id, c.company_name
having sum((od.unit_price*od.quantity) * (1-od.discount)) >= 15000
order by total_order_amount_with_discount desc;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
7 rows affected.


customer_id,company_name,total_order_amount_without_discount,total_order_amount_with_discount
QUICK,QUICK-Stop,64238.0,61109.91
SAVEA,Save-a-lot Markets,60672.64,57713.57
ERNSH,Ernst Handel,53467.38,48096.26
MEREP,Mère Paillarde,26087.1,23332.31
HUNGO,Hungry Owl All-Night Grocers,23959.05,20454.4
RATTC,Rattlesnake Canyon Grocery,19658.7,19383.75
SIMOB,Simons bistro,17482.15,16232.41


## 4.4 Question 4 Month-end orders

At the end of the month, salespeople are likely to try much harder to get orders in order to meet their month-end quotas. Show all orders made on the last day of the month. Order by employee_id and order_id.


Your result rows should look like:

```text
 employee_id | order_id | order_date 
-------------+----------+------------
           1 |    10461 | 1997-02-28
           1 |    10616 | 1997-07-31
           2 |    10583 | 1997-06-30
           2 |    10686 | 1997-09-30
           2 |    10989 | 1998-03-31
           2 |    11060 | 1998-04-30
           3 |    10432 | 1997-01-31
           3 |    10806 | 1997-12-31

```

### Hint

Some database servers provide a built-in function for this: **SQL Server** has `EOMONTH(date)`, and **MySQL** and **Oracle** have `LAST_DAY(date)`, each returning the last day of the month that contains the input date. **PostgreSQL** has no such built-in function, but we can define our own. Below is an example of how to define such a function in PostgreSQL.

```sql
-- The last_day function takes a date as input and returns a new date,
-- which is the last day of the month containing the input date.

CREATE OR REPLACE FUNCTION last_day(date)
RETURNS date AS
$$
  SELECT (date_trunc('MONTH', $1) + INTERVAL '1 MONTH - 1 day')::date;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;

```

Use the above function in your filter to get the orders.
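
To see what the expression inside `last_day` returns, the sketch below applies it inline to a few hand-picked dates, so it runs even before the function has been created. The dates and the column alias `month_end` are illustrative.

```sql
-- Apply the month-end expression to three sample dates.
-- 1997-02-10 -> 1997-02-28 (ordinary February)
-- 1996-02-10 -> 1996-02-29 (leap year)
-- 1997-12-31 -> 1997-12-31 (already the last day of its month)
select d as input_date,
       (date_trunc('month', d) + interval '1 month - 1 day')::date as month_end
from (values (date '1997-02-10'),
             (date '1996-02-10'),
             (date '1997-12-31')) as t(d);
```

A row whose `input_date` equals its `month_end` is a month-end date, which is exactly the condition needed in the filter.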

In [31]:
%%sql
-- The last_day function takes a date as input and returns a new date,
-- which is the last day of the month containing the input date.

CREATE OR REPLACE FUNCTION last_day(date)
RETURNS date AS
$$
  SELECT (date_trunc('MONTH', $1) + INTERVAL '1 MONTH - 1 day')::date;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
Done.


[]

In [34]:
%%sql

select employee_id, order_id, order_date 
from orders
where order_date=last_day(order_date)
order by employee_id, order_id 

 * postgresql://pliu:***@127.0.0.1:5432/northwind
26 rows affected.


employee_id,order_id,order_date
1,10461,1997-02-28
1,10616,1997-07-31
2,10583,1997-06-30
2,10686,1997-09-30
2,10989,1998-03-31
2,11060,1998-04-30
3,10432,1997-01-31
3,10806,1997-12-31
3,10988,1998-03-31
3,11063,1998-04-30


## 4.5 Question 5 Orders with many line items

The Northwind mobile app developers are testing an app that customers will use to show orders. To make
sure that even the largest orders will show up correctly on the app, they'd like some samples of orders that have lots of individual line items. Show the 10 orders with the most line items, in descending order of total line items.


Your result rows should look like:

```text
 order_id | total_order_details 
----------+---------------------
    11077 |                  25
    10979 |                   6
    10657 |                   6
    10847 |                   6
    10360 |                   5
    10893 |                   5
    10553 |                   5
    10294 |                   5
    10514 |                   5
    11064 |                   5

```

### Hint

Use `group by` together with the aggregate function `count`.

In [38]:
%%sql

select order_id, count(order_id) as total_order_details
from order_details
group by order_id
order by total_order_details desc
limit 10;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
10 rows affected.


order_id,total_order_details
11077,25
10979,6
10657,6
10847,6
10360,5
10893,5
10553,5
10294,5
10514,5
11064,5
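
Several orders tie at 5 line items, so if more orders share that count than fit in the top 10, which of them are returned is not guaranteed when sorting by the count alone. A possible variation, assuming a repeatable sample is wanted, adds `order_id` as a secondary sort key. `count(*)` is used here as an equivalent way to count the rows in each group.

```sql
select order_id, count(*) as total_order_details
from order_details
group by order_id
order by total_order_details desc, order_id  -- order_id breaks ties
limit 10;
```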


## 4.6 Question 6 Orders - random assortment

The Northwind mobile app developers would now like a random assortment of orders for beta testing
their app. Show a random set of 2% of all orders.


Because the sample is random, your result rows will differ on every run, so there is no fixed expected result.

### Hint
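
One possible approach, sketched below under the assumption that an exact 2% of the rows is wanted, shuffles the rows with PostgreSQL's `random()` and keeps the first 2%. The choice of output columns is illustrative.

```sql
-- Shuffle all orders, then keep 2% of them.
select order_id, customer_id, order_date
from orders
order by random()
limit (select (count(*) * 0.02)::int from orders);  -- 2% of the row count, rounded
```

PostgreSQL also offers `TABLESAMPLE BERNOULLI (2)` in the `from` clause, which returns approximately, rather than exactly, 2% of the rows.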

In [None]:
%%sql

## 4.7 Question 7 



### Hint

In [None]:
%%sql

## 4.8 Question 8 



### Hint

In [None]:
%%sql

## 4.9 Question 9 



### Hint

In [None]:
%%sql