# 3. Medium SQL questions

In this section we will work through some SQL questions to get familiar with SQL syntax.

## Configure the SQL connection
Make sure your database server is up and running.


In [1]:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autolimit=20
%config SqlMagic.displaylimit=20
%sql postgresql://pliu:northwind@127.0.0.1:5432/northwind


## 3.1 Question 1 Categories, and the total products in each category

For this problem, we’d like to see the total number of products in each category. Sort the results by the total
number of products, in descending order.

Your result rows should look like:

```text
 category_name  | total_products 
----------------+----------------
 Confections    |             13
 Condiments     |             12
 Beverages      |             12
 Seafood        |             12
 Dairy Products |             10
 Grains/Cereals |              7
 Meat/Poultry   |              6
 Produce        |              5
```

### Hint

To solve this problem, you need to combine a join and a group by.

The simplest way to start is to write a query that shows the category_name and every product_id associated with it, without grouping. Then add the group by.

But the joined table built this way is much bigger. To make the query potentially more efficient, you can first group the products by category_id, then join the aggregated result with the categories table.
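The two approaches described in this hint can be compared on a tiny dataset. The sketch below is not part of the original notebook: it assumes Python's built-in `sqlite3` module and a handful of made-up rows as a stand-in for the Northwind PostgreSQL database. Only the table and column names come from the question.

```python
import sqlite3

# Toy stand-in for the Northwind tables (made-up rows, SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table categories (category_id integer primary key, category_name text);
    create table products (product_id integer primary key, category_id integer);
    insert into categories values (1, 'Beverages'), (2, 'Condiments'), (3, 'Produce');
    insert into products values (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3);
""")

# Approach 1: join first, then group the joined rows.
join_then_group = conn.execute("""
    select ca.category_name, count(*) as total_products
    from categories ca
    inner join products p on ca.category_id = p.category_id
    group by ca.category_name
    order by total_products desc
""").fetchall()

# Approach 2: aggregate products first, then join the small aggregated result.
group_then_join = conn.execute("""
    select ca.category_name, sc.total_products
    from categories ca
    inner join (select category_id, count(*) as total_products
                from products group by category_id) sc
    on ca.category_id = sc.category_id
    order by sc.total_products desc
""").fetchall()

print(join_then_group)
print(group_then_join)
```

Both queries return the same rows. Which one is actually faster depends on the query planner and the data volume, so it is worth checking the execution plan on the real database.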

In [6]:
%%sql

select ca.category_name, sc.total_products
from categories ca
inner join (select category_id, count(*) as total_products from products group by category_id) sc
on ca.category_id=sc.category_id
order by sc.total_products desc

 * postgresql://pliu:***@127.0.0.1:5432/northwind
8 rows affected.


category_name,total_products
Confections,13
Condiments,12
Beverages,12
Seafood,12
Dairy Products,10
Grains/Cereals,7
Meat/Poultry,6
Produce,5


## 3.2 Question 2  Total customers per country/city

In the Customers table, show the total number of customers per Country and City

Your result rows should look like:

```text
   country   |      city       | total_customers 
-------------+-----------------+-----------------
 UK          | London          |               6
 Mexico      | México D.F.     |               5
 Brazil      | Sao Paulo       |               4
 Brazil      | Rio de Janeiro  |               3
 Argentina   | Buenos Aires    |               3
 Spain       | Madrid          |               3
...

```

### Hint

Just as you can have multiple fields in a Select clause, you can also have multiple fields in a Group By clause
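As a small illustration of grouping by two columns, here is a sketch that assumes Python's built-in `sqlite3` module and a few made-up customers instead of the real Northwind data.

```python
import sqlite3

# Made-up customers; only the column names mirror the Northwind customers table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id text primary key, country text, city text);
    insert into customers values
        ('A', 'UK', 'London'), ('B', 'UK', 'London'), ('C', 'UK', 'Cowes'),
        ('D', 'Brazil', 'Sao Paulo'), ('E', 'Brazil', 'Sao Paulo'),
        ('F', 'Brazil', 'Sao Paulo');
""")

# One output row per distinct (country, city) pair.
rows = conn.execute("""
    select country, city, count(*) as total_customers
    from customers
    group by country, city
    order by total_customers desc, country, city
""").fetchall()

for row in rows:
    print(row)
```

The extra `country, city` sort keys are an addition for this sketch: they make the order of rows with the same count deterministic, which a sort on total_customers alone does not guarantee.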

In [10]:
%%sql

select country, city, count(*) as total_customers 
from customers 
group by country, city 
order by total_customers desc;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
69 rows affected.


country,city,total_customers
UK,London,6
Mexico,México D.F.,5
Brazil,Sao Paulo,4
Brazil,Rio de Janeiro,3
Argentina,Buenos Aires,3
Spain,Madrid,3
France,Paris,2
France,Nantes,2
Portugal,Lisboa,2
USA,Portland,2


## 3.3 Question 3 Products that need reordering

What products do we have in our inventory that should be reordered? For now, use only the fields **units_in_stock and reorder_level**: a product needs reordering when units_in_stock is less than reorder_level. Ignore the fields **units_on_order and discontinued**. Order the results by product_id.


Your result rows should look like:

```text
 product_id |       product_name        | units_in_stock | reorder_level 
------------+---------------------------+----------------+---------------
          2 | Chang                     |             17 |            25
          3 | Aniseed Syrup             |             13 |            25
         11 | Queso Cabrales            |             22 |            30
         21 | Sir Rodney's Scones       |              3 |             5
         30 | Nord-Ost Matjeshering     |             10 |            15
         31 | Gorgonzola Telino         |              0 |            20
         32 | Mascarpone Fabioli        |              9 |            25
         37 | Gravad lax                |             11 |            25

```

### Hint

We want to show all products where units_in_stock is less than reorder_level. So in the Where clause, use the
following: units_in_stock < reorder_level


In [11]:
%%sql

select product_id, product_name, units_in_stock, reorder_level 
from products where units_in_stock < reorder_level
order by product_id;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
18 rows affected.


product_id,product_name,units_in_stock,reorder_level
2,Chang,17,25
3,Aniseed Syrup,13,25
11,Queso Cabrales,22,30
21,Sir Rodney's Scones,3,5
30,Nord-Ost Matjeshering,10,15
31,Gorgonzola Telino,0,20
32,Mascarpone Fabioli,9,25
37,Gravad lax,11,25
43,Ipoh Coffee,17,25
45,Rogede sild,5,15


## 3.4 Question 4 Products that need reordering, continued

Now we need to incorporate these fields (product_id, product_name, units_in_stock, units_on_order, reorder_level, discontinued) into our calculation. 
We’ll define “products that need reordering” with the following:
- units_in_stock plus units_on_order are less than or equal to reorder_level
- The discontinued flag is false (i.e. 0)

Your result rows should look like:

```text
 product_id |     product_name      | units_in_stock | units_on_order | reorder_level | discontinued 
------------+-----------------------+----------------+----------------+---------------+--------------
         30 | Nord-Ost Matjeshering |             10 |              0 |            15 |            0
         70 | Outback Lager         |             15 |             10 |            30 |            0

```

### Hint

For the first part of the Where clause, you should have something like this:

(units_in_stock+units_on_order) <= reorder_level
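To see how the two conditions interact, here is a minimal sketch on made-up products, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The product names are hypothetical and only describe why each row is kept or dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table products (
        product_id integer primary key, product_name text,
        units_in_stock integer, units_on_order integer,
        reorder_level integer, discontinued integer);
    insert into products values
        (1, 'Below level',      10,  0, 15, 0),
        (2, 'Covered by order', 10, 10, 15, 0),
        (3, 'Discontinued',      0,  0,  5, 1),
        (4, 'Exactly at level', 15, 10, 25, 0);
""")

# Keep products whose stock plus pending orders does not exceed the reorder
# level, and that are still being sold.
rows = conn.execute("""
    select product_id, product_name
    from products
    where (units_in_stock + units_on_order) <= reorder_level
      and discontinued = 0
    order by product_id
""").fetchall()

print(rows)
```

Product 4 is kept because the comparison is `<=`, so a product sitting exactly at its reorder level counts as needing a reorder.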

In [12]:
%%sql

select product_id, product_name, units_in_stock, units_on_order, reorder_level, discontinued
from products 
where (units_in_stock+units_on_order) <= reorder_level and discontinued=0
order by product_id;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
2 rows affected.


product_id,product_name,units_in_stock,units_on_order,reorder_level,discontinued
30,Nord-Ost Matjeshering,10,0,15,0
70,Outback Lager,15,10,30,0


## 3.5 Question 5 Customer list by region

A salesperson for Northwind is going on a business trip to visit customers, and would like to see a list of all
customers, sorted by region, alphabetically.

However, he wants the customers with no region (null in the region field) to be at the end, instead of at the top,
where some database servers put the null values. Within the same region, companies should be sorted by customer_id.

Your result rows should look like:

```text
 customer_id |             company_name             |    region     
-------------+--------------------------------------+---------------
 OLDWO       | Old World Delicatessen               | AK
 BOTTM       | Bottom-Dollar Markets                | BC
 LAUGB       | Laughing Bacchus Wine Cellars        | BC
 LETSS       | Let's Stop N Shop                    | CA
 HUNGO       | Hungry Owl All-Night Grocers         | Co. Cork
 GROSR       | GROSELLA-Restaurante                 | DF
 SAVEA       | Save-a-lot Markets                   | ID
 ......
 ANATR       | Ana Trujillo Emparedados y helados   | 
 ANTON       | Antonio Moreno Taquería              | 
 AROUT       | Around the Horn                      | 


```

### Hint

Different database servers treat null values differently. In PostgreSQL, null values are placed at the bottom in ascending order and at the top in descending order.

Other database servers may behave differently. To make sure you always get the right answer, you can create a temporary sort key with a **case ... end** expression: if region is null the key is 1, otherwise it is 0. Sorted in ascending order, the rows with key 1 (region is null) end up at the bottom.
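The effect of the sort key is easy to observe on a server that puts nulls first. The sketch below assumes Python's built-in `sqlite3` module, because SQLite sorts null values before all other values in ascending order. The five rows are a small hand-picked sample, not the full customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id text primary key, region text);
    insert into customers values
        ('ANATR', null), ('BOTTM', 'BC'), ('LAUGB', 'BC'),
        ('OLDWO', 'AK'), ('AROUT', null);
""")

# Plain ordering: where the nulls land depends on the database server.
plain = conn.execute("""
    select customer_id, region from customers
    order by region, customer_id
""").fetchall()

# With the case expression as first sort key, nulls always come last.
flagged = conn.execute("""
    select customer_id, region from customers
    order by case when region is null then 1 else 0 end,
             region, customer_id
""").fetchall()

print(plain)
print(flagged)
```

In PostgreSQL the same intent can also be written as `order by region nulls last, customer_id`, but that clause is not available on every database server, which is why the case expression is the more portable choice.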

In [22]:
%%sql

select customer_id, company_name, region 
from customers
order by 
case
when region is null then 1
else 0
end,
region, customer_id;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
91 rows affected.


customer_id,company_name,region
OLDWO,Old World Delicatessen,AK
BOTTM,Bottom-Dollar Markets,BC
LAUGB,Laughing Bacchus Wine Cellars,BC
LETSS,Let's Stop N Shop,CA
HUNGO,Hungry Owl All-Night Grocers,Co. Cork
GROSR,GROSELLA-Restaurante,DF
SAVEA,Save-a-lot Markets,ID
ISLAT,Island Trading,Isle of Wight
LILAS,LILA-Supermercado,Lara
THECR,The Cracker Box,MT


## 3.6 Question 6 High freight charges

Some of the countries we ship to have very high freight charges. We'd like to investigate some more shipping
options for our customers, to be able to offer them lower freight charges. Return the three ship countries with the
highest average freight overall, in descending order by average freight.

Your result rows should look like:

```text
 ship_country | average_freight  
--------------+------------------
 Austria      | 184.787500572205
 Ireland      | 145.012628956845
 USA          | 112.879426603557

```

### Hint

We'll be using the **orders table**, and using the freight and ship_country columns. Use the Avg function on freight after grouping orders by country. Don't worry about showing only the top 3 rows until you have the grouping and average freight set up.
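The two steps in this hint (first group and average, then keep the top rows) can be sketched on made-up freight values. This assumes Python's built-in `sqlite3` module rather than the PostgreSQL server used in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer primary key, ship_country text, freight real);
    insert into orders (ship_country, freight) values
        ('Austria', 100.0), ('Austria', 200.0),
        ('USA', 50.0), ('USA', 70.0),
        ('Ireland', 120.0), ('Spain', 10.0);
""")

# Step 1: average freight per country, highest first.
all_countries = conn.execute("""
    select ship_country, avg(freight) as average_freight
    from orders
    group by ship_country
    order by average_freight desc
""").fetchall()

# Step 2: the same query, restricted to the three highest averages.
top_three = conn.execute("""
    select ship_country, avg(freight) as average_freight
    from orders
    group by ship_country
    order by average_freight desc
    limit 3
""").fetchall()

print(all_countries)
print(top_three)
```

The `limit` clause works in PostgreSQL, SQLite and MySQL. SQL Server uses `select top 3 ...` instead.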

In [23]:
%%sql

select ship_country, avg(freight) as average_freight
from orders
group by ship_country
order by average_freight desc
limit 3;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
3 rows affected.


ship_country,average_freight
Austria,184.787500572205
Ireland,145.012628956845
USA,112.879426603557


## 3.7 Question 7 High freight charges - 1997

We're continuing on the question above on high freight charges. Now, instead of using all the orders we have, we
only want to see orders from the year 1997.

Your result rows should look like:

```text
 ship_country | average_freight  
--------------+------------------
 Austria      |  178.36428569612
 Switzerland  | 117.177499771118
 Sweden       | 105.159999398624

```

### Hint

We can reuse the previous query; we only need to keep the orders from 1997.

If you check the schema of the **orders** table, you can see that order_date has type **date**. Database servers normally provide functions to extract the year, month, day, etc. from a date, but the syntax differs from one server to another. For example, in **SQL Server (starting with 2008), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse**, you can use year(order_date) to get the year. In **PostgreSQL**, you can use extract(year from order_date) to get the year.

```text
\d orders
                Table "public.orders"
      Column      |         Type          | Modifiers 
------------------+-----------------------+-----------
 order_id         | smallint              | not null
 customer_id      | bpchar                | 
 employee_id      | smallint              | 
 order_date       | date                  | 
 required_date    | date                  | 
 shipped_date     | date                  | 
 ship_via         | smallint              | 
 freight          | real                  | 
 ship_name        | character varying(40) | 
 ship_address     | character varying(60) | 
 ship_city        | character varying(15) | 
 ship_region      | character varying(15) | 
 ship_postal_code | character varying(10) | 
 ship_country     | character varying(15) | 

```
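Because the extraction function differs between servers, the year filter can also be written as a plain date range. The sketch below assumes Python's built-in `sqlite3` module with made-up orders. SQLite has neither `year()` nor `extract()`, so `strftime('%Y', ...)` stands in for them here, while the range predicate is the portable variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer primary key, order_date text,
                         ship_country text, freight real);
    insert into orders values
        (1, '1996-12-31', 'Austria',  50.0),
        (2, '1997-01-01', 'Austria', 100.0),
        (3, '1997-06-15', 'Austria', 200.0),
        (4, '1997-12-31', 'USA',      80.0),
        (5, '1998-01-01', 'USA',     500.0);
""")

# Variant 1: extract the year (SQLite spelling of extract(year from order_date)).
by_year = conn.execute("""
    select ship_country, avg(freight) as average_freight
    from orders
    where strftime('%Y', order_date) = '1997'
    group by ship_country
    order by average_freight desc
""").fetchall()

# Variant 2: half-open date range, no server-specific function needed.
by_range = conn.execute("""
    select ship_country, avg(freight) as average_freight
    from orders
    where order_date >= '1997-01-01' and order_date < '1998-01-01'
    group by ship_country
    order by average_freight desc
""").fetchall()

print(by_year)
print(by_range)
```

The range form compares the column directly, so a database server can usually use an index on order_date for it.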


In [30]:
%%sql

select ship_country, avg(freight) as average_freight
from orders
where extract(year from order_date)=1997
group by ship_country
order by average_freight desc
limit 3;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
3 rows affected.


ship_country,average_freight
Ireland,339.42248916626
Austria,217.341821323742
USA,153.322307962638


## 3.8 Question 8 High freight charges - last year

We're continuing to work on high freight charges. We now want to get the three ship countries with the highest
average freight charges. But instead of filtering for a particular year, we want to use the last 12 months of
order data, counting back from the most recent order_date in orders.


Your result rows should look like:

```text
 ship_country | average_freight  
--------------+------------------
 Ireland      | 200.209995269775
 Austria      | 186.459601020813
 USA          | 117.970460366929
```

### Hint

The hard part of this question is to determine the time interval covering the last 12 months. First we need to get the order_date of the most recent order. Then we use this date as the baseline and go back 12 months.

In **SQL Server or Sybase**, we can use a function called **DATEADD** to calculate the time interval (MySQL provides a similar function, DATE_ADD, with a different syntax). For example, 
```sql
-- below will return the day after the current date.
SELECT DATEADD(day, 1, GETDATE());

-- below will return the day before the current date.
SELECT DATEADD(day, -1, GETDATE());

-- in our case, we get the date one year before the most recent order
SELECT DATEADD(yy, -1, (SELECT MAX(order_date) FROM orders));
```

In **PostgreSQL**, there is no DATEADD function; we need to do **arithmetic with interval literals** to get the same result.

```sql
-- below will return the day after the current date.
SELECT CURRENT_DATE + INTERVAL '1 day';
 
-- below will return the day before the current date.
SELECT CURRENT_DATE - INTERVAL '1 day';

-- in our case, we get the date one year before the most recent order
-- '1 year' and '12 month' both work
SELECT (select max(order_date) from orders) - INTERVAL '1 year';
SELECT (select max(order_date) from orders) - INTERVAL '12 month';
```
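The 12-month window can be checked on a few made-up orders. This sketch assumes Python's built-in `sqlite3` module. SQLite has no interval literals, so its `date(..., '-12 months')` modifier plays the role of the PostgreSQL `- INTERVAL '12 month'` expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer primary key, order_date text,
                         ship_country text, freight real);
    insert into orders values
        (1, '1997-05-05', 'Ireland', 999.0),  -- one day too old, excluded
        (2, '1997-05-06', 'Ireland', 100.0),  -- exactly on the cutoff, included
        (3, '1998-01-10', 'Ireland', 300.0),
        (4, '1998-05-06', 'USA',      50.0);  -- most recent order
""")

# Baseline: the most recent order date, moved back by 12 months.
cutoff = conn.execute("""
    select date((select max(order_date) from orders), '-12 months')
""").fetchone()[0]

rows = conn.execute("""
    select ship_country, avg(freight) as average_freight
    from orders
    where order_date >= date((select max(order_date) from orders), '-12 months')
    group by ship_country
    order by average_freight desc
    limit 3
""").fetchall()

print(cutoff)
print(rows)
```

Because the comparison is `>=`, an order placed exactly on the cutoff date is part of the window.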


In [33]:
%%sql

select ship_country, avg(freight) as average_freight
from orders
where order_date>=(select max(order_date) from orders) - INTERVAL '12 month'
group by ship_country
order by average_freight desc
limit 3;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
3 rows affected.


ship_country,average_freight
Ireland,200.209995269775
Austria,186.459601020813
USA,117.970460366929


## 3.9 Question 9 Inventory list

We're doing inventory, and need to show information like employee_id, last_name, order_id, product_name, quantity, for all orders. Sort by order_id and product_id.

Your result rows should look like:

```text
 employee_id | last_name | order_id |           product_name           | quantity 
-------------+-----------+----------+----------------------------------+----------
           5 | Buchanan  |    10248 | Queso Cabrales                   |       12
           5 | Buchanan  |    10248 | Singaporean Hokkien Fried Mee    |       10
           5 | Buchanan  |    10248 | Mozzarella di Giovanni           |        5
           6 | Suyama    |    10249 | Tofu                             |        9
           6 | Suyama    |    10249 | Manjimup Dried Apples            |       40
           4 | Peacock   |    10250 | Jack's New England Clam Chowder  |       10

```

### Hint

You'll need to do a **join between 4 tables (e.g. orders, employees, order_details, products)**, displaying only those fields that are necessary.
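The chain of joins can be followed on a miniature version of the four tables. The sketch below assumes Python's built-in `sqlite3` module. The rows are a hand-made sample for illustration, and only the columns needed for the joins are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (employee_id integer primary key, last_name text);
    create table orders (order_id integer primary key, employee_id integer);
    create table products (product_id integer primary key, product_name text);
    create table order_details (order_id integer, product_id integer, quantity integer);

    insert into employees values (5, 'Buchanan'), (6, 'Suyama');
    insert into orders values (10248, 5), (10249, 6);
    insert into products values (11, 'Queso Cabrales'), (14, 'Tofu'),
                                (42, 'Singaporean Hokkien Fried Mee');
    insert into order_details values (10248, 11, 12), (10248, 42, 10), (10249, 14, 9);
""")

# orders links to employees directly; order_details is the bridge
# between orders and products.
rows = conn.execute("""
    select o.employee_id, e.last_name, o.order_id, p.product_name, od.quantity
    from orders o
    inner join employees e on o.employee_id = e.employee_id
    inner join order_details od on o.order_id = od.order_id
    inner join products p on od.product_id = p.product_id
    order by o.order_id, p.product_id
""").fetchall()

for row in rows:
    print(row)
```

The result has one row per line of order_details, which is why the real query returns more rows than there are orders.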

In [37]:
%%sql

select o.employee_id, e.last_name, o.order_id, p.product_name, od.quantity
from orders o
inner join employees e
on o.employee_id=e.employee_id
inner join order_details od
on o.order_id=od.order_id
inner join products p
on od.product_id=p.product_id
order by o.order_id, p.product_id;


 * postgresql://pliu:***@127.0.0.1:5432/northwind
2155 rows affected.


employee_id,last_name,order_id,product_name,quantity
5,Buchanan,10248,Queso Cabrales,12
5,Buchanan,10248,Singaporean Hokkien Fried Mee,10
5,Buchanan,10248,Mozzarella di Giovanni,5
6,Suyama,10249,Tofu,9
6,Suyama,10249,Manjimup Dried Apples,40
4,Peacock,10250,Jack's New England Clam Chowder,10
4,Peacock,10250,Manjimup Dried Apples,35
4,Peacock,10250,Louisiana Fiery Hot Pepper Sauce,15
3,Leverling,10251,Gustaf's Knäckebröd,6
3,Leverling,10251,Ravioli Angelo,15


## 3.10 Question 10 Customers with no orders

There are some customers who have never actually placed an order. Show these customers.

Your result rows should look like:

```text
 customers_cid |             company_name             | orders_cid 
---------------+--------------------------------------+------------
 PARIS         | Paris spécialités                    | 
 FISSA         | FISSA Fabrica Inter. Salchichas S.A. | 

```

### Hint

One way of doing this is to use a left join, also known as a left outer join. For customers who have never placed an order, the columns from the orders table will be null.
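The left join pattern can be tried on three made-up customers. The sketch assumes Python's built-in `sqlite3` module, and it also shows a `not exists` subquery as one possible alternative that is not used in the original notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id text primary key);
    create table orders (order_id integer primary key, customer_id text);
    insert into customers values ('A'), ('B'), ('C');
    insert into orders values (1, 'A'), (2, 'A'), (3, 'B');
""")

# Left join: customers without a matching order get null in the orders columns.
via_left_join = conn.execute("""
    select c.customer_id
    from customers c
    left join orders o on c.customer_id = o.customer_id
    where o.customer_id is null
    order by c.customer_id
""").fetchall()

# Alternative: keep the customers for which no order row exists.
via_not_exists = conn.execute("""
    select c.customer_id
    from customers c
    where not exists (select 1 from orders o where o.customer_id = c.customer_id)
    order by c.customer_id
""").fetchall()

print(via_left_join)
print(via_not_exists)
```

Both forms return only customer C here. The left join version matches the expected output of the question because it can also display the empty orders column.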

In [41]:
%%sql
select c.customer_id as customers_cid, c.company_name, o.customer_id as orders_cid
from customers c
left join orders o
on c.customer_id=o.customer_id
where o.customer_id is null;


 * postgresql://pliu:***@127.0.0.1:5432/northwind
2 rows affected.


customers_cid,company_name,orders_cid
PARIS,Paris spécialités,
FISSA,FISSA Fabrica Inter. Salchichas S.A.,


## 3.11 Question 11 Customers with no orders for EmployeeID 4

One employee (Margaret Peacock, EmployeeID 4) has placed the most orders. However, there are some
customers who've never placed an order with her. 
Show only those customers who have never placed an order with her.



Your result should contain 16 rows in total and look like this:

```text
 customers_cid |             company_name             | orders_cid 
---------------+--------------------------------------+------------
 CONSH         | Consolidated Holdings                | 
 DUMON         | Du monde entier                      | 
 FISSA         | FISSA Fabrica Inter. Salchichas S.A. | 
 FRANR         | France restauration                  | 
 GROSR         | GROSELLA-Restaurante                 | 
 LAUGB         | Laughing Bacchus Wine Cellars        | 
 LAZYK         | Lazy K Kountry Store                 | 
 NORTS         | North/South                          | 
 PARIS         | Paris spécialités                    | 
 PERIC         | Pericles Comidas clásicas            | 
 PRINI         | Princesa Isabel Vinhos               | 
 SANTG         | Santé Gourmet                        | 
 SEVES         | Seven Seas Imports                   | 
 SPECD         | Spécialités du monde                 | 
 THEBI         | The Big Cheese                       | 
 VINET         | Vins et alcools Chevalier            | 


```

### Hint

There are at least two good solutions:
- Solution 1: reuse the previous query, but filter the orders table first and keep only the rows where employee_id = 4.
- Solution 2: change the left join condition by adding a second condition, employee_id = 4.

```sql 

select c.customer_id as customers_cid, c.company_name, o.customer_id as orders_cid
from customers c
left join (select * from orders where employee_id=4 ) o
on c.customer_id=o.customer_id
where o.customer_id is null
order by c.customer_id;
```
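Both solutions apply the employee filter before the null check. Putting `employee_id = 4` in the Where clause instead would remove exactly the rows we are looking for. The sketch below illustrates the difference on made-up data, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id text primary key);
    create table orders (order_id integer primary key, customer_id text,
                         employee_id integer);
    insert into customers values ('A'), ('B'), ('C');
    -- A ordered with employee 4, B only with employee 3, C never ordered.
    insert into orders values (1, 'A', 4), (2, 'B', 3);
""")

# Filter inside the join condition: unmatched customers survive with nulls.
in_join_condition = conn.execute("""
    select c.customer_id
    from customers c
    left join orders o
    on c.customer_id = o.customer_id and o.employee_id = 4
    where o.customer_id is null
    order by c.customer_id
""").fetchall()

# Filter in the where clause: it runs after the join, so a row cannot have
# employee_id = 4 and a null customer_id at the same time.
in_where_clause = conn.execute("""
    select c.customer_id
    from customers c
    left join orders o on c.customer_id = o.customer_id
    where o.employee_id = 4 and o.customer_id is null
    order by c.customer_id
""").fetchall()

print(in_join_condition)
print(in_where_clause)
```

The first query returns customers B and C, while the second returns no rows at all.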

In [45]:
%%sql
select c.customer_id as customers_cid, c.company_name, o.customer_id as orders_cid
from customers c
left join orders o
on c.customer_id=o.customer_id and o.employee_id=4
where o.customer_id is null
order by c.customer_id
;

 * postgresql://pliu:***@127.0.0.1:5432/northwind
16 rows affected.


customers_cid,company_name,orders_cid
CONSH,Consolidated Holdings,
DUMON,Du monde entier,
FISSA,FISSA Fabrica Inter. Salchichas S.A.,
FRANR,France restauration,
GROSR,GROSELLA-Restaurante,
LAUGB,Laughing Bacchus Wine Cellars,
LAZYK,Lazy K Kountry Store,
NORTS,North/South,
PARIS,Paris spécialités,
PERIC,Pericles Comidas clásicas,
