## Exercises - Basic SQL Queries

Here are some of the exercises for which you can write SQL queries to self evaluate.

Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our [10 node state of the art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [1]:
val username = System.getProperty("user.name")

username = itv001477


itv001477

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Basic Transformations").
    master("yarn").
    getOrCreate

username = itv001477
spark = org.apache.spark.sql.SparkSession@55cc6f38


org.apache.spark.sql.SparkSession@55cc6f38

If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

In [None]:
%%sql
CREATE DATABASE itv001477_retail_db

In [3]:
%%sql
USE itv001477_retail_db

Waiting for a Spark session to start...

++
||
++
++



In [None]:
%%sql
CREATE TABLE orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders

In [None]:
%%sql
select * from orders

In [None]:
%%sql
CREATE TABLE customers(
 customer_id int, 
 customer_fname STRING, 
 customer_lname STRING, 
 customer_email STRING, 
 customer_password STRING, 
 customer_street STRING, 
 customer_city STRING,
 customer_state STRING,
 customer_zipcode STRING 
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/customers' INTO TABLE customers

In [None]:
%%sql
select * from customers

In [None]:
%%sql
CREATE TABLE order_items(
 order_item_id int,
 order_item_order_id int, 
 order_item_product_id int,
 order_item_quantity int,
 order_item_subtotal double, 
 order_item_product_price double
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/order_items' INTO TABLE order_items

In [None]:
%%sql
select * from order_items

In [None]:
%%sql
CREATE TABLE products(
    product_id int,
 product_category_id int, 
 product_name STRING,
 product_description STRING,
 product_price double, 
 product_image STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/products' INTO TABLE products

In [None]:
%%sql
select * from products

In [None]:
%%sql
CREATE TABLE categories(
    category_id int,
 category_department_id int, 
 category_name STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/categories' INTO TABLE categories 

In [None]:
%%sql
CREATE TABLE departments(
     department_id int,
 department_name string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

In [None]:
%%sql
LOAD DATA LOCAL INPATH '/data/retail_db/departments' INTO TABLE departments

### Exercise 1 - Customer order count

Get order count per customer for the month of 2014 January.
* Tables - orders and customers
* Data should be sorted in descending order by count and ascending order by customer id.
* Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.

In [5]:
%%sql
SELECT c.customer_id,c.customer_fname,c.customer_lname,count(o.order_id) as customer_order_count from orders o
JOIN customers c on c.customer_id=o.order_customer_id
WHERE o.order_date LIKE '2014-01%'
GROUP BY c.customer_id,c.customer_fname,c.customer_lname
ORDER BY c.customer_id,customer_order_count desc


| ...


+-----------+--------------+--------------+--------------------+
|customer_id|customer_fname|customer_lname|customer_order_count|
+-----------+--------------+--------------+--------------------+
|          7|       Melissa|        Wilcox|                   4|
|          8|         Megan|         Smith|                   2|
|         13|          Mary|       Baldwin|                   1|
|         14|     Katherine|         Smith|                   1|
|         15|          Jane|          Luna|                   1|
|         17|          Mary|      Robinson|                   1|
|         18|        Robert|         Smith|                   2|
|         24|          Mary|         Smith|                   2|
|         26|        Johnny|          Hood|                   2|
|         27|          Mary|       Vincent|                   1|
+-----------+--------------+--------------+--------------------+
only showing top 10 rows



### Exercise 2 - Dormant Customers

Get the customer details who have not placed any order for the month of 2014 January.
* Tables - orders and customers
* Data should be sorted in ascending order by customer_id
* Output should contain all the fields from customers

In [4]:
%%sql

SELECT c.* from customers c
LEFT OUTER JOIN orders o on c.customer_id= o.order_customer_id
WHERE o.order_date LIKE '2014-01%' AND o.order_id IS NULL
ORDER BY c.customer_id 

+-----------+--------------+--------------+--------------+-----------------+---------------+-------------+--------------+----------------+
|customer_id|customer_fname|customer_lname|customer_email|customer_password|customer_street|customer_city|customer_state|customer_zipcode|
+-----------+--------------+--------------+--------------+-----------------+---------------+-------------+--------------+----------------+
+-----------+--------------+--------------+--------------+-----------------+---------------+-------------+--------------+----------------+



### Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for the month of 2014 January
* Tables - orders, order_items and customers
* Data should be sorted in descending order by revenue and then ascending order by customer_id
* Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.
* If there are no orders placed by customer, then the corresponding revenue for a give customer should be 0.
* Consider only COMPLETE and CLOSED orders

In [6]:
%%sql

SELECT c.customer_id,c.customer_fname,c.customer_lname,COALESCE(round(sum(oi.order_item_subtotal),2),0) as customer_revenue
FROM customers c
LEFT OUTER JOIN orders o on o.order_customer_id=c.customer_id
JOIN order_items oi on oi.order_item_order_id=o.order_id
WHERE o.order_date LIKE '2014-01%' AND o.order_status IN ('COMPLETE','CLOSED')
GROUP BY c.customer_id,c.customer_fname,c.customer_lname
ORDER BY c.customer_id,customer_revenue desc

|         43|          Mary|       Herring|   ...


+-----------+--------------+--------------+----------------+
|customer_id|customer_fname|customer_lname|customer_revenue|
+-----------+--------------+--------------+----------------+
|          8|         Megan|         Smith|          353.93|
|         14|     Katherine|         Smith|          704.93|
|         17|          Mary|      Robinson|          569.95|
|         18|        Robert|         Smith|         1309.85|
|         26|        Johnny|          Hood|          699.96|
|         28|       Timothy|         Smith|           59.99|
|         38|          Mary|         Smith|         1209.83|
|         42|         Ethan|         Smith|          559.94|
|         43|          Mary|       Herring|          119.98|
|         51|       Jessica|         Smith|           59.99|
+-----------+--------------+--------------+----------------+
only showing top 10 rows



### Exercise 4 - Revenue Per Category

Get the revenue generated for each category for the month of 2014 January
* Tables - orders, order_items, products and categories
* Data should be sorted in ascending order by category_id.
* Output should contain all the fields from category along with the revenue as category_revenue.
* Consider only COMPLETE and CLOSED orders

In [7]:
%%sql
SELECT c.*,round(sum(oi.order_item_subtotal),2) as category_revenue from categories c
JOIN products p on p.product_category_id=c.category_id
JOIN order_items oi on oi.order_item_product_id=p.product_id
JOIN orders o on o.order_id=oi.order_item_order_id
WHERE order_date LIKE '2014-01%' AND o.order_status IN ('COMPLETE','CLOSED')
GROUP BY c.category_id,c.category_department_id,c.category_name
ORDER BY c.category_id

|          9|                     3|   Cardio Equip...


+-----------+----------------------+-------------------+----------------+
|category_id|category_department_id|      category_name|category_revenue|
+-----------+----------------------+-------------------+----------------+
|          2|                     2|             Soccer|         1094.88|
|          3|                     2|Baseball & Softball|         3214.41|
|          4|                     2|         Basketball|         1299.98|
|          5|                     2|           Lacrosse|         1299.69|
|          6|                     2|   Tennis & Racquet|         1124.75|
|          7|                     2|             Hockey|          1433.0|
|          9|                     3|   Cardio Equipment|       133156.77|
|         10|                     3|  Strength Training|         3388.96|
|         11|                     3|Fitness Accessories|         1509.73|
|         12|                     3|       Boxing & MMA|         3998.46|
+-----------+----------------------+--

### Exercise 5 - Product Count Per Department

Get the products for each department.
* Tables - departments, categories, products
* Data should be sorted in ascending order by department_id
* Output should contain all the fields from department and the product count as product_count

In [8]:
%%sql
SELECT d.*,count(p.product_id) AS product_count FROM departments d
JOIN categories c on c.category_department_id=d.department_id
JOIN products p on p.product_category_id=c.category_id
GROUP BY d.department_id,d.department_name
ORDER BY d.department_id

+-------------+---------------+-------------+
|department_id|department_name|product_count|
+-------------+---------------+-------------+
|            2|        Fitness|          168|
|            3|       Footwear|          168|
|            4|        Apparel|          140|
|            5|           Golf|          120|
|            6|       Outdoors|          336|
|            7|       Fan Shop|          149|
+-------------+---------------+-------------+

