
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

## Create Tables
Run the cell below to create tables for the questions in this notebook. 

In [0]:
%run ../Utilities/05-CreateTables

## Question 1: Count Function
### Summary
Count the records for each unique value in the **`TrueFalse`** field in the table **`revenue1`**.

### Steps to complete
Write a SQL query that achieves the following: 
* Computes the number of **`true`** and **`false`** records in the **`TrueFalse`** field from the table **`revenue1`**
* Renames the new column to **`count`**
* Stores the records in a temporary view named **`q1Results`** with the following schema:

| column | type |
|--------|--------|
| TrueFalse | boolean |
| count | long |

A properly completed solution should produce a view similar to this sample output:

|TrueFalse|         count |
|---------|------------------|
|     true|        4956|
|    false|        5044|

In [0]:
%sql
CREATE 
OR REPLACE TEMPORARY VIEW q1Results AS
  SELECT count(*) AS count FROM revenue1
  GROUP BY TrueFalse;

In [0]:
%sql
SELECT
 * 
FROM
 q1Results;

count
4996
5004
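
The recorded result above contains only the **`count`** column because the query selects `count(*)` alone, so the two rows are not labeled with their **`TrueFalse`** value. A minimal sketch of a variant that also returns the grouping column, so the view has the two columns shown in the sample output. It assumes nothing beyond the **`revenue1`** table and **`TrueFalse`** field named in the question:

```sql
-- Sketch: keep the grouping column in the SELECT list so each count is labeled.
CREATE OR REPLACE TEMPORARY VIEW q1Results AS
  SELECT TrueFalse, count(*) AS count
  FROM revenue1
  GROUP BY TrueFalse;
```

Re-running `SELECT * FROM q1Results` after this cell should then show one row per **`TrueFalse`** value.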


## Question 2: Max Function
### Summary
Compute the maximum value from the **`Amount`** field for each unique value in the **`TrueFalse`** field in the table **`revenue2`**.

### Steps to complete
Write a SQL query that achieves the following:
* Computes the maximum **`Amount`** for **`true`** records and **`false`** records from the **`TrueFalse`** field in the table **`revenue2`**
* Renames the new column to **`MaxAmount`**
* Stores the records in a temporary view named **`q2Results`** with the following schema:
   
| column | type |
|--------|--------|
| TrueFalse | boolean |
| MaxAmount | double |

A properly completed solution should produce a view similar to this sample output:

|TrueFalse|         MaxAmount|
|---------|------------------|
|     true|        2243937.93|
|    false|2559457.1799999997|

In [0]:
%sql
CREATE OR REPLACE TEMPORARY VIEW q2Results AS
  SELECT TrueFalse, max(Amount) AS MaxAmount 
  FROM revenue2
  GROUP BY TrueFalse;

In [0]:
%sql
SELECT
 * 
FROM
 q2Results;

TrueFalse,MaxAmount
True,9996.03
False,9998.57


## Question 3: Avg Function
### Summary
Compute the average of the **`Amount`** field for each unique value in the **`TrueFalse`** field in the table **`revenue3`**.

### Steps to complete

Write a SQL query that achieves the following:
* Computes the average of **`Amount`** for **`true`** records and **`false`** records from the **`TrueFalse`** field in the table **`revenue3`**
* Renames the new column to **`AvgAmount`**
* Stores the records in a temporary view named **`q3Results`** with the following schema:

| column | type |
|--------|--------|
| TrueFalse | boolean |
| AvgAmount | double |

A properly completed solution should produce a view similar to this sample output:

|TrueFalse|         AvgAmount|
|---------|------------------|
|     true|        2243937.93|
|    false|2559457.1799999997|

In [0]:
%sql
CREATE OR REPLACE TEMPORARY VIEW q3Results AS
  SELECT TrueFalse, avg(Amount) AS AvgAmount 
  FROM revenue3
  GROUP BY TrueFalse;

In [0]:
%sql
SELECT
 * 
FROM 
  q3Results;

TrueFalse,AvgAmount
True,4979.792502001609
False,5027.741139088747


## Question 4: Pivot
### Summary
Calculate the total **`Amount`** for **`YesNo`** values of **true** and **false** in 2002 and 2003 from the table **`revenue4`**.
    
### Steps to complete
Write a SQL query that achieves the following:
* Casts the **`UTCTime`** field to Timestamp and names the new column **`Date`**
* Extracts a **`Year`** column from the **`Date`** column
* Filters for years greater than 2001 and less than or equal to 2003
* Groups by **`YesNo`** and creates a pivot table to get the total **`Amount`** for each year and each value in **`YesNo`**
* Represents each total amount as a float rounded to two decimal places
* Stores the results in a temporary view named **`q4Results`**
   
A properly completed solution should produce a view similar to this sample output:

|YesNo|    2002|    2003|
|-----|--------|--------|
| true| 61632.3| 8108.47|
|false|44699.99|35062.22|

In [0]:
%sql
SELECT year(CAST(UTCTime AS timestamp)) as Year,
  YesNo,
  Amount 
FROM revenue4

Year,YesNo,Amount
2019,False,1909.84
2012,True,3330.26
2010,False,8466.72
2012,True,3008.16
2011,False,4643.41
2020,True,7469.96
2023,False,6119.32
2018,True,7401.74
2013,False,4841.03
2014,False,6912.95


In [0]:
%sql
CREATE 
OR REPLACE TEMPORARY VIEW q4Results AS
  SELECT * 
  FROM (SELECT Year, YesNo, Amount
        FROM (SELECT year(CAST(UTCTime AS timestamp)) as Year,
                     YesNo,
                     Amount 
              FROM revenue4) 
        WHERE Year > 2001 AND Year <= 2003)
 PIVOT ( round( sum(Amount), 2) AS total FOR Year in (2002, 2003) );

In [0]:
%sql
SELECT
 * 
FROM
 q4Results;

YesNo,2002,2003
True,1037216.95,1048665.41
False,1175143.21,1023798.38
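
To make it clearer what the `PIVOT` clause computes, the same totals can be written as conditional sums, one per year. This is only a sketch for comparison, not a replacement for the required solution. It reuses the **`revenue4`** columns from the cells above and assumes the cast of **`UTCTime`** behaves as it does there:

```sql
-- Sketch: PIVOT on Year rewritten as one conditional SUM per pivot value.
SELECT
  YesNo,
  round(sum(CASE WHEN Year = 2002 THEN Amount END), 2) AS `2002`,
  round(sum(CASE WHEN Year = 2003 THEN Amount END), 2) AS `2003`
FROM (SELECT year(CAST(UTCTime AS timestamp)) AS Year,
             YesNo,
             Amount
      FROM revenue4)
WHERE Year > 2001 AND Year <= 2003
GROUP BY YesNo;
```

Each `CASE` expression returns null for rows from the other year, and `sum` skips nulls, so every output column totals only its own year. This is the grouping that `PIVOT` performs implicitly on the columns not named in the aggregate or the `FOR` clause.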


## Question 5: Null Values and Aggregates
### Summary
Compute sums of **`amount`** grouped by **`aisle`** after dropping rows with null values from the **`products`** table.

### Steps to complete

Write a SQL query that achieves the following:
* Drops any rows that contain null values in either the **`itemId`** or the **`aisle`** column
* Aggregates sums of the **`amount`** column grouped by **`aisle`**
* Stores the results in a temporary view named **`q5Results`**

In [0]:
%sql
CREATE 
OR REPLACE TEMPORARY VIEW q5Results AS
  SELECT aisle, sum(amount) 
  FROM products 
  WHERE (itemId IS NOT NULL AND aisle IS NOT NULL) 
  GROUP BY aisle;

In [0]:
%sql
SELECT
 * 
FROM 
  q5Results;

aisle,sum(amount)
3,63
5,14
7,107
12,56
2,126
8,8
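
As a sanity check on the null filter, it can help to see how many rows it excludes before trusting the sums. A small sketch against the same **`products`** table follows. The output names `totalRows` and `droppedRows` are illustrative and not part of the exercise:

```sql
-- Sketch: count the rows that the filter in q5Results leaves out.
SELECT
  count(*) AS totalRows,
  sum(CASE WHEN itemId IS NULL OR aisle IS NULL THEN 1 ELSE 0 END) AS droppedRows
FROM products;
```

If `droppedRows` is zero, the `WHERE` clause has no effect on this data set and the grouped sums cover every row.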


## Question 6: Generate Subtotals By Rollup
### Summary
Compute averages of **`revenue`** from the **`sales`** table grouped by **`itemName`** and **`month`**, so that the results include the average for each individual month as well as a subtotal across all months for each item. 

### Steps to complete

Write a SQL query that achieves the following:
* Coalesces null values in the **`month`** column generated by the `ROLLUP` clause
* Stores the results in a temporary view named **`q6Results`**

Your results should look something like this: 

| itemName| month | avgRevenue |
| --------| ----- | ---------- |
| Anim | 10 | 4794.16 |
| Anim | 7 | 5551.31 |
| Anim | All months | 5046.54 |
| Aute | 4 | 4069.51 |
| Aute | 7 | 3479.31 |
| Aute | 8 | 6339.28 |
| Aute | All months |  4489.41 |
| ... | ... | ... | 

In [0]:
%sql
CREATE 
OR REPLACE TEMPORARY VIEW q6Results AS
  SELECT 
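    -- ROLLUP marks subtotal rows with nulls; replace them with readable labels
    COALESCE(itemName, 'All items') AS itemName,
    -- cast the month number to a string so both COALESCE arguments share a type
    COALESCE(CAST(month(date) AS STRING), 'All months') AS month,
    ROUND(AVG(revenue), 2) AS avgRevenue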
  FROM sales
  GROUP BY ROLLUP (itemName, month(date))
  ORDER BY itemName, month;

In [0]:
%sql
SELECT 
  * 
FROM 
  q6Results;

itemName,month,avgRevenue
Ad,4,6175.87
Ad,7,9149.41
Ad,8,4635.61
Ad,All months,6534.19
Adipiscing,10,8859.71
Adipiscing,11,1613.25
Adipiscing,All months,5236.48
Aliqua,10,3220.9
Aliqua,11,2610.84
Aliqua,7,562.74
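
`COALESCE` cannot tell a null produced by `ROLLUP` apart from a null that was already in the data. A hedged alternative is the `grouping()` function, which returns 1 only on the rows where `ROLLUP` aggregated that column away. The sketch below assumes the same **`sales`** columns used above. The alias `saleMonth` is illustrative:

```sql
-- Sketch: label subtotal rows with grouping() instead of COALESCE.
SELECT
  CASE WHEN grouping(itemName) = 1 THEN 'All items' ELSE itemName END AS itemName,
  CASE WHEN grouping(saleMonth) = 1 THEN 'All months'
       ELSE CAST(saleMonth AS STRING) END AS month,
  ROUND(AVG(revenue), 2) AS avgRevenue
FROM (SELECT itemName, month(date) AS saleMonth, revenue FROM sales)
GROUP BY ROLLUP (itemName, saleMonth)
ORDER BY itemName, month;
```

Because **`month`** is a string in both versions, the rows sort lexicographically (`10`, `11`, `7`), as in the recorded output above.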


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>