## Lecture 5 Sorting and limiting

```SQL
SELECT [select_list]
  FROM database_name.table_name
  WHERE [boolean_expressions_list] -- to filter individual rows
  GROUP BY [group_by_list]
  HAVING [boolean_expressions_list] -- to filter groups
  ORDER BY [order_by_list]
  LIMIT [...]
```

Order in which the SQL engine executes the clauses:
`FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT`

A common variation is that some SQL engines execute the `SELECT` clause a little earlier in order to identify **column aliases** defined in the select list.

Clauses previously covered:
- `SELECT`
- `FROM`
- `WHERE`
- `GROUP BY`
- `HAVING`

### `ORDER BY` clause

#### Sorting in SQL
```SQL
... ORDER BY [...]
```
    [...]: order by list

Examples
```SQL
-- sort rows by list_price from lowest to highest, i.e. in ascending order
SELECT * FROM games ORDER BY list_price;
SELECT * FROM games ORDER BY list_price ASC;

-- descending order using DESC
SELECT * FROM games ORDER BY list_price DESC;

-- first sort by max_players in descending order
-- then sort by list_price in ascending order
SELECT name, max_players, list_price FROM games
  ORDER BY max_players DESC, list_price ASC;
```
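The multi-column sort above can be tried out with a minimal sketch like the following. It uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption (the lecture targets Hive, Impala, MySQL and PostgreSQL), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (an assumption, not the
# engine used in the lecture).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (name TEXT, max_players INTEGER, list_price REAL)")
# Hypothetical sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [("Alpha", 4, 19.99), ("Beta", 6, 24.99), ("Gamma", 6, 9.99), ("Delta", 4, 5.99)],
)

# Sort by max_players (highest first); within equal max_players,
# sort by list_price (lowest first).
rows = conn.execute(
    "SELECT name, max_players, list_price FROM games "
    "ORDER BY max_players DESC, list_price ASC"
).fetchall()
sorted_names = [name for name, _, _ in rows]
print(sorted_names)  # ['Gamma', 'Beta', 'Delta', 'Alpha']
```

The second sort key only matters among rows that tie on the first key: here it decides the order of the two 6-player games and of the two 4-player games.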

**QUIZ**: Which queries return the same result as
```SQL
SELECT * FROM crayons ORDER BY red;
```

Wrong answers are omitted.

```SQL
✅ SELECT * FROM crayons ORDER BY red ASC;
✅ SELECT * FROM crayons ORDER BY -red DESC;
```
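The quiz answers can be checked with a small sketch, again assuming SQLite (via Python's `sqlite3`) as a stand-in engine and a hypothetical `crayons` table. The sample `red` values are distinct and non-`NULL`; with ties or `NULL` values the three queries are not guaranteed to agree, because negating and reversing the sort can also flip where `NULL` values land.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crayons (color TEXT, red INTEGER)")
# Hypothetical rows with distinct, non-NULL red values.
conn.executemany(
    "INSERT INTO crayons VALUES (?, ?)",
    [("Sky", 118), ("Brick", 203), ("Pine", 21)],
)

def colors(order_by):
    """Return the color names in the order produced by the given ORDER BY list."""
    query = f"SELECT color FROM crayons ORDER BY {order_by}"
    return [row[0] for row in conn.execute(query)]

plain = colors("red")
explicit_asc = colors("red ASC")
negated_desc = colors("-red DESC")
print(plain, explicit_asc, negated_desc)
```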

#### Ordering expressions

The order by list can contain a single column reference or expression, or two or more of them separated by commas.

```MySQL
-- sort by saturation
SELECT * FROM crayons
  ORDER BY (greatest(red, green, blue) - least(red, green, blue)) /
           greatest(red, green, blue) DESC;
                                               
-- use column alias
SELECT *, (greatest(red, green, blue) - least(red, green, blue)) /
          greatest(red, green, blue) AS saturation 
  FROM crayons
  ORDER BY saturation DESC;                                        
```
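A runnable variant of the saturation sort, sketched under some assumptions: SQLite (via Python's `sqlite3`) is used as a stand-in engine, and since SQLite has no `greatest`/`least`, its multi-argument `max`/`min` scalar functions are swapped in. The `* 1.0` forces floating-point division, because SQLite divides integers as integers. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crayons (color TEXT, red INTEGER, green INTEGER, blue INTEGER)")
# Hypothetical rows; pure black is left out because it would divide by zero.
conn.executemany(
    "INSERT INTO crayons VALUES (?, ?, ?, ?)",
    [("Gray", 128, 128, 128), ("Red", 255, 0, 0), ("Pink", 255, 192, 203)],
)

# max(a, b, c) / min(a, b, c) with several arguments are SQLite's
# counterparts of greatest / least (swapped in for this sketch).
query = """
SELECT color,
       (max(red, green, blue) - min(red, green, blue)) * 1.0 /
       max(red, green, blue) AS saturation
  FROM crayons
  ORDER BY saturation DESC
"""
by_saturation = [color for color, _ in conn.execute(query)]
print(by_saturation)  # ['Red', 'Pink', 'Gray']
```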

#### Missing values in ordered results

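Different SQL engines sort `NULL` values in different ways.

- Impala and PostgreSQL treat `NULL` as if it were larger than every other value (like positive infinity)
- Hive and MySQL treat `NULL` as if it were smaller than every other value (like negative infinity)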

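Specify where `NULL` values go when sorting, using a boolean trick:
```SQL
-- NULLs last (false sorts before true)
SELECT shop, game, price FROM inventory ORDER BY price IS NULL ASC, price;
-- NULLs first
SELECT shop, game, price FROM inventory ORDER BY price IS NULL DESC, price;
```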
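A minimal sketch of the boolean trick, assuming SQLite (via Python's `sqlite3`) as a stand-in engine and a hypothetical `inventory` table. `price IS NULL` evaluates to false (0) for ordinary prices and true (1) for missing ones, so sorting on it in ascending order pushes `NULL` values to the end, and descending order pulls them to the front.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (shop TEXT, game TEXT, price REAL)")
# Hypothetical rows; one price is missing (NULL).
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("A", "Chess", 12.0), ("A", "Go", None), ("B", "Risk", 8.0)],
)

def prices(order_by):
    """Return the price column in the order produced by the given ORDER BY list."""
    query = f"SELECT price FROM inventory ORDER BY {order_by}"
    return [row[0] for row in conn.execute(query)]

nulls_last = prices("price IS NULL ASC, price")
nulls_first = prices("price IS NULL DESC, price")
print(nulls_last)   # [8.0, 12.0, None]
print(nulls_first)  # [None, 8.0, 12.0]
```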

Specify how to deal with `NULL` when sorting (newer Hive, Impala, PostgreSQL):
```SQL
SELECT shop, game, price FROM inventory ORDER BY price ASC NULLS FIRST;
SELECT shop, game, price FROM inventory ORDER BY price ASC NULLS LAST;
```

#### Using `ORDER BY` with Hive and Impala

With Hive, the columns used in the `ORDER BY` clause **must** be included in the result set.


```SQL
-- ❌
-- The following two are not supported in Hive
SELECT shop, game FROM inventory ORDER BY price;
SELECT shop, game FROM inventory ORDER BY (qty * price);

-- ✅
SELECT shop, game, qty * price AS qty_times_price FROM inventory ORDER BY qty_times_price;
-- ✅
SELECT color, red, green, blue FROM wax.crayons ORDER BY red + green + blue;
```

With Impala, we can use positional references in the order by list: an integer such as `2` refers to the second column of the select list.
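As a rough illustration of positional references, the sketch below orders by the third column of the select list. It runs on SQLite (via Python's `sqlite3`), which also accepts an integer position in `ORDER BY`; using SQLite instead of Impala is an assumption, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (shop TEXT, game TEXT, price REAL)")
# Hypothetical rows, invented for illustration only.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("A", "Chess", 12.0), ("A", "Go", 30.0), ("B", "Risk", 8.0)],
)

# ORDER BY 3 refers to the third column of the select list, i.e. price.
by_position = [row[1] for row in conn.execute(
    "SELECT shop, game, price FROM inventory ORDER BY 3 DESC")]
by_name = [row[1] for row in conn.execute(
    "SELECT shop, game, price FROM inventory ORDER BY price DESC")]
print(by_position)  # ['Go', 'Chess', 'Risk']
```

Positional references are short to type but fragile: if the select list is reordered, the query silently sorts by a different column.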

Sorting is a notoriously expensive operation in distributed computing. It is better to filter the data before sorting it.

### `LIMIT` clause

The `LIMIT` clause is executed last.

```SQL
SELECT * FROM flights LIMIT 5;
```


#### When to use `LIMIT` clause

Use `LIMIT` to get a sense of what a query returns, or to avoid returning too many rows.

Note that the rows `LIMIT` returns are picked arbitrarily; they are **not** a random sample of the rows.


QUIZ: select the appropriate uses for the `LIMIT` clause.

✅ Reduce the computer resources used by the SQL engine. <br>
✅ Return a few rows from a table to inspect some of the values. <br>
✅ Protect against returning an unexpectedly large number of rows. <br>
❌ Filter individual rows based on conditions. <br>
❌ Randomly sample from a large table.


#### Using `LIMIT` with `ORDER BY`

The `LIMIT` clause is especially useful when used with the `ORDER BY` clause.

**TOP-N** query

Who are the one hundred highest-spending customers? <br>
Who are the ten lowest-performing salespeople?

**Note**: Watch out for ties. If several rows share the value at the cutoff, `LIMIT` keeps an arbitrary subset of them.
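The effect of ties on a top-N query can be sketched as follows, assuming SQLite (via Python's `sqlite3`) as a stand-in engine and a hypothetical `spending` table in which two customers tie for second place. The second query shows one possible way to keep every tied row: look up the N-th highest value in a subquery and filter on it. This is an illustration, not a technique prescribed by the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (name TEXT, total INTEGER)")
# Hypothetical rows; Bob and Cy tie for second place.
conn.executemany(
    "INSERT INTO spending VALUES (?, ?)",
    [("Ann", 500), ("Bob", 300), ("Cy", 300), ("Dee", 100)],
)

# Plain top-2: exactly two rows come back, so one of the tied customers
# is dropped, and which one is not specified.
top2 = [row[0] for row in conn.execute(
    "SELECT name FROM spending ORDER BY total DESC LIMIT 2")]

# Keep ties: find the 2nd-highest total, then return every row at or above it.
with_ties = [row[0] for row in conn.execute(
    "SELECT name FROM spending "
    "WHERE total >= (SELECT total FROM spending ORDER BY total DESC LIMIT 1 OFFSET 1) "
    "ORDER BY total DESC, name")]
print(top2, with_ties)
```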


#### Paging

    limit    offset
    100      0      ->   1   ~ 100
    100      100    ->   101 ~ 200
    100      200    ->   201 ~ 300
    ...

The syntax differs between engines:
```MySQL
-- Impala, PostgreSQL, MySQL
[...] LIMIT limit_num OFFSET offset_num

-- MySQL, Newer Hive
[...] LIMIT offset_num, limit_num
```
---

Note: Without `ORDER BY`, the row order is unpredictable, so rows could be duplicated or missing across pages. Solution: always use `ORDER BY` to arrange rows for paging.
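A minimal paging sketch, assuming SQLite (via Python's `sqlite3`) as a stand-in engine and a hypothetical `flights` table with a unique `id` column. Ordering by a unique column gives every row a fixed position, so consecutive pages neither overlap nor skip rows. The `fetch_page` helper and `PAGE_SIZE` constant are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY)")
# Hypothetical table with seven rows, ids 1..7.
conn.executemany("INSERT INTO flights VALUES (?)", [(i,) for i in range(1, 8)])

PAGE_SIZE = 3  # rows per page (the "limit" column in the table above)

def fetch_page(page_number):
    """Return the ids on the given 0-based page, ordered by the unique id column."""
    offset = page_number * PAGE_SIZE
    query = "SELECT id FROM flights ORDER BY id LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(query, (PAGE_SIZE, offset))]

pages = [fetch_page(n) for n in range(3)]
print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```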