# Common summary statistics

- `MIN()` for the minimum value of a column
- `MAX()` for the maximum value of a column
- `AVG()` for the mean or average value of a column
- `SUM()` for the sum of all values of a column
- These are aggregate functions, so any non-aggregated column in the `SELECT` list must also appear in the `GROUP BY` clause
- To filter on the result of an aggregate function, use `HAVING` instead of `WHERE`; `WHERE` filters individual rows before grouping and cannot reference an aggregate
- Example:

```
SELECT non_agg_col, AVG(col) AS avg_col
FROM table_name
GROUP BY non_agg_col
HAVING MAX(col) > 100
```
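
To make the difference between `WHERE` and `HAVING` concrete, here is a minimal sketch that runs on its own. The `orders` data, the `region` and `amount` columns, and the thresholds are hypothetical and built inline with a `VALUES` list; the syntax assumes SQL Server.

```
-- Hypothetical sample data, defined inline so the query is self-contained
WITH orders AS (
    SELECT * FROM (VALUES
        ('north', 50),
        ('north', 150),
        ('south', 20),
        ('south', 80),
        ('east', 200)
    ) AS v(region, amount)
)
SELECT region,
       MIN(amount) AS min_amount,
       MAX(amount) AS max_amount,
       SUM(amount) AS total_amount
FROM orders
WHERE amount > 20          -- row filter: applied before grouping
GROUP BY region
HAVING MAX(amount) > 100;  -- group filter: applied after aggregation
```

With this data, the `WHERE` clause drops the single row with `amount = 20`, and the `HAVING` clause then drops the whole `south` group because its remaining maximum (80) is not above 100. Only `north` and `east` are returned.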

# Detecting missing values

- To determine if a column contains a `NULL` value, use `IS NULL` and `IS NOT NULL`
- Example: `SELECT col FROM table_name WHERE col2 IS NOT NULL`
- A blank (empty string) is not the same as `NULL`
    - An empty string `''` can be used to identify blanks: `WHERE col = ''`
    - A common way to filter out blanks is `WHERE LEN(col) > 0`; this also excludes `NULL` values, because `LEN(NULL)` is `NULL`
- Substituting `NULL`:
    - With `ISNULL()` : `SELECT ISNULL(col, 'Missing') AS new_col FROM table_name`
    - With `COALESCE()` : `SELECT COALESCE(col1, col2, 'Missing') AS new_col FROM table_name`
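
The sketch below puts the points above side by side: it tells `NULL` apart from a blank, then substitutes a default with `ISNULL()` and `COALESCE()`. The `contacts` data and its `id`, `email`, and `phone` columns are hypothetical, and the syntax assumes SQL Server (`ISNULL()` is not available in every database).

```
-- Hypothetical sample data: one present value, one blank, two NULLs
WITH contacts AS (
    SELECT * FROM (VALUES
        (1, 'a@example.com', '555-0100'),
        (2, '',              '555-0101'),
        (3, NULL,            '555-0102'),
        (4, NULL,            NULL)
    ) AS v(id, email, phone)
)
SELECT id,
       CASE
           WHEN email IS NULL THEN 'null'
           WHEN email = ''    THEN 'blank'
           ELSE 'present'
       END AS email_status,
       ISNULL(email, 'Missing')          AS email_or_missing,  -- one fallback value
       COALESCE(email, phone, 'Missing') AS first_available    -- first non-NULL argument
FROM contacts;
```

Row 2 keeps its empty string in both substituted columns, since a blank is not `NULL`. Row 3 falls back to `phone` under `COALESCE()`, and row 4 falls back to `'Missing'`. If blanks should be treated as missing too, one option is to wrap the column as `NULLIF(email, '')` before substituting.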


# Binning data with `CASE`

- `CASE` creates a custom column based on existing data
- The new column can be used to bin values, for example for a distribution or a bar plot
```
SELECT col1,
CASE
    WHEN col2 IN ('val1', 'val2') THEN 'X'
    WHEN col2 IN ('val3', 'val4') THEN 'Y'
    ELSE 'Others'
END AS new_bin_col
FROM table_name
```
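
The same pattern works for numeric ranges. As a rough sketch of binning for a distribution, the query below assigns each row to a bin and then counts rows per bin. The `shipments` data, the `weight` column, and the bin boundaries are hypothetical; the syntax assumes SQL Server.

```
-- Hypothetical sample data, defined inline so the query is self-contained
WITH shipments AS (
    SELECT * FROM (VALUES (5), (12), (48), (75), (130), (260)) AS v(weight)
)
SELECT weight_bin, COUNT(*) AS n
FROM (
    -- WHEN branches are checked in order, so each row lands in the first matching bin
    SELECT CASE
               WHEN weight < 50  THEN 'light'
               WHEN weight < 100 THEN 'medium'
               ELSE 'heavy'
           END AS weight_bin
    FROM shipments
) AS binned
GROUP BY weight_bin;
```

With this data, the counts are 3 for `light`, 1 for `medium`, and 2 for `heavy`, which is the shape of table a bar plot would take as input.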