# Missing Data

- Causes
    - human error
    - systematic issues
- Types:
    - Missing Completely at Random (MCAR) : 
        - No relationship between the missingness and any other values, observed or unobserved
        - Example: a salesman forgets to record the weight of a vegetable and states only its price on the invoice.
    - Missing at Random (MAR) : 
        - Systematic relationship between missing data and other observed values
        - Present data can explain the missing data
        - Reason is known to us, or can be identified
        - Example: a salesman lists candies with different prices under a single tag, "chocolate products", on the invoice.
    - Missing Not at Random (MNAR) : 
        - Systematic relationship between missing data and other unobserved values
        - Present data cannot explain the missing data
        - Reason is not known to us, or cannot be identified
        - Example: a salesman cannot add a tag to the invoice and mentions only the price, because the new product is not yet listed in the system.
- Identify:
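    - NULL values: `SELECT * FROM table_name WHERE col IS NULL`
    - Empty strings: `SELECT * FROM table_name WHERE col = ''`
    - Label missing values with the first non-NULL argument: `SELECT COALESCE(col1, col2, 'Unknown') FROM table_name`
    - Turn a placeholder such as `'Unknown'` back into NULL: `SELECT NULLIF(col, 'Unknown') FROM table_name`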
- Rectifying missing data
    - Best option: locate and add missing values
        - May not be feasible
        - May not be worthwhile
    - Provide a value (average, median, etc)
    - Exclude records
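
The identification and rectification options above can be sketched as follows. This is a minimal example assuming PostgreSQL; the `invoice_items` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical sample data: weight_kg is sometimes NULL,
-- tag is sometimes NULL or an empty string.
CREATE TEMP TABLE invoice_items (
    id        integer PRIMARY KEY,
    tag       text,
    weight_kg numeric,
    price     numeric
);

INSERT INTO invoice_items VALUES
    (1, 'potato', 2.0,  3.00),
    (2, 'onion',  NULL, 1.50),
    (3, '',       1.0,  4.00),
    (4, NULL,     3.0,  9.00);

-- Identify: count missing values per column.
-- NULLIF(tag, '') turns empty strings into NULL so both cases are counted together.
SELECT
    COUNT(*) FILTER (WHERE weight_kg IS NULL)       AS missing_weight,
    COUNT(*) FILTER (WHERE NULLIF(tag, '') IS NULL) AS missing_tag
FROM invoice_items;

-- Rectify, option 1: provide a value at query time
-- (a label for tag, the column average for weight_kg).
SELECT id,
       COALESCE(NULLIF(tag, ''), 'Unknown') AS tag,
       COALESCE(weight_kg, (SELECT AVG(weight_kg) FROM invoice_items)) AS weight_kg
FROM invoice_items;

-- Rectify, option 2: exclude incomplete records.
SELECT *
FROM invoice_items
WHERE weight_kg IS NOT NULL
  AND NULLIF(tag, '') IS NOT NULL;
```

Filling in an average is only reasonable when the values are missing at random; for MNAR data it can bias the result, so excluding or flagging the rows may be the safer choice.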


# Detecting Duplicates

- Partial duplicates
    - most column values are duplicated, but at least one column differs, so it is ambiguous which row is correct
    - example: every column except one holds the same values across the rows; only that one column differs.
        - Resolve by aggregating the differing column: `AVG()`, `MIN()`, `MAX()`
1. Group your dataset by specific columns and count the rows. Combinations with a count `> 1` are duplicates:
    ```
    SELECT col1, col2, col3, COUNT(*) AS row_count
    FROM some_table
    GROUP BY col1, col2, col3
    HAVING COUNT(*) > 1;
    ```
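2. Number the rows within each combination of columns. If a combination has rows with a row number `> 1`, that combination has duplicates. The window function must be computed in a subquery, because `WHERE` cannot reference a window alias defined in the same `SELECT`:
    ```
    SELECT * FROM (
        SELECT col1, col2, col3,
        ROW_NUMBER() OVER(PARTITION BY col1, col2, col3) AS row_num
        FROM some_table
    ) AS numbered
    WHERE row_num > 1;
    ```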
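
Resolving duplicates that differ in a single column with an aggregate might look like the sketch below. It assumes PostgreSQL; the `product_prices` table and its rows are hypothetical.

```sql
-- Hypothetical sample data: the two 'candy' rows agree on every column except price.
CREATE TEMP TABLE product_prices (
    product text,
    store   text,
    price   numeric
);

INSERT INTO product_prices VALUES
    ('candy', 'north', 1.00),
    ('candy', 'north', 1.20),
    ('soda',  'south', 2.00);

-- Collapse each (product, store) group to one row and
-- resolve the ambiguous price with an aggregate.
SELECT product,
       store,
       AVG(price) AS avg_price,
       MIN(price) AS min_price,
       MAX(price) AS max_price,
       COUNT(*)   AS source_rows
FROM product_prices
GROUP BY product, store;
```

Which aggregate to keep depends on the column: an average suits measurements, while `MIN()` or `MAX()` may fit better for dates or version-like values.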

# Catching Specific Valid Patterns

- With `SIMILAR TO` or `NOT SIMILAR TO`
- With `BETWEEN val1 AND val2`
- With `col ILIKE 'Ab_R%'`
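
A short sketch of the three predicates, assuming PostgreSQL. The `parts` table, its values, and the five-digit pattern are hypothetical and chosen only to show which rows each predicate keeps.

```sql
-- Hypothetical sample data.
CREATE TEMP TABLE parts (
    code     text,
    quantity integer
);

INSERT INTO parts VALUES
    ('AbcR-01', 5),
    ('abXr22',  50),
    ('AbR',     7),
    ('12345',   500);

-- SIMILAR TO: the whole value must match the pattern; here, exactly five digits.
SELECT code FROM parts WHERE code SIMILAR TO '[0-9]{5}';

-- NOT SIMILAR TO: rows that break the expected pattern.
SELECT code FROM parts WHERE code NOT SIMILAR TO '[0-9]{5}';

-- BETWEEN is inclusive on both ends, so quantities 5 and 50 both qualify.
SELECT code, quantity FROM parts WHERE quantity BETWEEN 5 AND 50;

-- ILIKE is case-insensitive: _ matches exactly one character, % matches zero or more.
-- 'AbcR-01' and 'abXr22' match; 'AbR' does not, because _ needs a character between b and R.
SELECT code FROM parts WHERE code ILIKE 'Ab_R%';
```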

# Converting an existing column's type to another type

1. Alter the column's type from the table
2. Since data is already present, use `USING` to convert the existing values to the new type
```
ALTER TABLE table_name
ALTER COLUMN column_name TYPE new_type 
USING column_name::new_type;
```
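
The `USING` expression can do more than a plain cast. The sketch below assumes PostgreSQL and a hypothetical `measurements` table whose `reading` column was stored as text; it cleans the values before converting them.

```sql
-- Hypothetical sample data: numbers stored as text, one of them empty.
CREATE TEMP TABLE measurements (
    id      integer,
    reading text
);

INSERT INTO measurements VALUES (1, '10.5'), (2, ' 7 '), (3, '');

-- A plain reading::numeric would fail on the empty string in row 3,
-- so the USING expression trims whitespace and maps '' to NULL first.
ALTER TABLE measurements
ALTER COLUMN reading TYPE numeric
USING NULLIF(TRIM(reading), '')::numeric;

-- The column is now numeric; row 3 holds NULL instead of an empty string.
SELECT id, reading FROM measurements ORDER BY id;
```

Checking for values that will not cast before running the `ALTER TABLE` avoids a failed statement on a large table.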