### Tutorial 11: Data Imputation in PostgreSQL

In this tutorial, we explore several methods for **data imputation** in PostgreSQL. Data imputation is the process of replacing missing (NULL) values with reasonable substitutes, so that queries and aggregates can work with complete columns.

### 1. Using `COALESCE()` to Replace NULLs
- **Purpose**: Replaces NULL values with a specified default value.
- **Example**: Replace missing `salary` values with the **average salary**.
  ```sql
  SELECT employee_id,
         employee_name,
         COALESCE(salary, (SELECT AVG(salary) FROM employees)) AS imputed_salary
  FROM employees;
  ```
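- **Explanation**:
  - `COALESCE(a, b)` returns its first non-NULL argument, so a non-NULL `salary` is kept as is and a NULL `salary` is replaced by the result of the subquery.
  - `AVG()` ignores NULLs, so the average is computed from the known salaries only.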
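
A table-wide average can be a poor substitute when groups differ a lot. As a minimal sketch, the same `COALESCE()` pattern can use a per-group average through a window function. The table `employees_demo` and its rows are hypothetical sample data; the `department` column is assumed to exist, as in the `CASE WHEN` example in the next section.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMP TABLE employees_demo (
    employee_id   integer PRIMARY KEY,
    employee_name text,
    department    text,
    salary        numeric
);

INSERT INTO employees_demo VALUES
    (1, 'Ana',   'IT',    6000),
    (2, 'Ben',   'IT',    NULL),
    (3, 'Chloe', 'IT',    8000),
    (4, 'Dev',   'Sales', 4000),
    (5, 'Eli',   'Sales', NULL);

-- Replace each missing salary with the average of the employee's own department
SELECT employee_id,
       employee_name,
       department,
       COALESCE(salary, AVG(salary) OVER (PARTITION BY department)) AS imputed_salary
FROM employees_demo
ORDER BY employee_id;
-- Ben (IT) gets 7000 and Eli (Sales) gets 4000;
-- numeric averages may print with trailing zeros.
```

If a department has no known salaries at all, the window average is itself NULL and the value stays missing.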

### 2. Using `CASE WHEN` for Conditional Imputation
- **Purpose**: Allows custom logic for imputation based on conditions.
- **Example**: Replace missing `age` values with different defaults based on `department`.
  ```sql
  SELECT employee_id, employee_name,
         CASE
           WHEN age IS NULL AND department = 'IT' THEN 30
           WHEN age IS NULL THEN 35
           ELSE age
         END AS imputed_age
  FROM employees;
  ```
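
Once values are imputed, it is no longer visible which rows were originally missing. The sketch below runs the same `CASE` logic on hypothetical sample data (`employee_ages_demo` is an invented table) and adds a boolean flag column, `age_was_imputed`, which is an illustrative name rather than part of the original example.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMP TABLE employee_ages_demo (
    employee_id   integer PRIMARY KEY,
    employee_name text,
    department    text,
    age           integer
);

INSERT INTO employee_ages_demo VALUES
    (1, 'Ana',   'IT', NULL),
    (2, 'Ben',   'HR', NULL),
    (3, 'Chloe', 'IT', 41);

SELECT employee_id,
       employee_name,
       CASE
         WHEN age IS NULL AND department = 'IT' THEN 30  -- IT-specific default
         WHEN age IS NULL THEN 35                        -- default for all other departments
         ELSE age                                        -- keep known values
       END AS imputed_age,
       (age IS NULL) AS age_was_imputed                  -- true where a default was used
FROM employee_ages_demo
ORDER BY employee_id;
-- Ana: 30, true; Ben: 35, true; Chloe: 41, false
```

The order of the `WHEN` branches matters: `CASE` returns the first branch whose condition is true, so the more specific IT rule must come before the general one.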

### 3. Using `UPDATE` to Permanently Replace NULL Values
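- **Purpose**: Overwrites missing values in the table itself. The change persists, and the table no longer records which rows were originally NULL.
- **Example**: Set missing `order_amount` to the median value. `PERCENTILE_CONT(0.5)` ignores NULLs, so the median is computed from the known amounts only.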
  ```sql
  UPDATE orders
  SET order_amount = (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_amount) FROM orders)
  WHERE order_amount IS NULL;
  ```
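
Because an `UPDATE` is destructive, it can help to run it inside a transaction and inspect the affected rows before committing. The following is a minimal sketch on hypothetical sample data (`orders_demo` and the `order_id` column are invented for illustration); the explicit `::numeric` cast is there because `PERCENTILE_CONT` returns `double precision` for numeric input.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMP TABLE orders_demo (
    order_id     integer PRIMARY KEY,
    order_amount numeric
);

INSERT INTO orders_demo VALUES
    (1, 10), (2, NULL), (3, 20), (4, 40), (5, NULL);

BEGIN;

UPDATE orders_demo
SET order_amount = (
        SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_amount)
        FROM orders_demo
    )::numeric
WHERE order_amount IS NULL
RETURNING order_id, order_amount;  -- lists the rows that were changed
-- Rows 2 and 5 are returned with order_amount = 20 (the median of 10, 20, 40)

-- After inspecting the result, COMMIT to keep the change or ROLLBACK to discard it
COMMIT;
```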

### 4. Using `LEAD()` and `LAG()` for Forward/Backward Filling
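- **Purpose**: Fills a missing value with the value from the previous row (`LAG()`) or the next row (`LEAD()`).
- **Example**: Forward fill missing `temperature` values. `LAG()` looks back exactly one row, so this fills only isolated gaps: in a run of consecutive NULLs, only the first is filled and the rest stay NULL.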
  ```sql
  SELECT id,
         COALESCE(temperature, LAG(temperature) OVER (ORDER BY id)) AS filled_temperature
  FROM weather_data;
  ```
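
To carry the last known value across a run of consecutive NULLs, one common approach is to number the rows by how many non-NULL values have been seen so far and then spread the single known value within each group. This is a sketch under stated assumptions: `weather_demo` is hypothetical sample data and `fill_group` is an illustrative column name.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMP TABLE weather_demo (
    id          integer PRIMARY KEY,
    temperature numeric
);

INSERT INTO weather_demo VALUES
    (1, 20.5), (2, NULL), (3, NULL), (4, 22.0), (5, NULL);

WITH grouped AS (
    SELECT id,
           temperature,
           -- COUNT() skips NULLs, so the running count only increases on rows
           -- with a known value; each known value starts a new group
           COUNT(temperature) OVER (ORDER BY id) AS fill_group
    FROM weather_demo
)
SELECT id,
       -- Each group holds exactly one non-NULL value, so MAX() returns it
       MAX(temperature) OVER (PARTITION BY fill_group) AS filled_temperature
FROM grouped
ORDER BY id;
-- ids 1-3 -> 20.5, ids 4-5 -> 22.0
```

Rows that precede the first known value fall into group 0, which has no known value, so they remain NULL.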

### 5. Using Linear Interpolation for Numeric Data
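- **Purpose**: Estimates a missing value as the average of the rows immediately before and after it. PostgreSQL has no built-in interpolation function, so the estimate is built from `LAG()` and `LEAD()`.
- **Example**: Interpolate missing `sales` data. This handles isolated gaps only: if either neighbor is also NULL, the result stays NULL.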
  ```sql
  SELECT id,
         -- Divide by 2.0 so that integer columns are not truncated by integer division
         COALESCE(sales, (LEAD(sales) OVER (ORDER BY id) + LAG(sales) OVER (ORDER BY id)) / 2.0) AS interpolated_sales
  FROM sales_data;
  ```
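
For gaps longer than one row, a linear interpolation can be sketched by finding the nearest known value on each side and weighting by the distance in `id`. This is one possible approach, not a built-in feature: `sales_demo` is hypothetical sample data, the CTE and column names are illustrative, and the calculation assumes that `id` spacing reflects the spacing between observations.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMP TABLE sales_demo (
    id    integer PRIMARY KEY,
    sales numeric
);

INSERT INTO sales_demo VALUES
    (1, 100), (2, NULL), (3, NULL), (4, 160), (5, NULL), (6, 200);

WITH grouped AS (
    SELECT id,
           sales,
           COUNT(sales) OVER (ORDER BY id)      AS fwd_grp,  -- groups a gap with the known row before it
           COUNT(sales) OVER (ORDER BY id DESC) AS bwd_grp   -- groups a gap with the known row after it
    FROM sales_demo
),
bounded AS (
    SELECT id,
           sales,
           MIN(id)    OVER (PARTITION BY fwd_grp) AS prev_id,
           MAX(sales) OVER (PARTITION BY fwd_grp) AS prev_sales,
           MAX(id)    OVER (PARTITION BY bwd_grp) AS next_id,
           MAX(sales) OVER (PARTITION BY bwd_grp) AS next_sales
    FROM grouped
)
SELECT id,
       COALESCE(
           sales,
           -- Straight line between the two surrounding known points
           prev_sales + (next_sales - prev_sales) * (id - prev_id)
                        / NULLIF(next_id - prev_id, 0)
       ) AS interpolated_sales
FROM bounded
ORDER BY id;
-- id 2 -> 120, id 3 -> 140, id 5 -> 180
-- (numeric results may print with trailing zeros)
```

Gaps at the very start or end of the table have a known value on only one side, so they remain NULL.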

### Summary
| Method | Purpose | Example |
|--------|---------|---------|
| `COALESCE()` | Replace NULL with default/average | Replace missing salary with avg salary |
| `CASE WHEN` | Custom conditional replacement | Assign different age values by department |
| `UPDATE` | Permanently set missing values | Set NULL order amounts to median |
| `LAG()/LEAD()` | Fill missing values using adjacent rows | Forward-fill missing temperature |
| Interpolation | Estimate missing values | Use the average of adjacent sales |

These techniques fill gaps so that queries and aggregates in PostgreSQL can work with complete columns. Imputed values are estimates rather than observations, so choose the method that best fits your dataset and requirements.

