# Tutorial 03 - Dealing with NULL Values

In databases, NULL values represent missing or undefined data. When working with datasets, you often need to handle NULL values by updating them, either by replacing them with specific values or calculating the missing data based on available information.

**NOTE**: NULL is not the same as an empty string or zero. It signifies the absence of a value.
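
The distinction can be checked directly. The following is a minimal sketch using a throwaway in-memory SQLite database; the `demo` table and its values are made up for illustration and are not part of the tutorial database.

```python
import sqlite3

# In-memory database with a hypothetical table (not the tutorial's participants table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("missing", None), ("empty string", ""), ("zero", "0")],
)

# Only the row inserted with Python's None is stored as NULL
null_labels = [row[0] for row in conn.execute("SELECT label FROM demo WHERE value IS NULL")]
# The empty string is an ordinary value and is matched with a normal comparison
empty_labels = [row[0] for row in conn.execute("SELECT label FROM demo WHERE value = ''")]

print(null_labels)   # ['missing']
print(empty_labels)  # ['empty string']
conn.close()
```

Python's `None` is what the `sqlite3` module maps to SQL NULL, which is why only the first row matches `IS NULL`.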

**Approaches for Handling NULL Values**
There are several ways to handle missing values, depending on the use case:
- **Update Values**: If you can go back to the data source, you can update `NULL` values with new or corrected information.
- **Data Imputation**: If you need to replace missing data based on the overall dataset (such as using the average or median values), you can calculate those first and then apply them.
- **Leaving NULLs**: Sometimes it's best to leave NULL values in place and handle them in the application layer.
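
As an illustration of the data imputation approach, the sketch below fills missing values with the column average in two steps. It assumes a hypothetical `scores` table in an in-memory database; the table name, column name, and numbers are invented for the example.

```python
import sqlite3

# Hypothetical table with two missing scores
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO scores (score) VALUES (?)",
    [(10.0,), (None,), (20.0,), (None,)],
)

# Step 1: compute the average; AVG() ignores NULL rows
(avg_score,) = conn.execute("SELECT AVG(score) FROM scores").fetchone()

# Step 2: write the average into the rows that are still NULL
conn.execute("UPDATE scores SET score = ? WHERE score IS NULL", (avg_score,))
conn.commit()

imputed = [row[0] for row in conn.execute("SELECT score FROM scores ORDER BY id")]
print(avg_score)  # 15.0
print(imputed)    # [10.0, 15.0, 20.0, 15.0]
conn.close()
```

Computing the statistic first and applying it second keeps the imputed value fixed, so it does not shift as rows are filled in.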

---
**Identifying `NULL` Values**
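To check for `NULL` values in SQL, use the `IS NULL` condition in your queries. This condition finds rows where a column contains `NULL` instead of a valid value. A comparison such as `column = NULL` never matches any row, because comparing anything with `NULL` yields `NULL` rather than true.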
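
The difference between `= NULL` and `IS NULL` can be seen in a small experiment. This is a sketch on a hypothetical `people` table in an in-memory database, not on the tutorial's data.

```python
import sqlite3

# Hypothetical table: one known gender, two missing
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("A", "F"), ("B", None), ("C", None)],
)

def count(where_clause):
    """Count rows of the people table matching the given WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM people WHERE {where_clause}").fetchone()[0]

equals_null = count("gender = NULL")      # comparison with NULL is never true
is_null = count("gender IS NULL")         # matches the two missing rows
is_not_null = count("gender IS NOT NULL") # matches the one known row

print(equals_null, is_null, is_not_null)  # 0 2 1
conn.close()
```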

**Example 1**: Checking for NULL Values
This example identifies rows in the `participants` table where the `Gender` column is NULL.

---
**UPDATE: Updating NULL Values**
You may want to update `NULL` values with a specific value. The `UPDATE` statement is used to modify existing records in a table, including updating NULL values.
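
One way to make such an update safer is to pass values as parameters and to add an `IS NULL` guard so that existing values are never overwritten. The sketch below assumes a hypothetical `members` table in an in-memory database; the IDs and countries are invented for the example.

```python
import sqlite3

# Hypothetical table: two members with a missing country
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id TEXT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("m1", None), ("m2", "Nepal"), ("m3", None)],
)

# The ? placeholders keep values out of the SQL string; the IS NULL guard
# means the row is only changed if its country is currently missing
cursor = conn.execute(
    "UPDATE members SET country = ? WHERE id = ? AND country IS NULL",
    ("Bhutan", "m1"),
)
conn.commit()

updated_rows = cursor.rowcount  # number of rows the UPDATE changed
rows = conn.execute("SELECT id, country FROM members ORDER BY id").fetchall()
print(updated_rows)  # 1
print(rows)          # [('m1', 'Bhutan'), ('m2', 'Nepal'), ('m3', None)]
conn.close()
```

Checking `rowcount` after the update is a quick way to confirm that the statement touched the number of rows you expected.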

**Example 2**: Update the `NULL` values of two participants from Bhutan using new information.

---
**COALESCE: Replacing NULL with a Default Value**

The `COALESCE()` function returns the first of its arguments that is not NULL, so it can be used to substitute a default value for NULL. This is useful when query results should show something more meaningful or practical than a missing value.
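
`COALESCE()` accepts more than two arguments, which allows a chain of fallbacks. The sketch below assumes a hypothetical `contacts` table in an in-memory database; the names and contact details are placeholders.

```python
import sqlite3

# Hypothetical table where mobile and/or email may be missing
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("A", "555-0100", None), ("B", None, "b@example.com"), ("C", None, None)],
)

# Use mobile if present, otherwise email, otherwise the literal 'unknown'
rows = conn.execute(
    "SELECT name, COALESCE(mobile, email, 'unknown') AS contact "
    "FROM contacts ORDER BY name"
).fetchall()
print(rows)  # [('A', '555-0100'), ('B', 'b@example.com'), ('C', 'unknown')]

# COALESCE in a SELECT only changes the result set; the stored NULLs remain
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE mobile IS NULL"
).fetchone()[0]
print(remaining_nulls)  # 2
conn.close()
```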

**Example 3**: Replace NULL Country with 'Myanmar'

In [None]:
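import sqlite3
import pandas as pd

# Path to the SQLite database file used throughout this tutorial
db_path = './database/mmdt.db3'

# Example 1: find participants whose Gender is missing
query = "SELECT * FROM participants WHERE Gender IS NULL;"
# Passing a database URI string to pandas requires SQLAlchemy to be installed
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df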





In [None]:
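# Example 2: set the Selected column for two specific participants
update_query = "UPDATE participants SET Selected = 'Replace' WHERE ID = 'mmdt2024.082' OR ID = 'mmdt2024.085';"

conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute(update_query)
conn.commit()  # persist the change to the database file
conn.close()

# Re-run the Example 1 query to see which rows still have a NULL Gender
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df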

In [None]:
# COALESCE in a SELECT only changes the query result; the stored data is untouched
query = "SELECT ID, COALESCE(BOD, 2000) AS BOD FROM participants LIMIT 20 OFFSET 80;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df

In [None]:
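# Example 3: overwrite NULL Country values in place; non-NULL values are kept as they are
update_query = "UPDATE participants SET Country = COALESCE(Country, 'Bhutan');"

conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute(update_query)
conn.commit()  # persist the change to the database file
conn.close()

# Inspect the same 20 rows as before to confirm the Country column is filled in
query = "SELECT ID, BOD, Country FROM participants LIMIT 20 OFFSET 80;"
df = pd.read_sql_query(query, f"sqlite:///{db_path}")
df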