# Handling Missing Data in NumPy

In data science, missing values are common. NumPy represents them as `np.nan` and provides functions to identify them, remove or replace them, and compute aggregations that ignore them.

---

### 1. Representing Missing Data with `np.nan`

`np.nan` ("Not a Number") is a special floating-point value that NumPy uses to mark missing or undefined entries in an array. Because it is a float, an array that contains it has a floating-point dtype.



In [1]:
import numpy as np

In [2]:
# Creating an array with missing values (np.nan)
arr_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])
print("Array with missing values (np.nan):", arr_with_nan)

Array with missing values (np.nan): [ 1.  2. nan  4. nan  6.]
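
Two properties of `np.nan` are worth checking before going further. The sketch below redefines `arr_with_nan` so that it runs on its own; `eq_mask` is a name introduced here only for illustration. It shows that the integers in the list were upcast to `float64`, and that `np.nan` never compares equal to itself, which is why `==` cannot be used to find missing values.

```python
import numpy as np

arr_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])

# np.nan is a float, so the integer entries are upcast to float64
print(arr_with_nan.dtype)        # float64

# NaN is not equal to anything, including itself
print(np.nan == np.nan)          # False

# As a result, an equality test finds no missing values at all
eq_mask = arr_with_nan == np.nan
print(eq_mask.any())             # False
```

This is the reason the next section relies on `np.isnan()` rather than a comparison.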


### 2. Removing or Replacing Missing Data
We can either remove missing values or replace them. `np.isnan()` identifies `np.nan` entries, and `np.nan_to_num()` replaces them.

In [3]:
# Identifying missing data
is_nan = np.isnan(arr_with_nan)
print("Boolean array indicating missing values (True for np.nan):", is_nan)

Boolean array indicating missing values (True for np.nan): [False False  True False  True False]


In [4]:
# Removing missing values
arr_without_nan = arr_with_nan[~np.isnan(arr_with_nan)]
print("Array without missing values:", arr_without_nan)

Array without missing values: [1. 2. 4. 6.]
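
Boolean-mask removal returns a 1-D result, so applying it directly to a 2-D array would flatten it. One common alternative, sketched here on a hypothetical 2-D array `table`, is to drop every row that contains at least one `np.nan`.

```python
import numpy as np

# Hypothetical 2-D data: each row is a record, each column a feature
table = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [5.0, 6.0]])

# True for each row that has at least one missing value
rows_with_nan = np.isnan(table).any(axis=1)

# Keep only the complete rows; the result stays 2-D
complete_rows = table[~rows_with_nan]
print(complete_rows)
```

Using `axis=0` in the `any()` call and indexing with `table[:, ~mask]` would drop incomplete columns instead.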


In [5]:
# Replacing missing values with 0
arr_replaced_nan = np.nan_to_num(arr_with_nan, nan=0)
print("Array with missing values replaced by 0:", arr_replaced_nan)

# Replacing missing values with a specific number (e.g., -1)
arr_replaced_nan_specific = np.nan_to_num(arr_with_nan, nan=-1)
print("Array with missing values replaced by -1:", arr_replaced_nan_specific)

Array with missing values replaced by 0: [1. 2. 0. 4. 0. 6.]
Array with missing values replaced by -1: [ 1.  2. -1.  4. -1.  6.]
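
A constant is not always a sensible fill value. As a sketch of one alternative, the block below fills each gap with the mean of the observed values using `np.where`. The choice of the mean, and the names `fill_value` and `arr_mean_filled`, are assumptions made for illustration, not a recommendation for every dataset.

```python
import numpy as np

arr_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])

# Mean of the observed (non-NaN) values: (1 + 2 + 4 + 6) / 4 = 3.25
fill_value = np.nanmean(arr_with_nan)

# Where the mask is True take fill_value, otherwise keep the original entry
arr_mean_filled = np.where(np.isnan(arr_with_nan), fill_value, arr_with_nan)
print(arr_mean_filled)
```

Note that `np.nan_to_num()` also converts positive and negative infinity to very large finite numbers by default, whereas the `np.where` approach with an `np.isnan` mask touches only the `np.nan` entries.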


### 3. Aggregations with Missing Data
When performing aggregations such as mean, min, or max on data that contains missing values (`np.nan`), the regular functions (`np.mean()`, `np.min()`, `np.max()`) return `np.nan`. To ignore missing values during aggregation, NumPy provides NaN-aware counterparts such as `np.nanmean()`, `np.nanmin()`, and `np.nanmax()`.
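
The propagation of `np.nan` through regular aggregations can be checked directly. This is a small sketch that redefines `arr_with_nan` so it runs on its own; `plain_mean`, `plain_min`, and `plain_max` are illustrative names.

```python
import numpy as np

arr_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])

# Regular aggregations propagate NaN: a single missing value
# is enough to make the whole result NaN
plain_mean = np.mean(arr_with_nan)
plain_min = np.min(arr_with_nan)
plain_max = np.max(arr_with_nan)
print(plain_mean, plain_min, plain_max)   # nan nan nan
```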

In [6]:
# Calculating mean while ignoring missing values
mean_without_nan = np.nanmean(arr_with_nan)
print("Mean ignoring missing values:", mean_without_nan)

Mean ignoring missing values: 3.25


In [7]:
# Calculating the minimum value while ignoring missing values
min_without_nan = np.nanmin(arr_with_nan)
print("Minimum value ignoring missing values:", min_without_nan)

Minimum value ignoring missing values: 1.0


In [8]:
# Calculating the maximum value while ignoring missing values
max_without_nan = np.nanmax(arr_with_nan)
print("Maximum value ignoring missing values:", max_without_nan)

Maximum value ignoring missing values: 6.0
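
The NaN-aware functions also accept an `axis` argument, and the family includes others such as `np.nansum()`. A brief sketch on a hypothetical 2-D array `grid` (the array and variable names are assumptions for illustration):

```python
import numpy as np

grid = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan]])

# Column-wise mean: each column is averaged over its non-NaN entries
col_means = np.nanmean(grid, axis=0)
print(col_means)                 # values: 2.5, 5.0, 3.0

# Sum over all observed values: 1 + 3 + 4 + 5
total = np.nansum(grid)
print(total)                     # 13.0

# Number of observed (non-NaN) values in each row
counts = np.sum(~np.isnan(grid), axis=1)
print(counts)                    # [2 2]
```

If a row or column contains only `np.nan`, `np.nanmean()` still returns `np.nan` for it and emits a `RuntimeWarning`, so it can be worth checking the counts first.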
