In [None]:
import numpy as np
import pandas as pd

In [None]:
pd.isnull(np.nan)

In [None]:
pd.isnull(None)

In [None]:
pd.isna(np.nan)

In [None]:
pd.isna(None)

The opposite functions, `notnull` and `notna`, also exist:

In [None]:
pd.notnull(None)

In [None]:
pd.notnull(np.nan)

In [None]:
pd.notna(np.nan)

In [None]:
pd.notnull(3)
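
The cells above suggest that `isnull`/`isna` and `notnull`/`notna` give the same answers. A minimal sketch to check that on several kinds of values follows; the `candidates` list and the variable names are illustrative and not part of the original notebook, and `pd.NaT` is included on the assumption that missing timestamps may also show up in your data.

```python
import numpy as np
import pandas as pd

# Illustrative values: three "missing" markers and two ordinary values
candidates = [np.nan, None, pd.NaT, 3, "text"]

# Apply each scalar check to every candidate
isnull_flags = [pd.isnull(v) for v in candidates]
isna_flags = [pd.isna(v) for v in candidates]
notnull_flags = [pd.notnull(v) for v in candidates]

print(isnull_flags)   # missing markers are flagged, ordinary values are not
print(isnull_flags == isna_flags)  # both spellings agree on these inputs
```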

These functions also work element-wise on `Series` and `DataFrame`s:

In [None]:
pd.isnull(pd.Series([1, np.nan, 7]))

In [None]:
pd.notnull(pd.Series([1, np.nan, 7]))

In [None]:
pd.isnull(pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
    'Column C': [np.nan, 2, np.nan]
}))

### Pandas Operations with Missing Values

In [None]:
pd.Series([1, 2, np.nan]).count()

In [None]:
pd.Series([1, 2, np.nan]).sum()

In [None]:
pd.Series([2, 2, np.nan]).mean()
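
The three cells above rely on pandas skipping null values in aggregations by default. As a small sketch of that behavior (the name `s_demo` is illustrative), the `skipna` parameter can be switched off to see what happens when nulls are not ignored:

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1, 2, np.nan])

# Default behavior: null values are skipped
non_null_count = s_demo.count()   # counts only non-null entries
total = s_demo.sum()              # 1 + 2, the NaN is ignored

# With skipna=False the NaN propagates into the result
total_strict = s_demo.sum(skipna=False)
mean_strict = s_demo.mean(skipna=False)

print(non_null_count, total, total_strict, mean_strict)
```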

### Filtering missing data

As we saw with numpy, we can combine boolean selection with `pd.isnull` or `pd.notnull` to filter out `nan`s and other null values:

In [None]:
s = pd.Series([1, 2, 3, np.nan, np.nan, 4])

In [None]:
pd.notnull(s)

In [None]:
pd.isnull(s)

In [None]:
pd.notnull(s).sum()

In [None]:
pd.isnull(s).sum()

In [None]:
s[pd.notnull(s)]

Both `notnull` and `isnull` are also methods of `Series` and `DataFrame`s, so we can call them directly on the object:

In [None]:
s.isnull()

In [None]:
s.notnull()

In [None]:
s[s.notnull()]

### Dropping null values

In [None]:
s

In [None]:
s.dropna()
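
As a quick sanity check, `dropna` on a Series should give the same result as the boolean filtering shown earlier. The sketch below rebuilds the same values under the illustrative name `s_demo` and compares both approaches:

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1, 2, 3, np.nan, np.nan, 4])

# Two ways of removing nulls from a Series
filtered = s_demo[s_demo.notnull()]
dropped = s_demo.dropna()

# Same values and same index labels; the original Series is unchanged
same_result = filtered.equals(dropped)
print(same_result)
print(dropped.index.tolist())
```

Note that the surviving rows keep their original index labels, so the index of the result has gaps.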

### Dropping null values on DataFrames

In [None]:
df = pd.DataFrame({
    'Column A': [1, np.nan, 30, np.nan],
    'Column B': [2, 8, 31, np.nan],
    'Column C': [np.nan, 9, 32, 100],
    'Column D': [5, 8, 34, 110],
})

In [None]:
df

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.isnull()

In [None]:
df.isnull().sum()
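
Because the boolean frame returned by `isnull` is treated as 0s and 1s in aggregations, taking the mean instead of the sum gives the fraction of nulls per column. This is a sketch on a copy of the same data, named `df_demo` here for illustration:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Column A': [1, np.nan, 30, np.nan],
    'Column B': [2, 8, 31, np.nan],
    'Column C': [np.nan, 9, 32, 100],
    'Column D': [5, 8, 34, 110],
})

# Number of nulls per column
null_counts = df_demo.isnull().sum()

# Fraction of nulls per column (True counts as 1, False as 0)
null_fraction = df_demo.isnull().mean()

print(null_counts)
print(null_fraction)
```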

The default `dropna` behavior will drop all the rows in which _any_ null value is present:

In [None]:
df.dropna()

In [None]:
df.dropna(axis=1)  # axis='columns' also works

In this case, any row or column that contains **at least** one null value will be dropped, which can be too extreme depending on the case. You can control this behavior with the `how` parameter, which can be either `'any'` or `'all'`:

In [None]:
df2 = pd.DataFrame({
    'Column A': [1, np.nan, 30],
    'Column B': [2, np.nan, 31],
    'Column C': [np.nan, np.nan, 100]
})

In [None]:
df2

In [None]:
df2.dropna(how='all')

In [None]:
df2.dropna(how='any')  # default behavior
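
To make the difference between the two `how` values concrete, the sketch below applies both to a small frame with the same values as `df2`, where the middle row is entirely null. The name `df_demo` is illustrative.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Column A': [1, np.nan, 30],
    'Column B': [2, np.nan, 31],
    'Column C': [np.nan, np.nan, 100],
})

# how='all': drop a row only if every value in it is null
dropped_all = df_demo.dropna(how='all')

# how='any': drop a row if at least one value in it is null
dropped_any = df_demo.dropna(how='any')

print(dropped_all.index.tolist())  # row 1 is removed
print(dropped_any.index.tolist())  # only the fully populated row remains
```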

You can also use the `thresh` parameter to indicate a _threshold_ (a minimum number) of non-null values for the row/column to be kept:

In [None]:
df

In [None]:
df.dropna(thresh=3)

In [None]:
df.dropna(thresh=3, axis='columns')
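
One way to see why `thresh=3` keeps or drops a given row or column is to count the non-null values along each axis first. The sketch below does that on a copy of the same data (named `df_demo` for illustration) and then applies `dropna`:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Column A': [1, np.nan, 30, np.nan],
    'Column B': [2, 8, 31, np.nan],
    'Column C': [np.nan, 9, 32, 100],
    'Column D': [5, 8, 34, 110],
})

# Non-null values per row and per column
per_row = df_demo.notnull().sum(axis=1)
per_column = df_demo.notnull().sum(axis=0)

# Keep rows / columns with at least 3 non-null values
rows_kept = df_demo.dropna(thresh=3)
cols_kept = df_demo.dropna(thresh=3, axis='columns')

print(per_row.tolist())
print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```

Only the last row has fewer than three non-null values, and only `Column A` does among the columns, so those are the ones removed.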

### Filling null values

In [None]:
s

**Filling nulls with an arbitrary value**

In [None]:
s.fillna(0)

In [None]:
s.fillna(s.mean())

In [None]:
s
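
Displaying `s` again after the calls above shows that `fillna` returns a new Series rather than modifying the original. A minimal sketch of the same idea, using the illustrative name `s_demo`, where the result is assigned to a new variable:

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1, 2, 3, np.nan, np.nan, 4])

# The mean skips nulls: (1 + 2 + 3 + 4) / 4
filled = s_demo.fillna(s_demo.mean())

print(filled.tolist())
print(s_demo.isnull().sum())  # the original still has its nulls
```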

**Filling nulls with contiguous (close) values**

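The `ffill` (forward fill) and `bfill` (backward fill) methods fill each null with the closest non-null value before or after it. Older code passes `method='ffill'` or `method='bfill'` to `fillna` for the same result, but recent pandas versions deprecate that argument:

In [None]:
s.ffill()

In [None]:
s.bfill()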

This can still leave null values at the extremes of the Series/DataFrame:

In [None]:
pd.Series([np.nan, 3, np.nan, 9]).ffill()

In [None]:
pd.Series([1, np.nan, 3, np.nan, np.nan]).bfill()
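
One possible way to handle the nulls left at the extremes is to chain both directions, forward fill first and backward fill afterwards. This is a sketch rather than a recommendation from the original notebook, and whether it is appropriate depends on your data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([np.nan, 3, np.nan, 9])

# Forward fill alone leaves the leading null untouched
forward = s_demo.ffill()

# A backward fill afterwards covers the remaining leading null
both = s_demo.ffill().bfill()

print(forward.isnull().sum())
print(both.tolist())
```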

### Filling null values on DataFrames

The `fillna` method also works on `DataFrame`s in much the same way. The main differences are that you can specify the `axis` (rows or columns) along which to fill, which matters for directional fills such as forward fill, and that you have more control over the values passed, for example one value per column:

In [None]:
df

In [None]:
df.fillna({'Column A': 0, 'Column B': 99, 'Column C': df['Column C'].mean()})

In [None]:
df.ffill(axis=0)

In [None]:
df.ffill(axis=1)
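
The effect of `axis` is easier to see by counting the nulls that survive each fill. The sketch below uses a copy of the same data (named `df_demo` for illustration) and calls `DataFrame.ffill` directly, which is assumed here as the equivalent of passing `method='ffill'` to `fillna`:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Column A': [1, np.nan, 30, np.nan],
    'Column B': [2, 8, 31, np.nan],
    'Column C': [np.nan, 9, 32, 100],
    'Column D': [5, 8, 34, 110],
})

# axis=0: each null takes the value from the row above, within its column
filled_down = df_demo.ffill(axis=0)

# axis=1: each null takes the value from the column to its left, within its row
filled_across = df_demo.ffill(axis=1)

print(filled_down.isnull().sum().sum())    # nulls left after filling down
print(filled_across.isnull().sum().sum())  # nulls left after filling across
```

Filling down leaves the null in the first row of `Column C`, since nothing sits above it. Filling across leaves the nulls that have no non-null value to their left.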

### Checking if there are NAs

In [None]:
s.dropna().count()

In [None]:
missing_values = len(s.dropna()) != len(s)
missing_values

The `count` method excludes `nan`s from its result, while `len` includes them:

In [None]:
len(s)

In [None]:
s.count()

So we could just do:

In [None]:
missing_values = s.count() != len(s)
missing_values
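
A more direct alternative, not used in the original notebook, is to ask whether any element is null with `isnull().any()`, or to read the `hasnans` attribute of a Series. The sketch below shows both, plus the DataFrame equivalent; the names `s_demo` and `df_demo` are illustrative.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1, 2, 3, np.nan, np.nan, 4])
df_demo = pd.DataFrame({
    'Column A': [1, np.nan, 30],
    'Column B': [2, 8, 31],
})

# Series: True if at least one element is null
has_missing = s_demo.isnull().any()
has_missing_attr = s_demo.hasnans

# DataFrame: the first any() reduces per column, the second across columns
df_has_missing = df_demo.isnull().any().any()

print(has_missing, has_missing_attr, df_has_missing)
```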