<a href="https://colab.research.google.com/github/rubyspch/Colaboratory-Notes/blob/main/null_values.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd

In [6]:
falsy_values = (0, False, None, '', [], {})
any(falsy_values)

False

NumPy has:
```
np.nan
```
which stands for 'Not a Number'. Any arithmetic involving it results in nan.

By contrast, `3 + None` raises a `TypeError` and stops the code.

In [8]:
3 + np.nan

nan

Using np.nan for missing values instead lets the calculation run: the result is simply nan, with no traceback.
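A minimal sketch of the contrast described above, using only NumPy. The variable names `raised` and `result` are illustrative, not part of the original notebook.

```python
import numpy as np

# Adding None to a number is undefined, so Python raises TypeError.
try:
    3 + None
    raised = False
except TypeError:
    raised = True

# np.nan is an ordinary float, so the addition succeeds and yields nan.
result = 3 + np.nan

print(raised)            # True
print(np.isnan(result))  # True
```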

There is also an infinity value:

In [12]:
np.inf

inf

Most arithmetic with it results in inf (or -inf), but not all: for example, inf - inf gives nan.
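A quick check of how inf behaves in a few operations. It is worth noting that not every operation returns inf; the variable names here are only for illustration.

```python
import numpy as np

plus_one = np.inf + 1         # still inf
doubled = np.inf * 2          # still inf
difference = np.inf - np.inf  # undefined, so nan
reciprocal = 1 / np.inf       # collapses to zero

print(plus_one)    # inf
print(doubled)     # inf
print(difference)  # nan
print(reciprocal)  # 0.0
```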

Two functions, np.isnan and np.isinf, check for nan and inf respectively.

If you pass them an array, they return a boolean array:

In [15]:
np.isnan(np.array([1, 2, 3, np.nan, np.inf, 4]))

array([False, False, False,  True, False, False])

In [14]:
np.isinf(np.inf)

True

In [19]:
np.isfinite(np.array([1, 2, 3, np.nan, np.inf, 4]))

array([ True,  True,  True, False, False,  True])

np.isfinite returns False for both inf and nan, so it catches both at once.

**Filtering**

To filter nan (or inf) values out of an array, use the boolean array as a mask:

In [16]:
a = np.array([1, 2, 3, np.nan, np.nan, 4])

In [20]:
a[~np.isnan(a)]

array([1., 2., 3., 4.])

In [18]:
a[np.isfinite(a)]

array([1., 2., 3., 4.])

**Pandas has utility functions to detect nulls as well:**

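isnull, isna, notnull, notna (isna is an alias of isnull, and notna of notnull):
- return True or False for a single value, and a boolean Series or DataFrame otherwise
- work with Series and DataFrames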

In [21]:
pd.isnull(np.nan)

True

To count the number of null or non-null values, call .sum() on the boolean result (each True counts as 1):

In [22]:
a=pd.Series([1,2,3,np.nan,4])
pd.notnull(a).sum()

4
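The same idea works for counting the nulls themselves. A small sketch, reusing the values from the cell above (the names `n_null` and `n_not_null` are illustrative):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, np.nan, 4])

# True counts as 1 when summed, so the sum is the number of matches.
n_null = int(pd.isnull(a).sum())
n_not_null = int(pd.notnull(a).sum())

print(n_null)      # 1
print(n_not_null)  # 4
```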

You can filter a Series with a boolean mask:
```
series_name[pd.notnull(series_name)]
```
This returns all the values in series_name that are not null, as a new Series.

Another way to write that:

```
series_name[series_name.notnull()]
```
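As a rough illustration that both spellings give the same result, here is a sketch with a small made-up Series (the data and variable names are assumptions for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one null value.
s = pd.Series([1, 2, 3, np.nan, 4])

via_function = s[pd.notnull(s)]  # top-level function form
via_method = s[s.notnull()]      # method form

print(via_method.tolist())              # [1.0, 2.0, 3.0, 4.0]
print(via_function.equals(via_method))  # True
```

The filtered Series keeps the original index labels, so the label of the dropped row is simply missing from the result.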

If you just want to drop null values from a Series or DataFrame, you can use:

```
s.dropna()
```

In a DataFrame, dropna drops every row that contains any nan value, not just the nans themselves.
To drop columns instead, pass `axis=1`:
```
df.dropna(axis=1)
```

Other parameters for dropna:
- `how='any'` is the default (drops a row/col that contains any nan)
- `how='all'` (drops only rows/cols where ALL of the values are nan)
- `thresh=3` specifies a threshold: keep only rows/cols with at least 3 non-null values (non-null, not merely truthy).
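The dropna options above can be compared side by side on a small DataFrame. This is a sketch with made-up data; the column names and values are assumptions chosen so that each option gives a different result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the rows hold 3, 2, 1 and 0 non-null values.
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [1, 2, np.nan, np.nan],
    "c": [1, 2, np.nan, np.nan],
})

default_rows = df.dropna()        # keep rows with no nan at all
all_rows = df.dropna(how="all")   # drop only the fully empty row
thresh_rows = df.dropna(thresh=2) # keep rows with >= 2 non-null values
no_nan_cols = df.dropna(axis=1)   # drop every column containing a nan

print(default_rows.shape)  # (1, 3)
print(all_rows.shape)      # (3, 3)
print(thresh_rows.shape)   # (2, 3)
print(no_nan_cols.shape)   # (4, 0)
```

Here every column contains at least one nan, so `axis=1` removes all of them. That is a useful reminder of how aggressive the default `how='any'` can be.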

**Filling values**


```
s.fillna(0)
```
This will fill any nan values with 0.


```
s.fillna(s.mean())
```

This will fill them with the mean.
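A short sketch of both fills on a made-up Series (the values are an assumption for illustration). Note that `s.mean()` skips nan by default, so the mean is computed from the non-null values only.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value.
s = pd.Series([1, np.nan, 3, 8])

filled_zero = s.fillna(0)         # replace nan with a constant
filled_mean = s.fillna(s.mean())  # mean of 1, 3 and 8 is 4.0

print(filled_zero.tolist())  # [1.0, 0.0, 3.0, 8.0]
print(filled_mean.tolist())  # [1.0, 4.0, 3.0, 8.0]
```

fillna returns a new Series here; `s` itself still contains the nan.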

fillna parameter `method`:
`method='bfill'` OR `'ffill'`
- `'ffill'` gives each nan the value before it; `'bfill'` gives it the value after it
- add `axis=1` to do this by row (across columns) in a DataFrame
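A sketch of forward and backward filling on made-up data. Newer pandas releases warn that the `method` argument of fillna is deprecated, so this example uses the equivalent `.ffill()` and `.bfill()` methods instead; the Series and DataFrame values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a gap in the middle.
s = pd.Series([1, np.nan, np.nan, 4])

forward = s.ffill()   # carry the previous value forward
backward = s.bfill()  # pull the next value backward

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]

# Hypothetical DataFrame: axis=1 fills along each row, left to right.
df = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 5.0], "z": [3.0, 6.0]})
row_filled = df.ffill(axis=1)

print(row_filled.iloc[0].tolist())  # [1.0, 1.0, 3.0]
```

A nan with nothing before it (for ffill) or after it (for bfill) stays nan, as in the first cell of the second row of `df`.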

Boolean tests:
- `series.any()` is True if at least one value is truthy
- `series.all()` is True only if every value is truthy


In [39]:
s=pd.Series([True, False, True])

In [24]:
s.any()

True

In [25]:
s.all()

False