# Pandas Missing Values

## Imports and data

In [110]:
import numpy as np
import pandas as pd

In [111]:
df = pd.DataFrame(np.ones((10,3)), index=list('abcdefghij'), columns=['one', 'two', 'three'])
df.loc['a','one':'two'] = np.nan
df.loc['c','one'] = np.nan
df.loc['d','two'] = np.nan
df.loc['e','three'] = np.nan
df.loc['f',:] = np.nan
df.loc['g','one':'two'] = np.nan
df.loc['h', 'two':'three'] = np.nan
df['state'] = ['CA', '', None, 'OR', 'WA', None, '', 'WA', 'OR', None]

Here is the `DataFrame` we will work with:

In [112]:
df

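| | one | two | three | state |
|---|---|---|---|---|
| a | NaN | NaN | 1.0 | CA |
| b | 1.0 | 1.0 | 1.0 | |
| c | NaN | 1.0 | 1.0 | None |
| d | 1.0 | NaN | 1.0 | OR |
| e | 1.0 | 1.0 | NaN | WA |
| f | NaN | NaN | NaN | None |
| g | NaN | NaN | 1.0 | |
| h | 1.0 | NaN | NaN | WA |
| i | 1.0 | 1.0 | 1.0 | OR |
| j | 1.0 | 1.0 | 1.0 | None |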


Replace the empty strings in the `state` column with `None`, so that all of its missing values are handled consistently:

In [113]:
# YOUR CODE HERE
df.loc['b','state'] = None
df.loc['g','state'] = None

In [114]:
assert '' not in df.state.unique()
assert df.loc['b','state'] is None
assert df.loc['g','state'] is None
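The cell above targets rows `b` and `g` by label, which works because we know where the empty strings are. As a minimal sketch of a more general approach, the following replaces every empty string without naming rows. The toy `Series` named `states` and the result `cleaned` are hypothetical and not part of the exercise data.

```python
import pandas as pd

# Hypothetical toy column mixing real values, empty strings, and None.
states = pd.Series(['CA', '', None, 'OR', ''], index=list('abcde'), dtype=object)

# Rebuild the column, mapping every empty string to None and keeping
# all other entries (including existing None values) unchanged.
cleaned = pd.Series(
    [None if s == '' else s for s in states],
    index=states.index,
    dtype=object,  # keep None as None rather than converting it
)

print(cleaned.isna().sum())  # number of missing entries after cleaning
```

Keeping `dtype=object` is a deliberate choice here: it preserves `None` as the missing-value marker, which is what the assertions in this exercise check for.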

Create a new `DataFrame`, named `df2`, that has all rows with any missing values dropped:

In [115]:
# YOUR CODE HERE
df2 = df.dropna()

In [116]:
assert len(df2)==1
assert 'i' in df2.index
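With no arguments, `dropna` drops every row that contains at least one missing value (`how='any'` along the row axis). The sketch below illustrates this on a small hypothetical frame named `toy` and compares it with an equivalent boolean mask; the `subset` argument restricts the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: only row 'r2' is complete.
toy = pd.DataFrame(
    {
        'x': [1.0, np.nan, 3.0],
        'y': [np.nan, np.nan, 6.0],
        'label': ['a', 'b', 'c'],
    },
    index=['r0', 'r1', 'r2'],
)

# Default behaviour: drop rows with any missing value.
complete_rows = toy.dropna()

# Equivalent boolean mask: keep rows where every entry is present.
manual = toy[toy.notna().all(axis=1)]

# Only consider column 'x' when deciding which rows to drop.
x_present = toy.dropna(subset=['x'])
```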

Create a new `DataFrame`, named `df3`, from `df` by dropping rows that have only missing values:

In [117]:
# YOUR CODE HERE
df3 = df.dropna(axis='rows', how='all')

In [118]:
assert len(df3)==9
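Passing `how='all'` drops a row only when every entry in it is missing, which is why just one row disappears here. A minimal sketch on a hypothetical frame `toy`, alongside the equivalent mask:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 'r1' is missing in every column.
toy = pd.DataFrame(
    {
        'x': [1.0, np.nan, np.nan],
        'y': [np.nan, np.nan, 2.0],
        'label': ['a', None, None],
    },
    index=['r0', 'r1', 'r2'],
)

# Drop only the rows in which all values are missing.
kept = toy.dropna(axis='index', how='all')

# Equivalent boolean mask: keep rows that are not entirely missing.
manual = toy[~toy.isna().all(axis=1)]
```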

Create a new `DataFrame`, named `df4`, from `df` by dropping all columns that have fewer than 7 non-missing values:

In [119]:
# YOUR CODE HERE
df4 = df.dropna(axis='columns', thresh=7)

In [120]:
assert list(df4.columns)==['three']
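The `thresh` argument is the minimum number of non-missing values a row or column needs in order to be kept. One way to see what it does is to compare it with `count()`, which reports the non-missing values per column. The frame `toy` below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 'dense' has 3 real values, 'sparse' has 1.
toy = pd.DataFrame(
    {
        'dense': [1.0, 2.0, 3.0, np.nan],
        'sparse': [np.nan, np.nan, 1.0, np.nan],
    }
)

# Non-missing values per column.
counts = toy.count()

# Keep columns with at least 3 non-missing values.
kept = toy.dropna(axis='columns', thresh=3)

# Equivalent selection built directly from the counts.
manual = toy.loc[:, counts >= 3]
```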

Create a new `DataFrame`, named `df5`, from `df` that has only the numerical columns, with missing values replaced by the number -9:

In [121]:
# YOUR CODE HERE
df5 = df[['one','two','three']]
df5 = df5.fillna(-9)

In [122]:
assert list(df5.columns)==['one','two','three']
sums = df5.sum()
assert sums['one']==-30.0
assert sums['two']==-40.0
assert sums['three']==-20.0
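The solution above lists the numerical columns by name. As an alternative sketch, `select_dtypes` can pick them out by dtype, which may be convenient when the column names are not known in advance. The frame `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two numeric columns and one string column.
toy = pd.DataFrame(
    {
        'one': [1.0, np.nan],
        'two': [np.nan, np.nan],
        'state': ['CA', None],
    }
)

# Select the numeric columns by dtype, then fill missing values with -9.
numeric = toy.select_dtypes(include='number')
filled = numeric.fillna(-9)

# Each NaN now contributes -9 to the column sums.
sums = filled.sum()
```

Because `fillna` returns a new object, the original `toy` still contains its missing values.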

Write a function `count_null` that takes a `Series` and returns an integer-valued count of the number of null values in the `Series`:

In [123]:
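def count_null(column):
    """Count the number of missing values in a column (Series)."""
    count = 0
    for item in column:
        # Floats are missing when they are NaN; NaN != NaN, so use np.isnan.
        if isinstance(item, float):
            if np.isnan(item):
                count += 1
        # Other objects (such as strings) are missing when they are None.
        elif item is None:
            count += 1
    return count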

In [124]:
assert count_null(df.one)==4
assert count_null(df.two)==5
assert count_null(df.three)==3
assert count_null(df.state)==5
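The explicit loop shows what counts as missing. For comparison, here is a vectorized sketch built on `Series.isna`, which flags both `NaN` and `None`. The name `count_null_vectorized` and the two sample `Series` are hypothetical. Note that `isna` also treats `pd.NaT` and `pd.NA` as missing, so it is slightly broader than the loop.

```python
import numpy as np
import pandas as pd


def count_null_vectorized(column):
    """Count missing values in a Series using pandas' own null detection."""
    # isna() gives a boolean Series; summing it counts the True entries.
    return int(column.isna().sum())


# Hypothetical sample columns: one numeric, one of Python objects.
numbers = pd.Series([1.0, np.nan, np.nan])
labels = pd.Series(['CA', None, 'OR', None], dtype=object)

print(count_null_vectorized(numbers), count_null_vectorized(labels))
```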