In [1]:
import pandas as pd

In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [3]:
ufo.tail()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
18236,Grant Park,NaN,TRIANGLE,IL,12/31/2000 23:00
18237,Spirit Lake,NaN,DISK,IA,12/31/2000 23:00
18238,Eagle River,NaN,NaN,WI,12/31/2000 23:45
18239,Eagle River,RED,LIGHT,WI,12/31/2000 23:45
18240,Ybor,NaN,OVAL,FL,12/31/2000 23:59


`NaN` stands for "not a number", but conceptually it marks a missing value. When building the dataframe, `read_csv` detected what it took to be missing values and flagged each one with this special `NaN`.
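To see this flagging in isolation, here is a minimal sketch that reads a tiny in-memory CSV instead of the UFO file. The `csv_text` sample and the `toy` name are made up for illustration; the point is only that blank fields should come back as `NaN`.

```python
import io

import pandas as pd

# Hypothetical three-row sample (not the UFO data); blank fields stand in
# for missing values.
csv_text = "City,Colors Reported,State\nIthaca,,NY\n,RED,LA\nYbor,,FL\n"

# read_csv accepts any file-like object, so StringIO works in place of a URL.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy)
```

The blank `City` in the second row and the blank `Colors Reported` in the first and third rows should print as `NaN`.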

Let's take a look at some **methods** for working with missing values.

## isnull

`isnull` is a dataframe method. (It is also a Series method, which we will see later.)

In [4]:
ufo.isnull().tail()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
18236,False,True,False,False,False
18237,False,True,False,False,False
18238,False,True,True,False,False
18239,False,False,False,False,False
18240,False,True,False,False,False


It is only because pandas uses this special value `NaN` that `isnull()` can detect the missing values and produce these `True` and `False` results.

## notnull

`notnull` is the inverse of `isnull`: it produces `True` when the value is not `NaN` and `False` when it is `NaN`.

In [5]:
ufo.notnull().tail()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
18236,True,False,True,True,True
18237,True,False,True,True,True
18238,True,False,False,True,True
18239,True,True,True,True,True
18240,True,False,True,True,True
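The inverse relationship can be checked directly on a small Series. This is only a sketch: the `shapes` Series is a made-up example, and `~` is the usual elementwise negation for boolean Series.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry.
shapes = pd.Series(["TRIANGLE", np.nan, "DISK"])

missing = shapes.isnull()    # True where the value is NaN
present = shapes.notnull()   # True where the value is not NaN

# Negating one mask with ~ should reproduce the other.
masks_agree = (~missing).equals(present)
print(masks_agree)
```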


## Why are isnull and notnull useful?

For one, there is a really cool pandas trick.

In [6]:
ufo.isnull().sum()

City                  25
Colors Reported    15359
Shape Reported      2644
State                  0
Time                   0
dtype: int64

This tells us the number of missing values in each column. It works because `sum()` adds down each column by default and treats `True` as 1 and `False` as 0.
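The counting step can be reproduced on a toy dataframe small enough to verify by eye. The data and the names `toy`, `flags`, `counts` and `row_counts` are illustrative only; `axis=1` is included as an optional variation that counts missing values per row instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing City, no missing State.
toy = pd.DataFrame({
    "City": ["Ithaca", np.nan, "Ybor"],
    "State": ["NY", "LA", "FL"],
})

flags = toy.isnull()             # dataframe of booleans, same shape as toy
counts = flags.sum()             # sums down each column (axis=0 is the default)
row_counts = flags.sum(axis=1)   # sums across each row instead

print(counts)
print(row_counts)
```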

A second way to use `isnull()` is to filter a dataframe: call it on a single column and pass the result to the dataframe as the filter.

Remember, you are passing a Series of booleans to the dataframe. You can combine this with `loc` if you wish.

In [8]:
ufo.loc[ufo.City.isnull(), :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
21,NaN,NaN,NaN,LA,8/15/1943 0:00
22,NaN,NaN,LIGHT,LA,8/15/1943 0:00
204,NaN,NaN,DISK,CA,7/15/1952 12:30
241,NaN,BLUE,DISK,MT,7/4/1953 14:00
613,NaN,NaN,DISK,NV,7/1/1960 12:00
1877,NaN,YELLOW,CIRCLE,AZ,8/15/1969 1:00
2013,NaN,NaN,NaN,NH,8/1/1970 9:30
2546,NaN,NaN,FIREBALL,OH,10/25/1973 23:30
3123,NaN,RED,TRIANGLE,WV,11/25/1975 23:00
4736,NaN,NaN,SPHERE,CA,6/23/1982 23:00
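The same filtering idea works in both directions. The sketch below uses a made-up three-row dataframe (not the UFO data) to show the `loc` form with `isnull()` next to the plain bracket form with `notnull()`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing City.
toy = pd.DataFrame({
    "City": ["Ithaca", np.nan, "Ybor"],
    "State": ["NY", "LA", "FL"],
})

# Rows where City is missing, selected through loc with a boolean Series.
missing_city = toy.loc[toy.City.isnull(), :]

# Rows where City is present, selected with plain brackets.
known_city = toy[toy.City.notnull()]

print(missing_city)
print(known_city)
```

Both forms keep the original index labels, so the filtered rows can still be traced back to the full dataframe.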
