### Missing Data in Pandas

1. None: Pythonic missing data

The first sentinel value used by Pandas is None, a Python singleton object that is often used for missing data in Python code. Because None is a Python object, it cannot be used in arbitrary NumPy/Pandas arrays, but only in arrays with data type 'object' (i.e., arrays of Python objects).
    
The use of Python objects in an array also means that if you perform aggregations like sum() or min() across an array with a None value, you will generally get an error.

2. NaN: Missing numerical data

The other missing data representation, NaN (acronym for Not a Number), is different; it is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.
    
The result of arithmetic with NaN will be another NaN.

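3. Null values: Pandas treats None and NaN interchangeably as missing (null) values and provides the following methods for detecting, removing, and replacing them:
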
isnull()
    Generate a Boolean mask indicating missing values
notnull()
    Opposite of isnull()
dropna()
    Return a filtered version of the data
fillna()
    Return a copy of the data with missing values filled or imputed

In [10]:
import numpy as np
import pandas as pd
# Creating a numpy array with 'None' object
vals1 = np.array([1, 2, None, 3, 4])
vals1

array([1, 2, None, 3, 4], dtype=object)

In [3]:
# Compare the cost of summing an object array with that of a native integer array
for dtype in ['object', 'int']:
    print('dtype = ', dtype)
    %timeit np.arange(1E6, dtype = dtype).sum()
    print()

dtype =  object
105 ms ± 4.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

dtype =  int
3.28 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)



In [5]:
# Summing an object array that contains None raises a TypeError
vals1.sum()

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

In [8]:
# Creating a NumPy array containing NaN values (the array keeps a native float dtype)
vals2 = np.array([1, 2, 3, np.nan, 5, np.nan])
print(vals2)
print(vals2.dtype)

[ 1.  2.  3. nan  5. nan]
float64


In [16]:
# The result of arithmetic with NaN will be another NaN
print("np.nan + 1 = " , np.nan + 1)
print("np.nan - 10 = ", np.nan - 10)
print("np.nan * 101 = ", np.nan * 101)
print("np.nan / 12 = ", np.nan / 12)

np.nan + 1 =  nan
np.nan - 10 =  nan
np.nan * 101 =  nan
np.nan / 12 =  nan


In [17]:
# NaN-aware aggregations ignore the missing values
print("Sum of vals2: ",np.nansum(vals2))
print("Minimum of vals2: ", np.nanmin(vals2))
print("Maximum of vals2: ", np.nanmax(vals2))

Sum of vals2:  11.0
Minimum of vals2:  1.0
Maximum of vals2:  5.0
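
For contrast, the ordinary aggregations do not skip NaN; they propagate it, which is why the NaN-aware variants above are needed. The following is a minimal sketch that rebuilds `vals2` so it can run on its own; the names `plain_sum`, `plain_min`, and `safe_sum` are illustrative and not part of the original notebook.

```python
import numpy as np

vals2 = np.array([1, 2, 3, np.nan, 5, np.nan])

# Ordinary aggregations return NaN as soon as one element is NaN
plain_sum = vals2.sum()
plain_min = vals2.min()

# The NaN-aware counterpart ignores the missing entries
safe_sum = np.nansum(vals2)

print("sum():", plain_sum)
print("min():", plain_min)
print("nansum():", safe_sum)
```

No error is raised here, unlike the None case: the result is simply NaN, so a missing value can go unnoticed unless it is checked for explicitly.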


In [19]:
# NaN and None in Pandas: None is converted to NaN in a numeric Series
series1 = pd.Series([1, 2, np.nan, 4, None])
series1

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

In [20]:
series2 = pd.Series(range(4), dtype = int)
series2

0    0
1    1
2    2
3    3
dtype: int64

In [21]:
series2[2] = None
series2

0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
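
If the upcast to float64 is unwanted, one possible alternative is Pandas' nullable integer extension dtype. This is a minimal sketch, assuming a Pandas version that supports the "Int64" dtype string (note the capital I); the name `nullable` is illustrative and not part of the original notebook.

```python
import pandas as pd

# "Int64" is the nullable integer extension dtype; the missing
# entry is stored as pd.NA and the values stay integers
nullable = pd.Series([0, 1, None, 3], dtype="Int64")

print(nullable)
print(nullable.isna())
print(nullable.sum())  # the missing entry is skipped by default
```

Whether this is preferable to the float64 upcast depends on how the data is used downstream, since not every library handles extension dtypes.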

In [23]:
# Detecting null values
data = pd.Series([1, np.nan, 3, None])
data.isnull()

0    False
1     True
2    False
3     True
dtype: bool
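
The complementary method, notnull(), is listed above but not demonstrated. A minimal sketch, assuming the same `data` Series as in the cell above (rebuilt here so the block runs on its own); the names `mask` and `present` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, None])

# notnull() is True wherever a value is present
mask = data.notnull()

# A Boolean mask can be used directly as an index
present = data[mask]

print(mask)
print(present)
```

Indexing with the mask keeps the same entries that dropna() would keep.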

In [24]:
# Dropping null values
data.dropna()

0    1.0
2    3.0
dtype: float64

In [28]:
# Dropping null values from a DataFrame
df = pd.DataFrame([[1, 2, np.nan],[None, 5, np.nan], [7, 8, 9]])
df

     0  1    2
0  1.0  2  NaN
1  NaN  5  NaN
2  7.0  8  9.0


In [31]:
df.dropna()

     0  1    2
2  7.0  8  9.0


In [32]:
# Drop all columns containing a null value
df.dropna(axis = 1)

   1
0  2
1  5
2  8
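
By default, dropna() removes any row or column that contains at least one null value, which can discard a lot of good data. The `how` and `thresh` parameters of dropna() give finer control. This is a minimal sketch on the same DataFrame; the extra all-NaN column labelled 3 and the names `no_empty_cols` and `at_least_two` are added here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [None, 5, np.nan], [7, 8, 9]])
df[3] = np.nan  # illustrative column made up entirely of nulls

# how='all' drops only the columns in which every value is null
no_empty_cols = df.dropna(axis='columns', how='all')

# thresh=2 keeps only the rows with at least two non-null values
at_least_two = df.dropna(axis='rows', thresh=2)

print(no_empty_cols)
print(at_least_two)
```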


### Filling null values
Pandas provides the fillna() method, which returns a copy of the data with the null values replaced.

In [40]:
data = pd.Series([1, np.nan, 3, None, 5, np.nan], index=list('abcdef'))
data

a    1.0
b    NaN
c    3.0
d    NaN
e    5.0
f    NaN
dtype: float64

In [41]:
# We can fill NA entries with a single value, such as zero
data.fillna(0)

a    1.0
b    0.0
c    3.0
d    0.0
e    5.0
f    0.0
dtype: float64

In [43]:
# We can specify a forward-fill to propagate the previous value forward
# (ffill() replaces fillna(method='ffill'), which is deprecated in recent Pandas)
data.ffill()

a    1.0
b    1.0
c    3.0
d    3.0
e    5.0
f    5.0
dtype: float64

In [44]:
# We can specify a back-fill to propagate the next value backward
# (bfill() replaces fillna(method='bfill'), which is deprecated in recent Pandas)
data.bfill()

a    1.0
b    3.0
c    3.0
d    5.0
e    5.0
f    NaN
dtype: float64
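
Besides a constant or a neighbouring value, the fill value can be computed from the data itself, which is one simple form of imputation. A minimal sketch, assuming the same `data` Series as above (rebuilt here so the block runs on its own); filling with the mean is an illustrative choice, not something the original notebook prescribes.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, None, 5, np.nan], index=list('abcdef'))

# mean() skips null values by default, so this is the mean of 1, 3 and 5
fill_value = data.mean()

# Replace every null entry with that mean
filled = data.fillna(fill_value)

print(fill_value)
print(filled)
```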

In [45]:
# For a DataFrame we can fill NA entries with a single value, such as zero
df.fillna(0)

     0  1    2
0  1.0  2  0.0
1  0.0  5  0.0
2  7.0  8  9.0


In [46]:
# For a DataFrame the options are similar, but we can also specify the axis along which the fill takes place
# Here axis=1 propagates the previous value forward along each row
# (ffill() replaces fillna(method='ffill'), which is deprecated in recent Pandas)
df.ffill(axis = 1)

     0    1    2
0  1.0  2.0  2.0
1  NaN  5.0  5.0
2  7.0  8.0  9.0
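
Note that a null value with no previous value along the chosen axis stays null, as in row 1 above. For comparison, this minimal sketch fills down each column instead (axis=0, the default); the name `filled_down` is illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [None, 5, np.nan], [7, 8, 9]])

# With the default axis=0, each null takes the value from the row above
filled_down = df.ffill()

print(filled_down)
```

Column 2 keeps its two leading nulls because nothing precedes them in that column.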
