# Pandas
# Dataframes - Handling Missing Values


```python
import numpy as np
import pandas as pd
```

**Index Alignment**

**For binary operations on Series or DataFrame objects, pandas aligns the indices before performing the operation.**
**Any index label that appears in only one of the objects yields NaN in the result:**


```python
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
```

```
0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64
```
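The same alignment applies to both axes of a DataFrame. The minimal sketch below uses two small, made-up frames (`X` and `Y` are illustrative, not from the original notebook). The result carries the union of the row labels and the union of the column labels, and only cells present in both inputs get a value.

```python
import pandas as pd

# Illustrative frames whose row and column labels only partly overlap
X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[0, 1])
Y = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=[1, 2])

# Rows are aligned on (0, 1, 2) and columns on ('a', 'b', 'c');
# only the cell at row 1, column 'b' exists in both frames (4 + 10)
result = X + Y
print(result)
```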

**Alternatively, arithmetic methods such as `add()` accept a `fill_value` that is used in place of an element missing from one of the objects:**

```python
A.add(B, fill_value=0)
# B.add(A, fill_value=0) gives the same result, since addition is commutative
```

```
0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
```
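`fill_value` only helps when one side has a value. If a label is missing on both sides, the pandas documentation states the result at that label stays missing. The sketch below checks this with illustrative Series `s1` and `s2` (hypothetical data); `sub()`, `mul()` and `div()` take `fill_value` in the same way.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'b' is NaN in both Series, 'c' exists only in s1
s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, np.nan], index=['a', 'b'])

r = s1.add(s2, fill_value=0)   # 'a' -> 11.0, 'c' -> 3.0 + 0, 'b' stays NaN
m = s1.mul(s2, fill_value=1)   # 1 is the neutral fill for multiplication
print(r)
print(m)
```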

**Operations with NaN**

**NaN is specifically a floating-point value; there is no equivalent NaN value for integers, strings, or other types.**
**Regardless of the operation, arithmetic involving NaN produces NaN, so NumPy aggregations such as `sum()`, `min()` and `max()` return NaN when the array contains one:**

```python
vals2 = np.array([1, np.nan, 3, 4])
a, b, c = vals2.sum(), vals2.min(), vals2.max()
print(a, b, c)
```

```
nan nan nan
```
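A few quick checks make the floating-point nature of NaN concrete. This sketch uses throwaway values and shows three things: NaN propagates through arithmetic, it does not compare equal to itself (which is why dedicated checks like `isnull()` exist), and a missing value upcasts an integer Series to `float64`. As an aside not covered above, pandas also offers a nullable `'Int64'` dtype that keeps integers and marks missing entries with `pd.NA`.

```python
import numpy as np
import pandas as pd

print(type(np.nan))             # <class 'float'>
print(1 + np.nan, 0 * np.nan)   # nan nan
print(np.nan == np.nan)         # False

# A missing value forces integer data to be stored as float64
with_missing = pd.Series([1, None, 3])
print(with_missing.dtype)       # float64

# Nullable integer dtype: values stay integers, the gap becomes pd.NA
nullable = pd.Series([1, None, 3], dtype='Int64')
print(nullable)
```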


**NumPy provides special aggregations, such as `np.nansum`, `np.nanmin` and `np.nanmax`, that ignore these missing values:**

```python
a, b, c = np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2)
print(a, b, c)
```

```
8.0 1.0 4.0
```
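pandas reductions behave like the NumPy `nan*` functions by default, because they take `skipna=True`. The sketch below wraps the same `vals2` array in a Series. Passing `skipna=False` brings back NaN propagation, and `mean()` divides by the count of non-null values rather than by the total length.

```python
import numpy as np
import pandas as pd

vals2 = np.array([1, np.nan, 3, 4])
s = pd.Series(vals2)

print(s.sum(), s.min(), s.max())   # 8.0 1.0 4.0
print(s.sum(skipna=False))         # nan
# (1 + 3 + 4) / 3: the NaN is excluded from both the sum and the count
print(s.mean(), s.count())
```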


**Detecting Null Values**

```python
data = pd.Series([1, np.nan, 'hello', None])
data.isnull()
```

```
0    False
1     True
2    False
3     True
dtype: bool
```
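As the output shows, `isnull()` treats both `np.nan` and `None` as missing. pandas also provides `isna()` as an alias. On a DataFrame, `isnull()` returns a boolean frame, and summing it counts the missing values per column. This sketch reuses the `df` that is built in the DataFrame examples below.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])
# isna() and isnull() return the same boolean mask
print(data.isna().equals(data.isnull()))   # True

df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])
# True counts as 1, so this is the number of nulls per column
null_counts = df.isnull().sum()
print(null_counts)   # column 0: 1, column 1: 1, column 2: 0
```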

**Selecting Non-Null Values in a Series**

```python
data[data.notnull()]
```

```
0        1
2    hello
dtype: object
```

**Dropping Null Values in a Series**

```python
data.dropna()
```

```
0        1
2    hello
dtype: object
```
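`dropna()` returns a new object and leaves the original untouched. The surviving values also keep their original index labels (0 and 2 here). The short sketch below shows reassigning the result and, if consecutive labels are wanted, renumbering with `reset_index(drop=True)`.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])

cleaned = data.dropna()
print(len(data), len(cleaned))   # 4 2

# Original labels are preserved; reset_index(drop=True) renumbers from 0
renumbered = cleaned.reset_index(drop=True)
print(cleaned.index.tolist(), renumbered.index.tolist())   # [0, 2] [0, 1]
```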

**In a DataFrame, single values cannot be dropped; `dropna()` can only remove entire rows or entire columns that contain nulls.**

```python
df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])
df.head()
```

|   | 0   | 1   | 2 |
|---|-----|-----|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |


**By default, dropna() will drop all rows in which any null value is present:**

```python
df.dropna()
```

|   | 0   | 1   | 2 |
|---|-----|-----|---|
| 1 | 2.0 | 3.0 | 5 |
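The default corresponds to `how='any'`. Passing `how='all'` drops only rows in which every value is null. The `df` above has no fully empty row, so this sketch uses a separate illustrative frame, `df_all` (hypothetical data), with one row that is entirely null.

```python
import numpy as np
import pandas as pd

# Illustrative frame: row 1 is entirely null, row 0 is only partly null
df_all = pd.DataFrame([[1, np.nan, 2],
                       [np.nan, np.nan, np.nan],
                       [3, 4, 5]])

print(df_all.dropna(how='any').index.tolist())   # [2]
print(df_all.dropna(how='all').index.tolist())   # [0, 2]
```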


**Specify `axis='columns'` (or `axis=1`) to drop every column that contains at least one null value instead:**

```python
df.dropna(axis='columns')
```

|   | 2 |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |


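**The `thresh` parameter sets the minimum number of non-null values a row or column must have to be kept. Every column here has at least two non-null values, so `thresh=2` keeps all of them:**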

```python
df.dropna(axis='columns', thresh=2)
```

|   | 0   | 1   | 2 |
|---|-----|-----|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |
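Raising the threshold makes `thresh` selective. With three columns, `thresh=3` keeps only fully populated rows or columns. The related `subset` parameter restricts the null check to chosen columns. This sketch applies both to the same `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])

rows_full = df.dropna(thresh=3)                   # only row 1 has 3 non-nulls
cols_full = df.dropna(axis='columns', thresh=3)   # only column 2 has 3 non-nulls
# Drop a row only if column 1 is null; nulls in other columns are ignored
by_col1 = df.dropna(subset=[1])
print(rows_full.index.tolist(), cols_full.columns.tolist(), by_col1.index.tolist())
```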


**Fill null values**

**Fill NA entries with a single value, such as zero:**

```python
data.fillna(0)
```

```
0        1
1        0
2    hello
3        0
dtype: object
```
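On a DataFrame, `fillna()` also accepts a dict or a Series keyed by column label, so each column gets its own fill value. A common pattern, sketched here with the same `df`, is filling each column with its own mean. The dict values 0 and -1 are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])

# df.mean() is a Series indexed by column label: 1.5, 3.5, 4.33...
filled_mean = df.fillna(df.mean())
# Per-column constants; columns not in the dict are left as they are
filled_dict = df.fillna({0: 0, 1: -1})
print(filled_mean)
print(filled_dict)
```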

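**A forward-fill (`ffill`) propagates the previous valid value forward; a backward-fill (`bfill`) propagates the next valid value backward. In recent pandas versions `fillna(method='ffill')` is deprecated in favor of the `ffill()` and `bfill()` methods:**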

```python
data.ffill()  # equivalent to the older data.fillna(method='ffill')
```

```
0        1
1        1
2    hello
3    hello
dtype: object
```
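The difference between the two directions, and the `limit` parameter that caps how many consecutive NaNs get filled, is easiest to see on a purely numeric Series. The values in `s` below are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

print(s.ffill().tolist())          # [1.0, 1.0, 1.0, 4.0, 4.0]
print(s.ffill(limit=1).tolist())   # [1.0, 1.0, nan, 4.0, 4.0]
# The trailing NaN has no later value to copy, so bfill leaves it missing
print(s.bfill().tolist())          # [1.0, 4.0, 4.0, 4.0, nan]
```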

**For DataFrames we can also specify the axis along which the fills take place; with `axis=1` values are filled across the columns of each row. If no previous value is available, the NA value remains:**


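```python
df.ffill(axis=1)  # equivalent to the older df.fillna(method='ffill', axis=1)
```

|   | 0   | 1   | 2   |
|---|-----|-----|-----|
| 0 | 1.0 | 1.0 | 2.0 |
| 1 | 2.0 | 3.0 | 5.0 |
| 2 | NaN | 4.0 | 6.0 |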
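For comparison, the default `axis=0` fills down each column instead of across each row. A NaN in the first row then has nothing above it and stays missing. Chaining a `bfill()` after the `ffill()` is one way to close such leading gaps. This is a sketch of that combination, not the only reasonable policy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])

down = df.ffill()           # axis=0: row 2, column 0 takes 2.0 from row 1
both = df.ffill().bfill()   # row 0, column 1 is then filled from row 1 (3.0)
print(down)
print(both)
```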
