# NullIndicator

This notebook demonstrates the NullIndicator class. This transformer adds columns to the input dataframe that indicate whether the values in selected columns are null.

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.imputers import NullIndicator

In [3]:
tubular.__version__

'0.2.8'

## Load Boston house price dataset from sklearn
Note that the prepare_boston_df function called below modifies the original Boston dataset to include null values and pandas categorical dtypes.

In [4]:
boston_df = tubular.testing.test_data.prepare_boston_df()
boston_df.shape

(506, 17)

In [5]:
boston_df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target,ZN_cat,CHAS_cat,RAD_cat
0,0.00632,18.0,2.31,0.0,0.538,6.575,,4.09,,296.0,15.3,396.9,4.98,24.0,18.0,0.0,
1,0.02731,,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6,,0.0,2.0
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,,17.8,392.83,4.03,34.7,0.0,0.0,2.0
3,,,2.18,0.0,0.458,,45.8,6.0622,3.0,222.0,18.7,,,33.4,,0.0,3.0
4,0.06905,0.0,2.18,0.0,0.458,,,6.0622,3.0,222.0,18.7,396.9,5.33,36.2,0.0,0.0,3.0


In [6]:
boston_df.isnull().sum()

CRIM        55
ZN          62
INDUS        0
CHAS         0
NOX         44
RM          56
AGE         42
DIS         51
RAD         62
TAX         52
PTRATIO     56
B           50
LSTAT       49
target       0
ZN_cat      62
CHAS_cat     0
RAD_cat     62
dtype: int64
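
The counts above show that several columns (INDUS, CHAS, target, CHAS_cat) contain no nulls, so an indicator for them would be all zeros. As a minimal sketch in plain pandas, the columns that do contain nulls could be picked out as below. The toy dataframe and the names `toy_df` and `columns_with_nulls` are illustrative assumptions, not part of tubular or the Boston data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for boston_df (hypothetical values, not the real data)
toy_df = pd.DataFrame(
    {
        "CRIM": [0.00632, 0.02731, np.nan],
        "ZN": [18.0, np.nan, 0.0],
        "INDUS": [2.31, 7.07, 7.07],
    }
)

# Keep only the names of columns that have at least one null value
columns_with_nulls = toy_df.columns[toy_df.isnull().any()].tolist()
print(columns_with_nulls)
```

A list built this way could then be passed as the columns argument when initialising NullIndicator.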

## Simple Usage

### Initialising NullIndicator

In [7]:
imp = NullIndicator(
    columns=['CRIM', 'ZN', 'INDUS'],
    copy=True,
    verbose=True
)

BaseTransformer.__init__() called


### NullIndicator Transform

The transform method takes a pandas dataframe as input and adds an indicator column for each column specified at initialisation, provided that column is present in the dataframe. Each indicator column is named after its source column with a _nulls suffix (for example, CRIM_nulls) and holds 1 where the source value is null and 0 otherwise.
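
To make the behaviour concrete, here is a rough plain-pandas equivalent of what the transform appears to do. This is a sketch, not tubular's actual implementation: the helper `add_null_indicators` and the toy data are hypothetical, and the 1/0 encoding is inferred from the output shown in this notebook.

```python
import numpy as np
import pandas as pd


def add_null_indicators(df, columns):
    """Illustrative helper (not part of tubular): add <column>_nulls flags."""
    df = df.copy()  # leave the caller's dataframe unchanged
    for col in columns:
        # 1 where the value is null, 0 otherwise
        df[f"{col}_nulls"] = df[col].isnull().astype(int)
    return df


# Toy data (hypothetical values)
toy_df = pd.DataFrame({"CRIM": [0.00632, np.nan], "ZN": [np.nan, np.nan]})
toy_out = add_null_indicators(toy_df, ["CRIM", "ZN"])
print(toy_out[["CRIM_nulls", "ZN_nulls"]])
```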

In [8]:
boston_df_2 = imp.transform(boston_df)

BaseTransformer.transform() called


In [9]:
boston_df_2.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target,ZN_cat,CHAS_cat,RAD_cat,CRIM_nulls,ZN_nulls,INDUS_nulls
0,0.00632,18.0,2.31,0.0,0.538,6.575,,4.09,,296.0,15.3,396.9,4.98,24.0,18.0,0.0,,0,0,0
1,0.02731,,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6,,0.0,2.0,0,1,0
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,,17.8,392.83,4.03,34.7,0.0,0.0,2.0,0,0,0
3,,,2.18,0.0,0.458,,45.8,6.0622,3.0,222.0,18.7,,,33.4,,0.0,3.0,1,1,0
4,0.06905,0.0,2.18,0.0,0.458,,,6.0622,3.0,222.0,18.7,396.9,5.33,36.2,0.0,0.0,3.0,0,0,0
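
One way to sanity-check output like this is to confirm that each indicator column agrees with the null mask of its source column, and that the indicator totals match the null counts. The sketch below does this on a small hand-built dataframe laid out like the transformed output. The values and the names `toy_out`, `indicator_totals` and `null_counts` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame laid out like the transformed output (hypothetical values)
toy_out = pd.DataFrame(
    {
        "CRIM": [0.00632, np.nan, 0.06905],
        "ZN": [18.0, np.nan, 0.0],
        "CRIM_nulls": [0, 1, 0],
        "ZN_nulls": [0, 1, 0],
    }
)

for col in ["CRIM", "ZN"]:
    # The indicator should equal the null mask of its source column
    expected = toy_out[col].isnull().astype(int)
    assert (toy_out[f"{col}_nulls"] == expected).all(), col

# Indicator totals should match the per-column null counts
indicator_totals = toy_out[["CRIM_nulls", "ZN_nulls"]].sum().tolist()
null_counts = toy_out[["CRIM", "ZN"]].isnull().sum().tolist()
assert indicator_totals == null_counts
```

With the real data, the same check could be run by substituting boston_df_2 for the toy frame and using the column list passed to NullIndicator.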
