# NullIndicator

This notebook demonstrates the NullIndicator class. This transformer adds indicator columns to the input dataframe; each indicator column flags whether the value in the corresponding selected column is null.

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

In [2]:
import tubular
from tubular.imputers import NullIndicator

In [3]:
tubular.__version__

'0.3.0'

## Load California housing dataset from sklearn

In [4]:
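# Load the dataset into a dataframe.
cali = fetch_california_housing()
cali_df = pd.DataFrame(cali["data"], columns=cali["feature_names"])
# Inject nulls: sample() keeps a fraction of the rows, and assigning the result
# back aligns on the index, so the rows left out of the sample become NaN.
cali_df["AveOccup"] = cali_df["AveOccup"].sample(frac=0.99, random_state=1)
cali_df["HouseAge"] = cali_df["HouseAge"].sample(frac=0.95, random_state=2)
cali_df["Population"] = cali_df["Population"].sample(frac=0.995, random_state=3)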

In [5]:
cali_df.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [6]:
cali_df.isnull().sum()

MedInc           0
HouseAge      1032
AveRooms         0
AveBedrms        0
Population     103
AveOccup       206
Latitude         0
Longitude        0
dtype: int64
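
The null counts above come from how the sampled series is assigned back to the dataframe. The following is a minimal sketch of that mechanism on a small, hypothetical dataframe (`demo` and its column names are illustrative and not part of the original notebook): `sample` keeps only a fraction of the rows, and the assignment aligns on the index, so rows missing from the sample are filled with NaN.

```python
import pandas as pd

# Hypothetical four-row frame standing in for cali_df.
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# Keep a random half of the rows of column "b" (2 of the 4 rows).
sampled = demo["b"].sample(frac=0.5, random_state=0)

# Assignment aligns on the index: rows absent from `sampled` become NaN.
demo["b"] = sampled

n_nulls = int(demo["b"].isnull().sum())
print(demo)
print("nulls in b:", n_nulls)
```

The same reasoning accounts for the counts in the notebook: for example, sampling 95% of the HouseAge values leaves roughly 5% of the rows null.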

## Simple Usage

### Initialising NullIndicator

In [7]:
imp = NullIndicator(
    columns=["HouseAge", "AveOccup", "Population"], copy=True, verbose=True
)

BaseTransformer.__init__() called


### NullIndicator Transform

The transform method takes a pandas dataframe as input and adds an indicator column for each column specified at initialisation, provided those columns are present in the dataframe. Each indicator column is named by appending _nulls to the source column name (for example, HouseAge_nulls).

In [8]:
cali_df_2 = imp.transform(cali_df)

BaseTransformer.transform() called


In [9]:
cali_df_2.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | HouseAge_nulls | AveOccup_nulls | Population_nulls |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 0 | 0 | 0 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 0 | 0 | 0 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 0 | 0 | 0 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 0 | 0 | 0 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 0 | 0 | 0 |
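
The first five rows contain no nulls, so every indicator shown is 0. To make the behaviour easier to see, here is a rough pandas-only sketch of the same idea on a tiny, hypothetical dataframe. The helper `add_null_indicators` is an illustrative name and is not tubular's implementation; it only assumes the naming convention and 0/1 values visible in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one null in each column.
demo = pd.DataFrame(
    {"HouseAge": [41.0, np.nan, 52.0], "AveOccup": [2.5, 2.1, np.nan]}
)


def add_null_indicators(df, columns):
    """Return a copy of df with a 0/1 '<column>_nulls' flag per listed column."""
    out = df.copy()  # leave the input untouched, similar in spirit to copy=True
    for col in columns:
        out[f"{col}_nulls"] = out[col].isnull().astype(int)
    return out


result = add_null_indicators(demo, ["HouseAge", "AveOccup"])
print(result)

# Summing each indicator column gives the number of nulls in its source column.
print(result[["HouseAge_nulls", "AveOccup_nulls"]].sum())
```

Applied to the notebook's data, the same check (summing the indicator columns of cali_df_2) should agree with the counts reported by cali_df.isnull().sum().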
