# SetValueTransformer
This notebook shows the functionality of the `SetValueTransformer` class. This transformer sets every row of the specified columns to a single predefined value.

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.misc import SetValueTransformer

In [3]:
tubular.__version__

'0.2.8'

## Load Boston house price dataset from sklearn
Note that the load_boston script modifies the original Boston dataset to include null values and pandas categorical dtypes.

In [4]:
boston_df = tubular.testing.test_data.prepare_boston_df()
boston_df.shape

(506, 17)

In [5]:
boston_df.head()

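| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target | ZN_cat | CHAS_cat | RAD_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | NaN | 4.09 | NaN | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 | 18.0 | 0.0 | NaN |
| 1 | 0.02731 | NaN | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 | NaN | 0.0 | 2.0 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | NaN | 17.8 | 392.83 | 4.03 | 34.7 | 0.0 | 0.0 | 2.0 |
| 3 | NaN | NaN | 2.18 | 0.0 | 0.458 | NaN | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | NaN | NaN | 33.4 | NaN | 0.0 | 3.0 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | NaN | NaN | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 | 0.0 | 0.0 | 3.0 |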


In [6]:
boston_df.dtypes

CRIM         float64
ZN            object
INDUS        float64
CHAS          object
NOX          float64
RM           float64
AGE          float64
DIS          float64
RAD           object
TAX          float64
PTRATIO      float64
B            float64
LSTAT        float64
target       float64
ZN_cat      category
CHAS_cat    category
RAD_cat     category
dtype: object

## Simple usage

### Initialising SetValueTransformer

The user must specify the following:
- `columns` giving the columns to set to a specific value
- `value` giving the value to set `columns` to

In [7]:
set_value_1 = SetValueTransformer(
    columns = ['CRIM', 'ZN'], 
    value = 0
)

### SetValueTransformer fit
`SetValueTransformer` has no `fit` method, because nothing is learnt from the input data `X`.
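
To illustrate why no fitting is needed, here is a minimal sketch of a stateless set-value transformer written with plain pandas. `SimpleSetValue` is a hypothetical class written only for this illustration. It is not tubular's implementation, and copying the input before modifying it is an assumption of the sketch.

```python
import pandas as pd


class SimpleSetValue:
    """Hypothetical stand-in for illustration; not tubular's implementation."""

    def __init__(self, columns, value):
        self.columns = list(columns)
        self.value = value

    def transform(self, X):
        # Work on a copy so the caller's frame is untouched (an assumption of this sketch).
        X = X.copy()
        for col in self.columns:
            # Overwrite every row of the column with the same constant.
            X[col] = self.value
        return X


# Toy data with nulls and mixed dtypes, standing in for the real dataset.
toy_df = pd.DataFrame({"a": [1.5, None, 3.0], "b": ["x", "y", None], "c": [1, 2, 3]})
toy_out = SimpleSetValue(columns=["a", "b"], value=0).transform(toy_df)
print(toy_out)
```

Everything the sketch needs is supplied at initialisation, so `transform` can be called directly without any prior pass over the data.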

### SetValueTransformer transform

In [8]:
boston_df_2 = set_value_1.transform(boston_df)

In [9]:
boston_df_2['CRIM'].value_counts(dropna = False)

0    506
Name: CRIM, dtype: int64

In [10]:
boston_df_2['ZN'].value_counts(dropna = False)

0    506
Name: ZN, dtype: int64
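
The per-column `value_counts` checks above can also be folded into a single boolean check. The sketch below uses a small toy frame rather than `boston_df`, and `all_rows_equal` is a hypothetical helper name introduced here, not part of tubular.

```python
import pandas as pd


def all_rows_equal(df, columns, value):
    """Return True if every row of each listed column equals `value`.

    Null entries compare as not equal, so a column containing nulls fails the check.
    """
    return all(df[col].eq(value).all() for col in columns)


# Toy frame: two columns already set to 0, one untouched column containing a null.
toy_df = pd.DataFrame({"CRIM": [0, 0, 0], "ZN": [0, 0, 0], "TAX": [296.0, None, 222.0]})
crim_zn_constant = all_rows_equal(toy_df, ["CRIM", "ZN"], 0)
tax_constant = all_rows_equal(toy_df, ["TAX"], 0)
print(crim_zn_constant, tax_constant)
```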