# LogTransformer
This notebook shows the functionality of the LogTransformer class. This transformer applies the log transform to numeric columns.

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

In [2]:
import tubular
from tubular.numeric import LogTransformer

In [3]:
tubular.__version__

'0.3.0'

## Load California housing dataset from sklearn

In [4]:
cali = fetch_california_housing()
cali_df = pd.DataFrame(cali["data"], columns=cali["feature_names"])
cali_df["HouseAge"] = cali_df["HouseAge"].sample(frac=0.995, random_state=123)

In [5]:
cali_df.shape

(20640, 8)

In [6]:
cali_df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25


In [7]:
cali_df.isnull().sum()

MedInc          0
HouseAge      103
AveRooms        0
AveBedrms       0
Population      0
AveOccup        0
Latitude        0
Longitude       0
dtype: int64
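
The nulls in `HouseAge` come from the `sample(frac=0.995, ...)` assignment in cell 4: the sampled Series keeps only a subset of the index, and pandas aligns on index when assigning it back, so the rows that were not sampled become `NaN`. A minimal sketch of the same effect on a toy frame (the `toy` data is made up for illustration):

```python
import pandas as pd

# Toy frame with 10 rows; purely illustrative data.
toy = pd.DataFrame({"x": [float(i) for i in range(10)]})

# Keep 80% of the rows (8 of 10), then assign back.
# Assignment aligns on the index, so the 2 unsampled rows become NaN.
toy["x"] = toy["x"].sample(frac=0.8, random_state=0)

n_missing = int(toy["x"].isnull().sum())
print(n_missing)  # 2
```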

## Simple usage

### Initialising LogTransformer

All arguments are optional in this transformer. The user can specify:
- `columns` the columns to apply the log transform to
- `add_1` whether to add 1 to the column before applying the log transform, useful if you have 0s in the column
- `drop` to drop the original columns
- `suffix` to specify the suffix to add onto the original column names for the logged versions of these columns
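
To make the four arguments concrete, here is a rough plain pandas/numpy sketch of the behaviour they describe. The helper `log_columns` is hypothetical and is not tubular's implementation; the `<column>_<suffix>` naming is inferred from the output column names shown later in this notebook.

```python
import numpy as np
import pandas as pd


def log_columns(df, columns, add_1=False, drop=True, suffix="log"):
    """Hypothetical stand-in for illustration only, not tubular's code."""
    out = df.copy()
    for col in columns:
        # Optionally shift by 1 so that zeros map to log(1) = 0.
        values = out[col] + 1 if add_1 else out[col]
        out[f"{col}_{suffix}"] = np.log(values)
    if drop:
        # Remove the original (unlogged) columns.
        out = out.drop(columns=columns)
    return out


demo = pd.DataFrame({"a": [1.0, np.e, np.e**2], "b": [10.0, 20.0, 30.0]})
result = log_columns(demo, columns=["a"], add_1=False, drop=True, suffix="log")
print(result.columns.tolist())  # ['b', 'a_log']
```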

In [8]:
log_1 = LogTransformer(
    columns=["AveRooms", "AveBedrms", "Population"],
    add_1=False,
    drop=True,
    suffix="log",
)

### LogTransformer fit
There is no fit method for the LogTransformer, as the transformer only applies the log function and has nothing to learn from the data.

### LogTransformer transform
Multiple columns were specified when creating `log_1`, so these columns will be logged and then dropped. <br>
Notice that nulls are preserved when logging. The transformer uses `np.log`.
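
The null-preserving behaviour follows directly from `np.log`: the log of `NaN` is `NaN`. A small standalone check, using a made-up Series rather than the housing data:

```python
import numpy as np
import pandas as pd

# Illustrative Series with one missing value.
s = pd.Series([1.0, np.nan, 100.0])

# np.log works element-wise on a Series and leaves NaN as NaN.
logged = np.log(s)
print(logged.isna().tolist())  # [False, True, False]
```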

In [9]:
cali_df_2 = log_1.transform(cali_df)

In [10]:
cali_df_2[["AveRooms_log", "AveBedrms_log", "Population_log"]].head()

Unnamed: 0,AveRooms_log,AveBedrms_log,Population_log
0,1.94364,0.02353,5.774552
1,1.830682,-0.028522,7.783641
2,2.114825,0.070874,6.206576
3,1.760845,0.070514,6.324359
4,1.837665,0.077962,6.336826


In [11]:
[x in cali_df_2.columns for x in ["AveRooms", "AveBedrms", "Population"]]

[False, False, False]

## Adding 1 to columns before transform
To demonstrate this feature we impute nulls with 0 in the `HouseAge` column. <br>
By setting the `add_1` argument to `True`, a constant value of 1 will be added to the column before applying the log transform. <br>
This is useful if you have 0 values in the column: the log of 0 is undefined, so logging 0s triggers a `RuntimeWarning` and produces `-inf` values in the output.
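
The difference can be seen with numpy alone. The sketch below uses a small made-up array, not the transformer itself, and captures the warning so it can be inspected:

```python
import warnings

import numpy as np

# Illustrative values, including a zero.
values = np.array([0.0, 1.0, 41.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plain = np.log(values)  # log(0) gives -inf and emits a RuntimeWarning

shifted = np.log(values + 1)  # log(0 + 1) gives 0.0, no warning

print(plain[0], shifted[0])  # -inf 0.0
print([w.category.__name__ for w in caught])
```

Note that adding 1 changes every value, not just the zeros, so the result is log(x + 1) rather than log(x).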

In [12]:
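# Assign the result back rather than using inplace=True on a column selection,
# which may not update cali_df in recent pandas versions.
cali_df["HouseAge"] = cali_df["HouseAge"].fillna(0)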

In [13]:
log_2 = LogTransformer(
    columns=["HouseAge"], add_1=True, drop=False, suffix="log_plus_1"
)

In [14]:
cali_df_3 = log_2.transform(cali_df)

In [15]:
cali_df_3[["HouseAge", "HouseAge_log_plus_1"]].head()

Unnamed: 0,HouseAge,HouseAge_log_plus_1
0,41.0,3.73767
1,21.0,3.091042
2,52.0,3.970292
3,52.0,3.970292
4,52.0,3.970292
