# DateDiffLeapYearTransformer
This notebook demonstrates the DateDiffLeapYearTransformer class. The transformer calculates the difference in whole years (for example, an age) between two datetime columns in a pandas DataFrame. It does not rely on np.timedelta64 arithmetic, which can miscalculate the result when the interval contains leap years.
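
To illustrate the leap-year problem mentioned above, here is a minimal standard-library sketch. The helper names `naive_age` and `calendar_age` are hypothetical and are not part of tubular, and the calendar comparison shown is only one way to get a leap-year-safe result, not necessarily how the transformer is implemented.

```python
import datetime


def naive_age(lower, upper):
    # Hypothetical helper: divide the day count by an average year length.
    return int((upper - lower).days / 365.25)


def calendar_age(lower, upper):
    # Hypothetical helper: compare calendar fields instead of counting days.
    age = upper.year - lower.year
    # Subtract one year if the anniversary has not yet been reached.
    if (upper.month, upper.day) < (lower.month, lower.day):
        age -= 1
    return age


born = datetime.date(2000, 3, 1)
on_birthday = datetime.date(2018, 3, 1)  # exactly the 18th birthday

print(naive_age(born, on_birthday))     # 17 (6574 days / 365.25 is just under 18)
print(calendar_age(born, on_birthday))  # 18
```

The day-count version is off by one on the birthday itself because the interval contains only four leap days, so it is slightly shorter than 18 average years.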

In [1]:
import datetime
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.dates import DateDiffLeapYearTransformer

In [3]:
tubular.__version__

'0.3.0'

## Create and Load datetime data

In [4]:
def create_datetime_data():
    # Dates are random and no seed is set, so the values differ on every run.
    # Days are drawn from 1-28 so that every (year, month, day) is a valid date.
    days_1 = np.random.randint(1, 29, 10)
    months_1 = np.random.randint(1, 13, 10)
    years_1 = np.random.randint(1970, 2000, 10)

    days_2 = np.random.randint(1, 29, 10)
    months_2 = np.random.randint(1, 13, 10)
    years_2 = np.random.randint(2010, 2020, 10)

    # Build datetime.date objects; the second set always falls after the first.
    date_1 = [datetime.date(x, y, z) for x, y, z in zip(years_1, months_1, days_1)]
    date_2 = [datetime.date(x, y, z) for x, y, z in zip(years_2, months_2, days_2)]

    data = pd.DataFrame({"date_of_birth": date_1, "sale_date": date_2})

    return data

In [5]:
datetime_data = create_datetime_data()

In [6]:
datetime_data

   date_of_birth   sale_date
0     1971-07-04  2014-10-27
1     1970-10-28  2010-03-10
2     1972-11-07  2014-08-09
3     1989-08-22  2018-10-02
4     1991-03-16  2010-05-28
5     1984-12-21  2017-11-16
6     1976-06-22  2018-03-13
7     1993-04-13  2016-12-03
8     1972-04-10  2011-06-08
9     1990-12-26  2012-11-26


In [7]:
datetime_data.dtypes

date_of_birth    object
sale_date        object
dtype: object

## Usage
The transformer takes four arguments:
- column_lower: the datetime column that is subtracted (the earlier date, here date_of_birth).
- column_upper: the datetime column that column_lower is subtracted from (the later date, here sale_date).
- new_column_name: the name of the new column that holds the result (here age).
- drop_cols: boolean that determines whether column_lower and column_upper are dropped after the calculation.
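
To make the roles of these arguments concrete, the following plain-pandas sketch mimics the behaviour described above. The function `add_year_gap` is hypothetical and is not tubular's implementation. It only assumes that both columns hold `datetime.date` objects, as in this notebook, and it reuses two rows from the data above.

```python
import datetime

import pandas as pd


def add_year_gap(df, column_lower, column_upper, new_column_name, drop_cols):
    # Hypothetical stand-in for the transformer, for illustration only.
    out = df.copy()
    out[new_column_name] = [
        # Whole years between the dates: subtract one (True counts as 1)
        # if the anniversary has not been reached in the upper year.
        upper.year - lower.year
        - ((upper.month, upper.day) < (lower.month, lower.day))
        for lower, upper in zip(out[column_lower], out[column_upper])
    ]
    if drop_cols:
        # Keep only the derived column (plus any unrelated columns).
        out = out.drop(columns=[column_lower, column_upper])
    return out


example = pd.DataFrame(
    {
        "date_of_birth": [datetime.date(1971, 7, 4), datetime.date(1984, 12, 21)],
        "sale_date": [datetime.date(2014, 10, 27), datetime.date(2017, 11, 16)],
    }
)

kept = add_year_gap(example, "date_of_birth", "sale_date", "age", drop_cols=False)
dropped = add_year_gap(example, "date_of_birth", "sale_date", "age", drop_cols=True)

print(kept["age"].tolist())   # [43, 32]
print(list(dropped.columns))  # ['age']
```

With `drop_cols=False` the two input columns are kept alongside the new one. With `drop_cols=True` only the new column remains.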


### Keeping old columns

In [8]:
date_diff_leap_year_transformer = DateDiffLeapYearTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="age",
    drop_cols=False,
)

In [9]:
transformed_data = date_diff_leap_year_transformer.transform(datetime_data)

In [10]:
transformed_data

   date_of_birth   sale_date  age
0     1971-07-04  2014-10-27   43
1     1970-10-28  2010-03-10   39
2     1972-11-07  2014-08-09   41
3     1989-08-22  2018-10-02   29
4     1991-03-16  2010-05-28   19
5     1984-12-21  2017-11-16   32
6     1976-06-22  2018-03-13   41
7     1993-04-13  2016-12-03   23
8     1972-04-10  2011-06-08   39
9     1990-12-26  2012-11-26   21


### Dropping old columns

In [11]:
date_diff_leap_year_transformer = DateDiffLeapYearTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="age",
    drop_cols=True,
)

In [12]:
transformed_data_2 = date_diff_leap_year_transformer.transform(datetime_data)

In [13]:
transformed_data_2

   age
0   43
1   39
2   41
3   29
4   19
5   32
6   41
7   23
8   39
9   21
