# DateDiffLeapYearTransformer
This notebook demonstrates the DateDiffLeapYearTransformer class. The transformer calculates the age gap, in whole years, between two datetime columns in a pandas DataFrame. It does not use np.timedelta64, which avoids miscalculations caused by leap years.
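
To illustrate the leap-year problem, the sketch below compares two day-count approximations with a calendar-based comparison. The helper `whole_years` is a hypothetical function written for this illustration; it is not taken from tubular and may differ from how the transformer is implemented internally.

```python
import datetime


def whole_years(lower, upper):
    """Completed calendar years between two dates (hypothetical helper)."""
    years = upper.year - lower.year
    # Subtract one if the anniversary has not yet been reached in the upper year.
    if (upper.month, upper.day) < (lower.month, lower.day):
        years -= 1
    return years


born = datetime.date(2000, 3, 1)
on_birthday = datetime.date(2018, 3, 1)
day_before = datetime.date(2018, 2, 28)

# Day-count approximations: divide elapsed days by an assumed year length.
by_365_25 = int((on_birthday - born).days / 365.25)
by_365 = int((day_before - born).days / 365)

print(by_365_25, whole_years(born, on_birthday))  # 17 18
print(by_365, whole_years(born, day_before))      # 18 17
```

Dividing by 365.25 reports 17 on the 18th birthday, and dividing by 365 reports 18 the day before it. Comparing the month and day directly gives the expected age in both cases.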

In [1]:
import datetime
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.dates import DateDiffLeapYearTransformer

In [3]:
tubular.__version__

'0.2.8'

## Create and Load datetime data

In [4]:
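def create_datetime_data():
    """Create a DataFrame with 10 random date_of_birth and sale_date values."""
    # Days are limited to 1-28 so every day is valid in every month.
    # np.random.randint excludes the upper bound.
    days_1 = np.random.randint(1, 29, 10)
    months_1 = np.random.randint(1, 13, 10)
    years_1 = np.random.randint(1970, 2000, 10)

    days_2 = np.random.randint(1, 29, 10)
    months_2 = np.random.randint(1, 13, 10)
    years_2 = np.random.randint(2010, 2020, 10)

    # Build datetime.date objects; the later years guarantee sale_date > date_of_birth.
    date_1 = [datetime.date(x, y, z) for x, y, z in zip(years_1, months_1, days_1)]
    date_2 = [datetime.date(x, y, z) for x, y, z in zip(years_2, months_2, days_2)]

    data = pd.DataFrame({"date_of_birth": date_1, "sale_date": date_2})

    return data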

In [5]:
datetime_data = create_datetime_data()

In [6]:
datetime_data

| | date_of_birth | sale_date |
|---|---|---|
| 0 | 1992-03-17 | 2015-07-22 |
| 1 | 1971-01-08 | 2018-02-20 |
| 2 | 1976-08-14 | 2018-11-03 |
| 3 | 1998-01-24 | 2012-04-27 |
| 4 | 1999-12-25 | 2014-08-21 |
| 5 | 1980-08-08 | 2019-04-03 |
| 6 | 1974-10-28 | 2011-12-21 |
| 7 | 1987-02-14 | 2010-06-06 |
| 8 | 1974-09-26 | 2019-03-03 |
| 9 | 1986-03-26 | 2010-12-14 |


In [7]:
datetime_data.dtypes

date_of_birth    object
sale_date        object
dtype: object

## Usage
The transformer requires four arguments:
- column_lower: the datetime column that is subtracted (the earlier date).
- column_upper: the datetime column that is subtracted from (the later date).
- new_column_name: the name of the new age column.
- drop_cols: boolean that determines whether column_lower and column_upper are dropped after the calculation.
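
As a rough illustration of what these four arguments control, the sketch below mimics the described behaviour with plain pandas. The function `add_year_diff` is a hypothetical stand-in that reuses the argument names; it is not part of tubular, and the real transformer may be implemented differently.

```python
import datetime

import pandas as pd


def add_year_diff(df, column_lower, column_upper, new_column_name, drop_cols):
    """Hypothetical stand-in: add a whole-year difference column to a copy of df."""

    def whole_years(lower, upper):
        years = upper.year - lower.year
        # Subtract one if the anniversary has not yet been reached.
        if (upper.month, upper.day) < (lower.month, lower.day):
            years -= 1
        return years

    out = df.copy()
    out[new_column_name] = [
        whole_years(lower, upper)
        for lower, upper in zip(out[column_lower], out[column_upper])
    ]
    if drop_cols:
        # Remove the two input columns, keeping only the new column and any others.
        out = out.drop(columns=[column_lower, column_upper])
    return out


example = pd.DataFrame(
    {
        "date_of_birth": [datetime.date(1992, 3, 17), datetime.date(1999, 12, 25)],
        "sale_date": [datetime.date(2015, 7, 22), datetime.date(2014, 8, 21)],
    }
)

kept = add_year_diff(example, "date_of_birth", "sale_date", "age", drop_cols=False)
dropped = add_year_diff(example, "date_of_birth", "sale_date", "age", drop_cols=True)

print(kept["age"].tolist())   # [23, 14]
print(list(dropped.columns))  # ['age']
```

The two example rows reuse dates from the table above, so the ages can be checked against the transformer's output in the following cells.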


### Keeping old columns

In [8]:
date_diff_leap_year_transformer = DateDiffLeapYearTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="age",
    drop_cols=False,
)

In [9]:
transformed_data = date_diff_leap_year_transformer.transform(datetime_data)

In [10]:
transformed_data

| | date_of_birth | sale_date | age |
|---|---|---|---|
| 0 | 1992-03-17 | 2015-07-22 | 23 |
| 1 | 1971-01-08 | 2018-02-20 | 47 |
| 2 | 1976-08-14 | 2018-11-03 | 42 |
| 3 | 1998-01-24 | 2012-04-27 | 14 |
| 4 | 1999-12-25 | 2014-08-21 | 14 |
| 5 | 1980-08-08 | 2019-04-03 | 38 |
| 6 | 1974-10-28 | 2011-12-21 | 37 |
| 7 | 1987-02-14 | 2010-06-06 | 23 |
| 8 | 1974-09-26 | 2019-03-03 | 44 |
| 9 | 1986-03-26 | 2010-12-14 | 24 |


### Dropping old columns

In [11]:
date_diff_leap_year_transformer = DateDiffLeapYearTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="age",
    drop_cols=True,
)

In [12]:
transformed_data_2 = date_diff_leap_year_transformer.transform(datetime_data)

In [13]:
transformed_data_2

| | age |
|---|---|
| 0 | 23 |
| 1 | 47 |
| 2 | 42 |
| 3 | 14 |
| 4 | 14 |
| 5 | 38 |
| 6 | 37 |
| 7 | 23 |
| 8 | 44 |
| 9 | 24 |
