
# DateDiffTransformer
This notebook demonstrates the `DateDifferenceTransformer` class, which calculates the difference between two datetime columns in a specified unit of time.

In [1]:
import datetime
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.dates import DateDifferenceTransformer

In [3]:
tubular.__version__

'0.3.0'

## Create and Load datetime data

In [4]:
def create_datetime_data():
    """Create a DataFrame with two columns of random datetimes."""
    # np.random.randint excludes the upper bound, so 60 and 24 are needed to
    # cover every second, minute and hour. Days stop at 28 so that each date
    # is valid in any month.
    seconds_1 = np.random.randint(0, 60, 10)
    mins_1 = np.random.randint(0, 60, 10)
    hours_1 = np.random.randint(0, 24, 10)
    days_1 = np.random.randint(1, 29, 10)
    months_1 = np.random.randint(1, 13, 10)
    years_1 = np.random.randint(1970, 2000, 10)

    seconds_2 = np.random.randint(0, 60, 10)
    mins_2 = np.random.randint(0, 60, 10)
    hours_2 = np.random.randint(0, 24, 10)
    days_2 = np.random.randint(1, 29, 10)
    months_2 = np.random.randint(1, 13, 10)
    years_2 = np.random.randint(2010, 2020, 10)

    # Combine the components row by row into datetime objects.
    date_1 = [datetime.datetime(a, b, c, x, y, z) for a, b, c, x, y, z in zip(years_1, months_1, days_1, hours_1, mins_1, seconds_1)]
    date_2 = [datetime.datetime(a, b, c, x, y, z) for a, b, c, x, y, z in zip(years_2, months_2, days_2, hours_2, mins_2, seconds_2)]

    data = pd.DataFrame({"date_of_birth": date_1, "sale_date": date_2})

    return data

In [5]:
datetime_data = create_datetime_data()

In [6]:
datetime_data

,date_of_birth,sale_date
0,1986-07-02 20:22:01,2019-01-19 07:28:15
1,1992-05-21 11:08:53,2011-04-20 01:21:54
2,1992-09-28 10:54:55,2010-04-16 11:27:18
3,1992-01-24 02:13:39,2014-09-08 20:08:26
4,1994-06-14 20:39:50,2015-11-20 19:39:07
5,1996-07-12 03:24:36,2018-05-20 20:09:18
6,1970-10-02 04:11:19,2014-10-11 16:16:20
7,1980-08-08 00:10:10,2014-02-22 21:11:43
8,1970-06-18 09:10:12,2016-09-23 22:48:48
9,1985-09-03 12:13:42,2016-05-27 04:12:46


In [7]:
datetime_data.dtypes

date_of_birth    datetime64[ns]
sale_date        datetime64[ns]
dtype: object

## Usage
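The transformer takes four arguments:
- `column_lower`: the datetime column to subtract (the earlier date, here `date_of_birth`).
- `column_upper`: the datetime column to subtract from (the later date, here `sale_date`).
- `new_column_name`: the name of the new column that holds the difference as a float.
- `units`: the time unit of the result: 'Y' (years), 'M' (months), 'D' (days), 'h' (hours), 'm' (minutes) or 's' (seconds).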
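
The outputs in this notebook are consistent with subtracting the two columns and dividing the elapsed time by the length of one unit. The sketch below reproduces that arithmetic in plain pandas for the first row of the data above. `SECONDS_PER_UNIT` and `date_difference` are hypothetical names used for illustration, not part of tubular, and the year and month lengths (365.2425 days and one twelfth of that) are assumptions inferred from the values shown in this notebook rather than taken from the library source.

```python
import pandas as pd

# Assumed length of each unit in seconds. 'Y' is an average Gregorian year
# (365.2425 days) and 'M' is one twelfth of it; both are assumptions.
SECONDS_PER_UNIT = {
    "Y": 31_556_952,
    "M": 2_629_746,
    "D": 86_400,
    "h": 3_600,
    "m": 60,
    "s": 1,
}


def date_difference(df, column_lower, column_upper, units):
    """Hypothetical helper: elapsed time between two columns as a float."""
    delta = df[column_upper] - df[column_lower]
    return delta.dt.total_seconds() / SECONDS_PER_UNIT[units]


# First row of the example data, typed in by hand.
example = pd.DataFrame(
    {
        "date_of_birth": pd.to_datetime(["1986-07-02 20:22:01"]),
        "sale_date": pd.to_datetime(["2019-01-19 07:28:15"]),
    }
)

for unit in SECONDS_PER_UNIT:
    value = date_difference(example, "date_of_birth", "sale_date", unit).iloc[0]
    print(unit, round(value, 6))
```

Under these assumptions the 'D' result for this row is about 11888.4627 days and the 'Y' result about 32.5495 years, in line with the transformer outputs shown in the sections that follow.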

### Years

In [8]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="years",
    units='Y',
)

In [9]:
transformed_data_years = date_difference_transformer.transform(datetime_data)

In [10]:
transformed_data_years

,date_of_birth,sale_date,years
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354


### Months

In [11]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="months",
    units='M',
)

In [12]:
transformed_data_months = date_difference_transformer.transform(transformed_data_years)

In [13]:
transformed_data_months

,date_of_birth,sale_date,years,months
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505,390.594063
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346,226.948147
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308,210.567691
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369,271.504429
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959,257.219502
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884,262.270608
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923,528.323078
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497,402.533968
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447,555.233363
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354,368.752246


### Days

In [14]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="days",
    units='D',
)

In [15]:
transformed_data_days = date_difference_transformer.transform(transformed_data_months)

In [16]:
transformed_data_days

,date_of_birth,sale_date,years,months,days
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505,390.594063,11888.462662
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346,226.948147,6907.592373
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308,210.567691,6409.022488
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369,271.504429,8263.746377
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959,257.219502,7828.957836
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884,262.270608,7982.697708
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923,528.323078,16080.503484
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497,402.533968,12251.876076
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447,555.233363,16899.568472
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354,368.752246,11223.666019


### Hours

In [17]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="hours",
    units='h',
)

In [18]:
transformed_data_hours = date_difference_transformer.transform(transformed_data_days)

In [19]:
transformed_data_hours

,date_of_birth,sale_date,years,months,days,hours
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505,390.594063,11888.462662,285323.103889
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346,226.948147,6907.592373,165782.216944
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308,210.567691,6409.022488,153816.539722
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369,271.504429,8263.746377,198329.913056
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959,257.219502,7828.957836,187894.988056
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884,262.270608,7982.697708,191584.745
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923,528.323078,16080.503484,385932.083611
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497,402.533968,12251.876076,294045.025833
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447,555.233363,16899.568472,405589.643333
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354,368.752246,11223.666019,269367.984444


### Minutes

In [20]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="minutes",
    units='m',
)

In [21]:
transformed_data_minutes = date_difference_transformer.transform(transformed_data_hours)

In [22]:
transformed_data_minutes

,date_of_birth,sale_date,years,months,days,hours,minutes
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505,390.594063,11888.462662,285323.103889,17119390.0
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346,226.948147,6907.592373,165782.216944,9946933.0
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308,210.567691,6409.022488,153816.539722,9228992.0
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369,271.504429,8263.746377,198329.913056,11899790.0
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959,257.219502,7828.957836,187894.988056,11273700.0
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884,262.270608,7982.697708,191584.745,11495080.0
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923,528.323078,16080.503484,385932.083611,23155930.0
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497,402.533968,12251.876076,294045.025833,17642700.0
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447,555.233363,16899.568472,405589.643333,24335380.0
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354,368.752246,11223.666019,269367.984444,16162080.0


### Seconds

In [23]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="seconds",
    units='s',
)

In [24]:
transformed_data_seconds = date_difference_transformer.transform(transformed_data_minutes)

In [25]:
transformed_data_seconds

,date_of_birth,sale_date,years,months,days,hours,minutes,seconds
0,1986-07-02 20:22:01,2019-01-19 07:28:15,32.549505,390.594063,11888.462662,285323.103889,17119390.0,1027163000.0
1,1992-05-21 11:08:53,2011-04-20 01:21:54,18.912346,226.948147,6907.592373,165782.216944,9946933.0,596816000.0
2,1992-09-28 10:54:55,2010-04-16 11:27:18,17.547308,210.567691,6409.022488,153816.539722,9228992.0,553739500.0
3,1992-01-24 02:13:39,2014-09-08 20:08:26,22.625369,271.504429,8263.746377,198329.913056,11899790.0,713987700.0
4,1994-06-14 20:39:50,2015-11-20 19:39:07,21.434959,257.219502,7828.957836,187894.988056,11273700.0,676422000.0
5,1996-07-12 03:24:36,2018-05-20 20:09:18,21.855884,262.270608,7982.697708,191584.745,11495080.0,689705100.0
6,1970-10-02 04:11:19,2014-10-11 16:16:20,44.026923,528.323078,16080.503484,385932.083611,23155930.0,1389356000.0
7,1980-08-08 00:10:10,2014-02-22 21:11:43,33.544497,402.533968,12251.876076,294045.025833,17642700.0,1058562000.0
8,1970-06-18 09:10:12,2016-09-23 22:48:48,46.269447,555.233363,16899.568472,405589.643333,24335380.0,1460123000.0
9,1985-09-03 12:13:42,2016-05-27 04:12:46,30.729354,368.752246,11223.666019,269367.984444,16162080.0,969724700.0
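
Because every new column expresses the same elapsed time in a different unit, the columns should be related by constant factors. The check below is a small sketch using the first row of the final table, typed in by hand. The 365.2425-day year is an assumption inferred from these outputs, and the tolerance allows for rounding in the displayed values.

```python
import math

# First row of the final table, copied from the displayed (rounded) output.
row = {
    "years": 32.549505,
    "months": 390.594063,
    "days": 11888.462662,
    "hours": 285323.103889,
    "minutes": 17119390.0,
    "seconds": 1027163000.0,
}

# Assumed conversion factors from a coarser unit to a finer one.
factors = [
    ("years", "months", 12),
    ("years", "days", 365.2425),  # assumption: average Gregorian year
    ("days", "hours", 24),
    ("hours", "minutes", 60),
    ("minutes", "seconds", 60),
]

# Compare each rescaled value with the column it should reproduce.
checks = {
    (coarse, fine): math.isclose(row[coarse] * factor, row[fine], rel_tol=1e-6)
    for coarse, fine, factor in factors
}
print(checks)
```

If a pair fails this kind of check on your own data, the two columns were probably computed from different source columns or with a different definition of the unit.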
