<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#DateDiffTransformer" data-toc-modified-id="DateDiffTransformer-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>DateDiffTransformer</a></span><ul class="toc-item"><li><span><a href="#Create-and-Load-datetime-data" data-toc-modified-id="Create-and-Load-datetime-data-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Create and Load datetime data</a></span></li><li><span><a href="#Usage" data-toc-modified-id="Usage-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Usage</a></span><ul class="toc-item"><li><span><a href="#Years" data-toc-modified-id="Years-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Years</a></span></li><li><span><a href="#Months" data-toc-modified-id="Months-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Months</a></span></li><li><span><a href="#Days" data-toc-modified-id="Days-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Days</a></span></li><li><span><a href="#Hours" data-toc-modified-id="Hours-1.2.4"><span class="toc-item-num">1.2.4&nbsp;&nbsp;</span>Hours</a></span></li><li><span><a href="#Minutes" data-toc-modified-id="Minutes-1.2.5"><span class="toc-item-num">1.2.5&nbsp;&nbsp;</span>Minutes</a></span></li><li><span><a href="#Seconds" data-toc-modified-id="Seconds-1.2.6"><span class="toc-item-num">1.2.6&nbsp;&nbsp;</span>Seconds</a></span></li></ul></li></ul></li></ul></div>

# DateDiffTransformer
This notebook demonstrates the `DateDifferenceTransformer` class, which calculates the difference between two datetime columns in a specified unit of time.

In [1]:
import datetime
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.dates import DateDifferenceTransformer

In [3]:
tubular.__version__

'0.2.8'

## Create and Load datetime data

In [4]:
def create_datetime_data():
    """Create a 10-row DataFrame with two random datetime columns."""

    # np.random.randint excludes the upper bound, so 60 and 24 are needed
    # to cover the full range of seconds, minutes and hours.
    seconds_1 = np.random.randint(0, 60, 10)
    mins_1 = np.random.randint(0, 60, 10)
    hours_1 = np.random.randint(0, 24, 10)
    # Days stop at 28 so that every month, including February, is valid.
    days_1 = np.random.randint(1, 29, 10)
    months_1 = np.random.randint(1, 13, 10)
    years_1 = np.random.randint(1970, 2000, 10)

    seconds_2 = np.random.randint(0, 60, 10)
    mins_2 = np.random.randint(0, 60, 10)
    hours_2 = np.random.randint(0, 24, 10)
    days_2 = np.random.randint(1, 29, 10)
    months_2 = np.random.randint(1, 13, 10)
    years_2 = np.random.randint(2010, 2020, 10)

    # Combine the components into datetime objects, one per row.
    date_1 = [datetime.datetime(a, b, c, x, y, z) for a, b, c, x, y, z in zip(years_1, months_1, days_1, hours_1, mins_1, seconds_1)]
    date_2 = [datetime.datetime(a, b, c, x, y, z) for a, b, c, x, y, z in zip(years_2, months_2, days_2, hours_2, mins_2, seconds_2)]

    data = pd.DataFrame({"date_of_birth": date_1, "sale_date": date_2})

    return data
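
Because `create_datetime_data` draws from NumPy's global random state, each run produces different dates, so the tables shown later in this notebook will not be reproduced exactly. Below is a minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator. The function name `create_seeded_datetime_data` and its `seed` and `n_rows` parameters are hypothetical and not part of the original notebook.

```python
import datetime

import numpy as np
import pandas as pd


def create_seeded_datetime_data(seed=0, n_rows=10):
    """Hypothetical seeded variant of create_datetime_data."""
    rng = np.random.default_rng(seed)

    def random_datetimes(first_year, last_year):
        # Generator.integers excludes the upper bound, like np.random.randint.
        years = rng.integers(first_year, last_year + 1, n_rows)
        months = rng.integers(1, 13, n_rows)
        days = rng.integers(1, 29, n_rows)  # 1..28 is valid in every month
        hours = rng.integers(0, 24, n_rows)
        mins = rng.integers(0, 60, n_rows)
        secs = rng.integers(0, 60, n_rows)
        return [
            datetime.datetime(int(y), int(mo), int(d), int(h), int(mi), int(s))
            for y, mo, d, h, mi, s in zip(years, months, days, hours, mins, secs)
        ]

    return pd.DataFrame(
        {
            "date_of_birth": random_datetimes(1970, 1999),
            "sale_date": random_datetimes(2010, 2019),
        }
    )


seeded_data = create_seeded_datetime_data(seed=42)
```

Calling the function twice with the same `seed` should give identical frames, which makes the downstream outputs repeatable.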

In [5]:
datetime_data = create_datetime_data()

In [6]:
datetime_data

index,date_of_birth,sale_date
0,1997-05-27 12:16:24,2010-09-16 10:48:20
1,1989-05-22 00:09:46,2015-11-11 18:02:58
2,1983-04-13 17:10:49,2015-06-09 19:58:49
3,1983-10-11 02:16:11,2014-11-19 04:16:35
4,1986-02-19 15:04:47,2018-10-10 06:18:31
5,1987-02-16 09:23:41,2014-01-22 19:07:31
6,1993-06-13 09:06:02,2012-03-01 03:13:02
7,1989-10-20 04:00:33,2016-02-07 02:22:19
8,1983-06-28 08:41:02,2014-05-24 15:31:30
9,1988-01-06 00:27:23,2012-03-16 16:13:11


In [7]:
datetime_data.dtypes

date_of_birth    datetime64[ns]
sale_date        datetime64[ns]
dtype: object

## Usage
The transformer takes 4 arguments; the result is `column_upper - column_lower`:
- `column_lower`: the datetime column that is subtracted.
- `column_upper`: the datetime column that is subtracted from.
- `new_column_name`: the name of the new column that holds the difference.
- `units`: the time unit of the result: 'Y' (years), 'M' (months), 'D' (days), 'h' (hours), 'm' (minutes) or 's' (seconds).
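
To make the arithmetic concrete, here is a minimal pandas-only sketch of the same kind of calculation. It is not tubular's implementation. The helper `date_difference` and the `SECONDS_PER_UNIT` table are hypothetical, and the year length of 365.2425 days (with a month taken as one twelfth of that) is an assumption chosen because it agrees with the outputs shown in this notebook.

```python
import pandas as pd

# Assumption: 'Y' is an average Gregorian year (365.2425 days) and
# 'M' is one twelfth of that year; neither is calendar-aware.
SECONDS_PER_UNIT = {
    "Y": 365.2425 * 24 * 3600,
    "M": 365.2425 * 24 * 3600 / 12,
    "D": 24 * 3600,
    "h": 3600,
    "m": 60,
    "s": 1,
}


def date_difference(df, column_lower, column_upper, new_column_name, units="D"):
    """Hypothetical helper: add column_upper - column_lower in the given units."""
    out = df.copy()
    delta = out[column_upper] - out[column_lower]
    out[new_column_name] = delta.dt.total_seconds() / SECONDS_PER_UNIT[units]
    return out


# A single row taken from the data displayed earlier in this notebook.
example = pd.DataFrame(
    {
        "date_of_birth": pd.to_datetime(["1997-05-27 12:16:24"]),
        "sale_date": pd.to_datetime(["2010-09-16 10:48:20"]),
    }
)

result = date_difference(example, "date_of_birth", "sale_date", "days", units="D")
print(result)
```

For this row the sketch gives roughly 4859.938843 days, in line with the value in the Days section. Note that under these assumptions 'Y' and 'M' are fixed-length averages, so a difference of "12 months" does not necessarily span exactly one calendar year.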

### Years

In [8]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="years",
    units='Y',
)

In [9]:
transformed_data_years = date_difference_transformer.transform(datetime_data)

In [10]:
transformed_data_years

index,date_of_birth,sale_date,years
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945


### Months

In [11]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="months",
    units='M',
)

In [12]:
transformed_data_months = date_difference_transformer.transform(transformed_data_years)

In [13]:
transformed_data_months

index,date_of_birth,sale_date,years,months
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061,159.672727
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863,317.69836
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037,385.884447
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328,373.29994
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589,391.651066
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901,323.206815
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661,224.587934
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148,315.601775
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275,370.875297
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945,290.327335


### Days

In [14]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="days",
    units='D',
)

In [15]:
transformed_data_days = date_difference_transformer.transform(transformed_data_months)

In [16]:
transformed_data_days

index,date_of_birth,sale_date,years,months,days
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061,159.672727,4859.938843
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863,317.69836,9669.745278
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037,385.884447,11745.116667
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328,373.29994,11362.083611
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589,391.651066,11920.634537
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901,323.206815,9837.40544
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661,224.587934,6835.754861
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148,315.601775,9605.931782
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275,370.875297,11288.285046
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945,290.327335,8836.656806


### Hours

In [17]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="hours",
    units='h',
)

In [18]:
transformed_data_hours = date_difference_transformer.transform(transformed_data_days)

In [19]:
transformed_data_hours

index,date_of_birth,sale_date,years,months,days,hours
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061,159.672727,4859.938843,116638.532222
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863,317.69836,9669.745278,232073.886667
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037,385.884447,11745.116667,281882.8
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328,373.29994,11362.083611,272690.006667
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589,391.651066,11920.634537,286095.228889
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901,323.206815,9837.40544,236097.730556
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661,224.587934,6835.754861,164058.116667
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148,315.601775,9605.931782,230542.362778
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275,370.875297,11288.285046,270918.841111
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945,290.327335,8836.656806,212079.763333


### Minutes

In [20]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="minutes",
    units='m',
)

In [21]:
transformed_data_minutes = date_difference_transformer.transform(transformed_data_hours)

In [22]:
transformed_data_minutes

index,date_of_birth,sale_date,years,months,days,hours,minutes
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061,159.672727,4859.938843,116638.532222,6998312.0
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863,317.69836,9669.745278,232073.886667,13924430.0
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037,385.884447,11745.116667,281882.8,16912970.0
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328,373.29994,11362.083611,272690.006667,16361400.0
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589,391.651066,11920.634537,286095.228889,17165710.0
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901,323.206815,9837.40544,236097.730556,14165860.0
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661,224.587934,6835.754861,164058.116667,9843487.0
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148,315.601775,9605.931782,230542.362778,13832540.0
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275,370.875297,11288.285046,270918.841111,16255130.0
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945,290.327335,8836.656806,212079.763333,12724790.0


### Seconds

In [23]:
date_difference_transformer = DateDifferenceTransformer(
    column_lower="date_of_birth",
    column_upper="sale_date",
    new_column_name="seconds",
    units='s',
)

In [24]:
transformed_data_seconds = date_difference_transformer.transform(transformed_data_minutes)

In [25]:
transformed_data_seconds 

index,date_of_birth,sale_date,years,months,days,hours,minutes,seconds
0,1997-05-27 12:16:24,2010-09-16 10:48:20,13.306061,159.672727,4859.938843,116638.532222,6998312.0,419898700.0
1,1989-05-22 00:09:46,2015-11-11 18:02:58,26.474863,317.69836,9669.745278,232073.886667,13924430.0,835466000.0
2,1983-04-13 17:10:49,2015-06-09 19:58:49,32.157037,385.884447,11745.116667,281882.8,16912970.0,1014778000.0
3,1983-10-11 02:16:11,2014-11-19 04:16:35,31.108328,373.29994,11362.083611,272690.006667,16361400.0,981684000.0
4,1986-02-19 15:04:47,2018-10-10 06:18:31,32.637589,391.651066,11920.634537,286095.228889,17165710.0,1029943000.0
5,1987-02-16 09:23:41,2014-01-22 19:07:31,26.933901,323.206815,9837.40544,236097.730556,14165860.0,849951800.0
6,1993-06-13 09:06:02,2012-03-01 03:13:02,18.715661,224.587934,6835.754861,164058.116667,9843487.0,590609200.0
7,1989-10-20 04:00:33,2016-02-07 02:22:19,26.300148,315.601775,9605.931782,230542.362778,13832540.0,829952500.0
8,1983-06-28 08:41:02,2014-05-24 15:31:30,30.906275,370.875297,11288.285046,270918.841111,16255130.0,975307800.0
9,1988-01-06 00:27:23,2012-03-16 16:13:11,24.193945,290.327335,8836.656806,212079.763333,12724790.0,763487100.0
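
The `minutes` and `seconds` columns above are displayed to about seven significant digits, so they look rounded next to the other columns. As a quick cross-check, row 0 can be recomputed with the standard library alone. This sketch does not use tubular, and the variable names are illustrative.

```python
import datetime

# Row 0 of the data displayed in this notebook.
date_of_birth = datetime.datetime(1997, 5, 27, 12, 16, 24)
sale_date = datetime.datetime(2010, 9, 16, 10, 48, 20)

delta = sale_date - date_of_birth
seconds = delta.total_seconds()

print(seconds)          # 419898716.0 (shown rounded as 419898700.0)
print(seconds / 60)     # about 6998311.93 (shown rounded as 6998312.0)
print(seconds / 3600)   # about 116638.532222
print(seconds / 86400)  # about 4859.938843
```

The underlying values therefore carry more precision than the display suggests; only the printed representation is shortened.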
