# BetweenDatesTransformer
This notebook demonstrates the `BetweenDatesTransformer` class. The transformer adds a new boolean indicator column showing, row by row, whether the value in one date column falls between the values in two other date columns.

In [1]:
import pandas as pd
import numpy as np
import datetime

In [2]:
import tubular
from tubular.dates import BetweenDatesTransformer

In [3]:
tubular.__version__

'0.3.0'

## Create dummy dataset

In [4]:
df = pd.DataFrame(
    {
        "a": [
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 1),
        ],
        "b": [
            datetime.datetime(1990, 1, 20),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 2),
            datetime.datetime(1990, 2, 6),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 2),
        ],
        "c": [
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 1),
        ],
    }
)

In [5]:
df

,a,b,c
0,1990-02-01,1990-01-20,1990-03-01
1,1990-02-01,1990-02-01,1990-03-01
2,1990-02-01,1990-02-02,1990-03-01
3,1990-02-01,1990-02-06,1990-03-01
4,1990-02-01,1990-03-01,1990-03-01
5,1990-02-01,1990-03-02,1990-03-01


In [6]:
df.dtypes

a    datetime64[ns]
b    datetime64[ns]
c    datetime64[ns]
dtype: object
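
All three columns here are already `datetime64[ns]`. If the dates had arrived as strings instead, they could be converted before use. The following is a minimal sketch in plain pandas, not part of tubular; the frame `raw` and its string values are hypothetical and introduced only for illustration.

```python
import pandas as pd

# Hypothetical input where the dates were read in as strings
raw = pd.DataFrame(
    {
        "a": ["1990-02-01", "1990-02-01"],
        "b": ["1990-01-20", "1990-02-06"],
        "c": ["1990-03-01", "1990-03-01"],
    }
)

# Convert each column to datetime; the explicit format makes malformed values raise
converted = raw.copy()
for col in ["a", "b", "c"]:
    converted[col] = pd.to_datetime(converted[col], format="%Y-%m-%d")

# Confirm every column now has a datetime dtype
all_datetime = all(
    pd.api.types.is_datetime64_any_dtype(converted[col]) for col in converted.columns
)
print(all_datetime)
```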

## Simple usage

### Initialising BetweenDatesTransformer

The user must specify the following:
- `new_column_name`: the name of the column to assign the results to
- `column_lower`: the name of the column containing the lower limits for the comparison
- `column_between`: the name of the column containing the datetime values to test against the limits in `column_lower` and `column_upper`
- `column_upper`: the name of the column containing the upper limits for the comparison

Optionally, the user can also specify boolean values for `lower_inclusive` and `upper_inclusive` to set whether the comparison includes or excludes each limit. Both default to `True`.

In [7]:
between_dates_1 = BetweenDatesTransformer(
    column_lower="a", column_between="b", column_upper="c", new_column_name="d"
)

### BetweenDatesTransformer fit
There is no fit method for the `BetweenDatesTransformer`, as it does not 'learn' anything from the data.

### BetweenDatesTransformer transform
Running `transform` with this configuration adds a new column `d` to the input `X`.

In [8]:
df_2 = between_dates_1.transform(df)

In [9]:
df_2

,a,b,c,d
0,1990-02-01,1990-01-20,1990-03-01,False
1,1990-02-01,1990-02-01,1990-03-01,True
2,1990-02-01,1990-02-02,1990-03-01,True
3,1990-02-01,1990-02-06,1990-03-01,True
4,1990-02-01,1990-03-01,1990-03-01,True
5,1990-02-01,1990-03-02,1990-03-01,False
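
For intuition, the values in `d` match what two row-wise comparisons in plain pandas produce. This is a sketch of equivalent logic under the default inclusive limits, not necessarily how tubular implements the transformer; `check` and `expected` are names introduced here for illustration.

```python
import datetime

import pandas as pd

# Rebuild the dummy dataset from above
check = pd.DataFrame(
    {
        "a": [datetime.datetime(1990, 2, 1)] * 6,
        "b": [
            datetime.datetime(1990, 1, 20),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 2),
            datetime.datetime(1990, 2, 6),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 2),
        ],
        "c": [datetime.datetime(1990, 3, 1)] * 6,
    }
)

# Inclusive on both sides: a <= b <= c, evaluated row by row
expected = (check["a"] <= check["b"]) & (check["b"] <= check["c"])
print(expected.tolist())
# [False, True, True, True, True, False]
```

Rows 1 and 4 sit exactly on the lower and upper limits respectively, which is why they are `True` only when the limits are inclusive.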


## Excluding comparison limits
By default `lower_inclusive` and `upper_inclusive` are set to `True`, but each can be set independently to control whether the corresponding limit is included in the comparison.

In [10]:
between_dates_2 = BetweenDatesTransformer(
    column_lower="a",
    column_between="b",
    column_upper="c",
    new_column_name="d",
    lower_inclusive=False,
    upper_inclusive=False,
)

In [11]:
df_3 = between_dates_2.transform(df)

In [12]:
df_3

,a,b,c,d
0,1990-02-01,1990-01-20,1990-03-01,False
1,1990-02-01,1990-02-01,1990-03-01,False
2,1990-02-01,1990-02-02,1990-03-01,True
3,1990-02-01,1990-02-06,1990-03-01,True
4,1990-02-01,1990-03-01,1990-03-01,False
5,1990-02-01,1990-03-02,1990-03-01,False
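
Because the two flags are independent, mixed settings are also possible, for example a half-open interval that includes the lower limit but excludes the upper one. The helper below, `between_dates`, is a hypothetical plain-pandas stand-in written for this illustration (it is not a tubular function); it sketches how each flag might select between an inclusive and a strict comparison.

```python
import datetime

import pandas as pd


def between_dates(df, lower, between, upper, lower_inclusive=True, upper_inclusive=True):
    """Hypothetical helper: row-wise check that `between` lies between `lower` and `upper`."""
    # Use >= or > for the lower limit depending on the flag
    lower_ok = df[between] >= df[lower] if lower_inclusive else df[between] > df[lower]
    # Use <= or < for the upper limit depending on the flag
    upper_ok = df[between] <= df[upper] if upper_inclusive else df[between] < df[upper]
    return lower_ok & upper_ok


# Same dummy dataset as earlier in the notebook
dates = pd.DataFrame(
    {
        "a": [datetime.datetime(1990, 2, 1)] * 6,
        "b": [
            datetime.datetime(1990, 1, 20),
            datetime.datetime(1990, 2, 1),
            datetime.datetime(1990, 2, 2),
            datetime.datetime(1990, 2, 6),
            datetime.datetime(1990, 3, 1),
            datetime.datetime(1990, 3, 2),
        ],
        "c": [datetime.datetime(1990, 3, 1)] * 6,
    }
)

# Lower limit included, upper limit excluded
half_open = between_dates(dates, "a", "b", "c", upper_inclusive=False)
print(half_open.tolist())
# [False, True, True, True, False, False]
```

Compared with the fully exclusive result above, only row 1 changes: it equals the lower limit, so it becomes `True` once that limit is included.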
