# DatetimeInfoExtractor
This notebook shows the functionality of the `DatetimeInfoExtractor` class. This transformer extracts information from a `datetime` column, such as the hour of the day or the month, and then maps each extracted value to a label. The transformer has a default mapping for each type of extracted information, and these defaults can be overridden with the optional `datetime_mappings` parameter.

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.dates import DatetimeInfoExtractor

import tests.test_data as td

In [3]:
tubular.__version__

'0.3.3'

## Load dataset
Here we load a dataset with `datetime` dtypes.

In [4]:
df = td.create_datediff_test_df()

In [5]:
df

|   | a                   | b                   |
|---|---------------------|---------------------|
| 0 | 1993-09-27 11:58:58 | 2020-05-01 12:59:59 |
| 1 | 2000-03-19 12:59:59 | 2019-12-25 11:58:58 |
| 2 | 2018-11-10 11:59:59 | 2018-11-10 11:59:59 |
| 3 | 2018-10-10 11:59:59 | 2018-11-10 11:59:59 |
| 4 | 2018-10-10 11:59:59 | 2018-09-10 09:59:59 |
| 5 | 2018-10-10 10:59:59 | 2015-11-10 11:59:59 |
| 6 | 2018-12-10 11:59:59 | 2015-11-10 12:59:59 |
| 7 | 1985-07-23 11:59:59 | 2015-07-23 11:59:59 |


In [6]:
df.dtypes

a    datetime64[ns]
b    datetime64[ns]
dtype: object
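If the test-data helper is not available in your environment, a frame with the same structure can be built directly with pandas. The sketch below types in the first three rows shown above; `df_sketch` is just an illustrative name and is not used by the rest of the notebook.

```python
import pandas as pd

# First three rows of the sample frame above, typed in by hand.
# pd.to_datetime parses the strings into a datetime64 dtype, which is
# what DatetimeInfoExtractor expects for the columns it processes.
df_sketch = pd.DataFrame(
    {
        "a": pd.to_datetime(
            ["1993-09-27 11:58:58", "2000-03-19 12:59:59", "2018-11-10 11:59:59"]
        ),
        "b": pd.to_datetime(
            ["2020-05-01 12:59:59", "2019-12-25 11:58:58", "2018-11-10 11:59:59"]
        ),
    }
)

print(df_sketch.dtypes)
```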

## Simple usage

### Initialising DatetimeInfoExtractor

The user must specify the following:
- `columns`, the datetime columns in the `DataFrame` passed to the `transform` method from which to extract information.

The user can also choose to specify:
- `include`, the information to extract. This must be a list containing some or all of the following: `["timeofday", "timeofmonth", "timeofyear", "dayofweek"]`.

If a feature is listed in `include` but no mapping is provided for it, the following default mapping is used:
    timeofday_mapping = {
        "night": range(0, 6),  # Midnight - 6am
        "morning": range(6, 12),  # 6am - Noon
        "afternoon": range(12, 18),  # Noon - 6pm
        "evening": range(18, 24),  # 6pm - Midnight
    }
    timeofmonth_mapping = {
        "start": range(0, 11),
        "middle": range(11, 21),
        "end": range(21, 32),
    }
    timeofyear_mapping = {
        "spring": range(3, 6),  # Mar, Apr, May
        "summer": range(6, 9),  # Jun, Jul, Aug
        "autumn": range(9, 12),  # Sep, Oct, Nov
        "winter": [12, 1, 2],  # Dec, Jan, Feb
    }
    dayofweek_mapping = {
        "monday": [0],
        "tuesday": [1],
        "wednesday": [2],
        "thursday": [3],
        "friday": [4],
        "saturday": [5],
        "sunday": [6],
    }
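One way to read these defaults is as a lookup from each extracted value to a label. The sketch below inverts the default `timeofyear_mapping` into a `{month: label}` dictionary. `invert_mapping` is a hypothetical helper, not part of tubular, and we do not claim this is how the transformer applies its mappings internally; rejecting a value that appears under two labels is our own design choice.

```python
# Default timeofyear mapping, copied from the listing above.
timeofyear_mapping = {
    "spring": range(3, 6),  # Mar, Apr, May
    "summer": range(6, 9),  # Jun, Jul, Aug
    "autumn": range(9, 12),  # Sep, Oct, Nov
    "winter": [12, 1, 2],  # Dec, Jan, Feb
}


def invert_mapping(mapping):
    """Hypothetical helper: turn {label: values} into {value: label}.

    Raises ValueError if the same value is listed under two labels.
    """
    lookup = {}
    for label, values in mapping.items():
        for value in values:
            if value in lookup:
                raise ValueError(
                    f"{value} is mapped to both {lookup[value]!r} and {label!r}"
                )
            lookup[value] = label
    return lookup


month_to_season = invert_mapping(timeofyear_mapping)
print(month_to_season[12], month_to_season[4])  # winter spring
```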

In [7]:
simple_datetime_extractor = DatetimeInfoExtractor(columns=["a"])

### DatetimeInfoExtractor fit
The `DatetimeInfoExtractor` does not need to be fitted, because nothing it does 'learns' anything from the data.

### DatetimeInfoExtractor transform
When `transform` is run with this configuration, a new column `a_timeofday` is added to the input `X` (here `df`).

In [8]:
df_2 = simple_datetime_extractor.transform(df)

In [9]:
df_2[["a", "a_timeofday"]].head()

|   | a                   | a_timeofday |
|---|---------------------|-------------|
| 0 | 1993-09-27 11:58:58 | morning     |
| 1 | 2000-03-19 12:59:59 | afternoon   |
| 2 | 2018-11-10 11:59:59 | morning     |
| 3 | 2018-10-10 11:59:59 | morning     |
| 4 | 2018-10-10 11:59:59 | morning     |
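As a cross-check, the `a_timeofday` values above can be recomputed with plain pandas, assuming that `timeofday` is derived from the hour of each timestamp (`Series.dt.hour`). This is a sketch of the idea under that assumption, not tubular's implementation.

```python
import pandas as pd

# Column "a" for the five rows displayed above.
a = pd.to_datetime(
    pd.Series(
        [
            "1993-09-27 11:58:58",
            "2000-03-19 12:59:59",
            "2018-11-10 11:59:59",
            "2018-10-10 11:59:59",
            "2018-10-10 11:59:59",
        ]
    )
)

# Default timeofday mapping from the listing above.
timeofday_mapping = {
    "night": range(0, 6),
    "morning": range(6, 12),
    "afternoon": range(12, 18),
    "evening": range(18, 24),
}

# Invert to {hour: label}, then look up the hour of each timestamp.
hour_to_label = {h: label for label, hours in timeofday_mapping.items() for h in hours}
a_timeofday = a.dt.hour.map(hour_to_label)

print(a_timeofday.tolist())
# ['morning', 'afternoon', 'morning', 'morning', 'morning']
```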


## Use Custom Mappings

The user can choose to specify individual mappings for any of the features extracted. The `datetime_mappings` must take the following form:

`datetime_mappings = {"feature_to_map": {"label": [List_to_map]}}`

For each feature, every possible hour, day, or month value must be mapped to a label.

For example, a mapping for `dayofweek` must include all values 0-6:

    datetime_mappings = {"dayofweek": {"week": [0, 1, 2, 3, 4],
                                       "weekend": [5, 6]}}

The values in each mapping must be iterable, for example a `range`:

    datetime_mappings = {"timeofday": {"am": range(0, 12),
                                       "pm": range(12, 24)}}

The keys of `datetime_mappings` must also be contained in `include`.

The required ranges for each mapping are:
- timeofday: 0-23
- timeofmonth: 1-31
- timeofyear: 1-12
- dayofweek: 0-6
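The rules above can be checked before the transformer is built. The sketch below is a hypothetical pre-flight check (`check_datetime_mappings` and `REQUIRED_VALUES` are our names, not tubular's). The comments pair each feature with the pandas `.dt` attribute whose output has the listed range; that pairing is our assumption, based only on the ranges matching. It reports missing values but deliberately does not reject extra ones, since the default `timeofmonth_mapping` above also covers 0.

```python
# Required value ranges, taken from the list above. The comments name the
# pandas .dt attribute whose output has that range (an assumption).
REQUIRED_VALUES = {
    "timeofday": set(range(0, 24)),  # Series.dt.hour
    "timeofmonth": set(range(1, 32)),  # Series.dt.day
    "timeofyear": set(range(1, 13)),  # Series.dt.month
    "dayofweek": set(range(0, 7)),  # Series.dt.dayofweek, Monday=0
}


def check_datetime_mappings(datetime_mappings, include):
    """Hypothetical check: keys must be in include, required values covered."""
    for feature, mapping in datetime_mappings.items():
        if feature not in include:
            raise ValueError(f"{feature!r} has a mapping but is not in include")
        mapped = {value for values in mapping.values() for value in values}
        missing = REQUIRED_VALUES[feature] - mapped
        if missing:
            raise ValueError(f"{feature!r}: unmapped values {sorted(missing)}")


# Passes silently: all seven days are covered and the key is in include.
check_datetime_mappings(
    {"dayofweek": {"week": [0, 1, 2, 3, 4], "weekend": [5, 6]}},
    include=["dayofweek"],
)
```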


In [10]:
datetime_mappings = {
    "timeofday": {"am": range(0, 12), "pm": range(12, 24)},
    "dayofweek": {"week": range(0, 5), "weekend": [5, 6]},
}

This `datetime_mappings` dictionary can then be passed in when creating the transformer. Note that the transformer only extracts the features listed in `include`, so every feature with a defined mapping must also be in `include`.

In [11]:
time_and_day_transformer = DatetimeInfoExtractor(
    columns=["b"],
    include=["timeofday", "dayofweek"],
    datetime_mappings=datetime_mappings,
)

df3 = time_and_day_transformer.transform(df)

df3[["b", "b_timeofday", "b_dayofweek"]]

|   | b                   | b_timeofday | b_dayofweek |
|---|---------------------|-------------|-------------|
| 0 | 2020-05-01 12:59:59 | pm          | week        |
| 1 | 2019-12-25 11:58:58 | am          | week        |
| 2 | 2018-11-10 11:59:59 | am          | weekend     |
| 3 | 2018-11-10 11:59:59 | am          | weekend     |
| 4 | 2018-09-10 09:59:59 | am          | week        |
| 5 | 2015-11-10 11:59:59 | am          | week        |
| 6 | 2015-11-10 12:59:59 | pm          | week        |
| 7 | 2015-07-23 11:59:59 | am          | week        |
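To make the custom mappings concrete, the sketch below recomputes both new columns with plain pandas, assuming `timeofday` comes from `Series.dt.hour` and `dayofweek` from `Series.dt.dayofweek` (Monday=0, Sunday=6). `to_lookup` is a hypothetical helper; this is a cross-check against the output above, not tubular's implementation.

```python
import pandas as pd

# Column "b" from the sample data.
b = pd.to_datetime(
    pd.Series(
        [
            "2020-05-01 12:59:59",
            "2019-12-25 11:58:58",
            "2018-11-10 11:59:59",
            "2018-11-10 11:59:59",
            "2018-09-10 09:59:59",
            "2015-11-10 11:59:59",
            "2015-11-10 12:59:59",
            "2015-07-23 11:59:59",
        ]
    )
)

# The same custom mappings passed to the transformer above.
datetime_mappings = {
    "timeofday": {"am": range(0, 12), "pm": range(12, 24)},
    "dayofweek": {"week": range(0, 5), "weekend": [5, 6]},
}


def to_lookup(mapping):
    """Hypothetical helper: turn {label: values} into {value: label}."""
    return {v: label for label, values in mapping.items() for v in values}


b_timeofday = b.dt.hour.map(to_lookup(datetime_mappings["timeofday"]))
b_dayofweek = b.dt.dayofweek.map(to_lookup(datetime_mappings["dayofweek"]))

print(b_timeofday.tolist())
print(b_dayofweek.tolist())
```

Both lists should match the `b_timeofday` and `b_dayofweek` columns in the table above.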
