### Handling Dates in Python
When reading in date columns, the first step should always be to normalize them to the pandas datetime64[ns] dtype. Date columns in CSV files are sometimes converted automatically, but dates are often stored as strings and need to be converted explicitly.

Use pd.to_datetime to coerce string columns to a datetime dtype.

In [32]:
import pandas as pd
df = pd.DataFrame({'start': ['2012-01-15','2012-12-25' ], 'end': ['2013-01-15','2013-01-17' ]})

## coerce a column to use pandas datetime 
df['start'] = pd.to_datetime(df['start'])
df['end']  = pd.to_datetime(df['end'])
print(df)
df.dtypes

       start        end
0 2012-01-15 2013-01-15
1 2012-12-25 2013-01-17


start    datetime64[ns]
end      datetime64[ns]
dtype: object
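
For CSV input, the conversion can also be requested at load time. The sketch below assumes a small in-memory CSV (csv_text is a made-up stand-in for a real file) and uses the parse_dates argument of pd.read_csv; it is one possible approach, not the only one.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "start,end\n2012-01-15,2013-01-15\n2012-12-25,2013-01-17\n"

# Without parse_dates the date columns are loaded as strings, not datetimes.
raw = pd.read_csv(io.StringIO(csv_text))

# parse_dates asks read_csv to convert the named columns while loading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["start", "end"])

print(raw.dtypes)
print(parsed.dtypes)
```

If a column still comes back as strings, calling pd.to_datetime on it afterwards, as shown earlier, remains the fallback.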

#### Unusual Date formats
pd.to_datetime accepts strftime-style format codes through its format argument, so it can parse almost any consistently formatted date string.
Reference: http://strftime.org/

In [62]:
## Examples of reading different date formats
f='%Y/%m/%d'
pd.to_datetime('2012/12/25', format=f)

f='%m-%d-%Y'
pd.to_datetime('12-25-2012', format=f)

f='%b %d, %Y'
pd.to_datetime('dec 25, 2012', format=f)

f='%B %d, %Y'
pd.to_datetime('december 25, 2012', format=f)

pd.to_datetime('2012-12-25 23:59:15')


Timestamp('2012-12-25 23:59:15')
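
Real columns often contain a few values that do not match the expected format. A minimal sketch, assuming a made-up Series named messy, of how errors='coerce' turns those values into NaT instead of raising an exception:

```python
import pandas as pd

# Hypothetical column mixing valid dates with a value that does not match.
messy = pd.Series(['12-25-2012', '01-15-2013', 'not a date'])

# errors='coerce' turns unparseable entries into NaT instead of raising.
clean = pd.to_datetime(messy, format='%m-%d-%Y', errors='coerce')

print(clean)
print(clean.isna().sum())  # number of entries that failed to parse
```

Counting the NaT values afterwards is a quick way to see how many rows the chosen format failed to cover.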

#### Basic Features Using a Single Datetime Column
To extract basic information from a datetime column, use the .dt accessor.

In [34]:
df['start'].dt.dayofweek
df['start'].dt.day_name()
df['start'].dt.year
df['start'].dt.month
df['start'].dt.day
df['start'].dt.dayofyear

0     15
1    360
Name: start, dtype: int64
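
A notebook cell only displays its last expression, which is why only dayofyear appears above. As a sketch, the same attributes can be collected side by side in one frame (the names dates and parts are illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2012-01-15', '2012-12-25']), name='start')

# Gather each .dt attribute into its own column so all are visible at once.
parts = pd.DataFrame({
    'dayofweek': dates.dt.dayofweek,   # Monday=0 ... Sunday=6
    'day_name': dates.dt.day_name(),
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
    'dayofyear': dates.dt.dayofyear,
})
print(parts)
```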

#### Dealing with Holidays
pandas provides holiday calendars in the pandas.tseries.holiday module, including USFederalHolidayCalendar. After initializing a calendar, use a simple membership-test generator that yields 1 when a date is a holiday and 0 otherwise, then wrap the generator in list() to build a 'holiday' column.

In [35]:
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar().holidays()

## is your date a federal holiday 
def is_holiday(array, cal):
    for val in array:
        if val in cal:
            yield 1
        else:
            yield 0

list(is_holiday(df['start'], cal))


[0, 1]
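
The generator above can also be written as a vectorized membership test. This is a sketch of an alternative, not a replacement for the generator; the names holidays, dates, and is_hol are illustrative, and the date range passed to holidays() is an assumption chosen to cover the sample data.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Restrict the calendar to the years covered by the sample data.
holidays = USFederalHolidayCalendar().holidays(start='2012-01-01', end='2013-12-31')
dates = pd.Series(pd.to_datetime(['2012-01-15', '2012-12-25']))

# isin compares whole timestamps, so normalize() strips any time-of-day first.
is_hol = dates.dt.normalize().isin(holidays).astype(int)
print(is_hol.tolist())
```

The normalize() call matters when the column carries times: a timestamp such as 2012-12-25 08:30 would otherwise not match the midnight entry in the calendar.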

#### Date Differences
Date differences can be computed by subtracting one datetime column from another. The result is a timedelta Series, from which the number of whole days can be extracted with .dt.days. Here start - end is negative because each start date precedes its end date.

In [36]:
x = df['start'] - df['end']
print(x.dt.days)

0   -366
1    -23
dtype: int64
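
Reversing the operands gives positive durations, and .dt.total_seconds() allows conversion to other units. A minimal sketch using the same dates in a separate, illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({
    'start': pd.to_datetime(['2012-01-15', '2012-12-25']),
    'end': pd.to_datetime(['2013-01-15', '2013-01-17']),
})

# end - start is positive because each end date follows its start date.
elapsed = frame['end'] - frame['start']

print(elapsed.dt.days.tolist())                       # whole days
print((elapsed.dt.total_seconds() / 3600).tolist())   # same spans in hours
```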


#### Time Delta Manipulation
Time deltas can be added to a timestamp to find, for example, the time one hour, one day, or one week before now.


In [52]:
from datetime import timedelta
now = pd.to_datetime('today')
one_hour_ago =  now +  timedelta(hours=-1)  
one_day_ago = now +  timedelta(days=-1)  
one_week_ago = now + timedelta(weeks=-1)    

print('now {}'.format(now))
print('one hour ago {}'.format(one_hour_ago))
print('yesterday {} '.format(one_day_ago))
print('lastweek {} '.format(one_week_ago))


now 2019-11-13 15:43:23.311430
one hour ago 2019-11-13 14:43:23.311430
yesterday 2019-11-12 15:43:23.311430 
lastweek 2019-11-06 15:43:23.311430 
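
pandas also has its own pd.Timedelta type, which can be subtracted directly. The sketch below uses a fixed reference timestamp (ref, chosen here to match the sample output above) so the results are reproducible:

```python
import pandas as pd

# Fixed reference time instead of "today", so the output does not change.
ref = pd.Timestamp('2019-11-13 15:43:23')

print(ref - pd.Timedelta(hours=1))   # one hour earlier
print(ref - pd.Timedelta(days=1))    # one day earlier
print(ref - pd.Timedelta(weeks=1))   # one week earlier
```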


#### Put it all Together for Feature Engineering
This combines the methods above into a single function that can be used for feature extraction. All features except the date difference are derived from the end column.

In [37]:
## put it all together in a preprocessing function
def date_features(df, start_col='start', end_col='end', cal=cal):
    # work on a copy so the caller's frame is not modified
    df = df.copy()
    # coerce to datetime, using the column names passed in
    df[start_col] = pd.to_datetime(df[start_col])
    df[end_col] = pd.to_datetime(df[end_col])
    # get end day of the week (Monday=0)
    df['dayOfWeek'] = df[end_col].dt.dayofweek
    # get end year
    df['year'] = df[end_col].dt.year
    # get end month
    df['month'] = df[end_col].dt.month
    # get end day of the year
    df['dayOfYear'] = df[end_col].dt.dayofyear
    # get 0/1 is holiday
    df['holiday'] = list(is_holiday(df[end_col], cal))
    # get start-to-end difference in days
    df['date_diff'] = df[end_col] - df[start_col]
    df['date_diff'] = df['date_diff'].dt.days
    # drop the raw date columns, keeping only the engineered features
    df = df.drop([start_col, end_col], axis=1)
    return df

df2 = date_features(df)
print(df2)


   dayOfWeek  year  month  dayOfYear  holiday  date_diff
0          1  2013      1         15        0        366
1          3  2013      1         17        0         23
