# Creating features from date and time

In [None]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

**Date and time variables are those that contain information about dates, times, or date and
time.** 

In programming, we refer to these variables as datetime variables. Examples of datetime
variables include date of birth, time of an accident, and date of last payment. 

**Datetime variables usually contain a large number of distinct values, each corresponding to a
specific combination of date and time.** 

**We do not use datetime variables in their raw format when building machine learning
models.** Instead, we derive multiple features from them, which can enrich the dataset
dramatically.

## Extracting date and time parts from a datetime variable

Datetime variables can take dates, times, or both as values. **Instead of feeding them to a
model in their raw format, we extract their components into additional features.** In pandas,
these components are available through the `.dt` accessor of a datetime column.
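The `.dt` accessor only works on columns with a datetime dtype, such as the output of `pd.date_range` used throughout this notebook. Raw data often stores dates as text, so a conversion step usually comes first. A minimal sketch with a made-up column (the values and the format string are assumptions for illustration):

```python
import pandas as pd

# Made-up raw data: dates stored as text, including one malformed entry.
raw = pd.DataFrame({'date': ['2019-03-05 10:15', '2019-03-06 08:00', 'not a date']})

# format= states the expected layout; errors='coerce' turns unparseable values into NaT.
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d %H:%M', errors='coerce')

print(raw.dtypes)           # 'date' now has a datetime64 dtype
print(raw['date'].dt.hour)  # the NaT row yields NaN
```

Passing an explicit `format` avoids ambiguous guesses such as day-first versus month-first dates.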



In [None]:
rng_ = pd.date_range('2019-03-05', periods=20, freq='min')  # one-minute steps ('T' is a deprecated alias)
df = pd.DataFrame({'date': rng_})
df.head()

In [None]:
df.dtypes

In [None]:
df['date_part'] = df['date'].dt.date
df.head()

In [None]:
df['time_part'] = df['date'].dt.time
df.head()
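Note that `.dt.date` returns Python `datetime.date` objects, so the new column has `object` dtype and no longer supports the `.dt` accessor. If you want to keep a datetime dtype, `.dt.normalize()` sets the time to midnight instead. A small comparison on a made-up range that crosses midnight:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019-03-05 23:58', periods=4, freq='min')})

# .dt.date returns Python datetime.date objects, so the column has object dtype.
df['date_obj'] = df['date'].dt.date
# .dt.normalize() sets the time to midnight but keeps the datetime64 dtype,
# so the result still supports .dt accessors and vectorised arithmetic.
df['date_norm'] = df['date'].dt.normalize()

print(df.dtypes)
print(df)
```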

## Deriving representations of the year and month

**Some events occur more often at certain times of the year**, for example, recruitment rates
increase after Christmas and slow down toward the summer holidays in Europe.
Businesses and organizations also evaluate performance and objectives at regular
intervals throughout the year, for example, every quarter or every semester. Deriving these
features from a date variable is therefore very useful for both data analysis and
machine learning.

In [None]:
rng_ = pd.date_range('2019-03-05', periods=20, freq='ME')  # month-end dates; use 'M' on pandas < 2.2
df = pd.DataFrame({'date': rng_})
df.head()

In [None]:
df['year'] = df['date'].dt.year
df.head()

In [None]:
df['month'] = df['date'].dt.month
df.head()

In [None]:
df['quarter'] = df['date'].dt.quarter
df.head()
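The paragraph above also mentions semesters. pandas has no `.dt.semester` attribute, so one option is to derive it from the quarter. A minimal sketch, using month-start dates so that each month of 2019 appears once:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019-01-01', periods=12, freq='MS')})
df['quarter'] = df['date'].dt.quarter

# Semester 1 = January-June (quarters 1-2); semester 2 = July-December (quarters 3-4).
df['semester'] = np.where(df['quarter'] <= 2, 1, 2)
print(df[['date', 'quarter', 'semester']])
```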

## Creating representations of day and week

**Some events occur more often on certain days of the week**; for example, loan applications
are more likely on weekdays than at weekends, whereas other events occur more often
during certain weeks of the year. Businesses and organizations may also want to track
key performance metrics throughout the week. 


In [None]:
rng_ = pd.date_range('2019-03-05', periods=20, freq='D')
df = pd.DataFrame({'date': rng_})
df.head()

In [None]:
df['day_mo'] = df['date'].dt.day
df.head()

In [None]:
df['day_week'] = df['date'].dt.dayofweek  # Monday=0, ..., Sunday=6
df.head()

In [None]:
df['day_week_name'] = df['date'].dt.day_name()
df.head()

In [None]:
df['is_weekend'] = np.where(df['day_week_name'].isin(['Sunday', 'Saturday']), 1, 0)
df.head()
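The weekend flag above relies on English day names. Since `.dt.dayofweek` numbers Monday as 0 and Sunday as 6, an equivalent flag can be built from the number alone, which does not depend on the names that `day_name()` produces. A sketch over one made-up week (2019-03-05 is a Tuesday):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019-03-05', periods=7, freq='D')})

# dayofweek: Monday=0 ... Sunday=6, so Saturday and Sunday are >= 5.
df['is_weekend'] = (df['date'].dt.dayofweek >= 5).astype(int)

# Cross-check against the name-based version.
by_name = df['date'].dt.day_name().isin(['Saturday', 'Sunday']).astype(int)
assert (df['is_weekend'] == by_name).all()
print(df)
```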

In [None]:
df['week'] = df['date'].dt.isocalendar().week
df.head()
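`isocalendar()` follows ISO 8601, where week 1 is the week containing the year's first Thursday. Dates near New Year can therefore fall in week 1 of the next year or week 52/53 of the previous one, so it is usually worth keeping the ISO year alongside the week number. A sketch with three boundary dates chosen for illustration:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2018-12-31', '2019-01-06', '2021-01-01']))

# Returns a DataFrame with columns 'year', 'week' and 'day' (ISO weekday, Monday=1).
iso = dates.dt.isocalendar()
print(pd.concat([dates.rename('date'), iso], axis=1))
```

Here 2018-12-31 belongs to ISO week 1 of 2019, while 2021-01-01 belongs to week 53 of 2020.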

## Extracting time parts from a time variable

**Some events occur more often at certain times of the day**; for example, fraudulent activity
is more likely at night or in the early morning. Organizations also occasionally want to know
whether one event occurred within a short time window after another, for example, whether
sales increased after a TV or online advertisement was shown. Deriving time features is
therefore extremely useful.

In [None]:
rng_ = pd.date_range('2019-03-05', periods=20, freq='1h15min10s')
df = pd.DataFrame({'date': rng_})
df.head()

In [None]:
df['hour'] = df['date'].dt.hour
df['min'] = df['date'].dt.minute
df['sec'] = df['date'].dt.second
df.head()

In [None]:
df['is_morning'] = np.where((df['hour'] > 6) & (df['hour'] < 12), 1, 0)  # 1 for hours 7-11
df.head()
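The `is_morning` flag generalises to several parts of the day. A sketch using `pd.cut` on the hour; the boundaries and labels below are illustrative assumptions, not a standard, and differ slightly from the 7-11 window used above:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019-03-05', periods=24, freq='h')})
df['hour'] = df['date'].dt.hour

# Illustrative, right-closed bins: (-1, 5] night, (5, 11] morning,
# (11, 17] afternoon, (17, 23] evening.
bins = [-1, 5, 11, 17, 23]
labels = ['night', 'morning', 'afternoon', 'evening']
df['part_of_day'] = pd.cut(df['hour'], bins=bins, labels=labels)

print(df[['date', 'hour', 'part_of_day']].head(8))
```

The result is a categorical column, which can be one-hot encoded or mapped to integers like the other features.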

## Capturing the elapsed time between datetime variables

**Datetime variables offer value individually, and they offer more value collectively
when combined with other datetime variables to derive important insights.** The most
common example is deriving age from the date of birth and a reference date, such as today's
date, the day the customer had an accident, or the day they requested a loan. In the same
way, we can combine several datetime variables to capture the time that passed between them
and create more meaningful features.

In [None]:
rng_hr = pd.date_range('2019-03-05', periods=20, freq='h')
rng_month = pd.date_range('2019-03-05', periods=20, freq='ME')  # use 'M' on pandas < 2.2
df = pd.DataFrame({'date1': rng_hr, 'date2': rng_month})
df.head()

In [None]:
df['elapsed_days'] = (df['date2'] - df['date1']).dt.days
df.head()

In [None]:
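# A month has no fixed length, so recent pandas versions reject np.timedelta64(1, 'M').
# Divide by the average Gregorian month (365.2425 / 12, ~30.44 days) instead.
avg_month = pd.Timedelta(days=365.2425 / 12)
df['months_passed'] = ((df['date2'] - df['date1']) / avg_month).round(0)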
df.head()
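Dividing by an average month length measures elapsed time. Sometimes what matters instead is how many calendar-month boundaries were crossed; for example, 31 January to 1 February counts as one month even though only a day passed. A sketch of that alternative with made-up dates:

```python
import pandas as pd

d1 = pd.Series(pd.to_datetime(['2019-01-31', '2019-03-05', '2019-03-05']))
d2 = pd.Series(pd.to_datetime(['2019-02-01', '2019-03-31', '2020-03-04']))

# Difference in (year, month) positions, ignoring the day of the month.
calendar_months = (d2.dt.year - d1.dt.year) * 12 + (d2.dt.month - d1.dt.month)

# Elapsed time measured in average Gregorian months, for comparison.
avg_months = (d2 - d1) / pd.Timedelta(days=365.2425 / 12)

print(pd.DataFrame({'calendar_months': calendar_months, 'avg_months': avg_months.round(2)}))
```

Which definition is appropriate depends on the business question, for instance billing cycles versus customer tenure.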

In [None]:
df['diff_seconds'] = (df['date2'] - df['date1'])/np.timedelta64(1,'s')
df['diff_minutes'] = (df['date2'] - df['date1'])/np.timedelta64(1,'m')
df.head()
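Dividing a difference by `np.timedelta64(1, 's')` gives the same numbers as calling `.dt.total_seconds()` on it, which some readers may find more explicit; minutes then follow by dividing by 60. A quick check on made-up timestamps:

```python
import numpy as np
import pandas as pd

start = pd.Series(pd.to_datetime(['2019-03-05 00:00', '2019-03-05 06:30']))
end = pd.Series(pd.to_datetime(['2019-03-05 01:00', '2019-03-06 06:30']))
diff = end - start

seconds = diff.dt.total_seconds()
minutes = seconds / 60

# Same values as the division-based approach.
assert (seconds == diff / np.timedelta64(1, 's')).all()
print(pd.DataFrame({'seconds': seconds, 'minutes': minutes}))
```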

In [None]:
import datetime

df['to_today'] = (datetime.datetime.today() - df['date1'])
df.head()
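Age is the classic example mentioned at the start of this section. Dividing elapsed days by 365 drifts because of leap years, so one common approach, sketched here with a hypothetical helper `age_in_years` and made-up birth dates, counts whole years and subtracts one when the birthday has not yet occurred in the reference year:

```python
import pandas as pd

def age_in_years(dob: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Whole years between each date in `dob` and the reference date `ref` (hypothetical helper)."""
    years = ref.year - dob.dt.year
    # True where the birthday falls later in the year than the reference date.
    before_birthday = (dob.dt.month > ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day > ref.day)
    )
    return years - before_birthday.astype(int)

dob = pd.Series(pd.to_datetime(['1990-05-20', '2000-12-31', '1985-01-01']))
ages = age_in_years(dob, pd.Timestamp('2020-05-20'))
print(ages.tolist())  # [30, 19, 35]
```

Passing `pd.Timestamp.today().normalize()` as `ref` gives the current age; any other datetime, such as a loan application date, works the same way.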

## Working with time in different time zones

Some organizations operate internationally; therefore, the information they collect about
events may be recorded together with the time zone of the area where the event took place.
**To compare events that occurred in different time zones, we first need to convert all of the
variables to the same time zone.**

In [None]:
df = pd.DataFrame()

df['time1'] = pd.concat([
    pd.Series(pd.date_range(start='2015-06-10 09:00', freq='h', periods=3, tz='Europe/Berlin')),
    pd.Series(pd.date_range(start='2015-09-10 09:00', freq='h', periods=3, tz='US/Central'))], axis=0)

In [None]:
df['time2'] = pd.concat([
    pd.Series(pd.date_range(start='2015-07-01 09:00', freq='h', periods=3, tz='Europe/Berlin')),
    pd.Series(pd.date_range(start='2015-08-01 09:00', freq='h', periods=3, tz='US/Central'))], axis=0)

In [None]:
df

In [None]:
df['time1_utc'] = pd.to_datetime(df['time1'], utc=True)
df['time2_utc'] = pd.to_datetime(df['time2'], utc=True)
df

In [None]:
df['elapsed_days'] = (df['time2_utc'] - df['time1_utc']).dt.days
df['elapsed_days'].head()

In [None]:
df['time1_london'] = df['time1_utc'].dt.tz_convert('Europe/London')
df['time2_berlin'] = df['time2_utc'].dt.tz_convert('Europe/Berlin')
df[['time1_london', 'time2_berlin']]
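In the example above the timestamps already carry their zone. Event logs sometimes store naive local clock times in one column and the zone name in another. A sketch of normalising such data, using a made-up two-row table (`America/Chicago` is the canonical name of the zone aliased as `US/Central`): localise each value to its own zone, then convert to UTC before comparing.

```python
import pandas as pd

# Made-up event log: naive local clock times plus the zone each was recorded in.
events = pd.DataFrame({
    'local_time': pd.to_datetime(['2015-06-10 09:00', '2015-06-10 09:00']),
    'tz': ['Europe/Berlin', 'America/Chicago'],
})

# Localise each timestamp to its own zone, then convert everything to UTC.
events['utc_time'] = pd.to_datetime(
    [ts.tz_localize(tz).tz_convert('UTC') for ts, tz in zip(events['local_time'], events['tz'])],
    utc=True,
)

# Both events read 09:00 locally, but they happened 7 hours apart.
gap = events['utc_time'].iloc[1] - events['utc_time'].iloc[0]
print(events)
print(gap)  # 0 days 07:00:00
```

The row-wise loop is fine for small tables; for large ones, grouping by the zone column and localising each group at once avoids per-row overhead.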