# Date and Time Features

Date columns usually carry valuable information about the model target, yet they are often left out of the inputs or fed to machine learning algorithms in a form that makes no sense. One likely reason is that dates come in many formats, which are hard for algorithms to interpret even when they are simplified to something like "01-01-2017".

If you leave date columns untouched, it is very hard for a machine learning algorithm to build an ordinal relationship between the values. Here, I suggest five types of preprocessing for dates:

* Extract date components such as day of week, day of year, hour, minute, second, quarter, and day of month.
* Extract time-of-day features such as evening, noon, and night time.
* Extract seasonal features such as rainy season, dry season, harmattan period, winter, summer, and autumn.
* Extract place-specific features such as national holidays, religious breaks, and festive periods.
* Calculate the time elapsed between two related date features.

Once you transform the date column into extracted columns like those above, the information it holds is exposed and machine learning algorithms can use it easily.
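
Place-specific features are the one item in the list that the cells below do not exercise, so here is a minimal sketch of a holiday flag. The `HOLIDAYS` list and the sample dates are hypothetical and exist only for illustration; a real project would load the calendar that matches the country or region of the data.

```python
import pandas as pd

# Hypothetical holiday list, for illustration only.
HOLIDAYS = pd.to_datetime(["2017-01-01", "2017-12-25"])

# Made-up timestamps standing in for a real date column.
dates = pd.Series(
    pd.to_datetime(["2017-01-01 09:30", "2017-03-14 18:00", "2017-12-25 23:15"])
)

# normalize() resets the time to midnight, so only the calendar date is compared.
is_holiday = dates.dt.normalize().isin(HOLIDAYS)
print(is_holiday.tolist())
```

The resulting boolean column can be added to the DataFrame like any other extracted feature.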

In [45]:
# import libraries


In [46]:
# read ufo.csv dataset


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [47]:
# show info


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18241 entries, 0 to 18240
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   City             18216 non-null  object
 1   Colors Reported  2882 non-null   object
 2   Shape Reported   15597 non-null  object
 3   State            18241 non-null  object
 4   Time             18241 non-null  object
dtypes: object(5)
memory usage: 712.7+ KB


**The Time column's dtype is object (string), so we want to cast it to a datetime dtype before extracting features from it. See the pandas datetime functionality:
https://pandas.pydata.org/pandas-docs/stable/reference/series.html#datetimelike-properties<br><br>
and the date format codes:<br>
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior**

In [48]:
# transform time column to date time type


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
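
One possible way to do the conversion above is `pd.to_datetime` with an explicit format. This is a sketch, not necessarily the notebook's own solution: the two rows are copied from the sample output so it runs without `ufo.csv`, and the format string is an assumption inferred from values such as `6/1/1930 22:00`.

```python
import pandas as pd

# Two rows copied from the sample output, so no CSV file is needed.
df = pd.DataFrame(
    {
        "City": ["Ithaca", "Holyoke"],
        "Time": ["6/1/1930 22:00", "2/15/1931 14:00"],
    }
)

# Assumed layout: month/day/year hour:minute.
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M")

print(df.dtypes)
```

Passing `format` explicitly avoids ambiguity between month-first and day-first strings.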


In [49]:
# show info


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18241 entries, 0 to 18240
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   City             18216 non-null  object        
 1   Colors Reported  2882 non-null   object        
 2   Shape Reported   15597 non-null  object        
 3   State            18241 non-null  object        
 4   Time             18241 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(4)
memory usage: 712.7+ KB


**OK, the Time column's dtype is now datetime, so let's extract features.**

In [50]:
# extract year, month, month_name(), week, day, week_day, day_name, hour & minute
# show head


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0
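
The columns shown above can be produced with the pandas `.dt` accessor. The sketch below uses two timestamps taken from the sample output; using `isocalendar().week` for the `Week` column is an assumption that appears consistent with the values shown (week 22 for 1930-06-01).

```python
import pandas as pd

df = pd.DataFrame({"Time": pd.to_datetime(["1930-06-01 22:00", "1933-04-18 19:00"])})

df["Year"] = df["Time"].dt.year
df["Month"] = df["Time"].dt.month
df["Month_Name"] = df["Time"].dt.month_name()
df["Week"] = df["Time"].dt.isocalendar().week  # ISO week number
df["Day"] = df["Time"].dt.day
df["Week_Day"] = df["Time"].dt.dayofweek  # Monday = 0 ... Sunday = 6
df["Day_Name"] = df["Time"].dt.day_name()
df["Hour"] = df["Time"].dt.hour
df["Minute"] = df["Time"].dt.minute

print(df.head())
```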


In [51]:
# create a new DataFrame containing month_name & counts


Unnamed: 0,Month_Name,Count
0,April,1045
1,August,1948
2,December,1034
3,February,817
4,January,862
5,July,2345
6,June,3059
7,March,1096
8,May,1168
9,November,1509
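
A count table like the one above can be built with `groupby(...).size()`. This is one possible approach on a made-up five-row column; the notebook's own solution may differ.

```python
import pandas as pd

# Made-up sample of the Month_Name column.
df = pd.DataFrame({"Month_Name": ["June", "June", "April", "July", "June"]})

# size() counts rows per group; reset_index turns the result into a
# two-column DataFrame with the count column named "Count".
month_counts = df.groupby("Month_Name").size().reset_index(name="Count")

print(month_counts)
```

`groupby` sorts the group keys, which matches the alphabetical month order in the output above. The same pattern works for `Day_Name`.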


In [52]:
# show bar plot from the created data


In [53]:
# sns.countplot(x='Month_Name', data=df, palette='viridis')

In [54]:
# create a new DataFrame containing day_name & counts


Unnamed: 0,Day_Name,Count
0,Friday,2669
1,Monday,2300
2,Saturday,2687
3,Sunday,2689
4,Thursday,2598
5,Tuesday,2822
6,Wednesday,2476


In [55]:
# show bar plot from the created data


In [56]:
# sns.countplot(x='Day_Name', data=df, palette='viridis')

In [57]:
# show histogram for month_name colored by day_name


In [58]:
# sns.countplot(y='Month_Name', data=df, palette='viridis', hue='Day_Name')

In [59]:
# show head


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0


**OK, let's extract day periods from the Hour column.**

In [60]:
# create fn to return time period in the day
# 6 --> 12 = morning, 12 --> 19 = afternoon, 19 --> 6 = night 
# create new column called Period
# show head


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute,Period
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0,night
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0,night
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0,afternoon
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0,afternoon
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0,night
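
A minimal sketch of the period function described in the cell above. The helper name `get_period` is hypothetical, and treating each start hour as inclusive and each end hour as exclusive is an assumption; it agrees with the sample output, where hour 19 is labelled `night`.

```python
import pandas as pd


def get_period(hour):
    """Map an hour (0-23) to a coarse period of the day."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 19:
        return "afternoon"
    # Everything else (19-23 and 0-5) counts as night.
    return "night"


df = pd.DataFrame({"Hour": [22, 14, 19, 6, 5]})
df["Period"] = df["Hour"].apply(get_period)

print(df)
```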


In [61]:
# show histogram for day_name colored by Period


In [62]:
# sns.countplot(x='Day_Name', hue='Period', data=df)

**Extract Season from month**

In [63]:
# create fn to return season from months
# 1 --> 3 = Winter, 4 --> 6 = Spring, 7 --> 9 = Summer, 10 --> 12 = Autumn (both ends inclusive)
# create new column called Season
# show head


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute,Period,Season
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0,night,
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0,night,
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0,afternoon,Winter
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0,afternoon,
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0,night,Spring
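
In the sample output above the `Season` value is empty for the June rows, which may indicate that the last month of each range slipped through the original function. The sketch below uses inclusive comparisons so that every month from 1 to 12 receives a label. The helper name `get_season` is hypothetical.

```python
import pandas as pd


def get_season(month):
    """Map a month number (1-12) to a season, using inclusive ranges."""
    if 1 <= month <= 3:
        return "Winter"
    if 4 <= month <= 6:
        return "Spring"
    if 7 <= month <= 9:
        return "Summer"
    return "Autumn"  # months 10-12


df = pd.DataFrame({"Month": [6, 2, 4, 9, 12]})
df["Season"] = df["Month"].apply(get_season)

print(df)
```

Note that Python's `range(4, 6)` excludes 6, so a membership test written with `range` needs an upper bound one past the last month.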


In [64]:
# show histogram for season


In [65]:
# sns.countplot(x='Season', data=df)

**Calculate the number of years elapsed up to now**

In [66]:
# from datetime import datetime


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute,Period,Season,Elapsed_Years
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0,night,,93.632822
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0,night,,93.553651
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0,afternoon,Winter,92.924617
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0,afternoon,,92.634513
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0,night,Spring,90.752886
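
Elapsed years can be computed by subtracting the column from a reference timestamp and dividing by an average year length. This is a sketch under assumptions: the helper `elapsed_years` is hypothetical, 365.2425 days is the average Gregorian year, and the reference date is fixed here only to keep the result reproducible. Passing `pd.Timestamp.now()` instead gives the "up to now" behaviour.

```python
import pandas as pd

DAYS_PER_YEAR = 365.2425  # average Gregorian year length, an assumption


def elapsed_years(times, reference):
    """Fractional years between each timestamp in `times` and `reference`."""
    seconds = (reference - times).dt.total_seconds()
    return seconds / (DAYS_PER_YEAR * 24 * 3600)


times = pd.Series(pd.to_datetime(["1930-06-01 22:00", "1933-04-18 19:00"]))

# Hypothetical fixed reference; use pd.Timestamp.now() for "up to now".
reference = pd.Timestamp("2024-01-01")
years = elapsed_years(times, reference)

print(years)
```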


In [67]:
# px.box(x= df.Elapsed_Years)

**Calculate the number of months elapsed relative to a custom date, e.g. World War 2**

In [68]:
# create column called 'Elapsed_Months from WW2'


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Year,Month,Month_Name,Week,Day,Week_Day,Day_Name,Hour,Minute,Period,Season,Elapsed_Years,Elapsed_Months from WW2
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,1930,6,June,22,1,6,Sunday,22,0,night,,93.632822,183.037297
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,1930,6,June,27,30,0,Monday,20,0,night,,93.553651,182.087243
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,1931,2,February,7,15,6,Sunday,14,0,afternoon,Winter,92.924617,174.538834
3,Abilene,,DISK,KS,1931-06-01 13:00:00,1931,6,June,23,1,0,Monday,13,0,afternoon,,92.634513,171.057585
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,1933,4,April,16,18,1,Tuesday,19,0,night,Spring,90.752886,148.478066
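
The same idea works for months against a custom reference date. The reference date and month length below are assumptions: the sample values above appear consistent with 1945-09-02 as the reference and an average month of 365.2425 / 12 days, but the notebook does not state either.

```python
import pandas as pd

DAYS_PER_MONTH = 365.2425 / 12  # average month length (30.436875 days), an assumption
WW2_END = pd.Timestamp("1945-09-02")  # assumed reference date

times = pd.Series(pd.to_datetime(["1930-06-01 22:00", "1933-04-18 19:00"]))

# Dividing a timedelta Series by a Timedelta yields plain floats.
# Values are positive for sightings before the reference date.
months = (WW2_END - times) / pd.Timedelta(days=DAYS_PER_MONTH)

print(months)
```

In the notebook the result would be stored as `df['Elapsed_Months from WW2']`.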


In [69]:
# px.box(x= df['Elapsed_Months from WW2'])

# Great Work!