# Feature Engineering - Extracting Date

* Date columns often carry valuable information about the model target, yet they are frequently left out as inputs or passed to machine learning algorithms in a form the model cannot use meaningfully.

### Here we will discuss 3 types of date feature extraction

* Extracting the parts of the date into different columns: Year, month, day, etc.
* Extracting the time elapsed between the current date and the date column in terms of years, months, days, etc.
* Extracting some specific features from the date: Name of the weekday, Weekend or not, holiday or not, etc.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from datetime import date

data = pd.DataFrame({'date':
['01-01-2017',
'04-12-2008',
'23-06-1988',
'25-08-1999',
'20-02-1993',
]})

In [3]:
data

|   | date |
|---|---|
| 0 | 01-01-2017 |
| 1 | 04-12-2008 |
| 2 | 23-06-1988 |
| 3 | 25-08-1999 |
| 4 | 20-02-1993 |


In [4]:
# First, check the data type of the date column; if it is not in datetime format, convert it to datetime

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    5 non-null      object
dtypes: object(1)
memory usage: 168.0+ bytes


Here the date column has the object dtype (the dates are stored as strings), so we need to convert it to datetime.

In [6]:
# Convert the string column to datetime; the strings are in day-month-year order
data['date'] = pd.to_datetime(data.date, format="%d-%m-%Y")
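
The explicit `format` matters because a string such as `04-12-2008` is ambiguous. The sketch below is a minimal illustration, assuming the default month-first behaviour of `pd.to_datetime` (`dayfirst=False`); the `messy` series is made-up sample data used only to show `errors="coerce"`.

```python
import pandas as pd

raw = "04-12-2008"  # 4 December 2008 in the day-month-year convention

# Without a format, pandas reads an ambiguous string month-first by default
guessed = pd.to_datetime(raw)

# An explicit format removes the ambiguity
explicit = pd.to_datetime(raw, format="%d-%m-%Y")

# errors="coerce" turns unparseable values into NaT instead of raising
messy = pd.Series(["01-01-2017", "31-02-2020", "not a date"])  # hypothetical sample
parsed = pd.to_datetime(messy, format="%d-%m-%Y", errors="coerce")

print(guessed.date(), explicit.date())
print(parsed)
```

Coercing to `NaT` is a design choice: it keeps the conversion from failing on a few bad rows, but the resulting missing values still need to be handled before modelling.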

In [7]:
data['date'].dtype

dtype('<M8[ns]')

### Extracting Year

In [8]:
data['year'] = data['date'].dt.year

### Extracting Month

In [9]:
data['month'] = data['date'].dt.month
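
The same `.dt` accessor exposes other parts of the date. A minimal sketch, rebuilding the sample dates so that it runs on its own; the column names `day`, `quarter`, and `day_of_year` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["01-01-2017", "04-12-2008", "23-06-1988", "25-08-1999", "20-02-1993"]),
    format="%d-%m-%Y",
)

# Each attribute returns an integer Series aligned with the input
parts = pd.DataFrame({
    "day": dates.dt.day,                # day of the month, 1-31
    "quarter": dates.dt.quarter,        # calendar quarter, 1-4
    "day_of_year": dates.dt.dayofyear,  # 1-366
})
print(parts)
```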

### Extracting the years passed since the date (i.e. current year - year in the column)

In [11]:
data['passed_years'] = date.today().year - data['date'].dt.year

### Extracting the months passed since the date (i.e. year difference * 12 + current month - month in the column)

In [12]:
data['passed_months'] = (date.today().year - data['date'].dt.year) * 12 + date.today().month - data['date'].dt.month
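
Because `date.today()` changes every day, the results above are not reproducible. The sketch below applies the same two formulas against a fixed reference date. The date `2021-06-24` is an assumption, chosen because it reproduces the values in the final table of this notebook.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["01-01-2017", "04-12-2008", "23-06-1988", "25-08-1999", "20-02-1993"]),
    format="%d-%m-%Y",
)

reference = pd.Timestamp("2021-06-24")  # assumed fixed reference date

# Difference in calendar years (not the number of full years elapsed)
passed_years = reference.year - dates.dt.year

# Difference in calendar months: whole years as months plus the month offset
passed_months = passed_years * 12 + (reference.month - dates.dt.month)

print(passed_years.tolist())
print(passed_months.tolist())
```

Note that both values count calendar boundaries crossed: a date in December of the previous year is one "passed year" in January, even though only a month has elapsed.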

### Extracting the days passed since the date (i.e. current date - date in the column)

In [13]:
# create a helper column today_date holding today's date
data['today_date'] = date.today()
# or
# data['today_date'] = pd.to_datetime('today').date()

In [14]:
# converting that today_date column to datetime
data['today_date'] = pd.to_datetime(data.today_date)

In [15]:
data['passed_days'] = data['today_date'] - data['date']

# alternative: keep only the integer number of days instead of a Timedelta

data['no_of_days_passed'] = (data['today_date'] - data['date']).dt.days
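
The helper column is not strictly required: a single `pd.Timestamp` can be subtracted from the whole column. A minimal sketch, again assuming the fixed reference date `2021-06-24` for reproducibility; in practice `pd.Timestamp.today().normalize()` would give today's date at midnight.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["01-01-2017", "04-12-2008", "23-06-1988", "25-08-1999", "20-02-1993"]),
    format="%d-%m-%Y",
)

# Assumed fixed reference; swap in pd.Timestamp.today().normalize() for live data
reference = pd.Timestamp("2021-06-24")

# Scalar minus Series broadcasts and yields a Timedelta Series
elapsed = reference - dates

# .dt.days extracts the whole-day component as integers
days_passed = elapsed.dt.days
print(days_passed.tolist())
```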

In [16]:
# dropping that today date column
data.drop('today_date', axis = 1, inplace = True)

### Extracting the weekday name of the date

In [17]:
data['day_name'] = data['date'].dt.day_name()
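
The weekend and holiday flags mentioned in the introduction can be derived in a similar way. This is a sketch under stated assumptions: the weekend is taken to be Saturday and Sunday, and the holiday list comes from `USFederalHolidayCalendar` in pandas, which covers US federal holidays only. Another country or business calendar would need its own holiday list.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

dates = pd.to_datetime(
    pd.Series(["01-01-2017", "04-12-2008", "23-06-1988", "25-08-1999", "20-02-1993"]),
    format="%d-%m-%Y",
)

# dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend
is_weekend = dates.dt.dayofweek >= 5

# Build the holiday dates covering the range of the data (assumption: US federal calendar)
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start=dates.min(), end=dates.max())

# True where the date appears in the holiday list
is_holiday = dates.isin(holidays)

print(pd.DataFrame({"date": dates, "is_weekend": is_weekend, "is_holiday": is_holiday}))
```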

In [18]:
data

|   | date | year | month | passed_years | passed_months | passed_days | no_of_days_passed | day_name |
|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-01 | 2017 | 1 | 4 | 53 | 1635 days | 1635 | Sunday |
| 1 | 2008-12-04 | 2008 | 12 | 13 | 150 | 4585 days | 4585 | Thursday |
| 2 | 1988-06-23 | 1988 | 6 | 33 | 396 | 12054 days | 12054 | Thursday |
| 3 | 1999-08-25 | 1999 | 8 | 22 | 262 | 7974 days | 7974 | Wednesday |
| 4 | 1993-02-20 | 1993 | 2 | 28 | 340 | 10351 days | 10351 | Saturday |
