## Date Manipulation on Financial Services Consumer Complaints
In this exercise, we will use pandas to extract time-related information from two existing date columns and create six new columns:

In [1]:
import pandas as pd

In [2]:
file_url = ('https://raw.githubusercontent.com/PacktWorkshops/The-Data-Science-Workshop/master/Chapter12/Dataset/Consumer_Complaints.csv')

In [3]:
df = pd.read_csv(file_url)
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,11/13/2014,11/13/2014,"Choice Recovery, Inc.",Closed with explanation,Yes,
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,11/13/2014,11/13/2014,"Expert Global Solutions, Inc.",In progress,Yes,
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,11/13/2014,11/13/2014,"FNIS (Fidelity National Information Services, ...",In progress,Yes,
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,11/13/2014,11/13/2014,"Expert Global Solutions, Inc.",In progress,Yes,
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,11/12/2014,11/13/2014,TransUnion,In progress,Yes,


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 312912 entries, 0 to 312911
Data columns (total 14 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   Complaint ID          312912 non-null  int64  
 1   Product               312912 non-null  object 
 2   Sub-product           219052 non-null  object 
 3   Issue                 312908 non-null  object 
 4   Sub-issue             85586 non-null   object 
 5   State                 308387 non-null  object 
 6   ZIP code              309074 non-null  float64
 7   Submitted via         312912 non-null  object 
 8   Date received         312912 non-null  object 
 9   Date sent to company  312912 non-null  object 
 10  Company               312912 non-null  object 
 11  Company response      312912 non-null  object 
 12  Timely response?      312912 non-null  object 
 13  Consumer disputed?    284734 non-null  object 
dtypes: float64(1), int64(1), object(12)
memory usage: 33

In [7]:
# convert columns to datetime
df['Date received'] = pd.to_datetime(df['Date received'])
df['Date sent to company'] = pd.to_datetime(df['Date sent to company'])
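
The conversion above lets pandas infer the date format. Since the rows displayed earlier use strings such as 11/13/2014, one option is to pass the format explicitly. The following is a minimal sketch on a small hypothetical sample (the `sample` DataFrame and its invalid third value are assumptions made for illustration, not part of the dataset):

```python
import pandas as pd

# Hypothetical sample mimicking the MM/DD/YYYY strings seen in df.head()
sample = pd.DataFrame({
    'Date received': ['11/13/2014', '11/12/2014', 'not a date'],
})

# An explicit format avoids ambiguous day/month parsing;
# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(
    sample['Date received'], format='%m/%d/%Y', errors='coerce'
)
print(parsed)
```

Whether coercing invalid values to NaT is appropriate depends on the analysis; raising an error (the default) may be preferable when bad dates should not pass silently.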

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 312912 entries, 0 to 312911
Data columns (total 14 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   Complaint ID          312912 non-null  int64         
 1   Product               312912 non-null  object        
 2   Sub-product           219052 non-null  object        
 3   Issue                 312908 non-null  object        
 4   Sub-issue             85586 non-null   object        
 5   State                 308387 non-null  object        
 6   ZIP code              309074 non-null  float64       
 7   Submitted via         312912 non-null  object        
 8   Date received         312912 non-null  datetime64[ns]
 9   Date sent to company  312912 non-null  datetime64[ns]
 10  Company               312912 non-null  object        
 11  Company response      312912 non-null  object        
 12  Timely response?      312912 non-null  object        
 13 

In [9]:
# create new columns using datetime attributes 
# YearReceived column
df['YearReceived'] = df['Date received'].dt.year

# MonthReceived column
df['MonthReceived'] = df['Date received'].dt.month

# DayReceived column
df['DayReceived'] = df['Date received'].dt.day

# DowReceived column
df['DowReceived'] = df['Date received'].dt.dayofweek
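
To see what each `.dt` attribute returns, here is a small sketch on a hypothetical two-row Series built from dates that appear in the output above. The `day_name()` call is an addition for readability and is not used elsewhere in this exercise:

```python
import pandas as pd

# Hypothetical two-row sample using dates that appear in the output above
dates = pd.Series(pd.to_datetime(['2014-11-13', '2014-11-12']))

parts = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
    'dow': dates.dt.dayofweek,        # Monday=0 ... Sunday=6
    'dow_name': dates.dt.day_name(),  # readable label for the same information
})
print(parts)
```

November 13, 2014 was a Thursday, so its `dow` value is 3, which matches the DowReceived values shown in the notebook output.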


In [10]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DayReceived,DowReceived
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2


Create a new column called IsWeekendReceived, which will contain Boolean values indicating whether the DowReceived column is greater than or equal to 5 (0 corresponds to Monday, while 5 and 6 correspond to Saturday and Sunday, respectively):

In [11]:
# create feature to indicate whether the date was during a weekend or not
df['IsWeekendReceived'] = df['DowReceived'] >= 5
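
The comparison above produces True/False values. If a numeric 0/1 column is preferred, for example as input to a model, the flag can be cast to integers. This is a sketch on hypothetical day-of-week values; the `IsWeekendReceivedInt` column name is an assumption used only here:

```python
import pandas as pd

# Hypothetical day-of-week values: Thursday, Saturday, Sunday, Monday
sample = pd.DataFrame({'DowReceived': [3, 5, 6, 0]})

# Boolean flag, same comparison as in the cell above
sample['IsWeekendReceived'] = sample['DowReceived'] >= 5

# Optional: cast to 0/1 integers if a numeric binary column is preferred
sample['IsWeekendReceivedInt'] = sample['IsWeekendReceived'].astype(int)
print(sample)
```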

In [12]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DayReceived,DowReceived,IsWeekendReceived
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3,False
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3,False
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2,False


We have created a new feature stating whether each complaint was received during a weekend or not. Now we will engineer a new feature containing the number of days between Date sent to company and Date received.

In [13]:
df['RoutingDays'] = df['Date sent to company'] - df['Date received']

In [14]:
df['RoutingDays'].dtype

dtype('<m8[ns]')

The result of subtracting two datetime columns is a timedelta column (dtype('<m8[ns]'), which is the timedelta64 type from the numpy package, here with nanosecond precision). We need to convert this data type into an int to get the number of days between these two dates.

In [15]:
# transform RoutingDays using .dt.days attribute
df['RoutingDays'] = df['RoutingDays'].dt.days
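
The subtraction and the `.dt.days` conversion can be checked on a small example. This sketch uses two hypothetical rows; the first mirrors the complaint received on 2014-11-12 and sent on 2014-11-13:

```python
import pandas as pd

# Hypothetical sample; the first pair mirrors row 4 of the output above
sent = pd.Series(pd.to_datetime(['2014-11-13', '2014-11-20']))
received = pd.Series(pd.to_datetime(['2014-11-12', '2014-11-13']))

delta = sent - received        # Series of timedelta values
routing_days = delta.dt.days   # whole days as integers

print(delta.dtype)
print(routing_days.tolist())   # [1, 7]
```

Note that `.dt.days` keeps only the whole-day component, which is sufficient here because both columns contain dates without a time of day.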

In [16]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DayReceived,DowReceived,IsWeekendReceived,RoutingDays
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3,False,0
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False,0
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3,False,0
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False,0
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2,False,1


In this exercise, you put into practice different techniques for engineering new features from datetime columns on a real-world dataset. From the two columns Date sent to company and Date received, you created six new features that provide additional information: YearReceived, MonthReceived, DayReceived, DowReceived, IsWeekendReceived, and RoutingDays.

For instance, we were able to find patterns: the number of complaints tends to be higher in November or on a Friday. We also found that routing the complaints takes more time when they are received during the weekend, which may be due to the limited number of staff at that time of the week.
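
The cells behind these observations are not shown in this exercise. One way such patterns could be checked is with `value_counts` and `groupby` on the new columns. The sketch below runs on a hypothetical toy DataFrame (`toy` and its values are made up for illustration and say nothing about the real dataset):

```python
import pandas as pd

# Hypothetical toy data reusing the column names created in this exercise
toy = pd.DataFrame({
    'MonthReceived': [11, 11, 11, 3],
    'DowReceived': [4, 4, 5, 0],
    'IsWeekendReceived': [False, False, True, False],
    'RoutingDays': [0, 1, 3, 0],
})

# Number of complaints per month and per day of week
per_month = toy['MonthReceived'].value_counts().sort_index()
per_dow = toy['DowReceived'].value_counts().sort_index()

# Average routing time for weekday vs. weekend complaints
routing_by_weekend = toy.groupby('IsWeekendReceived')['RoutingDays'].mean()

print(per_month, per_dow, routing_by_weekend, sep='\n\n')
```

Applied to the full DataFrame, the same three statements would operate on `df` instead of `toy`.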