-----
<div class="alert alert-block alert-info">
<h1> Supplement: Introduction to Date and Time in Python </h1>

<div class="alert alert-block alert-warning">  
    
<h3><b>Let's stop and think about the possible reasons for dealing with Date and Time</b></h3>
    
    
In almost all cases, dates and/or times in a DataFrame must be stored in the correct format before we can apply functions to them. The <b>purpose</b> of such operations can be:

- to give summary information of the data (EDA)
- for time series plots or to plot information aggregated by time (Visualization) 
- to create input features for models (we don't want the original date and time, why?)

This tutorial will try to serve those purposes. 
    
</div>

<b> References </b>

[The Python Coding Book](https://thepythoncodingbook.com/)

[Working with datetime in Pandas DataFrame](https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587) **link no longer works**

<b> I. Use datetime module </b>

<b>A.  Data types in `datetime`</b>

Let’s start using the datetime module and introduce the two key data types that you’ll need to get started:

- `datetime.datetime`
- `datetime.timedelta`

In [1]:
import datetime

In [2]:
#get the current date and time
time_now = datetime.datetime.now()
time_now

datetime.datetime(2025, 7, 9, 13, 2, 50, 826046)

The numbers above show the values stored in the `datetime.datetime` object:
the year, month, day, hour, minute, second, and microsecond, in this order.

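Now let's check the attributes and methods. Uncomment one line at a time and run the cell. The `2` shown below matches `time_now.weekday()`, which counts Monday as 0: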

In [None]:
#time_now.year
#time_now.month
#time_now.day
#time_now.weekday()
#time_now.isoweekday()
#time_now.isoformat()

2
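To make the difference between attributes and methods concrete, here is a minimal sketch. It hard-codes the timestamp printed earlier under the hypothetical name `fixed_time`, so the results do not depend on when the cell is run.

```python
import datetime

# Hypothetical fixed moment (same value as the output shown earlier)
fixed_time = datetime.datetime(2025, 7, 9, 13, 2, 50, 826046)

# Attributes are read without parentheses
year = fixed_time.year
month = fixed_time.month
day = fixed_time.day

# Methods must be called with parentheses
weekday = fixed_time.weekday()          # Monday is 0, Sunday is 6
iso_weekday = fixed_time.isoweekday()   # Monday is 1, Sunday is 7
iso_text = fixed_time.isoformat()       # ISO 8601 string

print(year, month, day)        # 2025 7 9
print(weekday, iso_weekday)    # 2 3
print(iso_text)                # 2025-07-09T13:02:50.826046
```

Note that `weekday()` and `isoweekday()` differ only in where the count starts.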

Displaying dates and times in Python using `str()` and `print()`

In [9]:
print(time_now)

2025-07-09 13:02:50.826046


In [10]:
str(time_now)

'2025-07-09 13:02:50.826046'

When you subtract one datetime.datetime object from another, the value returned is an object of type `datetime.timedelta`:

In [11]:
time_1 = datetime.datetime(2022, 7, 13, 8, 55, 33, 208249)
time_2 = datetime.datetime(2022, 7, 13, 9, 55, 33, 408249)
gap = time_2-time_1
gap

datetime.timedelta(seconds=3600, microseconds=200000)

In [12]:
type(gap)

datetime.timedelta

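A `datetime.timedelta` stores its value as days, seconds, and microseconds, and you can extract just the part you need:

- `gap.seconds` is an attribute holding the whole-seconds component of the `datetime.timedelta` (whole days and microseconds are stored separately in `gap.days` and `gap.microseconds`).
- `gap.total_seconds()` is a method that returns the entire duration in seconds, as a float.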

In [13]:
gap.seconds

3600

In [14]:
gap.total_seconds()

3600.2
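The difference between `.seconds` and `.total_seconds()` is easier to see when the gap is longer than a day. The sketch below builds a `datetime.timedelta` directly (the name `long_gap` is only for illustration) instead of subtracting two datetimes.

```python
import datetime

# A gap of 2 days, 1 hour, and 0.2 seconds (illustrative values)
long_gap = datetime.timedelta(days=2, hours=1, microseconds=200000)

print(long_gap.days)             # 2
print(long_gap.seconds)          # 3600 (the days are NOT included here)
print(long_gap.microseconds)     # 200000
print(long_gap.total_seconds())  # 176400.2 (2 * 86400 + 3600 + 0.2)
```

If you need the full duration, `total_seconds()` is usually the safer choice, because `.seconds` silently drops whole days.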

<b> B. Converting Dates and Times in Python To and From Strings </b>

- `strptime()`: string parse time
- `strftime()`: string format time


`strptime()`

The `strptime()` method converts a string containing a date and time into a datetime.datetime object. You can remember this using the p between str and time in the method name, which refers to parsing a string into a date and time. The method name only refers to time, but as you’re using the datetime module, this method deals with both dates and times.

In [15]:
# Let's try some very common ways of how dates are recorded

date_v1 = "20/02/1991"
date_v2 = "1991-02-20"
date_v3 = "20 February, 1991"

There are many possible ways of displaying years, months, and days. There are also codes to refer to time data. Click [here](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) for a complete list. 

In [16]:
date_v1 = "20/02/1991"
date_v1_dt = datetime.datetime.strptime(date_v1, "%d/%m/%Y")
print(date_v1_dt)

1991-02-20 00:00:00
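The format string has to match the input exactly. As a small sketch (the string `stamp` is made up for illustration), the example below parses a date with a time part and then shows that a mismatched format raises a `ValueError`.

```python
import datetime

# Illustrative string that includes a time of day
stamp = "20/02/1991 14:30"
stamp_dt = datetime.datetime.strptime(stamp, "%d/%m/%Y %H:%M")
print(stamp_dt)  # 1991-02-20 14:30:00

# A format that does not match the string raises ValueError
try:
    datetime.datetime.strptime(stamp, "%Y-%m-%d")
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Could not parse:", err)
```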


<div class="alert alert-block alert-danger">
    
<b>Practice 1</b>
    
- Try to convert `date_v2` and `date_v3` to `datetime.datetime` type and call them `date_v2_dt` and `date_v3_dt`.
- Run `date_v1==date_v2==date_v3` and `date_v1_dt==date_v2_dt==date_v3_dt`

</div>

`strftime()`

This method performs the reverse of the method you’ve just learned about. The f in the method name shows that you’re obtaining a string from a date and time object.

In [19]:
new_date = datetime.datetime(year=2008, month=12, day=3)
new_date

datetime.datetime(2008, 12, 3, 0, 0)

You can convert this `datetime.datetime` object into a string using `strftime()`. You can choose to create a string using any format you wish:

In [22]:
new_date.strftime("%Y-%m-%d")
#new_date.strftime("Python 3.0 was released in %B %Y")

'2008-12-03'

If you wish, you can also call `strftime()` on the class, in the same style as `strptime()`, passing the object as the first argument:

In [27]:
new_date_str = datetime.datetime.strftime(new_date, "%Y-%m-%d")
type(new_date_str)

str
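Because `strftime()` and `strptime()` are inverses, the same format string can be used in both directions. The sketch below is a round trip under that assumption; the names `original`, `fmt`, `as_text`, and `back` are only for illustration.

```python
import datetime

original = datetime.datetime(2008, 12, 3, 18, 45)
fmt = "%d/%m/%Y %H:%M"  # one format string used in both directions

as_text = original.strftime(fmt)                  # datetime -> string
back = datetime.datetime.strptime(as_text, fmt)   # string -> datetime

print(as_text)           # 03/12/2008 18:45
print(back == original)  # True
```

The round trip only recovers the original value when the format keeps every field you care about. Here the seconds are zero, so nothing is lost.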

<b> II. Use Pandas </b>

Most of the time we deal with columns in pandas DataFrames that store information of date and time. Pandas comes with functions and objects of its own to handle this. 

<b>A. use `pd.to_datetime` in Pandas</b>

<b>a. Convert strings to datetime</b>

Pandas `to_datetime()` can parse many common date string formats to datetime without any additional arguments. 

In [23]:
import pandas as pd
import numpy as np

In [24]:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])
df

Unnamed: 0,date,value
0,2000-03-10,2
1,2000-03-11,3
2,2000-03-12,4


In [17]:
# custom format
df = pd.DataFrame({'date': ['2016-6-10 20:30:0', 
                            '2016-7-1 19:45:30', 
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
df

Unnamed: 0,date,value
0,2016-10-06 20:30:00,2
1,2016-01-07 19:45:30,3
2,2013-12-10 04:05:01,4


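You will end up with a `ValueError` (or one of its subclasses, depending on the pandas version) if a date string cannot be parsed. `to_datetime()` has an argument called `errors` that allows you to ignore the error (`errors='ignore'`, deprecated in recent pandas releases) or force an invalid value to `NaT` (`errors='coerce'`).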

In [25]:
df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000', '2000-03-22'],
                   'value': [2, 3, 4, 5]})
df['date'] = pd.to_datetime(df['date'],errors='coerce')
df

Unnamed: 0,date,value
0,2000-03-10,2
1,NaT,3
2,2000-03-12,4
3,NaT,5


In [None]:
df = pd.DataFrame({'date': ['2000-03-22', '3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4, 5]})
df['date'] = pd.to_datetime(df['date'],errors='coerce')
df

Unnamed: 0,date,value
0,2000-03-22,2
1,NaT,3
2,NaT,4
3,NaT,5
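In the two outputs above, valid-looking dates also become `NaT`. This appears to be because recent pandas versions infer a single format from the first value and apply it to the whole column. A minimal sketch that makes this explicit is to pass `format` yourself and then inspect which rows were coerced. The names `raw`, `parsed`, and `bad_rows` are illustrative.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000', '2000-03-22'])

# With the default errors='raise', a value that does not match raises ValueError
try:
    pd.to_datetime(raw, format="%m/%d/%Y")
    raised = False
except ValueError:
    raised = True

# With errors='coerce', values that do not match the format become NaT
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors='coerce')

# Keep the original strings that failed, so they can be checked by hand
bad_rows = raw[parsed.isna()]

print(raised)             # True
print(parsed)
print(bad_rows.tolist())  # ['a/11/2000', '2000-03-22']
```

Checking `bad_rows` after a coerce is a cheap way to avoid silently losing dates that were simply written in a different style.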


<div class="alert alert-block alert-danger">
    
<b>Practice 2</b>
    
Create a data frame to store the date, starting time, and the course name of your quizzes next week. 

</div>

<b>b. Assemble a datetime from multiple columns</b>

`to_datetime()` can also assemble a datetime from multiple columns. The keys (column labels) can be common names such as ['year', 'month', 'day', 'minute', 'second', 'ms'] or their plurals; 'year', 'month', and 'day' are required.

In [34]:
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
df['date'] = pd.to_datetime(df)
df

Unnamed: 0,year,month,day,date
0,2015,2,4,2015-02-04
1,2016,3,5,2016-03-05
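The same idea extends to time-of-day columns. The sketch below adds hypothetical `hour` and `minute` columns to a DataFrame called `parts` and assembles a full timestamp from them.

```python
import pandas as pd

# Illustrative data: date parts plus time-of-day parts
parts = pd.DataFrame({'year': [2015, 2016],
                      'month': [2, 3],
                      'day': [4, 5],
                      'hour': [9, 17],
                      'minute': [30, 0]})

# Select only the component columns, then assemble them into one datetime column
parts['timestamp'] = pd.to_datetime(parts[['year', 'month', 'day', 'hour', 'minute']])
print(parts['timestamp'])
```

Selecting the component columns explicitly avoids an error if the DataFrame also holds columns that are not date parts.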


<b>c. datetime data types have attributes </b> such as year, month, etc. that can be accessed through `pandas.Series.dt.<attribute>`

In [41]:
df = pd.DataFrame({'name': ['Tom', 'Andy', 'Lucas'],
                 'DoB': ['08-05-1997', '04-28-1996', '12-16-1995']})
df['DoB'] = pd.to_datetime(df['DoB'])
df

Unnamed: 0,name,DoB
0,Tom,1997-08-05
1,Andy,1996-04-28
2,Lucas,1995-12-16


In [42]:
df['year']= df['DoB'].dt.year
df['month']= df['DoB'].dt.month
df['day']= df['DoB'].dt.day
df['day_of_year'] = df['DoB'].dt.day_of_year
df['day_of_week'] = df['DoB'].dt.dayofweek
df['is_leap_year'] = df['DoB'].dt.is_leap_year
df

Unnamed: 0,name,DoB,year,month,day,day_of_year,day_of_week,is_leap_year
0,Tom,1997-08-05,1997,8,5,217,1,False
1,Andy,1996-04-28,1996,4,28,119,6,True
2,Lucas,1995-12-16,1995,12,16,350,5,False


Note that the pandas `dt.dayofweek` attribute numbers the days of the week from Monday, which is denoted by 0, to Sunday, which is denoted by 6. To replace the number with the full name, we can create a mapping and pass it to `map()`:

In [43]:
dw_mapping={
    0: 'Monday', 
    1: 'Tuesday', 
    2: 'Wednesday', 
    3: 'Thursday', 
    4: 'Friday',
    5: 'Saturday', 
    6: 'Sunday'
} 
df['day_of_week_name']=df['DoB'].dt.dayofweek.map(dw_mapping)
df

Unnamed: 0,name,DoB,year,month,day,day_of_year,day_of_week,is_leap_year,day_of_week_name
0,Tom,1997-08-05,1997,8,5,217,1,False,Tuesday
1,Andy,1996-04-28,1996,4,28,119,6,True,Sunday
2,Lucas,1995-12-16,1995,12,16,350,5,False,Saturday
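As an alternative to a hand-written mapping, pandas also provides `dt.day_name()` and `dt.month_name()`. The sketch below rebuilds the same three birthdays in a standalone Series (called `dob` here for illustration) and passes an explicit format.

```python
import pandas as pd

# Same dates of birth as above, parsed with an explicit month-day-year format
dob = pd.to_datetime(pd.Series(['08-05-1997', '04-28-1996', '12-16-1995']),
                     format="%m-%d-%Y")

day_names = dob.dt.day_name()      # full weekday names
month_names = dob.dt.month_name()  # full month names

print(day_names.tolist())    # ['Tuesday', 'Sunday', 'Saturday']
print(month_names.tolist())  # ['August', 'April', 'December']
```

The mapping approach is still useful when you want custom labels, such as abbreviations or "weekday" versus "weekend".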


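Get the age from the date of birth. Note that subtracting the years ignores whether this year's birthday has already passed, so the result can be one year too high: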

In [44]:
today = pd.to_datetime('today')
df['age'] = today.year - df['DoB'].dt.year
df

Unnamed: 0,name,DoB,year,month,day,day_of_year,day_of_week,is_leap_year,day_of_week_name,age
0,Tom,1997-08-05,1997,8,5,217,1,False,Tuesday,28
1,Andy,1996-04-28,1996,4,28,119,6,True,Sunday,29
2,Lucas,1995-12-16,1995,12,16,350,5,False,Saturday,30
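If the exact age matters, one possible refinement is to subtract one year for anyone whose birthday has not yet occurred. The sketch below assumes a fixed reference date (`ref`, chosen here as 2025-07-09 so the result is reproducible) rather than `'today'`.

```python
import pandas as pd

dob = pd.to_datetime(pd.Series(['08-05-1997', '04-28-1996', '12-16-1995']),
                     format="%m-%d-%Y")

# Assumed reference date, fixed so the output does not change over time
ref = pd.Timestamp('2025-07-09')

# True where this year's birthday falls on or before the reference date
had_birthday = (dob.dt.month < ref.month) | (
    (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
)

# Year difference, minus one where the birthday is still to come
age = ref.year - dob.dt.year - (~had_birthday).astype(int)
print(age.tolist())  # [27, 29, 29]
```

Compared with the plain year difference (28, 29, 30), only the second person keeps the same age, because only that birthday falls before July 9.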


<b> d. Use datetime as index to better serve our purposes </b>

It will be easier to 

- select the data
- perform aggregation 
- fit certain time series models (later)

I will use the `kaggle-uber-other-federal.csv` for examples. 

<div class="alert alert-block alert-success">
Reflect!
<ul>
<li> `to_datetime` allows us to use the correct data type for date and time. </li>
<li> Attributes of the object allow us to create features for models, or aggregated information for models and plots. </li>
<li> This works similarly on a time-only column, but for our purposes it is usually more useful to have date and time combined. </li>
</ul>
</div>

In [46]:

df_uber=pd.read_csv('../data/kaggle-uber-other-federal.csv')
df_uber.head()

Unnamed: 0,Date,Time,PU_Address,DO_Address,Routing Details,PU_Address.1,Status
0,07/01/2014,07:15 AM,"Brooklyn Museum, 200 Eastern Pkwy., BK NY;","1 Brookdale Plaza, BK NY;","PU: Brooklyn Museum, 200 Eastern Pkwy., BK NY;...","Brooklyn Museum, 200 Eastern Pkwy., BK NY; DO:...",Cancelled
1,07/01/2014,07:30 AM,"33 Robert Dr., Short Hills NJ;","John F Kennedy International Airport, vitona A...","PU: 33 Robert Dr., Short Hills NJ; DO: John F ...","33 Robert Dr., Short Hills NJ; DO: John F Kenn...",Arrived
2,07/01/2014,08:00 AM,"60 Glenmore Ave., BK NY;","2171 Nostrand Ave., BK NY;","PU: 60 Glenmore Ave., BK NY; DO: 2171 Nostrand...","60 Glenmore Ave., BK NY; DO: 2171 Nostrand Ave...",Assigned
3,07/01/2014,09:00 AM,"128 East 31 St., BK NY;","369 93rd St., BK NY;","PU: 128 East 31 St., BK NY; DO: 369 93rd St., ...","128 East 31 St., BK NY; DO: 369 93rd St., BK NY;",Assigned
4,07/01/2014,09:30 AM,"139-39 35 Ave., Flushing NY;",La Guardia Airport;,"PU: 139-39 35 Ave., Flushing NY; DO: La Guardi...","139-39 35 Ave., Flushing NY; DO: La Guardia Ai...",Assigned


In [47]:
#Combine the strings then change to datetime
df_uber['Datetime'] = df_uber['Date'] + ' ' + df_uber['Time']
df_uber['Datetime'] = pd.to_datetime(df_uber['Datetime'])
df_uber.head(10)

Unnamed: 0,Date,Time,PU_Address,DO_Address,Routing Details,PU_Address.1,Status,Datetime
0,07/01/2014,07:15 AM,"Brooklyn Museum, 200 Eastern Pkwy., BK NY;","1 Brookdale Plaza, BK NY;","PU: Brooklyn Museum, 200 Eastern Pkwy., BK NY;...","Brooklyn Museum, 200 Eastern Pkwy., BK NY; DO:...",Cancelled,2014-07-01 07:15:00
1,07/01/2014,07:30 AM,"33 Robert Dr., Short Hills NJ;","John F Kennedy International Airport, vitona A...","PU: 33 Robert Dr., Short Hills NJ; DO: John F ...","33 Robert Dr., Short Hills NJ; DO: John F Kenn...",Arrived,2014-07-01 07:30:00
2,07/01/2014,08:00 AM,"60 Glenmore Ave., BK NY;","2171 Nostrand Ave., BK NY;","PU: 60 Glenmore Ave., BK NY; DO: 2171 Nostrand...","60 Glenmore Ave., BK NY; DO: 2171 Nostrand Ave...",Assigned,2014-07-01 08:00:00
3,07/01/2014,09:00 AM,"128 East 31 St., BK NY;","369 93rd St., BK NY;","PU: 128 East 31 St., BK NY; DO: 369 93rd St., ...","128 East 31 St., BK NY; DO: 369 93rd St., BK NY;",Assigned,2014-07-01 09:00:00
4,07/01/2014,09:30 AM,"139-39 35 Ave., Flushing NY;",La Guardia Airport;,"PU: 139-39 35 Ave., Flushing NY; DO: La Guardi...","139-39 35 Ave., Flushing NY; DO: La Guardia Ai...",Assigned,2014-07-01 09:30:00
5,07/01/2014,12:00 PM,"545 17 St., BK NY;",La Guardia Airport;,"PU: 545 17 St., BK NY; DO: La Guardia Airport;","545 17 St., BK NY; DO: La Guardia Airport;",Arrived,2014-07-01 12:00:00
6,07/01/2014,12:30 PM,"127 Guernsey St., BK NY;","121 Dekalb Ave., BK NY;","PU: 127 Guernsey St., BK NY; DO: 121 Dekalb Av...","127 Guernsey St., BK NY; DO: 121 Dekalb Ave., ...",Assigned,2014-07-01 12:30:00
7,07/01/2014,01:00 PM,"657 St Marks Ave., BK NY;","240 South 3rd St., BK NY;","PU: 657 St Marks Ave., BK NY; DO: 240 South 3r...","657 St Marks Ave., BK NY; DO: 240 South 3rd St...",Assigned,2014-07-01 13:00:00
8,07/01/2014,02:30 PM,"1611 47th St., BK NY;","1048 49th St., BK NY;","PU: 1611 47th St., BK NY; DO: 1048 49th St., B...","1611 47th St., BK NY; DO: 1048 49th St., BK NY;",Arrived,2014-07-01 14:30:00
9,07/01/2014,02:45 PM,"364 87th Street, BK NY;",John F Kennedy International Airport;,"PU: 364 87th Street, BK NY; DO: John F Kennedy...","364 87th Street, BK NY; DO: John F Kennedy Int...",Assigned,2014-07-01 14:45:00
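When the layout of the strings is known, passing an explicit `format` avoids any guessing about day and month order. The sketch below uses a tiny made-up DataFrame (`rides`) with the same column layout as the Uber file. The format string is an assumption based on the rows displayed above.

```python
import pandas as pd

# Two illustrative rows in the same style as the Date and Time columns above
rides = pd.DataFrame({'Date': ['07/01/2014', '07/01/2014'],
                      'Time': ['07:15 AM', '02:30 PM']})

combined = rides['Date'] + ' ' + rides['Time']

# %I is the 12-hour clock hour and %p is the AM/PM marker
rides['Datetime'] = pd.to_datetime(combined, format="%m/%d/%Y %I:%M %p")

print(rides['Datetime'].dt.hour.tolist())  # [7, 14]
```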


In [48]:
# Then it becomes convenient to extract information we need
df_uber['year']= df_uber['Datetime'].dt.year
df_uber['month']= df_uber['Datetime'].dt.month
df_uber['day_of_week'] = df_uber['Datetime'].dt.dayofweek.map(dw_mapping)
df_uber['hour_of_day']= df_uber['Datetime'].dt.hour

df_uber.head()

Unnamed: 0,Date,Time,PU_Address,DO_Address,Routing Details,PU_Address.1,Status,Datetime,year,month,day_of_week,hour_of_day
0,07/01/2014,07:15 AM,"Brooklyn Museum, 200 Eastern Pkwy., BK NY;","1 Brookdale Plaza, BK NY;","PU: Brooklyn Museum, 200 Eastern Pkwy., BK NY;...","Brooklyn Museum, 200 Eastern Pkwy., BK NY; DO:...",Cancelled,2014-07-01 07:15:00,2014,7,Tuesday,7
1,07/01/2014,07:30 AM,"33 Robert Dr., Short Hills NJ;","John F Kennedy International Airport, vitona A...","PU: 33 Robert Dr., Short Hills NJ; DO: John F ...","33 Robert Dr., Short Hills NJ; DO: John F Kenn...",Arrived,2014-07-01 07:30:00,2014,7,Tuesday,7
2,07/01/2014,08:00 AM,"60 Glenmore Ave., BK NY;","2171 Nostrand Ave., BK NY;","PU: 60 Glenmore Ave., BK NY; DO: 2171 Nostrand...","60 Glenmore Ave., BK NY; DO: 2171 Nostrand Ave...",Assigned,2014-07-01 08:00:00,2014,7,Tuesday,8
3,07/01/2014,09:00 AM,"128 East 31 St., BK NY;","369 93rd St., BK NY;","PU: 128 East 31 St., BK NY; DO: 369 93rd St., ...","128 East 31 St., BK NY; DO: 369 93rd St., BK NY;",Assigned,2014-07-01 09:00:00,2014,7,Tuesday,9
4,07/01/2014,09:30 AM,"139-39 35 Ave., Flushing NY;",La Guardia Airport;,"PU: 139-39 35 Ave., Flushing NY; DO: La Guardi...","139-39 35 Ave., Flushing NY; DO: La Guardia Ai...",Assigned,2014-07-01 09:30:00,2014,7,Tuesday,9


In [49]:
# It's also convenient for us to set it as the index

df_uber = df_uber.set_index(['Datetime'])
df_uber.head()

Datetime,Date,Time,PU_Address,DO_Address,Routing Details,PU_Address.1,Status,year,month,day_of_week,hour_of_day
2014-07-01 07:15:00,07/01/2014,07:15 AM,"Brooklyn Museum, 200 Eastern Pkwy., BK NY;","1 Brookdale Plaza, BK NY;","PU: Brooklyn Museum, 200 Eastern Pkwy., BK NY;...","Brooklyn Museum, 200 Eastern Pkwy., BK NY; DO:...",Cancelled,2014,7,Tuesday,7
2014-07-01 07:30:00,07/01/2014,07:30 AM,"33 Robert Dr., Short Hills NJ;","John F Kennedy International Airport, vitona A...","PU: 33 Robert Dr., Short Hills NJ; DO: John F ...","33 Robert Dr., Short Hills NJ; DO: John F Kenn...",Arrived,2014,7,Tuesday,7
2014-07-01 08:00:00,07/01/2014,08:00 AM,"60 Glenmore Ave., BK NY;","2171 Nostrand Ave., BK NY;","PU: 60 Glenmore Ave., BK NY; DO: 2171 Nostrand...","60 Glenmore Ave., BK NY; DO: 2171 Nostrand Ave...",Assigned,2014,7,Tuesday,8
2014-07-01 09:00:00,07/01/2014,09:00 AM,"128 East 31 St., BK NY;","369 93rd St., BK NY;","PU: 128 East 31 St., BK NY; DO: 369 93rd St., ...","128 East 31 St., BK NY; DO: 369 93rd St., BK NY;",Assigned,2014,7,Tuesday,9
2014-07-01 09:30:00,07/01/2014,09:30 AM,"139-39 35 Ave., Flushing NY;",La Guardia Airport;,"PU: 139-39 35 Ave., Flushing NY; DO: La Guardi...","139-39 35 Ave., Flushing NY; DO: La Guardia Ai...",Assigned,2014,7,Tuesday,9


Select data with a specific time and perform aggregation

In [50]:
# How many records are from 2014?
len(df_uber.loc['2014'])

99

In [51]:
# How many records are from July 1 to July 15?
len(df_uber.loc['2014-07-01':'2014-07-15'])

76

In [52]:
# How many records are from 9:00 am to 11:49 am (both endpoints included)?
len(df_uber.between_time('9:00','11:49'))

17
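Since the CSV file may not be at hand, here is a self-contained sketch of the same selection techniques on synthetic data. The DataFrame `toy` and its hourly index are invented for illustration.

```python
import pandas as pd

# 30 hourly timestamps starting 2014-07-01 07:00 (synthetic data)
idx = pd.date_range('2014-07-01 07:00', periods=30, freq='60min')
toy = pd.DataFrame({'trip_id': range(30)}, index=idx)

# Partial string indexing: every row that falls on July 1
n_july_1 = len(toy.loc['2014-07-01'])

# between_time ignores the date and filters on the time of day only
n_morning = len(toy.between_time('9:00', '11:59'))

print(n_july_1)   # 17 (07:00 through 23:00)
print(n_morning)  # 6  (09:00, 10:00, 11:00 on each of the two days)
```

Note that `between_time` needs a `DatetimeIndex`, which is one more reason to set the datetime column as the index.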

In [53]:
# Total number of cancelled trips on each day of the week

# First we can check the number of rows from each day
df_uber.groupby('day_of_week').size()

day_of_week
Friday       16
Monday       16
Saturday      7
Sunday       12
Thursday     17
Tuesday      15
Wednesday    16
dtype: int64

In [54]:
pd.crosstab(df_uber['day_of_week'], df_uber['Status'])

day_of_week,Arrived,Assigned,Cancelled
Friday,12,4,0
Monday,7,4,5
Saturday,4,2,1
Sunday,4,7,1
Thursday,9,7,1
Tuesday,7,7,1
Wednesday,15,1,0


Some other small interesting things...


 - Creating a datetime sequence with fixed intervals in pandas
 
 

In [55]:
b1 = np.random.rand(10)
b2 = pd.date_range('2022-07-01', periods=10, freq='D')
df = pd.DataFrame({'M':b1}, index=b2)
df

Unnamed: 0,M
2022-07-01,0.671932
2022-07-02,0.203859
2022-07-03,0.200646
2022-07-04,0.042131
2022-07-05,0.746942
2022-07-06,0.053376
2022-07-07,0.203122
2022-07-08,0.010056
2022-07-09,0.80491
2022-07-10,0.369966


In [56]:
b3 = np.random.rand(52)
b4 = pd.date_range('2022-07-01', periods=52, freq='W')
df = pd.DataFrame({'M':b3}, index=b4)
df['2022-07-10':'2022-08-10']

Unnamed: 0,M
2022-07-10,0.087903
2022-07-17,0.418284
2022-07-24,0.71673
2022-07-31,0.211922
2022-08-07,0.107251


- Date Arithmetic using `pd.Timestamp`. More details [here](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html)

In [57]:
appointment = pd.Timestamp('2022-07-13')
appointment.day_name()

'Wednesday'

Oops, I need to reschedule to 3 days later

In [58]:
appointment = pd.Timestamp('2022-07-13')
appointment += pd.Timedelta('3 days')
appointment.day_name()

'Saturday'

My bad, it should be 3 business days later

In [59]:
appointment = pd.Timestamp('2022-07-13')
appointment += pd.offsets.BDay(3)
appointment.day_name()

'Monday'
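To see why three business days after a Wednesday lands on a Monday, it can help to list the business days explicitly. The sketch below uses `pd.bdate_range`; the variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('2022-07-13')

# The start date plus the next three business days (weekends are skipped)
business_days = pd.bdate_range(start=start, periods=4)

print([d.day_name() for d in business_days])
# ['Wednesday', 'Thursday', 'Friday', 'Monday']

# The last entry matches the BDay(3) offset used above
print(business_days[-1] == start + pd.offsets.BDay(3))  # True
```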

- Fill in missing dates

In [61]:
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/15/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date'])
df

date,value
2000-03-10,2
2000-03-11,3
2000-03-15,4


In [63]:
index_new = pd.date_range('03-10-2000', '03-15-2000')
type(index_new)

pandas.core.indexes.datetimes.DatetimeIndex

In [None]:
df = df.reindex(index_new, fill_value=np.nan)
df

Unnamed: 0,value
2000-03-10,2.0
2000-03-11,3.0
2000-03-12,
2000-03-13,
2000-03-14,
2000-03-15,4.0
