# Working with Dates and Times

In [1]:
import pandas as pd
import datetime as dt

## Review of Python's datetime Module
- The `datetime` module is part of Python's standard library, so it needs no separate installation.
- The common alias for the `datetime` module is `dt`.
- A module is a Python source file; think of it as an internal library that Python loads on demand.
- The `datetime` module includes `date` and `datetime` classes for representing dates and datetimes.
- The `date` constructor requires arguments for year, month, and day. None of them has a default.
- The `datetime` constructor accepts arguments for year, month, day, hour, minute, and second. Year, month, and day are required; Python defaults to 0 for any missing time values.

In [2]:
sometime = dt.datetime(2025, 12, 15, 8, 13, 59)
print(sometime.year, sometime.month, sometime.day, sometime.hour, sometime.minute, sometime.second)


2025 12 15 8 13 59
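
A short sketch of how the two constructors treat missing arguments. The variable names are illustrative only; the point is that `date` needs all three of its arguments, while the time parts of `datetime` fall back to 0.

```python
import datetime as dt

# A date needs year, month, and day.
launch_day = dt.date(2025, 12, 15)
print(launch_day.year, launch_day.month, launch_day.day)

# Omitted time components of a datetime default to 0 (midnight).
launch_midnight = dt.datetime(2025, 12, 15)
print(launch_midnight.hour, launch_midnight.minute, launch_midnight.second)

# Leaving out the day raises a TypeError rather than falling back to a default.
try:
    dt.date(2025, 12)
except TypeError as error:
    print("TypeError:", error)
```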


## The Timestamp and DatetimeIndex Objects

- Pandas ships with several classes related to datetimes.
- The **Timestamp** is similar to Python's **datetime** object (but with expanded functionality).
- A **DatetimeIndex** is an index of **Timestamp** objects.
- The **Timestamp** constructor accepts a string, a **datetime** object, or the same arguments as the **datetime** class.

In [3]:
pd.Timestamp(2027, 3, 12, 18, 23, 49)

Timestamp('2027-03-12 18:23:49')

In [4]:
pd.Timestamp(dt.datetime(2028, 10, 23, 14, 35))

Timestamp('2028-10-23 14:35:00')

In [5]:
pd.Timestamp("2025/04/01")

Timestamp('2025-04-01 00:00:00')

In [6]:
pd.Timestamp("25 Mar 2023")

Timestamp('2023-03-25 00:00:00')

In [7]:
index = pd.DatetimeIndex([
    dt.date(2026, 1, 10),
    dt.date(2026, 2, 20)
])

index[0]

Timestamp('2026-01-10 00:00:00')

In [8]:
type(index[0])

pandas._libs.tslibs.timestamps.Timestamp
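
To make the "expanded functionality" concrete, here is a minimal sketch. It assumes nothing beyond pandas itself and checks that a **Timestamp** can stand in for a **datetime** while offering a few extra attributes and methods.

```python
import datetime as dt
import pandas as pd

stamp = pd.Timestamp("2025-04-01")

# A Timestamp is a subclass of Python's datetime, so it can stand in for one.
print(isinstance(stamp, dt.datetime))

# Extras that plain datetime objects lack.
print(stamp.day_name())
print(stamp.is_month_start)
print(stamp.quarter)

# A DatetimeIndex can also be built from strings; each element is a Timestamp.
index = pd.DatetimeIndex(["2026-01-10", "2026-02-20"])
print(all(isinstance(value, pd.Timestamp) for value in index))
```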

## Create a Range of Dates with the pd.date_range Function
- The `date_range` function generates and returns a **DatetimeIndex** holding a sequence of dates.
- The function requires 2 of the 3 following parameters: `start`, `end`, and `periods`. The `freq` parameter sets the interval between dates.
- With only `start` and `end`, Pandas will assume a daily frequency (`freq="D"`).
- Every element within a **DatetimeIndex** is a **Timestamp**.

In [9]:
pd.date_range(start="2025-01-01", end="2025-01-07")

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05', '2025-01-06', '2025-01-07'],
              dtype='datetime64[ns]', freq='D')

In [10]:
pd.date_range(start="2025-01-01", end="2025-01-07", freq="D")

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05', '2025-01-06', '2025-01-07'],
              dtype='datetime64[ns]', freq='D')

In [11]:
pd.date_range(start="2025-01-01", end="2025-01-07", freq="2D")

DatetimeIndex(['2025-01-01', '2025-01-03', '2025-01-05', '2025-01-07'], dtype='datetime64[ns]', freq='2D')

In [12]:
pd.date_range(start="2025-01-01", end="2025-01-07", freq="B") # Business Days

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-06',
               '2025-01-07'],
              dtype='datetime64[ns]', freq='B')

In [13]:
pd.date_range(start="2012-09-09", freq="3D", periods=40)

DatetimeIndex(['2012-09-09', '2012-09-12', '2012-09-15', '2012-09-18',
               '2012-09-21', '2012-09-24', '2012-09-27', '2012-09-30',
               '2012-10-03', '2012-10-06', '2012-10-09', '2012-10-12',
               '2012-10-15', '2012-10-18', '2012-10-21', '2012-10-24',
               '2012-10-27', '2012-10-30', '2012-11-02', '2012-11-05',
               '2012-11-08', '2012-11-11', '2012-11-14', '2012-11-17',
               '2012-11-20', '2012-11-23', '2012-11-26', '2012-11-29',
               '2012-12-02', '2012-12-05', '2012-12-08', '2012-12-11',
               '2012-12-14', '2012-12-17', '2012-12-20', '2012-12-23',
               '2012-12-26', '2012-12-29', '2013-01-01', '2013-01-04'],
              dtype='datetime64[ns]', freq='3D')
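
The cells above always pass `start`. As a sketch of the other parameter combinations, the example below pairs `periods` with either endpoint, and also passes all three of `start`, `end`, and `periods` without a `freq`, in which case pandas spaces the dates evenly between the endpoints.

```python
import pandas as pd

# start + periods: count forward from the start date (daily by default).
forward = pd.date_range(start="2025-01-01", periods=3)

# end + periods: count backward so the range finishes on the end date.
backward = pd.date_range(end="2025-01-07", periods=3)

# start + end + periods (no freq): evenly spaced points between the endpoints.
spaced = pd.date_range(start="2025-01-01", end="2025-01-07", periods=3)

print(forward)
print(backward)
print(spaced)
```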

## The dt Attribute
- The `dt` attribute of a datetime **Series** returns a `DatetimeProperties` object with attributes/methods for working with datetimes. It plays the same role as the `str` attribute does for string methods.
- The `DatetimeProperties` object has attributes like `day`, `month`, and `year` to reveal information about each date in the **Series**.
- The `day_name` method returns the written day of the week.
- Attributes like `is_month_end` and `is_quarter_start` return Boolean **Series**.

In [15]:
bunch_of_dates = pd.Series(pd.date_range(start="2000-01-01", end="2020-12-31", freq="24D 3h"))

bunch_of_dates.head()

0   2000-01-01 00:00:00
1   2000-01-25 03:00:00
2   2000-02-18 06:00:00
3   2000-03-13 09:00:00
4   2000-04-06 12:00:00
dtype: datetime64[ns]

In [16]:
bunch_of_dates.dt.day

0       1
1      25
2      18
3      13
4       6
       ..
313     3
314    27
315    21
316    14
317     8
Length: 318, dtype: int32

In [17]:
bunch_of_dates.dt.is_month_end

0      False
1      False
2      False
3      False
4      False
       ...  
313    False
314    False
315    False
316    False
317    False
Length: 318, dtype: bool
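
The sample above happens to contain no month ends, so every value is `False`. A smaller sketch, using four hypothetical dates that straddle the end of March, shows `day_name`, `is_quarter_start`, and how a Boolean **Series** can act as a filter.

```python
import pandas as pd

# Four consecutive days that straddle a month end and a quarter start.
dates = pd.Series(pd.date_range(start="2025-03-30", periods=4, freq="D"))

print(dates.dt.day_name())
print(dates.dt.month)
print(dates.dt.is_quarter_start)

# A Boolean Series can be used directly as a filter.
month_ends = dates[dates.dt.is_month_end]
print(month_ends)
```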

## Selecting Rows from a DataFrame with a DatetimeIndex
- The `iloc` accessor is available for index position-based extraction.
- The `loc` accessor accepts strings or **Timestamps** to extract by index label/value. Python's `datetime` objects are accepted as well.
- Use slice syntax inside `loc` to extract a range of dates; both endpoints are included. The `truncate` method is an alternative.

In [18]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
stocks.head()

               Open     High      Low    Close         Volume
Date
1962-01-02   5.0461   5.0461  4.98716  4.98716  593562.955237
1962-01-03  4.98716  5.03292  4.98716  5.03292  445175.034277
1962-01-04  5.03292  5.03292  4.98052  4.98052  399513.586679
1962-01-05  4.97389  4.97389  4.87511  4.88166  559321.480565
1962-01-08  4.88166  4.88166  4.75059  4.78972  833273.771393


In [19]:
stocks.loc["2014-03-04"]

Open      1.288700e+02
High      1.298270e+02
Low       1.288020e+02
Close     1.293290e+02
Volume    6.825202e+06
Name: 2014-03-04 00:00:00, dtype: float64

In [21]:
stocks.loc[pd.Timestamp("2014-03-04")]

Open      1.288700e+02
High      1.298270e+02
Low       1.288020e+02
Close     1.293290e+02
Volume    6.825202e+06
Name: 2014-03-04 00:00:00, dtype: float64

In [22]:
stocks.loc[pd.Timestamp(2014, 3, 4):pd.Timestamp(2014, 12, 31)]

               Open     High      Low    Close        Volume
Date
2014-03-04  128.870  129.827  128.802  129.329  6.825202e+06
2014-03-05  129.407  130.344  129.319  129.807  5.027617e+06
2014-03-06  129.963  130.676  129.631  130.159  5.503611e+06
2014-03-07  130.676  131.047  129.837  130.198  5.936539e+06
2014-03-10  130.090  130.666  128.890  129.309  6.623102e+06
...             ...      ...      ...      ...           ...
2014-12-24  115.119  115.188  114.183  114.359  2.646416e+06
2014-12-26  114.651  115.257  114.495  114.700  2.706324e+06
2014-12-29  114.485  114.700  112.661  113.412  4.715249e+06
2014-12-30  113.090  113.656  112.934  113.109  4.004903e+06
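
The `iloc` and `truncate` options from the list above are not shown in the cells, so here is a minimal sketch. It uses a small hypothetical DataFrame rather than `ibm.csv`, and the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices for five business days (not taken from ibm.csv).
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range(start="2014-03-03", periods=5, freq="B"),
)

first_row = prices.iloc[0]            # by position
one_day = prices.loc["2014-03-04"]    # by label, given as a string

# Label slices with loc include both endpoints.
window = prices.loc["2014-03-04":"2014-03-06"]

# truncate drops the rows before and after the given labels.
truncated = prices.truncate(before="2014-03-04", after="2014-03-06")

print(window)
print(window.equals(truncated))
```

Unlike positional slicing with `iloc`, the label slice keeps the row for the end date, which is why `window` and `truncated` hold the same three rows.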


## The DateOffset Object
- A **DateOffset** object adds time to a **Timestamp** to arrive at a new **Timestamp**.
- The **DateOffset** constructor accepts `days`, `weeks`, `months`, `years` parameters, and more.
- We can pass a **DateOffset** object to the `freq` parameter of the `pd.date_range` function.

In [23]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
stocks.head()

               Open     High      Low    Close         Volume
Date
1962-01-02   5.0461   5.0461  4.98716  4.98716  593562.955237
1962-01-03  4.98716  5.03292  4.98716  5.03292  445175.034277
1962-01-04  5.03292  5.03292  4.98052  4.98052  399513.586679
1962-01-05  4.97389  4.97389  4.87511  4.88166  559321.480565
1962-01-08  4.88166  4.88166  4.75059  4.78972  833273.771393


In [24]:
stocks.index + pd.DateOffset(days=5)

DatetimeIndex(['1962-01-07', '1962-01-08', '1962-01-09', '1962-01-10',
               '1962-01-13', '1962-01-14', '1962-01-15', '1962-01-16',
               '1962-01-17', '1962-01-20',
               ...
               '2023-10-03', '2023-10-04', '2023-10-07', '2023-10-08',
               '2023-10-09', '2023-10-10', '2023-10-11', '2023-10-14',
               '2023-10-15', '2023-10-16'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [29]:
birthdays = pd.date_range(start="1991-04-12", end="2023-04-12", freq=pd.DateOffset(years=1))
birthdays

DatetimeIndex(['1991-04-12', '1992-04-12', '1993-04-12', '1994-04-12',
               '1995-04-12', '1996-04-12', '1997-04-12', '1998-04-12',
               '1999-04-12', '2000-04-12', '2001-04-12', '2002-04-12',
               '2003-04-12', '2004-04-12', '2005-04-12', '2006-04-12',
               '2007-04-12', '2008-04-12', '2009-04-12', '2010-04-12',
               '2011-04-12', '2012-04-12', '2013-04-12', '2014-04-12',
               '2015-04-12', '2016-04-12', '2017-04-12', '2018-04-12',
               '2019-04-12', '2020-04-12', '2021-04-12', '2022-04-12',
               '2023-04-12'],
              dtype='datetime64[ns]', freq='<DateOffset: years=1>')

In [30]:
stocks[stocks.index.isin(birthdays)]

               Open     High      Low    Close      Volume
Date
1991-04-12  18.1678  18.1678  17.6954   17.752  21645360.0
1993-04-12    8.305  8.48788  8.28665  8.34598   8518730.0
1994-04-12  8.79793  8.81911  8.59455  8.65281  11495080.0
1995-04-12  14.1607  14.3431  14.0933  14.2562  11586400.0
1996-04-12  18.8636  18.9241  18.0233  18.2517  47247830.0
1999-04-12  60.0758  60.1099  59.2716  60.0543  13666930.0
2000-04-12  78.3089  78.3089  73.1365  74.4931  13522950.0
2001-04-12  63.3393  64.0624  62.5165  63.2075  14290480.0
2002-04-12  57.5306  57.7941  55.4548  56.2414  24056150.0
2004-04-12  61.4342  61.9984  61.3659  61.5933   4736957.0
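
A **DateOffset** follows the calendar rather than counting a fixed number of days. The sketch below contrasts it with a fixed 30-day **Timedelta** (an object covered in the Timedeltas section of this document); the dates are arbitrary examples.

```python
import pandas as pd

start = pd.Timestamp("2024-01-31")

# Calendar-aware: one month after Jan 31 lands on the last day of February.
one_month_later = start + pd.DateOffset(months=1)
print(one_month_later)

# Fixed duration: a Timedelta of 30 days simply counts days.
thirty_days_later = start + pd.Timedelta(days=30)
print(thirty_days_later)

# Offsets can also be subtracted.
one_year_earlier = pd.Timestamp("2023-04-12") - pd.DateOffset(years=1)
print(one_year_earlier)
```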


## Specialized Date Offsets
- Pandas nests more specialized date offsets in `pd.tseries.offsets`.
- These offsets add a different amount of time to each date by moving it to the next calendar anchor (for example, month end, quarter end, or year begin).

In [31]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
stocks.head()

               Open     High      Low    Close         Volume
Date
1962-01-02   5.0461   5.0461  4.98716  4.98716  593562.955237
1962-01-03  4.98716  5.03292  4.98716  5.03292  445175.034277
1962-01-04  5.03292  5.03292  4.98052  4.98052  399513.586679
1962-01-05  4.97389  4.97389  4.87511  4.88166  559321.480565
1962-01-08  4.88166  4.88166  4.75059  4.78972  833273.771393


In [32]:
stocks.index + pd.tseries.offsets.MonthEnd()

DatetimeIndex(['1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31',
               ...
               '2023-09-30', '2023-09-30', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)
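
A few more of the offsets in `pd.tseries.offsets`, applied to a single **Timestamp** so the effect is easier to follow. This is a sketch with arbitrary dates; note that a date already sitting on the anchor moves to the next one when the offset is added.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd, QuarterEnd, YearBegin

mid_january = pd.Timestamp("2025-01-15")

print(mid_january + MonthEnd())     # end of the current month
print(mid_january + MonthBegin())   # start of the next month
print(mid_january + QuarterEnd())   # end of the current quarter
print(mid_january + YearBegin())    # start of the next year
print(mid_january - MonthEnd())     # end of the previous month

# A date already on the anchor moves to the next one...
month_end = pd.Timestamp("2025-01-31")
print(month_end + MonthEnd())

# ...whereas rollforward leaves it in place.
print(MonthEnd().rollforward(month_end))
```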

## Timedeltas
- A **Timedelta** is a pandas object that represents a duration (an amount of time).
- Subtracting two **Timestamp** objects will yield a **Timedelta** object (this also applies to subtracting one datetime **Series** from another, which yields a **Series** of **Timedeltas**).
- The **Timedelta** constructor accepts keyword arguments for units of time (such as `days`, `hours`, and `minutes`) as well as string descriptions.

In [33]:
pd.Timestamp("2023-03-31 12:30:48") - pd.Timestamp("2023-03-20 19:25:59")

Timedelta('10 days 17:04:49')

In [34]:
pd.Timedelta(days=3, hours=2, minutes=5)

Timedelta('3 days 02:05:00')
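
The string form of the constructor is not shown above, so here is a small sketch. It also reads a few values back out of the **Timedelta** computed in the earlier cell.

```python
import pandas as pd

# The same duration built from keywords and from a string description.
from_keywords = pd.Timedelta(days=3, hours=2, minutes=5)
from_string = pd.Timedelta("3 days 2 hours 5 min")
print(from_keywords == from_string)

# Reading values back out of a Timedelta.
duration = pd.Timestamp("2023-03-31 12:30:48") - pd.Timestamp("2023-03-20 19:25:59")
print(duration.days)             # whole days only
print(duration.total_seconds())  # the full duration expressed in seconds
print(duration.components)       # days, hours, minutes, seconds, ...
```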

In [35]:
ecommerce = pd.read_csv("ecommerce.csv", index_col="ID", parse_dates=["order_date", "delivery_date"], date_format="%m/%d/%y")
ecommerce.head()

    order_date  delivery_date
ID
1   1998-05-24     1999-02-05
2   1992-04-22     1998-03-06
4   1991-02-10     1992-08-26
5   1992-07-21     1997-11-20
7   1993-09-02     1998-06-10


In [36]:
ecommerce["Delivery Time"] = ecommerce["delivery_date"] - ecommerce["order_date"]
ecommerce.head()

    order_date  delivery_date  Delivery Time
ID
1   1998-05-24     1999-02-05       257 days
2   1992-04-22     1998-03-06      2144 days
4   1991-02-10     1992-08-26       563 days
5   1992-07-21     1997-11-20      1948 days
7   1993-09-02     1998-06-10      1742 days


In [37]:
ecommerce["If It Took Twice As Long"] = ecommerce["delivery_date"] + ecommerce["Delivery Time"]
ecommerce.head()

    order_date  delivery_date  Delivery Time  If It Took Twice As Long
ID
1   1998-05-24     1999-02-05       257 days                1999-10-20
2   1992-04-22     1998-03-06      2144 days                2004-01-18
4   1991-02-10     1992-08-26       563 days                1994-03-12
5   1992-07-21     1997-11-20      1948 days                2003-03-22
7   1993-09-02     1998-06-10      1742 days                2003-03-18


In [38]:
ecommerce["Delivery Time"].max()

Timedelta('3583 days 00:00:00')
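
A **Series** of **Timedeltas** supports the `dt` accessor and comparisons as well. The sketch below uses a few hypothetical orders (not the contents of `ecommerce.csv`) to convert durations to whole days and to filter on a threshold.

```python
import pandas as pd

# Hypothetical orders (not taken from ecommerce.csv).
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-01"]),
    "delivery_date": pd.to_datetime(["2024-01-04", "2024-01-25", "2024-02-03"]),
})
orders["delivery_time"] = orders["delivery_date"] - orders["order_date"]

# The dt accessor also works on a timedelta Series.
orders["delivery_days"] = orders["delivery_time"].dt.days

# Compare against a Timedelta to keep only the slow deliveries.
slow_orders = orders[orders["delivery_time"] > pd.Timedelta(days=7)]

print(orders)
print(slow_orders)
```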

In [39]:
ecommerce = pd.read_csv("ecommerce.csv", index_col="ID", parse_dates=["order_date", "delivery_date"], date_format="%m/%d/%y")
ecommerce.head()

    order_date  delivery_date
ID
1   1998-05-24     1999-02-05
2   1992-04-22     1998-03-06
4   1991-02-10     1992-08-26
5   1992-07-21     1997-11-20
7   1993-09-02     1998-06-10


In [41]:
ecommerce['delivery_diff'] = ecommerce['delivery_date'].diff()

ecommerce.head()

    order_date  delivery_date  delivery_diff
ID
1   1998-05-24     1999-02-05            NaT
2   1992-04-22     1998-03-06      -336 days
4   1991-02-10     1992-08-26     -2018 days
5   1992-07-21     1997-11-20      1912 days
7   1993-09-02     1998-06-10       202 days
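
The `diff` method subtracts the previous row from each row, which is why the first value is `NaT` and why unsorted dates produce negative durations. A minimal sketch with three hypothetical dates, assuming the goal is the gap between consecutive dates in time order:

```python
import pandas as pd

# Hypothetical delivery dates, deliberately out of order.
deliveries = pd.Series(pd.to_datetime(["2024-01-05", "2024-01-01", "2024-01-11"]))

# diff subtracts the previous row, so the first row has nothing to compare against.
row_gaps = deliveries.diff()
print(row_gaps)

# Sorting first yields the gap between consecutive dates in time order.
sorted_gaps = deliveries.sort_values().diff()
print(sorted_gaps)
```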
