# Intro: Temporal Data Representations in Python

In this tutorial, we will learn about three related date/time data representations in Python: `datetime.datetime`, `numpy.datetime64`, and `pandas.Timestamp`.

## `datetime.datetime`

The standard way of processing temporal data in Python is the built-in `datetime` module. It supplies a few fundamental data structures to represent temporal data: `datetime.date`, `datetime.time` and `datetime.datetime`.  We will mostly look at `datetime.datetime` here.  To create a `datetime` object, we can simply construct it by providing the year, month, day, hour, minute and second.

In [1]:
import datetime

d = datetime.datetime(2023, 1, 1, 1, 55, 30)
d.isoformat()

'2023-01-01T01:55:30'

Note that the `isoformat()` method allows us to serialize the datetime object in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format.  We can also omit the time fields, which then default to zero.

In [2]:
d = datetime.datetime(2023, 1, 1)
d.isoformat()

'2023-01-01T00:00:00'

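We can also convert a string representing date/time in ISO 8601 format into a `datetime` object using the `fromisoformat()` method. Python 3.11 and later accept most valid ISO 8601 strings; earlier versions only accept the formats that `isoformat()` itself produces.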

In [3]:
d2 = datetime.datetime.fromisoformat("2023-01-01T01:55:30")
d2.isoformat()

'2023-01-01T01:55:30'
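
ISO 8601 strings may also carry a UTC offset. As a minimal sketch (the variable names `aware` and `as_utc` are illustrative and not part of the original notebook), parsing such a string should yield a timezone-aware `datetime` that can then be converted to UTC:

```python
import datetime

# Parse an ISO 8601 string that carries a UTC offset of -05:00.
aware = datetime.datetime.fromisoformat("2023-01-01T01:55:30-05:00")

# The offset is stored on the object and available through utcoffset().
offset = aware.utcoffset()

# Convert the same instant to UTC.
as_utc = aware.astimezone(datetime.timezone.utc)
print(as_utc.isoformat())  # 2023-01-01T06:55:30+00:00
```

The offset is written here as `-05:00` (with a colon), which is the form `isoformat()` itself emits and so should parse on older Python versions too.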

Once we have a `datetime` object, we can easily extract different temporal attributes such as `year`, `month`, etc.

In [4]:
print(f"""year: {d.year}
month: {d.month}
day: {d.day}
hour: {d.hour}
minute: {d.minute}
second: {d.second}""")

year: 2023
month: 1
day: 1
hour: 0
minute: 0
second: 0
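
A `datetime.datetime` object can also be split into the `datetime.date` and `datetime.time` structures mentioned earlier. The following is a small sketch of that relationship; the variable names (`moment`, `date_part`, `time_part`, `rebuilt`) are chosen for illustration only:

```python
import datetime

moment = datetime.datetime(2023, 1, 1, 1, 55, 30)

# Split the datetime into its date and time components.
date_part = moment.date()   # datetime.date(2023, 1, 1)
time_part = moment.time()   # datetime.time(1, 55, 30)

# weekday() counts from Monday = 0, so a Sunday gives 6.
print(moment.weekday())

# combine() reassembles the two parts into an equal datetime.
rebuilt = datetime.datetime.combine(date_part, time_part)
print(rebuilt == moment)
```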


## `numpy.datetime64`

NumPy provides its own date/time type, `numpy.datetime64`, which stores a timestamp as a 64-bit integer count of a fixed time unit (for example, days, seconds or nanoseconds). Depending on the unit, it can represent a much larger range of dates than `datetime.datetime`. To create a `numpy.datetime64` object from an ISO 8601 string, we can simply use its constructor:

In [5]:
import numpy as np

d = np.datetime64("2023-01-01T01:55:30")
d

numpy.datetime64('2023-01-01T01:55:30')

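We can also use the `item()` method to convert the `numpy.datetime64` object back to a `datetime.datetime` object. This allows us to access the same set of temporal attributes easily. Note that the type returned by `item()` depends on the value's time unit; a second-resolution value like this one maps to `datetime.datetime`.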

In [6]:
print(f"""isoformat: {d.item().isoformat()}
year: {d.item().year}
month: {d.item().month}
day: {d.item().day}
hour: {d.item().hour}
minute: {d.item().minute}
second: {d.item().second}""")

isoformat: 2023-01-01T01:55:30
year: 2023
month: 1
day: 1
hour: 1
minute: 55
second: 30
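
To illustrate how the time unit affects what `item()` returns, here is a hedged sketch. The variable names are illustrative, and the year-extraction idiom at the end is one common approach rather than an official NumPy API for this purpose:

```python
import datetime
import numpy as np

# The unit is inferred from the string unless it is given explicitly.
in_days = np.datetime64("2023-01-01")                    # datetime64[D]
in_seconds = np.datetime64("2023-01-01T01:55:30")        # datetime64[s]
in_nanos = np.datetime64("2023-01-01T01:55:30", "ns")    # datetime64[ns]

for value in (in_days, in_seconds, in_nanos):
    # item() returns a different Python type depending on the unit:
    # a date, a datetime, or a plain integer count of nanoseconds.
    print(value.dtype, type(value.item()).__name__)

# datetime64 arrays have no .year attribute. One idiom truncates each
# value to whole years since the epoch and adds the epoch year (1970).
stamps = np.array(["2023-01-01T01:55:30", "2024-06-15T12:00:00"],
                  dtype="datetime64[s]")
years = stamps.astype("datetime64[Y]").astype(int) + 1970
print(years)
```

The extra work needed to pull a year out of an array is part of what motivates the higher-level pandas API.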


## `pandas.Timestamp`

pandas provides its own date/time data structure, `pandas.Timestamp`, which is built on top of `numpy.datetime64` and is designed to be interchangeable with `datetime.datetime` in most cases. It also improves the API to make working with temporal data in a data science setting easier.

To create a `pandas.Timestamp` object from an ISO 8601 string, we can use `pandas.to_datetime` (passing the string to the `pd.Timestamp` constructor also works):

In [7]:
import pandas as pd

d = pd.to_datetime("2023-01-01T01:55:30-0500")
d

Timestamp('2023-01-01 01:55:30-0500', tz='pytz.FixedOffset(-300)')

It supports the same set of temporal attributes as `datetime.datetime`:

In [8]:
print(f"""{d.isoformat()}
year: {d.year}
month: {d.month}
day: {d.day}
hour: {d.hour}
minute: {d.minute}
second: {d.second}""")

2023-01-01T01:55:30-05:00
year: 2023
month: 1
day: 1
hour: 1
minute: 55
second: 30
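
The three representations can be converted into one another. The following is a minimal sketch of the round trips, assuming timezone-naive values; the variable names are illustrative:

```python
import datetime
import numpy as np
import pandas as pd

py_dt = datetime.datetime(2023, 1, 1, 1, 55, 30)

# datetime.datetime -> pandas.Timestamp
ts = pd.Timestamp(py_dt)

# pandas.Timestamp -> numpy.datetime64
np_dt = ts.to_datetime64()

# pandas.Timestamp -> datetime.datetime
back = ts.to_pydatetime()

# numpy.datetime64 -> pandas.Timestamp
ts_again = pd.Timestamp(np_dt)

# Timestamp is a subclass of datetime.datetime, so it can usually be
# passed wherever a datetime is expected.
print(isinstance(ts, datetime.datetime))
print(back == py_dt, ts_again == ts)
```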


What makes `pandas` special is that it can easily batch process a sequence of date/time data. For example, we can use `pandas.to_datetime` to convert a list of ISO 8601 strings to a `DatetimeIndex`, which holds a sequence of `pandas.Timestamp` values:

In [9]:
dates = pd.to_datetime(["2023-01-01T01:55:30-0500", "2023-01-03T12:51:00-0500", "2023-01-11T15:25:09-0500"])
dates

DatetimeIndex(['2023-01-01 01:55:30-05:00', '2023-01-03 12:51:00-05:00',
               '2023-01-11 15:25:09-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)

We can also batch extract different temporal attributes from this index, just like working with a single `pandas.Timestamp` object.

In [10]:
print(dates.year)
print(dates.month)
print(dates.day)
print(dates.hour)
print(dates.minute)
print(dates.second)

Int64Index([2023, 2023, 2023], dtype='int64')
Int64Index([1, 1, 1], dtype='int64')
Int64Index([1, 3, 11], dtype='int64')
Int64Index([1, 12, 15], dtype='int64')
Int64Index([55, 51, 25], dtype='int64')
Int64Index([30, 0, 9], dtype='int64')


This capability is super convenient! This means we can very easily extract different properties of a column of temporal data and store the result as new columns.  Let's walk through this process using a very simple dataset.

In [11]:
data = pd.DataFrame({
    "date": ["2023-01-01T8:00:00", "2023-01-03T9:00:00", "2023-01-05T10:00:00"],
    "sale": [24, 25, 26]
})
data

| | date | sale |
|---|---|---|
| 0 | 2023-01-01T8:00:00 | 24 |
| 1 | 2023-01-03T9:00:00 | 25 |
| 2 | 2023-01-05T10:00:00 | 26 |


Here we have a table of 3 rows and 2 columns. The "date" column contains temporal data that includes both date and time information, stored as strings. To extract the day, hour and day name from the "date" column into their own columns, we first convert the column with `pandas.to_datetime` and then use the `.dt` accessor, which exposes the temporal attributes of a column:

In [12]:
data.date = pd.to_datetime(data.date)
data["day"] = data.date.dt.day
data["hour"] = data.date.dt.hour
data["day_name"] = data.date.dt.day_name()
data

| | date | sale | day | hour | day_name |
|---|---|---|---|---|---|
| 0 | 2023-01-01 08:00:00 | 24 | 1 | 8 | Sunday |
| 1 | 2023-01-03 09:00:00 | 25 | 3 | 9 | Tuesday |
| 2 | 2023-01-05 10:00:00 | 26 | 5 | 10 | Thursday |
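
Temporal attributes are also handy for filtering rows. As a hedged sketch on the same toy data (the names `sales`, `is_weekday` and `weekday_sales` are illustrative, and the hours are zero-padded here to keep the strings strictly ISO 8601), we could keep only the sales made on weekdays:

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-01T08:00:00",
        "2023-01-03T09:00:00",
        "2023-01-05T10:00:00",
    ]),
    "sale": [24, 25, 26],
})

# dayofweek counts from Monday = 0, so 5 and 6 are Saturday and Sunday.
is_weekday = sales["date"].dt.dayofweek < 5

# Boolean indexing keeps only the rows where the mask is True.
weekday_sales = sales[is_weekday]
print(weekday_sales["sale"].sum())  # 51
```

The Sunday row is dropped, leaving the Tuesday and Thursday sales.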


## Summary

Python provides multiple connected ways of representing date/time information. Among them, processing temporal data with `pandas` is perhaps the most convenient in data science settings. To learn more about each representation, here are links to their documentation:
* [`datetime.datetime` doc](https://docs.python.org/3/library/datetime.html)
* [`numpy.datetime64` doc](https://numpy.org/doc/stable/reference/arrays.datetime.html)
* [`pandas.Timestamp` doc](https://pandas.pydata.org/docs/user_guide/timeseries.html)