# CPSC380: 5_TimeSeries_2_Pandas_DateTime

In this notebook, you will learn the following:
 - Numpy datetime
 - Pandas datetime objects (timestamp, time period, timedelta)
 - Date sequence generation

Read more:
 - "Python for Data Analysis" by Wes McKinney (Chapter 11)
 - https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html
 - https://towardsdatascience.com/timestamp-vs-timedelta-vs-time-period-afad0a48a7d1

In [60]:
import numpy as np
import pandas as pd
import datetime

## 1. NumPy's ``datetime64``

The ``datetime64`` dtype encodes dates as 64-bit integers, so arrays of dates are stored compactly and support vectorized arithmetic.

In [61]:
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')
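
To make the integer encoding concrete, the following minimal sketch casts a ``datetime64`` value to ``int64``. The variable names are illustrative only. The stored integer counts ticks of the dtype's unit since 1970-01-01, so the same date is a different integer under a different unit.

```python
import numpy as np

d = np.datetime64('2015-07-04')        # unit is inferred as days: datetime64[D]

# Casting to int64 exposes the stored value: days elapsed since 1970-01-01.
days_since_epoch = int(d.astype('int64'))
print(days_since_epoch)                # 16620

# The same date at nanosecond resolution is stored as a much larger integer.
ns_since_epoch = int(d.astype('datetime64[ns]').astype('int64'))
print(ns_since_epoch == days_since_epoch * 86_400 * 10**9)   # True
```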

In [62]:
print(type(date), type(date.dtype), "\n")
print(dir(date.dtype))

<class 'numpy.ndarray'> <class 'numpy.dtype[datetime64]'> 

['__bool__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'alignment', 'base', 'byteorder', 'char', 'descr', 'fields', 'flags', 'hasobject', 'isalignedstruct', 'isbuiltin', 'isnative', 'itemsize', 'kind', 'metadata', 'name', 'names', 'ndim', 'newbyteorder', 'num', 'shape', 'str', 'subdtype', 'type']


In [63]:
# do vectorized operations
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')

In [64]:
# day-based datetime
print(np.datetime64('2015-07-04'))

# minute-based datetime
print(np.datetime64('2015-07-04 12:00'))

# nanosecond-based time
print(np.datetime64('2015-07-04 12:59:59.50', 'ns'))

print(np.datetime64('today')) # today's date
print(np.datetime64('now')) # timestamp right now 

2015-07-04
2015-07-04T12:00
2015-07-04T12:59:59.500000000
2023-04-04
2023-04-04T15:27:57


In [65]:
# timedelta
diff=np.datetime64('now') - date
print(type(diff.dtype), diff)

<class 'numpy.dtype[timedelta64]'> 244654077 seconds


## 2. Pandas time series data structure

- **time stamps**: ``Timestamp`` is essentially a replacement for Python's native ``datetime``, but is based on the more efficient ``numpy.datetime64`` data type. The associated index structure is ``DatetimeIndex``.
- **time periods**: ``Period`` encodes a fixed-frequency interval based on ``numpy.datetime64``. The associated index structure is ``PeriodIndex``.
- **time deltas**: ``Timedelta`` is a more efficient replacement for Python's native ``datetime.timedelta`` type, and is based on ``numpy.timedelta64``. The associated index structure is ``TimedeltaIndex``.

### 2.1. Timestamp and DatetimeIndex

A pandas ``Timestamp`` refers to a specific instant in time with nanosecond precision (one billionth of a second).

**pd.Timestamp versus np.datetime64**:
- **numpy.datetime64** is essentially a thin wrapper for **int64**. It has almost no date/time specific functionality.

- **pd.Timestamp** is a wrapper around a **numpy.datetime64**. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
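
As a rough illustration of the comparison above, this sketch shows one ``Timestamp`` exposing the ``datetime.datetime`` interface, its underlying integer, and a pandas-specific helper. The variable name ``ts`` is an arbitrary choice for the example.

```python
import datetime
import pandas as pd

ts = pd.Timestamp('2015-07-04 12:00')

# Timestamp subclasses datetime.datetime, so the standard-library API is available.
print(isinstance(ts, datetime.datetime))   # True
print(ts.year, ts.month, ts.day)           # 2015 7 4

# The underlying integer: nanoseconds since 1970-01-01.
print(ts.value)

# pandas-specific conveniences that a bare datetime64 value lacks.
print(ts.day_name())                       # Saturday
print(ts.to_datetime64())                  # back to a NumPy scalar
```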

In [66]:
print(dir(pd.Timestamp))

['__add__', '__array_priority__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__pyx_vtable__', '__radd__', '__reduce__', '__reduce_cython__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__setstate__', '__setstate_cython__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__weakref__', '_date_repr', '_freq', '_freqstr', '_repr_base', '_round', '_set_freq', '_short_repr', '_time_repr', 'asm8', 'astimezone', 'ceil', 'combine', 'ctime', 'date', 'day', 'day_name', 'day_of_week', 'day_of_year', 'dayofweek', 'dayofyear', 'days_in_month', 'daysinmonth', 'dst', 'floor', 'fold', 'freq', 'freqstr', 'fromisoformat', 'fromordinal', 'fromtimestamp', 'hour', 'is_leap_year', 'is_month_end', 'is_month_start', 'is_quarter_end', 'is_quarter_start', 'is_year_end', 'is_year_start', 'isocalendar', 'isoform

In [67]:
print(pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, 
                   second=20, microsecond=79, nanosecond=99))
print(pd.Timestamp('2020-6-9'))
print(pd.Timestamp(2020, 6, 9, 12))
print(pd.Timestamp('2020/6-9'))
print(pd.Timestamp('June 9, 2020 13:45'))
print(pd.Timestamp('2017-01-01T12'))
print(pd.Timestamp(300))
pd.Timestamp(1513393355.5)

2020-06-09 08:30:20.000079099
2020-06-09 00:00:00
2020-06-09 12:00:00
2020-06-09 00:00:00
2020-06-09 13:45:00
2017-01-01 12:00:00
1970-01-01 00:00:00.000000300


Timestamp('1970-01-01 00:00:01.513393355')

`Timestamp` and `DatetimeIndex` objects can be constructed directly, but it is more common to use the `pd.to_datetime()` function, which can parse a wide variety of formats.

In [68]:
# Passing a single date to pd.to_datetime() yields a Timestamp
date = pd.to_datetime(datetime.datetime(2015, 7, 3))
date

Timestamp('2015-07-03 00:00:00')

In [69]:
# Passing a sequence of dates yields a DatetimeIndex by default
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708', '20080708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08', '2008-07-08'],
              dtype='datetime64[ns]', freq=None)
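
When the input format is known or the data may contain bad entries, `pd.to_datetime()` can be guided with its `format` and `errors` parameters. The sketch below uses made-up input strings and is only meant to show the two options side by side.

```python
import pandas as pd

# An explicit format removes ambiguity: here the strings are day/month/year.
day_first = pd.to_datetime(['03/07/2015', '04/07/2015'], format='%d/%m/%Y')
print(day_first)

# errors='coerce' turns unparseable entries into NaT instead of raising.
with_gaps = pd.to_datetime(['2015-07-03', 'not a date'], errors='coerce')
print(with_gaps.isna().tolist())   # [False, True]
```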

### 2.2. Period and PeriodIndex

- A ``Period`` represents a **specific span of time** between a start and an end timestamp. The span is fixed by the frequency, and consecutive periods do not overlap.
- The ``Period()`` constructor takes a value and a frequency. The frequency, passed as ``freq``, must be one of a predefined set of strings.

Period constructor link: https://www.skytowner.com/explore/pandas_period_constructor

In [70]:
print(dir(pd.Period))

['__add__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__pyx_vtable__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__weakref__', '_add_offset', '_add_timedeltalike_scalar', '_dtype', '_from_ordinal', '_get_to_timestamp_base', '_maybe_convert_freq', '_require_matching_freq', 'asfreq', 'day', 'day_of_week', 'day_of_year', 'dayofweek', 'dayofyear', 'days_in_month', 'daysinmonth', 'end_time', 'freq', 'freqstr', 'hour', 'is_leap_year', 'minute', 'month', 'now', 'ordinal', 'quarter', 'qyear', 'second', 'start_time', 'strftime', 'to_timestamp', 'week', 'weekday', 'weekofyear', 'year']


**Period vs. Timestamp**:

A **Period** can be used to check whether a specific event occurs within a certain span of time. A Period represents an **interval**, while a **Timestamp** represents a **point** in time.

In [92]:
# Evaluates to True because the timestamp falls within the one-day period. A Timestamp alone cannot express such a containment test.
p = pd.Period('2017-06-13', freq = 'D')
print("Starting time:", p.start_time)
print("Ending time:", p.end_time)
test = pd.Timestamp('2017-06-13 22:11')
p.start_time < test < p.end_time

Starting time: 2017-06-13 00:00:00
Ending time: 2017-06-13 23:59:59.999999999


True
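
The interval view also supports simple arithmetic. The sketch below reuses the same date and timestamp as the cell above, with an illustrative variable name (`next_day`). It shows that a timestamp can be mapped to the period containing it and that adding an integer moves a period forward by whole units of its frequency.

```python
import pandas as pd

p = pd.Period('2017-06-13', freq='D')
ts = pd.Timestamp('2017-06-13 22:11')

# Mapping the timestamp to a daily period gives the interval that contains it.
print(ts.to_period('D') == p)           # True

# Adding an integer shifts the period by that many units of its frequency.
next_day = p + 1
print(next_day)                         # 2017-06-14

# The same timestamp lies outside the following day's interval.
print(next_day.start_time <= ts <= next_day.end_time)   # False
```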

In [101]:
p = pd.Period('2023', freq='M')

print("Starting time:", p.start_time)
print("Ending time:", p.end_time)
print("The month this Period falls on:", p.month)
print("The week of the year on the given period:", p.week)

Starting time: 2023-01-01 00:00:00
Ending time: 2023-01-31 23:59:59.999999999
The month this Period falls on: 1
The week of the year on the given period: 5


In [106]:
p = pd.Period('2023-04', freq='M')

print("Starting time:", p.start_time)
print("Ending time:", p.end_time)
print("The month this Period falls on:", p.month)
print("The week of the year on the given period:", p.week)

Starting time: 2023-04-01 00:00:00
Ending time: 2023-04-30 23:59:59.999999999
The month this Period falls on: 4
The week of the year on the given period: 17


In [107]:
p = pd.Period('2023-04-08', freq='M')

print("Starting time:", p.start_time)
print("Ending time:", p.end_time)
print("The month this Period falls on:", p.month)
print("The week of the year on the given period:", p.week)

Starting time: 2023-04-01 00:00:00
Ending time: 2023-04-30 23:59:59.999999999
The month this Period falls on: 4
The week of the year on the given period: 17


In [108]:
p = pd.Period('2023-04-08', freq='Y')

print("Starting time:", p.start_time)
print("Ending time:", p.end_time)
print("The month this Period falls on:", p.month)
print("The week of the year on the given period:", p.week)

Starting time: 2023-01-01 00:00:00
Ending time: 2023-12-31 23:59:59.999999999
The month this Period falls on: 12
The week of the year on the given period: 52


A sequence of Period objects can be generated with the `period_range()` function, which takes start, end, and frequency parameters.

In [105]:
period_range_sample = pd.period_range('3/5/2020', '3/5/2023', freq='M')
period_range_sample 

PeriodIndex(['2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08',
             '2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02',
             '2021-03', '2021-04', '2021-05', '2021-06', '2021-07', '2021-08',
             '2021-09', '2021-10', '2021-11', '2021-12', '2022-01', '2022-02',
             '2022-03', '2022-04', '2022-05', '2022-06', '2022-07', '2022-08',
             '2022-09', '2022-10', '2022-11', '2022-12', '2023-01', '2023-02',
             '2023-03'],
            dtype='period[M]')

Converting between frequencies can be done with the `asfreq()` method. 

In [74]:
monthly_freq = pd.Period('2021-06', freq='M')
monthly_freq  # Period('2021-06', 'M')

# convert at the start of the month
monthly_freq.asfreq('W', how='start')  # Period('2021-05-31/2021-06-06', 'W-SUN')

# convert at the end of the month
monthly_freq.asfreq('W', how='end')    # Period('2021-06-28/2021-07-04', 'W-SUN')

Period('2021-06-28/2021-07-04', 'W-SUN')
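
The effect of `how` is perhaps easier to see when converting a month to days. This is a small sketch with illustrative variable names: `how='start'` picks the first sub-period and `how='end'` picks the last.

```python
import pandas as pd

june = pd.Period('2021-06', freq='M')

first_day = june.asfreq('D', how='start')
last_day = june.asfreq('D', how='end')
print(first_day, last_day)              # 2021-06-01 2021-06-30

# Going back up to the coarser frequency maps both days to the same month.
print(first_day.asfreq('M') == june, last_day.asfreq('M') == june)   # True True
```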

A ``DatetimeIndex`` can be converted to a ``PeriodIndex`` with the ``to_period()`` method, which takes a frequency code.

In [109]:
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708', '20080708'])
dates
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08', '2008-07-08'],
            dtype='period[D]')
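
With a coarser frequency the conversion discards information. The sketch below uses made-up dates to convert to monthly periods and then back with `to_timestamp()`, which by default returns the start of each period, so the original day of the month is not recovered.

```python
import pandas as pd

stamps = pd.to_datetime(['2015-07-03', '2015-07-20', '2015-08-01'])

# Each timestamp is replaced by the month that contains it.
months = stamps.to_period('M')
print(months)

# to_timestamp() goes the other way; by default it returns each period's start.
starts = months.to_timestamp()
print(starts)
```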

### 2.3. Timedelta and TimedeltaIndex

- A `Timedelta` represents a **difference in time**: either a fixed amount of time or the exact duration between two points in time. It is based on `numpy.timedelta64`.

In [110]:
print(dir(pd.Timedelta))

['__abs__', '__add__', '__array_priority__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__divmod__', '__doc__', '__eq__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__inv__', '__le__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pyx_vtable__', '__radd__', '__rdivmod__', '__reduce__', '__reduce_cython__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__rsub__', '__rtruediv__', '__setattr__', '__setstate__', '__setstate_cython__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '_d', '_ensure_components', '_h', '_has_ns', '_m', '_ms', '_ns', '_repr_base', '_round', '_s', '_us', 'asm8', 'ceil', 'components', 'days', 'delta', 'floor', 'freq', 'is_populated', 'isoformat', 'max', 'microseconds', 'min', 'nanoseconds', 'resolution', 'resolution_string', 'round', 'seconds', 'to_numpy', 'to_pyt

In [76]:
# creating timedelta objects
pd.Timedelta('3 days')  # Timedelta('3 days 00:00:00')
#pd.Timedelta('8 days 7 hours') #Timedelta('8 days 07:00:00')
#pd.Timedelta('7 days 00:00:59')

Timedelta('3 days 00:00:00')

In [77]:
# two Timedelta objects
td1 = pd.Timedelta(weeks=3, days=6, hours=12, minutes=30, seconds=10.93, milliseconds=66, microseconds=3)
td2 = pd.Timedelta(weeks=7, days=5, hours=10, minutes=40, seconds=8.64, milliseconds=33, microseconds=8)

# add timedelta to timedelta
td1 + td1  # 55 days 01:00:21.992006

# subtract timedelta from timedelta
td2 - td1  # 26 days 22:09:57.677005 (to microsecond precision)

# a Timestamp
date = pd.Timestamp('2020-6-9')

# add timedelta to date
td1 + date  # Timestamp('2020-07-06 12:30:10.996003')

Timestamp('2020-07-06 12:30:10.996003')

In [78]:
# A TimedeltaIndex is created, for example, when a date is subtracted from another:
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
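
Beyond addition and subtraction, a `Timedelta` can be inspected and rescaled. The following sketch, with arbitrary example values, shows `total_seconds()`, division by another `Timedelta` to express a duration in chosen units, and the `components` attribute.

```python
import pandas as pd

td = pd.Timedelta(days=1, hours=6)

print(td.total_seconds())                        # 108000.0
print(td / pd.Timedelta(hours=1))                # 30.0 (duration expressed in hours)
print(td.components.days, td.components.hours)   # 1 6

# Subtracting two Timestamps yields a Timedelta.
elapsed = pd.Timestamp('2015-07-04 06:00') - pd.Timestamp('2015-07-03')
print(elapsed == td)                             # True
```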

## 3. Date sequence generation

To make the creation of regular date sequences more convenient, Pandas offers a few functions: 
 - ``pd.date_range()`` for timestamps
 - ``pd.period_range()`` for periods
 - ``pd.timedelta_range()`` for time deltas.
 

**Note**: The following codes are used for frequency and offset
 
| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |

### 3.1. ``pd.date_range()`` 

``pd.date_range()`` works much like Python's ``range(start, end, step)``, except that the end point is included by default.

In [79]:
# accepts a start date, an end date, and an optional frequency code
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [80]:
# the date range can be specified with a startpoint and a number of periods
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [81]:
pd.date_range('2021-11-19', periods=8, freq='T')

DatetimeIndex(['2021-11-19 00:00:00', '2021-11-19 00:01:00',
               '2021-11-19 00:02:00', '2021-11-19 00:03:00',
               '2021-11-19 00:04:00', '2021-11-19 00:05:00',
               '2021-11-19 00:06:00', '2021-11-19 00:07:00'],
              dtype='datetime64[ns]', freq='T')
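
Frequency codes can also carry an integer multiple, and the business-day code skips weekends. The sketch below shows both with `pd.date_range()`. The start date is chosen only because 2015-07-03 falls on a Friday.

```python
import pandas as pd

# '2D' means "every two calendar days".
every_other_day = pd.date_range('2015-07-03', periods=3, freq='2D')
print(every_other_day)

# 'B' skips weekends: 2015-07-03 is a Friday, so the next entry is Monday.
business_days = pd.date_range('2015-07-03', periods=3, freq='B')
print(business_days)
```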

In [82]:
pd.date_range(start='2012-04-01', periods=10)
pd.date_range(end='2012-06-01', periods=10)

DatetimeIndex(['2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
               '2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
               '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

In [83]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [84]:
pd.date_range('2012-05-02 12:56:31', periods=5,  freq='T')
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True) # remove time

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

In [85]:
rng = pd.date_range('2012-01-01', '2012-06-01', freq='WOM-3FRI')
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI')]

### 3.2. ``pd.period_range()`` 

In [114]:
ps= pd.period_range('2023-04-08', periods=8, freq='M')
print(ps)

print("Starting time:", ps[0].start_time)
print("Ending time:", ps[0].end_time)
print("Starting time:", ps[1].start_time)
print("Ending time:", ps[1].end_time)

PeriodIndex(['2023-04', '2023-05', '2023-06', '2023-07', '2023-08', '2023-09',
             '2023-10', '2023-11'],
            dtype='period[M]')
Starting time: 2023-04-01 00:00:00
Ending time: 2023-04-30 23:59:59.999999999
Starting time: 2023-05-01 00:00:00
Ending time: 2023-05-31 23:59:59.999999999


#### date_range vs. period_range

In [118]:
# date_range
print(pd.date_range('2015-07-03', periods=8, freq='S'),"\n")

# period_range
ps=pd.period_range('2015-07-03', periods=8, freq='S')
print("Starting time:", ps[0].start_time)
print("Ending time:", ps[0].end_time)
print("Starting time:", ps[1].start_time)
print("Ending time:", ps[1].end_time)

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 00:00:01',
               '2015-07-03 00:00:02', '2015-07-03 00:00:03',
               '2015-07-03 00:00:04', '2015-07-03 00:00:05',
               '2015-07-03 00:00:06', '2015-07-03 00:00:07'],
              dtype='datetime64[ns]', freq='S') 

Starting time: 2015-07-03 00:00:00
Ending time: 2015-07-03 00:00:00.999999999
Starting time: 2015-07-03 00:00:01
Ending time: 2015-07-03 00:00:01.999999999


### 3.3. ``pd.timedelta_range()`` 

In [88]:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00'],
               dtype='timedelta64[ns]', freq='H')

In [121]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150T')

In [130]:
tds= pd.timedelta_range(0, periods=9, freq="2H30T")
print(tds)
print("The difference between two dates or times:", tds[0].total_seconds())
print("The difference between two dates or times:", tds[1].total_seconds())
print("The difference between two dates or times:", tds[2].total_seconds())


TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
The difference between two dates or times: 0.0
The difference between two dates or times: 9000.0
The difference between two dates or times: 18000.0
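
A `TimedeltaIndex` is often most useful as a set of offsets from a reference time. As a minimal sketch with an arbitrary base time, adding a `Timestamp` to a `timedelta_range()` result produces a `DatetimeIndex`.

```python
import pandas as pd

offsets = pd.timedelta_range(start='0 days', periods=4, freq='D')

# Timestamp + TimedeltaIndex broadcasts to a DatetimeIndex.
schedule = pd.Timestamp('2015-07-03 09:00') + offsets
print(schedule)
```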
