# Time Series (Part 2): Pandas DateTime Objects 

In this notebook, you will learn the following:
 - NumPy datetime
 - Pandas datetime objects (timestamp, time period, timedelta)
 - Date sequence generation

Read more:
 - "Python for Data Analysis" by Wes McKinney (Chapter 11)
 - https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html
 - https://towardsdatascience.com/timestamp-vs-timedelta-vs-time-period-afad0a48a7d1

In [1]:
import numpy as np
import pandas as pd
import datetime

## 1. NumPy's ``datetime64``

The ``datetime64`` dtype encodes dates as 64-bit integers.

In [2]:
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

In [3]:
# do vectorized operations
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')

In [4]:
# day-based datetime
print(np.datetime64('2015-07-04'))

# minute-based datetime
print(np.datetime64('2015-07-04 12:00'))

# nanosecond-based time
print(np.datetime64('2015-07-04 12:59:59.50', 'ns'))

2015-07-04
2015-07-04T12:00
2015-07-04T12:59:59.500000000
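To make the 64-bit integer encoding concrete, here is a minimal sketch (not part of the original notebook; the variable names are illustrative) that casts a ``datetime64`` value back to its underlying integer and inspects the unit NumPy inferred from the input string.

```python
import numpy as np

# A day-resolution datetime64 stores the number of days since 1970-01-01.
day = np.datetime64('2015-07-04')
days_since_epoch = int(day.astype('int64'))

# The unit is inferred from the string: a bare date gives [D],
# a date with hours and minutes gives [m].
minute = np.datetime64('2015-07-04 12:00')
minutes_since_epoch = int(minute.astype('int64'))

print(day.dtype, days_since_epoch)
print(minute.dtype, minutes_since_epoch)
```

Because the integer is always 64 bits wide, a finer unit covers a shorter span of dates: at nanosecond resolution the representable range is roughly 292 years on either side of 1970.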


## 2. Pandas time series data structure

- **Time stamps**: ``Timestamp`` is essentially a replacement for Python's native ``datetime``, but is based on the more efficient ``numpy.datetime64`` data type. The associated index structure is ``DatetimeIndex``.
- **Time periods**: ``Period`` encodes a fixed-frequency interval based on ``numpy.datetime64``. The associated index structure is ``PeriodIndex``.
- **Time deltas**: ``Timedelta`` is a more efficient replacement for Python's native ``datetime.timedelta`` type, and is based on ``numpy.timedelta64``. The associated index structure is ``TimedeltaIndex``.
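As a quick orientation, the sketch below builds one scalar of each kind and wraps it in the matching index type. The variable names are illustrative and the values are arbitrary.

```python
import pandas as pd

ts = pd.Timestamp('2020-06-09')        # an instant in time
per = pd.Period('2020-06', freq='M')   # a span: the whole of June 2020
td = pd.Timedelta('3 days')            # a duration

# Building an index from a list of each scalar gives the matching index type.
ts_index = pd.DatetimeIndex([ts])
per_index = pd.PeriodIndex([per])
td_index = pd.TimedeltaIndex([td])

for idx in (ts_index, per_index, td_index):
    print(type(idx).__name__, idx.dtype)
```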

### 2.1. Timestamp and DatetimeIndex

A pandas ``Timestamp`` refers to a specific instant in time, with nanosecond precision (one thousand-millionth of a second).

In [5]:
print(pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, 
                   second=20, microsecond=79, nanosecond=99))
print(pd.Timestamp('2020-6-9'))
print(pd.Timestamp(2020, 6, 9, 12))
print(pd.Timestamp('2020/6-9'))
print(pd.Timestamp('June 9, 2020 13:45'))
print(pd.Timestamp('2017-01-01T12'))
print(pd.Timestamp(300))
print(pd.Timestamp(1513393355.5))

2020-06-09 08:30:20.000079099
2020-06-09 00:00:00
2020-06-09 12:00:00
2020-06-09 00:00:00
2020-06-09 13:45:00
2017-01-01 12:00:00
1970-01-01 00:00:00.000000300
1970-01-01 00:00:01.513393355
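The last two calls above pass bare numbers, which ``pd.Timestamp`` reads as nanoseconds since 1970-01-01; that is why both results land in the first seconds of 1970. If the number is really a count of seconds, the ``unit`` argument says so. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

# A bare number is interpreted as nanoseconds since the epoch.
as_ns = pd.Timestamp(300)

# The same kind of number interpreted as seconds instead.
as_seconds = pd.Timestamp(1513393355.5, unit='s')

print(as_ns.value)   # integer nanoseconds since the epoch
print(as_seconds)
```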


`Timestamp` and `DatetimeIndex` objects can be constructed directly, but it is more common to use the `pd.to_datetime()` function, which can parse a wide variety of formats.

In [6]:
# Passing a single date to pd.to_datetime() yields a Timestamp
date = pd.to_datetime(datetime.datetime(2015, 7, 3))
date

Timestamp('2015-07-03 00:00:00')

In [7]:
# Passing a sequence of dates yields a DatetimeIndex by default
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)
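Flexible parsing is convenient, but strings such as `03/07/2015` are ambiguous (3 July or 7 March). The sketch below shows two options of `pd.to_datetime()` that make parsing more predictable: an explicit `format` and `errors='coerce'`. The input strings are made up for illustration.

```python
import pandas as pd

# An explicit format removes the day/month ambiguity.
day_first = pd.to_datetime(['03/07/2015', '04/07/2015'], format='%d/%m/%Y')

# errors='coerce' turns unparseable entries into NaT instead of raising.
with_gaps = pd.to_datetime(['2015-07-03', 'not a date'], errors='coerce')

print(day_first)
print(with_gaps)
```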

### 2.2. Timeperiod and PeriodIndex

 - A `Period` represents a **specific span of time** between a start and an end timestamp. The span is fixed by the frequency, and consecutive periods do not overlap.
 - The `Period()` constructor takes a value and a frequency. The frequency is passed as `freq` and must be one of a predefined set of strings.

In [8]:
print(pd.Period(2020, freq='A-OCT'))
print(pd.Period('2020-05', freq='M'))

2020
2020-05
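Since a period is a span rather than an instant, it has a start and an end. A small sketch (the variable name `may` is illustrative) that reads both boundaries and steps to the next period:

```python
import pandas as pd

may = pd.Period('2020-05', freq='M')

print(may.start_time)   # first instant covered by the period
print(may.end_time)     # last instant covered by the period
print(may + 1)          # integer arithmetic moves by whole periods
```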


A sequence of `Period` objects can be generated with the `period_range()` function, which takes start, end, and frequency parameters.

In [9]:
# ISO dates avoid the day/month ambiguity of strings like '3/5/2020'
period_range_sample = pd.period_range('2020-03-05', '2021-05-13', freq='M')
period_range_sample 

PeriodIndex(['2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08',
             '2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02',
             '2021-03', '2021-04', '2021-05'],
            dtype='period[M]', freq='M')

Converting between frequencies can be done with the `asfreq()` method. 

In [10]:
monthly_freq = pd.Period('2020-06',freq='M')
monthly_freq #Period('2020-06', 'M')

# convert at the start of the month
#monthly_freq.asfreq('W', how='start') #Period('2020-06-01/2020-06-07', 'W-SUN')

# convert at the end of the month
#monthly_freq.asfreq('W', how='end')   #Period('2020-06-29/2020-07-05', 'W-SUN')

Period('2020-06', 'M')
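The same idea is easy to check at daily resolution. In this sketch, `how='start'` and `how='end'` pick the first and last day of the month; the variable names are illustrative.

```python
import pandas as pd

june = pd.Period('2020-06', freq='M')

# Convert the monthly period to the daily period at either edge of the month.
first_day = june.asfreq('D', how='start')
last_day = june.asfreq('D', how='end')

print(first_day, last_day)
```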

A ``DatetimeIndex`` can be converted to a ``PeriodIndex`` with the ``to_period()`` method, given a frequency code:

In [11]:
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]', freq='D')
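A coarser frequency groups several timestamps into the same period, and ``to_timestamp()`` goes back the other way. A minimal sketch with made-up dates; by default ``to_timestamp()`` returns the start of each period.

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-08-06'])

# Two July dates fall into the same monthly period.
months = dates.to_period('M')

# Back to timestamps: each period maps to its first instant.
month_starts = months.to_timestamp()

print(months)
print(month_starts)
```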

### 2.3. Timedelta and TimedeltaIndex

 - A `Timedelta` is a **difference in time**: either an absolute amount of time or the exact length of time between two instants. It is based on `numpy.timedelta64`.

In [12]:
# creating timedelta objects
pd.Timedelta('3 days')  # Timedelta('3 days 00:00:00')
#pd.Timedelta('8 days 7 hours') #Timedelta('8 days 07:00:00')
#pd.Timedelta('7 days 00:00:59')

Timedelta('3 days 00:00:00')
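Once built, a `Timedelta` can be taken apart again. The sketch below reads a few of its attributes and compares it with the equivalent standard-library `datetime.timedelta`; the variable name `td` is illustrative.

```python
import datetime
import pandas as pd

td = pd.Timedelta('8 days 7 hours')

print(td.days)               # whole days
print(td.components.hours)   # hour part of the remainder
print(td.total_seconds())    # entire duration in seconds

# A Timedelta compares equal to the matching datetime.timedelta.
print(td == datetime.timedelta(days=8, hours=7))
```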

In [13]:
# two timedelta objects
td1 = pd.Timedelta(weeks=3, days=6, hours=12, minutes=30, seconds=10.93, milliseconds=66, microseconds=3)
td2 = pd.Timedelta(weeks=7, days=5, hours=10, minutes=40, seconds=8.64, milliseconds=33, microseconds=8)

# add a timedelta to a timedelta
td1 + td1  # Timedelta('55 days 01:00:21.992006')

# subtract a timedelta from a timedelta
td2 - td1  # Timedelta('26 days 22:09:57.677005')

# a timestamp
date = pd.Timestamp('2020-6-9')

# add a timedelta to a timestamp (the last expression is the cell's output)
td1 + date  # Timestamp('2020-07-06 12:30:10.996003')

Timestamp('2020-07-06 12:30:10.996003')

In [14]:
# A TimedeltaIndex is created, for example, when a date is subtracted from another:
dates = pd.to_datetime([datetime.datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
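A `TimedeltaIndex` is often more useful as plain numbers. This sketch shows two ways to get them, using a shorter made-up list of dates: the `.days` attribute, and division by a unit `Timedelta`.

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06'])
elapsed = dates - dates[0]

# Whole days as integers.
whole_days = elapsed.days

# Dividing by a unit Timedelta gives floats, which keeps fractions of a day.
fractional_days = elapsed / pd.Timedelta(days=1)

print(list(whole_days))
print(list(fractional_days))
```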

## 3. Date sequence generation

To make the creation of regular date sequences more convenient, Pandas offers a few functions: 
 - ``pd.date_range()`` for timestamps
 - ``pd.period_range()`` for periods
 - ``pd.timedelta_range()`` for time deltas.
 

**Note**: The following codes specify frequencies and offsets:
 
| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |
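A code can be prefixed with an integer to take multiples of the base frequency. The sketch below uses only ``D`` and ``B``; the spelling of some other aliases may differ between pandas releases, so it is worth checking the documentation for the installed version. The variable names are illustrative.

```python
import pandas as pd

# '2D' means every second calendar day.
every_other_day = pd.date_range('2015-07-03', periods=3, freq='2D')

# 'B' skips weekends: 2015-07-03 is a Friday, so the next entry is Monday.
business_days = pd.date_range('2015-07-03', periods=3, freq='B')

print(every_other_day)
print(business_days)
```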

### 3.1. ``pd.date_range()`` 

In [15]:
# accepts a start date, an end date, and an optional frequency code
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [16]:
# the date range can be specified with a startpoint and a number of periods
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [17]:
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [18]:
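# anchor the range at its start (not displayed: only the last expression in a cell is shown)
pd.date_range(start='2012-04-01', periods=10)
# or anchor the range at its end
pd.date_range(end='2012-06-01', periods=10)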

DatetimeIndex(['2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
               '2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
               '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

In [19]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [20]:
pd.date_range('2012-05-02 12:56:31', periods=5)
#pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True) # remove time

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [21]:
rng = pd.date_range('2012-01-01', '2012-06-01', freq='WOM-3FRI')
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI')]
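``WOM-3FRI`` stands for "week of month, third Friday". A quick way to convince yourself is to check the weekday and the day of the month of every generated date, as in this sketch (the helper names are illustrative):

```python
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-06-01', freq='WOM-3FRI')

# Every date should be a Friday ...
day_names = list(rng.day_name())

# ... and the third Friday of a month always falls on day 15 to 21.
days_of_month = list(rng.day)

print(day_names)
print(days_of_month)
```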

### 3.2. ``pd.period_range()`` 

In [22]:
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]', freq='M')

### 3.3. ``pd.timedelta_range()`` 

In [23]:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')

In [24]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
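A range of time deltas becomes a range of timestamps once it is added to a starting point. A minimal sketch, assuming an arbitrary start of 09:00 and writing the 2.5-hour step as ``'150min'``:

```python
import pandas as pd

# Four offsets, 150 minutes apart, starting from zero.
steps = pd.timedelta_range(start='0 days', periods=4, freq='150min')

# Adding a Timestamp to a TimedeltaIndex yields a DatetimeIndex.
start = pd.Timestamp('2015-07-03 09:00')
schedule = start + steps

print(schedule)
```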
