In [1]:
from pandas import DataFrame, Series
import pandas as pd
import sys
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline


from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse

# Date Ranges, Frequencies, and Shifting

Generic time series in pandas are assumed to be irregular; that is, they have no fixed
frequency. For many applications this is sufficient. However, it’s often desirable to work
relative to a fixed frequency, such as daily, monthly, or every 15 minutes, even if that
means introducing missing values into a time series. Fortunately, pandas has a full suite
of standard time series frequencies and tools for resampling, inferring frequencies, and
generating fixed-frequency date ranges. For example, the irregular time series below can be
converted to a fixed daily frequency by calling resample:

In [4]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
    datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [5]:
ts = Series(np.random.randn(6), index=dates)

In [6]:
ts

2011-01-02   -0.364584
2011-01-05    0.366947
2011-01-07    0.247626
2011-01-08   -2.218301
2011-01-10   -1.283301
2011-01-12    1.492874
dtype: float64

In [7]:
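# resample alone returns a lazy Resampler object; asfreq() materializes the daily series
ts.resample('D').asfreq()

2011-01-02   -0.364584
2011-01-03         NaN
2011-01-04         NaN
2011-01-05    0.366947
2011-01-06         NaN
2011-01-07    0.247626
2011-01-08   -2.218301
2011-01-09         NaN
2011-01-10   -1.283301
2011-01-11         NaN
2011-01-12    1.492874
Freq: D, dtype: float64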
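Here is a minimal sketch of what the daily conversion does. It assumes a pandas version where `resample` returns a lazy resampler object. Calling `asfreq()` on that object should give the same result as reindexing onto a complete daily `date_range`, with NaN for the days that had no observation. Fixed values replace the random data so the result is reproducible.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same irregular dates as above, with fixed values instead of random ones
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6.0), index=dates)

# Upsample to calendar-daily frequency; days with no observation become NaN
daily = ts.resample('D').asfreq()

# The same result by reindexing onto a complete daily range
full_range = pd.date_range(ts.index.min(), ts.index.max(), freq='D')
reindexed = ts.reindex(full_range)

print(len(daily))                # 11 days, Jan 2 through Jan 12
print(int(daily.isna().sum()))   # 5 days had no observation
print(daily.equals(reindexed))   # True
```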

Conversion between frequencies or resampling is a big enough topic to have its own
section later. Here I’ll show you how to use the base frequencies and multiples thereof.

# Generating Date Ranges

While I used it previously without explanation, you may have guessed that
pandas.date_range is responsible for generating a DatetimeIndex with an indicated length
according to a particular frequency:

In [10]:
index = pd.date_range('4/1/2012', '6/1/2012')

In [11]:
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
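               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
               '2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
               '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')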

By default, date_range generates daily timestamps. If you pass only a start or end date,
you must pass a number of periods to generate:

In [12]:
pd.date_range(start='4/1/2012', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [13]:
pd.date_range(end='6/1/2012', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')
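The three ways of bounding a range can be checked side by side. This sketch assumes pandas' rule that exactly three of `start`, `end`, `periods`, and `freq` must be determined, with `freq` defaulting to daily. Under that rule, a lone start date leaves the length undetermined and raises a `ValueError`.

```python
import pandas as pd

# start + end: every calendar day in the closed interval
both = pd.date_range('2012-04-01', '2012-06-01')
# start + periods, or end + periods: a fixed number of days
from_start = pd.date_range(start='2012-04-01', periods=20)
to_end = pd.date_range(end='2012-06-01', periods=20)

print(len(both))               # 62 (30 April days + 31 May days + June 1)
print(from_start[-1].date())   # 2012-04-20
print(to_end[0].date())        # 2012-05-13

# A start date alone leaves the length undetermined, so pandas raises
try:
    pd.date_range(start='2012-04-01')
    start_only_ok = True
except ValueError:
    start_only_ok = False
print(start_only_ok)           # False
```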

The start and end dates define strict boundaries for the generated date index. For example,
if you wanted a date index containing the last business day of each month, you
would pass the 'BM' frequency (business end of month) and only dates falling on or
inside the date interval will be included:

In [14]:
pd.date_range('1/1/2000', '12/1/2000', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')
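The following is a short sketch of the strict-boundary behavior, assuming a recent pandas. It passes the `BMonthEnd` offset object instead of the `'BM'` string, because newer releases rename some of these aliases (for example, `'BM'` becomes `'BME'`).

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# December 2000's last business day (Friday the 29th) lies after the end date,
# so it is excluded...
short = pd.date_range('2000-01-01', '2000-12-01', freq=BMonthEnd())
# ...but it is included once the end date reaches it
full = pd.date_range('2000-01-01', '2000-12-29', freq=BMonthEnd())

print(len(short), short[-1].date())   # 11 2000-11-30
print(len(full), full[-1].date())     # 12 2000-12-29
```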

date_range by default preserves the time (if any) of the start or end timestamp:

In [15]:
pd.date_range('5/2/2012 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

Sometimes you will have start or end dates with time information but want to generate
a set of timestamps normalized to midnight as a convention. To do this, there is a
normalize option:

In [16]:
pd.date_range('5/2/2012 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')
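As a quick check, `normalize=True` should match generating the range first and then calling `DatetimeIndex.normalize()` on it. Here is a sketch:

```python
import pandas as pd

with_time = pd.date_range('2012-05-02 12:56:31', periods=5)
normalized = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

# The time of day from the start timestamp is carried through...
print(with_time[0].time())                        # 12:56:31
# ...unless normalize=True, which matches normalizing afterwards
print(normalized.equals(with_time.normalize()))   # True
print((normalized.hour == 0).all())               # True
```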

# Frequencies and Date Offsets

Frequencies in pandas are composed of a base frequency and a multiplier. Base frequencies
are typically referred to by a string alias, like 'M' for monthly or 'H' for hourly.
Each base frequency has a corresponding object, generally referred to as a date offset.
For example, hourly frequency can be represented with the Hour class:

In [17]:
from pandas.tseries.offsets import Hour, Minute

In [22]:
hour = Hour()

In [23]:
hour

<Hour>

In [24]:
four_hours = Hour(4)

In [25]:
four_hours

<4 * Hours>

In most applications, you would never need to explicitly create one of these objects;
instead, you would use a string alias like 'H' or '4H'. Putting an integer before the base
frequency creates a multiple:

In [26]:
pd.date_range('1/1/2000', '1/3/2000 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')
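The sketch below shows that an offset object can stand in wherever a frequency string is accepted. It also shows that multiples can be built either with the constructor argument or by multiplying by an integer.

```python
import pandas as pd
from pandas.tseries.offsets import Hour

four_hours = Hour(4)
# A multiple of the base offset; multiplying by an integer gives the same thing
print(four_hours == Hour() * 4)                  # True

# The offset object can be passed as freq in place of a string alias
idx = pd.date_range('2000-01-01', '2000-01-01 23:59', freq=four_hours)
print(idx.hour.tolist())                         # [0, 4, 8, 12, 16, 20]

# Adding the offset to a Timestamp moves it forward four hours
print(pd.Timestamp('2000-01-01') + four_hours)   # 2000-01-01 04:00:00
```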

Many offsets can be combined together by addition:

In [27]:
Hour(2) + Minute(30)

<150 * Minutes>

Similarly, you can pass a compound frequency string like '1h30min', which is parsed into
the equivalent single offset (here, 90 minutes, shown as '90T'):

In [28]:
pd.date_range('1/1/2000', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
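The equivalence between combined offsets and compound frequency strings can be sketched with `to_offset` from `pandas.tseries.frequencies`, which parses a string into an offset object:

```python
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Hour, Minute

# Fixed-size offsets such as Hour and Minute add up to a single Minute offset
print(Hour(2) + Minute(30) == Minute(150))   # True

# to_offset parses a compound frequency string into the equivalent offset
print(to_offset('1h30min') == Minute(90))    # True
```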

Some frequencies describe points in time that are not evenly spaced. For example,
'M' (calendar month end) and 'BM' (last business/weekday of month) depend on the
number of days in a month and, in the latter case, whether the month ends on a weekend
or not. For lack of a better term, I call these anchored offsets.
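To see why these are called anchored, here is a sketch that measures the gaps between consecutive month ends. It uses the `MonthEnd` offset object in place of the `'M'` string.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Month-end dates are tied to the calendar, so their spacing varies
month_ends = pd.date_range('2000-01-31', periods=4, freq=MonthEnd())
gaps = [(b - a).days for a, b in zip(month_ends[:-1], month_ends[1:])]

print(month_ends[-1].date())   # 2000-04-30
print(gaps)                    # [29, 31, 30] (2000 is a leap year)
```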

See Table 10-4 for a listing of frequency codes and date offset classes available in pandas.

Table 10-4. Base Time Series Frequencies

| Alias | Offset type | Description |
| --- | --- | --- |
| D | Day | Calendar daily |
| B | BusinessDay | Business daily |
| H | Hour | Hourly |
| T or min | Minute | Minutely |
| S | Second | Secondly |
| L or ms | Milli | Millisecond (1/1000th of 1 second) |
| U | Micro | Microsecond (1/1000000th of 1 second) |
| M | MonthEnd | Last calendar day of month |
| BM | BusinessMonthEnd | Last business day (weekday) of month |
| MS | MonthBegin | First calendar day of month |
| BMS | BusinessMonthBegin | First weekday of month |
| W-MON, W-TUE, ... | Week | Weekly on given day of week: MON, TUE, WED, THU, FRI, SAT, or SUN |
| WOM-1MON, WOM-2MON, ... | WeekOfMonth | Generate weekly dates in the first, second, third, or fourth week of the month (for example, WOM-3FRI for the third Friday of each month) |
| Q-JAN, Q-FEB, ... | QuarterEnd | Quarterly dates anchored on last calendar day of each month, for year ending in indicated month: JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC |
| BQ-JAN, BQ-FEB, ... | BusinessQuarterEnd | Quarterly dates anchored on last weekday of each month, for year ending in indicated month |
| QS-JAN, QS-FEB, ... | QuarterBegin | Quarterly dates anchored on first calendar day of each month, for year ending in indicated month |
| BQS-JAN, BQS-FEB, ... | BusinessQuarterBegin | Quarterly dates anchored on first weekday of each month, for year ending in indicated month |
| A-JAN, A-FEB, ... | YearEnd | Annual dates anchored on last calendar day of given month: JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC |
| BA-JAN, BA-FEB, ... | BusinessYearEnd | Annual dates anchored on last weekday of given month |
| AS-JAN, AS-FEB, ... | YearBegin | Annual dates anchored on first day of given month |
| BAS-JAN, BAS-FEB, ... | BusinessYearBegin | Annual dates anchored on first weekday of given month |

## Week of month dates

One useful frequency class is “week of month”, starting with WOM. This enables you to
get dates like the third Friday of each month:

In [30]:
rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')

In [31]:
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-06-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-07-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-08-17 00:00:00', freq='WOM-3FRI')]

Traders of US equity options will recognize these dates as the standard dates of monthly
expiry.
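The "third Friday" rule can be cross-checked against a hand-rolled calculation. `third_friday` below is a hypothetical helper written only for this comparison and is not part of pandas. It finds the first Friday of the month and adds two weeks.

```python
from datetime import date, timedelta

import pandas as pd

rng = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')

def third_friday(year, month):
    # Hypothetical helper: step from the 1st to the first Friday
    # (weekday() == 4), then add two more weeks
    first = date(year, month, 1)
    first_friday = first + timedelta(days=(4 - first.weekday()) % 7)
    return first_friday + timedelta(weeks=2)

expected = [third_friday(2012, m) for m in range(1, 9)]
print([d.date() for d in rng] == expected)   # True
```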

## Shifting (Leading and Lagging) Data

“Shifting” refers to moving data backward and forward through time. Both Series and
DataFrame have a shift method for doing naive shifts forward or backward, leaving
the index unmodified:

In [32]:
ts = Series(np.random.randn(4), index=pd.date_range('1/1/2000', periods=4, freq='M'))

In [33]:
ts

2000-01-31   -1.144800
2000-02-29    0.449459
2000-03-31   -0.212088
2000-04-30    0.329910
Freq: M, dtype: float64

In [34]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -1.144800
2000-04-30    0.449459
Freq: M, dtype: float64

In [35]:
ts.shift(-2)

2000-01-31   -0.212088
2000-02-29    0.329910
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

In [37]:
ts.shift(-4)


2000-01-31   NaN
2000-02-29   NaN
2000-03-31   NaN
2000-04-30   NaN
Freq: M, dtype: float64
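The same `shift` call works column by column on a DataFrame. Here is a minimal sketch with made-up values and a daily index:

```python
import pandas as pd

idx = pd.date_range('2000-01-01', periods=4, freq='D')
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 'b': [10.0, 20.0, 30.0, 40.0]},
                  index=idx)

lagged = df.shift(1)
print(lagged.index.equals(df.index))   # True: the index is not modified
print(lagged.iloc[0].isna().all())     # True: the first row has no prior data
print(lagged.iloc[1].tolist())         # [1.0, 10.0]: the old first row
```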

A common use of shift is computing percent changes in a time series or multiple time
series as DataFrame columns. This is expressed as:

In [38]:
ts / ts.shift(1) - 1

2000-01-31         NaN
2000-02-29   -1.392609
2000-03-31   -1.471874
2000-04-30   -2.555535
Freq: M, dtype: float64
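pandas also provides `pct_change`, which for a series without missing values should give the same numbers as the explicit expression. The sketch below uses fixed values so the ratios are easy to verify by hand.

```python
import numpy as np
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 6.0],
               index=pd.date_range('2000-01-01', periods=4, freq='D'))

manual = ts / ts.shift(1) - 1
builtin = ts.pct_change()

print(manual.tolist()[1:])                            # [1.0, 0.5, 1.0]
print(np.allclose(manual, builtin, equal_nan=True))   # True
```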

Because naive shifts leave the index unmodified, some data is discarded. Thus, if the
frequency is known, it can be passed to shift to advance the timestamps rather than just
the data:

In [39]:
ts.shift(2, freq='M')

2000-03-31   -1.144800
2000-04-30    0.449459
2000-05-31   -0.212088
2000-06-30    0.329910
Freq: M, dtype: float64
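The sketch below contrasts the two forms on fixed values. It uses the `MonthEnd` offset object in place of the `'M'` string, an assumption made so the code also runs on newer pandas, where that alias has been renamed.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('2000-01-31', periods=4, freq=MonthEnd()))

naive = ts.shift(2)                      # data moves, index stays: 2 values lost
advanced = ts.shift(2, freq=MonthEnd())  # index moves, every value kept

print(int(naive.notna().sum()))          # 2
print(advanced.values.tolist())          # [1.0, 2.0, 3.0, 4.0]
print(advanced.index[0].date())          # 2000-03-31
```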

Other frequencies can be passed, too, giving you a lot of flexibility in how to lead and
lag the data:

In [40]:
ts.shift(3, freq='D')

2000-02-03   -1.144800
2000-03-03    0.449459
2000-04-03   -0.212088
2000-05-03    0.329910
dtype: float64

In [41]:
ts.shift(1, freq='3D')

2000-02-03   -1.144800
2000-03-03    0.449459
2000-04-03   -0.212088
2000-05-03    0.329910
dtype: float64

In [42]:
ts.shift(1, freq='90T')

2000-01-31 01:30:00   -1.144800
2000-02-29 01:30:00    0.449459
2000-03-31 01:30:00   -0.212088
2000-04-30 01:30:00    0.329910
Freq: M, dtype: float64
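For fixed-size offsets such as `Minute`, the shift amounts to adding a constant to every timestamp. This sketch assumes `shift` also accepts a `pd.Timedelta` for `freq`, as the pandas documentation describes.

```python
import pandas as pd
from pandas.tseries.offsets import Minute

ts = pd.Series([1.0, 2.0], index=pd.DatetimeIndex(['2000-01-31', '2000-02-29']))

# Shifting by a fixed-size offset adds the same amount to every timestamp
by_offset = ts.shift(1, freq=Minute(90))
by_timedelta = ts.shift(1, freq=pd.Timedelta(minutes=90))

print(by_offset.index.equals(ts.index + pd.Timedelta(minutes=90)))  # True
print(by_offset.equals(by_timedelta))                                # True
```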

### Shifting dates with offsets

The pandas date offsets can also be used with datetime or Timestamp objects:

In [43]:
from pandas.tseries.offsets import Day, MonthEnd

In [44]:
now = datetime(2011, 11, 17)

In [45]:
now

datetime.datetime(2011, 11, 17, 0, 0)

In [46]:
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')
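Here is a sketch confirming that adding a fixed offset to a standard-library `datetime` produces a pandas `Timestamp` that equals plain `timedelta` arithmetic:

```python
from datetime import datetime, timedelta

import pandas as pd
from pandas.tseries.offsets import Day

now = datetime(2011, 11, 17)

# Adding an offset to a plain datetime returns a pandas Timestamp
later = now + 3 * Day()
print(type(later).__name__)               # Timestamp
print(later == now + timedelta(days=3))   # True
```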

If you add an anchored offset like MonthEnd, the first increment will roll forward a date
to the next date according to the frequency rule:

In [47]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [48]:
now + MonthEnd(2)

Timestamp('2011-12-31 00:00:00')
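The roll-forward rule depends on whether the starting date already sits on the anchor. Here is a sketch with `MonthEnd`:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp('2011-11-17')
month_end = pd.Timestamp('2011-11-30')

# From mid-month, the first increment only rolls forward to this month's end
print(mid_month + MonthEnd())   # 2011-11-30 00:00:00
# From a date already on a month end, one increment moves a full month
print(month_end + MonthEnd())   # 2011-12-31 00:00:00
# Subtracting rolls back to the previous month end
print(mid_month - MonthEnd())   # 2011-10-31 00:00:00
```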

Anchored offsets can explicitly “roll” dates forward or backward using their rollforward
and rollback methods, respectively:

In [49]:
offset = MonthEnd()

In [50]:
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [51]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')
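Rolling differs from addition in one respect, sketched below: rolling a date that is already on the anchor leaves it unchanged.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
on_anchor = pd.Timestamp('2011-11-30')

# Rolling is a no-op for a date that already falls on a month end,
# whereas on_anchor + offset would move it to 2011-12-31
print(offset.rollforward(on_anchor) == on_anchor)   # True
print(offset.rollback(on_anchor) == on_anchor)      # True
```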

A clever use of date offsets is to pass these methods to groupby as the grouping function:

In [52]:
ts = Series(np.random.randn(20),
    index=pd.date_range('1/15/2000', periods=20, freq='4d'))

In [53]:
ts

2000-01-15    1.947874
2000-01-19    1.239176
2000-01-23    0.278788
2000-01-27   -1.645369
2000-01-31   -1.333169
2000-02-04    3.063327
2000-02-08    0.153273
2000-02-12    1.462193
2000-02-16   -0.933951
2000-02-20    0.019431
2000-02-24    1.341127
2000-02-28   -3.151558
2000-03-03   -0.269122
2000-03-07    1.165272
2000-03-11   -1.109055
2000-03-15    2.328336
2000-03-19    0.701602
2000-03-23   -0.396542
2000-03-27    0.547278
2000-03-31   -1.017771
Freq: 4D, dtype: float64

In [54]:
ts.groupby(offset.rollforward).mean()

2000-01-31    0.09746
2000-02-29    0.27912
2000-03-31    0.24375
dtype: float64
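Here is a sketch of the groupby trick with predictable values (`np.arange(20.0)` stands in for the random data). It shows how many observations fall into each month-end bucket and what their means are.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('2000-01-15', periods=20, freq='4D'))

# Each timestamp is mapped to its month end, which becomes the group key
groups = ts.groupby(MonthEnd().rollforward)
print(groups.size().tolist())   # [5, 7, 8]
print(groups.mean().tolist())   # [2.0, 8.0, 15.5]
```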

Of course, an easier and faster way to do this is using resample (much more on this later):

In [56]:
# Older, since-deprecated syntax: ts.resample('M', how='mean')
ts.resample('M').mean()

2000-01-31    0.09746
2000-02-29    0.27912
2000-03-31    0.24375
Freq: M, dtype: float64
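Here is a sketch checking that `resample` with a month-end rule reproduces the `groupby` result. It assumes month-end resampling uses right-closed, right-labeled bins, which is the pandas default for end-anchored frequencies, and it passes a `MonthEnd` object instead of the `'M'` string.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('2000-01-15', periods=20, freq='4D'))

by_groupby = ts.groupby(MonthEnd().rollforward).mean()
by_resample = ts.resample(MonthEnd()).mean()

print(by_resample.tolist())                                # [2.0, 8.0, 15.5]
print(list(by_groupby.index) == list(by_resample.index))   # True
```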