In [2]:
import pandas as pd
import numpy as np
from datetime import datetime

In [3]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02   -2.050288
2011-01-05    1.679805
2011-01-07    0.826978
2011-01-08   -0.307938
2011-01-10   -0.881782
2011-01-12    0.819613
dtype: float64

Generic time series in pandas are assumed to be irregular; that is, they have no fixed
frequency. For many applications this is sufficient. However, it’s often desirable to
work relative to a fixed frequency, such as daily, monthly, or every 15 minutes, even if
that means introducing missing values into a time series. Fortunately, pandas has a
full suite of standard time series frequencies and tools for resampling, inferring
frequencies, and generating fixed-frequency date ranges. For example, you can convert
the sample time series to a fixed daily frequency by calling resample:

In [5]:
# The string 'D' is interpreted as daily frequency.
resampler = ts.resample('D')
resampler

<pandas.core.resample.DatetimeIndexResampler object at 0x000001FE3286F490>
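Note that resample on its own returns a Resampler object rather than a new Series. As a minimal sketch of getting an actual daily series out of it, assuming a small hand-made series (the values and the `ts_demo` name are illustrative, not the random data above), calling asfreq on the resampler conforms the data to the daily grid, and the days with no observation show up as NaN:

```python
from datetime import datetime

import pandas as pd

# Illustrative irregular series (values are made up for the example).
demo_dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts_demo = pd.Series([1.0, 2.0, 3.0], index=demo_dates)

# asfreq() conforms to the daily grid without aggregating;
# days that had no observation become NaN.
daily = ts_demo.resample('D').asfreq()
print(daily)
```

The result spans every calendar day from the first observation to the last, which is where the missing values mentioned above come from.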

### Generating Date Ranges

While I used it previously without explanation, pandas.date_range is responsible for
generating a DatetimeIndex with an indicated length according to a particular
frequency:

In [7]:
index = pd.date_range('2012-04-01', '2012-06-01')
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
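               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
               '2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
               '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')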

By default, date_range generates daily timestamps. If you pass only a start or end
date, you must pass a number of periods to generate:

In [8]:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [9]:
pd.date_range(end='2012-06-01', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')
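As a sketch of how these arguments interact: date_range needs enough information to pin down the index, so a start or an end plus a number of periods is sufficient, and giving start, end, and periods together (with no frequency) should yield evenly spaced timestamps between the two bounds. The variable names here are illustrative.

```python
import pandas as pd

# start + periods: count forward from the start date (daily by default).
forward = pd.date_range(start='2012-04-01', periods=3)

# end + periods: count backward so the index finishes on the end date.
backward = pd.date_range(end='2012-04-03', periods=3)

# start + end + periods (no freq): evenly spaced points between the bounds.
spaced = pd.date_range(start='2012-04-01', end='2012-04-05', periods=3)

print(forward.equals(backward))
print(list(spaced))
```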

The start and end dates define strict boundaries for the generated date index. For example, if you wanted a date index containing the last business day of each month, you would pass the 'BM' frequency (business end of month; see the more complete listing of frequencies in Table 11-4), and only dates falling on or inside the date interval would be included:

In [10]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')
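To make the boundary behavior concrete, here is a small sketch that moves the end date. It passes the BMonthEnd offset object rather than the 'BM' string, which is just a choice for this example; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Same bounds as above: December's last business day (2000-12-29)
# falls after the end bound, so it is excluded.
inside = pd.date_range('2000-01-01', '2000-12-01', freq=BMonthEnd())
print(len(inside), inside[-1])

# Moving the end bound to the end of December brings it in.
extended = pd.date_range('2000-01-01', '2000-12-31', freq=BMonthEnd())
print(len(extended), extended[-1])
```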

Table 11-4. Base time series frequencies (not comprehensive)

| Alias | Offset type | Description |
| --- | --- | --- |
| D | Day | Calendar daily |
| B | BusinessDay | Business daily |
| H | Hour | Hourly |
| T or min | Minute | Minutely |
| S | Second | Secondly |
| L or ms | Milli | Millisecond (1/1,000 of 1 second) |
| U | Micro | Microsecond (1/1,000,000 of 1 second) |
| M | MonthEnd | Last calendar day of month |
| BM | BusinessMonthEnd | Last business day (weekday) of month |
| MS | MonthBegin | First calendar day of month |
| BMS | BusinessMonthBegin | First weekday of month |
| W-MON, W-TUE, ... | Week | Weekly on given day of week (MON, TUE, WED, THU, FRI, SAT, or SUN) |
| WOM-1MON, WOM-2MON, ... | WeekOfMonth | Generate weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month) |
| Q-JAN, Q-FEB, ... | QuarterEnd | Quarterly dates anchored on last calendar day of each month, for year ending in indicated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC) |
| BQ-JAN, BQ-FEB, ... | BusinessQuarterEnd | Quarterly dates anchored on last weekday of each month, for year ending in indicated month |
| QS-JAN, QS-FEB, ... | QuarterBegin | Quarterly dates anchored on first calendar day of each month, for year ending in indicated month |
| BQS-JAN, BQS-FEB, ... | BusinessQuarterBegin | Quarterly dates anchored on first weekday of each month, for year ending in indicated month |
| A-JAN, A-FEB, ... | YearEnd | Annual dates anchored on last calendar day of given month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC) |
| BA-JAN, BA-FEB, ... | BusinessYearEnd | Annual dates anchored on last weekday of given month |
| AS-JAN, AS-FEB, ... | YearBegin | Annual dates anchored on first day of given month |
| BAS-JAN, BAS-FEB, ... | BusinessYearBegin | Annual dates anchored on first weekday of given month |
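One way to check how an alias maps onto its offset class is the to_offset helper in pandas.tseries.frequencies. The sketch below sticks to a handful of aliases from the table and then uses an anchored weekly frequency in date_range; the loop and variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Map a few aliases from the table to the name of their offset class.
for alias in ['D', 'B', 'MS', 'W-MON', 'WOM-3FRI']:
    print(alias, '->', type(to_offset(alias)).__name__)

# An anchored weekly frequency: the first three Mondays on or after 2012-01-01.
mondays = pd.date_range('2012-01-01', periods=3, freq='W-MON')
print(list(mondays))
```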

date_range by default preserves the time (if any) of the start or end timestamp:

In [11]:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

Sometimes you will have start or end dates with time information but want to generate
a set of timestamps normalized to midnight as a convention. To do this, there is a
normalize option:

In [12]:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')
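If you already have an index with times attached, the normalize method on DatetimeIndex should give the same result after the fact. A brief sketch, with illustrative variable names:

```python
import pandas as pd

with_times = pd.date_range('2012-05-02 12:56:31', periods=3)
at_midnight = pd.date_range('2012-05-02 12:56:31', periods=3, normalize=True)

# Normalizing an existing index afterwards gives the same timestamps.
print(with_times.normalize().equals(at_midnight))
```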

### Frequencies and Date Offsets

Frequencies in pandas are composed of a base frequency and a multiplier. Base
frequencies are typically referred to by a string alias, like 'M' for monthly or 'H' for
hourly. For each base frequency, there is an object, generally referred to as a
date offset. For example, hourly frequency can be represented with the Hour class:

In [13]:
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

<Hour>

In [14]:
# You can define a multiple of an offset by passing an integer:
four_hours = Hour(4)
four_hours

<4 * Hours>

In most applications, you would never need to explicitly create one of these objects,
instead using a string alias like 'H' or '4H'. Putting an integer before the base
frequency creates a multiple:

In [15]:
pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [16]:
# Many offsets can be combined together by addition:
Hour(2) + Minute(30)

<150 * Minutes>

In [17]:
# Similarly, you can pass frequency strings, like '1h30min', that will effectively be
# parsed to the same expression:
pd.date_range('2000-01-01', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
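To check that a frequency string and an explicitly constructed offset really describe the same thing, you can parse the string with to_offset (from pandas.tseries.frequencies) and compare. This is only a sketch; it uses the lowercase 'h' and 'min' spellings that appear in the cells above.

```python
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Hour, Minute

# A multiplier in the string corresponds to the integer passed to the class.
print(to_offset('4h') == Hour(4))

# A compound string is parsed into a single offset of the combined length.
print(to_offset('1h30min') == Minute(90))
```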

Some frequencies describe points in time that are not evenly spaced. For example, 'M'
(calendar month end) and 'BM' (last business/weekday of month) depend on the
number of days in a month and, in the latter case, whether the month ends on a
weekend or not. We refer to these as anchored offsets.
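A quick way to see the uneven spacing is to measure the gaps between consecutive month-end dates. The sketch below uses the MonthEnd offset object directly; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Month-end dates for the first four months of 2000 (a leap year).
month_ends = pd.date_range('2000-01-31', periods=4, freq=MonthEnd())

# Gaps between consecutive dates, in days: they are not constant.
gaps = [(b - a).days for a, b in zip(month_ends[:-1], month_ends[1:])]
print(gaps)
```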

Refer back to Table 11-4 for a listing of frequency codes and date offset classes
available in pandas.

Users can define their own custom frequency classes to provide
date logic not available in pandas, though the full details of that are
outside the scope of this book.

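__Week of month dates__

One useful frequency class is “week of month,” starting with WOM. It lets you get dates like the third Friday of each month: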

In [18]:
rng = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')
list(rng)

[Timestamp('2012-01-20 00:00:00'),
 Timestamp('2012-02-17 00:00:00'),
 Timestamp('2012-03-16 00:00:00'),
 Timestamp('2012-04-20 00:00:00'),
 Timestamp('2012-05-18 00:00:00'),
 Timestamp('2012-06-15 00:00:00'),
 Timestamp('2012-07-20 00:00:00'),
 Timestamp('2012-08-17 00:00:00')]

### Shifting (Leading and Lagging) Data

“Shifting” refers to moving data backward and forward through time. Both Series and
DataFrame have a shift method for doing naive shifts forward or backward, leaving
the index unmodified:

In [19]:
ts = pd.Series(np.random.randn(4),
               index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts

2000-01-31   -0.895788
2000-02-29    0.087142
2000-03-31    0.252604
2000-04-30    0.411066
Freq: M, dtype: float64

In [20]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.895788
2000-04-30    0.087142
Freq: M, dtype: float64

In [21]:
ts.shift(-2)

2000-01-31    0.252604
2000-02-29    0.411066
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

When we shift like this, missing data is introduced either at the start or the end of the
time series.

In [22]:
# A common use of shift is computing percent changes in a time series or multiple
# time series as DataFrame columns. This is expressed as:

ts / ts.shift(1) - 1

2000-01-31         NaN
2000-02-29   -1.097280
2000-03-31    1.898765
2000-04-30    0.627314
Freq: M, dtype: float64
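Here is the same calculation sketched on a DataFrame with two columns, using made-up prices so the result is easy to verify by hand. The column names and values are hypothetical. The comparison against pct_change is there only as a sanity check that the shift-based expression computes the usual percent change.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Made-up prices for two hypothetical series, indexed by month end.
idx = pd.date_range('2000-01-31', periods=4, freq=MonthEnd())
prices = pd.DataFrame({'a': [100.0, 110.0, 99.0, 108.9],
                       'b': [50.0, 50.0, 25.0, 50.0]}, index=idx)

# Percent change relative to the previous row, column by column.
manual = prices / prices.shift(1) - 1

# pct_change computes the same quantity.
builtin = prices.pct_change()

print(manual)
print(np.allclose(manual, builtin, equal_nan=True))
```

The first row is NaN in both results because there is no earlier row to compare against.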

Because naive shifts leave the index unmodified, some data is discarded. Thus if the
frequency is known, it can be passed to shift to advance the timestamps instead of
simply the data:

In [23]:
ts.shift(2, freq='M')

2000-03-31   -0.895788
2000-04-30    0.087142
2000-05-31    0.252604
2000-06-30    0.411066
Freq: M, dtype: float64

In [24]:
# Other frequencies can be passed, too, giving you some
# flexibility in how to lead and lag the data:
ts.shift(3, freq='D')

2000-02-03   -0.895788
2000-03-03    0.087142
2000-04-03    0.252604
2000-05-03    0.411066
dtype: float64

In [25]:
# The T here stands for minutes.
ts.shift(1, freq='90T')

2000-01-31 01:30:00   -0.895788
2000-02-29 01:30:00    0.087142
2000-03-31 01:30:00    0.252604
2000-04-30 01:30:00    0.411066
dtype: float64
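To contrast the two kinds of shift side by side, here is a small sketch on a series with known values (the values and names are illustrative). The naive shift keeps the index and loses the values pushed off the end, while passing a frequency keeps every value and moves the timestamps instead.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

idx = pd.date_range('2000-01-31', periods=4, freq=MonthEnd())
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)  # illustrative values

# Naive shift: same index, last two values pushed off the end.
naive = s.shift(2)

# Shift with a frequency: every value kept, timestamps moved two month ends on.
moved = s.shift(2, freq=MonthEnd())

print(naive.dropna().tolist())
print(moved.tolist())
print(moved.index[0], moved.index[-1])
```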

__Shifting dates with offsets__

The pandas date offsets can also be used with datetime or Timestamp objects:

In [26]:
from pandas.tseries.offsets import Day, MonthEnd
now = datetime(2011, 11, 17)
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

If you add an anchored offset like MonthEnd, the first increment will “roll forward” a date to the next date according to the frequency rule:

In [27]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [28]:
now + MonthEnd(2)

Timestamp('2011-12-31 00:00:00')

Anchored offsets can explicitly “roll” dates forward or backward by simply using their
rollforward and rollback methods, respectively:

In [29]:
offset = MonthEnd()
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [30]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')
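One subtlety worth sketching is what happens when the date already sits on the anchor. Adding the offset still advances to the next month end, whereas rollforward and rollback leave such a date unchanged. The timestamps below are chosen for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = pd.Timestamp('2011-11-17')
on_month_end = pd.Timestamp('2011-11-30')

# From mid-month, adding the offset and rolling forward agree.
print(mid_month + offset, offset.rollforward(mid_month))

# From a date already on the anchor, adding moves to the next month end...
print(on_month_end + offset)
# ...while rollforward and rollback leave it where it is.
print(offset.rollforward(on_month_end), offset.rollback(on_month_end))
```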

In [31]:
# A creative use of date offsets is to use these methods with groupby:
ts = pd.Series(np.random.randn(20),
               index=pd.date_range('1/15/2000', periods=20, freq='4d'))
ts

2000-01-15   -1.604621
2000-01-19    1.242531
2000-01-23   -0.222879
2000-01-27    0.651169
2000-01-31   -0.250334
2000-02-04    0.712936
2000-02-08   -0.620674
2000-02-12   -1.784491
2000-02-16    1.121898
2000-02-20   -2.023921
2000-02-24    1.004744
2000-02-28   -0.698710
2000-03-03   -0.205526
2000-03-07    1.071276
2000-03-11    0.067844
2000-03-15    0.425994
2000-03-19    0.493810
2000-03-23   -0.979779
2000-03-27   -0.569710
2000-03-31   -0.194801
Freq: 4D, dtype: float64

In [32]:
ts.groupby(offset.rollforward).mean()

2000-01-31   -0.036827
2000-02-29   -0.326888
2000-03-31    0.013638
dtype: float64

In [33]:
# Of course, an easier and faster way to do this is using resample
ts.resample('M').mean()

2000-01-31   -0.036827
2000-02-29   -0.326888
2000-03-31    0.013638
Freq: M, dtype: float64
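Since the series above is random, here is a deterministic sketch of the same comparison, assuming a stand-in series of 0.0 through 19.0 on the same dates so the monthly means can be checked by hand. It passes the MonthEnd offset object to resample rather than the 'M' string; that is a choice made for this example.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Deterministic stand-in for the random series: 0.0, 1.0, ..., 19.0.
idx = pd.date_range('2000-01-15', periods=20, freq='4D')
s = pd.Series(np.arange(20.0), index=idx)

# Group each timestamp under the month end it rolls forward to.
by_group = s.groupby(MonthEnd().rollforward).mean()

# The same aggregation expressed with resample.
by_resample = s.resample(MonthEnd()).mean()

print(by_group.tolist())
print(np.allclose(by_group.to_numpy(), by_resample.to_numpy()))
```

January holds five observations, February seven, and March eight, which is why the three means differ in how many values they average.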