In [1]:
from pandas import DataFrame, Series
import pandas as pd
import sys
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline


from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse

# Periods and Period Arithmetic

Periods represent time spans, such as days, months, quarters, or years. The Period class
represents this data type; constructing one requires a string or integer and a frequency
from the above table:

In [2]:
p = pd.Period(2007, freq='A-DEC')

In [3]:
p

Period('2007', 'A-DEC')

In this case, the Period object represents the full time span from January 1, 2007 to
December 31, 2007, inclusive. Conveniently, adding an integer to a period or subtracting
one from it shifts the period by that many units of its frequency:

In [4]:
p + 5

Period('2012', 'A-DEC')

In [5]:
p - 2

Period('2005', 'A-DEC')

If two periods have the same frequency, their difference is the number of units between
them:

In [6]:
pd.Period('2014', freq='A-DEC') - p

7

Regular ranges of periods can be constructed using the period_range function:

In [8]:
rng = pd.period_range('1/1/2000', '6/30/2000', freq='M')

In [9]:
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
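`period_range` can also be driven by a start period and a count instead of two endpoints. A minimal sketch producing the same six months:

```python
import pandas as pd

rng = pd.period_range('1/1/2000', '6/30/2000', freq='M')

# Equivalent range specified by a start period and a number of periods
rng2 = pd.period_range(start='2000-01', periods=6, freq='M')

print(rng.equals(rng2))
print([str(p) for p in rng2])
```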

The PeriodIndex class stores a sequence of periods and can serve as an axis index in
any pandas data structure:

In [10]:
Series(np.random.randn(6), index=rng)

2000-01    1.840207
2000-02    0.010933
2000-03    0.360663
2000-04    0.054145
2000-05    1.581837
2000-06   -0.676970
Freq: M, dtype: float64
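Once periods form the index, rows can be selected with period strings or Period objects. A small sketch; it uses `np.arange` in place of random numbers so the selected values are predictable:

```python
import numpy as np
import pandas as pd

rng = pd.period_range('2000-01', '2000-06', freq='M')
s = pd.Series(np.arange(6), index=rng)

# A single label string is parsed into a monthly Period
march = s.loc['2000-03']

# A Period object works as a label too
may = s.loc[pd.Period('2000-05', freq='M')]

# Label slices include both endpoints
spring = s.loc['2000-02':'2000-04']

print(march, may)
print(spring)
```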

If you have an array of strings, you can also pass it directly to the PeriodIndex constructor:

In [11]:
values = ['2001Q3', '2002Q2', '2003Q1']

In [12]:
index = pd.PeriodIndex(values, freq='Q-DEC')

In [13]:
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
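A PeriodIndex built this way exposes per-period fields and boundary timestamps. A short sketch using the same three quarters:

```python
import pandas as pd

index = pd.PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], freq='Q-DEC')

# Field accessors return one value per period
years = list(index.year)
quarters = list(index.quarter)

# First and last calendar day of each quarter
first_days = index.start_time
last_days = index.end_time.normalize()

print(years, quarters)
print(first_days)
print(last_days)
```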

## Period Frequency Conversion

Periods and PeriodIndex objects can be converted to another frequency using their
asfreq method. As an example, suppose we had an annual period and wanted to convert
it into a monthly period either at the start or end of the year. This is fairly straightforward:

In [14]:
p = pd.Period('2007', freq='A-DEC')

In [15]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [16]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

You can think of Period('2007', 'A-DEC') as a cursor pointing to a span of time,
subdivided by monthly periods. See Figure 10-1 for an illustration of this. For a fiscal
year ending in a month other than December, the corresponding monthly subperiods are
different:

In [17]:
p = pd.Period('2007', freq='A-JUN')

In [18]:
p.asfreq('M', 'start')

Period('2006-07', 'M')

In [19]:
p.asfreq('M', 'end')

Period('2007-06', 'M')

When converting from a higher to a lower frequency, the superperiod is determined by
where the subperiod “belongs”. For example, in A-JUN frequency, the month
Aug-2007 is actually part of the 2008 period:

In [20]:
p = pd.Period('2007-08', 'M')

In [21]:
p.asfreq('A-JUN')

Period('2008', 'A-JUN')

Whole PeriodIndex objects, and Series indexed by them, can be converted in the same way
with the same semantics:

In [22]:
rng = pd.period_range('2006', '2009', freq='A-DEC')

In [23]:
ts = Series(np.random.randn(len(rng)), index=rng)

In [24]:
ts

2006   -0.806311
2007   -0.126915
2008    0.350888
2009    0.928792
Freq: A-DEC, dtype: float64

In [25]:
ts.asfreq('M', how='start')

2006-01   -0.806311
2007-01   -0.126915
2008-01    0.350888
2009-01    0.928792
Freq: M, dtype: float64

In [26]:
ts.asfreq('B', how='end')

2006-12-29   -0.806311
2007-12-31   -0.126915
2008-12-31    0.350888
2009-12-31    0.928792
Freq: B, dtype: float64

## Quarterly Period Frequencies

Quarterly data is standard in accounting, finance, and other fields. Much quarterly data
is reported relative to a fiscal year end, typically the last calendar or business day of one
of the 12 months of the year. As such, the period 2012Q4 has a different meaning depending
on the fiscal year end. pandas supports all 12 possible quarterly frequencies as Q-JAN
through Q-DEC:

In [27]:
p = pd.Period('2012Q4', freq='Q-JAN')

In [28]:
p

Period('2012Q4', 'Q-JAN')

In the case of a fiscal year ending in January, 2012Q4 runs from November 2011 through
January 2012, which you can check by converting to daily frequency. See Figure 10-2 for an
illustration:


In [29]:
p.asfreq('D', 'start')

Period('2011-11-01', 'D')

In [30]:
p.asfreq('D', 'end')

Period('2012-01-31', 'D')
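To see how much the fiscal year end matters, the sketch below converts the same label, 2012Q4, to its first and last days under three of the twelve quarterly frequencies (Q-MAR and Q-DEC are arbitrary choices for comparison):

```python
import pandas as pd

spans = {}
for fy_end in ['Q-JAN', 'Q-MAR', 'Q-DEC']:
    # The same label, 2012Q4, under three different fiscal year ends
    q = pd.Period('2012Q4', freq=fy_end)
    first = q.asfreq('D', 'start')
    last = q.asfreq('D', 'end')
    spans[fy_end] = (str(first), str(last))
    print(fy_end, first, 'to', last)
```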

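Period arithmetic is therefore straightforward. For example, to get the timestamp at 4 PM
on the second-to-last business day of the quarter, convert to business-day frequency at the
end of the quarter ('e'), step back one business day, convert to minute frequency at the
start of that day ('s'), and add 16 * 60 minutes: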

In [31]:
p4pm = (p.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60

In [32]:
p4pm

Period('2012-01-30 16:00', 'T')

In [33]:
p4pm.to_timestamp()


Timestamp('2012-01-30 16:00:00')
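As a cross-check that avoids period arithmetic altogether, the same 4 PM timestamp can be reached with `Timestamp` values and the `BDay` (business day) offset. This is a swapped-in technique, not the one used above; it also sidesteps the business-day ('B') Period frequency.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

q = pd.Period('2012Q4', freq='Q-JAN')

# Midnight on the last calendar day of the quarter (2012-01-31)
last_day = q.end_time.normalize()

# Roll back to a business day; unchanged if last_day already is one
last_bday = BDay().rollback(last_day)

# One business day earlier, then forward to 16:00
stamp = last_bday - BDay(1) + pd.Timedelta(hours=16)
print(stamp)
```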

Generating quarterly ranges works as you would expect using period_range. Arithmetic
is identical, too:

In [34]:
rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')

In [35]:
ts = Series(np.arange(len(rng)), index=rng)

In [36]:
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int64
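With a non-December fiscal year, the quarter labels and the calendar dates differ. A PeriodIndex exposes the fiscal year (`qyear`) and quarter number (`quarter`) alongside the actual start dates; a small sketch:

```python
import pandas as pd

rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')

# Fiscal year and quarter number, as written in the labels
fiscal_years = list(rng.qyear)
quarters = list(rng.quarter)

# Calendar date on which each fiscal quarter begins
starts = rng.start_time

for label, start in zip(rng, starts):
    print(label, 'starts', start.date())
```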

In [37]:
new_rng = (rng.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60

In [38]:
ts.index = new_rng.to_timestamp()

In [39]:
ts

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int64

## Converting Timestamps to Periods (and Back)

Series and DataFrame objects indexed by timestamps can be converted to periods using
the to_period method:

In [40]:
rng = pd.date_range('1/1/2000', periods=3, freq='M')

In [43]:
ts = Series(np.random.randn(3), index=rng)