Periods represent timespans such as days, months, quarters, or years. The Period class
represents this data type; its constructor takes a string or integer and a frequency from
Table 11-4:

In [1]:
import pandas as pd
import numpy as np


In [4]:
p = pd.Period(2007, freq='A-DEC')
p

Period('2007', 'A-DEC')

In this case, the Period object represents the full timespan from January 1, 2007, to
December 31, 2007, inclusive. Conveniently, adding an integer to a period or subtracting
one from it shifts the period by that many units of its frequency:

In [5]:
p + 5

Period('2012', 'A-DEC')

In [6]:
p - 2

Period('2005', 'A-DEC')
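
To make the "full timespan, inclusive" idea concrete, the following sketch inspects a period's boundaries through its start_time and end_time attributes. It uses a monthly period, and the variable names are chosen purely for illustration.

```python
import pandas as pd

# A monthly period covering all of March 2007 (illustrative choice).
m = pd.Period('2007-03', freq='M')

# start_time and end_time are Timestamps marking the period's boundaries.
print(m.start_time)
print(m.end_time)

# Any instant in March falls inside the span...
inside = m.start_time <= pd.Timestamp('2007-03-15 12:00') <= m.end_time

# ...and shifting by one unit yields the adjacent, non-overlapping month.
adjacent = m.end_time < (m + 1).start_time
print(inside, adjacent)
```

Both checks should come out true: the period owns every instant between its boundaries, and the next period starts strictly after it ends.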

In [7]:
# If two periods have the same frequency, their difference is the number of units between them, returned as an offset object:
pd.Period('2014', freq='A-DEC') - p

<7 * YearEnds: month=12>
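
In the pandas version that produced the output above, the difference between two periods is an offset object rather than a plain integer. A minimal sketch, assuming that behavior, reads the unit count from the offset's n attribute. The monthly periods and variable names are illustrative.

```python
import pandas as pd

start = pd.Period('2013-01', freq='M')
end = pd.Period('2014-03', freq='M')

# The difference is an offset that counts whole months between the periods.
delta = end - start

# The n attribute holds the count as an integer.
months_apart = delta.n
print(months_apart)

# Adding that integer back to the earlier period recovers the later one.
print(start + months_apart == end)
```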

In [8]:
# Regular ranges of periods can be constructed with the period_range function:
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')

In [9]:
# The PeriodIndex class stores a sequence of periods and can serve as an axis index in
# any pandas data structure:
pd.Series(np.random.randn(6), index=rng)

2000-01    1.187625
2000-02   -1.222827
2000-03    3.127637
2000-04    0.787580
2000-05   -0.192393
2000-06    0.379393
Freq: M, dtype: float64

In [10]:
# If you have an array of strings, you can also use the PeriodIndex class:
values = ['2001Q3', '2002Q2', '2003Q1']
index = pd.PeriodIndex(values, freq='Q-DEC')
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]')

### Period Frequency Conversion

Period and PeriodIndex objects can be converted to another frequency with their asfreq
method. For example, suppose we have an annual period and want to convert it into a
monthly period at either the start or the end of the year:

In [11]:
p = pd.Period('2007', freq='A-DEC')
p

Period('2007', 'A-DEC')

In [12]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [13]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

You can think of Period('2007', 'A-DEC') as being a sort of cursor pointing to a
span of time, subdivided by monthly periods. See Figure 11-1 for an illustration of
this. For a fiscal year ending on a month other than December, the corresponding
monthly subperiods are different:

In [14]:
p = pd.Period('2007', freq='A-JUN')
p

Period('2007', 'A-JUN')

In [15]:
p.asfreq('M', 'start')

Period('2006-07', 'M')

In [16]:
p.asfreq('M', 'end')

Period('2007-06', 'M')
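
The cursor picture can be made concrete by listing the subperiods explicitly. This is a small sketch that uses a quarterly period instead of an annual one so that the list stays short. The names q, first, last, and months are illustrative.

```python
import pandas as pd

q = pd.Period('2007Q2', freq='Q-DEC')

# The first and last monthly subperiods of the quarter.
first = q.asfreq('M', how='start')
last = q.asfreq('M', how='end')

# period_range fills in every monthly period between the two endpoints.
months = pd.period_range(first, last, freq='M')
print([str(m) for m in months])
```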

When you are converting from high to low frequency, pandas determines the superperiod
depending on where the subperiod “belongs.” For example, in A-JUN frequency, the month
Aug-2007 is actually part of the 2008 period:

In [17]:
p = pd.Period('Aug-2007', 'M')
p.asfreq('A-JUN')

Period('2008', 'A-JUN')
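
One way to check where a subperiod "belongs" is to convert up to the lower frequency and confirm that the resulting superperiod's monthly bounds contain the original month. The sketch below does this with Q-JAN, the quarterly frequency for a fiscal year ending in January. The helper name contains is hypothetical.

```python
import pandas as pd

def contains(superperiod, month):
    """Return True if the monthly period lies within the superperiod."""
    lo = superperiod.asfreq('M', how='start')
    hi = superperiod.asfreq('M', how='end')
    return lo <= month <= hi

month = pd.Period('2011-11', freq='M')

# Converting from monthly to quarterly picks the enclosing fiscal quarter.
quarter = month.asfreq('Q-JAN')
print(quarter)
print(contains(quarter, month))
```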

Whole PeriodIndex objects or time series can be similarly converted with the same
semantics:

In [18]:
rng = pd.period_range('2006', '2009', freq='A-DEC')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2006   -1.119508
2007   -0.575157
2008   -0.950641
2009   -0.701734
Freq: A-DEC, dtype: float64

In [19]:
ts.asfreq('M', how='start')

2006-01   -1.119508
2007-01   -0.575157
2008-01   -0.950641
2009-01   -0.701734
Freq: M, dtype: float64

Here, the annual periods are replaced with monthly periods corresponding to the first
month falling within each annual period. If we instead want the last business day of
each year, we can use the 'B' frequency and indicate that we want the end of the
period:

In [20]:
ts.asfreq('B', how='end')

2006-12-29   -1.119508
2007-12-31   -0.575157
2008-12-31   -0.950641
2009-12-31   -0.701734
Freq: B, dtype: float64
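
The same conversion applies to a bare PeriodIndex without a Series around it. A brief sketch with a quarterly index (variable names are illustrative):

```python
import pandas as pd

quarters = pd.period_range('2006Q1', periods=4, freq='Q-DEC')

# Map each quarter to its last monthly subperiod.
month_ends = quarters.asfreq('M', how='end')
print(month_ends)

# how='start' selects the first month of each quarter instead.
month_starts = quarters.asfreq('M', how='start')
print(month_starts)
```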

### Quarterly Period Frequencies

Quarterly data is standard in accounting, finance, and other fields. Much quarterly
data is reported relative to a fiscal year end, typically the last calendar or business day
of one of the 12 months of the year. Thus, the period 2012Q4 has a different meaning
depending on the fiscal year end. pandas supports all 12 possible quarterly frequencies
as Q-JAN through Q-DEC:

In [21]:
p = pd.Period('2012Q4', freq='Q-JAN')
p

Period('2012Q4', 'Q-JAN')

In the case of a fiscal year ending in January, 2012Q4 runs from November 2011 through
January 2012, which you can check by converting to daily frequency. See Figure 11-2 for
an illustration.

In [22]:
p.asfreq('D', 'start')

Period('2011-11-01', 'D')

In [23]:
p.asfreq('D', 'end')

Period('2012-01-31', 'D')
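
To see how the fiscal year end changes the meaning of the same label, the following sketch compares 2012Q4 under Q-DEC and Q-JAN. The helper name day_bounds is hypothetical.

```python
import pandas as pd

def day_bounds(period):
    """Return the first and last daily subperiods as strings."""
    first = period.asfreq('D', how='start')
    last = period.asfreq('D', how='end')
    return str(first), str(last)

calendar_q4 = pd.Period('2012Q4', freq='Q-DEC')
fiscal_q4 = pd.Period('2012Q4', freq='Q-JAN')

# Same label, different spans of calendar time.
print(day_bounds(calendar_q4))
print(day_bounds(fiscal_q4))
```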

This makes period arithmetic straightforward. For example, to get the timestamp at
4 PM on the second-to-last business day of the quarter, you could do:

In [25]:
p4pm = (p.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60
p4pm

Period('2012-01-30 16:00', 'T')

In [26]:
p4pm.to_timestamp()

Timestamp('2012-01-30 16:00:00')
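
The same result can also be reached with timestamp offsets instead of business-day periods. This is an alternative sketch, not the approach used above: it relies on pd.offsets.BDay to roll back to a business day, it ignores holidays, and the function name second_to_last_bday_4pm is hypothetical.

```python
import pandas as pd

def second_to_last_bday_4pm(period):
    """Timestamp at 4 PM on the second-to-last business day of a period."""
    bday = pd.offsets.BDay()
    # Midnight on the last calendar day of the period.
    last_day = period.asfreq('D', how='end').to_timestamp()
    # Roll back to the last business day (a no-op if it already is one).
    last_bday = bday.rollback(last_day)
    # Step back one more business day, then move to 16:00.
    return last_bday - bday + pd.Timedelta(hours=16)

print(second_to_last_bday_4pm(pd.Period('2012Q4', freq='Q-JAN')))
```

Because the function takes any Period, it works the same way for a quarter that ends on a weekend, where the rollback step does real work.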

In [27]:
# You can generate quarterly ranges using period_range. Arithmetic is identical, too:
rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int32

In [28]:
new_rng = (rng.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60
ts.index = new_rng.to_timestamp()
ts

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int32

### Converting Timestamps to Periods (and Back)

Series and DataFrame objects indexed by timestamps can be converted to periods
with the to_period method:

In [29]:
rng = pd.date_range('2000-01-01', periods=3, freq='M')
ts = pd.Series(np.random.randn(3), index=rng)
ts

2000-01-31    0.245105
2000-02-29   -0.663008
2000-03-31   -0.157637
Freq: M, dtype: float64

In [30]:
pts = ts.to_period()
pts

2000-01    0.245105
2000-02   -0.663008
2000-03   -0.157637
Freq: M, dtype: float64

Since periods refer to non-overlapping timespans, a timestamp can only belong to a
single period for a given frequency. While the frequency of the new PeriodIndex is
inferred from the timestamps by default, you can specify any frequency you want.
There is also no problem with having duplicate periods in the result:

In [31]:
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2

2000-01-29   -0.310790
2000-01-30    1.880413
2000-01-31   -0.052069
2000-02-01   -0.152303
2000-02-02   -1.335142
2000-02-03   -0.455967
Freq: D, dtype: float64

In [32]:
ts2.to_period('M')

2000-01   -0.310790
2000-01    1.880413
2000-01   -0.052069
2000-02   -0.152303
2000-02   -1.335142
2000-02   -0.455967
Freq: M, dtype: float64
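
Duplicate periods are often a stepping stone to aggregation. A minimal sketch, using integer values instead of random numbers so the result is predictable, groups on the index level to total each month. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

days = pd.date_range('2000-01-29', periods=6, freq='D')
daily = pd.Series(np.arange(6), index=days)

# Three January days and three February days collapse to two labels.
monthly_labels = daily.to_period('M')
print(monthly_labels.index.is_unique)

# Summing over the index level yields one row per month.
monthly_totals = monthly_labels.groupby(level=0).sum()
print(monthly_totals)
```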

To convert back to timestamps, use to_timestamp:

In [33]:
pts = ts2.to_period()
pts

2000-01-29   -0.310790
2000-01-30    1.880413
2000-01-31   -0.052069
2000-02-01   -0.152303
2000-02-02   -1.335142
2000-02-03   -0.455967
Freq: D, dtype: float64

In [34]:
pts.to_timestamp(how='end')

2000-01-29 23:59:59.999999999   -0.310790
2000-01-30 23:59:59.999999999    1.880413
2000-01-31 23:59:59.999999999   -0.052069
2000-02-01 23:59:59.999999999   -0.152303
2000-02-02 23:59:59.999999999   -1.335142
2000-02-03 23:59:59.999999999   -0.455967
Freq: D, dtype: float64
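
When how is omitted, to_timestamp uses the start of each period. A short sketch of the round trip for monthly data (names are illustrative):

```python
import pandas as pd

periods = pd.period_range('2000-01', periods=3, freq='M')
by_period = pd.Series([10, 20, 30], index=periods)

# With no arguments, each period maps to its first instant.
by_start = by_period.to_timestamp()
print(by_start.index)

# Converting back with the same frequency recovers the original index.
round_trip = by_start.to_period('M')
print(round_trip.index.equals(periods))
```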

### Creating a PeriodIndex from Arrays

Fixed-frequency datasets are sometimes stored with timespan information spread
across multiple columns. For example, in this macroeconomic dataset, the year and
quarter are in different columns:

In [36]:
data = pd.read_csv('macrodata.csv')
data.head(5)

     year  quarter   realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  tbilrate  unemp      pop  infl  realint
0  1959.0      1.0  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7      2.82    5.8  177.146   0.0      0.0
1  1959.0      2.0  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7      3.08    5.1   177.83  2.34     0.74
2  1959.0      3.0  2775.488    1751.8  289.226    491.26   1916.4  29.35  140.5      3.82    5.3  178.657  2.74     1.09
3  1959.0      4.0  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0      4.33    5.6  179.386  0.27     4.06
4  1960.0      1.0  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6       3.5    5.2  180.007  2.31     1.19


In [37]:
data.year

0      1959.0
1      1959.0
2      1959.0
3      1959.0
4      1960.0
        ...  
198    2008.0
199    2008.0
200    2009.0
201    2009.0
202    2009.0
Name: year, Length: 203, dtype: float64

In [38]:
data.quarter

0      1.0
1      2.0
2      3.0
3      4.0
4      1.0
      ... 
198    3.0
199    4.0
200    1.0
201    2.0
202    3.0
Name: quarter, Length: 203, dtype: float64

In [39]:
# By passing these arrays to PeriodIndex with a frequency, you can combine them to
# form an index for the DataFrame:
index = pd.PeriodIndex(year=data.year, quarter=data.quarter, freq='Q-DEC')
index

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='period[Q-DEC]', length=203)

In [40]:
data.index = index
data.infl

1959Q1    0.00
1959Q2    2.34
1959Q3    2.74
1959Q4    0.27
1960Q1    2.31
          ... 
2008Q3   -3.16
2008Q4   -8.79
2009Q1    0.94
2009Q2    3.37
2009Q3    3.56
Freq: Q-DEC, Name: infl, Length: 203, dtype: float64
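
Another route to the same index reuses the string parsing shown earlier: format each year and quarter pair as a label such as 1959Q1 and pass the labels to PeriodIndex. This may be useful if the year= and quarter= keywords are unavailable in your pandas version. It is a sketch with a few hand-typed rows standing in for the dataset's columns, and the names years, quarters, and labels are illustrative.

```python
import pandas as pd

# Stand-ins for data.year and data.quarter, which hold floats.
years = pd.Series([1959.0, 1959.0, 1959.0, 1959.0, 1960.0])
quarters = pd.Series([1.0, 2.0, 3.0, 4.0, 1.0])

# Cast to int so the labels read '1959Q1' rather than '1959.0Q1.0'.
labels = [f'{int(y)}Q{int(q)}' for y, q in zip(years, quarters)]

index = pd.PeriodIndex(labels, freq='Q-DEC')
print(index)
```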