In [1]:
import pandas as pd
import numpy as np

# Periods and Period Arithmetic

Hmm... I'm not sure I get the difference between Periods and Frequencies.

In [2]:
p = pd.Period(2007, freq='A-DEC')

In [3]:
p

Period('2007', 'A-DEC')

This represents the full timespan from January 1, 2007 to December 31, 2007 inclusive.

In [4]:
p + 5

Period('2012', 'A-DEC')

In [5]:
p - 2

Period('2005', 'A-DEC')

Subtracting one Period from another with the same frequency gives the number of periods between them:

In [6]:
pd.Period('2014', freq='A-DEC') - p

7

Ranges of periods are created with `pd.period_range`:

In [7]:
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
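`period_range` can also take a start plus a number of periods in place of an end date. This sketch checks that both forms build the same six months:

```python
import pandas as pd

# Start and end dates, as above
by_end = pd.period_range('2000-01-01', '2000-06-30', freq='M')
# Start plus a count of periods instead of an end date
by_count = pd.period_range('2000-01', periods=6, freq='M')

print(by_end.equals(by_count))
```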

`period_range` returns a `PeriodIndex`, which can be used as the index of a Series:

In [10]:
ts = pd.Series(
    np.random.randn(6),
    index=rng
)
ts

2000-01   -0.069179
2000-02   -0.129115
2000-03    0.805275
2000-04   -1.018247
2000-05   -1.503801
2000-06   -1.338446
Freq: M, dtype: float64

In [11]:
ts.index.dtype

period[M]
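Because the index holds Periods, a Series like this can be indexed with period strings or `Period` objects. In this small sketch the values come from `np.arange` instead of random numbers, so the lookups are predictable:

```python
import numpy as np
import pandas as pd

rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
ts = pd.Series(np.arange(6), index=rng)

print(ts['2000-03'])                       # lookup with a period string
print(ts[pd.Period('2000-05', freq='M')])  # lookup with a Period object
print(ts['2000-02':'2000-04'])             # label slicing includes both ends
```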

You can also build a `PeriodIndex` directly from an array of strings:

In [12]:
values = ['2001Q3', '2002Q2', '2003Q1']
index = pd.PeriodIndex(values, freq='Q-DEC')
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')

In [13]:
index[0]

Period('2001Q3', 'Q-DEC')
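Each element of the index is a Period, so calendar fields and conversions apply to the whole index at once. A short sketch:

```python
import pandas as pd

index = pd.PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], freq='Q-DEC')

# Calendar fields of every Period in the index
print(index.year.tolist())     # [2001, 2002, 2003]
print(index.quarter.tolist())  # [3, 2, 1]
# First day of each quarter
print(index.asfreq('D', how='start'))
```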

## Period Frequency Conversion

I'm still not sure why we need this when we already have `pd.date_range` and `resample`.

In [14]:
p = pd.Period('2007', freq='A-DEC')

In [15]:
p

Period('2007', 'A-DEC')

In [16]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [17]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

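And from a subperiod to a superperiod (monthly to annual). With `A-JUN` the fiscal year ends in June, so August 2007 falls in the fiscal year that runs from July 2007 to June 2008, which is labeled `2008`: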

In [18]:
p = pd.Period('Aug-2007', 'M')
p.asfreq('A-JUN')

Period('2008', 'A-JUN')

I guess this is mostly useful for fiscal years. Maybe I need it, maybe I don't: our database already stores fiscal year information. But if I ever need to do calculations by fiscal year and only have the dates, this could come in handy.

In [22]:
rng = pd.period_range('2006', '2009', freq='A-JUN')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2006   -1.120518
2007   -0.667068
2008    0.039423
2009   -0.359992
Freq: A-JUN, dtype: float64

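For annual frequencies, the suffix names the month in which the year ends. So with `A-JUN`, each year runs from July through June, and `how='start'` and `how='end'` return its first and last months: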

In [23]:
ts.asfreq('M', how='start')

2005-07   -1.120518
2006-07   -0.667068
2007-07    0.039423
2008-07   -0.359992
Freq: M, dtype: float64

In [24]:
ts.asfreq('M', how='end')

2006-06   -1.120518
2007-06   -0.667068
2008-06    0.039423
2009-06   -0.359992
Freq: M, dtype: float64

## Quarterly period frequencies

In [25]:
p = pd.Period('2012Q4', freq='Q-JAN')
p

Period('2012Q4', 'Q-JAN')

In [26]:
p.asfreq('D', 'start')

Period('2011-11-01', 'D')

In [27]:
p.asfreq('D', 'end')

Period('2012-01-31', 'D')
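With `Q-JAN` the fiscal year ends in January, so fiscal 2012 runs from February 2011 through January 2012. That is why `2012Q4` starts on 2011-11-01. This sketch lists all four quarters of that fiscal year:

```python
import pandas as pd

# Fiscal 2012 under Q-JAN runs from February 2011 through January 2012
quarters = pd.period_range('2012Q1', '2012Q4', freq='Q-JAN')
for q in quarters:
    print(q, q.asfreq('D', 'start'), q.asfreq('D', 'end'))
```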

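Some period arithmetic: 4 PM on the second-to-last business day of the quarter. The chain converts the quarter to business days and takes the last one (`'e'`), steps back one business day, converts that day to minutes and takes its first minute (`'s'`), then adds `16 * 60` minutes: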

In [32]:
p4pm = (p.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60
p4pm

Period('2012-01-30 16:00', 'T')

In [33]:
p4pm.to_timestamp()

Timestamp('2012-01-30 16:00:00')
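The same result can also be computed with Timestamps and the `BDay` offset, without going through business-day periods. This is an alternative technique, not what the cell above does. `four_pm_second_to_last_bday` is a hypothetical helper name, and business days here mean Monday to Friday with no holiday calendar.

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def four_pm_second_to_last_bday(quarter):
    """Hypothetical helper: 16:00 on the second-to-last business day of a quarter.

    Business days are plain Monday-Friday (no holiday calendar).
    """
    last_day = quarter.end_time.normalize()   # last calendar day, at midnight
    last_bday = BDay().rollback(last_day)      # last weekday on or before it
    return last_bday - BDay(1) + pd.Timedelta(hours=16)


p = pd.Period('2012Q4', freq='Q-JAN')
print(four_pm_second_to_last_bday(p))
```

`rollback` leaves a date unchanged when it is already a business day. Otherwise it moves back to the previous weekday, which matches taking the last business-day period with `'e'`.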

Quarterly ranges work too, and the same arithmetic applies element-wise to a whole `PeriodIndex`:

In [34]:
rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int64

In [35]:
new_rng = (rng.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60

In [36]:
ts.index = new_rng.to_timestamp()

In [37]:
ts

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int64
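Simpler operations on a `PeriodIndex` are also element-wise. In this short sketch, adding an integer shifts every quarter, and `asfreq` gives the last calendar day of each fiscal quarter:

```python
import pandas as pd

rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')

# Shift every period forward by one quarter
print(rng + 1)
# Last calendar day of each fiscal quarter
print(rng.asfreq('D', 'end'))
```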

## Converting Timestamps to Periods and back

In [38]:
#...