#### Periods
- A period represents a time span, such as a day, month, quarter, or year.
- The `pandas.Period` class represents this data type; it requires a string or integer and a supported frequency.

In [2]:
import numpy as np 
import pandas as pd 

p = pd.Period("2011", freq="Y-DEC")
p
# in this case the period object represents the full time span from January 1, 2011, to December 31, 2011, inclusive.


Period('2011', 'Y-DEC')

In [3]:
# Conveniently, adding or subtracting an integer shifts the period by that many units of its frequency.
p + 5

Period('2016', 'Y-DEC')

In [4]:
p - 2

Period('2009', 'Y-DEC')

In [5]:
# if two periods have the same frequency, their difference is the number of units between them as a date offset
pd.Period("2024", freq="Y-DEC") - p

<13 * YearEnds: month=12>
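
The difference shown above is a date offset rather than a plain integer. As a minimal sketch (using monthly periods chosen only for illustration), the offset's `n` attribute gives the count of units as an integer:

```python
import pandas as pd

# Two periods with the same (monthly) frequency; the dates are illustrative.
start = pd.Period("2023-01", freq="M")
end = pd.Period("2024-03", freq="M")

# Subtracting periods of equal frequency yields a date offset.
diff = end - start

# The integer number of units between the periods is stored in .n
months_apart = diff.n
print(months_apart)

# Adding that integer back to the earlier period recovers the later one.
print(start + months_apart == end)
```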

In [6]:
# regular ranges of periods can be constructed with the period_range function
periods = pd.period_range("2000-01-01", "2000-06-30", freq="M")
periods

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')

In [7]:
# the PeriodIndex class stores a sequence of periods and can serve as an axis index in any pandas data structure

pd.Series(np.random.standard_normal(6), index=periods)

2000-01    0.417138
2000-02    0.938912
2000-03    1.081512
2000-04   -0.509033
2000-05   -1.036323
2000-06    0.849054
Freq: M, dtype: float64

In [8]:
# if you have an array of strings, you can also use the PeriodIndex class, all of whose values are periods

values = ["2001Q3", "2002Q2", "2003Q1"]
index = pd.PeriodIndex(values, freq="Q-DEC")

index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]')
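
Once the strings are parsed into a PeriodIndex, the components of each period can be read back as integers. The short sketch below reuses the same three quarter labels and relies on the `year` and `quarter` attributes of `PeriodIndex`:

```python
import pandas as pd

values = ["2001Q3", "2002Q2", "2003Q1"]
index = pd.PeriodIndex(values, freq="Q-DEC")

# Each attribute returns one integer per period in the index.
years = index.year.tolist()
quarters = index.quarter.tolist()

print(years)
print(quarters)
```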

## [ Period Frequency Conversion ]
Periods and PeriodIndex objects can be converted to another frequency with their asfreq method.

In [10]:
# As an example, suppose we had an annual period and wanted to convert it into a monthly period at either the start or the end of the year.
# This can be done like so:

p = pd.Period("2011", freq="Y-DEC")
p

Period('2011', 'Y-DEC')

In [11]:
p.asfreq("M", how="start")

Period('2011-01', 'M')

In [13]:
p.asfreq("M", how="end")

Period('2011-12', 'M')

In [14]:
p.asfreq("M")

Period('2011-12', 'M')

- You can think of Period("2011", "Y-DEC") as being a sort of cursor pointing to a span of time, subdivided by monthly periods.
- For a fiscal year ending on a month other than December, the corresponding monthly subperiods are different:

In [16]:
p = pd.Period("2011", freq="Y-JUN")
p

Period('2011', 'Y-JUN')

In [17]:
p.asfreq("M", how="start")

Period('2010-07', 'M')

In [18]:
p.asfreq("M", how="end")

Period('2011-06', 'M')

- When we are converting from high to low frequency, pandas determines the superperiod, depending on where the subperiod "belongs".
- For example, in Y-JUN frequency, the month Aug-2011 is actually part of the 2012 period

In [19]:
p = pd.Period("Aug-2011", "M")

p.asfreq("Y-JUN")

Period('2012', 'Y-JUN')
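
The same "belongs to" rule applies to other anchored frequencies. As an illustrative sketch (the quarterly aliases `Q-DEC` and `Q-JUN` are used here instead of the annual ones above), the month August 2011 lands in a different quarter label depending on where the fiscal year ends:

```python
import pandas as pd

aug = pd.Period("2011-08", freq="M")

# Calendar quarters: the year ends in December.
calendar_q = aug.asfreq("Q-DEC")

# Fiscal quarters with a year ending in June: August 2011 falls in the
# first quarter of the fiscal year that ends in June 2012.
fiscal_q = aug.asfreq("Q-JUN")

print(calendar_q, fiscal_q)

# The month lies entirely inside the quarter it was assigned to.
inside = fiscal_q.start_time <= aug.start_time and aug.end_time <= fiscal_q.end_time
print(inside)
```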

In [20]:
# whole PeriodIndex objects or time series can be similarly converted with the same semantics
periods = pd.period_range("2006", "2009", freq="Y-DEC")

ts = pd.Series(np.random.standard_normal(len(periods)), index=periods)
ts

2006    0.161109
2007   -1.782804
2008    0.803557
2009   -0.261207
Freq: Y-DEC, dtype: float64

In [21]:
ts.asfreq("M", how="start")

# here the annual periods are replaced with monthly periods corresponding to the first month falling within each annual period.

2006-01    0.161109
2007-01   -1.782804
2008-01    0.803557
2009-01   -0.261207
Freq: M, dtype: float64

In [22]:
# If we instead wanted the last business day of each year, we can use "B" frequency and indicate that we want the end of the period

ts.asfreq("B", how="end")

2006-12-29    0.161109
2007-12-31   -1.782804
2008-12-31    0.803557
2009-12-31   -0.261207
Freq: B, dtype: float64

## [ Quarterly Period Frequencies ]
- quarterly data is standard in accounting, finance, and other fields.
- much quarterly data is reported relative to a fiscal year end, typically the last calendar or business day of one of the 12 months of the year
- thus, the period 2012Q4 has a different meaning depending on fiscal year end.
- pandas supports all 12 possible quarterly frequencies as Q-JAN through Q-DEC
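
To make the point about fiscal year ends concrete, the sketch below compares the label `2012Q4` under three of the twelve quarterly frequencies. The helper `quarter_bounds` is a hypothetical name introduced only for this illustration:

```python
import pandas as pd

def quarter_bounds(label, freq):
    """Return the first and last day of a quarterly period as strings."""
    p = pd.Period(label, freq=freq)
    first_day = p.asfreq("D", how="start")
    last_day = p.asfreq("D", how="end")
    return str(first_day), str(last_day)

# The same label covers different calendar dates for each fiscal year end.
for freq in ["Q-DEC", "Q-JAN", "Q-MAR"]:
    print(freq, quarter_bounds("2012Q4", freq))
```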

In [23]:
p = pd.Period("2012Q4", freq="Q-JAN")
p

Period('2012Q4', 'Q-JAN')

In [24]:
# in the case of fiscal year ending in January, 2012Q4 runs from November 2011 through January 2012, which you can check by converting to daily frequency

p.asfreq("D", how="start")

Period('2011-11-01', 'D')

In [25]:
p.asfreq("D", how="end")

Period('2012-01-31', 'D')
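
Another way to check which months a quarter covers is to list them explicitly. This is a small sketch that combines `asfreq` with `pd.period_range`, assuming the same `Q-JAN` period as above:

```python
import pandas as pd

p = pd.Period("2012Q4", freq="Q-JAN")

# Convert both ends of the quarter to monthly periods, then fill the range.
first_month = p.asfreq("M", how="start")
last_month = p.asfreq("M", how="end")
months = pd.period_range(first_month, last_month, freq="M")

month_labels = [str(m) for m in months]
print(month_labels)
```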

In [28]:
# it's possible to do convenient period arithmetic; for example, to get the timestamp at 4PM on the second-to-last business day of the quarter, you could do
p4pm = (p.asfreq("B", how="end") - 1).asfreq("min", how="start") + 16 * 60
p4pm

Period('2012-01-30 16:00', 'min')

In [29]:
p4pm.to_timestamp()

Timestamp('2012-01-30 16:00:00')

In [31]:
# the to_timestamp method returns the Timestamp at the start of the period by default
# we can generate quarterly ranges using pandas.period_range
# the arithmetic is identical too

periods = pd.period_range("2011Q3", "2012Q4", freq="Q-JAN")

ts = pd.Series(np.arange(len(periods)), index=periods)
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int64

In [33]:
new_periods = (periods.asfreq("B", "end") - 1).asfreq("h", "start") + 16

ts.index = new_periods.to_timestamp()
ts


2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int64

## [ Converting Timestamps to Periods (and Back) ]

In [35]:
# series and dataframe objects indexed by timestamps can be converted to periods with the to_period method
dates = pd.date_range("2000-01-01", periods=3, freq="ME")

ts = pd.Series(np.random.standard_normal(3), index=dates)
ts

2000-01-31   -0.479666
2000-02-29   -2.621426
2000-03-31    0.398265
Freq: ME, dtype: float64

In [36]:
pts = ts.to_period()
pts

2000-01   -0.479666
2000-02   -2.621426
2000-03    0.398265
Freq: M, dtype: float64


- **Periods** represent blocks of time — like a day, month, year, etc.
-  A **timestamp** (like `2023-04-14 10:00`) can only belong to **one period** — e.g., if you're using a **monthly frequency**, this timestamp belongs to **April 2023**, not March or May.
- This is because **periods don't overlap** — each moment belongs to just one.
- When you convert a series of timestamps with `to_period`, pandas infers the frequency of the new **PeriodIndex** from the timestamps by default.
- But you can **pass a frequency explicitly** if needed, for example `to_period("M")`.
- You can have **multiple timestamps** that map to the **same period**.
- This is **okay and expected**, for example, many timestamps in April will map to the period `2023-04`.


In [37]:
dates = pd.date_range("2000-01-29", periods=6)
ts2 = pd.Series(np.random.standard_normal(6), index=dates)
ts2

2000-01-29    0.349628
2000-01-30    0.864631
2000-01-31   -0.022071
2000-02-01    0.829810
2000-02-02    0.100221
2000-02-03    0.976807
Freq: D, dtype: float64

In [38]:
ts2.to_period("M")

2000-01    0.349628
2000-01    0.864631
2000-01   -0.022071
2000-02    0.829810
2000-02    0.100221
2000-02    0.976807
Freq: M, dtype: float64
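
Because several timestamps map to the same monthly period, the converted index contains duplicate labels. One possible way to collapse them, sketched here with fixed integer values instead of random numbers so the result is reproducible, is to group on the index and aggregate:

```python
import numpy as np
import pandas as pd

# Six consecutive days spanning the end of January and start of February.
dates = pd.date_range("2000-01-29", periods=6)
daily = pd.Series(np.arange(1, 7), index=dates)

# After conversion, three rows share 2000-01 and three share 2000-02.
monthly = daily.to_period("M")
print(monthly.index.is_unique)

# Group on the (duplicated) period labels and sum within each month.
totals = monthly.groupby(level=0).sum()
print(totals)
```

Summing is only one choice of aggregation; a mean or a last value may suit other data better.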

In [40]:
# to convert back to timestamps, use the to_timestamp method, which returns a DatetimeIndex
pts = ts2.to_period()
pts

2000-01-29    0.349628
2000-01-30    0.864631
2000-01-31   -0.022071
2000-02-01    0.829810
2000-02-02    0.100221
2000-02-03    0.976807
Freq: D, dtype: float64

In [41]:
pts.to_timestamp(how="end")

2000-01-29 23:59:59.999999999    0.349628
2000-01-30 23:59:59.999999999    0.864631
2000-01-31 23:59:59.999999999   -0.022071
2000-02-01 23:59:59.999999999    0.829810
2000-02-02 23:59:59.999999999    0.100221
2000-02-03 23:59:59.999999999    0.976807
Freq: D, dtype: float64
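
The `how` argument decides which edge of the period becomes the timestamp. A minimal sketch on a single monthly period, compared against the period's `start_time` and `end_time` attributes:

```python
import pandas as pd

p = pd.Period("2000-01", freq="M")

# Default: the first instant of the period.
start_ts = p.to_timestamp()

# how="end": the last instant of the period.
end_ts = p.to_timestamp(how="end")

print(start_ts, end_ts)

# Both values match the period's own boundary attributes.
print(start_ts == p.start_time, end_ts == p.end_time)
```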

## [ Creating a PeriodIndex from Arrays ]

In [42]:
# fixed frequency datasets are sometimes stored with time span information spread across multiple columns
# for example, in this macroeconomic dataset, the year and quarter are in different columns

data = pd.read_csv("examples/macrodata.csv")
data.head(10)

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,1959.0,1.0,2710.349,1707.4,286.898,470.045,1886.9,28.98,139.7,2.82,5.8,177.146,0.0,0.0
1,1959.0,2.0,2778.801,1733.7,310.859,481.301,1919.7,29.15,141.7,3.08,5.1,177.83,2.34,0.74
2,1959.0,3.0,2775.488,1751.8,289.226,491.26,1916.4,29.35,140.5,3.82,5.3,178.657,2.74,1.09
3,1959.0,4.0,2785.204,1753.7,299.356,484.052,1931.3,29.37,140.0,4.33,5.6,179.386,0.27,4.06
4,1960.0,1.0,2847.699,1770.5,331.722,462.199,1955.5,29.54,139.6,3.5,5.2,180.007,2.31,1.19
5,1960.0,2.0,2834.39,1792.9,298.152,460.4,1966.1,29.55,140.2,2.68,5.2,180.671,0.14,2.55
6,1960.0,3.0,2839.022,1785.8,296.375,474.676,1967.8,29.75,140.9,2.36,5.6,181.528,2.7,-0.34
7,1960.0,4.0,2802.616,1788.2,259.764,476.434,1966.6,29.84,141.1,2.29,6.3,182.287,1.21,1.08
8,1961.0,1.0,2819.264,1787.7,266.405,475.854,1984.5,29.81,142.1,2.37,6.8,182.992,-0.4,2.77
9,1961.0,2.0,2872.005,1814.3,286.246,480.328,2014.4,29.92,142.9,2.29,7.0,183.691,1.47,0.81


In [43]:
data["year"]

0      1959.0
1      1959.0
2      1959.0
3      1959.0
4      1960.0
        ...  
198    2008.0
199    2008.0
200    2009.0
201    2009.0
202    2009.0
Name: year, Length: 203, dtype: float64

In [45]:
data["quarter"]

0      1.0
1      2.0
2      3.0
3      4.0
4      1.0
      ... 
198    3.0
199    4.0
200    1.0
201    2.0
202    3.0
Name: quarter, Length: 203, dtype: float64

In [46]:
# by passing these arrays to PeriodIndex.from_fields with a frequency, we can combine them to form an index for the DataFrame
index = pd.PeriodIndex.from_fields(year=data["year"], quarter=data["quarter"], freq="Q-DEC")
index

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='period[Q-DEC]', length=203)

In [48]:
data.index = index
data["infl"]

1959Q1    0.00
1959Q2    2.34
1959Q3    2.74
1959Q4    0.27
1960Q1    2.31
          ... 
2008Q3   -3.16
2008Q4   -8.79
2009Q1    0.94
2009Q2    3.37
2009Q3    3.56
Freq: Q-DEC, Name: infl, Length: 203, dtype: float64
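
An alternative sketch builds the same kind of index by formatting the year and quarter columns into strings such as `1959Q1` and parsing them. The small DataFrame below is made up for illustration and only mimics the layout of the macroeconomic dataset:

```python
import pandas as pd

# Hypothetical stand-in for the year/quarter columns, stored as floats.
df = pd.DataFrame({
    "year": [1959.0, 1959.0, 1960.0],
    "quarter": [1.0, 4.0, 2.0],
    "infl": [0.00, 0.27, 0.14],
})

# Build labels like "1959Q1" from the integer parts of the two columns.
labels = df["year"].astype(int).astype(str) + "Q" + df["quarter"].astype(int).astype(str)

# Parse the labels into quarterly periods and use them as the index.
df.index = pd.PeriodIndex(labels.tolist(), freq="Q-DEC")
index_labels = [str(p) for p in df.index]
print(index_labels)
```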