# Time Series Operations - Intro

In this chapter, we explore part of the extensive time series capabilities of the pandas package.

Also see the following two links for further study:

* [https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)
* [https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html)


# Preparations

In [1]:
import pandas as pd

pd.set_option("display.max_columns", 500)

# Demo using our financial dataset

The financial dataset from our previous examples also contains date-related information: the financial year (*u_year*) and the respective fiscal year end (*u_fye*). Here, we are going to focus on *u_year*.

In [2]:
df = pd.read_csv("../../data/raw/financial_data_intro.csv")
df.head()

Unnamed: 0,u_company_name_id,u_year,u_company_name,cb_naics,u_iso3,u_fye,cb_cusip,cb_at,cb_ni,cb_financial_industry
0,14651,2005,British American Tobacco PLC,312230,GBR,2005-12-31,110448107,32737.984,2707.11,False
1,14651,2006,British American Tobacco PLC,312230,GBR,2006-12-31,110448107,34816.074,3713.506,False
2,14651,2007,British American Tobacco PLC,312230,GBR,2007-12-31,110448107,37161.97,4226.559,False
3,14651,2008,British American Tobacco PLC,312230,GBR,2008-12-31,110448107,40276.807,3591.888,False
4,14651,2009,British American Tobacco PLC,312230,GBR,2009-12-31,110448107,43026.854,4386.107,False


# Calculating lagged and forward values with `shift()`

With the `shift()` method, values from previous or future periods can be accessed. A typical use case is the calculation of 'lagged' or 'forward' values.

`shift()` looks a given number of rows above or below the current row, following the row order of the DataFrame. It can be combined with `groupby()`. In our case this is essential: without it, values from one firm would spill over into the rows of another firm in our **cross-sectional time series** dataset!

**Make sure that the data is sorted by the relevant date/time column**.
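To illustrate why the grouping matters, here is a minimal sketch on a hypothetical toy panel (the names `toy`, `firm`, `year`, and `assets` are made up for this example and are not part of the financial dataset). It compares a plain `shift()` with a grouped one.

```python
import pandas as pd

# Hypothetical toy panel with two firms, sorted by firm and year
toy = pd.DataFrame(
    {
        "firm": ["A", "A", "A", "B", "B"],
        "year": [2015, 2016, 2017, 2015, 2016],
        "assets": [10.0, 12.0, 15.0, 100.0, 90.0],
    }
).sort_values(["firm", "year"])

# Plain shift: the first row of firm B receives the last value of firm A
toy["lag_naive"] = toy["assets"].shift(1)

# Grouped shift: the lag restarts within each firm
toy["lag_by_firm"] = toy.groupby("firm")["assets"].shift(1)

print(toy)
```

In the naive column, the first observation of firm B carries over a value from firm A, while the grouped version leaves it missing.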

Also consider date-time indexing, if you plan to work more intensely with time series data:
[https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing)
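As a rough sketch of what date-time indexing can look like, the example below converts date strings (in the style of the *u_fye* column) to timestamps and uses them as the index. The frame `fye` and its column names are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative frame with fiscal year ends stored as strings
fye = pd.DataFrame(
    {
        "fye": ["2015-06-30", "2016-12-31", "2017-12-31"],
        "assets": [300.716, 234.515, 281.147],
    }
)

# Convert the strings to timestamps and use them as a sorted index
fye["fye"] = pd.to_datetime(fye["fye"])
fye = fye.set_index("fye").sort_index()

# Partial string indexing: select all rows from 2016 onwards
recent = fye.loc["2016":]
print(recent)
```

With a `DatetimeIndex` in place, rows can be selected by year or month without spelling out full dates.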

In [3]:
# sort by firm, then by year
df = df.sort_values(["u_company_name_id", "u_year"])
df.head(5)

Unnamed: 0,u_company_name_id,u_year,u_company_name,cb_naics,u_iso3,u_fye,cb_cusip,cb_at,cb_ni,cb_financial_industry
199,2172,2015,Adaptimmune Therapeutics plc,325414,GBR,2015-06-30,00653A107,300.716,-21.592,False
200,2172,2016,Adaptimmune Therapeutics plc,325414,GBR,2016-12-31,00653A107,234.515,-71.579,False
201,2172,2017,Adaptimmune Therapeutics plc,325414,GBR,2017-12-31,00653A107,281.147,-70.138,False
202,2172,2018,Adaptimmune Therapeutics plc,325414,GBR,2018-12-31,00653A107,276.736,-95.514,False
183,2440,2015,Advanced Accelerator Applications SA,325414,FRA,2015-12-31,00790T100,297.648,-18.461,False


In [4]:
# storing the lagged total assets (cb_at) WITHIN the firm
df["cb_at_lag1"] = df.groupby("u_company_name_id")["cb_at"].shift(1)
df[["u_company_name", "u_year", "cb_at", "cb_at_lag1"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,cb_at_lag1
199,Adaptimmune Therapeutics plc,2015,300.716,
200,Adaptimmune Therapeutics plc,2016,234.515,300.716
201,Adaptimmune Therapeutics plc,2017,281.147,234.515
202,Adaptimmune Therapeutics plc,2018,276.736,281.147
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,297.648
488,Air France - KLM,2005,32142.858,
489,Air France - KLM,2006,35668.458,32142.858
490,Air France - KLM,2007,48505.545,35668.458
491,Air France - KLM,2008,38155.875,48505.545


In [5]:
# calculating the total assets 2 (!) years in the future:
df["cb_at_forward2"] = df.groupby("u_company_name_id")["cb_at"].shift(-2)
df[["u_company_name", "u_year", "cb_at", "cb_at_forward2"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,cb_at_forward2
199,Adaptimmune Therapeutics plc,2015,300.716,281.147
200,Adaptimmune Therapeutics plc,2016,234.515,276.736
201,Adaptimmune Therapeutics plc,2017,281.147,
202,Adaptimmune Therapeutics plc,2018,276.736,
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,
488,Air France - KLM,2005,32142.858,48505.545
489,Air France - KLM,2006,35668.458,38155.875
490,Air France - KLM,2007,48505.545,37568.465
491,Air France - KLM,2008,38155.875,35438.344


In [6]:
# storing the total assets (cb_at) lagged by 2 years WITHIN the firm
df["cb_at_lag2"] = df.groupby("u_company_name_id")["cb_at"].shift(2)
df.head(3)

Unnamed: 0,u_company_name_id,u_year,u_company_name,cb_naics,u_iso3,u_fye,cb_cusip,cb_at,cb_ni,cb_financial_industry,cb_at_lag1,cb_at_forward2,cb_at_lag2
199,2172,2015,Adaptimmune Therapeutics plc,325414,GBR,2015-06-30,00653A107,300.716,-21.592,False,,281.147,
200,2172,2016,Adaptimmune Therapeutics plc,325414,GBR,2016-12-31,00653A107,234.515,-71.579,False,300.716,276.736,
201,2172,2017,Adaptimmune Therapeutics plc,325414,GBR,2017-12-31,00653A107,281.147,-70.138,False,234.515,,300.716


In [7]:
# change of cb_at relative to the previous year, in percent (based on the grouped lag from above)
df["cb_at_diff1_percent"] = (df["cb_at"] - df["cb_at_lag1"]) / df["cb_at_lag1"] * 100

# Calculate changes (differences)

The `diff()` method can be used to calculate changes over time. Again, make sure to use `groupby()` if necessary!
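The relation between `diff()` and `shift()` can be sketched on a hypothetical toy panel (`toy`, `firm`, and `assets` are illustrative names, not part of the dataset): a grouped first difference should equal the current value minus the grouped lag.

```python
import pandas as pd

# Hypothetical toy panel with two firms, already sorted by firm and period
toy = pd.DataFrame(
    {
        "firm": ["A", "A", "A", "B", "B"],
        "assets": [10.0, 12.0, 15.0, 100.0, 90.0],
    }
)

grouped = toy.groupby("firm")["assets"]

# First difference within each firm
first_diff = grouped.diff(1)

# The same quantity built by hand from the grouped lag
manual = toy["assets"] - grouped.shift(1)

print(pd.DataFrame({"diff": first_diff, "manual": manual}))
```

Note that `diff(2)` would compare each value with the one two rows earlier; it does not compute a second-order difference.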

In [8]:
# First difference of cb_at
df["cb_at_diff1"] = df.groupby("u_company_name_id")["cb_at"].diff(1)
df[["u_company_name", "u_year", "cb_at", "cb_at_diff1"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,cb_at_diff1
199,Adaptimmune Therapeutics plc,2015,300.716,
200,Adaptimmune Therapeutics plc,2016,234.515,-66.201
201,Adaptimmune Therapeutics plc,2017,281.147,46.632
202,Adaptimmune Therapeutics plc,2018,276.736,-4.411
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,144.21
488,Air France - KLM,2005,32142.858,
489,Air France - KLM,2006,35668.458,3525.6
490,Air France - KLM,2007,48505.545,12837.087
491,Air France - KLM,2008,38155.875,-10349.67


# Calculate percentage changes

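Combining `shift()` and `diff()`, we can calculate relative changes. Note that the result is expressed as a fraction (for example, -0.22 corresponds to -22%); multiply by 100 to obtain percentages.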
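As a cross-check, pandas also offers a `pct_change()` method on grouped objects. The sketch below uses a hypothetical toy panel (`toy`, `firm`, and `assets` are illustrative names) and compares it with the manual ratio of `diff()` and `shift()`. Both return fractions rather than percentages, and they may differ by floating-point rounding, hence the tolerance-based comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel without missing values, sorted by firm and period
toy = pd.DataFrame(
    {
        "firm": ["A", "A", "A", "B", "B"],
        "assets": [10.0, 12.0, 15.0, 100.0, 90.0],
    }
)

grouped = toy.groupby("firm")["assets"]

# Manual relative change: first difference divided by the lagged value
manual = grouped.diff(1) / grouped.shift(1)

# Built-in relative change within each firm
builtin = grouped.pct_change()

# Compare with a tolerance, treating NaN positions as equal
print(np.allclose(manual, builtin, equal_nan=True))
```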

In [9]:
df["cb_at_diff1_percent"] = df.groupby("u_company_name_id")["cb_at"].diff(1) / df.groupby(
    "u_company_name_id"
)["cb_at"].shift(1)
df[["u_company_name", "u_year", "cb_at", "cb_at_diff1_percent"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,cb_at_diff1_percent
199,Adaptimmune Therapeutics plc,2015,300.716,
200,Adaptimmune Therapeutics plc,2016,234.515,-0.220145
201,Adaptimmune Therapeutics plc,2017,281.147,0.198844
202,Adaptimmune Therapeutics plc,2018,276.736,-0.015689
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,0.484498
488,Air France - KLM,2005,32142.858,
489,Air France - KLM,2006,35668.458,0.109685
490,Air France - KLM,2007,48505.545,0.3599
491,Air France - KLM,2008,38155.875,-0.213371


# Window functions

Window functions can aggregate data over a certain number of rows (typically over time). Common examples are moving averages or running totals.

## Rolling window functions

Rolling windows always have a fixed size. The window *slides* through the data and collects a certain number of rows at a time. This can be used for moving averages, for example. The `rolling()` method returns an object of the `Rolling` class, which offers a set of useful functions, including `count()`, `sum()`, `mean()`, etc.

See [https://pandas.pydata.org/pandas-docs/stable/reference/window.html#rolling-window-functions](https://pandas.pydata.org/pandas-docs/stable/reference/window.html#rolling-window-functions)
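A minimal sketch of how the window size affects the result, using a small made-up Series (`values` is not part of the dataset). By default, a window yields a result only once it is complete; the optional `min_periods` parameter relaxes this.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3 rows: the first two results are NaN because the window is incomplete
full_window = values.rolling(3).mean()

# min_periods=1 accepts shorter windows at the start of the series
partial_window = values.rolling(3, min_periods=1).mean()

print(
    pd.DataFrame(
        {"value": values, "rolling_3": full_window, "rolling_3_min1": partial_window}
    )
)
```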

In [10]:
# calculating the moving average of cb_at over 3 periods (two years before and current year)
df.groupby("u_company_name_id")["cb_at"].rolling(3).mean()

u_company_name_id     
2172               199           NaN
                   200           NaN
                   201    272.126000
                   202    264.132667
2440               183           NaN
                             ...    
109031             159           NaN
                   160     84.692667
                   161     79.948333
                   162     74.099000
                   163     75.185000
Name: cb_at, Length: 824, dtype: float64

In [11]:
df.groupby("u_company_name_id")["cb_at"].rolling(3).mean().to_numpy()

array([           nan,            nan, 2.72126000e+02, 2.64132667e+02,
                  nan,            nan,            nan,            nan,
       3.87722870e+04, 4.07766260e+04, 4.14099617e+04, 3.70542280e+04,
       3.64113417e+04, 3.55653040e+04, 3.31227303e+04, 2.94934840e+04,
       2.58826487e+04, 2.62935403e+04, 2.89361443e+04,            nan,
                  nan, 3.54068000e+02, 4.06460333e+02, 5.76962333e+02,
       7.96681000e+02, 9.86724333e+02, 9.46442667e+02, 8.40913333e+02,
       7.20015667e+02, 6.48340667e+02, 5.43104000e+02, 5.10269667e+02,
       5.41465667e+02, 5.98609000e+02,            nan,            nan,
       4.82726667e+01, 2.76513333e+01,            nan,            nan,
       1.37374297e+06, 1.42320149e+06, 1.23890514e+06, 9.98786350e+05,
       8.32824814e+05, 8.59116133e+05, 9.09508688e+05, 9.57142429e+05,
       9.59122051e+05, 9.43181408e+05, 9.79334745e+05, 1.01458680e+06,
                  nan,            nan, 4.12775550e+04, 3.93767200e+04,
      

In [12]:
df["rol_mean_cb_at"] = df.groupby("u_company_name_id")["cb_at"].rolling(3).mean().to_numpy()
df[["u_company_name", "u_year", "cb_at", "rol_mean_cb_at"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,rol_mean_cb_at
199,Adaptimmune Therapeutics plc,2015,300.716,
200,Adaptimmune Therapeutics plc,2016,234.515,
201,Adaptimmune Therapeutics plc,2017,281.147,272.126
202,Adaptimmune Therapeutics plc,2018,276.736,264.132667
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,
488,Air France - KLM,2005,32142.858,
489,Air France - KLM,2006,35668.458,
490,Air France - KLM,2007,48505.545,38772.287
491,Air France - KLM,2008,38155.875,40776.626


In [13]:
# Unfortunately, the result carries a two-level index (firm, original row label)
# that does not match the index of our main DataFrame.
# Therefore, we pass the underlying numpy array. This is only safe because df is sorted by firm!
df["cb_at_mavg3"] = df.groupby("u_company_name_id")["cb_at"].rolling(3).mean().to_numpy()
df[["u_company_name", "u_year", "cb_at", "cb_at_mavg3"]].head(20)

Unnamed: 0,u_company_name,u_year,cb_at,cb_at_mavg3
199,Adaptimmune Therapeutics plc,2015,300.716,
200,Adaptimmune Therapeutics plc,2016,234.515,
201,Adaptimmune Therapeutics plc,2017,281.147,272.126
202,Adaptimmune Therapeutics plc,2018,276.736,264.132667
183,Advanced Accelerator Applications SA,2015,297.648,
184,Advanced Accelerator Applications SA,2016,441.858,
488,Air France - KLM,2005,32142.858,
489,Air France - KLM,2006,35668.458,
490,Air France - KLM,2007,48505.545,38772.287
491,Air France - KLM,2008,38155.875,40776.626
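An alternative that does not rely on the sort order is to drop the group level from the result, so that pandas aligns on the original row labels. The following is a sketch on a hypothetical toy panel (`toy`, `firm`, `year`, and `assets` are illustrative names), deliberately left unsorted by firm.

```python
import pandas as pd

# Hypothetical toy panel, deliberately NOT sorted by firm
toy = pd.DataFrame(
    {
        "firm": ["B", "A", "A", "B", "A", "B"],
        "year": [2015, 2015, 2016, 2016, 2017, 2017],
        "assets": [100.0, 10.0, 12.0, 90.0, 15.0, 80.0],
    }
)

# The grouped rolling result has a two-level index: (firm, original row label)
rolled = toy.groupby("firm")["assets"].rolling(2).mean()

# Dropping the group level leaves the original row labels,
# so the assignment aligns on the index instead of on row order
toy["assets_mavg2"] = rolled.droplevel(0)

print(toy)
```

Because the assignment aligns on the index, each moving average ends up in the row it belongs to, even though the rows of the two firms are interleaved.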


## Expanding window functions

Expanding windows contain all observations up to the current row. A typical use case is the calculation of running totals. The `expanding()` method creates an object of the `Expanding` class, which offers the same capabilities as the `Rolling` class.

See [https://pandas.pydata.org/pandas-docs/stable/reference/window.html#expanding-window-functions](https://pandas.pydata.org/pandas-docs/stable/reference/window.html#expanding-window-functions).
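To make the idea concrete, here is a small sketch with a made-up Series (`income` is not part of the dataset). It shows an expanding sum as a running total, and how the optional `min_periods` parameter delays the first result.

```python
import pandas as pd

income = pd.Series([5.0, -2.0, 4.0, 1.0])

# Expanding sum: each row aggregates all rows up to and including itself
running_total = income.expanding().sum()

# For a plain sum, this matches the cumulative sum
cumulative = income.cumsum()

# min_periods=3: no result until at least 3 observations are available
delayed = income.expanding(min_periods=3).sum()

print(
    pd.DataFrame(
        {"income": income, "running_total": running_total, "delayed": delayed}
    )
)
```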

In [14]:
# calculating the running total of net income (cb_ni) within the firm
# (min_periods=3: no value until a firm has at least 3 observations)
df["cb_ni_running_total"] = (
    df.groupby("u_company_name_id")["cb_ni"].expanding(min_periods=3).sum().to_numpy()
)
df[["u_company_name", "u_year", "cb_ni", "cb_ni_running_total"]].head(10)

Unnamed: 0,u_company_name,u_year,cb_ni,cb_ni_running_total
199,Adaptimmune Therapeutics plc,2015,-21.592,
200,Adaptimmune Therapeutics plc,2016,-71.579,
201,Adaptimmune Therapeutics plc,2017,-70.138,-163.309
202,Adaptimmune Therapeutics plc,2018,-95.514,-258.823
183,Advanced Accelerator Applications SA,2015,-18.461,
184,Advanced Accelerator Applications SA,2016,-26.69,
488,Air France - KLM,2005,1108.291,
489,Air France - KLM,2006,1191.623,
490,Air France - KLM,2007,1182.214,3482.128
491,Air France - KLM,2008,-1079.445,2402.683


# Exercise

1. Load the first sheet of the Excel file "wdi_timeseries.xlsx" into a pandas DataFrame (see [here](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) for help with `pandas.read_excel()`)
2. Calculate the yearly change in life expectancy (*SP_DYN_LE00_IN*) per country.
3. Calculate the yearly change in life expectancy (*SP_DYN_LE00_IN*) per country as a **percentage**.
4. Calculate the moving average of the market capitalization (*CM_MKT_LCAP_CD*) per country over 2 years, including the current year.
5. BONUS: Calculate the moving average of the market capitalization (*CM_MKT_LCAP_CD*) per country over 3 years, starting 1 year before the current year and including the year after the current year.