## Lecture 15 – Part II

Date and time manipulations
- fredapi to get macro data
- plotting time-series data
    - `scale_x_date()`
- aggregating time-series data
    - mean/median or last day
- plotting multiple time series
    - stacked plots with `facet_wrap()`
    - standardizing multiple variables and plotting them together
- unit root tests
    - Phillips-Perron test
    - differenced variables: simple difference, percentage change
___

In [None]:
import pandas as pd
import numpy as np
from plotnine import *
from datetime import datetime
from mizani.breaks import date_breaks
from mizani.formatters import date_format
from fredapi import Fred
import warnings

%matplotlib inline
warnings.filterwarnings("ignore")

To access time series through the FRED API, first create a free account on the [FRED website](https://fred.stlouisfed.org/#), then request a free API key [here](https://fred.stlouisfed.org/docs/api/api_key.html).

In [None]:
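import os

# Read your personal API key from the FRED_API_KEY environment variable
# (or use fred = Fred(api_key='insert api key here')); do not publish the key itself
fred = Fred(api_key=os.environ["FRED_API_KEY"])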

Get three data tables:

  1. US GDP levels: quarterly, from 1979Q1
  2. Consumer price index (CPI level): monthly, from 1978-01 (one extra year, so that year-on-year inflation is available from 1979-01)
  3. SP500 closing prices: daily, from 1997-12-31 to 2018-12-3

In [None]:
gdp = (
    fred.get_series_latest_release("GDP")
    .loc[lambda x: x.index >= "1979-01-01"]
    .to_frame()
    .reset_index()
    .rename(columns={"index":"date",0:"gdp"})
)
gdp.head()
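The `to_frame()` / `reset_index()` / `rename()` chain relies on pandas' default names for an unnamed Series (column `0`) and an unnamed index (column `index`). A minimal sketch with a made-up series (illustrative values, not FRED data) shows those defaults next to an equivalent that names the index and values explicitly:

```python
import pandas as pd

# Hypothetical stand-in for a FRED series: unnamed values on an unnamed DatetimeIndex
s = pd.Series(
    [100.0, 101.5, 103.2],
    index=pd.to_datetime(["1979-01-01", "1979-04-01", "1979-07-01"]),
)

# Pattern used above: rely on the default column names 0 and "index"
via_defaults = s.to_frame().reset_index().rename(columns={"index": "date", 0: "gdp"})

# Explicit alternative: name the index and the values before resetting the index
explicit = s.rename_axis("date").rename("gdp").reset_index()

print(via_defaults.columns.tolist())  # ['date', 'gdp']
print(explicit.columns.tolist())      # ['date', 'gdp']
```

The explicit version keeps working even if the data source starts returning a named Series.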

In [None]:
inflation = (
    fred.get_series_latest_release("USACPIALLMINMEI")
    .loc[lambda x: x.index >= "1978-01-01"]
    .to_frame()
    .reset_index()
    .rename(columns={"index": "date", 0: "price"})
)
inflation.head()

We want year-on-year changes in percent: compare each month's CPI with the same month one year (12 observations) earlier.

In [None]:
inflation = inflation.assign(
    # year-on-year inflation in %: 100 * (P_t / P_{t-12} - 1)
    inflation=lambda x: 100 * (x["price"] / x["price"].shift(12) - 1)
).loc[lambda x: x["date"] >= "1979-01-01"]

inflation.head()
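Year-on-year inflation in percent compares each month's index with the same month one year earlier, $\pi_t = 100\,(P_t / P_{t-12} - 1)$. A small sketch on a synthetic monthly index (made-up numbers, growing 0.25% per month) shows that pandas' `pct_change(12)` gives the same result as the explicit `shift(12)` formula:

```python
import numpy as np
import pandas as pd

# Synthetic monthly price index growing 0.25% per month (illustrative only)
dates = pd.date_range("1978-01-01", periods=24, freq="MS")
price = pd.Series(100 * 1.0025 ** np.arange(24), index=dates)

# Year-on-year percentage change, written out with a 12-month lag ...
yoy_manual = 100 * (price / price.shift(12) - 1)
# ... and with the built-in helper
yoy_builtin = 100 * price.pct_change(12)

# The first 12 months have no year-earlier value, so they are NaN
print(yoy_manual.isna().sum())        # 12
print(round(yoy_manual.iloc[-1], 4))  # 3.0416, i.e. 1.0025**12 - 1 in percent
```

This is also why the CPI is downloaded one year before the first month we keep.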

SP500 Stock Prices


In [None]:
sp500 = (
    pd.read_csv("https://osf.io/fpkm4/download")
    .filter(["date", "p_SP500"])
    .rename(columns={"p_SP500": "price"})
)
sp500["date"] = pd.to_datetime(sp500["date"])  # read_csv loads dates as strings, so convert them to datetime
sp500.head()
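Instead of converting afterwards, `pd.read_csv` can parse the column while reading via its `parse_dates` argument. A minimal sketch with an inline CSV (the rows are made up; only the column names mirror the SP500 file):

```python
import io

import pandas as pd

# Made-up rows with the same column names as the SP500 file
csv_text = "date,p_SP500\n1997-12-31,100.0\n1998-01-02,101.0\n"

# parse_dates converts the listed columns to datetime while reading
sp = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(sp["date"].dt.year.tolist())  # [1997, 1998]
```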

### Plot time-series

GDP

In [None]:
(
    ggplot(gdp, aes(x="date", y="gdp"))
    + geom_line(color="red", size=1)
    + labs(x="Year", y="GDP (billions)")
    + theme_bw()
)

Strong, roughly exponential upward trend (the FRED `GDP` series is seasonally adjusted, so there is no seasonality to see).


Inflation

In [None]:
(
    ggplot(inflation, aes(x="date", y="inflation"))
    + geom_line(color="red", size=1)
    + labs(x="Year", y="Inflation")
    + theme_bw()
)

Looks stationary, but it is not (the unit-root tests below will show this).

SP500 prices

In [None]:
(
    ggplot(sp500, aes(x="date", y="price"))
    + geom_line(color="red", size=0.5)
    + labs(x="Date", y="Price ($)")
    + theme_bw()
)

Looks like a classic random walk.
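A random walk adds an independent shock to last period's value, $y_t = y_{t-1} + \varepsilon_t$. A simulated sketch (the seed, length and shock size are arbitrary choices) illustrates what the unit-root tests later pick up: the levels are almost perfectly autocorrelated, while the first differences are just the shocks.

```python
import numpy as np

rng = np.random.default_rng(42)      # arbitrary seed for reproducibility
shocks = rng.normal(0.0, 1.0, 1000)  # i.i.d. N(0, 1) innovations
walk = np.cumsum(shocks)             # y_t = y_{t-1} + e_t, starting from y_0 = e_0


def lag1_autocorr(x):
    """Sample correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


print(lag1_autocorr(walk))           # close to 1: levels are highly persistent
print(lag1_autocorr(np.diff(walk)))  # close to 0: differences are the i.i.d. shocks
```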

### Detour: date-time variables on the axis

Yearly ticks with limits and minor breaks


In [None]:
(
    ggplot(sp500, aes(x="date", y="price"))
    + geom_line(color="red", size=0.5)
    + labs(x="Date", y="Price ($)")
    + scale_x_date(
        date_breaks="3 year",
        date_minor_breaks="1 year",
        date_labels="%Y",
        limits=(datetime(1997, 1, 1), datetime(2020, 1, 1)),
    )
    + theme_bw()
)

Monthly ticks with limits and minor breaks

In [None]:
(
    ggplot(sp500.loc[lambda x: x["date"] > "2018-01-01"], aes(x="date", y="price"))
    + geom_line(color="red", size=0.5)
    + labs(x="Date", y="Price ($)")
    + scale_x_date(
        date_breaks="3 month",
        date_minor_breaks="1 month",
        date_labels="%b %Y",
        limits=(datetime(2018, 1, 1), datetime(2019, 1, 1)),
    )
    + theme_bw()
)

### Task:

Use monthly ticks between 2008 and 2010 and change the frequency of the breaks.

Hint: use `%m` with a `-` separator (e.g. `%Y-%m`) or `%B` instead of `%b`.
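The `date_labels` strings use the standard `strftime` codes. A quick sketch with Python's `datetime` shows what each code produces (assuming the default C/English locale, since `%b` and `%B` are locale-dependent):

```python
from datetime import datetime

d = datetime(2008, 3, 1)
print(d.strftime("%b %Y"))  # Mar 2008   (abbreviated month name)
print(d.strftime("%B %Y"))  # March 2008 (full month name)
print(d.strftime("%Y-%m"))  # 2008-03    (zero-padded month number)
```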


Monthly ticks with limits and minor breaks

In [None]:
(
    ggplot(
        sp500.loc[lambda x: x["date"] > "2008-01-01"].loc[
            lambda x: x["date"] < "2011-01-01"
        ],
        aes(x="date", y="price"),
    )
    + geom_line(color="red", size=0.5)
    + labs(x="Date", y="Price ($)")
    + scale_x_date(
        date_breaks="6 month",
        date_minor_breaks="3 month",
        date_labels="%Y-%m",
        limits=(datetime(2008, 1, 1), datetime(2011, 1, 1)),
    )
    + theme_bw()
)

### Aggregation: put everything into the same frequency

The base data table is GDP (quarterly).

In [None]:
df = gdp.rename(columns={"date": "time"})

First, aggregate inflation to quarterly frequency.

Add years and quarters:

In [None]:
inflation["quarter"] = inflation["date"].dt.quarter.astype(str)
inflation["year"] = inflation["date"].dt.year.astype(str)

Take the quarterly average of inflation (the median or another reasonable summary measure would also work).

In [None]:
agg_inflation = inflation.groupby(["year","quarter"])["inflation"].mean().reset_index()
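The year/quarter `groupby` works, but pandas can also aggregate a date-indexed series directly with `resample`. A sketch on synthetic monthly data (illustrative values); the `"QS"` alias labels each quarter by its first day, which matches the quarter-start dates of the FRED GDP observations:

```python
import pandas as pd

# Six months of illustrative monthly inflation rates
monthly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("1979-01-01", periods=6, freq="MS"),
    name="inflation",
)

# Quarterly mean, labelled by the first day of each quarter
quarterly_mean = monthly.resample("QS").mean()
print(quarterly_mean.tolist())                           # [2.0, 5.0]
print(quarterly_mean.index.strftime("%Y-%m-%d").tolist())  # ['1979-01-01', '1979-04-01']
```

Swapping `.mean()` for `.median()` gives the median-based alternative mentioned above.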

In [None]:
agg_inflation["date"] = agg_inflation["year"] + "-Q" + agg_inflation["quarter"]
agg_inflation["time"] = pd.to_datetime(agg_inflation["date"])
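Building strings such as `"1979-Q1"` and parsing them back is one way to get quarter-start dates. An alternative sketch converts each date to a quarterly `Period` and takes its start time, which avoids the string round trip (the dates below are illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1979-02-15", "1979-03-31", "1979-05-01"]))

# Map every date to the first day of its quarter
quarter_start = dates.dt.to_period("Q").dt.start_time
print(quarter_start.dt.strftime("%Y-%m-%d").tolist())
# ['1979-01-01', '1979-01-01', '1979-04-01']
```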

Join the quarterly inflation to `df` (an inner join on `time`):


In [None]:
df = df.merge(agg_inflation.filter(["time","inflation"]), on = "time")

### Task:

Aggregate the SP500 prices to quarterly frequency, keeping the last closing price in each quarter.


Add years and quarters

In [None]:
sp500["quarter"] = sp500["date"].dt.quarter.astype(str)
sp500["year"] = sp500["date"].dt.year.astype(str)

Keep the last trading day of each quarter (its closing price):

In [None]:
agg_sp500 = (
    sp500.sort_values(by=["date"])
    .groupby(["year", "quarter"])
    .agg(date=("date", "max"))
    .merge(sp500, on="date", how="left")
)
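The same "last observation in each quarter" can be sketched with `resample(...).last()`, which returns the last non-missing value in each quarter. Note the label: with `"QS"` the value is stamped at the start of the quarter even though it is the closing price of the quarter's last trading day. The prices below are illustrative:

```python
import pandas as pd

# Illustrative daily closing prices spanning two quarters
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.to_datetime(["1998-01-02", "1998-03-31", "1998-04-01", "1998-06-30"]),
)

# Last closing price of each quarter, labelled by the quarter's first day
quarter_close = prices.resample("QS").last()
print(quarter_close.tolist())  # [102.0, 105.0]
```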

In [None]:
agg_sp500["date"] = agg_sp500["year"] + "-Q" + agg_sp500["quarter"]
agg_sp500["time"] = pd.to_datetime(agg_sp500["date"])

In [None]:
df = df.merge(agg_sp500.filter(["time","price"]), on = "time", how = "left")
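With a left join, any GDP quarter without a matching SP500 quarter keeps a missing `price`. A sketch with toy frames (hypothetical values) shows how the `validate` and `indicator` arguments of `merge` can guard such joins:

```python
import pandas as pd

left = pd.DataFrame({"time": pd.to_datetime(["1997-07-01", "1997-10-01"]),
                     "gdp": [8.5, 8.6]})
right = pd.DataFrame({"time": pd.to_datetime(["1997-10-01"]), "price": [970.0]})

merged = left.merge(
    right,
    on="time",
    how="left",
    validate="one_to_one",  # raise MergeError if either side has duplicate keys
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)
print(merged["_merge"].astype(str).tolist())  # ['left_only', 'both']
print(merged["price"].isna().sum())           # 1
```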

In [None]:
# Keep quarters from 1997Q4 onward (the SP500 data start on 1997-12-31)
df = df.loc[lambda x: x["time"]>="1997-10-01"]

In [None]:
df

### Visualization of the data

No. 1: plot each time series in its own panel.

This needs a trick: reshape the data into long (stacked) format, so that we can facet and color by variable.

In [None]:
df_long = df.melt(id_vars="time", var_name="type", value_name="values")

(
    ggplot(df_long, aes(x="time", y="values", color="type"))
    + geom_line()
    + facet_wrap(
        "~type",
        scales="free",
        ncol=1,
        labeller={
            "price": "SP500 price",
            "gdp": "GDP (billions)",
            "inflation": "Inflation (%)",
        },
    )
    + labs(x="Years", y="")
    + scale_x_date(
        date_breaks="3 year",
        date_minor_breaks="1 year",
        date_labels="%Y",
        limits=(datetime(1997, 1, 1), datetime(2020, 1, 1)),
    )
    + scale_color_discrete(guide=False)
    + theme_bw()
)
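The "trick" is reshaping from wide to long format with `melt`: every (time, variable) pair becomes one row, so the variable name can drive both `facet_wrap` and `color`. A toy sketch (made-up values):

```python
import pandas as pd

wide = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-01", "2000-04-01"]),
    "gdp": [10.0, 11.0],
    "price": [1.0, 2.0],
})

# One row per (time, variable) pair; the old column names go into "type"
long = wide.melt(id_vars="time", var_name="type", value_name="values")
print(long.shape)             # (4, 3)
print(long["type"].tolist())  # ['gdp', 'gdp', 'price', 'price']
```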

## Analyzing time-series properties:

In [None]:
from arch.unitroot import PhillipsPerron

In [None]:
print(PhillipsPerron(df["gdp"], lags=4, test_type="rho"))
print(PhillipsPerron(df["inflation"], lags=4, test_type="rho"))
print(PhillipsPerron(df["price"].dropna(), lags=4, test_type="rho"))

The PP tests cannot reject the null hypothesis of a unit root for any of the three variables, so all three appear non-stationary.
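For reference, the textbook form of the `test_type="rho"` Phillips-Perron statistic is sketched below (the `arch` implementation may differ in small-sample details). It starts from the Dickey-Fuller regression with a constant (the `arch` default trend `"c"`) and corrects $T(\hat\rho - 1)$ for serial correlation in $u_t$ with a Newey-West long-run variance using $q$ lags (here `lags=4`):

$$
y_t = \alpha + \rho\, y_{t-1} + u_t, \qquad H_0:\ \rho = 1
$$

$$
Z_\rho = T(\hat\rho - 1) - \frac{1}{2}\,\frac{T^2\,\hat\sigma^2_{\hat\rho}}{s^2}\left(\hat\lambda^2 - \hat\gamma_0\right)
$$

$$
\hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}\hat u_t\,\hat u_{t-j}, \qquad
\hat\lambda^2 = \hat\gamma_0 + 2\sum_{j=1}^{q}\left(1 - \frac{j}{q+1}\right)\hat\gamma_j
$$

Here $s^2$ is the OLS residual variance and $\hat\sigma_{\hat\rho}$ the OLS standard error of $\hat\rho$; sufficiently negative values of $Z_\rho$ reject the unit root.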

Let's compute percent changes for GDP and the SP500 price, and the first difference for inflation:


In [None]:
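df = df.assign(
    # quarter-on-quarter percent changes, relative to the previous quarter's level
    dgdp=lambda x: 100 * (x["gdp"] - x["gdp"].shift(1)) / x["gdp"].shift(1),
    # first difference of the (year-on-year) inflation rate
    dinflation=lambda x: x["inflation"] - x["inflation"].shift(1),
    return_=lambda x: 100 * (x["price"] - x["price"].shift(1)) / x["price"].shift(1),
)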
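A percentage change is usually measured relative to the previous period's level, $100\,(x_t - x_{t-1})/x_{t-1}$. A sketch with made-up numbers shows that this equals `100 * pct_change()` and differs from dividing by the current level $x_t$:

```python
import pandas as pd

x = pd.Series([100.0, 110.0, 99.0])  # illustrative levels

by_lag = 100 * (x - x.shift(1)) / x.shift(1)  # relative to the previous level
by_current = 100 * (x - x.shift(1)) / x       # relative to the current level
builtin = 100 * x.pct_change()                # pandas helper, relative to the previous level

print(by_lag.round(2).tolist()[1:])      # [10.0, -10.0]
print(by_current.round(2).tolist()[1:])  # [9.09, -11.11]
```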

### Task

Run the Phillips-Perron tests again on the transformed series!

In [None]:
print(PhillipsPerron(df["dgdp"].dropna(), lags=4, test_type="rho"))
print(PhillipsPerron(df["dinflation"].dropna(), lags=4, test_type="rho"))
print(PhillipsPerron(df["return_"].dropna(), lags=4, test_type="rho"))

No. 2: visualize the standardized series together

In [None]:
stdd = lambda x: (x - np.nanmean(x)) / x.std()
df_long = (
    df.dropna()
    .set_index("time")
    .filter(["dgdp", "dinflation", "return_"])
    .apply(stdd)
    .reset_index()
    .melt(id_vars="time", value_vars=["dgdp", "dinflation", "return_"])
)
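Standardizing (z-scoring) subtracts each series' mean and divides by its standard deviation, so series measured in different units share one scale. A minimal sketch on a toy frame (illustrative values), written as a named function instead of a lambda; note that pandas' `std()` uses the sample formula (`ddof=1`):

```python
import pandas as pd


def standardize(col: pd.Series) -> pd.Series:
    """Return (x - mean) / std; pandas skips missing values in both statistics."""
    return (col - col.mean()) / col.std()


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 30.0, 50.0]})
z = toy.apply(standardize)  # applied column by column
print(z["a"].tolist())  # [-1.0, 0.0, 1.0]
print(z["b"].tolist())  # [-1.0, 0.0, 1.0]
```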

In [None]:
(
    (ggplot(df_long, aes(x="time", y="value", color="variable")) + geom_line())
    + scale_color_manual(
        name="Variable",
        values={"dgdp": "red", "dinflation": "blue", "return_": "green"},
        labels={
            "dgdp": "GDP change",
            "dinflation": "Inflation change",
            "return_": "SP500 return",
        },
    )
    + labs(x="Years", y="Standardized values")
    + scale_x_date(
        date_breaks="3 year",
        date_minor_breaks="1 year",
        date_labels="%Y",
        limits=(datetime(1997, 1, 1), datetime(2020, 1, 1)),
    )
    + theme_bw()
    + theme(legend_position="top", legend_title=element_blank())
)

### To look at associations, check scatter plots


GDP and inflation

In [None]:
(
    ggplot(df, aes(x="dgdp", y="dinflation"))
    + geom_point(size=1, color="red")
    + geom_smooth(method="lm", formula="y~x", se=False)
    + labs(x="GDP quarterly change (%)", y="Inflation YoY change (%)")
    + theme_bw()
)
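`geom_smooth(method="lm")` draws the OLS fit of y on x. To read the slope off as a number, a sketch with `numpy.polyfit` on made-up points (in the notebook one would pass the non-missing rows of `df`, for example `dgdp` and `dinflation`):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 1 + 2x

# Degree-1 least-squares fit; coefficients come highest degree first
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```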

GDP and SP500 returns

In [None]:
(
    ggplot(df, aes(x="dgdp", y="return_"))
    + geom_point(size=1, color="red")
    + geom_smooth(method="lm", formula="y~x", se=False)
    + labs(x="GDP quarterly change (%)", y="SP500 quarterly returns (%)")
    + theme_bw()
)

SP500 returns and inflation

In [None]:
(
    ggplot(df, aes(x="return_", y="dinflation"))
    + geom_point(size=1, color="red")
    + geom_smooth(method="lm", formula="y~x", se=False)
    + labs(x="SP500 quarterly returns (%)", y="Inflation YoY change (%)")
    + theme_bw()
)