# Statistical models in R
This notebook covers:
1. OLS models.

In [2]:
import pandas as pd
import plotly.express as px

In [3]:
# rpy2 is a Python package that allows you to run R code from Python
%pip install rpy2

Note: you may need to restart the kernel to use updated packages.


In [4]:
# Load the rpy2 extension to use R in Jupyter
%load_ext rpy2.ipython

The cell magic `%%R` runs the contents of a cell as R code in Jupyter.

In [5]:
%%R
# Install any missing packages, then load them
packages <- c("dplyr", "zoo", "psych", "TSA", "forecast", "Metrics", "ggplot2", "tseries")
for (pkg in packages) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}


Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: psych
Loading required package: TSA

Attaching package: ‘TSA’

The following objects are masked from ‘package:stats’:

    acf, arima

The following object is masked from ‘package:utils’:

    tar

Loading required package: forecast
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Registered S3 methods overwritten by 'forecast':
  method       from
  fitted.Arima TSA 
  plot.Arima   TSA 
Loading required package: Metrics

Attaching package: ‘Metrics’

The following object is masked from ‘package:forecast’:

    accuracy

Loading required package: 

In [10]:
%%R
# Load data
hub_prices <- list(
  nbp = read.csv("../../data/interpolated/nbp_close_interpolated.csv"),
  peg = read.csv("../../data/interpolated/peg_close_interpolated.csv"),
  the = read.csv("../../data/interpolated/the_close_interpolated.csv"),
  ttf = read.csv("../../data/interpolated/ttf_close_interpolated.csv"),
  ztp = read.csv("../../data/interpolated/ztp_close_interpolated.csv")
)

In [17]:
%%R

hub1_name <- "ztp"
hub2_name <- "ttf"
hub1 <- hub_prices[[hub1_name]]$CLOSE
hub2 <- hub_prices[[hub2_name]]$CLOSE
hubs <- data.frame(hub1 = hub1, hub2 = hub2)
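
The data frame above pairs the two `CLOSE` series by row position, so it relies on both files covering the same dates in the same order. If that is not guaranteed, the series can be joined on `Date` first. The following is a minimal Python sketch of such an inner join; the function name `align_by_date` and the sample rows are made up for illustration and are not part of the notebook's data.

```python
def align_by_date(rows1, rows2):
    """Inner-join two lists of (date, close) pairs on the date.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting them
    as text also sorts them chronologically.
    """
    close2 = dict(rows2)
    return [(d, c1, close2[d]) for d, c1 in sorted(rows1) if d in close2]


# Made-up rows for illustration only
ztp_demo = [("2021-01-04", 18.2), ("2021-01-05", 18.9), ("2021-01-06", 19.4)]
ttf_demo = [("2021-01-05", 19.1), ("2021-01-06", 19.8), ("2021-01-07", 20.3)]

aligned = align_by_date(ztp_demo, ttf_demo)
for date, hub1_close, hub2_close in aligned:
    print(date, hub1_close, hub2_close)
```

Only the dates present in both inputs survive, so the two resulting columns have equal length and line up row by row.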


In [18]:
%%R


# Fit an OLS regression of hub1 on hub2; its residuals are saved later for use in a VAR model
ols_model <- function(hubs) {
  model <- lm(hubs$hub1 ~ hubs$hub2)
  return (model)
}
ols <- ols_model(hubs)

print(summary(ols))


Call:
lm(formula = hubs$hub1 ~ hubs$hub2)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.515  -1.492  -0.285   1.208  20.046 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.851481   0.164423   17.34   <2e-16 ***
hubs$hub2   0.892812   0.002453  363.98   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.697 on 1543 degrees of freedom
Multiple R-squared:  0.9885,	Adjusted R-squared:  0.9885 
F-statistic: 1.325e+05 on 1 and 1543 DF,  p-value: < 2.2e-16
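
For a single regressor, the intercept and slope reported by `lm` have a closed form: the slope is the covariance of the two series divided by the variance of the regressor, and the intercept follows from the means. The sketch below reproduces that calculation in plain Python on made-up numbers. The function name `fit_simple_ols` and the demo prices are hypothetical and only illustrate the arithmetic, not the notebook's results.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope, residuals) for the regression y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


# Made-up prices for illustration only
hub2_demo = [10.0, 12.0, 14.0, 16.0, 18.0]
hub1_demo = [12.1, 13.8, 16.2, 17.9, 20.0]

intercept, slope, residuals = fit_simple_ols(hub2_demo, hub1_demo)
print(intercept, slope)
```

Because the model includes an intercept, the residuals sum to zero up to floating-point error. These residuals are the quantity the notebook tests for stationarity in the next cell.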



In [19]:
%%R
print(adf.test(ols$residuals))


	Augmented Dickey-Fuller Test

data:  ols$residuals
Dickey-Fuller = -5.7188, Lag order = 11, p-value = 0.01
alternative hypothesis: stationary



In adf.test(ols$residuals) : p-value smaller than printed p-value
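
The idea behind the test can be sketched with the plain Dickey-Fuller regression, which regresses the first difference of a series on its lagged level and looks at the t-statistic of that coefficient. The Python code below is a simplified illustration only: it omits the constant, the linear trend, and the lagged differences that `adf.test` includes, so its statistic is not comparable to the value printed above. The function name `dickey_fuller_stat` and both simulated series are assumptions made for the example.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in: diff(e)_t = rho * e_(t-1) + error.

    No constant, trend, or lagged differences are included.
    """
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - rho * l) ** 2 for l, d in zip(lagged, diffs))
    sigma2 = ssr / (len(diffs) - 1)  # one estimated parameter
    return rho / math.sqrt(sigma2 / sxx)


random.seed(0)

# Simulated stationary AR(1) series (illustration only)
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + random.gauss(0, 1))

# Simulated random walk, which has a unit root (illustration only)
random_walk = [0.0]
for _ in range(500):
    random_walk.append(random_walk[-1] + random.gauss(0, 1))

stat_stationary = dickey_fuller_stat(stationary)
stat_random_walk = dickey_fuller_stat(random_walk)
print(stat_stationary, stat_random_walk)
```

A strongly negative statistic, as for the stationary series, is evidence against a unit root. The statistic must be compared with Dickey-Fuller critical values rather than the usual t-distribution.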


In [20]:
%%R
residuals_df <- data.frame(Date = hub_prices[[hub1_name]]$Date, Residuals = ols$residuals)

In [23]:
%%R
folder_path <- "../intermediate_storage/"
write.csv(residuals_df, paste0(folder_path, hub1_name, "_", hub2_name, "_residuals.csv"), row.names = FALSE)
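
The exported file has a `Date` column and a `Residuals` column, so a later step can read it back on the Python side. Below is a minimal sketch using only the standard library. The function name `read_residuals` and the two sample rows are made up for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io


def read_residuals(handle):
    """Parse a residuals CSV with 'Date' and 'Residuals' columns into (date, float) pairs."""
    return [(row["Date"], float(row["Residuals"])) for row in csv.DictReader(handle)]


# Made-up content in the same layout that write.csv produces (quoted header and strings)
demo_csv = io.StringIO('"Date","Residuals"\n"2021-01-04",-0.42\n"2021-01-05",0.17\n')

pairs = read_residuals(demo_csv)
print(pairs)
```

With the real export, the same function could be given a handle from `open("../intermediate_storage/ztp_ttf_residuals.csv", newline="")`, assuming the cell above was run with `hub1_name` set to "ztp" and `hub2_name` set to "ttf".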