The rEDM package is a collection of methods for Empirical Dynamic
Modeling (EDM). EDM is based on the mathematical theory of reconstructing
attractor manifolds from time series data, with applications to
forecasting, causal inference, and more. It is based on research
software previously developed for the Sugihara Lab (University of
California San Diego, Scripps Institution of Oceanography).
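
To make "reconstructing attractor manifolds from time series data" a little more concrete, here is a minimal base-R sketch of a time-delay embedding, the construction that EDM methods build on. The helper name `delay_embed` and its arguments are hypothetical and are not part of rEDM; the package performs this step internally and may differ in details.

```r
# Build an E-dimensional time-delay embedding of a numeric series.
# The row for time t holds (x[t], x[t - tau], ..., x[t - (E - 1) * tau]).
# NOTE: `delay_embed` is an illustrative helper, not an rEDM function.
delay_embed <- function(x, E, tau = 1) {
  stopifnot(E >= 1, tau >= 1)
  x <- as.numeric(x)
  n <- length(x)
  first <- (E - 1) * tau + 1               # first time index with a full set of lags
  stopifnot(n >= first)
  idx <- first:n
  emb <- sapply(0:(E - 1), function(k) x[idx - k * tau])
  emb <- matrix(emb, nrow = length(idx))   # keep a matrix even for one row or E = 1
  colnames(emb) <- paste0("lag", 0:(E - 1) * tau)
  rownames(emb) <- idx
  emb
}

# Each row is one point on the reconstructed attractor.
emb <- delay_embed(sunspot.year, E = 3)
head(emb)
```

Each row of `emb` is a point in the reconstructed state space; nearby rows correspond to times when the system was in a similar state, which is what the forecasting methods exploit.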
You can install rEDM from CRAN with:

    install.packages("rEDM")

or from GitHub with:

    # install.packages("remotes")
    remotes::install_github("ha0ye/rEDM")

If you are on Windows, you may need to install Rtools first, so that you have access to a C++ compiler.
We begin by looking at annual time series of sunspots:

    dat <- data.frame(yr = as.numeric(time(sunspot.year)),
                      sunspot_count = as.numeric(sunspot.year))

    plot(dat$yr, dat$sunspot_count, type = "l",
         xlab = "year", ylab = "sunspots")

First, we use simplex to determine the optimal embedding dimension, E:

    library(rEDM)                             # load the package

    n <- NROW(dat)
    lib <- c(1, floor(2/3 * n))               # indices for the first 2/3 of the time series
    pred <- c(floor(2/3 * n) + 1, n)          # indices for the final 1/3 of the time series

    output <- simplex(dat,                    # input data (for data.frames, uses 2nd column)
                      lib = lib, pred = lib,  # which portions of the data to train and predict
                      E = 1:10)               # embedding dimensions to try

    summary(output[, 1:9])
    #>        E              tau          tp          nn           num_pred
    #>  Min.   : 1.00   Min.   :1   Min.   :1   Min.   : 2.00   Min.   :182.0
    #>  1st Qu.: 3.25   1st Qu.:1   1st Qu.:1   1st Qu.: 4.25   1st Qu.:184.2
    #>  Median : 5.50   Median :1   Median :1   Median : 6.50   Median :186.5
    #>  Mean   : 5.50   Mean   :1   Mean   :1   Mean   : 6.50   Mean   :186.5
    #>  3rd Qu.: 7.75   3rd Qu.:1   3rd Qu.:1   3rd Qu.: 8.75   3rd Qu.:188.8
    #>  Max.   :10.00   Max.   :1   Max.   :1   Max.   :11.00   Max.   :191.0
    #>       rho              mae             rmse            perc
    #>  Min.   :0.7082   Min.   :10.17   Min.   :13.94   Min.   :1
    #>  1st Qu.:0.8759   1st Qu.:10.78   1st Qu.:14.21   1st Qu.:1
    #>  Median :0.9079   Median :11.32   Median :14.86   Median :1
    #>  Mean   :0.8815   Mean   :12.15   Mean   :16.48   Mean   :1
    #>  3rd Qu.:0.9172   3rd Qu.:12.72   3rd Qu.:17.42   3rd Qu.:1
    #>  Max.   :0.9195   Max.   :18.23   Max.   :25.92   Max.   :1

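
The `rho`, `mae`, and `rmse` columns summarize forecast skill. As a rough guide to reading them, the sketch below computes the usual definitions (Pearson correlation, mean absolute error, root mean squared error) from observed and predicted values. The helper `forecast_skill` is hypothetical, the input vectors are made up for illustration, and the package's own calculation may differ in details such as how missing values are handled.

```r
# Assumed definitions of the forecast-skill columns; `forecast_skill` is an
# illustrative helper, not an rEDM function.
forecast_skill <- function(obs, pred) {
  ok <- is.finite(obs) & is.finite(pred)   # drop pairs with missing values
  err <- pred[ok] - obs[ok]
  c(num_pred = sum(ok),                    # number of usable predictions
    rho      = cor(obs[ok], pred[ok]),     # correlation of observed and predicted
    mae      = mean(abs(err)),             # mean absolute error
    rmse     = sqrt(mean(err^2)))          # root mean squared error
}

# Toy values, made up purely to exercise the function.
obs  <- c(1, 2, 3, 4, NA)
pred <- c(1.5, 2.5, 2.5, 4.5, 5)
skill <- forecast_skill(obs, pred)
skill
```

Higher `rho` and lower `mae` and `rmse` indicate better forecasts, so when comparing embedding dimensions we look for the E that maximizes `rho` (or minimizes the error measures).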
It looks like E = 3 or 4 is optimal. Since we generally want a simpler
model, if possible, let’s go with E = 3 to forecast the remaining 1/3 of
the data.
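
For readers curious about what the scan over E is doing, here is a simplified base-R sketch of simplex projection: each forecast is a distance-weighted average over the E + 1 nearest neighbours in the lagged library. The function `simplex_sketch` is hypothetical and is not how rEDM implements it; in particular, the leave-one-out rule used when `lib` and `pred` overlap and the weighting scheme are assumptions, so its numbers need not match the package output exactly.

```r
# Simplified simplex projection. NOTE: `simplex_sketch` is an illustrative
# helper, not part of rEDM; neighbour exclusion and weighting are assumptions.
simplex_sketch <- function(x, lib, pred, E, tau = 1, tp = 1) {
  x <- as.numeric(x)
  # Times in a segment whose lags and tp-ahead target all fall inside it.
  valid <- function(range) {
    t <- range[1]:range[2]
    t[t - (E - 1) * tau >= range[1] & t + tp <= range[2]]
  }
  # Lag coordinates (x[t], x[t - tau], ...) for one or more times t.
  lags <- function(t) sapply(0:(E - 1), function(k) x[t - k * tau])

  lib_t  <- valid(lib)
  pred_t <- valid(pred)
  stopifnot(length(lib_t) > E + 1, length(pred_t) > 0)
  lib_vec <- matrix(lags(lib_t), ncol = E)

  forecast <- sapply(pred_t, function(t) {
    cand <- which(lib_t != t)              # leave the target itself out of the library
    diffs <- sweep(lib_vec[cand, , drop = FALSE], 2, lags(t))
    d <- sqrt(rowSums(diffs^2))            # Euclidean distance to each library vector
    nn <- order(d)[seq_len(E + 1)]         # E + 1 nearest neighbours
    w <- if (d[nn[1]] > 0) exp(-d[nn] / d[nn[1]]) else as.numeric(d[nn] == 0)
    sum(w * x[lib_t[cand[nn]] + tp]) / sum(w)
  })
  data.frame(time = pred_t + tp, obs = x[pred_t + tp], pred = forecast)
}

# Scan E = 1..10 on the first 2/3 of the sunspot series (leave-one-out).
x <- as.numeric(sunspot.year)
n <- length(x)
lib <- c(1, floor(2/3 * n))
rho <- sapply(1:10, function(E) {
  out <- simplex_sketch(x, lib = lib, pred = lib, E = E)
  cor(out$obs, out$pred)
})
which.max(rho)                             # E with the highest forecast skill
```

Plotting `rho` against `1:10` gives the usual picture for choosing E: skill rises quickly, then flattens, and the smallest E near the plateau is a reasonable choice.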

    output <- simplex(dat,
                      lib = lib, pred = pred,  # predict on last 1/3
                      E = 3,
                      stats_only = FALSE)      # return predictions, too

    # model_output is a list column; take the predictions for the single model fit
    predictions <- na.omit(output$model_output[[1]])

    plot(dat$yr, dat$sunspot_count, type = "l",
         xlab = "year", ylab = "sunspots")
    lines(predictions$time, predictions$pred, col = "blue", lty = 2)

    # shaded band: prediction +/- one standard deviation
    polygon(c(predictions$time, rev(predictions$time)),
            c(predictions$pred - sqrt(predictions$pred_var),
              rev(predictions$pred + sqrt(predictions$pred_var))),
            col = rgb(0, 0, 1, 0.5), border = NA)

Please see the package vignettes for more details:

    browseVignettes("rEDM")