Multiview() chooses the maximum lag based on the number of predictor columns used, regardless of E. #44

duriah · 2021-01-11T14:24:10Z

The rEDM tutorial (https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.pdf) describes the Multiview function as follows: "Multiview() operates by constructing all possible embeddings of dimension E with lag up to E-1."

I have noticed that the function behaves oddly in that regard, namely it chooses the maximum lag to be equal to the number of predictor columns used, regardless of other parameter values. I'm showing this with some examples:

example 1: 2 predictor columns and E=3

library(rEDM)
data(block_3sp)

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 100)
head(L$View)

  col_1 col_2   name_1   name_2      rho    MAE   RMSE
1     1     2 x_t(t-0) x_t(t-1)  0.92240 0.2548 0.3259
2     1     3 x_t(t-0) y_t(t-0)  0.86910 0.3214 0.4187
3     1     4 x_t(t-0) y_t(t-1)  0.90120 0.2883 0.3663
4     2     3 x_t(t-1) y_t(t-0)  0.73710 0.4535 0.5764
5     2     4 x_t(t-1) y_t(t-1)  0.65050 0.5242 0.6892
6     3     4 y_t(t-0) y_t(t-1) -0.01939 0.8524 1.0530

As can be seen, the lag is indeed E-1=2, but the dimension of each view is 2 and not 3.

example 2: 3 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t", target = "x_t", multiview = 100)
head(L$View)

  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     7 x_t(t-0) x_t(t-1) z_t(t-0) 0.9208 0.2485 0.3164
2     1     2     6 x_t(t-0) x_t(t-1) y_t(t-2) 0.8677 0.3294 0.4113
3     1     2     3 x_t(t-0) x_t(t-1) x_t(t-2) 0.9319 0.2277 0.2934
4     1     2     8 x_t(t-0) x_t(t-1) z_t(t-1) 0.9183 0.2476 0.3205
5     1     7     9 x_t(t-0) z_t(t-0) z_t(t-2) 0.8858 0.3031 0.3738
6     1     4     9 x_t(t-0) y_t(t-0) z_t(t-2) 0.7774 0.4191 0.5116

In this case the dimension of each view is indeed 3, but the maximum lag is 3 rather than E-1=2.

example 3: 4 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t y_t-1", target = "x_t", multiview = 100)
head(L$View)

  col_1 col_2 col_3 col_4   name_1   name_2   name_3     name_4    rho    MAE   RMSE
1     1     2     7    14 x_t(t-0) x_t(t-1) y_t(t-2) y_t-1(t-1) 0.8602 0.3372 0.4208
2     1     2     4     9 x_t(t-0) x_t(t-1) x_t(t-3)   z_t(t-0) 0.8766 0.2989 0.3881
3     1     2     9    11 x_t(t-0) x_t(t-1) z_t(t-0)   z_t(t-2) 0.8781 0.2955 0.3858
4     1     2     8    15 x_t(t-0) x_t(t-1) y_t(t-3) y_t-1(t-2) 0.8546 0.3188 0.4201
5     1     2     9    16 x_t(t-0) x_t(t-1) z_t(t-0) y_t-1(t-3) 0.8849 0.2992 0.3751
6     1     2     3    16 x_t(t-0) x_t(t-1) x_t(t-2) y_t-1(t-3) 0.8639 0.3134 0.4062

In this case neither the embedding dimension nor the maximum lag makes sense: each view is of dimension 4 and the maximum lag is also 4.

The pattern is quite obvious: Embedding dimension = maximum lag = number of columns.

I don't think this is intended, as following the description it should be: maximum lag = Embedding dimension-1; and both the maximum lag and E are independent from the number of predictor columns (to an extent).

I am also confused regarding the two arguments E (embedding dimension) and D (multivariate dimension). What exactly is the difference? D seems to override E, meaning that if D is set, the value of E has no influence and the maximum lag is still chosen based on the number of predictor columns. Example:

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)

  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Max lag is 2 (= number of predictors), and 3 variables are used for each view (= D)

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 4, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)

  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Same as above, changing E did nothing.

Also, why is the maximum lag set to E-1? I would like the possibility of constructing a model in which each view is e.g. of size 2 and the maximum lag is 3. That is, I would like to have the possibility of choosing the dimension of the views and the maximum lag separately. Is that possible?

Thank you and best regards,
Uriah

sessionInfo(package = "rEDM")

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.1252  LC_CTYPE=English_Switzerland.1252    LC_MONETARY=English_Switzerland.1252
[4] LC_NUMERIC=C                         LC_TIME=English_Switzerland.1252    

attached base packages:
character(0)

other attached packages:
[1] rEDM_1.7.3

loaded via a namespace (and not attached):
 [1] compiler_4.0.3  graphics_4.0.3  htmltools_0.5.0 tools_4.0.3     utils_4.0.3     yaml_2.2.1      grDevices_4.0.3
 [8] Rcpp_1.0.5      stats_4.0.3     datasets_4.0.3  rmarkdown_2.6   knitr_1.30      methods_4.0.3   xfun_0.20      
[15] digest_0.6.27   rlang_0.4.10    base_4.0.3      evaluate_0.14


SoftwareLiteracy · 2021-01-13T22:33:54Z

Dear Uriah,

Thank you for your use and analysis of rEDM.

Having looked at your examples, it appears you have identified a bug where the value of E is not properly handled. Thank you!

I have found and corrected the problem. The fix is uploaded to GitHub as version 1.7.5.

Regarding parameters E and D:

E defines the dimension of the time-delay embedding for the state-space of each variable. Since the first dimension is the variable itself, there are E-1 additional dimensions added by successive time delays.
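As a rough illustration of what "E-1 additional dimensions added by successive time delays" means, here is a minimal sketch using base R's stats::embed() on a made-up toy series. It does not call rEDM and is not how Multiview() builds its embeddings internally; the series x and the column labels are assumptions for illustration only.

```r
# Toy series; the values are arbitrary and only for illustration.
x <- c(10, 20, 30, 40, 50, 60)
E <- 3

# stats::embed() returns one row per time t with columns
# x(t), x(t-1), ..., x(t-E+1), i.e. the variable plus E-1 lags.
emb <- embed(x, dimension = E)
colnames(emb) <- paste0("x(t-", 0:(E - 1), ")")

# The largest lag present in an E-dimensional embedding is E-1.
max_lag <- E - 1

print(emb)
print(max_lag)
```

The first E-1 time points are lost because their lagged values do not exist, so the embedded block has length(x) - E + 1 rows.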

D defines the dimension of the multiviews. If not set, it defaults to the number of columns. For example, if D = 2, columns = "x_t y_t z_t", then the constructed views have D = 2 variables, selected from the 3*E available.

Also, please note that there is no guarantee that all lags will be present in the selected views. That is, setting E = 3, which embeds each column vector into a 3-dimensional state-space (the addition of 2 lags), does not mean that one will see X(t-2) in the top views.
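To make the relationship between E, D and the number of columns concrete, the following base R sketch enumerates the candidate coordinates and the possible D-dimensional views for the D = 2, columns = "x_t y_t z_t" case, assuming E = 3. It only mimics the naming seen in the View output and counts combinations; it does not call rEDM, and the variable names candidates and views are hypothetical.

```r
columns <- c("x_t", "y_t", "z_t")
E <- 3   # time-delay embedding dimension of each column
D <- 2   # number of coordinates in each view

# Candidate coordinates: every column at lags 0 .. E-1,
# labelled in the style of the View output, e.g. "x_t(t-0)".
candidates <- paste0(rep(columns, each = E),
                     "(t-", rep(0:(E - 1), times = length(columns)), ")")

# Every possible view is a choice of D of the length(columns) * E candidates.
views <- combn(candidates, D)

print(candidates)
print(ncol(views))   # number of possible D-dimensional views
```

With 3 columns and E = 3 there are 9 candidates, so choose(9, 2) = 36 possible 2-dimensional views. Only the best-ranked ones are reported, which is why a particular lag such as t-2 may not show up in the top views.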

Your first example has 2-dimensional views, since the number of columns is 2 and D is not specified:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190",
           E = 3, columns = "x_t y_t", target = "x_t" )

Your second example has 3-D views for the same reason:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
           E = 3, columns = "x_t y_t z_t", target = "x_t" )

In the third example, you set E = 3 with 4 columns:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
           E = 3, columns = "x_t y_t z_t y_t-1", target = "x_t" )

With D not specified, D = 4. However, it appears that E = 3 for the time-delay embeddings is not being honored. This is the bug you have identified, which has been addressed in version 1.7.5.

Your other question:
Why is the maximum lag set to E-1? I would like the possibility of constructing a model in which each view is e.g. of size 2 and the maximum lag is 3. That is, I would like to have the possibility of choosing the dimension of the views and the maximum lag separately. Is that possible?

Yes, that is (now) possible. An example:

> Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D = 2, E = 4, columns = "x_t y_t z_t", target = "x_t", excludeTarget = FALSE ) $ View
Multiview() Set view sample size to 8
  col_1 col_2   name_1   name_2    rho    MAE   RMSE
1     1     2 x_t(t-0) x_t(t-1) 0.9338 0.2323 0.2991
2     1     9 x_t(t-0) z_t(t-0) 0.8866 0.2716 0.3748
3     1    11 x_t(t-0) z_t(t-2) 0.8946 0.2833 0.3638
4     1     4 x_t(t-0) x_t(t-3) 0.9137 0.2520 0.3272
5     1     8 x_t(t-0) y_t(t-3) 0.8671 0.3126 0.4042
6     1     3 x_t(t-0) x_t(t-2) 0.9241 0.2463 0.3137
7     1     7 x_t(t-0) y_t(t-2) 0.8692 0.3212 0.3991
8     1    10 x_t(t-0) z_t(t-1) 0.9082 0.2770 0.3444

Thank you again for your diligence and analysis of rEDM!

JP

duriah · 2021-01-14T07:21:06Z

Hi Joseph,
Thanks for the fast and thorough reply and that you already fixed it! This will really help me 👍

Best
Uriah

SoftwareLiteracy closed this as completed Jan 22, 2021
