Multiview() chooses the maximum lag based on the number of predictor columns used, regardless of E. #44

Closed
duriah opened this issue Jan 11, 2021 · 2 comments

Comments

@duriah

duriah commented Jan 11, 2021

The rEDM tutorial (https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.pdf) describes the function Multiview as follows: Multiview() operates by constructing all possible embeddings of dimension E with lag up to E-1.

I have noticed that the function behaves oddly in that regard: it sets the maximum lag equal to the number of predictor columns used, regardless of other parameter values. The following examples show this:

example 1: 2 predictor columns and E=3

library(rEDM)
data(block_3sp)

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2   name_1   name_2      rho    MAE   RMSE
1     1     2 x_t(t-0) x_t(t-1)  0.92240 0.2548 0.3259
2     1     3 x_t(t-0) y_t(t-0)  0.86910 0.3214 0.4187
3     1     4 x_t(t-0) y_t(t-1)  0.90120 0.2883 0.3663
4     2     3 x_t(t-1) y_t(t-0)  0.73710 0.4535 0.5764
5     2     4 x_t(t-1) y_t(t-1)  0.65050 0.5242 0.6892
6     3     4 y_t(t-0) y_t(t-1) -0.01939 0.8524 1.0530

As can be seen, the lag is indeed E-1=2, but the dimension of each view is 2 and not 3.

example 2: 3 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     7 x_t(t-0) x_t(t-1) z_t(t-0) 0.9208 0.2485 0.3164
2     1     2     6 x_t(t-0) x_t(t-1) y_t(t-2) 0.8677 0.3294 0.4113
3     1     2     3 x_t(t-0) x_t(t-1) x_t(t-2) 0.9319 0.2277 0.2934
4     1     2     8 x_t(t-0) x_t(t-1) z_t(t-1) 0.9183 0.2476 0.3205
5     1     7     9 x_t(t-0) z_t(t-0) z_t(t-2) 0.8858 0.3031 0.3738
6     1     4     9 x_t(t-0) y_t(t-0) z_t(t-2) 0.7774 0.4191 0.5116

In this case the dimension of each view is indeed 3, but the max lag is 3 rather than E-1=2.

example 3: 4 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t y_t-1", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2 col_3 col_4   name_1   name_2   name_3     name_4    rho    MAE   RMSE
1     1     2     7    14 x_t(t-0) x_t(t-1) y_t(t-2) y_t-1(t-1) 0.8602 0.3372 0.4208
2     1     2     4     9 x_t(t-0) x_t(t-1) x_t(t-3)   z_t(t-0) 0.8766 0.2989 0.3881
3     1     2     9    11 x_t(t-0) x_t(t-1) z_t(t-0)   z_t(t-2) 0.8781 0.2955 0.3858
4     1     2     8    15 x_t(t-0) x_t(t-1) y_t(t-3) y_t-1(t-2) 0.8546 0.3188 0.4201
5     1     2     9    16 x_t(t-0) x_t(t-1) z_t(t-0) y_t-1(t-3) 0.8849 0.2992 0.3751
6     1     2     3    16 x_t(t-0) x_t(t-1) x_t(t-2) y_t-1(t-3) 0.8639 0.3134 0.4062

In this case neither the embedding dimension nor the maximum lag makes sense: each view is of dimension 4 and the maximum lag is also 4.

The pattern is quite obvious: Embedding dimension = maximum lag = number of columns.

I don't think this is intended. Following the description, it should be: maximum lag = embedding dimension - 1, with both the maximum lag and E independent of the number of predictor columns (to an extent).
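Rather than reading the lags off by eye, the largest time offset in a view can be checked by parsing the name_* entries. The following is a minimal base-R sketch, assuming the names follow the `variable(t-k)` pattern seen in the outputs above; `view_names` is copied from row 5 of example 3, and the variable names `offsets`, `max_offset` and `n_coords` are only illustrative.

```r
# Names taken from row 5 of the example 3 output above
view_names <- c("x_t(t-0)", "x_t(t-1)", "z_t(t-0)", "y_t-1(t-3)")

# Extract k from the trailing "(t-k)"; the greedy ".*" keeps a column
# name such as "y_t-1" intact and only captures the final offset
offsets <- as.integer(sub(".*\\(t-([0-9]+)\\)$", "\\1", view_names))

max_offset <- max(offsets)     # largest offset k in this view
n_coords   <- max_offset + 1   # delay coordinates t-0 ... t-k implied by it

print(offsets)
print(max_offset)
```

Here the largest offset is t-3, which implies 4 delay coordinates per variable, more than the requested E = 3.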

I am also confused regarding the 2 arguments E (embedding dimension) and D (multivariate dimension). What exactly is the difference? D seems to override E, meaning that if D is set, the value of E has no influence and the maximum lag is still chosen based on the number of predictor columns. Example:

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Max lag is 2 (= number of predictors), and 3 variables are used for each view (= D).

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 4, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Same as above, changing E did nothing.

Also, why is the maximum lag set to E-1? I would like the possibility of constructing a model in which each view is e.g. of size 2 and the maximum lag is 3. That is, I would like to have the possibility of choosing the dimension of the views and the maximum lag separately. Is that possible?

Thank you and best regards,
Uriah

sessionInfo(package = "rEDM")

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.1252  LC_CTYPE=English_Switzerland.1252    LC_MONETARY=English_Switzerland.1252
[4] LC_NUMERIC=C                         LC_TIME=English_Switzerland.1252    

attached base packages:
character(0)

other attached packages:
[1] rEDM_1.7.3

loaded via a namespace (and not attached):
 [1] compiler_4.0.3  graphics_4.0.3  htmltools_0.5.0 tools_4.0.3     utils_4.0.3     yaml_2.2.1      grDevices_4.0.3
 [8] Rcpp_1.0.5      stats_4.0.3     datasets_4.0.3  rmarkdown_2.6   knitr_1.30      methods_4.0.3   xfun_0.20      
[15] digest_0.6.27   rlang_0.4.10    base_4.0.3      evaluate_0.14  
@SoftwareLiteracy
Contributor

Dear Uriah,

Thank you for your use and analysis of rEDM.

Having looked at your examples, it appears you have identified a bug where the value of E is not properly handled. Thank you!

I have found and corrected the problem. The fix is uploaded to github as version 1.7.5.

Regarding parameters E and D:

E defines the dimension of the time-delay embedding for the state-space of each variable. Since the first dimension is the variable itself, there are E-1 additional dimensions added by successive time delays.
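As an illustration of what E means for a single variable, here is a minimal sketch using base R's `embed()` from the stats package. It is not rEDM's internal code, and the toy series `x` is made up for the example.

```r
# Toy series, invented for illustration only
x <- c(10, 20, 30, 40, 50)
E <- 3

# stats::embed() returns one row per usable time point, with columns
# x(t), x(t-1), ..., x(t-(E-1))
emb <- stats::embed(x, dimension = E)
colnames(emb) <- paste0("x(t-", 0:(E - 1), ")")

max_lag <- E - 1   # E coordinates imply a maximum lag of E-1

print(emb)
```

With E = 3 the embedded block has three columns, x(t-0), x(t-1) and x(t-2), and loses the first E-1 time points of the series.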

D defines the dimension of the multiviews. If not set, it defaults to the number of columns. For example, if D = 2, columns = "x_t y_t z_t", then the constructed views have D = 2 variables, selected from the 3*E available.

Also, please note that there is no guarantee that all lags will be present in the selected views. That is, setting E = 3, which embeds each column vector into a 3-dimensional state-space (the addition of 2 lags), does not mean that one will see X(t-2) in the top views.
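The relationship between E, D and the pool of candidate views can be sketched in base R as follows. This only enumerates the combinations; it is an assumption-laden illustration rather than the package's implementation, and Multiview() may filter, rank or subsample the candidates differently.

```r
columns <- c("x_t", "y_t", "z_t")
E <- 3   # delay coordinates per column
D <- 2   # variables per view

lags <- 0:(E - 1)

# Label each embedded coordinate the way the View table prints it,
# ordered column by column: x_t(t-0), x_t(t-1), x_t(t-2), y_t(t-0), ...
labels   <- outer(columns, lags, function(v, l) paste0(v, "(t-", l, ")"))
embedded <- as.vector(t(labels))

# Every D-sized subset of the 3*E embedded coordinates is a candidate view
views <- t(combn(embedded, D))

print(length(embedded))   # 3 * E coordinates
print(nrow(views))        # choose(3 * E, D) candidate views
print(head(views))
```

Only the best-ranked views are reported, which is why a particular lag such as (t-2) may be absent from the top of the table even though it is in the candidate pool.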

Your first example has 2-dimensional views, since the number of columns is 2 and D is not specified:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190",
           E = 3, columns = "x_t y_t", target = "x_t" )

Your second example has 3-D views for the same reason:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
           E = 3, columns = "x_t y_t z_t", target = "x_t" )

In the third example, you set E = 3 with 4 columns:

Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
           E = 3, columns = "x_t y_t z_t y_t-1", target = "x_t" )

With D not specified, D = 4, but it appears that E = 3 for the time-delay embeddings is not being honored. This is the bug you have identified, which has been addressed in version 1.7.5.

Your other question:
Why is the maximum lag set to E-1? I would like the possibility of constructing a model in which each view is e.g. of size 2 and the maximum lag is 3. That is, I would like to have the possibility of choosing the dimension of the views and the maximum lag separately. Is that possible?

Yes, that is (now) possible. An example:

> Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D = 2, E = 4, columns = "x_t y_t z_t", target = "x_t", excludeTarget = FALSE ) $ View
Multiview() Set view sample size to 8
  col_1 col_2   name_1   name_2    rho    MAE   RMSE
1     1     2 x_t(t-0) x_t(t-1) 0.9338 0.2323 0.2991
2     1     9 x_t(t-0) z_t(t-0) 0.8866 0.2716 0.3748
3     1    11 x_t(t-0) z_t(t-2) 0.8946 0.2833 0.3638
4     1     4 x_t(t-0) x_t(t-3) 0.9137 0.2520 0.3272
5     1     8 x_t(t-0) y_t(t-3) 0.8671 0.3126 0.4042
6     1     3 x_t(t-0) x_t(t-2) 0.9241 0.2463 0.3137
7     1     7 x_t(t-0) y_t(t-2) 0.8692 0.3212 0.3991
8     1    10 x_t(t-0) z_t(t-1) 0.9082 0.2770 0.3444

Thank you again for your diligence and analysis of rEDM!

JP

@duriah
Author

duriah commented Jan 14, 2021

Hi Joseph,
Thanks for the fast and thorough reply, and for already fixing it! This will really help me 👍

Best
Uriah
