version 1.15 dealing with NaN #62
Comments
Perhaps a little background for context would be helpful. In Simplex and its derivatives (EmbedDimension, CCM, ...), nan in the data are passed through the entire | embedding : knn : projection | pipeline. As a result, any nan in the data are automatically carried into the library, excluded from prediction, and properly represented in the output.

SMap embeds the data, then creates a linear system matrix that is solved with a LAPACK/BLAS SVD. LAPACK does not allow nan. In versions 1.14 and earlier, time series rows that contained nan were removed prior to the SVD. This effectively prevented any library vectors with nan, but it also created gaps in the output and raised the question of whether the Takens embedding remains theoretically valid.

The SMap ignoreNan parameter is new in version 1.15; it adjusts the library to ignore all embedding vectors with nan. This should properly represent the output with nan as appropriate, rather than returning gaps in the output as the previous method did.

So the answer to the first question is no: version 1.15 SMap does not handle nan in the same way as versions 1.14 and earlier. The answer to the second question is yes: Simplex-based functions ignore nan, although not by redefining the library. Since the numerical computations are all internal and nan are carried through | embedding : knn : projection |, any projection influenced by a nan will return nan.

On a related note, one can also consider handling missing data with "bundle embedding".
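To make the pipeline description a bit more concrete, here is a minimal base-R sketch of how a single nan spreads into E embedding vectors, and why those rows have to be dropped before an SVD. The helper `embed_lagged` is hypothetical and not part of rEDM, the series is a stand-in rather than the circle data, and backward lags of one step are assumed. It only illustrates the idea and is not how rEDM is implemented internally.

```r
# Hypothetical helper (not part of rEDM): build an E-dimensional
# time-delay embedding using backward lags of size tau.
embed_lagged <- function(x, E = 2, tau = 1) {
  n <- length(x)
  # Column j holds x shifted back by j * tau steps (padded with NA)
  emb <- sapply(0:(E - 1), function(j) {
    lag <- j * tau
    c(rep(NA_real_, lag), x[seq_len(n - lag)])
  })
  colnames(emb) <- paste0("x(t-", 0:(E - 1) * tau, ")")
  emb
}

x <- sin(seq(0, 2, length.out = 20))  # stand-in series
x[10] <- NaN                          # one missing observation

emb <- embed_lagged(x, E = 2, tau = 1)

# Embedding rows contaminated by the single NaN (E = 2 rows)
nan_rows <- which(apply(emb, 1, function(v) any(is.nan(v))))
print(nan_rows)

# Base R svd() rejects non-finite input, so rows with missing values
# must be removed before a linear system can be solved.
svd_fails <- inherits(try(svd(emb), silent = TRUE), "try-error")
clean <- emb[rowSums(is.na(emb)) == 0, , drop = FALSE]
singular_values <- svd(clean)$d
```

With E = 2 the NaN at index 10 appears in the embedding vectors at rows 10 and 11. Dropping rows is what creates the gaps mentioned above, which is the behavior that ignoreNan is meant to replace by adjusting the library instead.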
Thank you so much for clarifying how the different versions handle NaN in Simplex and S-map.
It is a bit complex, since E, Tp, and tau all influence the availability of valid embedding vectors in response to a NaN. Simply put, when a NaN is present, no prediction should be made with a library vector that has a NaN neighbor (a function of E, tau) or where Tp would include a vector with a NaN component. Recall that projections are made by taking neighbors projected Tp time steps ahead (behind) in Simplex, while all neighbors are used in SMap. Perhaps some examples can illustrate.

Insert a NaN into observation x[10]:

library( rEDM )
df = circle
dim( df )
[1] 200 3
head( df, 2 )
Time x y
1 1 0.0000 1.000
2 2 0.0631 0.998
df $ x[10] = NaN
df[ 8:12, ]
Time x y
8 8 0.4278 0.9039
9 9 0.4840 0.8751
10 10 NaN 0.8428
11 11 0.5903 0.8072
12 12 0.6401 0.7683

Simplex

Simplex prediction with E=2, Tp=1 and a library that includes the NaN observation. Note that Time 11 & 12 do not have a prediction, since Tp = 1 and E = 2. The missing prediction at Time 9 is likely due to a neighbor that included the NaN as a component of its embedding vector.

> Simplex( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'x', E = 2, Tp = 1 )
Time Observations Predictions Pred_Variance
1 5 0.2499 NaN NaN
2 6 0.3105 0.2451 0.011215
3 7 0.3699 0.3056 0.010833
4 8 0.4278 0.3648 0.010375
5 9 0.4840 NaN NaN
6 10 NaN 0.4183 0.002957
7 11 0.5903 NaN NaN
8 12 0.6401 NaN NaN
9 13 0.6873 0.6162 0.008243
10 14 0.7318 0.6816 0.006449
11 15 0.7733 0.7260 0.005701
12 16 0.8118 0.7676 0.004952

In the case of Tp = -1, we expect Time 9 & 10 to not have a prediction with E = 2:

> Simplex( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'x', E = 2, Tp = -1 )
Time Observations Predictions Pred_Variance
1 4 0.1883 0.2034 0.0030564
2 5 0.2499 0.2648 0.0029724
3 6 0.3105 0.3251 0.0028645
4 7 0.3699 0.3841 0.0027358
5 8 0.4278 0.4500 0.0040664
6 9 0.4840 NaN NaN
7 10 NaN NaN NaN
8 11 0.5903 0.6460 0.0003792
9 12 0.6401 0.6518 0.0018654
10 13 0.6873 0.6983 0.0016638
11 14 0.7318 0.7420 0.0014626
12 15 0.7733 NaN NaN

Perhaps this is clearer in the case where the observation (target) does not have a NaN but the library still does. Here we use target = 'y':

> Simplex( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'y', E = 2, Tp = 1 )
Time Observations Predictions Pred_Variance
1 5 0.9683 NaN NaN
2 6 0.9506 0.3156 0.8291
3 7 0.9291 0.3042 0.8033
4 8 0.9039 0.2916 0.7716
5 9 0.8751 0.2779 0.7345
6 10 0.8428 -0.2784 0.7446
7 11 0.8072 NaN NaN
8 12 0.7683 NaN NaN
9 13 0.7264 -0.2757 0.5360
10 14 0.6815 0.1936 0.4916
11 15 0.6340 0.1740 0.4369
12 16 0.5839 0.1539 0.3822

SMap

SMap is a bit different since all library vectors are processed (but localized with theta), and the SVD solver does not allow NaN. The cross mapping example with SMap:

> SMap( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'y', theta = 2, E = 2, Tp = 1 ) [['predictions']]
Time Observations Predictions Pred_Variance
1 5 0.9683 NaN NaN
2 6 0.9506 0.9506 1.9172
3 7 0.9291 0.9289 1.9033
4 8 0.9039 0.9044 1.8924
5 9 0.8751 0.8750 1.8894
6 10 0.8428 0.8428 1.7217
7 11 0.8072 NaN NaN
8 12 0.7683 NaN NaN
9 13 0.7264 0.7270 1.3985
10 14 0.6815 0.6811 1.1857
11 15 0.6340 0.6346 1.0133
12 16 0.5839 0.5829 0.8618

Prior to version 1.15 and

Create a validLib vector. Recall that df $ x[10] is nan, so the initial validLib is FALSE at index 10; index 11 is then also set to FALSE:

> validLib = !is.nan(df $ x)
> validLib[11] = FALSE
> validLib[5:15]
[1] 1 1 1 1 1 0 0 1 1 1 1

Now using validLib with ignoreNan = FALSE:

> SMap( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'y', theta = 2, E = 2, Tp = 1,
validLib = validLib, ignoreNan = FALSE ) [['predictions']]
Time Observations Predictions Pred_Variance
1 5 0.9683 NaN NaN
2 6 0.9506 0.9506 1.7717
3 7 0.9291 0.9289 1.7709
4 8 0.9039 0.9044 1.7573
5 9 0.8751 0.8750 1.7315
6 10 0.8428 0.8428 1.7055
7 11 0.8072 NaN NaN
8 12 0.7683 NaN NaN
9 13 0.7264 0.7270 1.3448
10 14 0.6815 0.6811 1.1575
11 15 0.6340 0.6346 0.9989
12 16 0.5839 0.5829 0.8551

Whereas if one uses ignoreNan = FALSE without validLib:

> SMap( dataFrame = df, lib = '1 50', pred = '5 15',
columns = 'x', target = 'y', theta = 2, E = 2, Tp = 1,
ignoreNan = FALSE ) [['predictions']]
Time Observations Predictions Pred_Variance
1 5 0.9683 NaN NaN
2 6 0.9506 NaN NaN
3 7 0.9291 NaN NaN
4 8 0.9039 NaN NaN
5 9 0.8751 NaN NaN
6 10 0.8428 NaN NaN
7 11 0.8072 NaN NaN
8 12 0.7683 NaN NaN
9 13 0.7264 NaN NaN
10 14 0.6815 NaN NaN
11 15 0.6340 NaN NaN
12 16 0.5839 NaN NaN

For a peek under the hood, the code that actually creates the library vector is here:

While the SMap code to adjust

Perhaps the warning issued by SMap, "Time delay embedding presumption violated.", is a bit extreme, since whether or not the embedding violates the Takens presumption is not absolute for a specific prediction.
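The bookkeeping behind these examples can be sketched in base R. The helper `nan_bookkeeping` below is hypothetical and not an rEDM function. It assumes backward lags of one step, so that the embedding vector at time t is (x[t], x[t-1], ..., x[t-(E-1)]), and that the prediction for time t + Tp is made from the vector at time t. It does not model the additional loss that occurs when a selected neighbor or its target contains a NaN.

```r
# Hypothetical helper (not part of rEDM). Assumes backward lags of one step.
nan_bookkeeping <- function(x, E = 2, Tp = 1) {
  n <- length(x)
  nan_idx <- which(is.nan(x))
  # A NaN at index i contaminates the embedding vectors at i, ..., i + E - 1
  bad_vec <- sort(unique(as.integer(unlist(
    lapply(nan_idx, function(i) i + 0:(E - 1))
  ))))
  bad_vec <- bad_vec[bad_vec <= n]
  valid <- rep(TRUE, n)
  valid[bad_vec] <- FALSE
  # Predictions made from a contaminated vector land Tp steps away
  list(validLib = valid, lostPredTimes = bad_vec + Tp)
}

x <- seq(0, 1, length.out = 200)  # stand-in series, not the circle data
x[10] <- NaN

fwd <- nan_bookkeeping(x, E = 2, Tp = 1)
bwd <- nan_bookkeeping(x, E = 2, Tp = -1)

print(as.integer(fwd$validLib[5:15]))  # 1 1 1 1 1 0 0 1 1 1 1
print(fwd$lostPredTimes)               # 11 12
print(bwd$lostPredTimes)               # 9 10
```

Under these assumptions the mask matches the validLib shown above, and the lost prediction times match the Simplex output for Tp = 1 (Time 11 & 12) and Tp = -1 (Time 9 & 10).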
Thank you so much, especially for the examples!
Hello!
I have a few questions about missing data (NaN) in EDM functions such as Simplex, CCM and S-map.
I understand that for Takens' theorem to work, continuity of the data is important for reconstructing the shadow manifold.
Unfortunately, my data have some gaps/missing data points between trials, so it would be helpful to know how I could avoid this problem in rEDM.
Questions:
(1) The release note for rEDM version 1.15 mentions:
"SMap() ignoreNan parameter added. If ignoreNan is TRUE (default) the library is redefined to ignore embedding vectors with nan.
If ignoreNan is FALSE no change is made, the user can manually specify library segments in lib."
I also found a note in rEDM version 1.2.3 that says:
"Missing data can be recorded using either of the standard NA or NaN values. The program will automatically ignore such missing values when appropriate. For instance, simplex projection will not select nearest neighbors if any of the state vector coordinates is missing or if the corresponding target value is missing."
I am wondering whether the S-map ignoreNan parameter in version 1.15 works the same way as version 1.2.3 did, that is, simply not selecting nearest neighbors if any of the state vector coordinates is missing or if the corresponding target value is missing?
(2) Does rEDM version 1.15 also ignore NaN (like version 1.2.3) for Simplex, EmbedDimension and CCM?