
Random Error in Function Bsptime #8

Closed
Hallo951 opened this issue Jun 27, 2022 · 12 comments

Comments

@Hallo951

Hallo951 commented Jun 27, 2022

I looked at it all again, and it seems the total time is not the problem after all (Issue). What the actual problem is, I don't know. However, I have observed that running the "Bsptime" function with the same data and settings in a loop sporadically produces errors.

Here is a minimal example with my data where the error occurs from time to time:

My data: data.csv

data <- read.csv("S:/data.csv")  # Customize path

N <- 2000
nBurn <- 1000
f <- as.formula(y_spec ~ x_precipitation_HYRAS_std + x_DGM1 + x_air_temperature_mean_std * x_radiation_global_mean + x_air_temperature_mean_std * x_Auenlehmmaechtigkeit)
plotit <- FALSE
mchoice <- FALSE
vrows <- c(13,14,15,16,17,18,43,44,45,46,47,48,61,62,63,64,65,66,79,80,81,82,83,84,91,92,93,94,95,96,127,128,129,130,131,132,133,134,135,136,137,138,157,158,159,160,161,162,181,182,183,184,185,186,193,194,195,196,197,198,217,218,219,220,221,222,253,254,255,256,257,258,289,290,291,292,293,294,295,296,297,298,299,300,319,320,321,322,323,324,337,338,339,340,341,342,349,350,351,352,353,354,367,368,369,370,371,372,379,380,381,382,383,384,409,410,411,412,413,414,421,422,423,424,425,426,457,458,459,460,461,462,517,518,519,520,521,522,535,536,537,538,539,540,541,542,543,544,545,546,565,566,567,568,569,570,571,572,573,574,575,576,589,590,591,592,593,594,601,602,603,604,605,606,613,614,615,616,617,618,655,656,657,658,659,660,667,668,669,670,671,672,709,710,711,712,713,714,721,722,723,724,725,726,727,728,729,730,731,732,823,824,825,826,827,828,853,854,855,856,857,858,865,866,867,868,869,870,901,902,903,904,905,906,907,908,909,910,911,912,919,920,921,922,923,924,955,956,957,958,959,960,997,998,999,1000,1001,1002,1009,1010,1011,1012,1013,1014,1033,1034,1035,1036,1037,1038,1087,1088,1089,1090,1091,1092,1105,1106,1107,1108,1109,1110)
coords <- as.matrix(data.frame(x = c(314456.1,305362.6,308393.7,311424.9),y = c(5692110,5695567,5695567,5695567)))
n.report <- 1
package <- "spTimer"
model <- "GPP"
coordtype <- "plain"
coords_column <- which(colnames(data) %in% c("coord_x","coord_y"))
scale.transform <- "NONE"
                        
                        
foreach(a = 1:10) %do% {
    Bsptime(package = package, model = model, knots.coords = coords,
            formula = f, data = data, n.report = n.report,
            coordtype = coordtype, coords = coords_column,
            N = N, burn.in = nBurn, mchoice = mchoice,
            scale.transform = scale.transform,
            validrows = vrows, plotit = plotit)
}

Note: The error occurs randomly. If you do not get the error immediately, just increase the number of loop passes.

Here is a screenshot of a minimal example with 3 passes with the error:

[screenshot]

You can see that the model ran twice without error and only produced the error on the third pass. All settings and data are the same!

Here is a screenshot of the same function being executed twice outside a loop (same data and settings), once without error and once with the error:

[screenshot]

What is very strange is that sometimes the model works without error and sometimes it doesn't... or is it my data?

Maybe the function calls must not follow each other so quickly, because something is still being processed in the background. I do not know...

Please take a look at this; otherwise I can't use your wonderful package, because it doesn't run reliably.

My system:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)

Version bmstdr: 0.2.2

@Hallo951
Author

I have tested this with a pause inside the foreach loop, but unfortunately it has no positive effect on the random error. So the problem must lie somewhere else...
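
For reference, the pause test was essentially the loop from the first post with a Sys.sleep() call added; the pause length of 10 seconds is just the value I happened to pick:

library(bmstdr)
library(foreach)

# Same data and settings as in the first post
foreach(a = 1:10) %do% {
    fit <- Bsptime(package = package, model = model, knots.coords = coords,
                   formula = f, data = data, n.report = n.report,
                   coordtype = coordtype, coords = coords_column,
                   N = N, burn.in = nBurn, mchoice = mchoice,
                   scale.transform = scale.transform,
                   validrows = vrows, plotit = plotit)
    Sys.sleep(10)  # pause before the next pass
    fit
}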

@Hallo951
Author

I have now traced the error back to the call chain "Bsptime/BspTimer_sptime/spTimer::spT.Gibbs". It seems that the underlying package 'spTimer' causes the error...

@Hallo951
Author

I extracted the call that causes the random error from the 'Bsptime/BspTimer_sptime' function and set it up with the same settings and data from my first post. I think this should help narrow down the error.

library(spTimer)

data <- read.csv("S:/data.csv")  # Customize path

results <- spTimer::spT.Gibbs(formula = as.formula("y_spec ~ x_precipitation_HYRAS_std + x_DGM1 + x_air_temperature_mean_std + 
    x_radiation_global_mean + x_Auenlehmmaechtigkeit + x_air_temperature_mean_std:x_radiation_global_mean + 
    x_air_temperature_mean_std:x_Auenlehmmaechtigkeit"), 
                              data = data, 
                              model = "GPP", 
                              coords = as.matrix(unique(data[, which(colnames(data) %in% c("coord_x","coord_y"))])), 
                              nItr = 2000, 
                              nBurn = 1000,
                              distance.method = "euclidean", 
                              priors = spTimer::spT.priors(model = "GPP", inv.var.prior = Gamm(2,1), beta.prior = Norm(0, 1e-04^(-1))),
                              spatial.decay = spTimer::spT.decay(distribution = Gamm(2,1), 0.03), 
                              scale.transform = "NONE",
                              knots.coords = as.matrix(data.frame(x = c(314456.1,305362.6,308393.7,311424.9),y = c(5692110,5695567,5695567,5695567))), 
                              cov.fnc = "exponential", 
                              tol.dist = 0.005, 
                              time.data = NULL, 
                              newcoords = NULL,
                              newdata = NULL, 
                              truncation.para = list(at = 0, lambda = 2),
                              annual.aggrn = "NONE", 
                              report = 1)

I think this behavior of the 'spTimer' package occurs with all models, not just the GPP model. But I didn't test that, because the GPP model currently gives the best results with my data.

Here is a short summary of what I want to use your package for.

I am trying to build habitat models for different plants. The uploaded example dataset for the plant 'Allium ursinum' shows what kind of data I have available. The dependent variable is always in the range 0 <= y <= 1 (equivalently, 0% to 100% cover). The model I am currently using is the GPP model, as it has so far given the best results with a reasonable computation time. The goal is to find a regression model with which I can make spatial and/or temporal predictions.

I am not sure whether the truncated GPP model would be better for my kind of data. I did some tests with it, but the results were always worse than with the GPP model. Also, in the current implementation of the truncated GPP model you can only define one boundary, not two. For these reasons I decided to stay with the GPP model for now. What do you advise?

Based on your book and on other habitat models I have already created, my current procedure for identifying the 'best' regression model is as follows:

  1. Create different regression formulas.
  2. Create different knot configurations based on a regular grid within the study area.
  3. Create a k-fold cross-validation dataset for validation.
  4. Create models for every combination of regression formula, knot configuration and cross-validation fold.
  5. Calculate the mean of all validation statistics over the cross-validation models, per regression formula and knot configuration.
  6. Select the best model according to the smallest mean MAE and/or mean RMSE (would the CRPS statistic possibly be better for this selection?).
  7. Use the selected best model for spatial/temporal prediction.

For the implementation I use several nested loops in which I create a variety of models with your package inside a tryCatch() block, compare them, and select the best one afterwards (roughly as in the sketch below). And exactly at this point, despite the tryCatch() block, I always get a random error, so the outermost loop aborts and the whole script terminates.
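
A simplified sketch of what these loops look like, reusing the objects from my first post (the candidate formula and knot lists here are placeholders, not my real configurations):

library(bmstdr)

formulas  <- list(f)        # placeholder: my candidate regression formulas
knot_sets <- list(coords)   # placeholder: my candidate knot configurations

results <- list()
for (i in seq_along(formulas)) {
  for (k in seq_along(knot_sets)) {
    fit <- tryCatch(
      Bsptime(package = "spTimer", model = "GPP",
              formula = formulas[[i]], data = data,
              knots.coords = knot_sets[[k]], coordtype = "plain",
              coords = coords_column, N = N, burn.in = nBurn,
              validrows = vrows, mchoice = FALSE, plotit = FALSE),
      error = function(e) {
        message("Formula ", i, ", knots ", k, " failed: ", conditionMessage(e))
        NA
      }
    )
    results[[paste(i, k, sep = "_")]] <- fit
  }
}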

@Hallo951
Author

Hallo951 commented Jun 30, 2022

I think I am beginning to understand the problem. It looks like a classic convergence problem of the regression fit that is not properly caught/handled, i.e. the respective regression model does not find a solution for the given data. I don't know the correct technical term for this in Bayesian statistics; I am not well versed in this branch of statistics.

My hypothesis is based on the example and the dataset from the first post. For testing, I scaled the independent variables in the dataset (standardization using the mean and standard deviation) and then ran the example from the first post in a loop 100,000 times. Result: not a single error!

After that I extended the loop with a second loop over different model formulas. Unfortunately this test was not successful, meaning that one or more models inside the loop do not converge properly (do not find a solution), which causes an error and terminates the whole loop.

If my hypothesis is true, a quick solution would be a work-around that catches the non-converging regression model from the "spTimer" package (and, I think, the other packages as well), prints an appropriate message, and returns NA as the result, thereby preventing the current error.

This work-around could be extended so that a non-converging regression model is retried x times, and only then is a message about the non-converging model printed and NA returned (see the sketch below).
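
Roughly what I have in mind (fit_with_retry() is just an illustrative name, not the actual code in my fork):

library(bmstdr)

# Try the model fit up to 'max_tries' times; if it never succeeds,
# print a message and return NA instead of aborting the calling loop.
fit_with_retry <- function(max_tries = 3, ...) {
  for (attempt in seq_len(max_tries)) {
    fit <- tryCatch(Bsptime(...), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  message("Model did not converge after ", max_tries, " attempts; returning NA.")
  NA
}

It would be called with the usual Bsptime() arguments from the first post.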

@Hallo951
Author

Hallo951 commented Jul 1, 2022

I have forked your repository and implemented the work-around described above. I am currently testing whether it works. I won't have results until Monday... If it works, you can take a look at my commits and decide whether you want to adopt the work-around in the official repository...

@Hallo951
Author

Hallo951 commented Jul 5, 2022

Hello,

A very general question:

Does it make sense to transform (i.e. make more normal) the input data in advance using Tukey's Ladder of Powers, or is this not necessary for a Bayesian regression?

My tests have shown that standardization (scale() in R) has a positive influence on the model fitting, but what about a transformation?
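
To illustrate what I mean by the two options (base R only; the lambda grid and the assumption of a strictly positive predictor are just for the example):

x <- data$x_radiation_global_mean   # any strictly positive predictor (assumption)

# (1) Standardization, as in my tests so far:
x_std <- as.numeric(scale(x))       # (x - mean(x)) / sd(x)

# (2) Tukey's Ladder of Powers: pick the power lambda that makes x look
#     most normal, here judged by the Shapiro-Wilk statistic.
tukey <- function(x, lambda) {
  if (lambda > 0) x^lambda else if (lambda == 0) log(x) else -(x^lambda)
}
lambdas <- seq(-2, 2, by = 0.25)
w <- sapply(lambdas, function(l) shapiro.test(tukey(x, l))$statistic)
best_lambda <- lambdas[which.max(w)]
x_trans <- tukey(x, best_lambda)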

@sujit-sahu
Owner

Sorry, I have not been able to replicate the random error you reported in the foreach loop. Again, I ran the code on both my Windows and Linux machines. Could you please try running this example on a different machine and tell me whether the problem persists?

It will take me time to implement the power transformation. I will talk to you over email before adding this feature.

Please let me know if it is okay to close this issue.

@Hallo951
Author

Hallo951 commented Jul 8, 2022

I have tested the example from the first post on different computers, and the error appears every now and then when spTimer does not find a solution and returns NA values in the model object. This then causes various follow-up errors, which I now catch in the fork.
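
The check in my fork boils down to something like this sketch (illustrative only, not the actual fork code; I check the returned fit for NA values, here via the fitted component of the spT object, so adjust the check if the NAs show up elsewhere):

library(spTimer)

data   <- read.csv("S:/data.csv")  # Customize path
coords <- as.matrix(unique(data[, c("coord_x", "coord_y")]))
knots  <- as.matrix(data.frame(x = c(314456.1, 305362.6, 308393.7, 311424.9),
                               y = c(5692110, 5695567, 5695567, 5695567)))

fit <- tryCatch(
  spTimer::spT.Gibbs(formula = y_spec ~ x_precipitation_HYRAS_std + x_DGM1 +
                       x_air_temperature_mean_std * x_radiation_global_mean +
                       x_air_temperature_mean_std * x_Auenlehmmaechtigkeit,
                     data = data, model = "GPP", coords = coords,
                     knots.coords = knots, nItr = 2000, nBurn = 1000,
                     distance.method = "euclidean", tol.dist = 0.005,
                     scale.transform = "NONE", report = 1),
  error = function(e) NULL
)

# Treat a failed run or NA output as a non-converged model
if (is.null(fit) || anyNA(fit$fitted)) {
  message("spT.Gibbs returned NA output for this model; treating it as failed (NA).")
  fit <- NA
}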

Please have a look at the change in the fork that solves the problem. It works very well and should be included in the main code if possible, unless you have a better solution.

@sujit-sahu
Owner

Sorry, I am not able to see your fork or your solution. I did not see your pull request either. Could you please post it again? I will have a look.

@Hallo951
Author

Hallo951 commented Jul 8, 2022

Oh, my bad. I thought you could see the fork on GitHub. I will make a pull request...

@Hallo951
Author

Hallo951 commented Jul 8, 2022

Please take another look at the closed thread about the error in the prediction; I have added something there...

@sujit-sahu
Owner

I could not see that thread. Please feel free to open a new issue.
