
Strange bug in PLS algorithm #88

Closed
svkucheryavski opened this issue Jun 9, 2020 · 3 comments
svkucheryavski commented Jun 9, 2020

This code:

data(people)
set.seed(6)
people <- as.data.frame(people)
X <- people[, -4]
y <- people[,  4, drop = FALSE]
m <- pls(X, y, cv = 8)
m <- pls(X, y, cv = 8)
m <- pls(X, y, cv = 8)
m <- pls(X, y, cv = 8)
m <- pls(X, y, cv = 8)

This leads to the following error (only when the last command is executed):

Error in solve.default(crossprod(object$xloadings, object$weights)) : 
  system is computationally singular: reciprocal condition number = 2.74318e-32

The error occurs in predict.cal(), inside the cross-validation loop (when m.loc is used):

xscores <- x %*% (object$weights %*% solve(crossprod(object$xloadings, object$weights)))
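A minimal, hypothetical illustration (not the package's actual code) of why this call fails: if the loadings and weights are rank deficient, their crossproduct is singular and solve() aborts with exactly this kind of error:

```r
# Hypothetical illustration: solve() fails when the crossproduct of
# rank-deficient loadings and weights is singular.
P <- cbind(c(1, 2, 0), c(2, 4, 0), c(3, 6, 1))  # column 2 = 2 * column 1, rank 2
W <- P
A <- crossprod(P, W)  # t(P) %*% W is singular because P is rank deficient
res <- tryCatch(solve(A), error = function(e) conditionMessage(e))
# res holds an error message about a singular system instead of the inverse
```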
@svkucheryavski
Owner Author

It seems to be caused by too-small eigenvalues in the SIMPLS algorithm when cross-validation is applied, so the number of observations is smaller than for the calibration set. I will fix it by adding a check inside the cross-validation loop: if the number of components is too large, it will raise an error and ask the user to limit the number.

@klebyn
klebyn commented Jun 10, 2020

Perhaps the kappa function can help you by measuring how poorly conditioned the matrix is.
?kappa
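For example (a sketch using base R only), kappa() can flag a nearly singular crossproduct before solve() is attempted; the guard function below is a hypothetical helper, not part of any package:

```r
# Sketch: estimate the condition number with kappa() before calling solve().
set.seed(1)
A <- crossprod(matrix(rnorm(30), 10, 3))  # full rank, well conditioned
B <- A
B[, 3] <- B[, 1] + B[, 2]                 # make columns exactly collinear

kappa(A)  # moderate value: solve(A) is safe
kappa(B)  # enormous value: solve(B) would be unreliable or fail

# Hypothetical guard that could be applied before inverting:
is_invertible <- function(M, tol = 1e12) kappa(M) < tol
```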

@svkucheryavski
Owner Author

Thanks for the suggestion. Actually the problem was not in the line I mentioned in the first comment but a bit deeper, in the SIMPLS algorithm. When the maximum number of components, estimated or provided by the user, is too large (within the limits set by the number of observations and variables, but larger than the effective rank), it of course causes computational issues in the algorithm. There is already a corresponding check inside the algorithm; if this happens, the algorithm reduces the number of components and warns the user about it.

But if random cross-validation is used and some variables are discrete, one of the steps can produce a local calibration set in which all values of one or several such variables are constant. This further reduces the effective rank and leads to the error mentioned above. The situation is very unlikely, which is why all tests had passed until I started experimenting with some things.

I will simply add another check inside cross-validation and ask the user to limit the number of components in this case.
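A sketch of such a check (hypothetical helper name, not the actual mdatools code): detect constant columns in the local calibration subset, compute the effective rank of what remains, and stop with an informative error if the requested number of components exceeds it:

```r
# Hypothetical check for the cross-validation loop: constant columns in the
# local calibration subset reduce the effective rank, so the requested
# number of components must not exceed it.
check_ncomp_local <- function(X.loc, ncomp) {
  # columns whose values are (numerically) constant in this subset
  is_const <- apply(X.loc, 2, function(v) diff(range(v)) < .Machine$double.eps^0.5)
  # effective rank of the mean-centered subset without constant columns
  rank_eff <- qr(scale(X.loc[, !is_const, drop = FALSE], scale = FALSE))$rank
  if (ncomp > rank_eff) {
    stop("Number of components is too large for cross-validation, ",
         "limit it to ", rank_eff, " or fewer.", call. = FALSE)
  }
  invisible(TRUE)
}

# Example: ten observations, third variable constant in this subset
set.seed(42)
X.loc <- cbind(rnorm(10), rnorm(10), rep(1, 10))
check_ncomp_local(X.loc, 2)  # passes
# check_ncomp_local(X.loc, 3) would stop with the error above
```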

svkucheryavski added a commit that referenced this issue Jun 10, 2020