validate() for rms::ols: Error in lsfit(x, y) : only 0 cases, but 2 variables #52
Comments
Thanks for the report. There was a bug for

Updating to the Github version,

My guess is the same as before: one has to use special sampling to avoid the issue (one reading of that idea is sketched below). As someone on Cross Validated suggested:

> The behavior you saw is the intended behavior when the sample size does not support a large number of parameters. You'll need to reduce the number of parameters in the model.

How do you recommend that I validate models that contain a large number of logical predictors without running into this issue?

You have too many parameters in the model.

In addition: Warning messages:
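One reading of the "special sampling" idea, sketched under my own assumptions (rms does not expose anything like this; the function and names are hypothetical): force every bootstrap resample to keep at least one true case of each rare dummy.

```r
# Hypothetical stratified bootstrap index generator: each rare dummy is
# guaranteed at least one true row in every resample.
strat_boot_idx <- function(d, rare_cols) {
  keep <- vapply(rare_cols, function(cl) {
    ones <- which(d[[cl]] == 1)
    ones[sample.int(length(ones), 1)]   # pick one true row (safe for length 1)
  }, integer(1))
  c(keep, sample.int(nrow(d), nrow(d) - length(keep), replace = TRUE))
}
```

Note that this changes the resampling distribution, so optimism estimates computed from such resamples would be somewhat biased.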
I get a strange-sounding error when trying to use `validate()` on a fitted `ols`: `Error in lsfit(x, y) : only 0 cases, but 2 variables`

The dataset has n=1890 with about 400 predictors in the model. Almost all of the predictors are dichotomous dummies indicating whether some regex pattern matched a name or not. Some of these have only a few true cases (but at least 10). This is a preliminary fit, done before penalizing to improve the model fit and select the final predictors (with LASSO in glmnet). However, I wanted to check the validity of the initial model. My guess is that the error occurs because a resample ends up with no true cases for a given variable in the training set, so the fit fails there and the variable cannot be used for prediction in the test set.
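A quick back-of-the-envelope check of this guess (my own numbers, not from the thread): a dummy that is true in k of n rows is entirely absent from a single bootstrap resample with probability ((n-k)/n)^n, roughly exp(-k), so rare dummies plus many resamples make at least one all-zero column quite likely:

```r
# Back-of-the-envelope check (illustrative; assumes independence across
# variables and resamples).
n <- 1890; k <- 10   # rows in the data; true cases of a rare dummy
B <- 40              # validate()'s default number of bootstrap repetitions
p_vars <- 400        # predictors, pretending all were this rare
p_miss <- ((n - k) / n)^n       # ~ 4.4e-05 per variable per resample
1 - (1 - p_miss)^(p_vars * B)   # ~ 0.5: a failing resample is quite likely
```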
For a reproducible example, here's a similar dataset based on iris:
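The snippet itself is not shown above, so here is a minimal sketch of that kind of reproduction; the variable names, counts, and seed are my assumptions, not the original code:

```r
library(rms)

set.seed(1)
d <- iris
# Two hypothetical rare dummies, each true for only three rows, mimicking
# the regex-match indicators described above
d$rare1 <- as.integer(seq_len(nrow(d)) %in% sample(nrow(d), 3))
d$rare2 <- as.integer(seq_len(nrow(d)) %in% sample(nrow(d), 3))

dd <- datadist(d); options(datadist = "dd")
fit <- ols(Sepal.Length ~ Sepal.Width + rare1 + rare2, data = d,
           x = TRUE, y = TRUE)
# With enough repetitions, some resamples contain no rows with rare1 == 1
# (or rare2 == 1), and those refits can fail as reported
validate(fit, B = 200)
```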
Gives: `Error in lsfit(x, y) : only 0 cases, but 2 variables`
The dataset has no missing data.
In my own simple cross-validation implementation, discussed in this question, I got around the issue by simply ignoring runs that produce errors: https://stats.stackexchange.com/questions/213837/k-fold-cross-validation-nominal-predictor-level-appears-in-the-test-data-but-no Maybe rms should do this too?
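A sketch of that workaround applied to the toy data above (the helper and loop are hypothetical, not rms code): wrap each resample's refit in tryCatch() and drop the runs that error.

```r
# Reuses d, rare1, rare2 from the iris sketch above (hypothetical example).
fit_once <- function(d) {
  idx <- sample.int(nrow(d), replace = TRUE)   # one bootstrap resample
  f <- ols(Sepal.Length ~ Sepal.Width + rare1 + rare2, data = d[idx, ])
  coef(f)
}
runs <- lapply(seq_len(200), function(i) {
  tryCatch(fit_once(d), error = function(e) NULL)  # NULL flags a failed run
})
runs <- Filter(Negate(is.null), runs)   # keep only the runs that succeeded
length(runs)                            # how many of the 200 resamples worked
```

The obvious caveat, as with the stratified sketch above, is that silently dropping failed runs biases whatever summary is computed from the survivors.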