Rborist Error from doTryCatch() #55

Closed
suiji opened this issue Apr 15, 2022 · 9 comments

suiji commented Apr 15, 2022

GitHub reports steady search activity for this and similar trapped-error messages. None of the tests we have on hand report premature exit. If someone has a reproducible test case, however, please help out by responding to this Issue or opening a new bug report.

Thank you.

rociogonzalezfdez85 commented Nov 8, 2022

I have the same problem. Write me an email and I will send you an example.

rociogonzalezfdez85 commented Nov 9, 2022 via email

suiji commented Nov 9, 2022

Will be happy to run your test code. Please feel free to send it when you are ready.

In the meantime, though, the error message you encountered is complaining about a mismatch between the data frames employed for training and prediction. The package's "deframer" phase repacks data frames into distinct blocks of values having the same data type (numeric or factor, for example). Right now, the deframer expects the predictors to appear in the same order, and have the same data type, in both frames. We are loosening this requirement by means of a "keyed" option, which will match predictors in the two frames in arbitrary order by keying off their names. This option did not make it into 0.3-2, which had to be posted on CRAN under deadline. We do intend to support "keyed" in the next release. Could this be the source of your problem?

Regards,
The maintainers.

suiji commented Nov 11, 2022

No example has been received so far, but we're ready to help when it arrives.

Please note that setting autocompress to 1.0 was a solution to a problem appearing in version 0.2-4 and should no longer be relevant. Setting thinLeaves serves strictly to reduce memory footprint, so it is also unlikely to apply.

suiji closed this as completed Nov 11, 2022
suiji reopened this Nov 11, 2022
rociogonzalezfdez85 commented Nov 15, 2022 via email

suiji commented Nov 17, 2022

Thank you. Your example reproduces the behavior you describe.

The error message is complaining that the training and prediction data frames do not match and, in this example, they do not. The training frame contains two predictor columns, while the prediction frame contains three. The example can easily be made to work by filtering out the third column ("Y") for prediction.
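One way to spot an extra column before calling predict() is to compare column names across the two frames. The sketch below uses base R only. The frame names `train` and `test` and the predictor names `x1` and `x2` are assumptions for illustration; only "Y" comes from the example discussed above.

```r
# Toy stand-ins for the frames in the report; x1 and x2 are hypothetical names.
train <- data.frame(x1 = c(0.1, 0.4, 0.9), x2 = c(2, 3, 5))
test <- data.frame(x1 = c(0.2, 0.7), x2 = c(1, 4), Y = c(10, 12))

# Columns present for prediction but absent from training.
extraCols <- setdiff(names(test), names(train))

# Drop the extras by name; drop = FALSE keeps the result a data frame.
testPredictors <- test[, !(names(test) %in% extraCols), drop = FALSE]
```

Filtering by name rather than by position keeps the code working if the response column moves.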

Traditionally, the package has offered only a positional scheme for reconciling data frames between, say, training and prediction. That is, the columns in the training frame were assumed to match those in the prediction frame. Some checking was performed to ensure, at the very least, that data types agree at their respective positions across the two frames. With release 0.3 we are also checking that the two frames have the same number of predictors. If your example does not fail with earlier releases it is likely because we were not performing the additional check.
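The positional checks described here can be approximated outside the package before prediction is attempted. The following is a minimal sketch in base R, not the package's own implementation: `framesMatchPositionally` is a hypothetical helper, and comparing the first element of `class()` is an assumed stand-in for the package's notion of data type.

```r
# Hypothetical helper (not part of Rborist): TRUE when both frames have the
# same number of columns and the same leading class at each position.
framesMatchPositionally <- function(train, newdata) {
  if (ncol(train) != ncol(newdata)) {
    return(FALSE)
  }
  firstClass <- function(col) class(col)[1]
  trainTypes <- vapply(train, firstClass, character(1))
  newTypes <- vapply(newdata, firstClass, character(1))
  all(trainTypes == newTypes)
}

train <- data.frame(a = c(1.5, 2.5), b = factor(c("u", "v")))
sameOrder <- data.frame(a = c(3.5, 4.5), b = factor(c("v", "u")))
swapped <- sameOrder[, c("b", "a")]

framesMatchPositionally(train, sameOrder)  # TRUE
framesMatchPositionally(train, swapped)    # FALSE
```

The sketch ignores column names on purpose, since a positional scheme compares only count, order and type.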

In addition to the positional scheme, we are planning to introduce a "keyed" (or maybe "keyedFrame") option, which will allow the column positions to vary between the two frames. In particular, there would be no problem with the training frame having fewer columns than the prediction frame, so long as the latter includes all columns present in the training frame and the respective types agree.
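Until such an option is available, a name-keyed match can be emulated by hand before calling predict(). The sketch below is an assumption about how this might be done in base R, not a preview of the planned option: `alignByName` is a hypothetical helper, and it compares only the leading class of each column, not factor levels.

```r
# Hypothetical helper emulating a name-keyed match; not the package's option.
alignByName <- function(train, newdata) {
  missingCols <- setdiff(names(train), names(newdata))
  if (length(missingCols) > 0) {
    stop("newdata lacks training columns: ",
         paste(missingCols, collapse = ", "))
  }
  # Select the training columns, in training order; extra columns are dropped.
  aligned <- newdata[, names(train), drop = FALSE]
  firstClass <- function(col) class(col)[1]
  typesAgree <- vapply(train, firstClass, character(1)) ==
    vapply(aligned, firstClass, character(1))
  if (!all(typesAgree)) {
    stop("type mismatch in: ",
         paste(names(train)[!typesAgree], collapse = ", "))
  }
  aligned
}

train <- data.frame(x1 = c(0.1, 0.4), x2 = factor(c("a", "b")))
newdata <- data.frame(Y = c(1, 2), x2 = factor(c("b", "a")), x1 = c(0.3, 0.8))
aligned <- alignByName(train, newdata)
```

The aligned frame then satisfies the positional scheme, with its columns in training order and the extra "Y" column removed.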

rociogonzalezfdez85 commented Nov 19, 2022 via email

suiji commented Nov 19, 2022

The easiest way to filter out a column is probably just to place a minus sign in front of its index. Moreover, Rborist's predict() method computes MSE as a side effect when passed a test vector. So you can probably save some work by applying the following snippet, which omits column 3 from the new data but passes it as the test vector:

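# Omit column 3 from the new data, but pass it as the test vector.
yPrime <- predict(fitMulti, test[, -3], test[, 3])
# MSE is computed as a side effect of prediction.
mse <- yPrime$mse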
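For readers unfamiliar with the indexing in the snippet above, the sketch below shows what `test[, -3]` and `test[, 3]` return on a toy frame, and computes a mean squared error by hand for comparison with a reported value. The frame contents and the `yPred` vector are made up for illustration; no model is fitted here.

```r
# Toy frame; x1, x2 and the values are hypothetical.
test <- data.frame(x1 = c(0.2, 0.7, 0.5), x2 = c(1, 4, 2), Y = c(10, 12, 11))

# Negative index: every column except the third; still a data frame here.
newdata <- test[, -3]
# Single positive index: the third column as a plain numeric vector.
yTest <- test[, 3]

# Hypothetical predictions standing in for model output.
yPred <- c(9, 12, 13)
# Mean squared error computed by hand.
mseByHand <- mean((yPred - yTest)^2)
```

Note that if only one column remained after the removal, `test[, -k]` would return a vector rather than a data frame unless `drop = FALSE` is supplied.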

suiji commented Dec 2, 2022

Closing this thread. Please feel free to reopen or begin a new thread.

suiji closed this as completed Dec 2, 2022