Investigate random forest crash when feature types are set #4054
Comments
@karlnapf I would like to work on this |
@karlnapf I looked at the code through debugger and found that it stops here . The feature length is 1 but we are providing |
Thanks a ton for taking this up. The error message should be printed. Maybe it is Python? Did you manage to reproduce the same problem from c++? Approach:
|
Hi @karlnapf Also, if I change |
Yes, for c++, you want Now, leaving the error message printing aside, let's get down with this RF bug :) |
@karlnapf umm, isn't the size mismatch doing what it is supposed to do?
feats_train = sg.RealFeatures(X.reshape(-1,1).T)
print feats_train.get_num_features(), feats_train.get_num_vectors()
feats_test = sg.RealFeatures(X_test.reshape(-1,1).T)
print feats_test.get_num_features(), feats_test.get_num_vectors()
labels_train = sg.RegressionLabels(Y)
print labels_train.get_num_labels()
labels_test = sg.RegressionLabels(Y_test)
print labels_test.get_num_labels()
We have only 1 feature so there should be only 1 feature type too. Or am I missing something here? |
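To make the dimension argument above concrete, here is a minimal pure-Python sketch. It does not use Shogun or NumPy; `feature_matrix_dims` is a hypothetical helper, and the sample values are made up. It assumes the layout implied by the snippet above: after `X.reshape(-1,1).T` the matrix has one row per feature and one column per vector.

```python
def feature_matrix_dims(matrix):
    """Return (num_features, num_vectors) for a row-per-feature matrix.

    Hypothetical helper for illustration only; not part of Shogun.
    """
    num_features = len(matrix)
    num_vectors = len(matrix[0]) if matrix else 0
    return num_features, num_vectors


# Five made-up 1-D samples, laid out like X.reshape(-1, 1).T:
# a single row (one feature) with five columns (five vectors).
X = [0.1, 0.4, 0.5, 0.9, 1.3]
train_matrix = [X]

num_features, num_vectors = feature_matrix_dims(train_matrix)

# One boolean per feature (row), not one per sample (column).
# False stands for "continuous", as in the issue description.
feature_types = [False] * num_features
```

With this layout the feature-type list has length 1 no matter how many samples there are, which is the point made in the comment above.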
You are absolutely right, I don't know why I got confused here. So is there any problem after all? Doesn't seem like. |
Yup, no problem in random forest then. But isn't it troublesome for a user to not get error messages when input goes wrong? I think we should resolve the error printing part too. Can we change the call to |
@vigsterkr @lisitsyn what's your opinion on that? |
Running a random forest regression without setting the feature type works fine (and produces correct results). The forest complains though: since the feature types are not set, it tells the user that it assumes continuous features:
To get rid of the warning, the user sets the feature types to continuous (false)
which causes Shogun to die without an error message:
I made a standalone notebook example here:
https://gist.github.com/karlnapf/47683ab0dd015a3e9c7ee60f05a2fec0
We want this to be fixed. In the likely case that this is a corner case of how the forest is set up, Shogun should at least give a proper error message and instruct the user what to do.
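As a rough illustration of the kind of message asked for here, the sketch below validates the feature-type vector up front and tells the user what to change. This is plain Python, not Shogun's actual C++ check; `check_feature_types` and the message wording are assumptions made for this example.

```python
def check_feature_types(num_features, feature_types):
    """Raise a descriptive error if feature_types does not match the data.

    Hypothetical sketch; Shogun's real check lives in its C++ code
    and may differ in both condition and wording.
    """
    if len(feature_types) != num_features:
        raise ValueError(
            "Got %d feature types but the data has %d feature(s). "
            "Provide exactly one boolean per feature "
            "(False for continuous features)."
            % (len(feature_types), num_features)
        )


# One feature with one feature type passes silently.
check_feature_types(1, [False])

# One feature with one entry per sample is rejected with an explanation.
try:
    check_feature_types(1, [False] * 100)
except ValueError as error:
    message = str(error)
```

The important part is that the failure reaches the user as readable text instead of a silent crash; how that is surfaced through the Python interface is the open question in the thread.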
Good task for someone who wants to understand how the random forests in Shogun work. A bugfix is one of the most valuable contributions you can make. You will have to use a debugger to see what is going on inside Shogun's C++ code for this.
This bug was reported recently in #4051.