[BUG] CascadeForestRegressor somehow cannot be inserted into a DataFrame #87
Comments
Hi @IncubatorShokuhou, may I ask what the purpose of storing the model in a pandas DataFrame is?
@xuyxu Actually, I am trying to integrate deep-forest into PyCaret. In theory, PyCaret supports any ML algorithm with a scikit-learn-compatible API. In practice, most models, including xgboost, lightgbm, catboost, ngboost, explainable boosting machine, et al., can be integrated easily. Here is the example code:

from pycaret.datasets import get_data
boston = get_data('boston')
from pycaret.regression import *
from deepforest import CascadeForestRegressor
from ngboost import NGBRegressor
# setup, data preprocessing
exp_name = setup(data = boston, target = 'medv',silent = True)
# establish regressors
ngr = NGBRegressor()
ngboost = create_model(ngr)
cfr = CascadeForestRegressor()
casforest = create_model(cfr)
# compare models
best_model = compare_models(include=[ngboost,casforest,"xgboost","lightgbm"])
# save model
save_model(best_model, 'best_model')

During the integration I ran into two errors: (1) deep-forest only accepts np.array input and cannot take a pd.DataFrame, which is easily fixed by #86; (2) at line 2219 of https://github.com/pycaret/pycaret/blob/c76f4b7699474bd16a2e2a6d0f52759ae29898b6/pycaret/internal/tabular.py#L2219, the model object is put into a pd.DataFrame, and the bug described above occurs, which is quite strange to me. I guess there might be something wrong with the initialization. I would appreciate any suggestions.
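For anyone hitting error (1) on a version without #86, converting the DataFrame before fitting is enough. The snippet below is only an illustrative sketch of that workaround; the dataset and parameter choices are arbitrary and not taken from the thread:

```python
# Hypothetical workaround for error (1): convert the DataFrame to a NumPy
# array before handing it to deep-forest, which only accepted ndarrays.
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from deepforest import CascadeForestRegressor

X, y = load_diabetes(return_X_y=True)
df = pd.DataFrame(X)  # pretend the pipeline hands us a DataFrame

model = CascadeForestRegressor(random_state=1)
model.fit(df.to_numpy(), np.asarray(y))  # .to_numpy() sidesteps the input-type limitation
print(model.predict(df.to_numpy()[:5]))
```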
Thanks for your kind explanations! I will take a look at your PR first ;-)
BTW, could you please tell me why a local implementation of
We prefer to treat lightgbm as a soft dependency. If we use
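For readers unfamiliar with the term, a "soft dependency" is usually handled with a guarded import: the package is used only if it is installed. The sketch below shows that generic pattern under that assumption; it is not the actual deep-forest code, and the helper name is made up:

```python
# Generic soft-dependency pattern (illustrative only): lightgbm is used when
# available, and a clear error is raised otherwise instead of failing at import time.
try:
    import lightgbm as lgb
    _LIGHTGBM_AVAILABLE = True
except ImportError:
    lgb = None
    _LIGHTGBM_AVAILABLE = False


def make_lightgbm_regressor(**kwargs):
    """Hypothetical helper: return a LightGBM regressor if lightgbm is installed."""
    if not _LIGHTGBM_AVAILABLE:
        raise ImportError(
            "lightgbm is required for this estimator; install it with `pip install lightgbm`."
        )
    return lgb.LGBMRegressor(**kwargs)
```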
I see. So maybe I can write a simple GPU version for the three models using cuML.
The performance would be much worse, since Random Forest in cuML is not designed for the case where we want the forest to be as complex as possible (it does not support unlimited tree depth).
OK, I see.
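To make the depth point concrete, here is a small sketch using only the scikit-learn API; the claim that cuML caps tree depth comes from the comment above, and the cap of 16 in the code is just an illustrative number, not cuML's actual default:

```python
# With max_depth=None, scikit-learn grows each tree until its leaves are pure,
# which is the "as complex as possible" regime the cascade relies on.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=2000, n_features=20, random_state=0)

unlimited = RandomForestRegressor(n_estimators=50, max_depth=None, random_state=0).fit(X, y)
print("deepest tree (unlimited):", max(t.tree_.max_depth for t in unlimited.estimators_))

# A depth-capped forest is a weaker building block for the cascade.
capped = RandomForestRegressor(n_estimators=50, max_depth=16, random_state=0).fit(X, y)
print("deepest tree (capped):", max(t.tree_.max_depth for t in capped.estimators_))
```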
@xuyxu I think I have figured out the reason for this error.

import numpy as np
from deepforest import CascadeForestRegressor

result = np.empty(0, dtype="object")
result[:] = CascadeForestRegressor()

The error occurs when trying to put the CascadeForestRegressor instance into that object array. Actually, the error can be reproduced even more clearly in another way:

# basic example
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deepforest import CascadeForestClassifier
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = CascadeForestClassifier(random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred) * 100
print("\nTesting Accuracy: {:.3f} %".format(acc))
# now the model has 2 layers; iterate over it
for i, j in enumerate(model):
    print("i = ")
    print(i)
    print("j = ")
    print(j)
print("ok")

and here is the error:
Then I noticed the relevant note in the Python data-model documentation: https://docs.python.org/zh-cn/3/reference/datamodel.html#object.__setitem__
That's it!
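To spell out the mechanism: when a class defines __getitem__ but not __iter__, Python falls back to the legacy sequence protocol, calling __getitem__ with 0, 1, 2, ... and treating IndexError as the end of the sequence; any other exception escapes the loop, which matches the failure seen when iterating over the fitted model above. A minimal sketch with a stand-in class, not deep-forest itself:

```python
# Stand-in class mimicking a model that defines __len__/__getitem__ but
# signals out-of-range indices with the "wrong" exception type. for/enumerate
# call __getitem__(0), __getitem__(1), ... and stop only on IndexError.

class TwoLayerModel:
    def __len__(self):
        return 2

    def __getitem__(self, layer_idx):
        if not 0 <= layer_idx < len(self):
            raise ValueError("layer_idx out of range: {}".format(layer_idx))
        return "layer_{}".format(layer_idx)


try:
    for i, layer in enumerate(TwoLayerModel()):
        print(i, layer)                      # prints layer_0 and layer_1 ...
except ValueError as err:
    print("iteration crashed:", err)         # ... then crashes at index 2
```

Swapping ValueError for IndexError in the stand-in makes the same loop terminate cleanly after layer_1, which is presumably also what numpy expects when it probes the object during the object-array/DataFrame assignment.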
I am going to create a PR and fix this error ASAP.
Describe the bug
CascadeForestRegressor somehow cannot be inserted into a DataFrame.

To Reproduce

Expected behavior
No error.

Additional context
This bug can be simply fixed if we change

if not 0 <= layer_idx < self.n_layers_:

to

if not 0 <= layer_idx <= self.n_layers_:

but I still don't know the cause of this error or whether this fix is correct.
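Following up on the iteration-protocol point above, an alternative to loosening the bound would be to keep the strict check but raise IndexError for out-of-range layer indices. The class below is a purely hypothetical sketch; the layers_ dict and the message text are assumptions, and this is neither the actual deep-forest source nor the fix that was eventually merged:

```python
# Hypothetical sketch: a cascade-like model whose __getitem__ keeps the strict
# bound but raises IndexError, so for-loops, enumerate() and numpy's sequence
# probing stop cleanly instead of crashing.
class CascadeLike:
    def __init__(self, n_layers):
        self.n_layers_ = n_layers
        self.layers_ = {"layer_{}".format(i): object() for i in range(n_layers)}

    def __len__(self):
        return self.n_layers_

    def __getitem__(self, layer_idx):
        if not 0 <= layer_idx < self.n_layers_:
            # IndexError (not a generic error) lets callers detect the end
            raise IndexError(
                "layer_idx should be in the range [0, {}), got {}".format(
                    self.n_layers_, layer_idx
                )
            )
        return self.layers_["layer_{}".format(layer_idx)]


# iterating now terminates instead of raising
for i, layer in enumerate(CascadeLike(n_layers=2)):
    print(i, layer)
```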