Crash with ValueError when ensemble=True #130
Could you check whether this line is executed? Line 187 in b04b00d
It is supposed to be executed to preprocess the categorical features before hitting
File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/model.py", line 78, in _fit
model.fit(X_train, y_train, **kwargs)
assuming you are using a pandas dataframe.
Thanks for the response. While investigating this issue further, I realized that my FLAML installation was old. I updated to the latest (0.5.6) and have now run into a new error. I have created another issue for that: #133. Once that error is resolved, I will come back to this issue to see if it still exists.
I can confirm that this issue still exists for version 0.5.2. @sonichi I added some print statements around the line you mention, but that line of code does not seem to be executed when building the ensemble's version of the dataset. I added a print just before the call to
Line 1231 in b04b00d
I printed out
Could you check the latest version on GitHub? I just merged a PR that fixes #133.
It's also uploaded to PyPI as v0.5.7.
@sonichi I am still getting this error, even with the latest version 0.5.7.
Could you share a minimal example so that we can reproduce this error?
I am out of office for a few days, but I will send a test case next week.
@sonichi Here is a minimal example that causes the error; note that if you set
import pandas as pd
from flaml import AutoML

X = pd.DataFrame({
    'f1': [1, -2, 3, -4, 5, -6, -7, 8, -9, -10, -11, -12, -13, -14],
    'f2': [3., 16., 10., 12., 3., 14., 11., 12., 5., 14., 20., 16., 15., 11.],
    'f3': ['a', 'b', 'a', 'c', 'c', 'b', 'b', 'b', 'b', 'a', 'b', 'e', 'e', 'a'],
})
y = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

automl_settings = {
    "time_budget": 60,
    "task": 'classification',
    "n_jobs": 1,
    "estimator_list": ['lgbm', 'xgboost', 'rf', 'extra_tree', 'catboost'],
    "eval_method": "cv",
    "n_splits": 3,
    "metric": "accuracy",
    "log_training_metric": True,
    "verbose": 1,
    "ensemble": True,
}

pipe = AutoML()
pipe.fit(X, y, **automl_settings)

Output:
Thanks @stepthom. I'm able to reproduce it. Investigating.
I found the problem: the
When I set ensemble=True, and my data has categorical features, I get the following error at the end of the FLAML run:
This error does not occur if ensemble=False or if I remove (or encode) the categorical features from my dataset.
My guess is that FLAML properly encodes categorical features when training the base estimators (LGBM, RF, etc.), but not when training the stacking classifier.
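The guess above can be illustrated outside FLAML. The following is only a sketch under stated assumptions: it uses scikit-learn's StackingClassifier as a stand-in for the stacking step (it does not call FLAML or reproduce its internals), and the helper name make_stacker and the category-code encoding are illustrative choices, not FLAML's own preprocessing. It shows that a stacker fitted on the raw dataframe fails on the string column f3, while the same stacker fits once f3 is encoded as integers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Same toy data as the minimal example in this thread.
X = pd.DataFrame({
    "f1": [1, -2, 3, -4, 5, -6, -7, 8, -9, -10, -11, -12, -13, -14],
    "f2": [3., 16., 10., 12., 3., 14., 11., 12., 5., 14., 20., 16., 15., 11.],
    "f3": ["a", "b", "a", "c", "c", "b", "b", "b", "b", "a", "b", "e", "e", "a"],
})
y = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])


def make_stacker():
    """Hypothetical helper: a small stacking ensemble (not FLAML's ensemble)."""
    return StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=10, random_state=0))],
        final_estimator=LogisticRegression(),
        cv=3,
    )


# Fitting on the raw dataframe: the string column cannot be converted to float.
try:
    make_stacker().fit(X, y)
    raw_fit_failed = False
except ValueError as err:
    raw_fit_failed = True
    print("Raw fit failed with ValueError:", err)

# Workaround sketch: replace the categorical column with integer category codes.
X_encoded = X.copy()
X_encoded["f3"] = X_encoded["f3"].astype("category").cat.codes

stacker = make_stacker().fit(X_encoded, y)
preds = stacker.predict(X_encoded)
print("Fit on encoded data succeeded; number of predictions:", len(preds))
```

Integer category codes impose an arbitrary order on the categories, so one-hot encoding may be a better fit for linear final estimators. Either way, encoding before calling fit matches the observation above that the error disappears once the categorical features are encoded.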