unable to load models #377

Closed
nasergh opened this issue Apr 17, 2021 · 18 comments
@nasergh

nasergh commented Apr 17, 2021

Hello,
I trained some models and specified a folder to save them in,
but when I try to load the models with the command below, I get an error:

automl = AutoML(
  mode="Compete",
  model_time_limit=(15)*60,
  n_jobs=-1,
  results_path="/media/autosk4/",
  explain_level=0,  
  algorithms=["LightGBM","CatBoost"],
  start_random_models=2
)
2021-04-17 09:17:50,775 supervised.exceptions ERROR Cannot load AutoML directory. '1_Default_LightGBM_GoldenFeatures'

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in load(self, path)
    185                 ):
--> 186                     ens = Ensemble.load(path, model_subpath, models_map)
    187                     self._models += [ens]

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/ensemble.py in load(results_path, model_subpath, models_map)
    436             ensemble.selected_models += [
--> 437                 {"model": models_map[m["model"]], "repeat": m["repeat"]}
    438             ]

KeyError: '1_Default_LightGBM_GoldenFeatures'

During handling of the above exception, another exception occurred:

AutoMLException                           Traceback (most recent call last)
<ipython-input-6-437ae6b31a0f> in <module>
      6                 algorithms=["LightGBM","CatBoost"],start_random_models=2)
      7 
----> 8 predictions = automl.predict(X_test)
      9 
     10 predictions[X_test['momkene_out']!=2]=0

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/automl.py in predict(self, X)
    344             AutoMLException: Model has not yet been fitted.
    345         """
--> 346         return self._predict(X)
    347 
    348     def predict_proba(self, X):

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in _predict(self, X)
   1298     def _predict(self, X):
   1299 
-> 1300         predictions = self._base_predict(X)
   1301         # Return predictions
   1302         # If classification task the result is in column 'label'

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in _base_predict(self, X, model)
   1230         if model is None:
   1231             if self._best_model is None:
-> 1232                 self.load(self.results_path)
   1233             model = self._best_model
   1234 

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in load(self, path)
    212 
    213         except Exception as e:
--> 214             raise AutoMLException(f"Cannot load AutoML directory. {str(e)}")
    215 
    216     def get_leaderboard(

AutoMLException: Cannot load AutoML directory. '1_Default_LightGBM_GoldenFeatures'

And these are the files in the 1_default_light_... folder:

framework.json		     learner_fold_2_training.log
learner_fold_0.lightgbm      learning_curves.png
learner_fold_0_training.log  predictions_out_of_folds.csv
learner_fold_1.lightgbm      README.html
learner_fold_1_training.log  README.md
learner_fold_2.lightgbm      status.txt
@nasergh
Author

nasergh commented Apr 17, 2021

The same error happens with the command below:

automl = AutoML(results_path="/media/autosk4/")

pplonski self-assigned this on Apr 17, 2021
pplonski added the bug (Something isn't working) label on Apr 17, 2021
@pplonski
Contributor

Hi @nasergh! Thank you for reporting the issue. Could you please send the console output from the training?

It looks like there was some problem with the 1_Default_LightGBM_GoldenFeatures model. I see that there are only fold_0, fold_1 and fold_2 model files - there should be more ...

Could you send the code that you used for training and, ideally, a dataset sample (so I can reproduce the problem)?

@nasergh
Author

nasergh commented Apr 17, 2021

Maybe it's because I use 3 folds:
validation_strategy={"validation_type": "kfold","k_folds": 3,"shuffle": False, "stratify": True},
My dataset is very big, so I can't upload it!
Can you check whether it works with 3 folds too?

@pplonski
Contributor

@nasergh thank you, it can work with 3 folds, no problem. I just don't see the code that you used so I'm trying to guess what's wrong ...

Could you please paste the code that you used for AutoML training? Do you observe the problem with a smaller sample size, for example with 100 training samples?
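
A quick way to check the fold question against the directory listing posted in the first comment is sketched below. It assumes the learner files follow the `learner_fold_<i>.lightgbm` pattern seen in that listing; this pattern is inferred from the thread and is not a documented guarantee. The helper names are hypothetical.

```python
def expected_fold_files(k_folds, suffix="lightgbm"):
    """Learner file names one would expect for k-fold validation.

    The naming pattern is inferred from the directory listing in this
    thread (an assumption, not a documented guarantee).
    """
    return [f"learner_fold_{i}.{suffix}" for i in range(k_folds)]


def missing_fold_files(present, k_folds, suffix="lightgbm"):
    """Return the expected learner files that are absent from `present`."""
    present = set(present)
    return [name for name in expected_fold_files(k_folds, suffix) if name not in present]


# File names copied from the listing posted in the first comment.
listing = [
    "framework.json", "learner_fold_2_training.log",
    "learner_fold_0.lightgbm", "learning_curves.png",
    "learner_fold_0_training.log", "predictions_out_of_folds.csv",
    "learner_fold_1.lightgbm", "README.html",
    "learner_fold_1_training.log", "README.md",
    "learner_fold_2.lightgbm", "status.txt",
]

print(missing_fold_files(listing, k_folds=3))  # []
print(missing_fold_files(listing, k_folds=5))  # ['learner_fold_3.lightgbm', 'learner_fold_4.lightgbm']
```

Under that assumption, the posted folder is complete for `k_folds=3`, so the three learner files alone would not indicate an interrupted training run.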

@nasergh
Author

nasergh commented Apr 17, 2021

I don't have NaN in the inputs!
Console output:
AutoML directory: /media/autosk4/
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['LightGBM', 'CatBoost']
AutoML will stack models
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'boost_on_errors', 'ensemble', 'stack', 'ensemble_stacked']
Skip simple_algorithms because no parameters were generated.

  • Step default_algorithms will try to check up to 2 models
    1_Default_LightGBM logloss 0.513333 trained in 478.83 seconds
    2_Default_CatBoost logloss 0.518409 trained in 428.56 seconds
  • Step not_so_random will try to check up to 2 models
    3_LightGBM logloss 0.511237 trained in 324.28 seconds
    4_CatBoost logloss 0.519217 trained in 2281.95 seconds
  • Step golden_features will try to check up to 3 models
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').
    Input contains NaN, infinity or a value too large for dtype('float32').

@nasergh
Author

nasergh commented Apr 17, 2021

automl = AutoML(mode="Compete", model_time_limit=(15)*60, n_jobs=-1, results_path="/media/autosk4/", explain_level=0,
    validation_strategy={"validation_type": "kfold", "k_folds": 3, "shuffle": False, "stratify": False},
    total_time_limit=(12*60)*60,
    algorithms=["LightGBM", "CatBoost"], start_random_models=2)

automl.fit(X_train, y_train)

@nasergh
Author

nasergh commented Apr 17, 2021

Please give me a small dataset to test with.

@pplonski
Contributor

@nasergh that might be a problem with the Golden Features generation code ...

You can select only a few samples from your own dataset, or you can generate the data:

from sklearn import datasets

# Small synthetic binary classification dataset for reproducing the issue
X, y = datasets.make_classification(
    n_samples=100,
    n_features=5,
    n_informative=4,
    n_redundant=1,
    n_classes=2,
    n_clusters_per_class=3,
    n_repeated=0,
    shuffle=False,
    random_state=0,
)

pplonski changed the title from "unable to load models" to "bug in golden features generation code / unable to load models" on Apr 17, 2021
@nasergh
Author

nasergh commented Apr 17, 2021

I tested with this dataset and it works!
I think that because I have very large values in float32, they are sometimes converted to float64, and maybe this is the cause of the problem (the NaN part).
I'm not sure what causes this!
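
The "too large for dtype('float32')" part of the message can be checked directly on the data. The sketch below is a conservative, standard-library-only check for values that are NaN, infinite, or beyond the finite float32 range; the function name is hypothetical, and the last line illustrates only one way an out-of-range value could arise from in-range inputs. Whether the golden features step combines columns like this is an assumption, not something the thread confirms.

```python
import math

# Largest finite float32 value: (2 - 2**-23) * 2**127, about 3.4e38.
FLOAT32_MAX = (2 - 2 ** -23) * 2.0 ** 127


def unsafe_for_float32(values):
    """Return positions of values that are NaN, infinite, or beyond float32 range."""
    return [
        i for i, v in enumerate(values)
        if math.isnan(v) or math.isinf(v) or abs(v) > FLOAT32_MAX
    ]


column = [1.0, 2.5e10, 1e39, float("nan"), -float("inf"), 3.4e38]
print(unsafe_for_float32(column))  # [2, 3, 4]

# Two values that are each fine for float32 can have a product that is not.
print(unsafe_for_float32([1e20, 1e20, 1e20 * 1e20]))  # [2]
```

With pandas, the same function could be applied to each numeric column (for example via `df[col].tolist()`) to locate the offending rows.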

@pplonski
Contributor

Can you send me your data statistics? If your data is in a pandas DataFrame, you can use describe():

df.describe()

@nasergh
Author

nasergh commented Apr 17, 2021

describe() output:
https://ufile.io/ed572d16

@pplonski
Contributor

@nasergh thank you! By setting large numbers I was able to reproduce the warnings:

invalid value encountered in reduce
Input contains NaN, infinity or a value too large for dtype('float32').

However, even with the warnings I can fit the model and then load it to compute predictions without any errors ...

Could you provide the full code that you used to compute predictions (load AutoML)?

@nasergh
Author

nasergh commented Apr 17, 2021

I tried to load it with the call below:
automl = AutoML(mode="Compete", model_time_limit=(15)*60, n_jobs=-1, results_path="/media/autosk4/", explain_level=0,
    validation_strategy={"validation_type": "kfold", "k_folds": 3, "shuffle": False, "stratify": False},
    total_time_limit=(12*60)*60,
    algorithms=["LightGBM", "CatBoost"], start_random_models=2)

and also with:
automl = AutoML(results_path="/media/autosk4/")

but when I call .predict, it gives the error.

@pplonski
Contributor

pplonski commented Apr 17, 2021

I need a dataset to recreate the problem and fix the bug. In the meantime, you can train models with golden_features=False. Maybe with golden features switched off you will be able to train and then load AutoML.
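
A minimal sketch of that suggestion, reusing the constructor arguments from the training call posted earlier and adding `golden_features=False`. The `results_path` value and the `build_automl` helper are hypothetical; a fresh directory is used here only so the new run does not mix with earlier results. The import is done lazily so the settings can be inspected without the mljar-supervised package installed.

```python
# Constructor arguments taken from the training call earlier in the thread,
# plus golden_features=False as suggested above.
AUTOML_KWARGS = dict(
    mode="Compete",
    model_time_limit=15 * 60,        # 15 minutes per model, in seconds
    total_time_limit=12 * 60 * 60,   # 12 hours in total, in seconds
    n_jobs=-1,
    explain_level=0,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 3,
        "shuffle": False,
        "stratify": False,
    },
    algorithms=["LightGBM", "CatBoost"],
    start_random_models=2,
    golden_features=False,           # switch the golden features step off
)


def build_automl(results_path="automl_no_golden/"):
    """Create the AutoML object (hypothetical helper; needs mljar-supervised)."""
    from supervised import AutoML  # lazy import: only needed when actually training
    return AutoML(results_path=results_path, **AUTOML_KWARGS)


# Usage, with X_train and y_train as in the thread:
# automl = build_automl()
# automl.fit(X_train, y_train)
```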

@nasergh
Author

nasergh commented Apr 17, 2021

I saved the model with pickle, so for now my issue is fixed.
But my problem now is that I want to know the feature importance of each model, and I don't know how to get it. (The EDA part takes a lot of time, which is why I set explain_level=0. How can I run without EDA and just show feature importance with SHAP and so on? Or is it possible to see the models and extract feature importance from each one?)

Is there a way to get each model of the ensemble? (I tried different things, such as to_json and get_models, but it seems they no longer work.)

@nasergh
Author

nasergh commented Apr 18, 2021

Same error without golden features:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in load(self, path)
    185                 ):
--> 186                     ens = Ensemble.load(path, model_subpath, models_map)
    187                     self._models += [ens]

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/ensemble.py in load(results_path, model_subpath, models_map)
    436             ensemble.selected_models += [
--> 437                 {"model": models_map[m["model"]], "repeat": m["repeat"]}
    438             ]

KeyError: '1_Default_LightGBM'

During handling of the above exception, another exception occurred:

AutoMLException                           Traceback (most recent call last)
<ipython-input-8-a0f4ec1ec4f7> in <module>
      1 automl = AutoML(results_path="/media/nanoc/New Volume/autosk5/")
----> 2 automl.predict([3])

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/automl.py in predict(self, X)
    344             AutoMLException: Model has not yet been fitted.
    345         """
--> 346         return self._predict(X)
    347 
    348     def predict_proba(self, X):

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in _predict(self, X)
   1298     def _predict(self, X):
   1299 
-> 1300         predictions = self._base_predict(X)
   1301         # Return predictions
   1302         # If classification task the result is in column 'label'

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in _base_predict(self, X, model)
   1230         if model is None:
   1231             if self._best_model is None:
-> 1232                 self.load(self.results_path)
   1233             model = self._best_model
   1234 

~/anaconda3/envs/autosk/lib/python3.7/site-packages/supervised/base_automl.py in load(self, path)
    212 
    213         except Exception as e:
--> 214             raise AutoMLException(f"Cannot load AutoML directory. {str(e)}")
    215 
    216     def get_leaderboard(

AutoMLException: Cannot load AutoML directory. '1_Default_LightGBM'
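
Both tracebacks fail at the same expression, `models_map[m["model"]]` in `Ensemble.load`: the saved ensemble names a model that is not among the keys of the map of loaded models. The sketch below shows that failure mode in isolation with hypothetical data. The helper is not part of the library, and how `models_map` is built or where the ensemble's member list is stored on disk is not shown in the thread.

```python
def unresolved_ensemble_members(selected_models, models_map):
    """Names referenced by the ensemble that are missing from models_map."""
    return sorted({m["model"] for m in selected_models if m["model"] not in models_map})


# Hypothetical data, shaped like the entries accessed in the traceback
# (each entry has a "model" name and a "repeat" count).
models_map = {"2_Default_CatBoost": object(), "3_LightGBM": object()}
selected_models = [
    {"model": "1_Default_LightGBM", "repeat": 2},
    {"model": "3_LightGBM", "repeat": 1},
]

print(unresolved_ensemble_members(selected_models, models_map))  # ['1_Default_LightGBM']
```

Any name returned by such a check would raise the `KeyError` seen above, so comparing the ensemble's member names with the models that were actually loaded is one way to narrow down which model failed to load.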

pplonski changed the title from "bug in golden features generation code / unable to load models" to "unable to load models" on Apr 19, 2021
@pplonski
Contributor

Hi @nasergh!

  1. To get feature importance you can use explain_level=1 - it will produce permutation-based importance plots for each model.
  2. What do you mean by getting a model from the ensemble? When you check the README.md for the Ensemble model, you will find the list of models in the ensemble with their weights.
  3. Without code+data I can't help you with the bug. We can wait to see if other users have similar issues.

You can try to install the package with the newest changes from GitHub - I've added new plots to model reports.

@pplonski
Contributor

I'm closing the issue because I can't reproduce it. Feel free to reopen it or create a new one if similar wrong behavior is observed.
