How to store a pipeline containing a GridSearchCV object? #19504
-
I have the following model built with a scikit-learn Pipeline:
When I try to pickle lgb_model, which is the pipeline:
It shows the following:
Can anyone help me store the best model so that it can be retrieved later and used to make predictions? Thanks a lot.
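A minimal sketch of one way to store and reload a model with this structure. It assumes a scikit-learn estimator (LogisticRegression) stands in for the LightGBM model and that synthetic data from make_classification stands in for the real data. It keeps the layout from the question (a Pipeline whose last step is a GridSearchCV), pickles the whole fitted pipeline, reloads it, and predicts. The file name is hypothetical. If this round-trip works but the LightGBM version does not, that would suggest the problem lies with the LightGBM object rather than with Pipeline or GridSearchCV.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same structure as in the question: the grid search is the last pipeline step.
# LogisticRegression is a hypothetical stand-in for the LightGBM estimator.
lgb_model = Pipeline([
    ("scaler", StandardScaler()),
    ("model", GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)),
])
lgb_model.fit(X, y)

# Pickle the whole fitted pipeline (hypothetical file name in a temp dir).
path = os.path.join(tempfile.mkdtemp(), "lgb_model.pkl")
with open(path, "wb") as f:
    pickle.dump(lgb_model, f)

# Reload it later and predict with the restored object.
with open(path, "rb") as f:
    loaded = pickle.load(f)

preds = loaded.predict(X)

# The tuned hyperparameters are available on the GridSearchCV step.
best_params = loaded.named_steps["model"].best_params_
print(best_params)
```

Because GridSearchCV refits the best candidate by default (refit=True), the reloaded pipeline can call predict directly.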
-
Can you please provide the entire code? Can you also try with a scikit-learn estimator instead of the LightGBM one?

Also, there's a data leak. This:

    Pipeline([('scaler', StandardScaler()), ('model', GridSearchCV(...))])

should be:

    GridSearchCV(Pipeline([('scaler', StandardScaler()), ('model', lgbm_estimator)]), ...)

In the first snippet, the scaler is fit on the entire training data before the grid search splits it into folds, so every validation fold has already influenced the normalization and the folds are no longer independent. It seems that you're only doing model selection and not model evaluation, so it might not matter much here, but it's worth noting that this setup is incorrect in general.
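A sketch of the layout suggested above, again assuming LogisticRegression and synthetic data in place of the LightGBM model and the real data. With the scaler inside the searched pipeline, GridSearchCV refits StandardScaler on the training part of each split, so the validation fold never contributes to the scaling statistics. Parameters of an inner step are addressed as the step name, two underscores, then the parameter name (here model__C). The refitted best pipeline, best_estimator_, contains the fitted scaler as well as the model, so it can be saved on its own with joblib. The file name is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The scaler lives inside the pipeline that is cross-validated,
# so it is refit on the training folds only in each split.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),  # stand-in for LightGBM
])
search = GridSearchCV(pipe, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)

# best_estimator_ is the whole pipeline (scaler + model) refit on all of X, y.
best_model = search.best_estimator_

# Persist only the best pipeline (hypothetical file name in a temp dir).
path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(best_model, path)

# Later: load it back and predict on new data.
restored = joblib.load(path)
preds = restored.predict(X)
print(search.best_params_)
```

Saving best_estimator_ rather than the GridSearchCV object keeps the stored file limited to what is needed for prediction; saving the whole search object also works if the cross-validation results should be kept.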