My goal is model stacking: I want to use an external, prefit model as an input step in an sklearn pipeline without refitting it. I tried subclassing TransformerMixin so that I could use the predictions of this externally fitted sklearn model (it is related in that it uses the same features, but it was trained on disjoint data). The external model's outputs would become inputs to my new estimator, the same way we would use any unsupervised transformer. However, the sklearn API resets any attached estimator to an unfitted state when cloning. Refitting is not an option, both because of the fit time and because the external model was trained on a disjoint dataset. Is there a good way to accomplish this without giving up sklearn pipelines? Unless a pipeline encapsulates all transformations, including the upstream model's predictions, feature importance analyses become difficult. Here's my attempt, which doesn't work as intended: it still results in the (expensive) joblib reload on every
Replies: 1 comment 2 replies
This is a known limitation, and I think we are still looking for a consensus on how to tackle it. You can have a look at PR #8370 and the related issues and PRs. I think we should probably write a SLEP to address this problem.
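Until there is an agreed-upon solution, one possible workaround is to hide the prefit model from `clone`. The sketch below is only an illustration, not an official scikit-learn pattern: `PrefitHolder` and `PrefitPredictionTransformer` are hypothetical names, and the `LinearRegression` models and random data stand in for the real upstream model and dataset. It relies on the fact that `clone` deep-copies any constructor parameter that has no `get_params`, so a holder whose `__deepcopy__` returns itself should keep the fitted model intact.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class PrefitHolder:
    """Hypothetical helper: opaque container for an already-fitted model.

    It has no get_params(), so clone() falls back to copy.deepcopy(),
    and __deepcopy__ hands back the same object instead of a copy.
    """

    def __init__(self, model):
        self.model = model

    def __deepcopy__(self, memo):
        return self


class PrefitPredictionTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: appends the prefit model's predictions."""

    def __init__(self, holder):
        self.holder = holder

    def fit(self, X, y=None):
        # Nothing to learn here: the wrapped model was fitted elsewhere.
        return self

    def transform(self, X):
        X = np.asarray(X)
        preds = np.asarray(self.holder.model.predict(X)).reshape(len(X), -1)
        # Original features plus the upstream predictions as extra columns.
        return np.hstack([X, preds])


# Stand-in for the external model, fitted on a disjoint dataset.
rng = np.random.default_rng(0)
X_ext, y_ext = rng.normal(size=(50, 3)), rng.normal(size=50)
upstream = LinearRegression().fit(X_ext, y_ext)

# New data for the downstream estimator.
X_new, y_new = rng.normal(size=(40, 3)), rng.normal(size=40)
stacker = PrefitPredictionTransformer(PrefitHolder(upstream))
pipe = make_pipeline(stacker, LinearRegression())

# clone() is what cross-validation and grid search call internally.
cloned = clone(pipe)
cloned.fit(X_new, y_new)
```

Because the holder is shared rather than copied, every clone points at the same fitted object, so the upstream model should never be mutated after fitting. This only covers `clone`: parallel runs that pickle the pipeline to worker processes will still serialize the model. Recent scikit-learn releases also appear to provide `sklearn.frozen.FrozenEstimator` for this use case, so it is worth checking whether your installed version has it before relying on a custom wrapper.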