Why do unsupervised transformers only update on predict and not also on learn? #542
-
I came across an issue when I wanted to pretrain a model: it would fail to learn anything. After quite some time I figured out that the StandardScaler in my pipeline (StandardScaler -> PAClassifier) would always output 0/0/0/0... for all features. Thanks to the documentation and a couple of issues/discussions here, I realized that this is the desired behaviour: unsupervised transformers only update when calling predict, not when calling learn.
I see why you would want to update transformers when predicting, but I don't understand why they are not also updated when calling learn. I'm not judging, just asking, so it would be great if someone could explain this to me.
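
For concreteness, here is a minimal sketch of what I mean (the data is invented for illustration). With only `learn_one` calls, the scaler's running statistics never update, so the scaled features come out as zeros, matching what I observed:

```python
from river import linear_model, preprocessing

scaler = preprocessing.StandardScaler()
model = scaler | linear_model.PAClassifier()

# Toy observation, purely for illustration.
x, y = {"x1": 1.0, "x2": -2.0}, True

# learn_one updates the classifier but NOT the unsupervised StandardScaler,
# so the scaler's running mean/variance stay at their initial values...
model.learn_one(x, y)

# ...and the scaled features it feeds downstream are all 0.
print(scaler.transform_one(x))  # {'x1': 0.0, 'x2': 0.0}
```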
-
It's a good question and comes up a lot.

The reason we update the transformers in `predict_one` is that we have all the information we need at that point, and it performs better to update the transformers as soon as possible. If we also updated them in `learn_one`, we could end up updating them twice, which is not desirable.

Indeed, the current way to do pretraining is to call `predict_one` before `learn_one`. I understand it's not ideal. What we could do is add a `learn_unsupervised` boolean parameter to the `learn_one` method.
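
For reference, here is a minimal sketch of that predict-before-learn pretraining pattern (the dataset and loop are invented for illustration):

```python
from river import linear_model, preprocessing

model = preprocessing.StandardScaler() | linear_model.PAClassifier()

# Hypothetical pretraining data, just to show the call order.
pretraining_set = [
    ({"x1": 1.0, "x2": -2.0}, True),
    ({"x1": 0.5, "x2": 3.0}, False),
]

for x, y in pretraining_set:
    # predict_one updates the unsupervised StandardScaler, so its running
    # statistics are in place before the supervised update below.
    model.predict_one(x)
    # learn_one then updates the PAClassifier on properly scaled features.
    model.learn_one(x, y)
```

If the `learn_unsupervised` flag mentioned above were added, the extra `predict_one` call could presumably be replaced by something like `model.learn_one(x, y, learn_unsupervised=True)` — but to be clear, that parameter is only a proposal at this point, not an existing part of the API.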