[BUG] fix transformer chaining in regression pipeline#780
[BUG] fix transformer chaining in regression pipeline#780fkiraly merged 6 commits intosktime:mainfrom
Conversation
X was not updated between transformer steps during prediction, so each transformer received the original X instead of the previous transformer's output. This caused a fit/predict mismatch in pipelines with 2+ transformers: fit used T2(T1(X)) but predict used T2(X), silently corrupting all probabilistic predictions. Signed-off-by: ashnaaseth2325-oss <ashnaaseth2325@gmail.com>
|
Hello @fkiraly , @felipeangelimvieira |
fkiraly
left a comment
There was a problem hiding this comment.
Thanks - this is a quite serious bug, thank you for fixing.
From a readability perspective, I would prefer if, inside the loop, you have Xt = transformer.transform(Xt) instead of constantly renaming the variable.
|
Hello @fkiraly, |
fkiraly
left a comment
There was a problem hiding this comment.
Looks good, thanks!
It would be great if you could add a test that ensures the bug is fixed: I would suggest, pick any deterministic estimator, and then chain two exponentiation transformers or similar. Then check that using two of the transformers is indeed the correct exponent of using one.
|
Hello @fkiraly |
fkiraly
left a comment
There was a problem hiding this comment.
Thanks. I was thinking more along the lines of comparing two pipelines, but this also works.
Summary
Fixes a mismatch between
Pipeline._fitand_transformin:During
fit, transformers are chained correctly:But in
_transform,Xwas never updated inside the loop. Each transformer received the original X, and only the last output was returned.So a pipeline:
was trained on:
but predicted on:
Impact
Fix
Add one line inside
_transform:This properly chains transformer outputs.
Result