[BUG] fix transformer chaining in regression pipeline#780

Merged
fkiraly merged 6 commits into sktime:main from ashnaaseth2325-oss:fix/pipeline-transform-chaining-2
Mar 5, 2026

Conversation

@ashnaaseth2325-oss
Contributor

Summary

Fixes a mismatch between Pipeline._fit and _transform in:

skpro/regression/compose/_pipeline.py

During fit, transformers are chained correctly:

X = t.fit_transform(X=X, y=y)

But in _transform, X was never updated inside the loop. Each transformer received the original X, so the method returned the last transformer applied to the untransformed input.

So a pipeline:

T1 → T2 → Regressor

was trained on:

T2(T1(X_train))

but predicted on:

T2(X_test)
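The mismatch can be reproduced with a minimal sketch. The classes `AddOne` and `Double` and both helper functions below are hypothetical toy stand-ins, not skpro code; they only assume that each transformer exposes a `transform` method.

```python
class AddOne:
    """Hypothetical toy transformer T1: adds 1 to every value."""

    def transform(self, X):
        return [x + 1 for x in X]


class Double:
    """Hypothetical toy transformer T2: doubles every value."""

    def transform(self, X):
        return [x * 2 for x in X]


def transform_buggy(transformers, X):
    # Mirrors the described bug: X is never updated inside the loop,
    # so every transformer sees the original input.
    Xt = X
    for t in transformers:
        Xt = t.transform(X)
    return Xt


def transform_chained(transformers, X):
    # Each transformer receives the previous transformer's output.
    Xt = X
    for t in transformers:
        Xt = t.transform(Xt)
    return Xt


steps = [AddOne(), Double()]
X_test = [1, 2, 3]

buggy = transform_buggy(steps, X_test)      # T2(X_test)
chained = transform_chained(steps, X_test)  # T2(T1(X_test))

print(buggy)    # [2, 4, 6]
print(chained)  # [4, 6, 8]
```

Both outputs have the same length, which illustrates why a shape check alone would not flag the problem.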

Impact

  • Multi-transformer pipelines produced incorrect predictions.
  • CV and backtesting results could be silently corrupted.
  • No error was raised — shapes remained valid.
  • Single-transformer pipelines were unaffected, which made the bug hard to detect.

Fix

Add one line inside _transform:

X = Xt

This properly chains transformer outputs.


Result

  • Fit and predict now use the same feature representation.
  • Multi-step pipelines behave correctly.
  • No API changes.

X was not updated between transformer steps during prediction,
so each transformer received the original X instead of the
previous transformer's output. This caused a fit/predict mismatch
in pipelines with 2+ transformers: fit used T2(T1(X)) but predict
used T2(X), silently corrupting all probabilistic predictions.

Signed-off-by: ashnaaseth2325-oss <ashnaaseth2325@gmail.com>
@ashnaaseth2325-oss
Contributor Author

Hello @fkiraly, @felipeangelimvieira
This PR fixes an inconsistency where _transform wasn't chaining transformer outputs like _fit, causing multi-step pipelines to use incorrect features at predict time.
Added X = Xt inside the loop so prediction now mirrors training. Happy to adjust if needed.

Collaborator

@fkiraly left a comment

Thanks, this is quite a serious bug. Thank you for fixing it.

From a readability perspective, I would prefer if, inside the loop, you have Xt = transformer.transform(Xt) instead of constantly renaming the variable.
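The two ways of writing the loop can be compared in a short sketch. The function names are hypothetical, and transformers are modelled as plain callables rather than skpro objects; the point is only that rebinding `X` after each step and reusing the single name `Xt` produce the same chained result.

```python
def transform_rebind_x(transformers, X):
    # First version of the fix: compute Xt from X, then rebind X
    # so the next step sees the updated value.
    Xt = X
    for t in transformers:
        Xt = t(X)
        X = Xt
    return Xt


def transform_single_name(transformers, X):
    # Reviewer-suggested form: one variable carried through the loop.
    Xt = X
    for t in transformers:
        Xt = t(Xt)
    return Xt


# Hypothetical toy steps: add one, then double.
steps = [
    lambda X: [x + 1 for x in X],
    lambda X: [x * 2 for x in X],
]

out_rebind = transform_rebind_x(steps, [1, 2, 3])
out_single = transform_single_name(steps, [1, 2, 3])
print(out_rebind == out_single)  # True
```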

@fkiraly changed the title from "fix(pipeline): align _transform chaining with _fit" to "[BUG] fix transformer chaining in regression pipeline" on Mar 1, 2026
@fkiraly added the bug module:regression probabilistic regression module labels on Mar 1, 2026
@ashnaaseth2325-oss
Contributor Author

Hello @fkiraly,
Thanks for the suggestion.
I've updated the loop to use Xt = transformer.transform(Xt) for improved readability and consistency with fit.
All checks are now passing. Happy to adjust further if needed.

Collaborator

@fkiraly left a comment

Looks good, thanks!

It would be great if you could add a test that ensures the bug is fixed: I would suggest picking any deterministic estimator and then chaining two exponentiation transformers or similar. Then check that the result with two transformers is indeed the correct exponentiation of the result with one.

@ashnaaseth2325-oss
Contributor Author

Hello @fkiraly
Thanks for the suggestion! I've added a test chaining two FunctionTransformer(np.exp) steps with a deterministic estimator and checking that the result matches exp(exp(X)).
CI is green now. Let me know if anything else should be adjusted.
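A rough sketch of the shape such a test might take is shown below. It is not the test added in the PR: `ExpTransformer`, `EchoRegressor`, and `ToyPipeline` are hypothetical pure-Python stand-ins for FunctionTransformer(np.exp), a deterministic estimator, and the regression pipeline, chosen so the example runs without any dependencies.

```python
import math


class ExpTransformer:
    """Hypothetical stand-in for FunctionTransformer(np.exp)."""

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return [math.exp(x) for x in X]


class EchoRegressor:
    """Hypothetical deterministic estimator: predicts the feature it is given."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return list(X)


class ToyPipeline:
    """Simplified stand-in for a transformers -> regressor pipeline."""

    def __init__(self, transformers, regressor):
        self.transformers = transformers
        self.regressor = regressor

    def fit(self, X, y):
        for t in self.transformers:
            X = t.fit_transform(X, y)
        self.regressor.fit(X, y)
        return self

    def predict(self, X):
        Xt = X
        for t in self.transformers:
            Xt = t.transform(Xt)  # chained, as in the fixed loop
        return self.regressor.predict(Xt)


def test_two_exp_steps_are_chained():
    X = [0.0, 0.5, 1.0]
    y = [0.0, 0.0, 0.0]
    pipe = ToyPipeline([ExpTransformer(), ExpTransformer()], EchoRegressor())
    y_pred = pipe.fit(X, y).predict(X)
    expected = [math.exp(math.exp(x)) for x in X]
    assert all(math.isclose(p, e) for p, e in zip(y_pred, expected))


test_two_exp_steps_are_chained()
```

With the unchained loop, the prediction would equal exp(X) rather than exp(exp(X)), so the assertion would fail.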

Collaborator

@fkiraly left a comment

Thanks. I was thinking more along the lines of comparing two pipelines, but this also works.
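The two-pipeline comparison could look roughly like the following sketch. `predict_through` and `exp_twice` are hypothetical helpers, with element-wise functions standing in for transformers and the transformed features echoed back as predictions; this is an assumption-laden illustration, not code from the PR.

```python
import math


def predict_through(steps, X):
    """Hypothetical helper: chain element-wise steps, then echo the features."""
    Xt = X
    for step in steps:
        Xt = [step(x) for x in Xt]
    return Xt


def exp_twice(x):
    """Single step equivalent to applying exp two times."""
    return math.exp(math.exp(x))


X = [0.0, 0.5, 1.0]

two_step = predict_through([math.exp, math.exp], X)  # pipeline with two steps
one_step = predict_through([exp_twice], X)           # pipeline with one composed step
single_exp = predict_through([math.exp], X)          # what an unchained loop would yield

# The two-step pipeline should agree with the composed one-step pipeline ...
assert all(math.isclose(a, b) for a, b in zip(two_step, one_step))
# ... and differ from a single exponentiation.
assert not all(math.isclose(a, b) for a, b in zip(two_step, single_exp))
```

Comparing against a composed one-step pipeline avoids hard-coding expected values, at the cost of relying on the one-step path being correct.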

@fkiraly merged commit 6cdbf75 into sktime:main on Mar 5, 2026
39 checks passed

Labels

bug module:regression probabilistic regression module


2 participants