Detect single-column transformers when they are the first step of a Pipeline#1900

Merged
rcap107 merged 6 commits into skrub-data:main from jeromedockes:wrap-transformer-inspect-pipelines
Feb 16, 2026

Conversation

@jeromedockes
Member

When we inspect an estimator to find out whether it is a single-column transformer, special-case Pipelines and look at the first step.


This is done by checking the special attribute
__single_column_transformer__ (thus inheriting from the
SingleColumnTransformer class is not mandatory). We treat scikit-learn
Member

In what situations would it be better to just set __single_column_transformer__ rather than inheriting from the class?

Member Author

For example, you may not want the checks and default methods provided by the base class, or you may be a third-party module that wants to interoperate with skrub without depending on it. There can be several reasons, so a kind of "protocol" like this attribute can be less cumbersome than imposing inheritance. In any case, that is already what is in place, so it is not really related to this PR.
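
As a rough illustration of the "protocol" idea described above, the sketch below shows a class that opts in by setting the marker attribute and never imports skrub. The class name `ThirdPartyCleaner`, the helper `looks_single_column`, and the attribute value `True` are assumptions made for this example; only the attribute name `__single_column_transformer__` comes from the PR, and the real detection logic in skrub may differ.

```python
class ThirdPartyCleaner:
    """Hypothetical transformer that opts in without depending on skrub."""

    # Marker attribute named in the PR. The value is an assumption; this
    # sketch only relies on the attribute being present.
    __single_column_transformer__ = True

    def fit_transform(self, column, y=None):
        return self.transform(column)

    def transform(self, column):
        # Toy behavior: strip whitespace from every value in one column.
        return [str(value).strip() for value in column]


def looks_single_column(transformer):
    """Assumed detection rule: the marker attribute is present."""
    return hasattr(transformer, "__single_column_transformer__")


cleaner = ThirdPartyCleaner()
assert looks_single_column(cleaner)
assert not looks_single_column(object())
```

With this kind of duck typing, the third-party class gets none of the base class's checks or default methods, which is the trade-off mentioned in the comment.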

Member

Makes sense. I wonder if this should be explained somewhere in the documentation (for very advanced users).

Member Author

probably -- we can polish the SingleColumnTransformer docs in #1851

@rcap107 rcap107 added this to the Release 0.8.0 milestone Feb 13, 2026
@jeromedockes jeromedockes moved this to In progress in Labs Feb 16, 2026
        return True
    if isinstance(transformer, Pipeline):
        try:
            return is_single_column_transformer(transformer.steps[0][1])
Member

There should be a comment here to explain why the function is calling itself, and why the index is 1 in transformer.steps[0][1].

It's because each step is a tuple (name, transformer), but that is not clear from the code.
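
One way the requested comments might read is sketched below. This is not the merged code: `StandInPipeline` is a minimal stand-in for `sklearn.pipeline.Pipeline` so the example runs without scikit-learn, the `hasattr` check and the exceptions caught are assumptions, and `MarkedTransformer` and `PlainTransformer` are hypothetical classes used only for the demonstration.

```python
class StandInPipeline:
    """Minimal stand-in for sklearn.pipeline.Pipeline (only `steps`)."""

    def __init__(self, steps):
        self.steps = list(steps)  # [(name, transformer), ...]


class MarkedTransformer:
    __single_column_transformer__ = True  # opts in to the protocol


class PlainTransformer:
    pass


def is_single_column_transformer(transformer, pipeline_type=StandInPipeline):
    # Assumed base case: the marker attribute is present.
    if hasattr(transformer, "__single_column_transformer__"):
        return True
    if isinstance(transformer, pipeline_type):
        try:
            # Each step is a (name, transformer) tuple, so steps[0] is the
            # first step and its second item is the estimator itself.
            _name, first = transformer.steps[0]
        except (IndexError, TypeError, ValueError):
            # Empty or malformed steps: treat as not single-column.
            return False
        # Recurse because the first step may itself be a pipeline.
        return is_single_column_transformer(first, pipeline_type)
    return False


pipe = StandInPipeline([("clean", MarkedTransformer()), ("other", PlainTransformer())])
nested = StandInPipeline([("inner", pipe)])
assert is_single_column_transformer(pipe)
assert is_single_column_transformer(nested)
```

Only the first step decides the result: a pipeline whose marked transformer sits in a later step is not detected in this sketch.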

Member

@rcap107 rcap107 left a comment

Looks good to me, thanks a lot @jeromedockes

@rcap107 rcap107 merged commit bb8f6da into skrub-data:main Feb 16, 2026
29 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Labs Feb 16, 2026