FIX handle column names renaming in ColumnTransformer #28262

glemaitre · 2024-01-25T15:24:13Z

The adapters in charge of renaming columns when set_output requests a DataFrame-like output were failing when the original DataFrame-like object had duplicated columns (which is possible in pandas).

We invert the order of stacking and renaming to avoid potential duplicated columns that should not happen.
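To make the failure mode concrete, here is a small pandas sketch. It assumes a rename strategy that maps old labels to new ones (an illustration, not necessarily the code of the scikit-learn adapters). With duplicated labels, dict(zip(old, new)) keeps only the last target name, and DataFrame.rename applies it to every matching column. Assigning the columns positionally keeps one name per column.

```python
import pandas as pd

# A DataFrame with duplicated column names, which pandas allows.
df = pd.DataFrame([[1, 2]], columns=["a", "a"])
new_names = ["x", "y"]

# Label-based renaming (assumed strategy): the mapping collapses to {"a": "y"},
# and rename() applies it to every column labelled "a".
by_label = df.rename(columns=dict(zip(df.columns, new_names)))
print(list(by_label.columns))  # ['y', 'y']

# Positional renaming: assign one new name per column, in order.
by_position = df.copy()
by_position.columns = new_names
print(list(by_position.columns))  # ['x', 'y']
```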

github-actions · 2024-01-25T15:26:25Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4c16345. Link to the linter CI: here

glemaitre · 2024-01-25T15:26:54Z

I could also use the code snippet from the original issue as an integration test. However, there is some weird stuff to investigate first:

- Polars does not allow duplicated columns, so it should be failing.
- It does not fail on the issue snippet because the FunctionTransformer returns a pandas DataFrame even though a polars DataFrame is requested.

So more to follow.

sklearn/utils/_set_output.py

ogrisel

Suggestion about the error message.

sklearn/compose/_column_transformer.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

glemaitre · 2024-01-26T15:57:44Z

sklearn/compose/_column_transformer.py

+                        continue
+                    names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
+                    adapter.rename_columns(X, names_out)
+                    names_idx += X.shape[1]


I find this code ugly, but I don't see how to improve it without adding reshaping boilerplate.
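The rename-then-stack order from the diff above can be sketched outside of scikit-learn as follows. hstack_with_names is a hypothetical helper, not the ColumnTransformer code: it walks a flat list of output names with a running offset, renames each pandas block positionally, and only then concatenates, so duplicated names inside an input block never have to be resolved by label. Dropping blocks without columns is an assumption of this sketch.

```python
import pandas as pd


def hstack_with_names(Xs, feature_names_out):
    """Hypothetical helper: rename each block, then stack horizontally.

    Xs: list of pandas DataFrames, one per transformer.
    feature_names_out: flat list of output names covering all blocks, in order.
    """
    renamed = []
    names_idx = 0
    for X in Xs:
        n_features = X.shape[1]
        if n_features == 0:
            # Assumption: blocks without columns are dropped from the output.
            continue
        names_out = feature_names_out[names_idx : names_idx + n_features]
        X = X.copy()
        # Positional assignment works even if X has duplicated column names.
        X.columns = names_out
        renamed.append(X)
        names_idx += n_features
    return pd.concat(renamed, axis=1)


X1 = pd.DataFrame({"a": [1, 2]})
X2 = pd.DataFrame([[3, 4], [5, 6]], columns=["a", "a"])  # duplicated names
empty = pd.DataFrame(index=range(2))  # a block with no columns
out = hstack_with_names([X1, empty, X2], ["first__a", "second__a0", "second__a1"])
print(list(out.columns))
```

Because each block is renamed before concatenation, the concatenated frame never holds the ambiguous duplicated labels.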

glemaitre · 2024-01-26T16:01:06Z

I think it is ready for a review @ogrisel @adrinjalali

There is quite some boilerplate code for generating the error, but I think this is OK. Otherwise, I'm just annoyed at juggling between a flat list and a list of lists, but I don't see how to simplify it without touching more code unrelated to this issue.
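One possible way to hide the offset bookkeeping, sketched here as an assumption rather than what the PR does: compute the per-transformer slice boundaries once from the column counts with itertools.accumulate, turning the flat list of names into a list of lists. split_names is a hypothetical helper.

```python
from itertools import accumulate


def split_names(flat_names, n_features_per_block):
    """Hypothetical helper: split a flat list of names into one sub-list per block."""
    # Running offsets: [0, n0, n0 + n1, ...]
    bounds = [0, *accumulate(n_features_per_block)]
    # Consecutive offsets delimit each block's slice; empty blocks get [].
    return [flat_names[start:stop] for start, stop in zip(bounds, bounds[1:])]


print(split_names(["a", "b", "c", "d"], [1, 0, 3]))  # [['a'], [], ['b', 'c', 'd']]
```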

ogrisel

Some more comments but otherwise LGTM.

doc/whats_new/v1.4.rst

sklearn/compose/_column_transformer.py

sklearn/compose/tests/test_column_transformer.py

sklearn/compose/_column_transformer.py

ogrisel · 2024-01-26T16:03:05Z

sklearn/compose/_column_transformer.py

+                    raise ValueError(
+                        "Concatenating DataFrames from the transformer's output lead to"
+                        " an inconsistent number of samples. The output may have Pandas"
+                        " Indexes that do not match."


Note: we could actually check whether that last sentence holds, to make the error message more precise (and avoid "may" when it is not the case).

But let's keep that for a follow-up PR ;)
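For that follow-up, one way to make the message precise is to check whether the indexes really differ before mentioning them. The sketch below uses a hypothetical concat_checked helper with its own wording, not the scikit-learn implementation. pd.concat(axis=1) aligns rows on the index, so mismatched indexes grow the row count, and the extra sentence is only added when Index.equals reports a difference.

```python
import pandas as pd


def concat_checked(Xs):
    """Hypothetical helper: concatenate column-wise and validate the row count."""
    output = pd.concat(Xs, axis=1)
    n_samples = Xs[0].shape[0]
    if output.shape[0] != n_samples:
        # pandas aligns on the index, so mismatched indexes add extra rows.
        indexes_match = all(X.index.equals(Xs[0].index) for X in Xs)
        msg = (
            "Concatenating DataFrames from the transformer's output led to"
            " an inconsistent number of samples."
        )
        if not indexes_match:
            # Only claim an index mismatch when it actually occurred.
            msg += " The outputs have Pandas Indexes that do not match."
        raise ValueError(msg)
    return output


X1 = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
X2 = pd.DataFrame({"b": [3, 4]}, index=[1, 2])  # shifted index
try:
    concat_checked([X1, X2])
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```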

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

adrinjalali

This seems like there must be a better way to do it 😅

sklearn/compose/_column_transformer.py

adrinjalali · 2024-01-31T10:32:13Z

sklearn/compose/_column_transformer.py

+                        continue
+                    names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
+                    adapter.rename_columns(X, names_out)
+                    names_idx += X.shape[1]


_num_features and _num_samples instead of .shape?

Here we deal with dataframe-like containers, so they will implement .shape.

adrinjalali

Otherwise LGTM

sklearn/compose/_column_transformer.py

…8262) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

FIX handle column names renaming in ColumnTransformer

e243fae

github-actions bot added the module:utils label Jan 25, 2024

update changelog

0546f18

adrinjalali reviewed Jan 25, 2024


sklearn/utils/_set_output.py (resolved)

glemaitre added this to the 1.4.1 milestone Jan 25, 2024

adrinjalali approved these changes Jan 25, 2024


glemaitre added 2 commits January 26, 2024 14:30

Merge remote-tracking branch 'origin/main' into is/28260

7a88a43

iter

4b3efbe

ogrisel reviewed Jan 26, 2024


sklearn/compose/_column_transformer.py (outdated, resolved)

sklearn/compose/_column_transformer.py (outdated, resolved)

sklearn/compose/_column_transformer.py (outdated, resolved)

glemaitre and others added 3 commits January 26, 2024 16:39

Apply suggestions from code review

d61c5cb

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

TST check that we raise the proper error message

d642067

naming

064b3e2

glemaitre commented Jan 26, 2024


ogrisel approved these changes Jan 26, 2024


glemaitre and others added 3 commits January 26, 2024 19:51

Apply suggestions from code review

965e050

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

improve changelog

8e94d76

case of empty list

d804cd8

adrinjalali reviewed Jan 31, 2024


glemaitre mentioned this pull request Jan 31, 2024

FIX FunctionTransformer overwrite column names if not consistent #28241

Merged

glemaitre added 2 commits January 31, 2024 12:22

cover missing case

7955e10

Merge remote-tracking branch 'origin/main' into is/28260

b14c5dd

adrinjalali reviewed Jan 31, 2024


sklearn/compose/_column_transformer.py (outdated, resolved)

review

4c16345

adrinjalali enabled auto-merge (squash) January 31, 2024 16:25

adrinjalali merged commit 4a28ba3 into scikit-learn:main Jan 31, 2024

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024

FIX handle column names renaming in ColumnTransformer (scikit-learn#2…

4dd644d

…8262) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024

FIX handle column names renaming in ColumnTransformer (scikit-learn#2…

e144b27

…8262) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

glemaitre added a commit that referenced this pull request Feb 13, 2024

FIX handle column names renaming in ColumnTransformer (#28262)

7798d88

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

LennartPurucker mentioned this pull request Jan 14, 2025

Fix error when using TabPFN as part of a pipeline PriorLabs/TabPFN#135

Merged
