FIX handle column names renaming in ColumnTransformer #28262
I could also use the code snippet from the original issue as an integration test. However, there is some weird stuff to investigate in between. So, more to follow.
Suggestion about the error message.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
```python
continue
names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
adapter.rename_columns(X, names_out)
names_idx += X.shape[1]
```
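The loop above renames each transformer output with its own slice of a flat list of final names, and only then are the outputs stacked. A minimal pandas sketch of that idea follows. It assumes that the skipped case (the `continue`) is an output with zero columns, and it uses a hypothetical `rename_columns` helper in place of scikit-learn's private adapter method.

```python
import pandas as pd


def rename_columns(X, columns):
    # Hypothetical stand-in for the private adapter method: assign names
    # positionally, which works even when X has duplicated column names.
    X.columns = list(columns)
    return X


def rename_then_stack(Xs, feature_names_outs):
    """Rename each output with its slice of the flat name list, then stack."""
    names_idx = 0
    for X in Xs:
        if X.shape[1] == 0:
            # Assumption: empty outputs contribute no names and are skipped.
            continue
        names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
        rename_columns(X, names_out)
        names_idx += X.shape[1]
    return pd.concat(Xs, axis=1)


# Two transformers that both output a column named "x", plus an empty output.
X_a = pd.DataFrame({"x": [1, 2]})
X_b = pd.DataFrame({"x": [10, 20]})
empty = pd.DataFrame(index=[0, 1])

out = rename_then_stack([X_a, empty, X_b], ["a__x", "b__x"])
print(list(out.columns))  # ['a__x', 'b__x']
```

Because every output already carries its final, unique names when it is concatenated, the stacked frame never holds duplicated intermediate column names.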
I find this code ugly, but I don't see how to improve it without extra reshaping boilerplate.
I think it is ready for review, @ogrisel @adrinjalali. There is quite some boilerplate code for generating the error, but I think this is OK. For the rest, I'm just torn between having a flat list and a list of lists, but I don't see how to simplify it without touching more code that is unrelated to this issue.
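The comment above mentions boilerplate for generating an error and the tension between a flat list and a list of lists of names. The sketch below assumes that the error concerns duplicated output names; that is an assumption, since the message itself is not shown here. Under that assumption, one common pattern is to flatten the per-transformer names with `itertools.chain.from_iterable` and find duplicates with `collections.Counter`. The `flatten_and_check` helper and its message are hypothetical.

```python
from collections import Counter
from itertools import chain


def flatten_and_check(names_per_transformer):
    """Flatten per-transformer names (list of lists) and reject duplicates."""
    flat = list(chain.from_iterable(names_per_transformer))
    counts = Counter(flat)
    duplicated = sorted(name for name, count in counts.items() if count > 1)
    if duplicated:
        # Hypothetical wording; not the message used in the PR.
        raise ValueError(f"Duplicated feature names found: {duplicated}")
    return flat


print(flatten_and_check([["a", "b"], [], ["c"]]))  # ['a', 'b', 'c']
try:
    flatten_and_check([["x"], ["x", "y"]])
except ValueError as exc:
    print(exc)  # Duplicated feature names found: ['x']
```

The flat list is what the renaming loop slices, while the list of lists is the natural per-transformer shape. Converting once, at the point where the check happens, keeps the rest of the code unchanged.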
Some more comments, but otherwise LGTM.
```python
raise ValueError(
    "Concatenating DataFrames from the transformer's output lead to"
    " an inconsistent number of samples. The output may have Pandas"
    " Indexes that do not match."
)
```
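The error above guards against a pandas behaviour: `pd.concat(..., axis=1)` aligns rows on the index, so outputs whose indexes do not match are outer-joined into more rows than the input had. The minimal sketch below shows that failure mode and a row-count check. `hstack_checked` is a hypothetical helper, and its message only paraphrases the snippet above.

```python
import pandas as pd


def hstack_checked(Xs, n_samples):
    """Concatenate outputs column-wise, then verify the number of rows."""
    # pd.concat(axis=1) aligns rows on the index (outer join by default).
    output = pd.concat(Xs, axis=1)
    if output.shape[0] != n_samples:
        raise ValueError(
            "Concatenating DataFrames from the transformers' outputs led to"
            " an inconsistent number of samples. The outputs may have pandas"
            " indexes that do not match."
        )
    return output


X_a = pd.DataFrame({"a": [1, 2, 3]})  # default RangeIndex: 0, 1, 2
X_b = pd.DataFrame({"b": [4, 5, 6]}, index=[10, 11, 12])  # non-matching index

print(pd.concat([X_a, X_b], axis=1).shape)  # (6, 2)
```
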
Note: we could actually check whether that last sentence holds, to make the error message more precise (and not use "may" when it is not the case).
But let's keep that for a follow-up PR ;)
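For the follow-up suggested above, one option is to test the index hypothesis before choosing the message. The sketch below is hypothetical and assumes pandas outputs. It compares row counts first and then uses `Index.equals` to tell whether a mismatch comes from the indexes.

```python
import pandas as pd


def describe_sample_mismatch(Xs):
    """Return a specific reason why column-wise outputs do not line up."""
    n_rows = sorted({X.shape[0] for X in Xs})
    if len(n_rows) > 1:
        # Different lengths point to transformers changing the sample count.
        return f"Transformers returned different numbers of samples: {n_rows}."
    first_index = Xs[0].index
    if any(not X.index.equals(first_index) for X in Xs[1:]):
        # Same length but different labels: concat would misalign the rows.
        return "Transformers returned pandas indexes that do not match."
    return "Outputs are aligned."


a = pd.DataFrame({"a": [1, 2, 3]})
b = pd.DataFrame({"b": [4, 5, 6]}, index=[10, 11, 12])
print(describe_sample_mismatch([a, b]))
# Transformers returned pandas indexes that do not match.
```
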
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
This seems like there must be a better way to do it 😅
`_num_features` and `_num_samples` instead of `.shape`?
Here we are dealing with dataframe-like containers, so they will implement `shape`.
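The reply above relies on duck typing. Dataframe-like containers such as a pandas `DataFrame` expose `shape` as `(n_samples, n_features)`, so slicing names by `X.shape[1]` needs no extra helper. The minimal sketch below uses a hypothetical `HasShape` protocol to make that assumption explicit.

```python
from typing import Protocol, Tuple

import pandas as pd


class HasShape(Protocol):
    # Hypothetical protocol: anything exposing a 2-D `shape` tuple.
    @property
    def shape(self) -> Tuple[int, ...]: ...


def n_columns(X: HasShape) -> int:
    """Number of columns of any dataframe-like container."""
    return X.shape[1]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
print(df.shape, n_columns(df))  # (2, 3) 3
```
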
Otherwise LGTM
…8262) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
closes #28260
The adapters in charge of renaming columns when `set_output` is configured to return a DataFrame-like object were failing when the original DataFrame-like object had duplicated columns (which is possible in pandas).
We invert the order of stacking and renaming (each transformer's output is renamed before the outputs are stacked) to avoid duplicated columns that should not happen.
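The failure mode described above can be reproduced with plain pandas. A DataFrame may hold duplicated column names, and any renaming keyed on the old names cannot tell the duplicates apart. The minimal illustration below is not scikit-learn's adapter code. It contrasts a dict-based `DataFrame.rename` with positional assignment to `columns` on a frame that was stacked first.

```python
import pandas as pd

# Stacking first: two outputs named "x" give duplicated columns.
stacked = pd.concat(
    [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [10, 20]})], axis=1
)
new_names = ["a__x", "b__x"]

# Keyed on old names, both "x" columns share one dict key, so the dict keeps
# only the last value and the duplicates survive under a single new name.
mapping = dict(zip(stacked.columns, new_names))
by_dict = stacked.rename(columns=mapping)
print(list(by_dict.columns))  # ['b__x', 'b__x']

# Positional assignment does not rely on the old names being unique.
by_position = stacked.copy()
by_position.columns = new_names
print(list(by_position.columns))  # ['a__x', 'b__x']
```

Renaming each output before stacking sidesteps the question entirely, because the duplicated intermediate names are never created.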