ENH Improves efficiency of ColumnTransformer for string keys #16431
rth
left a comment
LGTM, aside from one comment below. Thanks @thomasjpfan!
columns = list(key)

try:
    column_indices = [all_columns.index(col) for col in columns]
Do we have a test that fails gracefully with duplicate column names, where get_loc wouldn't behave as expected?
We do not have a test for this behavior. The original code uses index, which returns the first occurrence of the column name.
I would consider this a bug; it should raise an error instead.
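To illustrate the point about index: on a plain Python list, list.index reports only the first match when a name appears more than once, so a duplicate goes unnoticed. The sketch below shows that behavior and one possible guard. The helper name find_unique_positions and its error messages are hypothetical and are not taken from scikit-learn.

```python
from collections import Counter


def find_unique_positions(all_columns, columns):
    """Return the positions of `columns` in `all_columns`.

    Hypothetical helper for illustration only: it raises if a requested
    name is missing or appears more than once.
    """
    counts = Counter(all_columns)
    positions = []
    for col in columns:
        if counts[col] == 0:
            raise ValueError(f"{col!r} is not a column")
        if counts[col] > 1:
            raise ValueError(f"{col!r} is not unique")
        positions.append(all_columns.index(col))
    return positions


all_columns = ["a", "b", "a"]
# list.index hides the duplicate: it reports only the first "a".
first_match = all_columns.index("a")
```

With this guard, requesting "a" from the list above raises a ValueError instead of silently selecting position 0.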
Updated the PR with a check on the return value of get_loc.
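A minimal sketch of what such a check could look like, assuming pandas is installed. pandas.Index.get_loc returns a single integer for a unique label, but a slice or a boolean mask when the label is duplicated, so testing the type of the result detects duplicates. The helper name column_positions and the error messages are illustrative assumptions, not the code merged in this PR.

```python
import numbers

import pandas as pd


def column_positions(all_columns, columns):
    """Map column names to integer positions using a pandas Index.

    Hypothetical helper; names and messages are illustrative.
    """
    positions = []
    for col in columns:
        try:
            loc = all_columns.get_loc(col)
        except KeyError as exc:
            raise ValueError(f"{col!r} is not a column of the dataframe") from exc
        # For a duplicated label, get_loc returns a slice or a boolean mask
        # rather than a single integer.
        if not isinstance(loc, numbers.Integral):
            raise ValueError(f"{col!r} is not unique in the dataframe")
        positions.append(int(loc))
    return positions


unique = pd.Index(["a", "b", "c"])
duplicated = pd.Index(["a", "b", "a"])
```

Calling column_positions(unique, ["c", "a"]) yields the two positions, while the same call on duplicated with "a" raises a ValueError.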
Please add an Efficiency entry to the change log.
glemaitre
left a comment
Just some nitpicks, otherwise LGTM.
column_indices = []
for col in columns:
    col_idx = all_columns.get_loc(col)
    if not isinstance(col_idx, int):
- if not isinstance(col_idx, int):
+ if not isinstance(col_idx, numbers.Integer):
numbers.Integer does not exist :(
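For reference, the abstract base class for integers in the standard library's numbers module is named numbers.Integral. The quick check below confirms that, and shows that a slice is not an instance of it. Whether the PR ended up using numbers.Integral is not stated in this thread.

```python
import numbers

# The numbers module has no attribute called "Integer".
has_integer = hasattr(numbers, "Integer")

# numbers.Integral is the ABC that plain Python ints are registered with.
plain_int_ok = isinstance(3, numbers.Integral)

# A slice is not an integer, so a type check against Integral rejects it.
slice_ok = isinstance(slice(0, 2), numbers.Integral)
```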
Thanks @thomasjpfan
Reference Issues/PRs
Fixes #16327
What does this implement/fix? Explain your changes.
Uses the pandas index to find column indices
Any other comments?
Here is a benchmark for this PR, comparing this PR against master (benchmark figures not included).