repr of column transformer unhelpful #13770
Comments
I'll take a look |
I'd never thought to use a mask like that, let alone with a series. I
would use a list of column names where a callable is not possible...
|
Really? Why?
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
data = pd.read_csv("https://github.com/amueller/ml-workshop-1-of-4/raw/master/notebooks/data/adult.csv", index_col=0)
cat_features = data.dtypes == 'object'
ct = make_column_transformer((OneHotEncoder(sparse=False), cat_features),
                             (StandardScaler(), ~cat_features))
or something similar. |
Oh I guess the person was using stable, so maybe in 0.21 this is less bad
for the code above. Really friendly is different, though... |
There are two ways of "fixing" it that should not be too hard to implement:
Option 1: cut dataframes and numpy arrays after the first \n
Option 1 with change_only=True:
Option 2: don't cut, but indent correctly
LMK which one you prefer |
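To make the two options concrete, here is a minimal sketch of both strategies applied to a value with a multi-line repr. It is not the scikit-learn pretty-printer: the helper names `cut_after_first_line` and `indent_continuation` and the `FakeMask` class are hypothetical, and `FakeMask` only stands in for a pandas Series so the example runs without extra dependencies.

```python
def cut_after_first_line(value):
    """Option 1: keep only the first line of a multi-line repr."""
    text = repr(value)
    first, sep, _ = text.partition("\n")
    # Append "..." only when something was actually cut off.
    return first + "..." if sep else text


def indent_continuation(value, offset):
    """Option 2: keep the full repr, padding continuation lines by `offset`."""
    lines = repr(value).split("\n")
    pad = " " * offset
    return "\n".join([lines[0]] + [pad + line for line in lines[1:]])


class FakeMask:
    """Hypothetical stand-in for a boolean pandas Series (multi-line repr)."""

    def __repr__(self):
        return "age False\njob True\ndtype: bool"


mask = FakeMask()
prefix = "columns="
print(prefix + cut_after_first_line(mask))
print(prefix + indent_continuation(mask, offset=len(prefix)))
```

Option 1 keeps the estimator repr short at the cost of hiding most of the mask; Option 2 keeps everything but can still produce a very long repr for wide dataframes.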
If we had an easy tool for make_feature_selector(dtype=object) then that
would be the right idiom.
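A sketch of what that idiom can look like, assuming scikit-learn 0.22 or later, where `sklearn.compose.make_column_selector` provides this kind of dtype-based selection (the `make_feature_selector` name above is the commenter's placeholder). The tiny dataframe is made up for illustration and only stands in for the adult dataset used elsewhere in this thread.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data: one numeric column and one object-dtype column.
data = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": pd.Series(["Private", "State-gov", "Private"], dtype=object),
})

# Select columns by dtype with a callable instead of a boolean Series,
# so no dataframe content ends up stored on the transformer.
ct = make_column_transformer(
    (OneHotEncoder(), make_column_selector(dtype_include=object)),
    (StandardScaler(), make_column_selector(dtype_exclude=object)),
)

transformed = ct.fit_transform(data)
print(transformed.shape)
print(ct)
```

Because the column specifier is a small selector object rather than a Series, the printed transformer no longer embeds a slice of the dtype mask.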
|
Here's a screenshot from a tutorial I did the other day:
The repr currently includes part of the dataframe, which I find of questionable value to begin with. Additionally, there's an ellipsis inside the dataframe, which is hard to see but means that there are actually two different dataframes being shown here (the first part is from the mask of the first transformer, the second from the mask of the second transformer).
I'm not sure what a good solution would be. Possibly a custom repr that hides the columns if they are too long to show? Or trying more magic in the general repr? @NicolasHug had a hard time making that much smarter when he tried in the fall.
Given how central the ColumnTransformer is, I think we should make sure it's easy to understand, including the repr.