<div class="alert alert-block alert-success">
    <h1 align="center">Scikit-Learn Tips</h1>
    <h3 align="center">Tip 05: ColumnTransformers</h3>
    <h4 align="center"><a href="http://www.iran-machinelearning.ir">Soheil Tehranipour</a></h4>
</div>

Use ColumnTransformer to apply different preprocessing steps to different columns of a DataFrame:

- select DataFrame columns by name
- pass through or drop any columns you do not list

See example 👇

In [1]:
import pandas as pd
df = pd.read_csv(r'C:\Users\soso\Desktop\Maktbakhoone\Scikit-Learn Tips\data.csv')

In [2]:
cols = ['Fare', 'Embarked', 'Sex', 'Age']
X = df[cols]

In [3]:
X

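|     | Fare    | Embarked | Sex    | Age  |
|-----|---------|----------|--------|------|
| 0   | 7.2500  | S        | male   | 22.0 |
| 1   | 71.2833 | C        | female | 38.0 |
| 2   | 7.9250  | S        | female | 26.0 |
| 3   | 53.1000 | S        | female | 35.0 |
| 4   | 8.0500  | S        | male   | 35.0 |
| ... | ...     | ...      | ...    | ...  |
| 886 | 13.0000 | S        | male   | 27.0 |
| 887 | 30.0000 | S        | female | 19.0 |
| 888 | 23.4500 | S        | female | NaN  |
| 889 | 30.0000 | C        | male   | 26.0 |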


In [4]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer

In [5]:
ohe = OneHotEncoder()
imp = SimpleImputer()
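
With no arguments, SimpleImputer uses `strategy='mean'`, so each missing value is replaced by the mean of its column. A minimal sketch on made-up ages (the array `ages` and the name `imp_demo` are illustrative and not part of the dataset above):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-column data with one missing value
ages = np.array([[22.0], [38.0], [np.nan], [30.0]])

imp_demo = SimpleImputer()            # default strategy is 'mean'
filled = imp_demo.fit_transform(ages)

print(imp_demo.statistics_)  # the per-column mean learned during fit
print(filled.ravel())        # the NaN is replaced by that mean
```

The mean is learned in `fit` and reused in `transform`, so new data is filled with the value learned from the training data.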

In [6]:
ct = make_column_transformer(
    (ohe, ['Embarked', 'Sex']),  # apply OneHotEncoder to Embarked and Sex
    (imp, ['Age']),              # apply SimpleImputer to Age
    remainder='passthrough')     # include remaining column (Fare) in the output
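
The `remainder` argument decides what happens to columns you did not list. A small sketch comparing `'passthrough'` with `'drop'` on a made-up two-column DataFrame (`toy`, `ct_pass`, and `ct_drop` are hypothetical names used only here):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, not the dataset used above
toy = pd.DataFrame({
    'Fare': [7.25, 71.28, 8.05],
    'Sex': ['male', 'female', 'male'],
})

# Keep the unlisted column (Fare) in the output
ct_pass = make_column_transformer(
    (OneHotEncoder(), ['Sex']),
    remainder='passthrough')

# Discard the unlisted column (Fare)
ct_drop = make_column_transformer(
    (OneHotEncoder(), ['Sex']),
    remainder='drop')

out_pass = ct_pass.fit_transform(toy)
out_drop = ct_drop.fit_transform(toy)

print(out_pass.shape)  # (3, 3): 2 one-hot columns for Sex + Fare
print(out_drop.shape)  # (3, 2): only the 2 one-hot columns for Sex
```

`'drop'` is the default in scikit-learn, so you have to ask for `'passthrough'` explicitly if you want to keep the unlisted columns.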

In [10]:
# column order: Embarked (3 columns), Sex (2 columns), Age (1 column), Fare (1 column)
ct.fit_transform(X)

array([[ 0.        ,  0.        ,  1.        , ...,  1.        ,
        22.        ,  7.25      ],
       [ 1.        ,  0.        ,  0.        , ...,  0.        ,
        38.        , 71.2833    ],
       [ 0.        ,  0.        ,  1.        , ...,  0.        ,
        26.        ,  7.925     ],
       ...,
       [ 0.        ,  0.        ,  1.        , ...,  0.        ,
        29.69911765, 23.45      ],
       [ 1.        ,  0.        ,  0.        , ...,  1.        ,
        26.        , 30.        ],
       [ 0.        ,  1.        ,  0.        , ...,  1.        ,
        32.        ,  7.75      ]])
