## 📦 ColumnTransformer 
ColumnTransformer lets you apply a different preprocessing step to each group of columns in a single call and returns the results combined into one feature matrix.

### ✅ Why use it?
- Useful when you have both numerical and categorical features.
- Keeps your preprocessing clean, organized, and pipeline-ready.

### 🛠️ Example Use Case:
- Scale numeric columns (age, salary) with StandardScaler
- One-hot encode categorical columns (gender) with OneHotEncoder
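
The use case above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the `toy` DataFrame, its values, and the step names `scale` and `onehot` are assumptions made up for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names follow the use case above.
toy = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "salary": [30000, 42000, 58000, 61000],
    "gender": ["Male", "Female", "Female", "Male"],
})

preprocess = ColumnTransformer(
    transformers=[
        # Standardize the numeric columns to zero mean and unit variance.
        ("scale", StandardScaler(), ["age", "salary"]),
        # Expand the categorical column into one indicator column per category.
        ("onehot", OneHotEncoder(), ["gender"]),
    ]
)

toy_transformed = preprocess.fit_transform(toy)
print(toy_transformed.shape)  # 2 scaled columns + 2 one-hot columns
```

Each tuple is `(name, transformer, columns)`; the outputs are concatenated in the order the tuples are listed.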

In [1]:
import numpy as np
import pandas as pd

In [6]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split

In [4]:
df = pd.read_csv("covid_toy.csv")
df.head()

,age,gender,fever,cough,city,has_covid
0,60,Male,103.0,Mild,Kolkata,No
1,27,Male,100.0,Mild,Delhi,Yes
2,42,Male,101.0,Mild,Delhi,No
3,31,Female,98.0,Mild,Kolkata,No
4,65,Female,101.0,Mild,Mumbai,No


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   age        100 non-null    int64  
 1   gender     100 non-null    object 
 2   fever      90 non-null     float64
 3   cough      100 non-null    object 
 4   city       100 non-null    object 
 5   has_covid  100 non-null    object 
dtypes: float64(1), int64(1), object(4)
memory usage: 4.8+ KB


In [7]:
X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['has_covid']), df['has_covid'], test_size=0.2)

In [8]:
X_train.head()

,age,gender,fever,cough,city
96,51,Female,101.0,Strong,Kolkata
55,81,Female,101.0,Mild,Mumbai
34,74,Male,102.0,Mild,Mumbai
69,73,Female,103.0,Mild,Delhi
92,82,Female,102.0,Strong,Kolkata


In [12]:
X_train['cough'].unique()

array(['Strong', 'Mild'], dtype=object)

### 🥺 Without Column Transformer

In [13]:
# impute missing fever values (SimpleImputer defaults to the column mean)
si = SimpleImputer()

X_train_fever = si.fit_transform(X_train[['fever']])
X_test_fever = si.transform(X_test[['fever']])

X_train_fever.shape, X_test_fever.shape

((80, 1), (20, 1))

In [14]:
# ordinal-encode the cough column
# categories are listed from least to most severe, so Mild -> 0 and Strong -> 1
oe = OrdinalEncoder(categories=[['Mild', 'Strong']])

X_train_cough = oe.fit_transform(X_train[['cough']])
X_test_cough = oe.transform(X_test[['cough']])

X_train_cough.shape, X_test_cough.shape

((80, 1), (20, 1))

In [17]:
# one-hot encode the gender and city columns
ohe = OneHotEncoder(drop='first', sparse_output=False)

# fit on the training split only, then reuse the learned categories on the test split
X_train_gender_city = ohe.fit_transform(X_train[['gender', 'city']])
X_test_gender_city = ohe.transform(X_test[['gender', 'city']])

X_train_gender_city.shape

(80, 4)

In [19]:
# age needs no preprocessing; extract it as a 2-D array so it can be stacked
X_train_age = X_train[['age']].values
X_test_age = X_test[['age']].values

X_train_age.shape, X_test_age.shape

((80, 1), (20, 1))

In [21]:
X_train_transformed = np.hstack((X_train_fever, X_train_cough, X_train_age, X_train_gender_city))
X_test_transformed = np.hstack((X_test_fever, X_test_cough, X_test_age, X_test_gender_city))

In [24]:
X_train_transformed.shape

(80, 7)

### 😁 With Column Transformer

In [25]:
from sklearn.compose import ColumnTransformer

In [34]:
transformer = ColumnTransformer(
    transformers=[
        ('tnf1', SimpleImputer(), ['fever']),
        ('tnf2', OrdinalEncoder(categories=[['Mild', 'Strong']]), ['cough']),
        ('tnf3', OneHotEncoder(drop='first', sparse_output=False), ['gender', 'city']),
    ],
    remainder='passthrough',  # columns not listed above (age) are appended unchanged
)

In [38]:
transformer.fit_transform(X_train)
transformer.transform(X_test)

array([[104.       ,   0.       ,   0.       ,   0.       ,   0.       ,
          0.       ,  12.       ],
       [104.       ,   0.       ,   1.       ,   0.       ,   0.       ,
          0.       ,  25.       ],
       [100.       ,   1.       ,   0.       ,   0.       ,   0.       ,
          0.       ,  47.       ],
       [104.       ,   0.       ,   1.       ,   0.       ,   1.       ,
          0.       ,  16.       ],
       [102.       ,   1.       ,   0.       ,   0.       ,   0.       ,
          0.       ,  24.       ],
       [104.       ,   0.       ,   0.       ,   0.       ,   1.       ,
          0.       ,  17.       ],
       [ 98.       ,   1.       ,   1.       ,   0.       ,   1.       ,
          0.       ,  34.       ],
       [100.       ,   0.       ,   1.       ,   0.       ,   0.       ,
          0.       ,  80.       ],
       [100.8028169,   0.       ,   1.       ,   0.       ,   1.       ,
          0.       ,  82.       ],
       [102.       ,   0.   