# ColumnTransformer

`ColumnTransformer` is a utility in **scikit-learn** that lets you apply **different preprocessing steps to different columns**
of a dataset **in a single, unified pipeline**.

It is especially useful when working with datasets that contain **both numerical and categorical features**.

---

## Why ColumnTransformer is Important

In real-world datasets:
- Numerical features often need scaling  
- Categorical features must be encoded as numbers  
- Different columns need different preprocessing  

`ColumnTransformer` ensures that:
- Each column gets the correct transformation  
- Preprocessing is reproducible and clean  
- Data leakage is avoided, provided the transformer is fitted on the training data only  
- The same fitted preprocessing can be reused at prediction time  
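
To illustrate the data-leakage point, here is a minimal sketch on made-up data (the `income` column and its values are assumptions, not part of the customer dataset). The transformer learns its scaling statistics from the training rows only and then reuses them on the test rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the test row is deliberately far from the training range.
train = pd.DataFrame({"income": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"income": [100.0]})

ct = ColumnTransformer(transformers=[("scale", StandardScaler(), ["income"])])

# fit() sees only the training rows, so the test row cannot influence the mean.
ct.fit(train)
learned_mean = float(ct.named_transformers_["scale"].mean_[0])
print(learned_mean)  # 20.0

# transform() applies the training statistics to the unseen row.
test_scaled = ct.transform(test)
print(test_scaled)
```

Calling `fit_transform` on the full dataset before splitting would let the test row shift the learned mean, which is the leakage that fitting on the training split avoids.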

---

## Basic Idea

Instead of preprocessing features separately, `ColumnTransformer` lets you define:

- **What transformation**
- **Applied to which columns**

Each transformer is then applied to its own columns independently, and the results are **concatenated side by side** into a single feature matrix.




In [1]:
%%capture
!pip install numpy pandas matplotlib scikit-learn


In [2]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv('customer.csv')
df

    Purchased  Gender Education   Review
0          No  Female        UG  Average
1         Yes    Male        HS     Good
2         Yes    Male        PG     Good
3         Yes    Male        HS     Poor
4          No  Female        UG     Good
..        ...     ...       ...      ...
495        No    Male        HS     Poor
496        No    Male        PG     Poor
497        No    Male        PG  Average
498        No  Female        HS  Average


In [12]:
from sklearn.model_selection import train_test_split

# Separate the feature columns from the target column.
X = df.drop('Purchased', axis=1)
y = df['Purchased']
X, y

(     Gender Education   Review
 0    Female        UG  Average
 1      Male        HS     Good
 2      Male        PG     Good
 3      Male        HS     Poor
 4    Female        UG     Good
 ..      ...       ...      ...
 495    Male        HS     Poor
 496    Male        PG     Poor
 497    Male        PG  Average
 498  Female        HS  Average
 499    Male        PG     Poor
 
 [500 rows x 3 columns],
 0       No
 1      Yes
 2      Yes
 3      Yes
 4       No
       ... 
 495     No
 496     No
 497     No
 498     No
 499     No
 Name: Purchased, Length: 500, dtype: object)

In [13]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape

(400, 3)

In [14]:
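from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

transformer = ColumnTransformer(transformers=[
    # Ordinal columns: categories are listed from lowest to highest rank,
    # one list per column, in the same order as the column names.
    ('tnf1', OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['HS', 'UG', 'PG']]), ['Review', 'Education']),
    # Nominal column: one-hot encode and drop the first category to avoid a redundant column.
    ('tnf2', OneHotEncoder(sparse_output=False, drop='first'), ['Gender']),
], remainder='passthrough')  # columns not listed above are passed through unchanged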

In [15]:
transformer.fit_transform(X_train)

array([[2., 0., 1.],
       [1., 1., 1.],
       [0., 0., 1.],
       ...,
       [0., 0., 1.],
       [1., 0., 1.],
       [0., 2., 1.]], shape=(400, 3))

In [16]:
transformer.transform(X_test)

array([[1., 1., 1.],
       [1., 1., 0.],
       [2., 2., 1.],
       [0., 2., 0.],
       [2., 2., 0.],
       [0., 1., 0.],
       [2., 1., 0.],
       [1., 1., 1.],
       [1., 1., 0.],
       [1., 0., 1.],
       [2., 1., 1.],
       [1., 1., 1.],
       [1., 0., 1.],
       [1., 2., 0.],
       [2., 1., 0.],
       [0., 0., 0.],
       [0., 0., 1.],
       [1., 2., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [2., 2., 0.],
       [1., 0., 0.],
       [2., 1., 1.],
       [0., 1., 0.],
       [0., 2., 0.],
       [1., 0., 0.],
       [1., 2., 1.],
       [2., 1., 0.],
       [1., 2., 1.],
       [1., 2., 0.],
       [2., 1., 0.],
       [0., 1., 0.],
       [0., 0., 0.],
       [2., 0., 0.],
       [2., 2., 1.],
       [1., 1., 1.],
       [2., 1., 0.],
       [0., 0., 1.],
       [0., 2., 1.],
       [1., 1., 0.],
       [0., 0., 0.],
       [1., 1., 0.],
       [2., 1., 1.],
       [0., 2., 1.],
       [1., 0., 1.],
       [2., 2., 0.],
       [1., 1., 0.],
       [0., 1