## Column Transformer:

scikit-learn's `ColumnTransformer` lets you apply different data preparation transforms to different columns of your dataset in a single step. This is particularly useful when you have a mix of numerical and categorical columns that require different preprocessing.
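As a minimal sketch of the idea, the snippet below scales a numeric column and one-hot encodes a categorical column in one pass. The toy frame and the names `toy`, `toy_ct`, `scale_num`, and `encode_cat` are made up for illustration and are not part of the dataset used in this notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical column
toy = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Paris", "Delhi", "Paris", "Tokyo"],
})

toy_ct = ColumnTransformer(transformers=[
    ("scale_num", StandardScaler(), ["age"]),    # standardize the numeric column
    ("encode_cat", OneHotEncoder(), ["city"]),   # one 0/1 column per city
])

toy_out = toy_ct.fit_transform(toy)
print(toy_out.shape)  # (4, 4): 1 scaled column + 3 one-hot columns
```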

<img src="images/transformer.png" alt="Image Description" width="850" height="280">

In [3]:
import pandas as pd
import numpy as np

In [4]:
df1 = pd.read_csv('Datasets/col transformer data.csv')

In [5]:
df1.sample(4)

| | Years_of_Experience | Department | Job_Role | Performance_Rating | Seniority_Level | Promoted |
|---|---|---|---|---|---|---|
| 191 | 2 | Marketing | Marketing Analyst | Average | Junior | No |
| 194 | 9 | IT | Developer | Excellent | Senior | Yes |
| 4 | 10 | Marketing | Marketing Analyst | Excellent | Senior | Yes |
| 32 | 12 | Sales | Sales Manager | Good | Senior | No |


## Summary table of the data:

<img src="images/data.png" alt="Image Description" width="650" height="470">

In [6]:
from sklearn.model_selection import train_test_split

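x = df1.drop(columns=['Promoted'])   # features: every column except the target
y = df1.iloc[:, -1]                  # target: the last column (Promoted)
# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)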

In [7]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

In [8]:
X_train.columns

Index(['Years_of_Experience', 'Department', 'Job_Role', 'Performance_Rating',
       'Seniority_Level'],
      dtype='object')

In [9]:
print(X_train['Performance_Rating'].unique())
print(X_train['Seniority_Level'].unique())

['Good' 'Excellent' 'Poor' 'Average']
['Mid' 'Junior' 'Senior']


### The `ColumnTransformer` takes two main parameters:

1. `transformers` (List of Transformations Applied)  

-> Each tuple inside the transformers list consists of three elements:

- Name (String Identifier) → A unique name for the transformation (used for reference).   
- Transformer (Transformation Method) → The transformation applied, such as OneHotEncoder, OrdinalEncoder, StandardScaler, etc.    
- Column List (Columns to Transform) → Specifies which columns the transformer should be applied to.  

Example:  

- **('ct1', OneHotEncoder(...), ['Department', 'Job_Role'])** →   
        Applies One-Hot Encoding to categorical columns (Department, Job_Role), converting them into binary (0/1) columns.   

- **('ct2', OrdinalEncoder(...), ['Performance_Rating', 'Seniority_Level'])** →    
         Applies Ordinal Encoding, mapping each category of Performance_Rating and Seniority_Level to a numerical rank (0, 1, 2, ...) in the order listed in the encoder's `categories` argument, for example Junior → 0, Mid → 1, Senior → 2.     

2. `remainder` (What Happens to Other Columns) 

- 'drop' → Drops any columns not listed in transformers (this is the default).   
- 'passthrough' → Keeps the remaining columns unchanged (here, Years_of_Experience) and appends them after the transformed columns in the output.
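
The effect of `remainder` is easiest to see on the output shape. The following is a small sketch on a hypothetical three-row frame (`demo` and the helper `encode` are illustrative names, not part of this notebook) that only encodes `Department` and varies `remainder`:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame that mimics two of the columns above
demo = pd.DataFrame({
    "Years_of_Experience": [2, 9, 12],
    "Department": ["Marketing", "IT", "Sales"],
})

def encode(remainder):
    """One-hot encode Department and handle the other column per `remainder`."""
    ct_demo = ColumnTransformer(
        transformers=[("dept", OneHotEncoder(), ["Department"])],
        remainder=remainder,
    )
    return ct_demo.fit_transform(demo)

dropped = encode("drop")
kept = encode("passthrough")
print(dropped.shape)  # (3, 3): only the three one-hot columns
print(kept.shape)     # (3, 4): one-hot columns plus Years_of_Experience
```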

In [10]:
ct = ColumnTransformer(transformers=[
    # One-hot encode the nominal columns
    ('ct1', OneHotEncoder(sparse_output=False, dtype=np.int32), ['Department', 'Job_Role']),
    # Ordinal-encode the ranked columns; each list runs from lowest to highest rank
    ('ct2', OrdinalEncoder(categories=[['Poor', 'Average', 'Good', 'Excellent'],
                                       ['Junior', 'Mid', 'Senior']]),
     ['Performance_Rating', 'Seniority_Level']),
], remainder='passthrough')
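
The order of the lists passed to `categories` decides which number each label receives. As a quick check, here is a standalone sketch using a few made-up `Seniority_Level` values (`levels`, `enc`, and `codes` are illustrative names):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical values, not rows from the dataset
levels = pd.DataFrame({"Seniority_Level": ["Junior", "Senior", "Mid", "Junior"]})

# The position of each label in the list becomes its code: Junior=0, Mid=1, Senior=2
enc = OrdinalEncoder(categories=[["Junior", "Mid", "Senior"]])
codes = enc.fit_transform(levels)

print(codes.ravel())  # [0. 2. 1. 0.]
```

Without an explicit `categories` list, the encoder sorts the labels alphabetically, which rarely matches the intended ranking.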

In [11]:
ct.fit(X_train)

The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.
At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).



In [12]:
new_X_train = ct.transform(X_train)
new_X_test = ct.transform(X_test)

In [13]:
new_X_train.shape

(140, 19)

In [14]:
new_X_test.shape

(60, 19)

With a column transformer, you can easily apply different encoders to different types of features simultaneously.

----