# Column Transformer
The `ColumnTransformer` in scikit-learn applies different transformations to different columns or subsets of columns of a dataset and concatenates the results into a single feature matrix. Because it is itself a transformer, the whole preprocessing stage can be used as one step in a machine learning pipeline.

### Key Aspects of ColumnTransformer:

#### 1. **Multiple Transformations:**
   - Applies a different preprocessing technique to each column or subset of columns, for example scaling numeric columns while encoding categorical ones.

#### 2. **Integration with Pipelines:**
   - Easily integrates into Scikit-learn pipelines, enabling a streamlined and organized preprocessing workflow.

#### 3. **Named Transformation Steps:**
   - Each entry is a `(name, transformer, columns)` tuple; the name identifies the step and lets you retrieve the fitted transformer later.

#### 4. **Usage with Different Transformers:**
   - Supports integration with various preprocessing transformers like StandardScaler, OneHotEncoder, etc.
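
The aspects listed above can be seen together in one small sketch. The toy data, the column names `height` and `color`, the step names `num`, `cat`, `prep`, and `clf`, and the choice of `LogisticRegression` are all assumptions made for illustration; they are not part of the original example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, invented for this sketch
X = pd.DataFrame({
    'height': [150.0, 160.0, 170.0, 180.0, 165.0, 175.0],
    'color': ['red', 'blue', 'red', 'green', 'blue', 'green'],
})
y = [0, 0, 1, 1, 0, 1]

# Each entry is (name, transformer, columns)
preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['height']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['color']),
])

# The ColumnTransformer is used like any other pipeline step
model = Pipeline(steps=[
    ('prep', preprocessor),
    ('clf', LogisticRegression()),
])
model.fit(X, y)

# Fitted transformers can be looked up by the names given above
fitted_scaler = model.named_steps['prep'].named_transformers_['num']
print(fitted_scaler.mean_)        # mean of 'height' learned during fit
print(model.predict(X.head(2)))
```

Calling `model.fit` fits the preprocessing and the classifier together, so the same column-wise transformations are reapplied automatically at prediction time.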

### Implementation of ColumnTransformer:
The example below scales a numeric column and one-hot encodes a categorical column:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd

# Sample dataset with different types of columns
data = {
    'numeric_col': [10, 20, 30, 40],
    'categorical_col': ['A', 'B', 'A', 'C']
}

# Create DataFrame
df = pd.DataFrame(data)

# Define transformations for different columns
numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder()

# Define ColumnTransformer with specified transformations for each column
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, ['numeric_col']),
        ('cat', categorical_transformer, ['categorical_col'])
    ]
)

# Fit and transform the data using ColumnTransformer
transformed_data = preprocessor.fit_transform(df)

# Display the transformed data
print(transformed_data)
```

### Steps Explained:

1. **Import Necessary Libraries:** Import required modules (`ColumnTransformer`, `StandardScaler`, `OneHotEncoder`, etc.).

2. **Sample Dataset Creation:** Create a sample DataFrame (`df`) with different types of columns (numeric and categorical).

3. **Define Transformers:** Create one transformer per column type (`StandardScaler` for numeric columns and `OneHotEncoder` for categorical columns).

4. **Create ColumnTransformer:** Define a ColumnTransformer with specified transformers for each column or subset of columns.

5. **Fit and Transform Data:** Use the ColumnTransformer to fit and transform the dataset, applying appropriate transformations to specific columns.

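6. **Display Transformed Data:** Print the transformed data. By default the result is a NumPy array (or a SciPy sparse matrix when the output is mostly zeros) rather than a DataFrame, so the original column names are not attached to it.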

By routing each column or subset of columns to its own transformer, the ColumnTransformer keeps the preprocessing of mixed-type datasets in a single object that fits directly into a machine learning pipeline.

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
```

```python
df = pd.read_csv('covid_toy.csv')
```

```python
df.head()
```

|   | age | gender | fever | cough | city | has_covid |
|---|-----|--------|-------|-------|------|-----------|
| 0 | 60 | Male | 103.0 | Mild | Kolkata | No |
| 1 | 27 | Male | 100.0 | Mild | Delhi | Yes |
| 2 | 42 | Male | 101.0 | Mild | Delhi | No |
| 3 | 31 | Female | 98.0 | Mild | Kolkata | No |
| 4 | 65 | Female | 101.0 | Mild | Mumbai | No |


```python
# Count missing values per column
df.isnull().sum()
```

```
age           0
gender        0
fever        10
cough         0
city          0
has_covid     0
dtype: int64
```

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=['has_covid']), df['has_covid'], test_size=0.2
)
```

```python
X_train
```

|    | age | gender | fever | cough | city |
|----|-----|--------|-------|-------|------|
| 78 | 11 | Male | 100.0 | Mild | Bangalore |
| 68 | 54 | Female | 104.0 | Strong | Kolkata |
| 55 | 81 | Female | 101.0 | Mild | Mumbai |
| 15 | 70 | Male | 103.0 | Strong | Kolkata |
| 53 | 83 | Male | 98.0 | Mild | Delhi |
| ... | ... | ... | ... | ... | ... |
| 65 | 69 | Female | 102.0 | Mild | Bangalore |
| 99 | 10 | Female | 98.0 | Strong | Kolkata |
| 33 | 26 | Female | 98.0 | Mild | Kolkata |
| 92 | 82 | Female | 102.0 | Strong | Kolkata |


### 1. The Ordinary Way (Manual Preprocessing)

```python
# Impute missing values in the fever column (SimpleImputer defaults to the mean)
si = SimpleImputer()
X_train_fever = si.fit_transform(X_train[['fever']])
X_test_fever = si.transform(X_test[['fever']])

X_test_fever.shape
```

```
(20, 1)
```
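
The fit-on-train, transform-on-test pattern used for `fever` can be checked on a tiny example. This is a sketch under stated assumptions: the `train` and `test` frames below hold made-up values standing in for the real `fever` column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical values standing in for the 'fever' column
train = pd.DataFrame({'fever': [98.0, 100.0, np.nan, 102.0]})
test = pd.DataFrame({'fever': [np.nan, 101.0]})

si = SimpleImputer()               # default strategy='mean'
train_filled = si.fit_transform(train)
test_filled = si.transform(test)   # reuses the mean learned from train

print(si.statistics_)   # the learned fill value, one per column
print(test_filled)
```

The missing test value is filled with the training mean, not the test mean, which keeps information from the test set out of the preprocessing.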

```python
# Ordinal-encode the cough column with an explicit order: Mild < Strong
oe = OrdinalEncoder(categories=[['Mild', 'Strong']], dtype=np.int8)
X_train_cough = oe.fit_transform(X_train[['cough']])
X_test_cough = oe.transform(X_test[['cough']])

X_train_cough.shape
```

```
(80, 1)
```
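
The effect of the `categories` argument is easier to see on a few rows. The sketch below uses an invented four-row `cough` frame; only the encoder settings are taken from the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows standing in for the 'cough' column
cough = pd.DataFrame({'cough': ['Strong', 'Mild', 'Mild', 'Strong']})

# Explicit order: 'Mild' maps to 0, 'Strong' maps to 1
oe = OrdinalEncoder(categories=[['Mild', 'Strong']], dtype=np.int8)
encoded = oe.fit_transform(cough)

print(encoded.ravel())
print(oe.categories_)
```

When `categories` is left at its default, the encoder sorts the observed values, so an explicit list matters whenever the meaningful order differs from the sorted order.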

```python
# One-hot encode the nominal gender and city columns, dropping the first category of each
ohe = OneHotEncoder(drop='first', sparse_output=False, dtype=np.int8)
X_train_gender_city = ohe.fit_transform(X_train[['gender','city']])
X_test_gender_city = ohe.transform(X_test[['gender','city']])

X_train_gender_city.shape
```

```
(80, 4)
```

```python
# Extract age unchanged, reshaped to a 2-D column so it can be concatenated
X_train_age = X_train['age'].values.reshape(-1,1)
X_test_age = X_test['age'].values.reshape(-1,1)
X_train_age.shape
```

```
(80, 1)
```

```python
# Stack the pieces side by side: age, gender/city, fever, cough
X_train_transformed = np.concatenate((X_train_age, X_train_gender_city, X_train_fever, X_train_cough), axis=1)
X_test_transformed = np.concatenate((X_test_age, X_test_gender_city, X_test_fever, X_test_cough), axis=1)

X_train_transformed.shape
```

```
(80, 7)
```

### 2. The Smart Way (ColumnTransformer)

```python
from sklearn.compose import ColumnTransformer

# One (name, transformer, columns) entry per group of columns;
# columns not listed (age) are passed through unchanged
transformer = ColumnTransformer(transformers=[
    ('tnf1', SimpleImputer(), ['fever']),
    ('tnf2', OrdinalEncoder(categories=[['Mild','Strong']]), ['cough']),
    ('tnf3', OneHotEncoder(sparse_output=False, drop='first', dtype=np.int8), ['gender','city'])
], remainder='passthrough')
```

```python
transformer.fit_transform(X_train).shape
```

```
(80, 7)
```

```python
transformer.transform(X_test).shape
```

```
(20, 7)
```