# ColumnTransformer 



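scikit-learn's `ColumnTransformer` applies a different preprocessing step to each group of columns and concatenates the outputs into a single feature matrix. This suits mixed data types, for example scaling numeric columns while one-hot encoding categorical ones. All of that preprocessing lives in one object, which is fitted on the training data and then reused on new data.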


# 1. Import Libraries

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
```

# 2. Data Loading

```python
df = pd.read_csv("covid_toy.csv")
df
```

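|     | age | gender | fever | cough  | city      | has_covid |
|-----|-----|--------|-------|--------|-----------|-----------|
| 0   | 60  | Male   | 103.0 | Mild   | Kolkata   | No        |
| 1   | 27  | Male   | 100.0 | Mild   | Delhi     | Yes       |
| 2   | 42  | Male   | 101.0 | Mild   | Delhi     | No        |
| 3   | 31  | Female | 98.0  | Mild   | Kolkata   | No        |
| 4   | 65  | Female | 101.0 | Mild   | Mumbai    | No        |
| ... | ... | ...    | ...   | ...    | ...       | ...       |
| 95  | 12  | Female | 104.0 | Mild   | Bangalore | No        |
| 96  | 51  | Female | 101.0 | Strong | Kolkata   | Yes       |
| 97  | 20  | Female | 101.0 | Mild   | Bangalore | No        |
| 98  | 5   | Female | 98.0  | Strong | Mumbai    | No        |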
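Before assigning columns to transformers, it helps to check dtypes, missing values, and class balance. This sketch reads the first five rows shown above through `io.StringIO`, so it runs without `covid_toy.csv`. On the full dataset, the same three calls apply to `df`.

```python
import io
import pandas as pd

# The first five rows from the output above, standing in for covid_toy.csv
csv_text = """age,gender,fever,cough,city,has_covid
60,Male,103.0,Mild,Kolkata,No
27,Male,100.0,Mild,Delhi,Yes
42,Male,101.0,Mild,Delhi,No
31,Female,98.0,Mild,Kolkata,No
65,Female,101.0,Mild,Mumbai,No
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)                       # numeric vs. string columns
print(df.isna().sum())                 # missing values per column
print(df["has_covid"].value_counts())  # class balance of the target
```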


# 3. Preprocess Data 

- Numerical: `age`, `fever`
- Categorical: `gender`, `cough`, `city`
- Target: `has_covid` → convert to 0 (No), 1 (Yes)
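The two column lists can also be derived from dtypes instead of being typed by hand. This sketch applies `make_column_selector` to two hypothetical rows, with the target already dropped. The resulting selector objects could be passed to `ColumnTransformer` in place of the explicit column lists.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical feature rows (target already removed)
X = pd.DataFrame({
    "age": [60, 27],
    "gender": ["Male", "Female"],
    "fever": [103.0, 100.0],
    "cough": ["Mild", "Strong"],
    "city": ["Kolkata", "Delhi"],
})

select_num = make_column_selector(dtype_include="number")  # int and float columns
select_cat = make_column_selector(dtype_exclude="number")  # everything else

print(select_num(X))  # ['age', 'fever']
print(select_cat(X))  # ['gender', 'cough', 'city']
```

The categorical selector uses `dtype_exclude="number"`. That way string columns are picked up whether pandas stores them as `object` or as a dedicated string dtype.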

```python
df['has_covid'] = df['has_covid'].map({'Yes':1, 'No':0})
```
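`Series.map` with a dictionary returns NaN for any value that is not a key. An unexpected spelling therefore silently becomes a missing target. Here is a quick check on hypothetical labels:

```python
import pandas as pd

labels = pd.Series(["Yes", "No", "yes"])  # hypothetical labels; note the lowercase "yes"
mapped = labels.map({"Yes": 1, "No": 0})

print(mapped.tolist())  # [1.0, 0.0, nan]

# Report any labels the dictionary did not cover
if mapped.isna().any():
    bad = labels[mapped.isna()].unique().tolist()
    print("Unmapped labels:", bad)  # Unmapped labels: ['yes']
```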

# 4. Train-Test Split

```python
X = df.drop('has_covid', axis=1)
y = df['has_covid']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
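The classification report in section 7 shows 13 class-0 rows and 7 class-1 rows in the test set, so the classes are not balanced. Passing `stratify=y` to `train_test_split` keeps the class proportions equal across both splits. This sketch uses a hypothetical label vector of 65 negatives and 35 positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 65 negatives, 35 positives
y = np.array([0] * 65 + [1] * 35)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_train), np.bincount(y_test))  # [52 28] [13  7]
```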

# 5. Build ColumnTransformer

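```python
# Define columns
num_features = ['age', 'fever']
cat_features = ['gender', 'cough', 'city']

# ColumnTransformer: scale numeric columns, one-hot encode categorical ones.
# Columns not listed here are dropped (remainder='drop' is the default).
ct = ColumnTransformer(transformers=[
    ('num', StandardScaler(), num_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_features)
])

# Fit on the training set only, then apply the same fitted transform to the test set
# (fitting on the test set would leak its statistics into preprocessing)
X_train_transformed = ct.fit_transform(X_train)
X_test_transformed = ct.transform(X_test)
```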

# 6. Train Model

```python
model = RandomForestClassifier(random_state=42)
model.fit(X_train_transformed, y_train)
```
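Wrapping the column transformer and the model in a `Pipeline` turns them into a single estimator. Fitting takes one call, and cross-validation refits the preprocessing inside every fold. With only 20 test rows, a 5-fold cross-validated score is also less noisy than a single split. This sketch runs on hypothetical synthetic data. As an assumption, it adds a `SimpleImputer` to the numeric branch in case a column such as `fever` has missing values; drop that step if your data has none. The labels are random, so the scores only demonstrate the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data with the notebook's columns
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "age": rng.integers(5, 80, n),
    "gender": rng.choice(["Male", "Female"], n),
    "fever": rng.choice([98.0, 100.0, 101.0, 103.0, np.nan], n),  # NaN included on purpose
    "cough": rng.choice(["Mild", "Strong"], n),
    "city": rng.choice(["Kolkata", "Delhi", "Mumbai", "Bangalore"], n),
})
y = rng.integers(0, 2, n)  # random labels: scores are not meaningful

# Numeric branch: fill missing values first, then scale
num_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(transformers=[
    ("num", num_pipe, ["age", "fever"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "cough", "city"]),
])
clf = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=42)),
])

# The preprocessing is refit on each training fold, so no fold sees its own test statistics
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())
```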

# 7. Predict and Evaluate

```python
y_pred = model.predict(X_test_transformed)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```text
Accuracy: 0.4
              precision    recall  f1-score   support

           0       0.57      0.31      0.40        13
           1       0.31      0.57      0.40         7

    accuracy                           0.40        20
   macro avg       0.44      0.44      0.40        20
weighted avg       0.48      0.40      0.40        20
```
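An accuracy of 0.40 is worth comparing with a trivial baseline. The support column shows that 13 of the 20 test rows are class 0, so always predicting 0 would score 13/20 = 0.65. `DummyClassifier` makes that comparison explicit. In this sketch the labels are hypothetical arrays whose test split has the same class counts as the report above, and class 0 is assumed to be the training majority.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels; the test split matches the report's support column (13 vs 7)
y_train = np.array([0] * 52 + [1] * 28)
y_test = np.array([0] * 13 + [1] * 7)

# DummyClassifier ignores the features, so placeholder zeros are enough here
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print("Baseline accuracy:", baseline_acc)  # Baseline accuracy: 0.65
```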



# 8. Prediction System

```python
def predict_covid(model, ct, new_data):

    # Convert input dict to DataFrame
    input_df = pd.DataFrame([new_data])

    # Transform using ColumnTransformer
    X_new_transformed = ct.transform(input_df)

    # Predict
    prediction = model.predict(X_new_transformed)[0]
    return "Has COVID" if prediction == 1 else "No COVID"
```

```python
# Input from a new user
new_input = {
    "age": 40,
    "gender": "Female",
    "fever": 102.0,
    "cough": "Mild",
    "city": "Mumbai"
}

# Predict result
result = predict_covid(model, ct, new_input)
print("Prediction:", result)
```


```text
Prediction: No COVID
```
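Because `predict_covid` takes `model` and `ct` separately, it is possible to pair a model with a transformer fitted on different data. The sketch below shows a hypothetical variant, `predict_covid_pipe`, built on one fitted `Pipeline` and trained on synthetic data. It also returns the predicted probability of class 1.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic training data with the notebook's columns
rng = np.random.default_rng(1)
n = 50
X = pd.DataFrame({
    "age": rng.integers(5, 80, n),
    "gender": rng.choice(["Male", "Female"], n),
    "fever": rng.choice([98.0, 100.0, 101.0, 103.0], n),
    "cough": rng.choice(["Mild", "Strong"], n),
    "city": rng.choice(["Kolkata", "Delhi", "Mumbai", "Bangalore"], n),
})
y = rng.integers(0, 2, n)

# One object holds both the fitted preprocessing and the fitted model
pipe = Pipeline([
    ("ct", ColumnTransformer(transformers=[
        ("num", StandardScaler(), ["age", "fever"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "cough", "city"]),
    ])),
    ("model", RandomForestClassifier(random_state=42)),
]).fit(X, y)

def predict_covid_pipe(pipe, new_data):
    """Hypothetical variant of predict_covid that takes one fitted Pipeline."""
    input_df = pd.DataFrame([new_data])
    proba_yes = pipe.predict_proba(input_df)[0, 1]  # column 1 = class 1 ("Has COVID")
    label = "Has COVID" if proba_yes > 0.5 else "No COVID"
    return label, proba_yes

label, p = predict_covid_pipe(pipe, {
    "age": 40, "gender": "Female", "fever": 102.0, "cough": "Mild", "city": "Mumbai",
})
print(label, round(p, 2))
```

The `> 0.5` threshold mirrors what `predict` does for two classes, where ties go to class 0. A different threshold can be chosen if missed positives are costlier than false alarms.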
