## Column Transformer

**This example uses `ColumnTransformer` from `sklearn.compose` to preprocess a dataset that contains a free-text column and a categorical column. The code and its output come first, followed by a step-by-step explanation.**

In [32]:
import pandas as pd

In [33]:
df = pd.read_csv(r"../data/sentimentdataset.csv")

In [34]:
df.head()

Unnamed: 0,Review,Platform,Sentiment
0,Enjoying a beautiful day at the park! ...,Twitter,1
1,Traffic was terrible this morning. ...,Twitter,0
2,Just finished an amazing workout! 💪 ...,Instagram,1
3,Excited about the upcoming weekend getaway! ...,Facebook,1
4,Trying out a new recipe for dinner tonight. ...,Instagram,1


In [35]:
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder


In [36]:
# Define the column transformer
column_transformer = ColumnTransformer(
    transformers=[
        ('platform_ohe', OneHotEncoder(), ['Platform']),
        ('review_bow', CountVectorizer(), 'Review'),
    ],
    remainder='drop'  # Drop other columns not specified in transformers
)

# ColumnTransformer takes a list of tuples: [(...), (...), ...].
# Each tuple has three parts:
#   1. a name for the transformation,
#   2. the transformer object to apply, e.g. OneHotEncoder() for one-hot encoding,
#   3. the column(s) the transformation is applied to.
# The first tuple above applies one-hot encoding to the social media platform
# column, and the step is named 'platform_ohe'.
# All remaining columns are dropped, so the target column (Sentiment) is not in the output.
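
The third element of each tuple matters more than it looks. As a minimal sketch on hypothetical stand-in rows (the frame `toy` is not taken from `sentimentdataset.csv`), plain pandas indexing can mimic what `ColumnTransformer` hands to each transformer: a list selector such as `['Platform']` gives a 2-D frame, while a bare string such as `'Review'` gives a 1-D series, which is the shape `CountVectorizer` expects.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in rows; not taken from sentimentdataset.csv.
toy = pd.DataFrame({
    "Review": ["great day at the park", "terrible traffic today", "great workout"],
    "Platform": ["Twitter", "Twitter", "Instagram"],
})

# A list selector yields a 2-D DataFrame, which OneHotEncoder requires.
platform_2d = toy[["Platform"]]
# A plain string selector yields a 1-D Series of documents for CountVectorizer.
review_1d = toy["Review"]

ohe_shape = OneHotEncoder().fit_transform(platform_2d).shape
bow_shape = CountVectorizer().fit_transform(review_1d).shape

# Iterating over a DataFrame yields its column labels, so a 2-D selection makes
# CountVectorizer see one "document" (the string "Review") instead of three reviews.
wrong_shape = CountVectorizer().fit_transform(toy[["Review"]]).shape

print(ohe_shape, bow_shape, wrong_shape)
```

On these toy rows the script prints `(3, 2) (3, 9) (1, 1)`: three rows in the first two cases, but a single row when the text column is selected with a list.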

In [37]:
column_transformer

In [38]:
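# Fit the transformers, convert the sparse result to a dense array, and label the columns
transformed = column_transformer.fit_transform(df).toarray()
pd.DataFrame(transformed, columns=column_transformer.get_feature_names_out()).head()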

Unnamed: 0,platform_ohe__Platform_ Facebook,platform_ohe__Platform_ Instagram,platform_ohe__Platform_ Twitter,review_bow__about,review_bow__accomplished,review_bow__adopted,review_bow__after,review_bow__ahead,review_bow__ai,review_bow__amazing,...,review_bow__weather,review_bow__week,review_bow__weekend,review_bow__winter,review_bow__with,review_bow__work,review_bow__workout,review_bow__workshop,review_bow__world,review_bow__year
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- **Creating the Column Transformer:**
  - `ColumnTransformer` is used to apply different preprocessing steps to different columns of the DataFrame.

- **Transformers List:**
  - **`('platform_ohe', OneHotEncoder(), ['Platform'])`:**
    - **Name:** `platform_ohe`
    - **Transformation:** `OneHotEncoder()`
    - **Columns:** `['Platform']`
    - **Purpose:** Converts the categorical `Platform` column into one-hot encoded vectors.
    
  - **`('review_bow', CountVectorizer(), 'Review')`:**
    - **Name:** `review_bow`
    - **Transformation:** `CountVectorizer()`
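    - **Column:** `'Review'` (a plain string rather than a list, so `CountVectorizer` receives a 1-D sequence of documents instead of a 2-D frame)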
    - **Purpose:** Transforms the `Review` column into a bag-of-words (BoW) matrix.

- **`remainder='drop'`:** Drops every column not listed in the transformers (here, `Sentiment`). This is also the default behavior.

- **Transforming the Data:**
  - `fit_transform(df)`: Fits each transformer on its column(s), learning the `Platform` categories and the `Review` vocabulary, then applies the transformations to the DataFrame.
  - `toarray()`: Converts the sparse matrix that `fit_transform` returns here into a dense array.
  - `pd.DataFrame(...)`: Creates a new DataFrame with transformed data and appropriate column names.
  - `get_feature_names_out()`: Retrieves the feature names generated by the transformers for the new DataFrame columns.

- **Output:**
  - The transformed DataFrame has the following features:

    - **One-Hot Encoded Columns (`platform_ohe`):**
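      - `platform_ohe__Platform_ Facebook`
      - `platform_ohe__Platform_ Instagram`
      - `platform_ohe__Platform_ Twitter`
      - These names appear in the output with a space after `Platform_`, which suggests the raw `Platform` values carry leading whitespace.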
    
    - **Bag-of-Words Columns (`review_bow`):**
      - Features corresponding to individual words present in the `Review` column.

  - The transformed DataFrame contains 190 columns, combining the one-hot encoded features for the `Platform` column with the bag-of-words features for the `Review` column. The table above shows its first five rows.
