Sklearn’s Pipelines, combined with ColumnTransformer, are an easy way to apply transformation rules in a standard manner, resulting in cleaner and better-organized code.
A ColumnTransformer lets you apply a Sklearn transformer to only a chosen group of columns.
The ColumnTransformer object receives a list of tuples, each composed of the transformer name (this is your choice), the transformer itself, and the columns to which the transformation applies. The remainder argument specifies what to do with all other columns: by default they are dropped ("drop"), but they can also be kept unchanged ("passthrough") or handed to another transformer.
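As a minimal sketch of the tuple format described above, the snippet below builds a ColumnTransformer on a small made-up DataFrame. The column names ("price", "color", "id"), the step names, and the choice of StandardScaler and OneHotEncoder are assumptions for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are made up for this example.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "color": ["red", "blue", "red", "green"],
    "id": [1, 2, 3, 4],
})

preprocessor = ColumnTransformer(
    transformers=[
        # (name of your choice, transformer, columns it applies to)
        ("scale_price", StandardScaler(), ["price"]),
        ("encode_color", OneHotEncoder(), ["color"]),
    ],
    # Columns not listed above ("id") are kept unchanged and appended last.
    remainder="passthrough",
    # Always return a dense NumPy array, even though OneHotEncoder is sparse.
    sparse_threshold=0,
)

transformed = preprocessor.fit_transform(df)
print(transformed.shape)  # 1 scaled column + 3 one-hot columns + 1 passthrough
```

With remainder="drop" (the default), the "id" column would be left out of the result instead.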
The ColumnTransformer is quite helpful, but it is not enough on its own. In many cases, a column needs to be processed in multiple steps.
For example, the numerical feature “price” may require an operation to replace the NULL values with the mean of the data, a log transformation to make the distribution more symmetric, and standardization to rescale its values to zero mean and unit variance, so that most of them fall in a small range around 0.
With pipelines, we can chain multiple transformers to create a complex process. Because a pipeline object behaves like a single transformer (i.e., it exposes the same .fit() and .transform() methods), it can be inserted into the ColumnTransformer object.
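The "price" example above could be sketched roughly as follows. The data, the step names, and the use of SimpleImputer, FunctionTransformer(np.log), and StandardScaler are illustrative assumptions; np.log also assumes that every price is strictly positive.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data with one missing price.
df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0, 1000.0],
    "quantity": [1, 2, 3, 4],
})

# Three chained steps for a single column, applied in this order.
price_pipeline = Pipeline(steps=[
    ("impute_mean", SimpleImputer(strategy="mean")),  # fill NULLs with the mean
    ("log", FunctionTransformer(np.log)),             # assumes positive prices
    ("standardize", StandardScaler()),                # zero mean, unit variance
])

# The whole pipeline takes the place of a single transformer in the tuple.
preprocessor = ColumnTransformer(
    transformers=[("price", price_pipeline, ["price"])],
    remainder="drop",
)

price_features = preprocessor.fit_transform(df)
print(price_features.shape)
```

Only the "price" column survives here because remainder="drop" discards "quantity".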
You can also put a ColumnTransformer inside a Pipeline, because it is itself a transformer object, and this nesting can go as deep as you need.
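To illustrate the nesting, here is a small sketch with a Pipeline inside a ColumnTransformer, which in turn sits inside an outer Pipeline. The data, the names, and the final StandardScaler step are assumptions chosen for the example, not a recommended recipe.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical toy data.
df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0, 1000.0],
    "color": ["red", "blue", "red", "green"],
})

# Innermost level: a pipeline dedicated to the "price" column.
price_pipeline = Pipeline(steps=[
    ("impute_mean", SimpleImputer(strategy="mean")),
    ("log", FunctionTransformer(np.log)),
])

# Middle level: route each group of columns to its own transformer.
columns = ColumnTransformer(
    transformers=[
        ("price", price_pipeline, ["price"]),
        ("color", OneHotEncoder(), ["color"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Outer level: the ColumnTransformer is just another step in a Pipeline.
full_pipeline = Pipeline(steps=[
    ("columns", columns),
    ("standardize", StandardScaler()),  # applied to every resulting column
])

features = full_pipeline.fit_transform(df)
print(features.shape)  # 1 price column + 3 one-hot columns
```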
The pipeline object has quite an intuitive interface. It accepts a list of tuples, each representing a transformer, with a name of your choice and the transformer object itself. It applies the transformations in the specified order.
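A short sketch of that interface, using made-up step names and data: the steps run in list order, and each fitted step can be looked up afterwards by the name you gave it.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical single-column data whose logs are exactly 0, 1 and 2.
X = np.array([[1.0], [np.e], [np.e ** 2]])

pipe = Pipeline(steps=[
    ("log", FunctionTransformer(np.log)),  # runs first
    ("scale", StandardScaler()),           # runs second, on the logged values
])

X_out = pipe.fit_transform(X)

# Steps keep the order in which they were declared.
step_names = [name for name, _ in pipe.steps]

# The scaler was fitted on the output of "log", so its learned mean is the
# mean of 0, 1 and 2 (about 1.0), not the mean of the raw values.
learned_mean = pipe.named_steps["scale"].mean_[0]
print(step_names, learned_mean)
```

Swapping the two tuples would standardize first and then take the log, which would fail on the non-positive standardized values, so the declared order matters.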