srsapireddy/Data-Preprocessing-with-ColumnTransformer-and-Pipelines

Data-Preprocessing-with-ColumnTransformer-and-Pipelines

Combining scikit-learn’s Pipeline with ColumnTransformer is an easy way to apply transformation rules consistently, which keeps preprocessing code clean and well organized.

ColumnTransformer

A ColumnTransformer lets you apply a scikit-learn transformer to a selected group of columns only.


The ColumnTransformer object receives a list of tuples, each composed of a transformer name (your choice), the transformer itself, and the columns to which the transformation applies. The remainder argument specifies what to do with all other columns: "drop" (the default) discards them, "passthrough" keeps them unchanged, and an estimator transforms them.
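As an illustration of the tuple format described above, here is a minimal sketch. The toy DataFrame, its column names, and the step names are assumptions made up for this example; they do not come from the original images.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "color": ["red", "blue", "red", "green"],
    "id": [1, 2, 3, 4],
})

column_transformer = ColumnTransformer(
    transformers=[
        # (name of your choice, transformer, columns to apply it to)
        ("scale_price", StandardScaler(), ["price"]),
        ("encode_color", OneHotEncoder(), ["color"]),
    ],
    remainder="passthrough",  # keep "id" unchanged instead of dropping it
    sparse_threshold=0,       # always return a dense NumPy array
)

transformed = column_transformer.fit_transform(df)
print(transformed.shape)
```

The output columns follow the order of the transformers list, with the passthrough columns appended at the end.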


ColumnTransformers with Pipelines

The ColumnTransformer is quite helpful, but on its own it is often not enough: in many cases, a column needs to be processed in multiple steps.

For example, the numerical feature “price” may require an operation to replace missing values with the column mean, a log transformation to make the distribution more symmetric, and standardization to rescale it to zero mean and unit variance, so that most values fall close to the interval [-1, 1].

With pipelines, we can chain multiple transformers into a single multi-step process. Because a Pipeline made of transformers behaves like a single transformer (i.e., it exposes the same .fit() and .transform() methods), it can be passed to a ColumnTransformer wherever a transformer is expected.
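The “price” example could be sketched as follows. The data, the step names, and the choice of np.log1p for the log step (it tolerates zeros) are assumptions for illustration, not something the original specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "price": [10.0, np.nan, 1000.0, 100.0],
    "quantity": [1, 2, 3, 4],
})

# Three steps applied in order to the same column.
price_pipeline = Pipeline(steps=[
    ("impute_mean", SimpleImputer(strategy="mean")),  # fill missing values with the mean
    ("log", FunctionTransformer(np.log1p)),           # assumed log variant: log(1 + x)
    ("standardize", StandardScaler()),                # zero mean, unit variance
])

# The pipeline takes the place of a single transformer in the tuple.
preprocessor = ColumnTransformer(
    transformers=[("price", price_pipeline, ["price"])],
    remainder="passthrough",
)

result = preprocessor.fit_transform(df)
print(result)
```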

You can also put a ColumnTransformer inside a Pipeline, because it is itself a transformer object, and this nesting can go as deep as you need.
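A minimal sketch of the reverse nesting, with a ColumnTransformer as the first step of a Pipeline. Ending the pipeline with a LogisticRegression classifier is an assumption added for illustration; the data and step names are likewise hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical features and binary target.
X = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 95.0, 100.0, 90.0],
    "color": ["red", "blue", "red", "green", "green", "blue"],
})
y = [0, 0, 0, 1, 1, 1]

preprocessor = ColumnTransformer(transformers=[
    ("scale_price", StandardScaler(), ["price"]),
    ("encode_color", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# The ColumnTransformer is just another step in the outer pipeline.
model = Pipeline(steps=[
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression()),
])

model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Calling fit on the outer pipeline fits the preprocessing and the final step together, so the same transformations are reapplied automatically at prediction time.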


The Pipeline object has an intuitive interface. It accepts a list of tuples, each pairing a name of your choice with a transformer object, and applies the transformations in the order listed.
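The interface can be sketched with a small two-step pipeline. The input array and step names below are hypothetical; the names are only used to look up a fitted step afterwards.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical single-column input with one missing value.
X = np.array([[1.0], [np.nan], [3.0]])

pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),  # runs first
    ("scale", StandardScaler()),                 # runs on the imputed output
])
X_out = pipeline.fit_transform(X)

# Steps keep their order and can be retrieved by name after fitting.
step_names = [name for name, _ in pipeline.steps]
imputer_mean = pipeline.named_steps["impute"].statistics_[0]
print(step_names, imputer_mean)
```

Here the imputer fills the gap with the mean of 1.0 and 3.0 before the scaler sees the data, which is why the order of the tuples matters.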
