Dataset transformations

scikit-learn provides a library of transformers, which may clean (see preprocessing), reduce (see data_reduction), expand (see kernel_approximation) or generate (see feature_extraction) feature representations.

Like other estimators, these are represented by classes with a fit method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a transform method, which applies this transformation model to unseen data. fit_transform may be more convenient and efficient for fitting the model and transforming the training data in a single step.
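The fit / transform / fit_transform pattern described above can be sketched as follows. This is a minimal illustration only: StandardScaler is picked as one example transformer, and the small arrays are made-up data, not taken from this guide.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative toy data (an assumption for this sketch).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

# fit learns the per-feature mean and standard deviation from the training set.
scaler = StandardScaler()
scaler.fit(X_train)

# transform applies the learned parameters to data not seen during fit.
X_new_scaled = scaler.transform(X_new)

# fit_transform learns the parameters and transforms the training data in one call.
X_train_scaled = StandardScaler().fit_transform(X_train)
```

Note that X_new is scaled with the statistics learned from X_train, not with its own statistics.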

Combining such transformers, either in parallel or in series, is covered in combining_estimators. metrics covers transforming feature spaces into affinity matrices, while preprocessing_targets considers transformations of the target space (e.g. categorical labels) for use in scikit-learn.
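As a rough sketch of the two ways of combining transformers mentioned above, the example below chains transformers in series with Pipeline and runs them in parallel with FeatureUnion. The particular transformers (StandardScaler, PCA, PolynomialFeatures) and the random input are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative random data: 20 samples, 3 features (an assumption for this sketch).
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))

# Series: the output of each step is the input of the next one.
series = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
X_series = series.fit_transform(X)

# Parallel: each transformer receives the same input, and the
# outputs are concatenated column-wise.
parallel = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
])
X_parallel = parallel.fit_transform(X)

print(X_series.shape, X_parallel.shape)
```

Both combined objects expose the same fit and transform methods as a single transformer, so they can be used wherever a transformer is expected.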

modules/compose
modules/feature_extraction
modules/preprocessing
modules/impute
modules/unsupervised_reduction
modules/random_projection
modules/kernel_approximation
modules/metrics
modules/preprocessing_targets
