
[MRG+1] Feature: Implement PowerTransformer #10210

Merged
merged 56 commits into scikit-learn:master on Dec 5, 2017

Conversation

@chang (Contributor) commented Nov 27, 2017

Reference Issues/PRs

Fixes #6675
Fixes #6781

What does this implement/fix? Explain your changes.

This PR implements sklearn.preprocessing.PowerTransformer. Power transforms are a family of monotonic, parametric transformations used to map skewed distributions to as close to Gaussian as possible. This can be useful for models that assume homoscedasticity, or in any other situation where normality is desirable.

At the moment, only the Box-Cox transform is supported, which requires strictly positive data. The optimal parameters for stabilizing variance and minimizing skewness are determined using maximum likelihood, and the transformation is applied to the dataset feature-wise.
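For context, a minimal usage sketch of the transformer as described above; the data here is illustrative, not taken from the PR:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 2))  # strictly positive, right-skewed features

# Box-Cox is the only method in this PR; lambda is fitted per feature by
# maximum likelihood.
pt = PowerTransformer(method='box-cox')
X_gaussian = pt.fit_transform(X)

print(pt.lambdas_)  # the fitted lambda for each feature
```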

Any other comments?

We will consider implementing the Yeo-Johnson transform - a power transformation that can be applied to negative data - in a future PR.

Thanks to @maniteja123 for kicking it off!

maniteja123 and others added 29 commits September 29, 2016 09:35
@chang changed the title from [MRG] Implement BoxCoxTransformer to [WIP] Implement BoxCoxTransformer on Nov 27, 2017
@chang (Contributor, Author) commented Dec 5, 2017

Thanks for the doc fix @glemaitre. Good suggestion on normalizing the distributions, Joel - I used minmax_scale(X, feature_range=(1e-10, 10)).
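For readers following along, this is the kind of rescaling being referred to; the placeholder data stands in for the example's datasets:

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

X = np.random.lognormal(size=(1000, 2))  # placeholder for the example's data

# Rescale into a common, strictly positive range so the Box-Cox example
# plots are comparable (Box-Cox requires strictly positive input).
X_scaled = minmax_scale(X, feature_range=(1e-10, 10))
```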

@jnothman (Member) commented Dec 5, 2017 via email

@chang (Contributor, Author) commented Dec 5, 2017

Fixed the issues - thanks!

@glemaitre (Member) left a comment

My last nitpicks. @jnothman I am fine to merge.

power_transform : Equivalent function without the estimator API.

QuantileTransformer : Maps data to a standard normal distribution with
the parameter output_distribution='normal'.
Member:

output_distribution='normal'

Member:

Are you suggesting backticks? That's not obvious from the rendering ;)

Member:

Oops ... thanks for pointing this out :)

API (as part of a preprocessing :class:`sklearn.pipeline.Pipeline`).

quantile_transform : Maps data to a standard normal distribution with
the parameter output_distribution='normal'.
Member:

output_distribution='normal'

@amueller (Member) commented Dec 5, 2017

Great work @ericchang00!

'font.size': 6,
'hist.bins': 150
}
matplotlib.rcParams.update(params)
Member:

Is that a good idea? Depending on how careful sphinx-gallery is with global state, I feel this could go wrong. And what about people copy-pasting the example?

Contributor Author:

Removed the global parameter setting.
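A sketch of the non-global alternative; rc_context is one option, and the exact fix used in the example may differ:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.lognormal(size=1000)  # placeholder data for the sketch

# Scope the settings to a single figure instead of mutating global rcParams,
# so copy-pasted example code doesn't alter the caller's defaults.
with plt.rc_context({'font.size': 6}):
    fig, ax = plt.subplots()
    ax.hist(data, bins=150)  # pass bins explicitly rather than via 'hist.bins'
plt.show()
```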


params = {
'font.size': 6,
'hist.bins': 150
Member:

Maybe slightly fewer bins would make it clearer?

In many modeling scenarios, normality of the features in a dataset is desirable.
Power transforms are a family of parametric, monotonic transformations that aim
to map data from any distribution to as close to a Gaussian distribution as
possible in order to minimize skewness.
Member:

Do all power transformations aim to minimize skewness? (I actually don't know)

Contributor Author:

Good point - it might be clearer as 'minimize skewness and stabilize variance'.

that are applied to make data more Gaussian-like. This is useful for
modeling issues related to heteroscedasticity (non-constant variance),
or other situations where normality is desired. Note that power
transforms do not result in standard normal distributions.
Member:

(i.e. mean might be far from zero and standard deviation not one?)

@chang (Contributor, Author), Dec 5, 2017:

Exactly! Added.

Member:

I meant maybe say that explicitly ;)
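A quick way to see the point being added, using scipy.stats.boxcox on illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.lognormal(size=1000)

x_bc, lmbda = stats.boxcox(x)  # lambda fitted by maximum likelihood

# More Gaussian-like, but not standard normal: the mean can be far from
# zero and the standard deviation far from one.
print(x_bc.mean(), x_bc.std())
```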

@amueller (Member) commented Dec 5, 2017

I'm still confused as to how maximum likelihood relates to skewness. The Wikipedia article on Box-Cox doesn't mention skew... Is it just that empirically it decreases skew, or is there some more formal statement?

@chang (Contributor, Author) commented Dec 5, 2017

Added the final tweaks.

@amueller, I think the 'skewness' vocabulary came from an earlier review. It's more of an empirical observation - the main purpose of Box-Cox is to make data normal and stabilize variance. Skewness does not necessarily imply higher variance, but it does imply non-normality, so the description still makes sense, IMO.

edit: fixed flake8 error
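For reference, the connection is through normality rather than skewness directly: Box-Cox picks lambda by maximizing the profile log-likelihood under the assumption that the transformed data is Gaussian. This is the standard textbook form, not something quoted from the PR:

```latex
x_i^{(\lambda)} =
\begin{cases}
  \dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
  \ln x_i, & \lambda = 0,
\end{cases}
\qquad
\ell(\lambda) = -\frac{n}{2} \ln \hat{\sigma}^2(\lambda)
              + (\lambda - 1) \sum_{i=1}^{n} \ln x_i
```

where \hat{\sigma}^2(\lambda) is the variance of the transformed values. Since a Gaussian has zero skew, the fitted lambda tends to reduce skewness as a by-product rather than by construction.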

@amueller (Member) commented Dec 5, 2017

Maybe not in this PR, but a direct comparison against quantile transformer would be nice, right?
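A sketch of what such a comparison might look like (deferred to a future PR in this thread; the names used are from the existing API):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1))

# Parametric (Box-Cox) vs. non-parametric (rank-based) routes to normality.
X_pt = PowerTransformer(method='box-cox').fit_transform(X)
X_qt = QuantileTransformer(output_distribution='normal').fit_transform(X)
```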

@amueller (Member) left a comment

LGTM. Green button on green CI?

@chang (Contributor, Author) commented Dec 5, 2017

Agreed - comparison with quantile transformer + a linear model example for a future PR. Looks like we're good to go :)

@jnothman jnothman merged commit 62e9bb8 into scikit-learn:master Dec 5, 2017
@jnothman (Member) commented Dec 5, 2017

Congrats @ericchang00 and @maniteja123

@amueller (Member) commented Dec 5, 2017

Sweeeet!

@chang (Contributor, Author) commented Dec 5, 2017

Awesome, thanks so much guys! This is very exciting :)

@jnothman (Member) commented Dec 5, 2017

Btw, @amueller, what's your opinion on having a standardize parameter to centre and scale the output of PowerTransformer? After all, #6675 does describe Box-Cox as reshaping the data into a standard normal.

@amueller (Member) commented Dec 8, 2017

I think it would be good, and we might have it on by default. I don't think it'll surprise anyone, and it'll make things easier. I can't really imagine a case where it would be a bad idea.
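A sketch of the behaviour being proposed (hypothetical at this point in the thread): standardize=True would amount to chaining a StandardScaler after the power transform:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# What a PowerTransformer(standardize=True) is proposed to be equivalent to:
box_cox_standardized = make_pipeline(
    PowerTransformer(method='box-cox'),
    StandardScaler(),  # zero mean, unit variance on the transformed output
)
```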

@jnothman (Member) commented Dec 9, 2017 via email
