
[MRG+1] Feature: Implement PowerTransformer #10210

Merged
merged 56 commits into scikit-learn:master on Dec 5, 2017

Conversation

@chang (Contributor) commented Nov 27, 2017

Reference Issues/PRs

Fixes #6675
Fixes #6781

What does this implement/fix? Explain your changes.

This PR implements sklearn.preprocessing.PowerTransformer. Power transforms are a family of monotonic, parametric transformations used to map skewed distributions to as close to Gaussian as possible. This can be useful for models that assume homoscedasticity, or in any other situation where normality is desirable.

At the moment, only the Box-Cox transform is supported, which requires strictly positive data. The optimal parameters for stabilizing variance and minimizing skewness are determined using maximum likelihood, and the transformation is applied to the dataset feature-wise.
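For context, a minimal usage sketch of the transformer as described above; the data here is illustrative, not taken from the PR:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 2))  # strictly positive, right-skewed features

# Box-Cox is the only method in this PR; lambda is fitted per feature by
# maximum likelihood.
pt = PowerTransformer(method='box-cox')
X_gaussian = pt.fit_transform(X)

print(pt.lambdas_)  # the fitted lambda for each feature
```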

Any other comments?

We will consider implementing the Yeo-Johnson transform - a power transformation that can be applied to negative data - in a future PR.

Thanks to @maniteja123 for kicking it off!

maniteja123 and others added 29 commits September 29, 2016 09:35
@chang changed the title from [MRG] Implement BoxCoxTransformer to [WIP] Implement BoxCoxTransformer on Nov 27, 2017
@chang (Contributor, Author) commented Dec 5, 2017

Thanks for the doc fix @glemaitre. Good suggestion on normalizing the distributions, Joel - I used minmax_scale(X, feature_range=(1e-10, 10)).
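For readers following along, this is the kind of rescaling being referred to; the placeholder data stands in for the example's datasets:

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

X = np.random.lognormal(size=(1000, 2))  # placeholder for the example's data

# Rescale into a common, strictly positive range so the Box-Cox example
# plots are comparable (Box-Cox requires strictly positive input).
X_scaled = minmax_scale(X, feature_range=(1e-10, 10))
```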

@jnothman (Member) commented Dec 5, 2017 via email

@chang (Contributor, Author) commented Dec 5, 2017

Fixed the issues - thanks!

@glemaitre (Member) left a comment

My last nitpicks. @jnothman I am fine to merge.

power_transform : Equivalent function without the estimator API.

QuantileTransformer : Maps data to a standard normal distribution with
the parameter output_distribution='normal'.
Member:

output_distribution='normal'

Member:

Are you suggesting backticks? That's not obvious from the rendering ;)

Member:

Oops ... thanks for pointing this out :)

API (as part of a preprocessing :class:`sklearn.pipeline.Pipeline`).

quantile_transform : Maps data to a standard normal distribution with
the parameter output_distribution='normal'.
Member:

output_distribution='normal'

@amueller (Member) commented Dec 5, 2017

Great work @ericchang00!

'font.size': 6,
'hist.bins': 150
}
matplotlib.rcParams.update(params)
Member:

Is that a good idea? Depending on how careful sphinx-gallery is with global state, I feel this could go wrong. And what about people copy-pasting the example?

Contributor Author:

Removed the global parameter setting.
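A sketch of the non-global alternative; rc_context is one option, and the exact fix used in the example may differ:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.lognormal(size=1000)  # placeholder data for the sketch

# Scope the settings to a single figure instead of mutating global rcParams,
# so copy-pasted example code doesn't alter the caller's defaults.
with plt.rc_context({'font.size': 6}):
    fig, ax = plt.subplots()
    ax.hist(data, bins=150)  # pass bins explicitly rather than via 'hist.bins'
plt.show()
```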


params = {
'font.size': 6,
'hist.bins': 150
Member:

Maybe slightly fewer bins would make it clearer?

In many modeling scenarios, normality of the features in a dataset is desirable.
Power transforms are a family of parametric, monotonic transformations that aim
to map data from any distribution to as close to a Gaussian distribution as
possible in order to minimize skewness.
Member:

Do all power transformations aim to minimize skewness? (I actually don't know)

Contributor Author:

Good point - it might be clearer as 'minimize skewness and stabilize variance'.

that are applied to make data more Gaussian-like. This is useful for
modeling issues related to heteroscedasticity (non-constant variance),
or other situations where normality is desired. Note that power
transforms do not result in standard normal distributions.
Member:

(i.e. mean might be far from zero and standard deviation not one?)

@chang (Contributor, Author), Dec 5, 2017:

Exactly! Added.

Member:

I meant maybe say that explicitly ;)
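A quick way to see the point being added, using scipy.stats.boxcox on illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.lognormal(size=1000)

x_bc, lmbda = stats.boxcox(x)  # lambda fitted by maximum likelihood

# More Gaussian-like, but not standard normal: the mean can be far from
# zero and the standard deviation far from one.
print(x_bc.mean(), x_bc.std())
```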

@amueller (Member) commented Dec 5, 2017

I'm still confused as to how maximum likelihood relates to skewness. The Wikipedia article on Box-Cox doesn't mention skew... Is it just that empirically it decreases skew, or is there some more formal statement?

@chang (Contributor, Author) commented Dec 5, 2017

Added the final tweaks.

@amueller, I think the 'skewness' vocabulary came from an earlier review. It's more of an empirical observation - the main purpose of Box-Cox is to make data normal and stabilize variance. Skewness does not necessarily imply higher variance, but it does imply non-normality, so the description still makes sense, IMO.

edit: fixed flake8 error
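For reference, the connection is through normality rather than skewness directly: Box-Cox picks lambda by maximizing the profile log-likelihood under the assumption that the transformed data is Gaussian. This is the standard textbook form, not something quoted from the PR:

```latex
x_i^{(\lambda)} =
\begin{cases}
  \dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
  \ln x_i, & \lambda = 0,
\end{cases}
\qquad
\ell(\lambda) = -\frac{n}{2} \ln \hat{\sigma}^2(\lambda)
              + (\lambda - 1) \sum_{i=1}^{n} \ln x_i
```

where \hat{\sigma}^2(\lambda) is the variance of the transformed values. Since a Gaussian has zero skew, the fitted lambda tends to reduce skewness as a by-product rather than by construction.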

@amueller (Member) commented Dec 5, 2017

Maybe not in this PR, but a direct comparison against quantile transformer would be nice, right?
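A sketch of what such a comparison might look like (deferred to a future PR in this thread; the names used are from the existing API):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1))

# Parametric (Box-Cox) vs. non-parametric (rank-based) routes to normality.
X_pt = PowerTransformer(method='box-cox').fit_transform(X)
X_qt = QuantileTransformer(output_distribution='normal').fit_transform(X)
```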

@amueller (Member) left a comment

LGTM. Green button on green CI?

@chang (Contributor, Author) commented Dec 5, 2017

Agreed - comparison with quantile transformer + a linear model example for a future PR. Looks like we're good to go :)

@jnothman jnothman merged commit 62e9bb8 into scikit-learn:master Dec 5, 2017
@jnothman (Member) commented Dec 5, 2017

Congrats @ericchang00 and @maniteja123

@amueller (Member) commented Dec 5, 2017

Sweeeet!

@chang (Contributor, Author) commented Dec 5, 2017

Awesome, thanks so much guys! This is very exciting :)

@jnothman (Member) commented Dec 5, 2017

Btw, @amueller, what's your opinion on having a standardize parameter to centre and scale the output of PowerTransformer? After all, #6675 does describe Box-Cox as reshaping the data into a standard normal.

@amueller (Member) commented Dec 8, 2017

I think it would be good, and we might have it on by default. I don't think it'll surprise anyone, and it'll make things easier. I can't really imagine a case where it would be a bad idea.
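A sketch of the behaviour being proposed (hypothetical at this point in the thread): standardize=True would amount to chaining a StandardScaler after the power transform:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# What a PowerTransformer(standardize=True) is proposed to be equivalent to:
box_cox_standardized = make_pipeline(
    PowerTransformer(method='box-cox'),
    StandardScaler(),  # zero mean, unit variance on the transformed output
)
```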

@jnothman (Member) commented Dec 9, 2017 via email
