
[MRG] Add Yeo-Johnson transform to PowerTransformer #11520

Merged
merged 44 commits into scikit-learn:master from NicolasHug:yeojohnson on Jul 20, 2018

Conversation

@NicolasHug
Contributor

NicolasHug commented Jul 14, 2018

Reference Issues/PRs

Closes #10261

What does this implement/fix? Explain your changes.

This PR implements the Yeo-Johnson transform as part of the PowerTransformer class.

PowerTransformer currently only supports Box-Cox, which only works for strictly positive values; Yeo-Johnson works on the whole real line.

Original paper: Yeo, I.-K. and Johnson, R. A. (2000), "A new family of power transformations to improve normality or symmetry", Biometrika 87(4).
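
For reference, the transform is piecewise in lambda and in the sign of the input. A minimal NumPy sketch of the forward transform (the actual implementation in this PR may differ in details):

import numpy as np

def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a 1-D array, elementwise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lmbda != 0:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if lmbda != 2:
        out[~pos] = -((1 - x[~pos]) ** (2 - lmbda) - 1) / (2 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out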

TODO:

  • Write transform
  • Fix lambda param estimation
  • Write inverse transform
  • Write docs
  • Write tests
  • Update examples

Any other comments?

The lambda parameter estimation is a bit tricky and currently does not work (should be OK now, see below). Unlike for Box-Cox, there is no scipy built-in we can rely on. I'm having a hard time finding decent guidelines; I tried to implement likelihood maximization with the brent optimizer (just like for Box-Cox) but ran into overflow issues.
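
For the record, the estimation follows the same recipe as scipy's Box-Cox fitting: maximize the Gaussian log-likelihood of the transformed data over lambda with scipy.optimize.brent. A minimal sketch, reusing the yeo_johnson helper above and dropping the likelihood's constant terms:

import numpy as np
from scipy import optimize

def _neg_log_likelihood(lmbda, x):
    # Gaussian log-likelihood of the transformed data, up to constants;
    # the second term is the log of the Jacobian of the transform
    x_trans = yeo_johnson(x, lmbda)
    n = x.shape[0]
    loglike = -n / 2 * np.log(x_trans.var())
    loglike += (lmbda - 1) * np.sum(np.sign(x) * np.log1p(np.abs(x)))
    return -loglike

rng = np.random.RandomState(0)
x = rng.standard_normal(100) ** 3  # skewed sample spanning zero

# brent minimizes the negative log-likelihood, bracketing lambda in [-2, 2]
lmbda = optimize.brent(_neg_log_likelihood, brack=(-2, 2), args=(x,))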

The transform code seems to work though:

[figure: boxcox_yeojohnson — Box-Cox (left) and Yeo-Johnson (right) curves for several values of λ]

which is a reproduction of a figure from Quantile regression via vector generalized additive models by Thomas W. Yee.

Code for figure (hacky):

import numpy as np
from sklearn.preprocessing import PowerTransformer
import matplotlib.pyplot as plt

yj = PowerTransformer(method='yeo-johnson', standardize=False)
bc = PowerTransformer(method='box-cox', standardize=False)

X = np.arange(-4, 4, .1).reshape(-1, 1)
fig, axes = plt.subplots(ncols=2)

for lmbda in (0, .5, 1, 1.5, 2):
    # Box-Cox only accepts strictly positive values
    X_pos = X[X > 0].reshape(-1, 1)
    bc.fit(X_pos)
    bc.lambdas_ = [lmbda]  # hack: bypass estimation and force a fixed lambda
    X_trans = bc.transform(X_pos)
    axes[0].plot(X_pos, X_trans, label=r'$\lambda = {}$'.format(lmbda))
    axes[0].set_title('Box-Cox')

    yj.fit(X)
    yj.lambdas_ = [lmbda]  # same hack for Yeo-Johnson
    X_trans = yj.transform(X)
    axes[1].plot(X, X_trans, label=r'$\lambda = {}$'.format(lmbda))
    axes[1].set_title('Yeo-Johnson')

for ax in axes:
    ax.set(xlim=[-4, 4], ylim=[-5, 5], aspect='equal')
    ax.legend()
    ax.grid()

plt.show()

NicolasHug added some commits Jul 14, 2018

Fixed lambda param optimization
The issue was from an error in the log likelihood function
@NicolasHug

Contributor

NicolasHug commented Jul 14, 2018

Lambda param estimation should be fixed now, thanks @amueller.

Replication of this example with Yeo-Johnson instead of Box-Cox:

[figure: figure_1 — the linked Box-Cox example reproduced with Yeo-Johnson]

NicolasHug added some commits Jul 15, 2018

Some first tests
Need to write inverse_transform to continue
@amueller

We think it's working now, right? So we need a test for the optimization, then documentation, and adding it to an example?

@@ -2076,7 +2078,7 @@ def test_power_transformer_strictly_positive_exception():
                          pt.fit, X_with_negatives)
     assert_raise_message(ValueError, not_positive_message,
-                         power_transform, X_with_negatives)
+                         power_transform, X_with_negatives, 'box-cox')

@amueller

amueller Jul 15, 2018

Member

why is this needed? The default value shouldn't change, right? Or do we want to start a cycle to change the default to yeo-johnson?

@NicolasHug

NicolasHug Jul 15, 2018

Contributor

I find it clearer and more explicit?

I don't know if we'll change the default, but it should still be fine since PowerTransformer hasn't been released yet AFAIK.

@amueller

amueller Jul 15, 2018

Member

Good point. We should discuss before the release. I think yeo-johnson would make more sense.

@ogrisel

ogrisel Jul 15, 2018

Member

The fact that Yeo-Johnson accepts negative values while Box-Cox does not makes me feel like we should use it by default. From a usability point of view, it's nicer to our users.

@NicolasHug

NicolasHug Jul 15, 2018

Contributor

I have the same feeling. Plus, it is designed to be a generalization of Box-Cox, even though that's not strictly the case.

@NicolasHug

NicolasHug Jul 15, 2018

Contributor

Shall I change the default then?

@amueller

amueller Jul 15, 2018

Member

I think so.

@NicolasHug

NicolasHug Jul 15, 2018

Contributor

Done

NicolasHug added some commits Jul 15, 2018

Opt for yeo-johnson not influenced by Nan
Also added related test
@amueller

Member

amueller commented Jul 15, 2018

If we want this to be the default then this is a blocker, right?

lmbda_no_nans = pt.lambdas_[0]
# concat nans at the end and check lambda stays the same
X = np.concatenate([X, np.full_like(X, np.nan)])

@ogrisel

ogrisel Jul 15, 2018

Member

To make sure that the location of the NaNs does not impact the estimation:

from sklearn.utils import shuffle
...

X = np.concatenate([X, np.full_like(X, np.nan)])
X = shuffle(X, random_state=0)
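
A self-contained sketch of the resulting check (names such as lmbda_no_nans come from the diff context above; the exact assertions in the final test may differ):

import numpy as np
from numpy.testing import assert_almost_equal
from sklearn.preprocessing import PowerTransformer
from sklearn.utils import shuffle

rng = np.random.RandomState(0)
X = rng.lognormal(size=(100, 1))  # skewed sample

pt = PowerTransformer(method='yeo-johnson')
pt.fit(X)
lmbda_no_nans = pt.lambdas_[0]

# concat NaNs, shuffle so their location is arbitrary, and check
# that the estimated lambda stays the same
X = np.concatenate([X, np.full_like(X, np.nan)])
X = shuffle(X, random_state=0)
pt.fit(X)
assert_almost_equal(lmbda_no_nans, pt.lambdas_[0])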

@NicolasHug

NicolasHug Jul 15, 2018

Contributor

Done

@amueller amueller added the Blocker label Jul 15, 2018

@amueller amueller added this to the 0.20 milestone Jul 15, 2018

@amueller

Member

amueller commented Jul 15, 2018

Tagged for 0.20 and added the blocker label. I don't like that we keep adding stuff, but if we want to make it the default we should do it now.

@glemaitre

Couple of open comments.
If I am not wrong, we should have something in the common estimator_checks that forces the input to be positive to work with box-cox. We probably want to change this behavior when we change the default.

return x_inv
def _yeo_johnson_transform(self, x, lmbda):

@glemaitre

glemaitre Jul 16, 2018

Contributor

Can't we just define the forward transform and take 1 / _yeo_johnson_transform for the inverse?

@NicolasHug

NicolasHug Jul 16, 2018

Contributor

The inverse here means f^{-1} (the functional inverse), not 1 / f.
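
Since the transform is increasing with f(0) = 0, a transformed value and its preimage have the same sign, so f^{-1} has a closed form. A minimal sketch (not necessarily the PR's exact code):

import numpy as np

def yeo_johnson_inverse(x, lmbda):
    """Functional inverse of the Yeo-Johnson transform, elementwise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0  # f preserves sign, so this is the sign of the preimage
    if lmbda != 0:
        out[pos] = (x[pos] * lmbda + 1) ** (1 / lmbda) - 1
    else:
        out[pos] = np.expm1(x[pos])
    if lmbda != 2:
        out[~pos] = 1 - (1 - (2 - lmbda) * x[~pos]) ** (1 / (2 - lmbda))
    else:
        out[~pos] = -np.expm1(-x[~pos])
    return out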


@jorisvandenbossche jorisvandenbossche added this to PRs tagged in scikit-learn 0.20 Jul 16, 2018

@NicolasHug

Contributor

NicolasHug commented Jul 16, 2018

I made a quick example to illustrate the use of Yeo-Johnson vs. Box-Cox + offset.

As Box-Cox only accepts strictly positive data, one solution is to shift the data so it becomes positive, i.e. transform X - min(X) + eps for some small eps:

[figure: test — distributions after Yeo-Johnson vs. after offset + Box-Cox]

One thing we see is that the "after offset and Box-Cox" distribution isn't as symmetric as the Yeo-Johnson one, and most importantly the values are much higher.

Is it worth adding this as an example @amueller? TBH I wouldn't be able to mathematically or intuitively explain those results.
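
For reference, a minimal sketch of that comparison (hypothetical data; standardize=False so the raw output scales stay visible):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1)) - 2  # skewed data with negative values

# Yeo-Johnson handles negative values directly
X_yj = PowerTransformer(method='yeo-johnson', standardize=False).fit_transform(X)

# Box-Cox needs strictly positive input: shift first
eps = 1e-8
X_bc = PowerTransformer(method='box-cox', standardize=False).fit_transform(
    X - X.min() + eps)

fig, axes = plt.subplots(ncols=2, sharey=True)
axes[0].hist(X_yj.ravel(), bins=50)
axes[0].set_title('Yeo-Johnson')
axes[1].hist(X_bc.ravel(), bins=50)
axes[1].set_title('offset + Box-Cox')
plt.show()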

@GaelVaroquaux

Member

GaelVaroquaux commented Jul 16, 2018

+1 on comments by @glemaitre . Also, the tests are failing.

@amueller amueller moved this from PRs tagged to Blockers in scikit-learn 0.20 Jul 16, 2018

NicolasHug added some commits Jul 16, 2018

Removed box-cox specific checks in estimator_checks
The default is now Yeo-Johnson which can handle negative data

@glemaitre glemaitre changed the title from [WIP] Add Yeo-Johnson transform to PowerTransformer to [MRG] Add Yeo-Johnson transform to PowerTransformer Jul 16, 2018

@glemaitre

Couple of comments


NicolasHug added some commits Jul 16, 2018

Fixed Nan issues (ignored warnings)
Also removed other checks about positive data now that Yeo-Johnson is the default
@NicolasHug

Contributor

NicolasHug commented Jul 17, 2018

> Suggestion: When calling fit and transform on the same data, the transform is done twice. We could implement a fit_transform method to do it only once.

Good idea, I tried to implement it but I don't think it's possible:

Currently when calling fit_transform, the pipeline goes like this:
X -> transform -> scaler_fit -> scaler_transform -> transform

So transform is indeed called twice, but not on the same data. I don't think avoiding one of them is possible, but I may be missing something?

In any case I changed fit so that transform is only called when needed, that is when standardize is True.

@ogrisel

Member

ogrisel commented Jul 17, 2018

It's fine to call just transform on the test set.

The goal here is to call fit_transform on the training set instead of fit followed by transform, because subsequently calling fit and then transform on the same data performs redundant computation.
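
In other words, something like this (illustrative data and names):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer

X = np.abs(np.random.RandomState(0).standard_normal((100, 2)))
X_train, X_test = train_test_split(X, random_state=0)

pt = PowerTransformer()
X_train_t = pt.fit_transform(X_train)  # estimate lambdas and transform in one pass
X_test_t = pt.transform(X_test)        # a plain transform is fine on the test set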

@TomDLT

Member

TomDLT commented Jul 17, 2018

What I have in mind is:

def fit(self, X):
    _ = self._fit_transform(X, transform=False)
    return self

def fit_transform(self, X):
    Xt = self._fit_transform(X, transform=True)
    return Xt

def transform(self, X):
    check_is_fitted(self, 'lambdas_')
    X = self._check_input(X, check_positive=True, check_shape=True,
                          copy=self.copy)
    Xt = self._transform_inplace(X)
    return Xt

def _fit_transform(self, X, transform=True):
    X = self._check_input(X, check_positive=True, check_method=True,
                          copy=self.copy or not transform)
    optim_function = ...
    self.lambdas_ = []
    for col in X.T:
        with np.errstate(invalid='ignore'):  # hide NaN warnings
            lmbda = optim_function(col)
            self.lambdas_.append(lmbda)
    self.lambdas_ = np.array(self.lambdas_)

    Xt = None
    if self.standardize or transform:
        Xt = self._transform_inplace(X)

    if self.standardize:
        self._scaler = StandardScaler()
        self._scaler.fit(Xt)
        if transform:
            Xt = self._scaler.transform(Xt)
    return Xt

def _transform_inplace(self, X):
    transform_function = ...
    for i, lmbda in enumerate(self.lambdas_):
        with np.errstate(invalid='ignore'):  # hide NaN warnings
            X[:, i] = transform_function(X[:, i], lmbda)
    return X

Therefore we reuse the transformed data computed for the standardization.

@TomDLT

Member

TomDLT commented Jul 17, 2018

In any case, we probably need a test (sketched after this list) to verify that:

  • The input is not changed by fit, transform or fit_transform when copy=True.
  • The input is not changed by fit, when copy=False.
  • The input is changed by transform or fit_transform, when copy=False.
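
A minimal sketch of such a test (assuming the copy semantics described above; data and names are illustrative):

import numpy as np
from numpy.testing import assert_array_equal
from sklearn.preprocessing import PowerTransformer

def test_power_transformer_copy():
    rng = np.random.RandomState(0)
    X = rng.standard_normal((20, 3))
    X_original = X.copy()

    # copy=True: the input is never modified
    pt = PowerTransformer(copy=True)
    pt.fit(X)
    pt.transform(X)
    pt.fit_transform(X)
    assert_array_equal(X, X_original)

    # copy=False: fit alone must not modify the input...
    pt = PowerTransformer(copy=False)
    pt.fit(X)
    assert_array_equal(X, X_original)

    # ...but transform (and fit_transform) may overwrite it in place
    pt.transform(X)
    assert not np.array_equal(X, X_original)
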
@NicolasHug

Contributor

NicolasHug commented Jul 17, 2018

Isn't there a general test for all estimators for that?

ogrisel and others added some commits Jul 17, 2018

@ogrisel

Member

ogrisel commented Jul 17, 2018

I pushed a quick implementation of fit_transform, but indeed it does not handle in-place scaling properly; it was also already wrong when standardize=True: we need to pass self._scaler = StandardScaler(copy=self.copy).

@TomDLT

TomDLT approved these changes Jul 17, 2018

Thanks for the added tests.
We might want to add them as common tests at some point, but that might be for another pull request.


NicolasHug added some commits Jul 17, 2018

@glemaitre

Couple of changes

@glemaitre

Contributor

glemaitre commented Jul 17, 2018

I am waiting to check the example in the documentation.

@NicolasHug

Contributor

NicolasHug commented Jul 18, 2018

@glemaitre @ogrisel, I think the plot looks pretty OK now.

@jnothman

Member

jnothman commented Jul 19, 2018

I'm finding the plots in plot_map_data_to_normal relatively hard to navigate intuitively. It's not a blocker, but I think it needs to look more tabular: at the moment it takes some effort to see that each row is a different transformation; a label on the left of the row would be more helpful.

Also, having the transformations go from left to right and the datasets from top to bottom doesn't look like it would be infeasible, and would be more familiar from plot_cluster_comparison etc.

@jnothman

Can I clarify why plot_all_scaling still only shows box-cox?

@NicolasHug

Contributor

NicolasHug commented Jul 19, 2018

> Also, having the transformations go from left to right and the datasets from top to bottom doesn't look like it would be infeasible, and would be more familiar from plot_cluster_comparison etc.

Personally I find it easier to compare the transformations when they're stacked on each other, especially since the axes limits are uniform across the plots.

I don't have anything against having the transformation names on the left. It would also make sense to me to have one dataset per column (limiting the plot to 4 rows instead of 8), but that would make the plot wider which can be annoying on mobile.

Thanks for mentioning plot_all_scaling, I missed that one.

@NicolasHug

Contributor

NicolasHug commented Jul 19, 2018

Looks like 14e7c32 broke plot_all_scaling on master:

Traceback (most recent call last):
  File "examples/preprocessing/plot_all_scaling.py", line 71, in <module>
    dataset = fetch_california_housing()
  File "/home/nico/dev/sklearn/sklearn/datasets/california_housing.py", line 128, in fetch_california_housing
    cal_housing = joblib.load(filepath)
  File "/home/nico/dev/sklearn/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/nico/dev/sklearn/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/usr/lib64/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/usr/lib64/python3.6/pickle.py", line 1338, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python3.6/pickle.py", line 1388, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals._joblib.numpy_pickle'

Should I open an issue for this? I'm not sure whether this comes from my env (I created a new one from scratch and got the same error). Doesn't the CI check that all the examples pass?

@amueller

Member

amueller commented Jul 20, 2018

@NicolasHug it's now "fixed" but you need to remove your scikit_learn_data folder in your home folder.

@NicolasHug

Contributor

NicolasHug commented Jul 20, 2018

Thanks, just updated plot_all_scaling

@ogrisel

Member

ogrisel commented Jul 20, 2018

The matplotlib rendering of the 2 examples is good enough for now:

https://29575-843222-gh.circle-artifacts.com/0/doc/auto_examples/index.html#preprocessing

Merging. Thanks @NicolasHug for this nice contribution!

@ogrisel ogrisel merged commit 2d232ac into scikit-learn:master Jul 20, 2018

7 checks passed

ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: python2 Your tests passed on CircleCI!
ci/circleci: python3 Your tests passed on CircleCI!
codecov/patch 98.74% of diff hit (target 95.3%)
codecov/project 95.3% (+<.01%) compared to 1984dac
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

scikit-learn 0.20 automation moved this from Blockers to Done Jul 20, 2018

@amueller

Member

amueller commented Jul 20, 2018

yay!

@ogrisel

Member

ogrisel commented Jul 20, 2018

I agree with @jnothman (#11520 (comment)) that using a layout similar to the cluster comparison plot would improve the readability even further but I don't want to delay the release for this.

@NicolasHug NicolasHug deleted the NicolasHug:yeojohnson branch Jul 20, 2018


@rth rth referenced this pull request Aug 7, 2018

Merged

DOC Formatting in what's new #11766
