
[MRG+2] TransformedTargetRegressor #9041

Merged: 83 commits into scikit-learn:master, Dec 13, 2017

Conversation


@glemaitre glemaitre commented Jun 7, 2017

Continuation of #8988

TODO:

  • User guide doc
  • API doc
  • What's new entry

@glemaitre glemaitre changed the title Targettransformer [WIP] TransformedTargetRegressor Jun 7, 2017
@glemaitre (Member Author) commented Jun 7, 2017

@jnothman @amueller @dengemann @agramfort Can you have a look before I write the narrative doc accordingly?

@jnothman (Member) left a comment

Despite the multitude of comments, overall I think this is what we want. Good work



class TransformedTargetRegressor(BaseEstimator, RegressorMixin):
"""Meta-estimator to apply a transformation to the target before fitting.
Member:

"regress on a transformed target" might be simpler than "apply a ..."


Parameters
----------
estimator : object, (default=LinearRegression())
Member:

maybe should call this regressor, because transformer is also an estimator.

Parameters
----------
estimator : object, (default=LinearRegression())
Estimator object derived from ``RegressorMixin``.
Member:

we don't usually require inheritance, as long as appropriate methods are available. You could say "such as derived from" ...

Perhaps mention that it will be cloned.

Estimator object derived from ``RegressorMixin``.

transformer : object, (default=None)
Estimator object derived from ``TransformerMixin``. Cannot be set at
Member:

We don't usually require inheritance.

Perhaps mention that it will be cloned.

``func`` and ``inverse_func`` are ``None`` as well, the transformer
will be an identity transformer.

func : function, (default=None)
Member:

I'd prefer "optional" to "default=None" which has no clear semantics.

Member:

Though in this case the semantics of "optional" are not obvious either; the reader needs to look 2 lines below anyway.

Member:

I hardly see that as a problem, given the context.

Member:

Just sayin', if this says "optional" instead of "default=None", the docstring below should say "if not passed" instead of "If None". (or the reader needs to scroll and check that the default value is None)

self._validate_transformer(y_2d)
self.estimator_ = clone(self.estimator)
self.estimator_.fit(X, self.transformer_.transform(y_2d),
                    sample_weight=sample_weight)
Member:

I think the current convention is to pass sample_weight only when sample_weight is not None
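The convention referred to here can be sketched as follows. This is an illustrative helper (the name `fit_underlying` is made up for this sketch, not the PR's actual code): forward `sample_weight` only when the caller actually supplied one, so estimators whose `fit` does not accept the keyword still work.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def fit_underlying(estimator, X, y, sample_weight=None):
    """Clone and fit, forwarding sample_weight only when it was supplied."""
    est = clone(estimator)
    if sample_weight is None:
        est.fit(X, y)
    else:
        est.fit(X, y, sample_weight=sample_weight)
    return est

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
est = fit_underlying(LinearRegression(), X, y)
```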

return pred

def score(self, X, y, sample_weight=None):
"""Returns the coefficient of determination R^2 of the prediction.
Member:

Should state here that scoring is performed in the original, not the transformed, space.
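The point being made, sketched with the public API (in released versions the class lives in `sklearn.compose`; this PR originally placed it under `sklearn.preprocessing`): `score()` evaluates R^2 against the untransformed targets, i.e. it compares `y` with the inverse-transformed predictions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = np.exp(X.sum(axis=1) + rng.normal(scale=0.1, size=50))

tt = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp).fit(X, y)

# score() agrees with R^2 computed in the original (not log) space
same = np.isclose(tt.score(X, y), r2_score(y, tt.predict(X)))
```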

from sklearn.preprocessing.label import _inverse_binarize_thresholding
from sklearn.preprocessing.label import _inverse_binarize_multiclass

from sklearn import datasets

iris = datasets.load_iris()
friedman = datasets.make_friedman1(random_state=0)
Member:

revert?

assert_array_almost_equal((y - y_mean) / y_std, y_tran)
assert_array_almost_equal(y, np.ravel(clf.transformer_.inverse_transform(
    y_tran.reshape(-1, 1))))
assert_equal(y.shape, pred.shape)
Member:

You've failed to test that clf.estimator_ was passed the transformed y.

A better test would just check the equivalence between clf.estimator_.coef_ and LinearRegression().fit(X, StandardScaler().fit_transform(y[:, None])[:, 0]).coef_.

You've also not tested the handling of sample_weight.

Member:

yeah this would be a good test, too.
I think testing coef_ and also testing pred would be good.
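The suggested equivalence test can be sketched like this (using the released location `sklearn.compose`; the PR originally placed the class under `sklearn.preprocessing`): the inner regressor's `coef_` should match a `LinearRegression` fitted directly on the standardized target.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.rand(50)

clf = TransformedTargetRegressor(regressor=LinearRegression(),
                                 transformer=StandardScaler()).fit(X, y)

# Fit directly on the standardized target for comparison
lr = LinearRegression().fit(
    X, StandardScaler().fit_transform(y[:, None])[:, 0])
```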

# memorize if y should be a multi-output
self.y_ndim_ = y.ndim
if y.ndim == 1:
    y_2d = y.reshape(-1, 1)
Member:

I suspect we don't want to do this when func and inverse_func are provided?

Member Author:

Coming back to this point: I am not sure this is a great idea, since it makes the behaviour differ depending on whether you pass a function or a transformer. We could still reshape to 2D and build the FunctionTransformer with validate=True, so the behaviour is always the same.

I don't see a case in which the user would define a function that works on a 1D array but fails on a 2D array.

@jnothman (Member) commented Jun 8, 2017 via email

@glemaitre glemaitre changed the title [WIP] TransformedTargetRegressor [MRG] TransformedTargetRegressor Jun 8, 2017
@glemaitre glemaitre changed the title [MRG] TransformedTargetRegressor [WIP] TransformedTargetRegressor Jun 8, 2017
@glemaitre (Member Author):

@jnothman The what's new entry is still missing, but I added some doc and addressed almost all comments. I am just unsure about fit + transform vs fit_transform and their implications.

self._fit_transformer(y_2d, sample_weight)
self.regressor_ = clone(self.regressor)
if sample_weight is not None:
    self.regressor_.fit(X, self.transformer_.transform(y_2d),
Member:

I mean that we should really be using fit_transform to produce the downstream y here
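The shape of this suggestion can be sketched with a small standalone helper (the name `make_training_target` is hypothetical, for illustration only): produce the downstream training target with `fit_transform` rather than `fit` followed by `transform`, since transformers may implement an optimized or semantically different `fit_transform`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler

def make_training_target(transformer, y_2d):
    """Fit a clone of the transformer and return the transformed target,
    using fit_transform rather than fit + transform."""
    transformer_ = clone(transformer)
    y_trans = transformer_.fit_transform(y_2d)
    return transformer_, y_trans

y_2d = np.array([[1.0], [2.0], [3.0], [4.0]])
transformer_, y_trans = make_training_target(StandardScaler(), y_2d)
```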

@jnothman (Member) commented Jun 9, 2017 via email

@glemaitre glemaitre changed the title [WIP] TransformedTargetRegressor [MRG] TransformedTargetRegressor Jun 9, 2017
@@ -31,6 +31,10 @@ Changelog
New features
............

- Added the :class:`sklearn.preprocessing.TransformedTargetRegressor` which
is a meta-estimator to regress on a modified ``y``. :issue:`9041` by
Member:

just to make this more relatable, describe a small use case, maybe
"for example, to perform regression in log-space"

or something?

Member Author:

ok
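The "regression in log-space" use case suggested for the what's-new entry, sketched with the public API (located in `sklearn.compose` in released versions; targets here are synthetic):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 1)
# Targets that are linear in log-space, so a linear model fits well
# only after a log transform of y
y = np.exp(2.0 * X.ravel() + rng.normal(scale=0.05, size=100))

tt = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp).fit(X, y)
```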

@glemaitre (Member Author):

Let's make a separate issue for this so that we don't crowd out the discussion here.

Then this PR will be invaded by a pink unicorn :)

@GaelVaroquaux (Member) commented Nov 28, 2017 via email

@jnothman (Member) left a comment

A couple of small things to test, and the naming/placement questions stand. Apart from that, LGTM.

>>> def inverse_func(x):
... return x
>>> regr = TransformedTargetRegressor(regressor=regressor,
... func=func,
Member:

indentation

Member:

This is not fixed

# non-negative and (ii) applying an exponential function to obtain non-linear
# targets which cannot be fitted using a simple linear model.
#
# Therefore, a logarithmic and an exponential functions will be used to
Member:

functions -> function


regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
Member:

@amueller, maybe this is somewhere we can illustrate PowerTransformer rather than changing #10210

>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import TransformedTargetRegressor
>>> tt = TransformedTargetRegressor(regressor=LinearRegression(),
... func=np.log, inverse_func=np.exp)
Member:

indentation

-----
Internally, the target ``y`` is always converted into a 2-dimensional array
to be used by scikit-learn transformers. At the time of prediction, the
output will be reshaped to a have the same number of dimension as ``y``.
Member:

*dimensions

regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                  func=np.sqrt, inverse_func=np.log,
                                  check_inverse=False)
# the transformer/functions are not checked to be invertible the fitting
Member:

I'd drop this comment, but would make it clearer by replacing the previous statement with regr.set_params(check_inverse=False)
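The suggested rewrite, sketched: state the intent with `set_params(check_inverse=False)` instead of a trailing comment (`func=np.sqrt` and `inverse_func=np.log` are deliberately not inverses of each other here).

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                  func=np.sqrt, inverse_func=np.log)
# sqrt and log are not inverses, so skip the round-trip check explicitly
regr.set_params(check_inverse=False)

X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.arange(1, 11, dtype=float)
regr.fit(X, y)  # fits without checking invertibility
```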

y_tran = regr.transformer_.transform(y)
assert_allclose(np.log(y), y_tran)
assert_allclose(y, regr.transformer_.inverse_transform(y_tran))
assert_equal(y.shape, y_pred.shape)
Member:

with pytest, we can just use a bare assert y.shape == y_pred.shape, which I find much more legible.

assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_1d_transformer_multioutput():
Member:

It is hard to see how similar or different the code is here from the previous test. Perhaps use a loop, a check function, or pytest.mark.parametrize.
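One way to realize this suggestion, sketched with a hypothetical consolidated test (the test name and parametrization are illustrative, not the PR's actual tests; assumes pytest is available, as it is for scikit-learn's suite):

```python
import numpy as np
import pytest
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

@pytest.mark.parametrize("n_targets", [1, 2])
def test_transform_target_regressor_multioutput_shapes(n_targets):
    # Same assertions for single- and multi-output targets,
    # driven by the parametrized n_targets
    rng = np.random.RandomState(0)
    X = rng.rand(30, 4)
    y = rng.rand(30, n_targets)
    if n_targets == 1:
        y = y.ravel()
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      transformer=StandardScaler()).fit(X, y)
    assert regr.predict(X).shape == y.shape
```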

assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_2d_transformer_multioutput():
Member:

same here that it's hard to see how similar or different the tests are.

# check that the target ``y`` passed to the transformer will always be a
# numpy array
X, y = friedman
tt = TransformedTargetRegressor(transformer=DummyTransformer(),
Member:

Can you please check similarly that the predictor receives X as a list? Thanks.

@glemaitre (Member Author):

Done
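The requested check can be sketched with a stub regressor that asserts the type of `X` it receives (the name `DummyCheckerRegressor` is hypothetical, for illustration): the meta-estimator validates only `y`, so a plain list `X` should reach the inner regressor unconverted.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.compose import TransformedTargetRegressor

class DummyCheckerRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical stub asserting X arrives as a plain list."""
    def fit(self, X, y, sample_weight=None):
        assert isinstance(X, list)  # X must not have been converted
        self.y_mean_ = float(np.mean(y))
        return self
    def predict(self, X):
        assert isinstance(X, list)
        return np.full(len(X), self.y_mean_)

X = [[0.0], [1.0], [2.0]]
y = np.array([0.0, 1.0, 2.0])
tt = TransformedTargetRegressor(regressor=DummyCheckerRegressor(),
                                func=lambda t: t, inverse_func=lambda t: t)
tt.fit(X, y)
pred = tt.predict(X)
```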

@amueller (Member):

Is the +1 from me or @jnothman or @ogrisel? hm...

@amueller (Member):

the real-world example is not very convincing... though I'm not sure there's a better one with the data we have...

@jnothman (Member):

It's your +1 in the title, I think, or at least it's not mine...

@jnothman (Member):

I think this is also just waiting on a decision about where to put it...

@glemaitre (Member Author) commented Dec 13, 2017 via email

@amueller (Member):

I think it's fine where it is ;)

@amueller (Member):

We can always move before the release, I think delaying features for module naming bike-shedding will get us in trouble...

@GaelVaroquaux (Member) commented Dec 13, 2017 via email

@jnothman jnothman changed the title [MRG + 1] TransformedTargetRegressor [MRG+2] TransformedTargetRegressor Dec 13, 2017
@jnothman (Member):

Let's do it! Thanks, @glemaitre.

@jnothman jnothman merged commit 4f710cd into scikit-learn:master Dec 13, 2017
@amueller (Member):

One less hack for my class! This is moving forward quite nicely lol. (PowerTransformer was another). Can we do KNN imputation, missing value features and ColumnTransformer next? Oh and balanced random forests (though actually imblearn has it now :)? I think then I'm good... just need to implement a decent time series library in python, or something...

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Add a new meta-regressor which transforms y for training.
@jnothman jnothman moved this from In progress to Done in API and interoperability Jan 11, 2018
@@ -77,6 +77,11 @@ Model evaluation
- Added :class:`multioutput.RegressorChain` for multi-target
regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.

- Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
Member:

For some reason this had disappeared from what's new and I've just reinserted it :\
