[MRG + 1] ENH add check_inverse in FunctionTransformer #9399

Merged
Changes from 4 commits
27 commits
4659cb4
EHN add check_inverse in FunctionTransformer
glemaitre Jul 18, 2017
72d3c54
Add whats new entry and short narrative doc
glemaitre Jul 18, 2017
df07603
Sparse support
glemaitre Jul 18, 2017
9a5777c
better handle sparse data
glemaitre Jul 19, 2017
bd7ad2f
Address andreas comments
glemaitre Jul 21, 2017
5c1851b
PEP8
glemaitre Jul 21, 2017
4fd988c
Merge branch 'master' into check_inverse_function_transformer
glemaitre Jul 26, 2017
3a764a7
Absolute tolerance default
glemaitre Jul 26, 2017
586e8ca
DOC fix docstring
glemaitre Jul 27, 2017
43f876c
Remove random state and make check_inverse deterministic
glemaitre Jul 27, 2017
f3c0d10
FIX remove random_state from init
glemaitre Jul 31, 2017
7a19979
PEP8
glemaitre Jul 31, 2017
e59f493
DOC motivation for the inverse
glemaitre Aug 1, 2017
6cb5b5d
make check_inverse=True default with a warning
glemaitre Aug 2, 2017
72e2005
PEP8
glemaitre Aug 2, 2017
45e0cb3
Merge remote-tracking branch 'origin/master' into check_inverse_funct…
glemaitre Aug 2, 2017
afdeca7
FIX get back X from check_array
glemaitre Aug 2, 2017
e4045a1
Andread comments
glemaitre Aug 2, 2017
c8c23fa
Merge branch 'master' into check_inverse_function_transformer
glemaitre Aug 17, 2017
4276618
Update whats new
glemaitre Aug 17, 2017
0297a4a
remove blank line
glemaitre Aug 17, 2017
677cd2a
joel s comments
glemaitre Aug 17, 2017
cec6f53
no check if one of forward or inverse not provided
glemaitre Aug 17, 2017
5238a33
DOC fixes and example of filterwarnings
glemaitre Aug 18, 2017
31abd47
DOC fix warningfiltering
glemaitre Aug 22, 2017
4d31e52
Merge remote-tracking branch 'origin/master' into check_inverse_funct…
glemaitre Oct 25, 2017
65b134a
DOC fix merge error git
glemaitre Oct 25, 2017
3 changes: 3 additions & 0 deletions doc/modules/preprocessing.rst
@@ -610,6 +610,9 @@ a transformer that applies a log transformation in a pipeline, do::
array([[ 0. , 0.69314718],
[ 1.09861229, 1.38629436]])

We can ensure that ``func`` and ``inverse_func`` are the inverse of each other
[Review comment, Member] I would say "You" but doesn't matter ;)

by setting ``check_inverse=True``.

For a full code example that demonstrates using a :class:`FunctionTransformer`
to do custom feature selection,
see :ref:`sphx_glr_auto_examples_preprocessing_plot_function_transformer.py`
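
As a rough illustration of what such an inverse check amounts to, the sketch below applies a function and its claimed inverse to a small 2x2 input and compares the result with the original values. It is a dependency-free approximation that uses plain Python lists and the `math` module rather than NumPy arrays. The helper name `is_round_trip` and its tolerance values are hypothetical choices for this sketch, not part of scikit-learn.

```python
import math


def is_round_trip(func, inverse_func, rows, abs_tol=1e-7):
    """Return True if inverse_func(func(x)) recovers every x within tolerance."""
    return all(
        math.isclose(inverse_func(func(x)), x, rel_tol=1e-7, abs_tol=abs_tol)
        for row in rows
        for x in row
    )


X = [[0.0, 1.0], [2.0, 3.0]]

# log1p and expm1 undo each other, so the round trip succeeds.
log_ok = is_round_trip(math.log1p, math.expm1, X)

# sqrt followed by rounding does not give back the input (sqrt(2) rounds to 1).
sqrt_ok = is_round_trip(math.sqrt, round, X)
```

Here `log_ok` is true and `sqrt_ok` is false, which mirrors the pair of cases exercised in the test added by this pull request.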
18 changes: 16 additions & 2 deletions doc/whats_new.rst
@@ -5,11 +5,25 @@
Release history
===============

Version 0.19
Version 0.20
============

**In Development**

Changelog
---------

Enhancements
............

- A parameter ``check_inverse`` was added to :class:`FunctionTransformer`
to ensure that ``func`` and ``inverse_func`` are the inverse of each
other.
:issue:`9399` by :user:`Guillaume Lemaitre <glemaitre>`.

Version 0.19
============

Highlights
----------

@@ -490,7 +504,7 @@ Decomposition, manifold learning and clustering
in :class:`decomposition.PCA`,
:class:`decomposition.RandomizedPCA` and
:class:`decomposition.IncrementalPCA`.
:issue:`9105` by `Hanmin Qin <https://github.com/qinhanmin2014>`_.
:issue:`9105` by `Hanmin Qin <https://github.com/qinhanmin2014>`_.

- Fixed a bug where :class:`cluster.DBSCAN` gives incorrect
result when input is a precomputed sparse matrix with initial
45 changes: 42 additions & 3 deletions sklearn/preprocessing/_function_transformer.py
@@ -1,7 +1,8 @@
import warnings

from ..base import BaseEstimator, TransformerMixin
from ..utils import check_array
from ..utils import check_array, check_random_state, safe_indexing
from ..utils.testing import assert_allclose_dense_sparse
from ..externals.six import string_types


@@ -59,23 +60,59 @@ class FunctionTransformer(BaseEstimator, TransformerMixin):

.. deprecated::0.19

check_inverse : bool, (default=False)
[Review comment, Member] Spurious comma?

[Review comment, Member] On line 23 above (I can't put a comment there), it is stated that "A FunctionTransformer will not do any checks on its function's output." That is still correct for the default, but it might be updated to mention check_inverse?

Whether to check that ``transform`` followed by ``inverse_transform``
or ``func`` followed by ``inverse_func`` leads to the original inputs.
[Review comment, Member] Is mentioning both transform/inverse_transform and func/inverse_func a leftover from copying from the other PR? Here you only have func and inverse_func as kwargs.


.. versionadded:: 0.20

kw_args : dict, optional
Dictionary of additional keyword arguments to pass to func.

inv_kw_args : dict, optional
Dictionary of additional keyword arguments to pass to inverse_func.

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random. Note that this is used to compute if func and
inverse_func are the inverse of each other.
[Review comment, Member] I would clarify that this is only done for check_inverse=True

"""
def __init__(self, func=None, inverse_func=None, validate=True,
accept_sparse=False, pass_y='deprecated',
kw_args=None, inv_kw_args=None):
accept_sparse=False, pass_y='deprecated', check_inverse=False,
kw_args=None, inv_kw_args=None, random_state=None):
self.func = func
self.inverse_func = inverse_func
self.validate = validate
self.accept_sparse = accept_sparse
self.pass_y = pass_y
self.check_inverse = check_inverse
self.kw_args = kw_args
self.inv_kw_args = inv_kw_args
self.random_state = random_state

def _validate_inverse(self, X):
"""Check that func and inverse_func are the inverse."""
random_state = check_random_state(self.random_state)
[Review comment, Member] utils.resample?

n_subsample = min(100, X.shape[0])
subsample_idx = random_state.choice(range(X.shape[0]),
size=n_subsample,
replace=False)

X_sel = safe_indexing(X, subsample_idx)
print(subsample_idx)
[Review comment, Member] print?

try:
assert_allclose_dense_sparse(
X_sel, self.inverse_transform(self.transform(X_sel)),
atol=1e-7)
[Review comment, Member] Why not the default?

except AssertionError:
raise ValueError("The provided functions are not strictly"
" inverse of each other. If you are sure you"
" want to proceed regardless, set"
" 'check_inverse=False'")
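
The validation logic in this hunk (take up to about 100 rows, run them through the forward and inverse functions, and raise if the result differs) can be sketched without scikit-learn as follows. This is a simplified illustration, not the code under review: it works on plain lists, uses `math.isclose` in place of `assert_allclose_dense_sparse`, and swaps the random subsample for a fixed stride, which is an assumption made here to keep the sketch deterministic. The name `validate_inverse` and the `max_rows` parameter are hypothetical.

```python
import math

MESSAGE = ("The provided functions are not strictly inverse of each other. "
           "If you are sure you want to proceed regardless, set "
           "'check_inverse=False'")


def validate_inverse(func, inverse_func, rows, max_rows=100, abs_tol=1e-7):
    """Raise ValueError if inverse_func(func(x)) != x on a subsample of rows."""
    # Deterministic subsample: every `step`-th row, which keeps the number
    # of checked rows below 2 * max_rows.
    step = max(1, len(rows) // max_rows)
    for row in rows[::step]:
        for x in row:
            back = inverse_func(func(x))
            if not math.isclose(back, x, rel_tol=1e-7, abs_tol=abs_tol):
                raise ValueError(MESSAGE)


rows = [[float(i), float(i + 1)] for i in range(250)]

# A genuine inverse pair passes silently.
validate_inverse(math.log1p, math.expm1, rows)

# A non-inverse pair is rejected.
try:
    validate_inverse(math.sqrt, round, rows)
    raised = False
except ValueError:
    raised = True
```

A fixed stride trades coverage of the whole input for reproducibility: the same rows are checked on every call, so the outcome does not depend on a seed.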

def fit(self, X, y=None):
"""Fit transformer by checking X.
@@ -93,6 +130,8 @@ def fit(self, X, y=None):
"""
if self.validate:
check_array(X, self.accept_sparse)
if self.check_inverse:
[Review comment, Member] and self.inverse_func is not None

self._validate_inverse(X)
return self
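
The reviewer's suggestion above, together with the later commit titled "no check if one of forward or inverse not provided", points at guarding the check so it only runs when both functions are supplied. A minimal sketch of such a guard is given below. `ToyFunctionTransformer` and its `n_checks` counter are hypothetical stand-ins for illustration and do not reproduce the real class.

```python
class ToyFunctionTransformer:
    """Hypothetical, stripped-down stand-in used only to show the guard."""

    def __init__(self, func=None, inverse_func=None, check_inverse=True):
        self.func = func
        self.inverse_func = inverse_func
        self.check_inverse = check_inverse
        self.n_checks = 0  # counts how often the round-trip check ran

    def _validate_inverse(self, X):
        # Placeholder for the real round-trip check; here we only count calls.
        self.n_checks += 1

    def fit(self, X, y=None):
        # Skip the check unless both directions were provided.
        both_given = self.func is not None and self.inverse_func is not None
        if self.check_inverse and both_given:
            self._validate_inverse(X)
        return self


both = ToyFunctionTransformer(func=abs, inverse_func=abs).fit([[1.0]])
only_forward = ToyFunctionTransformer(func=abs).fit([[1.0]])
```

With this guard, `both.n_checks` is 1 while `only_forward.n_checks` stays 0, so a transformer without an inverse is never compared against an implicit identity.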

def transform(self, X, y='deprecated'):
36 changes: 34 additions & 2 deletions sklearn/preprocessing/tests/test_function_transformer.py
@@ -1,8 +1,10 @@
import numpy as np
from scipy import sparse

from sklearn.preprocessing import FunctionTransformer
from sklearn.utils.testing import assert_equal, assert_array_equal
from sklearn.utils.testing import assert_warns_message
from sklearn.utils.testing import (assert_equal, assert_array_equal,
assert_allclose_dense_sparse)
from sklearn.utils.testing import assert_warns_message, assert_raises_regex


def _make_func(args_store, kwargs_store, func=lambda X, *a, **k: X):
@@ -126,3 +128,33 @@ def test_inverse_transform():
F.inverse_transform(F.transform(X)),
np.around(np.sqrt(X), decimals=3),
)


def test_check_inverse():
X_dense = np.array([1, 4, 9, 16], dtype=np.float64).reshape((2, 2))

X_list = [X_dense,
sparse.csr_matrix(X_dense),
sparse.csc_matrix(X_dense)]

for X in X_list:
print(X)
[Review comment, Member] print?

if sparse.issparse(X):
accept_sparse = True
else:
accept_sparse = False
trans = FunctionTransformer(func=np.sqrt,
inverse_func=np.around,
accept_sparse=accept_sparse,
check_inverse=True)
assert_raises_regex(ValueError, "The provided functions are not"
" strictly inverse of each other. If you are sure"
" you want to proceed regardless, set"
" 'check_inverse=False'",
trans.fit, X)
trans = FunctionTransformer(func=np.expm1,
inverse_func=np.log1p,
accept_sparse=accept_sparse,
check_inverse=True)
Xt = trans.fit_transform(X)
[Review comment, Member] assert_no_warning?

assert_allclose_dense_sparse(X, trans.inverse_transform(Xt))
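
The last review comment asks for an explicit assertion that the valid inverse pair produces no warning. One way to express that with only the standard library is sketched below. The helper name `assert_no_warnings_raised` is hypothetical, and the sketch is not claimed to match any helper in scikit-learn's test utilities.

```python
import warnings


def assert_no_warnings_raised(func, *args, **kwargs):
    """Call func and fail if it emits any warning; return its result."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that would normally be
        # shown only once per location.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    if caught:
        messages = ", ".join(str(w.message) for w in caught)
        raise AssertionError("Unexpected warning(s): " + messages)
    return result


def noisy():
    """Toy function that warns, standing in for a failing inverse check."""
    warnings.warn("functions are not inverse", UserWarning)
    return 1


quiet_result = assert_no_warnings_raised(abs, -2)

try:
    assert_no_warnings_raised(noisy)
    flagged = False
except AssertionError:
    flagged = True
```

In a test like the one above, the call to `trans.fit_transform(X)` for the valid pair could be wrapped in such a helper so that a silent regression in the check is caught.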