Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Added inverse_transform for pls base object and test #15304

Merged
merged 19 commits into from Oct 28, 2019

Conversation

@jiwidi
Copy link
Contributor

jiwidi commented Oct 20, 2019

Reference Issues/PRs

This PR is the same as #15289 but with squeezed commits since I failed with docstring syntax.

What does this implement/fix? Explain your changes.

With this PR I'm adding a function inverse_transform to the base _PLS object and the consecuent test. This function is already implemented in other decomposition methods like PCA here.

The function transforms back to the original space by multiplying the transformed data with the x_loadings as X_original = np.matmul(X, self.x_loadings_.T) and denormalizes the result with the model std and mean.

This function allows you to transform back data to the original space (this transformation will only be exact if n_components=n_features). This transformation is widely used to compute the Squared Prediction Error of our _PLS model. This metric is famous for its use in industry scenarios where PLS acts as a statistical model to control processes where time takes a big act (papers on this: 1, 2 )

Following Sklearn _PLS example this is how the function should be used:

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=2)
pls2.fit(X, Y)


t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred) # Value will be [0.02257852, 0.11391906, 0.87805722]

And a example to showcase the correctness of the function with n_components==n_features

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=3)
pls2.fit(X, Y)


t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred) # Value will be [0, 0, 1]

Any other comments?

I have been developing software for multivariate statistical process control for some time now and Sklearn implementation of PLS has been widely used in this field. I always thought the _PLS was lacking from this method while PCA had it and decided to make a contribution for it :)

@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 20, 2019

ping @NicolasHug @adrinjalali Since I have seen you guys both touch _pls files recently.

@jiwidi jiwidi changed the title Added inverse_Transform for pls base object and test [WIP] Added inverse_Transform for pls base object and test Oct 20, 2019
Copy link
Member

adrinjalali left a comment

Some nitpicks, otherwise looks good to me, thanks @jiwidi

sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
Notes
This transformation will only be exact if n_components=n_features
-----
Comment on lines 438 to 440

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 21, 2019

Member
        Notes
        -----
        This transformation will only be exact if n_components=n_features
check_is_fitted(self)
X = check_array(X, copy=False, dtype=FLOAT_DTYPES)
# From pls space to original space
X_original = np.matmul(X, self.x_loadings_.T)

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 21, 2019

Member
Suggested change
X_original = np.matmul(X, self.x_loadings_.T)
np.matmul(X, self.x_loadings_.T, out=X)

this way you can do it in place.

You should also add a test for this case.

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 22, 2019

Author Contributor

So I came up with this:

def inverse_transform(self, X, copy=True):
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)

        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

And the extra test:

    # Check inplace usage
    plsca.inverse_transform(Xr,copy=False)
    assert_array_almost_equal(Xr, X,
                              err_msg="inverse_transform failed")

I tested it and it works with both inplace. Just checking this is what you meant before commiting it :)

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 23, 2019

Member

yeah this looks good to me, I think.

jiwidi added 4 commits Oct 23, 2019
@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 23, 2019

@adrinjalali newbie question here. All the last commits have failed to run the test tho I can not replicate the same error on local (they pass here). Is there any way to debug this in an easier way rather than just committing and make the test run every time? Thanks!

@adrinjalali

This comment has been minimized.

Copy link
Member

adrinjalali commented Oct 23, 2019

The failing tests are all on python 3.5. You should create a similar environment and see why they fail. You can see the environment on top of the failing tests. For example:

##[section]Starting: Test Library
==============================================================================
Task         : Command line
Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version      : 2.151.2
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
Generating script.
Script contents:
build_tools/azure/test_script.sh
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/d1d9cf55-3eb0-4aa7-af95-8ce64d06a52e.sh
Python 3.5.2
numpy 1.11.0
scipy 0.17.0
pandas not installed
2 CPUs
Package             Version                Location           
------------------- ---------------------- -------------------
atomicwrites        1.3.0                  
attrs               19.3.0                 
blinker             1.3                    
chardet             2.3.0                  
cloud-init          19.2                   
command-not-found   0.3                    
configobj           5.0.6                  
coverage            4.5.4                  
cryptography        1.2.3                  
cycler              0.9.0                  
Cython              0.29.13                
decorator           4.0.6                  
idna                2.0                    
importlib-metadata  0.23                   
Jinja2              2.8                    
joblib              0.11                   
jsonpatch           1.10                   
jsonpointer         1.9                    
language-selector   0.1                    
MarkupSafe          0.23                   
matplotlib          1.5.1                  
more-itertools      7.2.0                  
numpy               1.11.0                 
oauthlib            1.0.3                  
pathlib2            2.3.5                  
Pillow              3.1.2                  
pip                 19.3.1                 
pkg-resources       0.0.0                  
pluggy              0.13.0                 
prettytable         0.7.2                  
py                  1.8.0                  
pyasn1              0.1.9                  
pycurl              7.43.0                 
pygobject           3.20.0                 
PyJWT               1.3.0                  
pyparsing           2.0.3                  
pyserial            3.0.1                  
pytest              3.8.1                  
pytest-cov          2.8.1                  
python-apt          1.1.0b1+ubuntu0.16.4.5 
python-dateutil     2.4.2                  
python-debian       0.1.27                 
python-systemd      231                    
pytz                2014.10                
PyYAML              3.11                   
requests            2.9.1                  
scikit-learn        0.22.dev0              /home/vsts/work/1/s
scipy               0.17.0                 
setuptools          41.4.0                 
six                 1.10.0                 
ssh-import-id       5.5                    
ufw                 0.35                   
unattended-upgrades 0.1                    
urllib3             1.13.1                 
virtualenv          15.0.1                 
WALinuxAgent        2.2.40                 
wheel               0.33.6                 
zipp                0.6.0             
@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 24, 2019

@adrinjalali I have been thinking about the inplace support for this function. It's a different behavior from the transform() inplace, where if you select copy=False the only change reflected to X would be the normalization:

def transform(self, X, Y=None, copy=True):
        """Apply the dimension reduction learned on the train data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vectors, where n_samples is the number of samples and
            n_features is the number of predictors.

        Y : array-like of shape (n_samples, n_targets)
            Target vectors, where n_samples is the number of samples and
            n_targets is the number of response variables.

        copy : boolean, default True
            Whether to copy X and Y, or perform in-place normalization.

        Returns
        -------
        x_scores if Y is not given, (x_scores, y_scores) otherwise.
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # Normalize
        X -= self.x_mean_
        X /= self.x_std_
        # Apply rotation
        x_scores = np.dot(X, self.x_rotations_)
        if Y is not None:
            Y = check_array(Y, ensure_2d=False, copy=copy, dtype=FLOAT_DTYPES)
            if Y.ndim == 1:
                Y = Y.reshape(-1, 1)
            Y -= self.y_mean_
            Y /= self.y_std_
            y_scores = np.dot(Y, self.y_rotations_)
            return x_scores, y_scores

        return x_scores

In inverse_transform the denormalization is computed on the matrix result and not in the X parameter.

    def inverse_transform(self, X, copy=True):
        """Transform data back to its original space.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_components)
            New data, where n_samples is the number of samples
            and n_components is the number of pls components.
        copy : bool, default=True
            Whether to copy X, or perform in-place normalization.

        Returns
        -------
        X_original array-like of shape (n_samples, n_features)

        Notes
        -----
        This transformation will only be exact if n_components=n_features
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)

        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

And by contrast in PCA inverse_transform they seem to not support the inplace functionality.

It doesn't feel natural to support inplace functionality here if it isn't supported in other decomposition methods. What do you think? Or you see value to this inplace feature? (I'll be happy to work in a PR to add it for PCA).


Coming back to the tests, I was able to reproduce the failure test. This is happening because np.matmul(X, self.x_loadings_.T, out=X) is following the known error with numpy==1.11.0 where if the out variable is the same as input it will return a zero matrix (tested by myself).

With numpy==1.17 the test pass as this bug was fixed. Is there a specific reason why numpy==1.11 is being used? Should I push for a solution that solves that bug or ignore the inplace feature?

I believe to have found a solution to bypass this bug and it would be by just hard copying X.
Rewrite the matmul: to np.matmul(X.copy(), self.x_loadings_.T, out=X)

And thanks again for the feedback, I'm learning some new stuff with this PR :)

@adrinjalali

This comment has been minimized.

Copy link
Member

adrinjalali commented Oct 24, 2019

Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet.

As you suggest, I recommend you remove the copy parameter, and always create a new array then. It would also avoid the issue, since you won't be using the out parameter anyway.

@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 24, 2019

Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet.

As you suggest, I recommend you remove the copy parameter, and always create a new array then. It would also avoid the issue, since you won't be using the out parameter anyway.

This will mean dropping inplace support as well. There will be no reason to keep the copy parameter. You okay with this?

@@ -440,14 +440,14 @@ def inverse_transform(self, X, copy=True):
This transformation will only be exact if n_components=n_features
"""
check_is_fitted(self)
X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
X = check_array(X, copy=True, dtype=FLOAT_DTYPES)

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 24, 2019

Member

you don't need to copy here. Just leave the copy as default.

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 24, 2019

Author Contributor

Does it make sense to accept the copy parameter even if it will not affect the parameter X? I was thinking on removing copy from the function arg list and only accept X (and making it copy=True x default)

Wether copy is true or false will not affect the execution or trace it leaves on the parameters.

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 25, 2019

Member

yes that's what I meant.

@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 25, 2019

@adrinjalali Last commit should be ready to merge version I believe

Copy link
Member

adrinjalali left a comment

Otherwise LGTM. You also need a whats_new entry for this one.

sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
@adrinjalali adrinjalali changed the title [WIP] Added inverse_Transform for pls base object and test Added inverse_Transform for pls base object and test Oct 25, 2019
Co-Authored-By: Adrin Jalali <adrin.jalali@gmail.com>
@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 25, 2019

Otherwise LGTM. You also need a whats_new entry for this one.

You mean a whats_new entry on the PR text? Doesn't What does this implement/fix? Explain your changes. Refer to that?

@adrinjalali

This comment has been minimized.

Copy link
Member

adrinjalali commented Oct 25, 2019

Sorry for not being clear in my previous comment. I meant an entry in the v0.22.rst file like the other ones.

@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 25, 2019

Sorry for not being clear in my previous comment. I meant an entry in the v0.22.rst file like the other ones.

no problem. It should ready to merge now :)

- |Feature| :class:`cross_decomposition._PLS` Has a new function :func:`cross_decomposition._PLS.inverse_transform` to transform
data to the original space`. :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.

Comment on lines 121 to 123

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 25, 2019

Member

Please keep the line length <80.

Also, we should point to the public API here, i.e. list the classes which inherit from _PLS instead.

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 25, 2019

Author Contributor

something like this maybe?

- |Feature| :class:`cross_decomposition.PLSCanonical` and 
  :class:`cross_decomposition.PLSRegression` 
  have a new function `inverse_transform` 
  :func:`cross_decomposition.PLSCanonical.inverse_transform`
  :func:`cross_decomposition.PLSRegression.inverse_transform` 
  to transform data to the original space`.
  :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.

This comment has been minimized.

Copy link
@adrinjalali

adrinjalali Oct 25, 2019

Member

Yeah, I'd do something like:

- |Feature| :class:`cross_decomposition.PLSCanonical` and
  :class:`cross_decomposition.PLSRegression` have a new function
  ``inverse_transform`` to transform data to the original space`.
  :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.
Copy link
Member

adrinjalali left a comment

Thanks @jiwidi for your prompt responses. Now we need to wait for another maintainer to do a second review, and hopefully it'll get in soon :)

@adrinjalali adrinjalali changed the title Added inverse_Transform for pls base object and test [MRG] Added inverse_Transform for pls base object and test Oct 25, 2019
@jiwidi

This comment has been minimized.

Copy link
Contributor Author

jiwidi commented Oct 25, 2019

Thanks @jiwidi for your prompt responses. Now we need to wait for another maintainer to do a second review, and hopefully it'll get in soon :)

Thank you for all the feedback! :)

Copy link
Member

jnothman left a comment

Otherwise LGTM!

Returns
-------
x_reconstructed array-like of shape (n_samples, n_features)

This comment has been minimized.

Copy link
@jnothman

jnothman Oct 27, 2019

Member
Suggested change
x_reconstructed array-like of shape (n_samples, n_features)
x_reconstructed : array-like of shape (n_samples, n_features)

This comment has been minimized.

Copy link
@jnothman

jnothman Oct 28, 2019

Member

You removed this colon...

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 28, 2019

Author Contributor

my bad, fixed.

@jnothman jnothman changed the title [MRG] Added inverse_Transform for pls base object and test [MRG] Added inverse_transform for pls base object and test Oct 27, 2019
jiwidi and others added 4 commits Oct 27, 2019
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
…n into pls_inverse-transform
Copy link
Member

jnothman left a comment

I don't think reverting the inclusion of the colon will fix CI... It seems something is wrong with installation at the moment, not your fault

@@ -0,0 +1,224 @@
"""
This is a backport of assertRaises() and assertRaisesRegex from Python 3.5.4

This comment has been minimized.

Copy link
@jnothman

jnothman Oct 28, 2019

Member

Why has this been added?

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 28, 2019

Author Contributor

This is my fault, I pulled from origin and messed things up... Tha's why I tried reverting to the commit before all of this happened

This comment has been minimized.

Copy link
@jiwidi

jiwidi Oct 28, 2019

Author Contributor

I removed the files now, sorry for the mess

Returns
-------
x_reconstructed array-like of shape (n_samples, n_features)

This comment has been minimized.

Copy link
@jnothman

jnothman Oct 28, 2019

Member

You removed this colon...

@jiwidi jiwidi requested a review from jnothman Oct 28, 2019
@jnothman jnothman merged commit 89b15e1 into scikit-learn:master Oct 28, 2019
17 of 19 checks passed
17 of 19 checks passed
scikit-learn.scikit-learn Build #20191028.25 failed
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas failed
Details
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 97.21%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +2.78% compared to bbc2369
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda_mkl) Linux pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.