Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Added inverse_transform for pls base object and test #15304

Merged
merged 19 commits into from Oct 28, 2019

Conversation

jiwidi
Copy link
Contributor

@jiwidi jiwidi commented Oct 20, 2019

Reference Issues/PRs

This PR is the same as #15289 but with squeezed commits since I failed with docstring syntax.

What does this implement/fix? Explain your changes.

With this PR I'm adding a function inverse_transform to the base _PLS object and the consecuent test. This function is already implemented in other decomposition methods like PCA here.

The function transforms back to the original space by multiplying the transformed data with the x_loadings as X_original = np.matmul(X, self.x_loadings_.T) and denormalizes the result with the model std and mean.

This function allows you to transform back data to the original space (this transformation will only be exact if n_components=n_features). This transformation is widely used to compute the Squared Prediction Error of our _PLS model. This metric is famous for its use in industry scenarios where PLS acts as a statistical model to control processes where time takes a big act (papers on this: 1, 2 )

Following Sklearn _PLS example this is how the function should be used:

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=2)
pls2.fit(X, Y)


t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred) # Value will be [0.02257852, 0.11391906, 0.87805722]

And a example to showcase the correctness of the function with n_components==n_features

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=3)
pls2.fit(X, Y)


t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred) # Value will be [0, 0, 1]

Any other comments?

I have been developing software for multivariate statistical process control for some time now and Sklearn implementation of PLS has been widely used in this field. I always thought the _PLS was lacking from this method while PCA had it and decided to make a contribution for it :)

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 20, 2019

ping @NicolasHug @adrinjalali Since I have seen you guys both touch _pls files recently.

@jiwidi jiwidi changed the title Added inverse_Transform for pls base object and test [WIP] Added inverse_Transform for pls base object and test Oct 20, 2019
Copy link
Member

@adrinjalali adrinjalali left a comment

Some nitpicks, otherwise looks good to me, thanks @jiwidi

sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Show resolved Hide resolved
sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
Notes
This transformation will only be exact if n_components=n_features
-----
Copy link
Member

@adrinjalali adrinjalali Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        Notes
        -----
        This transformation will only be exact if n_components=n_features

check_is_fitted(self)
X = check_array(X, copy=False, dtype=FLOAT_DTYPES)
# From pls space to original space
X_original = np.matmul(X, self.x_loadings_.T)
Copy link
Member

@adrinjalali adrinjalali Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
X_original = np.matmul(X, self.x_loadings_.T)
np.matmul(X, self.x_loadings_.T, out=X)

this way you can do it in place.

You should also add a test for this case.

Copy link
Contributor Author

@jiwidi jiwidi Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I came up with this:

def inverse_transform(self, X, copy=True):
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)

        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

And the extra test:

    # Check inplace usage
    plsca.inverse_transform(Xr,copy=False)
    assert_array_almost_equal(Xr, X,
                              err_msg="inverse_transform failed")

I tested it and it works with both inplace. Just checking this is what you meant before commiting it :)

Copy link
Member

@adrinjalali adrinjalali Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this looks good to me, I think.

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 23, 2019

@adrinjalali newbie question here. All the last commits have failed to run the test tho I can not replicate the same error on local (they pass here). Is there any way to debug this in an easier way rather than just committing and make the test run every time? Thanks!

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Oct 23, 2019

The failing tests are all on python 3.5. You should create a similar environment and see why they fail. You can see the environment on top of the failing tests. For example:

##[section]Starting: Test Library
==============================================================================
Task         : Command line
Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version      : 2.151.2
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
Generating script.
Script contents:
build_tools/azure/test_script.sh
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/d1d9cf55-3eb0-4aa7-af95-8ce64d06a52e.sh
Python 3.5.2
numpy 1.11.0
scipy 0.17.0
pandas not installed
2 CPUs
Package             Version                Location           
------------------- ---------------------- -------------------
atomicwrites        1.3.0                  
attrs               19.3.0                 
blinker             1.3                    
chardet             2.3.0                  
cloud-init          19.2                   
command-not-found   0.3                    
configobj           5.0.6                  
coverage            4.5.4                  
cryptography        1.2.3                  
cycler              0.9.0                  
Cython              0.29.13                
decorator           4.0.6                  
idna                2.0                    
importlib-metadata  0.23                   
Jinja2              2.8                    
joblib              0.11                   
jsonpatch           1.10                   
jsonpointer         1.9                    
language-selector   0.1                    
MarkupSafe          0.23                   
matplotlib          1.5.1                  
more-itertools      7.2.0                  
numpy               1.11.0                 
oauthlib            1.0.3                  
pathlib2            2.3.5                  
Pillow              3.1.2                  
pip                 19.3.1                 
pkg-resources       0.0.0                  
pluggy              0.13.0                 
prettytable         0.7.2                  
py                  1.8.0                  
pyasn1              0.1.9                  
pycurl              7.43.0                 
pygobject           3.20.0                 
PyJWT               1.3.0                  
pyparsing           2.0.3                  
pyserial            3.0.1                  
pytest              3.8.1                  
pytest-cov          2.8.1                  
python-apt          1.1.0b1+ubuntu0.16.4.5 
python-dateutil     2.4.2                  
python-debian       0.1.27                 
python-systemd      231                    
pytz                2014.10                
PyYAML              3.11                   
requests            2.9.1                  
scikit-learn        0.22.dev0              /home/vsts/work/1/s
scipy               0.17.0                 
setuptools          41.4.0                 
six                 1.10.0                 
ssh-import-id       5.5                    
ufw                 0.35                   
unattended-upgrades 0.1                    
urllib3             1.13.1                 
virtualenv          15.0.1                 
WALinuxAgent        2.2.40                 
wheel               0.33.6                 
zipp                0.6.0             

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 24, 2019

@adrinjalali I have been thinking about the inplace support for this function. It's a different behavior from the transform() inplace, where if you select copy=False the only change reflected to X would be the normalization:

def transform(self, X, Y=None, copy=True):
        """Apply the dimension reduction learned on the train data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vectors, where n_samples is the number of samples and
            n_features is the number of predictors.

        Y : array-like of shape (n_samples, n_targets)
            Target vectors, where n_samples is the number of samples and
            n_targets is the number of response variables.

        copy : boolean, default True
            Whether to copy X and Y, or perform in-place normalization.

        Returns
        -------
        x_scores if Y is not given, (x_scores, y_scores) otherwise.
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # Normalize
        X -= self.x_mean_
        X /= self.x_std_
        # Apply rotation
        x_scores = np.dot(X, self.x_rotations_)
        if Y is not None:
            Y = check_array(Y, ensure_2d=False, copy=copy, dtype=FLOAT_DTYPES)
            if Y.ndim == 1:
                Y = Y.reshape(-1, 1)
            Y -= self.y_mean_
            Y /= self.y_std_
            y_scores = np.dot(Y, self.y_rotations_)
            return x_scores, y_scores

        return x_scores

In inverse_transform the denormalization is computed on the matrix result and not in the X parameter.

    def inverse_transform(self, X, copy=True):
        """Transform data back to its original space.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_components)
            New data, where n_samples is the number of samples
            and n_components is the number of pls components.
        copy : bool, default=True
            Whether to copy X, or perform in-place normalization.

        Returns
        -------
        X_original array-like of shape (n_samples, n_features)

        Notes
        -----
        This transformation will only be exact if n_components=n_features
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)

        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

And by contrast in PCA inverse_transform they seem to not support the inplace functionality.

It doesn't feel natural to support inplace functionality here if it isn't supported in other decomposition methods. What do you think? Or you see value to this inplace feature? (I'll be happy to work in a PR to add it for PCA).


Coming back to the tests, I was able to reproduce the failure test. This is happening because np.matmul(X, self.x_loadings_.T, out=X) is following the known error with numpy==1.11.0 where if the out variable is the same as input it will return a zero matrix (tested by myself).

With numpy==1.17 the test pass as this bug was fixed. Is there a specific reason why numpy==1.11 is being used? Should I push for a solution that solves that bug or ignore the inplace feature?

I believe to have found a solution to bypass this bug and it would be by just hard copying X.
Rewrite the matmul: to np.matmul(X.copy(), self.x_loadings_.T, out=X)

And thanks again for the feedback, I'm learning some new stuff with this PR :)

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Oct 24, 2019

Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet.

As you suggest, I recommend you remove the copy parameter, and always create a new array then. It would also avoid the issue, since you won't be using the out parameter anyway.

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 24, 2019

Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet.

As you suggest, I recommend you remove the copy parameter, and always create a new array then. It would also avoid the issue, since you won't be using the out parameter anyway.

This will mean dropping inplace support as well. There will be no reason to keep the copy parameter. You okay with this?

@@ -440,14 +440,14 @@ def inverse_transform(self, X, copy=True):
This transformation will only be exact if n_components=n_features
"""
check_is_fitted(self)
X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
X = check_array(X, copy=True, dtype=FLOAT_DTYPES)
Copy link
Member

@adrinjalali adrinjalali Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to copy here. Just leave the copy as default.

Copy link
Contributor Author

@jiwidi jiwidi Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to accept the copy parameter even if it will not affect the parameter X? I was thinking on removing copy from the function arg list and only accept X (and making it copy=True x default)

Wether copy is true or false will not affect the execution or trace it leaves on the parameters.

Copy link
Member

@adrinjalali adrinjalali Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's what I meant.

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 25, 2019

@adrinjalali Last commit should be ready to merge version I believe

Copy link
Member

@adrinjalali adrinjalali left a comment

Otherwise LGTM. You also need a whats_new entry for this one.

sklearn/cross_decomposition/_pls_.py Outdated Show resolved Hide resolved
@adrinjalali adrinjalali changed the title [WIP] Added inverse_Transform for pls base object and test Added inverse_Transform for pls base object and test Oct 25, 2019
Co-Authored-By: Adrin Jalali <adrin.jalali@gmail.com>
@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 25, 2019

Otherwise LGTM. You also need a whats_new entry for this one.

You mean a whats_new entry on the PR text? Doesn't What does this implement/fix? Explain your changes. Refer to that?

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Oct 25, 2019

Sorry for not being clear in my previous comment. I meant an entry in the v0.22.rst file like the other ones.

@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 25, 2019

Sorry for not being clear in my previous comment. I meant an entry in the v0.22.rst file like the other ones.

no problem. It should ready to merge now :)

- |Feature| :class:`cross_decomposition._PLS` Has a new function :func:`cross_decomposition._PLS.inverse_transform` to transform
data to the original space`. :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.

Copy link
Member

@adrinjalali adrinjalali Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the line length <80.

Also, we should point to the public API here, i.e. list the classes which inherit from _PLS instead.

Copy link
Contributor Author

@jiwidi jiwidi Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like this maybe?

- |Feature| :class:`cross_decomposition.PLSCanonical` and 
  :class:`cross_decomposition.PLSRegression` 
  have a new function `inverse_transform` 
  :func:`cross_decomposition.PLSCanonical.inverse_transform`
  :func:`cross_decomposition.PLSRegression.inverse_transform` 
  to transform data to the original space`.
  :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.

Copy link
Member

@adrinjalali adrinjalali Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd do something like:

- |Feature| :class:`cross_decomposition.PLSCanonical` and
  :class:`cross_decomposition.PLSRegression` have a new function
  ``inverse_transform`` to transform data to the original space`.
  :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.

Copy link
Member

@adrinjalali adrinjalali left a comment

Thanks @jiwidi for your prompt responses. Now we need to wait for another maintainer to do a second review, and hopefully it'll get in soon :)

@adrinjalali adrinjalali changed the title Added inverse_Transform for pls base object and test [MRG] Added inverse_Transform for pls base object and test Oct 25, 2019
@jiwidi
Copy link
Contributor Author

@jiwidi jiwidi commented Oct 25, 2019

Thanks @jiwidi for your prompt responses. Now we need to wait for another maintainer to do a second review, and hopefully it'll get in soon :)

Thank you for all the feedback! :)

Copy link
Member

@jnothman jnothman left a comment

Otherwise LGTM!

Returns
-------
x_reconstructed array-like of shape (n_samples, n_features)
Copy link
Member

@jnothman jnothman Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
x_reconstructed array-like of shape (n_samples, n_features)
x_reconstructed : array-like of shape (n_samples, n_features)

Copy link
Member

@jnothman jnothman Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed this colon...

Copy link
Contributor Author

@jiwidi jiwidi Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, fixed.

@jnothman jnothman changed the title [MRG] Added inverse_Transform for pls base object and test [MRG] Added inverse_transform for pls base object and test Oct 27, 2019
Copy link
Member

@jnothman jnothman left a comment

I don't think reverting the inclusion of the colon will fix CI... It seems something is wrong with installation at the moment, not your fault

@@ -0,0 +1,224 @@
"""
This is a backport of assertRaises() and assertRaisesRegex from Python 3.5.4
Copy link
Member

@jnothman jnothman Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this been added?

Copy link
Contributor Author

@jiwidi jiwidi Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my fault, I pulled from origin and messed things up... Tha's why I tried reverting to the commit before all of this happened

Copy link
Contributor Author

@jiwidi jiwidi Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the files now, sorry for the mess

Returns
-------
x_reconstructed array-like of shape (n_samples, n_features)
Copy link
Member

@jnothman jnothman Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed this colon...

@jiwidi jiwidi requested a review from jnothman Oct 28, 2019
@jnothman jnothman merged commit 89b15e1 into scikit-learn:master Oct 28, 2019
17 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants