[MRG] Added inverse_transform for pls base object and test #15304
This PR is the same as #15289 but with squashed commits, since I failed with the docstring syntax.
What does this implement/fix? Explain your changes.
With this PR I'm adding an `inverse_transform` function to the `_PLS` base object.
The function transforms the data back to the original space by multiplying the transformed data with `x_loadings_` and then denormalizing the result.
This function allows you to transform data back to the original space (this transformation will only be exact if n_components=n_features). It is widely used to compute the Squared Prediction Error (SPE) of our _PLS model. This metric is well known for its use in industry scenarios where PLS acts as a statistical model to control processes in which time plays a big role (papers on this: 1, 2 )
Following the scikit-learn _PLS example, this is how the function should be used:
```python
from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=2)
pls2.fit(X, Y)
t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred)
# Value will be [0.02257852, 0.11391906, 0.87805722]
```
And an example to showcase the correctness of the function when n_components == n_features:
```python
from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=3)
pls2.fit(X, Y)
t = [0., 0., 1.]
Y_pred = pls2.transform([t])
X_reconstructed = pls2.inverse_transform(Y_pred)
# Value will be [0, 0, 1]
```
Any other comments?
I have been developing software for multivariate statistical process control for some time now, and the scikit-learn implementation of PLS is widely used in this field. I always thought _PLS was lacking this method while PCA had it, and decided to contribute it :)
@adrinjalali newbie question here. All the last commits have failed to run the tests, though I cannot replicate the same error locally (they pass here). Is there an easier way to debug this than committing and making the tests run every time? Thanks!
The failing tests are all on Python 3.5. You should create a similar environment and see why they fail; you can see the environment at the top of each failing test. For example:
@adrinjalali I have been thinking about the in-place support for this function. It behaves differently from `transform()`'s in-place mode, where if you select `copy=False` the only change reflected in `X` is the normalization:
```python
def transform(self, X, Y=None, copy=True):
    """Apply the dimension reduction learned on the train data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training vectors, where n_samples is the number of samples and
        n_features is the number of predictors.

    Y : array-like of shape (n_samples, n_targets)
        Target vectors, where n_samples is the number of samples and
        n_targets is the number of response variables.

    copy : boolean, default True
        Whether to copy X and Y, or perform in-place normalization.

    Returns
    -------
    x_scores if Y is not given, (x_scores, y_scores) otherwise.
    """
    check_is_fitted(self)
    X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
    # Normalize
    X -= self.x_mean_
    X /= self.x_std_
    # Apply rotation
    x_scores = np.dot(X, self.x_rotations_)
    if Y is not None:
        Y = check_array(Y, ensure_2d=False, copy=copy, dtype=FLOAT_DTYPES)
        if Y.ndim == 1:
            Y = Y.reshape(-1, 1)
        Y -= self.y_mean_
        Y /= self.y_std_
        y_scores = np.dot(Y, self.y_rotations_)
        return x_scores, y_scores
    return x_scores
```
In `inverse_transform`, the denormalization is applied to the result matrix, not to the `X` parameter.
```python
def inverse_transform(self, X, copy=True):
    """Transform data back to its original space.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_components)
        New data, where n_samples is the number of samples and
        n_components is the number of pls components.

    copy : bool, default=True
        Whether to copy X, or perform in-place normalization.

    Returns
    -------
    X_original : array-like of shape (n_samples, n_features)

    Notes
    -----
    This transformation will only be exact if n_components=n_features
    """
    check_is_fitted(self)
    X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
    # From pls space to original space
    np.matmul(X, self.x_loadings_.T, out=X)
    # Denormalize
    X *= self.x_std_
    X += self.x_mean_
    return X
```
And by contrast, PCA's `inverse_transform` does not seem to support in-place operation.
It doesn't feel natural to support in-place operation here if it isn't supported in other decomposition methods. What do you think? Or do you see value in this in-place feature? (I'd be happy to work on a PR to add it for PCA.)
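To illustrate the PCA comparison: `PCA.inverse_transform` takes no `copy` flag, so it always allocates a fresh reconstruction and leaves its input untouched (a small sketch, not part of the PR):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
pca = PCA(n_components=2).fit(X)
T = pca.transform(X)
T_before = T.copy()

# PCA.inverse_transform has no copy parameter; the scores array T
# passed in is never modified in place.
X_rec = pca.inverse_transform(T)
print(np.array_equal(T, T_before))  # True: T was not modified
```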
Coming back to the tests, I was able to reproduce the failing test. This is happening because
With numpy==1.17 the tests pass, as this bug was fixed. Is there a specific reason why numpy==1.11 is being used? Should I push a solution that works around the bug, or drop the in-place feature?
I believe I have found a solution to bypass this bug: simply make a hard copy of X.
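A minimal sketch of that workaround, with stand-in arrays for the fitted attributes (`loadings`, `x_std`, `x_mean` are illustrative here, not the real `_PLS` attributes): instead of writing into `X` with `out=` (whose shape `(n_samples, n_components)` cannot hold the `(n_samples, n_features)` product anyway), allocate a fresh result:

```python
import numpy as np

rng = np.random.RandomState(0)
X_scores = rng.randn(4, 2)   # (n_samples, n_components)
loadings = rng.randn(3, 2)   # (n_features, n_components)
x_std = np.ones(3)
x_mean = np.zeros(3)

# np.matmul(X_scores, loadings.T, out=X_scores) would be invalid here:
# the product has shape (4, 3) while X_scores is (4, 2). Allocating a
# new array sidesteps both the shape problem and the old numpy bug.
X_rec = np.matmul(X_scores, loadings.T)  # fresh (4, 3) array
X_rec *= x_std
X_rec += x_mean
```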
And thanks again for the feedback, I'm learning some new stuff with this PR :)
Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet.
As you suggest, I recommend you remove the
This will mean dropping inplace support as well. There will be no reason to keep the