Fix PLS coef_ bug #7819

Merged
merged 5 commits into scikit-learn:master on Jun 9, 2017

Conversation

7 participants
@jayzed82
Contributor

jayzed82 commented Nov 3, 2016

Reference Issue

What does this implement/fix? Explain your changes.

In the PLS class, the predicted Y was effectively being divided twice by x_std_: X is standardized (using the statistics computed in fit) before coef_ is applied, and the coef_ matrix was also divided by x_std_.

I have removed the x_std_ factor from self.coef_.

Any other comments?

Remove x_std divisor in coef_ because X is normalized in fit function
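For illustration, here is a minimal, self-contained sketch of the double division (hypothetical variable names; this is just the algebra behind the bug, not the library code):

import numpy as np

rng = np.random.RandomState(0)
x_std_ = rng.uniform(1, 10, size=3)      # per-feature scale learned during fit
C = rng.randn(3, 2)                      # stands in for x_rotations_ @ y_loadings_.T
y_std_ = rng.uniform(1, 2, size=2)

coef_old = (1. / x_std_[:, None]) * C * y_std_   # old coef_: carries a 1/x_std_ factor
coef_new = C * y_std_                            # proposed coef_: no 1/x_std_ factor

Xs = rng.randn(5, 3)   # plays the role of (X - x_mean_) / x_std_ computed in predict
# With the old coef_, the prediction is effectively divided by x_std_ a second time:
print(np.allclose(Xs.dot(coef_old), (Xs / x_std_).dot(coef_new)))  # True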

@jnothman


Member

jnothman commented Nov 4, 2016

Yes, it looks like this is a bug. Though, we could choose to report the coefficients with respect to the unscaled data, and then not do the scaling by x_std_ in transform, am I right?

@jnothman jnothman added the Bug label Nov 4, 2016

@jayzed82


Contributor

jayzed82 commented Nov 4, 2016

Yes, we could also keep the normalization by (1/x_std_) in coef_ and remove it from predict. However, we still need to remove the mean of X in the predict method, so it seems more consistent to leave the full standardization of X (X -> (X - x_mean_)/x_std_) in the predict method.

Actually, to be fully consistent we should remove the y_std_ factor from coef_ as well and move it to the predict method.
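For context, a rough sketch of what predict does with scale=True (simplified and from memory, using the attribute names from this thread; not the actual source):

import numpy as np

def predict_sketch(pls, X):
    # the full standardization of X happens here, inside predict ...
    Xs = (X - pls.x_mean_) / pls.x_std_
    # ... so coef_ should not carry another 1 / x_std_ factor on top of it
    return np.dot(Xs, pls.coef_) + pls.y_mean_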

@jnothman


Member

jnothman commented Nov 7, 2016

Ping @arthurmensch for an opinion...?

@agramfort


Member

agramfort commented Nov 8, 2016

is it possible to have a test or improve existing ones? maybe compare with R solution?

@jayzed82

Contributor

jayzed82 commented Nov 8, 2016

I'm not an R user, but it's easy to show with a simple example that the proposed fix produces estimates with a significantly better r2:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

### GENERATE DATA
T = 20000
N = 5
M = 10

Q = np.random.randn(N, M)
Y = np.random.randn(T, N)
X = np.dot(Y, Q) + 2 * np.random.randn(T, M)

Xt = X[:T // 2]  # train data
Yt = Y[:T // 2]

Xv = X[T // 2:]  # validation data
Yv = Y[T // 2:]

### CURRENT IMPLEMENTATION
pls = PLSRegression(n_components=5)
pls.fit(X=Xt, Y=Yt)

Yp = pls.predict(Xv)
r = 1 - (Yv - Yp).std(axis=0) / Yv.std(axis=0)
print('%.5f  %.5f  %.5f  %.5f' % (min(r), max(r), np.mean(r), np.median(r)))
> 0.10685  0.25652  0.16498  0.15017

### FIXING THE COEF
pls2 = PLSRegression(n_components=5)
pls2.fit(X=Xt, Y=Yt)

pls2.coef_ = np.dot(pls2.x_rotations_, pls2.y_loadings_.T) * pls2.y_std_  # change coef_

Yp2 = pls2.predict(Xv)
r2 = 1 - (Yv - Yp2).std(axis=0) / Yv.std(axis=0)
print('%.5f  %.5f  %.5f  %.5f' % (min(r2), max(r2), np.mean(r2), np.median(r2)))
> 0.18684  0.54416  0.31753  0.26289
@jnothman


Member

jnothman commented Nov 8, 2016

Yes, something like that might be an appropriate test, but not with a separate validation set: can we just test that the model is able to accurately fit and predict a training set?

@jnothman


Member

jnothman commented Nov 8, 2016

Either that or build a test around a toy case or something written up in a paper if possible.

@jayzed82


Contributor

jayzed82 commented Nov 9, 2016

You will get the same results if you look at r2 on the training set. I think the change to coef_ is too trivial to need a paper written about it.

@jayzed82

Contributor

jayzed82 commented Nov 9, 2016

Maybe this is a clearer example of the bug. Just by multiplying X by a factor of 1000, you see a big difference in r2 between the current implementation and the new one:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

# GENERATE DATA
T = 100000
N = 5
M = 10

Q = np.random.randn(N, M)

Y = np.random.randn(T, N)
X = 1000 * (np.dot(Y, Q) + 2 * np.random.randn(T, M))

pls = PLSRegression(n_components=5)
pls.fit(X=X, Y=Y)

pls2 = pls

# CURRENT IMPLEMENTATION
Yp = pls.predict(X)
r = 1 - (Y - Yp).std(axis=0) / Y.std(axis=0)
'%.5f  %.5f  %.5f  %.5f' % (min(r), max(r), np.mean(r), np.median(r))
> '0.00012  0.00024  0.00020  0.00022'

# FIXING THE COEF
pls2.coef_ = np.dot(pls.x_rotations_, pls.y_loadings_.T) * pls.y_std_
Yp2 = pls2.predict(X)  # equivalent to np.dot((X - pls.x_mean_) / pls.x_std_, pls2.coef_) + pls.y_mean_
r2 = 1 - (Y - Yp2).std(axis=0) / Y.std(axis=0)
'%.5f  %.5f  %.5f  %.5f' % (min(r2), max(r2), np.mean(r2), np.median(r2))
> '0.20299  0.48110  0.38954  0.44634'
@jnothman


Member

jnothman commented Nov 9, 2016

I'm not saying you need to write a paper about it. I'm saying that a good way to test our implementation would be to check it matches some published (e.g. textbook) examples.


@agramfort


Member

agramfort commented Nov 18, 2016

or maybe you can provide the solution from an R or Matlab package showing the numbers match?

your code is certainly correct, but these tests are the guarantee that we don't break it later...

thx

@lesteve


Member

lesteve commented Nov 21, 2016

Not an expert at all, but can we not compare the score on non-scaled and scaled data and check that they are comparable, i.e. something like this:

import numpy as np
from numpy.testing import assert_approx_equal

from sklearn.cross_decomposition import PLSRegression

T = 1000
N = 5
M = 10

Q = np.random.randn(N, M)

Y = np.random.randn(T, N)
X = np.dot(Y, Q) + 2 * np.random.randn(T, M)
X_scaled = 1000 * X

pls = PLSRegression(n_components=5)
pls.fit(X, Y)
score = pls.score(X, Y)

pls.fit(X_scaled, Y)
score_scaled = pls.score(X_scaled, Y)

assert_approx_equal(score, score_scaled)

This snippet fails on master and passes on this PR.
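A sketch of how this check could be written as a regression test (the test name and the fixed seed are suggestions, not necessarily what ends up in the repository):

import numpy as np
from numpy.testing import assert_approx_equal

from sklearn.cross_decomposition import PLSRegression


def test_scale_invariance_of_score():
    # scaling X by a constant factor should not change the training score
    rng = np.random.RandomState(0)
    T, N, M = 1000, 5, 10
    Q = rng.randn(N, M)
    Y = rng.randn(T, N)
    X = np.dot(Y, Q) + 2 * rng.randn(T, M)

    pls = PLSRegression(n_components=5)
    score = pls.fit(X, Y).score(X, Y)
    score_scaled = pls.fit(1000 * X, Y).score(1000 * X, Y)
    assert_approx_equal(score, score_scaled)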

@lesteve

Member

lesteve commented Nov 21, 2016

For completeness the scores are quite different on master:

Items are not equal to 7 significant digits:
 ACTUAL: 0.33059861837396415
 DESIRED: 0.0004005089894247131
@Gscorreia89


Gscorreia89 commented Mar 21, 2017

What is the current status on this issue? I use these methods extensively, so I can help verifying.

A quick note though - lesteve's test will not fail if the PLSRegression object is called with scale=False.
pls = PLSRegression(n_components=5, scale=False)
From a method "correctness" point of view, the patch fixes the bug when scale=True, which is actually the default... But comparing with other scikit-learn classifiers, why should the scaling require handling at the core of the classifier/regressor here? Why not leave that to the user - using the StandardScaler/RobustScaler and other nice scikit-learn objects? This is actually the way I have been using this code, always with scale=False.

Also, there is no strict necessity for the algorithm to be run with standardized data; in fact, in my field (Metabolomics/Chemometrics), we use it with a varying range of scaling factors, including mean centring only.

Why not just remove the scaling from the object - maybe even make mean centering optional (so RobustScalers can be used before...) and keep the behaviour in a way that can be nicely used in an sklearn Pipeline, as with other sklearn objects?

Btw, I have compared this algorithm recently with other software (commercial and open source), and so far everything worked flawlessly ;) - but I always run scale=False and handle the scaling explicitly myself...

Let me know what is the status, maybe there is something I can help with.


@lesteve


Member

lesteve commented Mar 21, 2017

What is the current status on this issue? I use these methods extensively, so I can help verifying.

Help would be great; I don't think anyone on the development team is an expert in PLS or even uses it very regularly.

@Gscorreia89


Gscorreia89 commented Mar 21, 2017

As it stands, I am of the opinion that this patch is correct, but from a more "strategic/big picture" point of view, I would rather unequivocally split/remove scaling considerations from the algorithm/PLS object. I'm basing this on my experience with this code: it works fine, as long as I tell it to leave these things to me!

Also, is this not related to pull request #6077?
Shouldn't this be merged, to avoid duplication of effort and confusion about the direction to go?
I am still wrapping my head around everything going on with pull request #6077, and what and where in the code things are actually being modified.

@lesteve


Member

lesteve commented Mar 21, 2017

As it stands, I am of the opinion that this patch is correct

Good to know. What do you think about the test in #7819 (comment)? Could you suggest a better one? Maybe something that we can check against a textbook or another implementation.

Also, is this not related to pull request #6077 ?

Thanks for digging this up; I knew there was a similar PR but I was not able to find it in a reasonable amount of time. There may be related issues lying around as well. The problem with #6077 was that there seemed to be a lot of unrelated changes, which is probably the main reason why it was never merged.

@lesteve


Member

lesteve commented Mar 21, 2017

pinging @arthurmensch just in case he has some memories of #6077.

@Gscorreia89


Gscorreia89 commented Mar 22, 2017

@lesteve regarding the test, I think there will have to be some major decisions in how to handle the scaling - which probably means that we won't need that type of test. But let's see.

Let's wait for an update on status and a clarification on #6077


@lesteve


Member

lesteve commented Mar 22, 2017

@lesteve regarding the test, I think there will have to be some major decisions in how to handle the scaling - which probably means that we won't need that type of test. But let's see.

We should go for the small fix first so that scale=True does what it is supposed to do. The fix seems to be one line, and adding a test should not be too much of an effort, especially with your domain expertise.

To be completely honest, I don't think it is very likely that we are going to drop the scale parameter from the PLS objects. Even if that happens, it is not going to happen overnight. For example, the parameter needs to be deprecated for two releases before being dropped completely. Look at our deprecation policy for more details.

@bthirion


Member

bthirion commented Jun 7, 2017

I think that the fix is important, and that most users probably want to have the scaling done right by default, without handling it explicitly. So I'm in favor of merging that one, but with a test, e.g. based on the above example.

jayzed82 and others added some commits Nov 3, 2016

Fix PLS coef_ bug
Remove x_std divisor in coef_ because X is normalized in fit function
@Gscorreia89


Gscorreia89 commented Jun 7, 2017

Ah, it's been a few weeks since I looked into this, so I had forgotten completely, but I have some extensions to @lesteve's previous test to suggest... They are just some extra checks, to see if the behaviour of scale is still coherent when using the StandardScaler.

Basically, additionally checking that fitting non-scaled data with scale=True is the same as fitting pre-scaled data with scale=False, and that pre-scaled data gives the same result with either scale=True or scale=False.

What do you think?

Btw, running this through the current version, all of these fail.

import numpy as np
from numpy.testing import assert_approx_equal

from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler


T = 1000
N = 5
M = 10

Q = np.random.randn(N, M)

Y = np.random.randn(T, N)
X = np.dot(Y, Q) + 2 * np.random.randn(T, M)
X = 1000 * X

# One instance of the PLS regression object with scale=False...
pls_nscale = PLSRegression(n_components=5, scale=False)
# And another for scale=True
pls = PLSRegression(n_components=5, scale=True)

# Fit both to the "original" data
pls.fit(X, Y)
pls_nscale.fit(X, Y)

# Obtain both scores - they should be different here
score_xoriginal_scale_off = pls_nscale.score(X, Y)
score_xoriginal_scale_on = pls.score(X, Y)

# Scale the data
X_scaled = StandardScaler().fit_transform(X)

# Refit...
pls.fit(X_scaled, Y)
pls_nscale.fit(X_scaled, Y)

# Re-score; now they should be the same, as the data was pre-scaled
score_xscaled_scale_on = pls.score(X_scaled, Y)
score_xscaled_scale_off = pls_nscale.score(X_scaled, Y)


# Scaling inside the PLS object should give the same behaviour as scaling the data independently
assert_approx_equal(score_xscaled_scale_off, score_xoriginal_scale_on)
assert_approx_equal(score_xscaled_scale_on, score_xoriginal_scale_on)

# If the data has been scaled beforehand, there should be no difference between the two anyway
assert_approx_equal(score_xscaled_scale_on, score_xscaled_scale_off)
@lesteve


Member

lesteve commented Jun 7, 2017

Nice, thanks @Gscorreia89. I think it makes sense to add these tests as well.

@lesteve


Member

lesteve commented Jun 7, 2017

@Gscorreia89 you can use py after the three backquotes for syntax highlighting of Python snippets, and pytb for tracebacks. I have edited your last post.

@lesteve

Member

lesteve commented Jun 7, 2017

@Gscorreia89 one of your additional tests fails, and I'm not sure why. If you have any insights, let me know.

/tmp/test_pls.py in <module>()
     45 
     46 # Scaling inside the PLS object should be give same behaviour as scaling the object independently
---> 47 assert_approx_equal(score_xscaled_scale_off, score_xoriginal_scale_on)
     48 assert_approx_equal(score_xscaled_scale_on, score_xoriginal_scale_on)
     49 

/home/lesteve/miniconda3/lib/python3.5/site-packages/numpy/testing/utils.py in assert_approx_equal(actual, desired, significant, err_msg, verbose)
    685         pass
    686     if np.abs(sc_desired - sc_actual) >= np.power(10., -(significant-1)):
--> 687         raise AssertionError(msg)
    688 
    689 

AssertionError: 
Items are not equal to 7 significant digits:
 ACTUAL: 0.6425429908367526
 DESIRED: 0.6425467856056288
@lesteve


Member

lesteve commented Jun 7, 2017

I guess the scores are not that different so we could relax the equality check.
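For instance, assert_approx_equal takes a significant argument, so the check could be relaxed to something like this (using the two scores from the failure above):

from numpy.testing import assert_approx_equal

# compare to ~4 significant digits instead of the default 7
assert_approx_equal(0.6425429908367526, 0.6425467856056288, significant=4)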

@Gscorreia89


Gscorreia89 commented Jun 7, 2017

Hum... True.
That one test is basically saying: scaling beforehand and running with scale=False is the same as not scaling and running with scale=True. In theory they should be the same, and they do look very similar, but scaling inside the object might involve more scaling steps and might not be as neatly coded as the StandardScaler, hence the loss of precision.
Maybe OK, but still not ideal - can you please run it a couple more times with different seeds and with different values that X is multiplied by?

PS: thanks for the ```py tip ;)
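One concrete difference between the two code paths, for what it's worth (an observation, not a confirmed diagnosis): StandardScaler divides by the biased standard deviation (ddof=0), whereas a scaling that uses ddof=1 differs from it by a constant factor of sqrt(n / (n - 1)). A uniform factor like that should cancel out exactly, so any remaining mismatch is presumably just floating-point round-off along the two paths, which is consistent with the loss-of-precision guess above:

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n = 1000
X = rng.randn(n, 3) * 1000

X_scaler = StandardScaler().fit_transform(X)              # standardized with ddof=0
X_ddof1 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized with ddof=1

# every entry differs by the same constant factor sqrt(n / (n - 1))
print(np.allclose(X_scaler / X_ddof1, np.sqrt(n / (n - 1.0))))  # True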

@bthirion


Member

bthirion commented Jun 7, 2017

Let me add a related point: for consistency with linear_model, we should probably replace scale with normalize when creating the PLS* objects.

@lesteve


Member

lesteve commented Jun 7, 2017

Let me add a related point: for consistency with linear_model, we should probably replace scale with normalize when creating the PLS* objects.

I think I saw in another issue that X was always demeaned (even if scale=False), so maybe scale makes sense.

@bthirion


Member

bthirion commented Jun 7, 2017

But we should not demean X if the user does not want us to.

@Gscorreia89


Gscorreia89 commented Jun 7, 2017

@bthirion - agreed, although in this case it's quite standard to do so (same behaviour as in PCA). I already mentioned before that I personally would prefer to have the scaling completely off and tell the user to apply the StandardScaler or equivalent anyway.

But I think there are more pressing issues here: fix the scaling behaviour, then solve a couple of other PLS-related issues (including confirming the overlap of this with #6077, which was never fully applied?)

@bthirion


Member

bthirion commented Jun 7, 2017

Regarding #6077: clearly both PRs are tackling the same issue. I have the impression that your fix is more appropriate, but this has to be checked.

@lesteve


Member

lesteve commented Jun 8, 2017

I am in favour of merging this first and tackling deeper issues in another PR.

@Gscorreia89


Gscorreia89 commented Jun 8, 2017

A couple of things before merging:

The problem with #6077 is that there is a lot going on there, especially with point 3.
Point 3 is something else altogether and needs to be checked separately - let's leave it for now.

Point 2 brings up another scaling issue - what is happening to the scaling of Y.
PLS can be used with a single or multiple Y. From my practical point of view, it is perfectly legitimate in the Y matrix case to want to run one type of scaling for X and another for Y (for example, scale X but not Y). Again, scale=False and doing it manually would do it, so not a big problem here, and we would then be assuming that scale=True scales X and Y at the same time.
It seems to me that regarding point 2 of #6077, transform is working as expected, but predict isn't, and adding a rescaling of Y is necessary when scale=True. Everything is fine with scale=False.
Btw, this will be silent in the tests as they are, because Y is never being re-scaled whatever the choice for X or Y!! Maybe adding something to my test suggestions, like a couple more combinations with yscaled = StandardScaler().fit_transform(Y), is a good idea!

Now a bit of my opinion and arguments regarding the issue with the coefficients:
I think the scaling should definitely be removed from their calculation, and the coefficients displayed should be those where Ypred = X*coef_. This X can be the non-scaled or scaled data depending on the scale option. predict (and all other functions) should handle the scaling of the data, never the coefficients.

Scaling changes what the model actually "sees" in the data, as expected, so I am of the opinion that the model coefficients should always reflect how the model actually works, not a back-projection to the original data - because the R2, score, projections etc. we are obtaining are only valid for the transformed data, not for the original. These are not simply standardized regression coefficients. The parameters of the PLS components that maximize covariance between X and Y in the scaled data will not be the same, up to a scaling, as those that would have been found if the user hadn't scaled the data, so back-projecting the coefficients will not give "the direction of maximum covariance in the raw data space", just a non-representative and possibly confusing view.

I am of the opinion that if the user scaled the data, it's on them to remember what the interpretation of the coefficients should be.
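To make the two conventions under discussion concrete, here is a hedged sketch (attribute names as used elsewhere in this thread; which of the two quantities coef_ should hold is exactly the question above):

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 4) * [1.0, 10.0, 100.0, 1000.0]
Y = rng.randn(100, 2)
pls = PLSRegression(n_components=2).fit(X, Y)

# Convention argued for above (and in this PR): coef_ applies to the standardized X.
coef_scaled = np.dot(pls.x_rotations_, pls.y_loadings_.T) * pls.y_std_
pred_scaled = np.dot((X - pls.x_mean_) / pls.x_std_, coef_scaled) + pls.y_mean_

# Alternative convention: fold 1/x_std_ into coef_ so it applies to the centred raw X.
coef_raw = coef_scaled / pls.x_std_[:, None]
pred_raw = np.dot(X - pls.x_mean_, coef_raw) + pls.y_mean_

# The two conventions give identical predictions; they differ only in interpretation.
print(np.allclose(pred_scaled, pred_raw))  # True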

@Gscorreia89


Gscorreia89 commented Jun 8, 2017

A clarification of something I wrote in the previous post that was not absolutely true: "because the R2, score, projections etc. we are obtaining are only valid for the transformed data, not for the original."

Not quite "true", as the R2 is being calculated on the back-scaled data correctly (or will be once predict is fixed) - but the model "maximized R2" by fitting on the scaled data, so by not scaling the data it might be possible to get a better R2 for the non-scaled Y.

Some of these things might be quite subtle, or show up more obviously in multi-Y cases. Most PLS software and open-source implementations follow the convention of providing R2 values obtained on the scaled data, and on the original data if no scaling is selected.


Simplify example
Cosmetic improvement of whats_new
@lesteve


Member

lesteve commented Jun 9, 2017

I am in favour of merging this as is, i.e. a simple fix + a simple test. @Gscorreia89, feel free to open PRs to add more tests.

@lesteve


Member

lesteve commented Jun 9, 2017

I'll wait for the CIs and merge this.

@Gscorreia89


Gscorreia89 commented Jun 9, 2017

I think predict, as mentioned in #6077, should be patched next. Then it's more a matter of documentation.


@lesteve


Member

lesteve commented Jun 9, 2017

Merging this one, thanks a lot for your input @Gscorreia89 and @bthirion !

@lesteve lesteve merged commit b15818e into scikit-learn:master Jun 9, 2017

2 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@amueller


Member

amueller commented Jun 10, 2017

@Gscorreia89 do you have a good reference for using PLS with 1d y?

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

@Gscorreia89


Gscorreia89 commented Jul 13, 2017

@amueller Off the top of my head, here is a varied list with an emphasis on PLS regression for the 1d y case. These range from basic tutorials to some deeper (and very interesting) statistical works that try to put PLS in the context of other regression techniques and figure out what exactly is going on there... I also added SIMPLS and KernelPLS - although they are "alternative" algorithms (and there are MANY for PLS, each with its own particular difference, and not always obvious...), I think there is important information there (especially in the SIMPLS article) to help understand the method. Let me know if you are interested in other cases (multi-Y, PLS-DA models, orthogonal filters and the O-PLS family, etc.); I can share some for these cases too.

The references:
Partial least-squares regression: a tutorial
http://www.sciencedirect.com/science/article/pii/0003267086800289

PLS-regression: a basic tool of chemometrics
http://www.sciencedirect.com/science/article/pii/S0169743901001551

PLS regression methods
http://onlinelibrary.wiley.com/doi/10.1002/cem.1180020306/abstract

A Statistical View of Some Chemometrics Regression Tools
https://www.jstor.org/stable/1269656

A Survey of Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case (2000)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.16

Overview and recent advances in partial least squares (2006)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.7735

SIMPLS: An alternative approach to partial least squares regression
http://www.sciencedirect.com/science/article/pii/016974399385002X

Improved PLS algorithms
http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM435%3E3.0.CO;2-%23/full


dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

@marta-sd marta-sd referenced this pull request in oddt/oddt Dec 1, 2017

Merged

Tests and cleanup for scoring #49

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
