PLS reports "array must not contain nan" if a feature is constant #13609

Closed
jnothman opened this issue Apr 10, 2019 · 7 comments · Fixed by #14450

@jnothman jnothman commented Apr 10, 2019

Originally reported at #2089 (comment) by @Franck-Dernoncourt. Reproduce with:

import numpy as np
import sklearn.cross_decomposition

pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
yy = np.zeros((5,5))

yy[0,:] = [0,1,0,0,0]
yy[1,:] = [0,0,0,1,0]
yy[2,:] = [0,0,0,0,1]
#yy[3,:] = [1,0,0,0,0] # Uncommenting this line solves the issue

pls2.fit(xx, yy)

The obscure error message is caused by a column of yy that contains only zeros.

@MarcoGorelli MarcoGorelli commented Apr 10, 2019

What would you like to see instead? An assertion when the fit method is called that checks that no feature is constant, and returns a clear error if the assertion fails?
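
A minimal sketch of what such a check might look like. Both `find_constant_columns` and `check_no_constant_columns` are hypothetical helpers written for illustration; they are not existing scikit-learn functions, and the error wording is an assumption.

```python
import numpy as np


def find_constant_columns(arr):
    """Return the indices of columns whose values are all identical.

    Hypothetical helper; assumes a 2-D array with at least one row.
    """
    arr = np.asarray(arr)
    # A column is constant when every entry equals the entry in the first row.
    return np.flatnonzero((arr == arr[0]).all(axis=0))


def check_no_constant_columns(arr, name):
    """Raise a readable ValueError if any column of `arr` is constant."""
    constant = find_constant_columns(arr)
    if constant.size:
        raise ValueError(
            f"{name} has constant columns at indices {constant.tolist()}"
        )
```

Calling `check_no_constant_columns(yy, "Y")` at the start of `fit` would turn the nan message into one that names the offending columns.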

@jnothman jnothman commented Apr 10, 2019

@iodapro iodapro commented Apr 16, 2019

As far as I understand, we need to remove the warning message while keeping the correct answer (when the line yy[3,:] = [1,0,0,0,0] is uncommented).
Can I try to solve this issue if nobody minds?

@MarcoGorelli MarcoGorelli commented Apr 17, 2019

@jnothman jnothman commented Apr 24, 2019

As far as I understand, we need to remove the warning message while keeping the correct answer

I'm not an expert on PLS; I was relying on the comments historically related to this issue to describe it as a simple fix. But certainly the problem is constant features.

Go ahead and submit a pull request, @iodapro

@camilaagw camilaagw commented Jul 13, 2019

@jnothman there is something I can't understand about the example you give in the issue: even when we uncomment the line yy[3,:] = [1,0,0,0,0], the third column of yy is still constant, but in that case pls2.fit(xx, yy) works. Do we need two columns to be constant for the PLS to fail?
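
To make the observation concrete, here is a small NumPy-only sketch that lists the constant columns of yy in both variants of the original example. `build_yy` is a hypothetical helper introduced only for this comparison.

```python
import numpy as np


def build_yy(include_fourth_row):
    """Rebuild yy from the issue, optionally with the commented-out row."""
    yy = np.zeros((5, 5))
    yy[0, :] = [0, 1, 0, 0, 0]
    yy[1, :] = [0, 0, 0, 1, 0]
    yy[2, :] = [0, 0, 0, 0, 1]
    if include_fourth_row:
        yy[3, :] = [1, 0, 0, 0, 0]
    return yy


for include_fourth_row in (False, True):
    yy = build_yy(include_fourth_row)
    # A column is constant exactly when its standard deviation is zero.
    constant = np.flatnonzero(yy.std(axis=0) == 0).tolist()
    print(include_fourth_row, constant)
# False [0, 2]
# True [2]
```

The two variants differ only in whether the first column (index 0) is constant; the third column (index 2) is constant in both.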

@camilaagw camilaagw commented Jul 23, 2019

After taking a deeper look, I found that the problem is not constant features in general. The problem is that the first column of the target (yy) is constant. For instance, this case works (constant features, plus constant target columns other than the first):

import numpy as np
import sklearn.cross_decomposition

pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
xx[:,1] = 1
xx[:,2] = 0
yy = np.random.random((5,5))
yy[:,2] = 5
yy[:,4] = 1
pls2.fit(xx, yy)
pls2.predict(xx)

But this case won't (the first column in the target is a constant):

import numpy as np
import sklearn.cross_decomposition

pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
yy = np.random.random((5,5))
yy[:,0] = 4
pls2.fit(xx, yy)
pls2.predict(xx)

This is because the first step of _nipals_twoblocks_inner_loop is to set y_score = Y[:, [0]]. _center_scale_xy centers yy, which turns its constant first column into a column of zeros, so both np.dot(X.T, y_score) and np.dot(y_score.T, y_score) are zero. As a result, x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score) is 0/0 in every entry, that is, an array of nan.
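
The mechanism can be reproduced with plain NumPy, without calling scikit-learn. This sketch covers only the centering step and the first x_weights computation quoted above; the seeded random generator and the `*_centered` names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 5))
Y = rng.random((5, 5))
Y[:, 0] = 4  # constant first target column, as in the failing case

# Centering (one part of what _center_scale_xy does) subtracts the column
# means, so the constant first column of Y becomes exactly zero.
X_centered = X - X.mean(axis=0)
Y_centered = Y - Y.mean(axis=0)

# First step of the inner loop as quoted above: start from the first Y column.
y_score = Y_centered[:, [0]]

# Numerator and denominator are both zero, so the division yields nan.
with np.errstate(invalid="ignore", divide="ignore"):
    x_weights = np.dot(X_centered.T, y_score) / np.dot(y_score.T, y_score)

print(np.isnan(x_weights).all())  # True
```

Setting any other column of Y to a constant leaves `y_score` non-zero at this step, which matches the working example above.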
