LDA transform gives wrong dimensions #1967

Closed
twiecki opened this Issue · 31 comments

9 participants

@twiecki

Help doc of LDA.transform says:

Returns
-------
X_new : array, shape = [n_samples, n_components]

However:

import numpy as np
from sklearn.lda import LDA  # class path at the time of this issue

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2], [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])
lda = LDA(n_components=3).fit(X, y)
lda.transform(X).shape

produces
(6, 2) while I expect it to be (6, 3) as n_components == 3

Looking at the code of transform:
return np.dot(X, self.coef_[:n_comp].T)

Where self.coef_ is computed as in .fit():
self.coef_ = np.dot(self.means_ - self.xbar_, self.scalings_)

which produces a 2x1 matrix (or whatever the number of means is).

I'm not sure what the correct thing would be, but it seems that LDA.coef_ should have length n_components, no?

@amueller
Owner

Yes, it should. I'm not happy with our LDA implementation; there was a reimplementation that we wanted to use, but somehow we never moved on it :-/

@amueller
Owner

Wait, the behavior was correct. It returns min(n_components, n_means), which is what it should do (I think).
I tried to fix the doc-strings a while ago; it seems I overlooked this place.

@jaquesgrobler

@amueller sorry to be that guy, but do you mind explaining your last comment a bit more? My logic is still stuck on what @twiecki said up top, and I don't know the LDA code that well. :)

@amueller
Owner

LDA projects into the space spanned by the means of the classes (and possibly discards the least informative directions if n_components < n_classes). This is a linear method; it can only make statements in the directions orthogonal to the separating planes. Along those planes, everything basically looks the same.

@twiecki

My understanding of LDA (or Fisher LDA) was that it does not project just onto the space spanned by the means, but also takes the covariance of the classes into account (the figure at the bottom of https://www.projectrhea.org/rhea/index.php/Fisher_Linear_Discriminant_OldKiwi is pretty informative, I think), essentially maximizing the Mahalanobis distance.

Or is the LDA class supposed to be a different algorithm?

@twiecki

The projection in the 2-class case should then be onto a 1-D space though (the separating hyper-plane), no? I guess I'm not sure what n_components is doing then to begin with.

@amueller
Owner

This is the right algorithm. And yes, my explanation left out the covariances to simplify the explanation.
Basically, you can imagine first computing the covariance that is shared by all classes and then transforming the space by the (square root of the) inverse (that's how I imagine it). The distance in the new space is the Mahalanobis distance in the old space.
The figure you referred to shows exactly what is happening: you project onto the line connecting the centers (which is what I said).
So naturally you would project to n_classes dimensions.
n_components then optionally throws some of them away, in order of how much information they yield about the classes.
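
For readers following along, here is a rough NumPy sketch of the picture described above, using the toy X and y from the first post. This is not scikit-learn's code; the names (Sw, whiten, direction) are made up for the illustration.

import numpy as np

# Same toy X and y as in the report at the top of this issue.
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

classes = np.unique(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])
xbar = X.mean(axis=0)

# Pooled within-class covariance (simplified: equal class weights).
Sw = sum(np.cov(X[y == c], rowvar=False, bias=True) for c in classes) / len(classes)

# Whitening: multiply by the inverse square root of the shared covariance, so that
# Euclidean distance in the new space equals Mahalanobis distance in the old one.
eigvals, eigvecs = np.linalg.eigh(Sw)
whiten = np.dot(eigvecs / np.sqrt(eigvals), eigvecs.T)

# In the whitened space, LDA projects onto the directions spanned by the class
# means: at most n_classes - 1 of them, i.e. a single direction for two classes.
direction = np.dot(means[1] - means[0], whiten)
projection = np.dot(np.dot(X - xbar, whiten), direction)   # shape (6,), one component per sample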

@twiecki

OK, seems we're on the same page regarding the algorithm. The n_components also makes sense. My final question is why you wouldn't project onto n_classes - 1 dimensions as the image suggests for the 2-class case.

@amueller
Owner

Sorry, n_classes - 1 was what I meant: the space spanned by the n_classes class means.

@twiecki

OK, but the 2-class example from above returns shape (6, 2), not shape (6,1).

@twiecki

Taking a closer look, the returned values are: lda.transform(X)[:,0] == -lda.transform(X)[:,1].
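
A quick way to reproduce that observation (X and y as in the first post; sklearn.lda.LDA was the class path at the time of this issue):

import numpy as np
from sklearn.lda import LDA   # class path at the time of this issue

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])

Xt = LDA(n_components=3).fit(X, y).transform(X)
print(Xt.shape)                              # (6, 2) with the transform quoted above
print(np.allclose(Xt[:, 0], -Xt[:, 1]))      # checks the mirrored-columns observation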

@amueller
Owner

What, that doesn't make sense... It should always be n_classes - 1, so I don't see how it can be two here.
This is probably some error in how predict_proba or decision_function are used - sorry, I don't have time currently.

@twiecki

OK, at least we identified the problem. Thanks for helping with this!

@mblondel
Owner

I would suggest that we switch to a more textbook-like implementation. The one in scikit-learn is hard to follow. Personally, I don't understand it...

@twiecki

Apparently there was a reimplementation, but it didn't make it in. Maybe that one would be worth looking at? Anyone know where to find it?

@amueller
Owner

Pretty sure there is an issue, maybe even a PR somewhere? Gotta have a look.

@twiecki

FWIW I did look through the PRs, though not too carefully, so I could have missed it.

@kanielc

What would that other implementation need to make it in?

Unrelated, I've been looking through the issues for sklearn, and there seem to be quite a few in a semi-resolved state, but not identified as such.

Are the options for issues only open or closed?

@agramfort
Owner
@kanielc

I mean when there's been a pull request for a while now, and no follow up comment as to the state of that request.

I take your point about checking first before attempting a fix though.

@agramfort
Owner

Can you remind me where the PR is? I only see issues.

@kanielc
@agramfort
Owner
@mblondel
Owner

If LDA ever gets rewritten (which I hope it will be), I would take this as an opportunity to move it to the linear_model folder.

@amueller
Owner

There is a working implementation of LDA in a gist, linked to in one of the LDA issues:
https://gist.github.com/ksemb/4c3c86c6b62e7a99989b

For moving to linear_model: kind of a good idea. It is a linear model, but then QDA would be so lonely ;)
Our policy on the modules is also very... vague... do you want to move LinearSVC into linear_model as well?

@ksemb

Hi, I think the code in the scikit-learn implementation is not correct. Since I wrote the implementation linked in the post above so that it generates the same output as the original scikit-learn implementation (at least in the overdetermined case), it should also not be correct.

I did not test the following code, but I think a correct implementation would be

return np.dot(X - self.xbar_, self.scalings_)[:,:self.n_components]

instead of the current implementation, which is

X = np.dot(X - self.xbar_, self.scalings_)
n_comp = X.shape[1] if self.n_components is None else self.n_components
return np.dot(X, self.coef_[:n_comp].T)

Andreas Mueller already noticed in #1649 that the current implementation could just as well merge scalings_ and coef_, but the corrected transform function needs a separate scaling matrix.
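
One way to see the difference on the toy example from the first post is to apply the proposed projection by hand, using the fitted attributes (xbar_ and scalings_ are set in fit, as quoted earlier). A rough sketch, not tested against every release:

import numpy as np
from sklearn.lda import LDA   # class path at the time of this issue

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])

lda = LDA(n_components=3).fit(X, y)

# ksemb's proposed projection, applied by hand to the fitted attributes:
# scale the centered data and keep at most n_components columns.
X_proposed = np.dot(X - lda.xbar_, lda.scalings_)[:, :lda.n_components]

# Compare against the transform() implementation quoted above.
print(X_proposed.shape, lda.transform(X).shape)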

@blunghino

@amueller Early in the conversation you say "The n_components then optionally throws away some, in order of how much information they yield on the classes."

Are you sure that this is how LDA.transform() behaves? The documentation does not specify. In basic tests I have done, with n_classes > 3 and n_components = 2, the integer values that I assign as classes to the data affect the output of LDA.transform(). It seems that the order of the input, rather than how much information is yielded about the classes, determines what n_components throws away.

Perhaps this is a separate issue, but the behavior you described above is the behavior I was hoping for when I began working with the sklearn LDA. It would be useful to have access to values that describe how much information is yielded by each transformed dimension (perhaps % variance). (Or is there already an easy way to calculate this from the information available?)
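
There is no attribute for this in the version discussed here, but as a rough workaround one could normalize the spread of the training data along each transformed axis; a sketch, assuming column variance is an acceptable proxy for "information yielded":

import numpy as np

def explained_variance_ratio(X_lda):
    # X_lda: output of lda.transform(X) on the training data.
    var = X_lda.var(axis=0)     # spread along each discriminant axis
    return var / var.sum()      # normalized, a rough "% variance" per dimension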

@amueller amueller added this to the 0.15.1 milestone
@cle1109

I just checked the example at the top with the refactored code from #3523, and it gives the correct output, (6, 1).
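
For anyone reading this later: in subsequent releases the class also moved to sklearn.discriminant_analysis, and with the default n_components the example from the top gives the expected shape. A quick check, assuming a post-refactor release:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])

lda = LinearDiscriminantAnalysis().fit(X, y)   # n_components defaults to min(n_classes - 1, n_features)
print(lda.transform(X).shape)                  # (6, 1) for two classes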

@agramfort
Owner

good.

@amueller
Owner

I am closing this one now as the original issue is resolved.

@amueller amueller closed this