LDA transform gives wrong dimensions #1967

twiecki opened this Issue May 15, 2013 · 31 comments

twiecki commented May 15, 2013

The docstring of LDA.transform says:

Returns
-------
X_new : array, shape = [n_samples, n_components]

However:

import numpy as np
from sklearn.lda import LDA  # module location at the time of this issue

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2], [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])
lda = LDA(n_components=3).fit(X, y)
lda.transform(X).shape

produces
(6, 2), while I expect it to be (6, 3) since n_components == 3.

Looking at the code of transform:
return np.dot(X, self.coef_[:n_comp].T)

where self.coef_ is computed in .fit() as:
self.coef_ = np.dot(self.means_ - self.xbar_, self.scalings_)

which produces a 2x1 matrix (one row per class mean, however many there are).
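
Tracing the shapes with dummy arrays shows where the (6, 2) comes from. This is only a sketch with stand-in zero arrays, assuming scalings_ is 3x1 for this 2-class, 3-feature example (consistent with the 2x1 coef_), not the actual estimator code:

import numpy as np

n_samples, n_features, n_classes = 6, 3, 2

means = np.zeros((n_classes, n_features))  # stand-in for self.means_
xbar = np.zeros(n_features)                # stand-in for self.xbar_
scalings = np.zeros((n_features, 1))       # stand-in for self.scalings_, rank n_classes - 1 = 1

coef = np.dot(means - xbar, scalings)      # shape (2, 1): one row per class mean
X_scaled = np.zeros((n_samples, 1))        # np.dot(X - xbar, scalings) in transform

n_comp = 3                                 # the user-supplied n_components
out = np.dot(X_scaled, coef[:n_comp].T)    # coef[:3] of a (2, 1) array is still (2, 1)
print(out.shape)                           # (6, 2), not (6, 3)

(With two balanced classes the rows of coef_ are exact negatives of each other, since the class means mirror around xbar_, so the two output columns are mirror images.)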

Not sure what the correct thing would be, but it seems that LDA.coef_ should have length n_components, no?

Owner

amueller commented May 15, 2013

Yes, it should. I'm not happy with our LDA implementation; there was a reimplementation that we wanted to use, but somehow we never moved on it :-/

Owner

amueller commented May 15, 2013

Wait, the behavior is correct. It returns min(n_components, n_means) components, which is what it should do (I think).
I tried to fix the docstrings a while ago; it seems I overlooked this place.

Owner

jaquesgrobler commented May 15, 2013

@amueller sorry to be that guy, but would you mind explaining your last comment a bit more? My logic is still stuck on what @twiecki said up top, and I don't know the LDA code that well. :)

Owner

amueller commented May 15, 2013

LDA projects into the space spanned by the means of the classes (and possibly discards the least informative directions if n_components < n_classes). This is a linear method: the informative directions are the ones orthogonal to the separating planes, and along those planes everything basically looks the same.

twiecki commented May 15, 2013

My understanding of LDA (or Fisher LDA) was that it does not project just into the space spanned by the means but also takes the covariance of the classes into account (the figure at the bottom of https://www.projectrhea.org/rhea/index.php/Fisher_Linear_Discriminant_OldKiwi is pretty informative, I think), essentially finding the direction of maximal Mahalanobis distance.

Or is the LDA class supposed to be a different algorithm?

twiecki commented May 15, 2013

The projection in the 2-class case should then be onto a 1-D space though (the normal of the separating hyperplane), no? I guess I'm not sure what n_components is doing then to begin with.

Owner

amueller commented May 15, 2013

This is the right algorithm. And yes, my explanation left out the covariances for simplicity.
Basically you can imagine first computing the covariance that is shared by all classes and then transforming the space by the (square root of the) inverse (that's how I picture it). Distance in the new space is the Mahalanobis distance in the old space.
The figure you referred to shows exactly what is happening: you project onto the line connecting the centers (which is what I said).
So naturally you would project to n_classes dimensions.
The n_components then optionally throws away some, in order of how much information they yield on the classes.
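
For the two-class case, the textbook version of this fits in a few lines: whitening by the shared within-class covariance and projecting onto the line between the means is the same as projecting onto w = Sw^{-1} (mu2 - mu1). A minimal numpy sketch of that idea (not our implementation), using the example data from above:

import numpy as np

X = np.array([[-1., -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])

mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
Sw = np.cov(X[y == 1].T) + np.cov(X[y == 2].T)  # shared within-class covariance
w = np.linalg.solve(Sw, mu2 - mu1)              # Fisher direction Sw^{-1} (mu2 - mu1)

X_new = np.dot(X - X.mean(axis=0), w)           # one dimension: n_classes - 1 = 1
print(X_new.shape)                              # (6,)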

twiecki commented May 15, 2013

OK, it seems we're on the same page regarding the algorithm, and n_components also makes sense. My final question is why you wouldn't project onto n_classes - 1 dimensions, as the image suggests for the 2-class case.

Owner

amueller commented May 15, 2013

Sorry, n_classes - 1 is what I meant: the affine space spanned by the n_classes means has n_classes - 1 dimensions.

twiecki commented May 15, 2013

OK, but the 2-class example from above returns shape (6, 2), not shape (6, 1).

twiecki commented May 15, 2013

Taking a closer look, the returned values satisfy lda.transform(X)[:, 0] == -lda.transform(X)[:, 1].

Owner

amueller commented May 15, 2013

What? That doesn't make sense... It should always be n_classes - 1, so I don't see how it can be two here.
This is probably some error in the use of predict_proba or decision_function - sorry, I don't have time at the moment.

twiecki commented May 15, 2013

OK, at least we identified the problem. Thanks for helping with this!

Owner

mblondel commented May 15, 2013

I would suggest that we switch to a more textbook-like implementation. The one in scikit-learn is hard to follow. Personally, I don't understand it...

twiecki commented May 15, 2013

Apparently there was a reimplementation, but it didn't make it in. Maybe that one would be worth looking at? Anyone know where to find it?

Owner

amueller commented May 18, 2013

Pretty sure there is an issue, maybe even a PR somewhere? Gotta have a look.

twiecki commented May 18, 2013

FWIW, I did look through the PRs, though not too carefully, so I could have missed it.

Contributor

kanielc commented Jun 24, 2013

What would that other implementation need to make it in?

Unrelated, I've been looking through the issues for sklearn, and there seem to be quite a few in a semi-resolved state, but not identified as such.

Are the options for issues only open or closed?

Owner

agramfort commented Jun 25, 2013

What would that other implementation need to make it in?

A clean PR? A performance issue was raised, but I don't know how true that is.

Unrelated, I've been looking through the issues for sklearn, and there
seem to be quite a few in a semi-resolved state, but not identified as such.

Are the options for issues only open or closed?

I don't get it. If it's open, then it still needs to be fixed. But do say when you work on an issue, to avoid duplicated work.

Contributor

kanielc commented Jun 25, 2013

I mean cases where there's been a pull request for a while now, and no follow-up comment on the state of that request.

I take your point about checking first before attempting a fix though.

Owner

agramfort commented Jun 26, 2013

Can you remind me where the PR is? I only see issues.

Contributor

kanielc commented Jun 26, 2013

Not a pull request, but it should be closed (?): #339 (or, if not, would it never be closed, since it's a speed-up issue?)

Still open: #406
Still open: #1118

These are just a sample. It makes things confusing when getting started, because an issue that is older and still open suggests a good thing to work on. Unfortunately, you then see a "solution" there, so you don't know whether it's still worth tackling.


Owner

agramfort commented Jun 26, 2013

OK, but this is unrelated to LDA. You should comment on those PRs, and not here, to see whether the original author of the PR is still up for it and willing to accept some help.

Owner

mblondel commented Aug 14, 2013

If LDA ever gets rewritten (which I hope it will), I would take it as an opportunity to move it to the linear_model folder.

Owner

amueller commented Aug 15, 2013

There is a working implementation of LDA in a gist, linked from one of the LDA issues:
https://gist.github.com/ksemb/4c3c86c6b62e7a99989b

For moving it to linear_model: kind of a good idea. It is a linear model, but then QDA would be so lonely ;)
Our policy on the modules is also very... vague. Do you want to move LinearSVC into linear_model as well?

ksemb commented Sep 1, 2013

Hi, I think the code in the scikit-learn implementation is not correct. Since I wrote the implementation linked above so that it generates the same output as the original scikit-learn implementation (at least in the overdetermined case), mine should be equally incorrect.

I did not test the following code, but I think a correct implementation would be

return np.dot(X - self.xbar_, self.scalings_)[:, :self.n_components]

instead of the current implementation, which is

X = np.dot(X - self.xbar_, self.scalings_)
n_comp = X.shape[1] if self.n_components is None else self.n_components
return np.dot(X, self.coef_[:n_comp].T)
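
Written out as a standalone function with the attribute shapes spelled out (an untested sketch of the same idea, not a drop-in patch):

import numpy as np

def lda_transform(X, xbar, scalings, n_components=None):
    # xbar:     (n_features,)       overall mean, as computed in fit()
    # scalings: (n_features, rank)  discriminant axes, rank <= n_classes - 1
    X_new = np.dot(X - xbar, scalings)   # (n_samples, rank)
    if n_components is not None:
        X_new = X_new[:, :n_components]  # keeps at most rank columns
    return X_new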

Andreas Mueller already noticed in #1649 that the current implementation could just as well merge scalings_ and coef_, but the corrected transform function needs a separate scaling matrix.

@amueller Early in the conversation you say "The n_components then optionally throws away some, in order of how much information they yield on the classes."

Are you sure that this is how LDA.transform() behaves? The documentation does not specify. In basic tests I have done, with n_classes > 3 and n_components = 2, the integer values that I assign as classes to the data affect the output of LDA.transform(). It seems that the order of the input, rather than how much information is yielded on the classes, determines what n_components throws away.

Perhaps this is a separate issue, but the behavior you described above is the behavior I was hoping for when I began working with the sklearn LDA. It would be useful to have access to values that describe how much information is yielded by each transformed dimension (perhaps % variance). (Or is there already an easy way to calculate this from the information available?)
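
In the meantime, one rough proxy that can be computed from the transformed data itself is the fraction of total variance each discriminant axis carries. A small sketch (my own helper, not part of the LDA class):

import numpy as np

def component_variance_ratio(X_new):
    # Fraction of the total variance of the transformed data carried by
    # each discriminant axis: a rough "information per dimension" measure.
    var = X_new.var(axis=0)
    return var / var.sum()

# e.g.: component_variance_ratio(lda.transform(X))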

amueller added this to the 0.15.1 milestone Jul 18, 2014

Contributor

cbrnr commented Nov 28, 2014

I just checked the example at the top with the refactored code from #3523, and it gives the correct output, (6, 1).
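
For reference, the check I ran (assuming the refactored estimator keeps the same constructor and transform signature):

import numpy as np
from sklearn.lda import LDA  # module location at the time; assumed unchanged by #3523

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -2],
              [1, 1, 3], [2, 1, 4], [3, 2, 1]])
y = np.array([1, 1, 1, 2, 2, 2])

assert LDA(n_components=3).fit(X, y).transform(X).shape == (6, 1)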

Owner

agramfort commented Nov 28, 2014

good.

Owner

amueller commented Jan 9, 2015

I am closing this one now as the original issue is resolved.

amueller closed this Jan 9, 2015
