
PCA bug partially solved #2071

Merged
merged 2 commits into shogun-toolbox:develop on Mar 27, 2014

Conversation
Conversation

mazumdarparijat
Contributor

  • PCA whitening bug addressed
  • bug in matrix transformation in-place addressed
  • unit-tests added
  • PCA(True) doesn't work yet

@mazumdarparijat
Contributor Author

@iglesias @karlnapf Please have a look. This rectifies the PCA whitening bug, but PCA(True) doesn't work yet; whitening can be accessed using PCA(AUTO, True). This also fixes a small bug that went unnoticed in the MEM_IN_PLACE case. I have hence added additional unit tests and removed redundant/useless ones.

Travis fails because of the knn data change (which will be corrected once the knn PR is in). I have tested locally; it's all good.

@mazumdarparijat
Contributor Author

#1876

@@ -172,7 +172,7 @@ bool CPCA::init(CFeatures* features)
 {
 	for (int32_t i=0; i<num_dim; i++)
 		transformMatrix.col(i) /=
-			sqrt(eigenValues[i+max_dim_allowed-num_dim]);
+			sqrt(eigenValues[i+max_dim_allowed-num_dim]*(num_vectors-1)+1e-15);
Member

Why the small number here? I guess it is to prevent negative numbers; that's ok in principle but a bit ad hoc.
What's the justification? Why can eigenvalues even be negative? (That should not happen, and if it does, it should be prevented so that we don't need this here.)

Contributor Author

@karlnapf No, it's not for negatives. It's to stop division by zero when some eigenvalues are zero (or to stop the final value from blowing up when some eigenvalue is very, very small, i.e. theoretically 0). Adding this epsilon prevents these 0/0 kinds of situations during whitening. :)

Member

Okay, I see. Sorry, I did not think about this properly before.
OK then, do this, but make the ridge you add a class variable that one can set (not through the constructor, though; give it a default value). Then check whether some eigenvalues are too small, and print a warning in which you suggest to drop that dimension via PCA.
"Covariance matrix has zero Eigenvalues at dimension %d. Consider reducing its dimension. Adding %f to avoid numerical problems"

@karlnapf
Member

Ok, nice!
Once my comments are resolved, this is ready to merge. Then we can do the whitening preprocessor based on this class, which is nice since it avoids duplicate code.

@karlnapf karlnapf closed this Mar 23, 2014
@mazumdarparijat
Contributor Author

That's a good idea. Let me make the suggested changes by tomorrow.
Btw, did you close this PR by mistake, or do you want me to send a fresh PR?

@karlnapf
Member

Sorry, that was a mistake.

@karlnapf karlnapf reopened this Mar 23, 2014
@iglesias
Collaborator

Nice one! Travis failed, but it is not due to the changes here. I think it is actually my mistake: I merged a couple of commits to shogun-data for the nanoflann PR (there are some changes in the KNN class) and that PR has not been merged yet.

@iglesias
Collaborator

All right, Travis is back to green now!

@mazumdarparijat, once you address the first comment by Heiko (which should be completely straightforward for you, btw), we are ready to merge :-)

@mazumdarparijat
Contributor Author

@iglesias @karlnapf Sorry for stalling this. I had exams going on and couldn't find a moment to work on it.
I have tackled the zero-eigenvalues issue in a slightly different way than I tried earlier. This new way is more accurate, I think. Let me know if it is ok.

@iglesias
Collaborator

LGTM! If @karlnapf agrees as well, it should be ready to merge.

@iglesias
Collaborator

All right, I'll merge now and take the responsibility ;)

What's next @mazumdarparijat? :)

iglesias added a commit that referenced this pull request Mar 27, 2014
@iglesias iglesias merged commit b266e01 into shogun-toolbox:develop Mar 27, 2014
@mazumdarparijat
Contributor Author

@iglesias Thanks!! :)
Next in queue is the decision trees notebook (ID3 algorithm). :-)

@karlnapf
Member

Totally agree with everything :)
Sorry for the delay; I had some presentations and deadlines over the last few days.

@mazumdarparijat
Contributor Author

No worries Heiko! :)

3 participants