PCA eigen-ized #1915
mazumdarparijat commented Mar 1, 2014
- Replaced lapack with Eigen3 code
- Added unit-test comparing Shogun results with that of Matlab implementation.
// loop varibles
int32_t i,j,k;
// loop variable
int32_t i;

ASSERT(features->get_feature_class()==C_DENSE)
Could you change those here to REQUIRE(condition, "PCA only works with dense features") and "PCA only works with real features" while you are touching things?
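To illustrate the difference the reviewer is asking for, here is a minimal sketch of a check that fails with a descriptive message instead of a bare assertion. The `require` function, the two enums, and `check_pca_input` are hypothetical stand-ins written for this example; they are not Shogun's `REQUIRE` macro or its feature types.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a REQUIRE-style check; not Shogun's macro.
inline void require(bool condition, const std::string& message)
{
    if (!condition)
        throw std::invalid_argument(message);
}

// Hypothetical feature descriptors, used only for illustration.
enum class FeatureClass { Dense, Sparse };
enum class FeatureType { Real, Integer };

// Each failed check reports what PCA expects, so the caller sees the
// reason rather than only the failed expression.
inline void check_pca_input(FeatureClass feature_class, FeatureType feature_type)
{
    require(feature_class == FeatureClass::Dense,
            "PCA only works with dense features");
    require(feature_type == FeatureType::Real,
            "PCA only works with real features");
}
```

The point of the message-carrying form is that a user who passes the wrong feature type learns what to change without reading the source.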
@mazumdarparijat Nice work once more! :) A few comments:
Again, nice work! Very useful.
@karlnapf thanks again! :)
Ah I just saw your comment on SVD, never mind all my output from before, but maybe it's useful! Nice!
@@ -34,6 +35,7 @@
m_mode = mode_;
thresh = thresh_;
mem_mode = mem_;
method = meth_;
Could you change things to be consistent with Shogun style here?
Member variables have an m_ prefix, and please avoid these ugly underscores in the method parameters:
m_method=method;
m_thresh=thresh; etc
BTW could you also update apply_to_feature_vector
Ok one last thing: Documentation. I feel bad for adding more and more to this, but it is just too useful :)
@karlnapf Please don't mind suggesting comments, no matter how many they are! I have added the documentation that you suggested. Please have a look at it. I have addressed your other comments as well. Since things have started shaping up, I have sent a PR to shogun-data. Please merge it once this is finalized.
m_whitening = do_whitening;
m_mode = mode;
m_thresh = thresh;
mem_mode = mem;
m_mem_mode=mem_mode would be best here :)
Okay, just a few minor glitches regarding style and doc.
* threshold.
/** @brief Preprocessor PCA performs principial component analysis on input
* feature vectors/matrices. When the init method in PCA is called with proper
* feature matrix X (with say N number of vectors and D feature dimension), a
Minor (for later, doesn't prevent merge): Could you write all math expressions (X, X', XX', UDV') in LaTeX?
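For reference, the four expressions named above could be typeset roughly as follows. This is only a sketch: it assumes X stores the N feature vectors as columns (so X is D by N), which the quoted doc comment does not state explicitly. In a Doxygen comment, each inline expression would be wrapped in `\f$ ... \f$`.

```latex
% Assumption: X holds the N feature vectors as columns, so X is D x N.
% Note that D in "U D V^T" is the diagonal factor of the decomposition,
% not the feature dimension; the doc comment reuses the letter.
X \in \mathbb{R}^{D \times N}, \qquad
X^{\top} \in \mathbb{R}^{N \times D}, \qquad
X X^{\top} \in \mathbb{R}^{D \times D}, \qquad
X = U D V^{\top}
```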
okay. Let me update this with the other minor things left.
ok waiting for travis now and then this is merged finally :)
oh and pls add all this to NEWS
whooooo! finally! :)
It has to be related to this
This Australian data is probably no longer available for some reason. I wonder, however, where it should be located: on the local machine, or downloaded from somewhere using curl (although I don't see the complete URL for the second case). @sonney2k, any idea? @mazumdarparijat, it might be that "it works" on your machine because you have neither hdf5 nor curl installed. See www.github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/library_mldatahdf5.cpp.
When mldata is in a flaky state, this example throws an exception. So the example should be fixed to issue just a warning when mldata doesn't work.
Understood, thanks!
@mazumdarparijat, do you want to apply that fix too? :-)
Issue open @ #1924.
oh! so now we know.
SG_INFO("Computing Eigenvalues ... ")
// eigen value computed
SelfAdjointEigenSolver<MatrixXd> eigenSolve =
I know that this has already been merged and I'm sorry that I'm only joining the discussion so late, but if possible it would be great to set the eigen solver as a parameter of the class.
This would, for example, enable us to set a GPU-based eigen solver for PCA-ing. See ViennaCL's Lanczos eigen solver:
http://viennacl.sourceforge.net/doc/lanczos_8hpp.html
@karlnapf @mazumdarparijat what do you think?
@vigsterkr It's not late actually. I am still touching up things in this. :)
I just now saw that there are already some files of LanczosEigenSolver and DirectEigenSolver in mathematics/linalg. But it seems that not all methods are implemented there. As of now, we can access only the max eigenvalue and min eigenvalue. We could work on adding all methods there and then use them here. But in PCA we would still like to store just the transformation matrix and eigenvalues vector as parameters.
But this is all gibberish from the mind of an inexperienced chap (i.e. me :))
Let's wait for input from @karlnapf and @iglesias on this.
@mazumdarparijat On the one hand, of course it makes sense to go with the eigen solvers that are already within shogun, i.e. mathematics/linalg/eigsolver. On the other hand, what I would really like to see here is support in shogun for GPU-based calculation (or rather, OpenCL-based), which those implementations currently do not provide. Maybe we should investigate how we could achieve that there, but the main idea here is that the eigen solver in PCA is actually a parameter, so that the user of the PCA class can choose which eigen solver is used for calculating PCA.
While I agree it would be great to have an abstract class for Eigensolvers (in fact we have a base class for that), Lanczos is not good here since it is iterative and more suitable for large sparse matrices (we don't have that here).
However, I would love to see
- an Eigensolver base class (general, the existing class in linalg would have to be modified slightly)
- a subclass for dense Eigen problems like here in PCA
- that is used for all of Shogun's Eigenproblems (like PCA)
- that can be changed easily to run problems on GPU with a global call (don't make this a parameter but a global flag for dense eigenproblems)
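The list above could be sketched along these lines: a base class, a dense subclass, and a global flag that a factory consults. All names here (`EigenSolverBase`, `DenseCpuSolver`, `DenseGpuSolver`, `dense_backend`, `make_dense_solver`) are hypothetical and do not correspond to the existing classes in Shogun's linalg module. The "GPU" class is a placeholder that reuses the CPU arithmetic, since the sketch only demonstrates the dispatch.

```cpp
#include <cmath>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class shared by all eigensolvers.
class EigenSolverBase
{
public:
    virtual ~EigenSolverBase() = default;
    // Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], ascending.
    virtual std::vector<double> eigenvalues(double a, double b, double c) const = 0;
    virtual std::string name() const = 0;
};

// Dense solver running on the CPU (closed form, 2x2 only in this sketch).
class DenseCpuSolver : public EigenSolverBase
{
public:
    std::vector<double> eigenvalues(double a, double b, double c) const override
    {
        const double mean = 0.5 * (a + c);
        const double radius = std::hypot(0.5 * (a - c), b);
        return {mean - radius, mean + radius};
    }
    std::string name() const override { return "dense-cpu"; }
};

// Placeholder for a GPU backend; it inherits the CPU arithmetic unchanged.
class DenseGpuSolver : public DenseCpuSolver
{
public:
    std::string name() const override { return "dense-gpu"; }
};

enum class DenseBackend { Cpu, Gpu };

// Global flag for dense eigenproblems, set once by the user.
inline DenseBackend& dense_backend()
{
    static DenseBackend backend = DenseBackend::Cpu;
    return backend;
}

// Every dense eigenproblem (PCA included) would obtain its solver here,
// so switching the flag changes all of them at once.
inline std::unique_ptr<EigenSolverBase> make_dense_solver()
{
    if (dense_backend() == DenseBackend::Gpu)
        return std::unique_ptr<EigenSolverBase>(new DenseGpuSolver());
    return std::unique_ptr<EigenSolverBase>(new DenseCpuSolver());
}
```

Compared with passing the solver to each class, the global flag keeps the PCA interface unchanged; the trade-off is that the choice is no longer visible at the call site.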
@vigsterkr will write an entrance task issue on that :)
Looking forward to seeing this! :)
BTW we need a setter for EPCAMethod
@karlnapf ok! let me do this.
data files changed corresponding to shogun-toolbox#1915
lmnn_modular data changed shogun-toolbox#1915