Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCA eigen-ized #1915

Merged
merged 9 commits into from
Mar 3, 2014
Merged

PCA eigen-ized #1915

merged 9 commits into from
Mar 3, 2014

Conversation

mazumdarparijat
Copy link
Contributor

  • Replaced lapack with Eigen3 code
  • Added unit-test comparing Shogun results with that of Matlab implementation.

@mazumdarparijat
Copy link
Contributor Author

@sonney2k @karlnapf @iglesias I have eigen-ized PCA code as well as added rigorous unit tests for the same. Please have a look at it. PCA Integration test fails but will be ok once I update data. But before that, lets make this ready to be merged.
This partly addresses #1876

// loop varibles
int32_t i,j,k;
// loop variable
int32_t i;

ASSERT(features->get_feature_class()==C_DENSE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change those here to REQUIRE(condition, "PCA only works with dense features") and "PCA only works with real features" while you are touching things?

@karlnapf
Copy link
Member

karlnapf commented Mar 1, 2014

@mazumdarparijat Nice work once more! :)

A few comments:

  • You should add you name in the headers of CPCA.h CPCA.cpp
  • See my comment of the SVD vs Eigendecomposition. In my opinion, this should be done now that you are touching things. There was a guy in IRC yesterday that had a problem where N=10 and D=20000, for which Shogun explodes - although its an easy problem. Let me know if you have questions on this. The implementation in the David Barber toolbox should help. (You should also compare against it).
  • Travis fails due to integration tests, but yeah those can be done after all other comments have been addressed

Again, nice work! Very useful.

@mazumdarparijat
Copy link
Contributor Author

@karlnapf thanks again! :)
I will address your comments. The SVD thing shouldn't be difficult given that we are using Eigen3. Actually I needed to ensure that I am headed the right way, thats why I sent the PR early. I will add commit with SVD implementation to this PR very soon.

@karlnapf
Copy link
Member

karlnapf commented Mar 1, 2014

Ah I just saw your comment on SVD, nevermind all my output from before, but maybe its useful! Nice!

@mazumdarparijat
Copy link
Contributor Author

@iglesias @karlnapf I have updated the code.Please have a look. For now I have kept SVD as default and removed AUTO mode owing to the fact that EIGEN can be less accurate in many cases. But in case you feel we should add such an AUTO mode, its just 2-3 lines code. Let me know!

@@ -34,6 +35,7 @@
m_mode = mode_;
thresh = thresh_;
mem_mode = mem_;
method = meth_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change things to be consistent with Shogun style here?

memer variables have a m_ prefix and avoid these ugly underscores in the parameters of the methods.
m_method=method;
m_thresh=thresh; etc

@karlnapf
Copy link
Member

karlnapf commented Mar 2, 2014

BTW could you also update apply_to_feature_vector
and unit test this one?

@karlnapf
Copy link
Member

karlnapf commented Mar 2, 2014

Ok one last thing: Documentation.
Could you write a brief description of the basic math involved in PCA in the "@brief" class doxygen comment (have a look at the class list for how to write math in doxygen). In particular, explain all those different modes and what it corresponds to mathematically and what the costs are and what users should choose.
This is the most important part since it will make sure people will use and appreciate all your changes. Without it, people just get confused :)
I really like the PCA implementation now: Complete, fast, flexible. Good work!

I feel bad for adding more and more to this, but it is just too useful :)

@mazumdarparijat
Copy link
Contributor Author

@karlnapf Please don't mind suggesting comments, no matter how many they are! I have added the documentation that you suggested. Please have a look at it. I have addressed your other comments as well. Since things have started shaping up, I have sent a PR to shogun-data. Please merge it once this is finalized

m_whitening = do_whitening;
m_mode = mode;
m_thresh = thresh;
mem_mode = mem;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_mem_mode=mem_mode would be best here :)

@karlnapf
Copy link
Member

karlnapf commented Mar 2, 2014

okay, just a few minor glitches regarding style and doc.
Could you also commit your new data version? The other PR is already merged, so that will not break travis anymore

* threshold.
/** @brief Preprocessor PCA performs principial component analysis on input
* feature vectors/matrices. When the init method in PCA is called with proper
* feature matrix X (with say N number of vectors and D feature dimension), a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (for later, doesnt prevent merge): Could you write all math expressions (X, X', XX', UDV') in latex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay. Let me update this with the other minor things left.

@karlnapf
Copy link
Member

karlnapf commented Mar 3, 2014

ok waiting for travis now and then this is merged finally :)
Could you however still change the two minor things? Thanks!

@karlnapf
Copy link
Member

karlnapf commented Mar 3, 2014

oh and pls add all this to NEWS
-PCA now depends on Eigen3 instead of LAPACK [yourname]
-New modes for PCA matrix factorizations: SVD & EVD, in-place or reallocating [yourname]

@karlnapf
Copy link
Member

karlnapf commented Mar 3, 2014

whooooo! finally! :)

karlnapf added a commit that referenced this pull request Mar 3, 2014
@karlnapf karlnapf merged commit df06a0e into shogun-toolbox:develop Mar 3, 2014
This was referenced Mar 3, 2014
@mazumdarparijat
Copy link
Contributor Author

Can't stop smiling! FINALLY!! Thanks @karlnapf @iglesias . Let me address the remaining comments and get back.

@mazumdarparijat
Copy link
Contributor Author

@karlnapf @iglesias Could you please check 'libshogun-library_mldatahdf5' test? Even buildbot cries that it couldn't open data repository 'australian' !! This error got fixed in Travis on its own.

@iglesias
Copy link
Collaborator

iglesias commented Mar 4, 2014

It has to be related to this

CMLDataHDF5File* hdf = new CMLDataHDF5File((char *)"australian", "/data/data");

This Australian data is probably not available any longer for some reason. I wonder however where should it be located, if in the local machine or it is downloaded from somewhere using curl (although I don't see the complete url for the second case). @sonney2k, any idea?

@mazumdarparijat, it might be that "it works" in your machine because you don't have installed either hdf5 or curl. See the www.github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/library_mldatahdf5.cpp.

@sonney2k
Copy link
Member

sonney2k commented Mar 4, 2014

When mldata is in a flaky state this example throws an exception. So the example should be fixed to issue just a warning when mldata doesn't work.

@iglesias
Copy link
Collaborator

iglesias commented Mar 4, 2014

Understood, thanks!

@iglesias
Copy link
Collaborator

iglesias commented Mar 4, 2014

@mazumdarparijat, do you want to apply that fix too? :-)

@iglesias
Copy link
Collaborator

iglesias commented Mar 4, 2014

Issue open @ #1924.

@mazumdarparijat
Copy link
Contributor Author

oh! so now we know.
@iglesias I will be extremely happy to do it. Let me push this later today along with other very small changes in PCA documentation.


SG_INFO("Computing Eigenvalues ... ")
// eigen value computed
SelfAdjointEigenSolver<MatrixXd> eigenSolve =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know that this has been already merged and i'm sorry that i'm only joining so late the discussion, but if possible it would be great to set the eigen solver as a parameter of the class.
this would for example enable us to set a GPU based eigen solver for PCA-ing. See viennaCL's lanczos eigen solver:
http://viennacl.sourceforge.net/doc/lanczos_8hpp.html

@karlnapf @mazumdarparijat what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigsterkr Its not late actually. I am still touching up things in this. :)
I just now saw that there are already some files of LanczosEigenSolver and DirectEigenSolver in mathematics/linalg . But it seems that not all methods are implemented there. As of now, we can access only max eigenvalue and min eigenvalue. We could work on adding all methods there and then use them here. But in PCA we would still like to store just the transformation matrix and eigenvalues vector as parameters.

But this is all gibberish from the mind of an inexperienced chap (ie me :))
Lets wait for input from @karlnapf and @iglesias on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mazumdarparijat on one hand of course it makes sense to go with the eigen solvers that are already within shogun, i.e. mathematics/linalg/eigsolver. But on the other hand what I really would like to see here is to finally have support in shogun for GPU based calculation (or rather say OpenCL based ones), and that is currently not supported by those implementations. Maybe we should investigate of course how we could achieve that there, but the main idea here would be that the eigen solver in PCA is actually a parameter, so that the user of the PCA class can choose what eigen solver is being used for calculating PCA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree it would be great to have an abstract class for Eigensolvers (in fact we have a base class for that), Lanzcos is not good here since it is iterative and more suitable for large sparse matrices (we dont have that here)

However, I would love to see

  • an Eigensolver base class (general, the existing class in linalg would have to be modified slightly)
  • A subclass for Dense Eigen problems like here in PCA
  • that is used for all of Shoguns Eigenproblems (like PCA)
  • that can be changed easily to run problems on GPU with a global call (dont make this a parameter but a global flag for dense eigenproblems)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigsterkr will write an entrance task issue on that :)
Looking forward to see this! :)

@karlnapf
Copy link
Member

karlnapf commented Mar 6, 2014

BTW we need a setter for EPCAMethod

@mazumdarparijat
Copy link
Contributor Author

@karlnapf ok! let me do this.

spothound pushed a commit to spothound/shogun that referenced this pull request Feb 1, 2018
spothound pushed a commit to spothound/shogun that referenced this pull request Feb 1, 2018
spothound pushed a commit to spothound/shogun that referenced this pull request Feb 1, 2018
spothound pushed a commit to spothound/shogun that referenced this pull request Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants