IPCA and CCIPCA #1885
…e, also added documentation, tests and an example
Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA)
------------------------------------------------------------------------

CCIPCA like PCA is used to decompose a multivariate dataset in a set of successive
Small thing - would you mind switching up the wording in this paragraph a bit, just so it's not so similar to IPCA's opening paragraph? Since this will go in the documentation, it may look a bit copy/pasty to users. Thanks :)
Okay will do, I figured there would be more opinions on the documentation so I didn't make them too different for the initial commit.
Went through it very quickly. Very nice work as far as I could tell so far.
Hi @pickle27, thanks a lot for your pull request. I am wondering, IPCA and CCIPCA are […] The reason that I am asking this is that we still have some fairly core […]
The main advantage of the incremental methods is being able to update the learned model when more data is acquired. I haven't tested this implementation on large data, but because it processes one sample at a time it should be able to handle larger data sets. The incremental methods are also useful when developing an online system that is constantly receiving data, for example video in a computer vision problem.
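To make the "one sample at a time" point concrete, here is a minimal sketch of a CCIPCA-style update for the leading component only. It is not the code from this PR: the function name `ccipca_leading_component`, the synthetic data stream, and the choice to omit the amnesic parameter are all assumptions made for illustration, and the input is assumed to be zero-mean. It uses plain Python to stay dependency-free; a real implementation would use NumPy.

```python
import math
import random


def ccipca_leading_component(samples):
    """Estimate the leading principal direction one sample at a time.

    Hypothetical helper for illustration. Assumes `samples` yields
    zero-mean vectors (lists of floats). Returns (unit_direction, scale),
    where `scale` is the running eigenvalue estimate.
    """
    v = None  # running estimate: direction scaled by the eigenvalue
    n = 0
    for u in samples:
        n += 1
        if v is None:
            v = list(u)  # initialise with the first sample
            continue
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            v = list(u)  # degenerate estimate, restart from this sample
            continue
        # Projection of the new sample onto the current unit direction.
        proj = sum(a * b for a, b in zip(u, v)) / norm
        # Running average of u * (u . v_hat); no covariance matrix is formed.
        v = [((n - 1) / n) * vc + (1.0 / n) * proj * uc
             for vc, uc in zip(v, u)]
    if v is None:
        raise ValueError("no samples given")
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("estimate collapsed to the zero vector")
    return [c / norm for c in v], norm


# Synthetic zero-mean stream whose variance lies mostly along (0.6, 0.8).
random.seed(0)
direction = (0.6, 0.8)


def stream(count):
    for _ in range(count):
        t = random.gauss(0.0, 3.0)
        yield [t * d + random.gauss(0.0, 0.1) for d in direction]


estimate, eigenvalue = ccipca_leading_component(stream(2000))
alignment = abs(sum(e * d for e, d in zip(estimate, direction)))
print(alignment)  # close to 1 when the estimate matches the true direction
```

Only the current estimate and a sample counter are kept in memory, which is why this family of methods can consume a stream such as video frames. Further components would be obtained by deflating each sample against the components already estimated, which is not shown here.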
Great! Online PCA is certainly a very useful thing, and I for one would […] In terms of fast online algorithms for PCA, there is a nice table in […] Another interesting resource is […] I am sorry if I appear picky when asking all these questions. I am very […] A few minor remarks on the code (I just had a quick look): the method […] Thanks a lot for your PR: this is an incredibly useful feature! [Rehurek2010] http://arxiv.org/pdf/1102.5597.pdf
Made the small changes you suggested. I noticed the duplication of the transform method etc. as well. What do you propose for the base class structure? I came across IPCA and CCIPCA during my thesis research on subspace learning applied to computer vision. Most of the vision papers were doing something very similar to the IPCA implementation. There are probably a few other references that would be suitable to add to IPCA, as that technique is quite common. For example, I took a quick read through Manjunath1995 from http://www.math.fsu.edu/~cbaker/IncPACK/ and it looks to be quite similar. Perhaps IPCA is just lacking a clear reference choice. I read through Rehurek2010 and it looks interesting. I'm happy to keep working on this and find which algorithm we want for scikit-learn, and it will also help my thesis :) I am also going to be submitting a proposal for GSoC. I had originally planned to apply to work on an online NMF algorithm (as per the ideas page), but perhaps I could propose to continue working on this if that sounds like a good idea.
Definitely put this on hold: I've found a bug in plain incremental PCA, and it doesn't scale at all. CCIPCA is fine though.
The bug is fixed, but IPCA is still very slow on higher-dimensional data (images). Let's talk about which incremental method we want!
I have another variant of IPCA that does scale. I won't bother formatting it for scikit-learn until we finalize which algorithm we want.
Hi Kevin! I am very interested in this topic, as I am going to need such a method soon. I will form my own opinion from all the references here, but I would like to know whether you have made any progress on this question. Please let me know.
Would you mind giving the reference in this pull request? I am very […]
Hey, I am actually formatting the reference and the code right now (not with the intention of merging, but just to help the discussion).
That's really helpful. @AlexandreAbraham is a PhD student working with me G
I added my implementation of the incremental technique described in: Skocaj, Danijel, and Ales Leonardis. "Weighted and robust incremental method for subspace learning." Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on. IEEE, 2003. It should produce the same subspace as Hall but is more efficient with higher-dimensional data like images, because it uses the projections to update the basis vectors rather than the original data. Google Scholar says the paper has been cited 135 times, and it is very practical for my research, but as per our discussion before, I don't really know if this is the canonical incremental method that we should focus on for scikit-learn. Anyway, here it is; keep me in the loop about what direction you're heading and I'll be happy to help out! IPCA is a big part of my thesis, so I am definitely invested in exploring alternatives.
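As a rough illustration of what "uses the projections rather than the original data" means, the sketch below shows only the projection step such a method relies on: a new sample is reduced to a short coefficient vector plus a residual against the current basis. The function name `project_and_residual` and the toy numbers are hypothetical, the basis is assumed to be orthonormal, and the step that re-estimates the basis from the stored coefficients (described in the paper) is not reproduced here.

```python
import math


def project_and_residual(x, mean, basis):
    """Project a sample onto an orthonormal basis.

    Hypothetical helper for illustration. `basis` is a list of orthonormal
    vectors (lists of floats). Returns (coeffs, residual, residual_norm).
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Low-dimensional coefficients: one number per basis vector.
    coeffs = [sum(b * c for b, c in zip(vec, centered)) for vec in basis]
    # Reconstruction of the sample from those coefficients.
    recon = [sum(a * vec[i] for a, vec in zip(coeffs, basis))
             for i in range(len(centered))]
    # Whatever the current basis cannot explain.
    residual = [c - r for c, r in zip(centered, recon)]
    residual_norm = math.sqrt(sum(r * r for r in residual))
    return coeffs, residual, residual_norm


# Toy example: a 3-dimensional sample and a 1-dimensional basis.
mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0]]
coeffs, residual, residual_norm = project_and_residual([2.0, 3.0, 0.0], mean, basis)
print(coeffs, residual, residual_norm)
```

For image data the sample has one entry per pixel, while the coefficient vector has one entry per retained component, so any later computation done on coefficients works with far smaller arrays than one done on the raw samples.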
I am also quite interested in this topic, and would note that one of the papers that I frequently use cites: […]
for incremental learning. This paper has 544 citations and is known to be robust.
Cool, thanks for sharing. I will give that paper a read when I get a chance.
It looks interesting, and the authors did post MATLAB source code. They compared their method to batch PCA and Hall IPCA. It should be quite easy to port their work to Python if that's what we want. I am short on time right now, so I don't want to do this until I get some confirmation that this is the technique that scikit-learn wants.
@pickle27 Since this PR has stalled, I think it would be a good idea to extract the source into a GitHub repo or gist and add it to https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets
@mblondel Sure, I like that idea. I'll work on doing this in the next couple of days.
As a side note, once we have more insight on the different choices (which […]
I think that for self-contained PRs which do not require modifying existing code in scikit-learn, it would be useful if people could first create a GitHub repo and then discuss inclusion into scikit-learn on the ML. This way i) we can test the repo beforehand without putting too much effort into review, ii) we can reject early if needed, iii) code is not lost if the PR is stalled, and iv) we promote our fit / predict / transform API.
A gist, or a separate small repo (commenting on a repo is richer than in […] Indeed, I agree, and I think that the contributing notes could be […]
I am actually going to make this into a small repo called pyIPCA. How do you suppose we can monitor the usage of each IPCA algorithm from the repo?
Sounds great.
Not that I know.
Also, can we get a link to that GitHub page on the main scikit-learn site if there isn't one already? Something like "Didn't find what you're looking for? Check the waiting list!" Then explain a bit about the process for how code gets into sklearn and why we want to be careful.
There is this wiki page: […] Not yet advertised on the website though.
Done: https://github.com/pickle27/pyIPCA. I do think that a link from the scikit-learn website to that wiki page would be good.
+1
WIP - I expect some more work will be required before this can be merged; I wanted to make a PR now to get some feedback.