Is PMA::CCA implemented in your package? #107
Yes - my PMD is sparse CCA by penalized matrix decomposition from Witten et al. (2009), plus the extension to more than two views described in the follow-up paper. That said, your comment is interesting because it shows I obviously haven't made that clear. I guess that because I've called it PMD, it seems to refer to penalized PCA rather than CCA? I have seen the method referred to interchangeably as Sparse CCA (because it was the first of the sparse CCA methods), Penalized Matrix Decomposition, and Sparse Partial Least Squares (the latter actually being my preferred name for mathematical reasons!).
Perhaps the actionable change I can make is to make the docs explicit that this refers to PMD for CCA?
Hi James, yeah, issues with nomenclature seem to be common with sparse CCA in general... And I must admit that I am also not quite sure if I understood your answer correctly, so, just to get on the same page here:
I can definitely tell that the python version of Is there a class in your repository that would give the same outputs (possibly with a different name but with the same underlying algorithm)?
Other way round :) So my PMD class is [Sparse CCA by] Penalized Matrix Decomposition - and this thread is convincing me I should go with SCCA_PMD as the class name. Looking through those docs, my PMD should be able to perform both CCA and MultiCCA from PMA in R. I would expect results to match in all cases for the first component. For further components, I have the option of what I would call CCA (or projection) deflation and classical PLS deflation; PMA performs classical PLS deflation. I could probably write a quick script based off of that one to check that.
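For readers unfamiliar with deflation, here is a minimal NumPy sketch of two common schemes. The thread does not spell out either definition, so both are assumptions: "projection" deflation is taken to mean removing from a data matrix everything explained by its score vector, and "PLS" deflation is taken to mean the rank-one update of the cross-product matrix used for multiple factors in the penalized matrix decomposition paper. The function names are hypothetical and this is not the code of cca-zoo or PMA.

```python
import numpy as np

def projection_deflation(X, w):
    """Assumed 'projection' scheme: remove the part of X explained by the score t = X @ w."""
    t = X @ w
    return X - np.outer(t, t @ X) / (t @ t)

def cross_product_deflation(K, u, v):
    """Assumed 'PLS' scheme: subtract the rank-one term d * u v^T from K = X^T Y."""
    d = u @ K @ v
    return K - d * np.outer(u, v), d

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 6))
Y = rng.standard_normal((30, 4))
K = X.T @ Y

# Use the leading singular vectors as stand-ins for the first pair of weights.
U, s, Vt = np.linalg.svd(K)
u, v = U[:, 0], Vt[0]

X_deflated = projection_deflation(X, u)
K_deflated, d = cross_product_deflation(K, u, v)
```

After projection deflation the old score is orthogonal to the deflated data, while after cross-product deflation the old weight pair no longer captures any of K. The two schemes generally lead to different second and later components.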
Ah okay so currently:
But are you really sure about that? Because I couldn't reproduce the results from
Yep - I use the scale from the original paper, 1 to sqrt(number of features). In PMA, it looks like they convert it after the argument is passed. That being said, there seem to be some numerical differences in the 2nd and 3rd modes on your example that I'll try to take a look at (the objective value is pretty close, but it converges to a different place).
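A small sketch of how the two penalty scales might be related. It assumes that a PMA-style penalty between 0 and 1 is multiplied by sqrt(number of features) to obtain the L1 bound on the paper's scale; the function names are hypothetical and belong to neither package.

```python
import math

def relative_to_absolute(c_rel, n_features):
    """Map a penalty given as a fraction in (0, 1] to an L1 bound up to sqrt(n_features)."""
    return c_rel * math.sqrt(n_features)

def absolute_to_relative(c_abs, n_features):
    """Map an L1 bound on the 1..sqrt(n_features) scale back to a fraction."""
    return c_abs / math.sqrt(n_features)

n_features = 100
c_abs = relative_to_absolute(0.3, n_features)       # fraction -> paper scale
c_rel = absolute_to_relative(c_abs, n_features)     # and back again
```

Under this assumption, comparing the two packages requires converting the penalty first; passing the same number to both would apply very different amounts of sparsity.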
So the following passes a similar test:
Note that I have changed the number of components to 1. Differences in further components come down to the following differences in the initialization of the inner loop:
I've checked the latter point by passing the exact same initializations, and then they converge to the same place, so overall I'm satisfied that I'm implementing the algorithm in a valid way. With your permission, I'll add your test for the first component as it's a very useful ground truth.
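To illustrate why the initialization matters, here is a minimal sketch of the rank-one alternating update from the penalized matrix decomposition with L1 constraints on both weight vectors. It is written from the description in the paper, not taken from cca-zoo or PMA; the names `sparse_unit_vector` and `pmd_rank_one` are hypothetical, and the bounds are on the 1 to sqrt(number of features) scale.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator S(a, delta)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def sparse_unit_vector(a, c, n_steps=60):
    """Soft-threshold `a` and scale it to unit L2 norm so that its L1 norm is at most c (c >= 1)."""
    unit = a / np.linalg.norm(a)
    if np.abs(unit).sum() <= c:
        return unit  # constraint already satisfied without thresholding
    lo, hi = 0.0, np.abs(a).max()
    best = unit
    for _ in range(n_steps):  # bisection on the threshold
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        norm = np.linalg.norm(s)
        if norm == 0.0:
            hi = mid  # thresholded everything away: back off
        elif np.abs(s).sum() / norm <= c:
            best, hi = s / norm, mid  # feasible: try a smaller threshold
        else:
            lo = mid  # still too dense: threshold harder
    return best

def pmd_rank_one(K, c_u, c_v, v_init, n_iter=200):
    """Alternate sparse updates of u and v for the matrix K, starting from v_init."""
    v = v_init / np.linalg.norm(v_init)
    for _ in range(n_iter):
        u = sparse_unit_vector(K @ v, c_u)
        v = sparse_unit_vector(K.T @ u, c_v)
    return u, v, u @ K @ v

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 6))
K = X.T @ Y  # cross-product matrix for the CCA setting

v0 = rng.standard_normal(6)
u_a, v_a, d_a = pmd_rank_one(K, 2.0, 2.0, v0)
u_b, v_b, d_b = pmd_rank_one(K, 2.0, 2.0, v0.copy())  # identical start
```

With an identical starting vector the two runs follow the same path and return the same weights. The iteration only finds a local optimum, so a different `v_init` may end at different weights even when the objective value `d` is similar.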
Hi James, thanks for setting up the test. I asked @teekuningas for permission, since he finalized the test, and will let you know once he has responded :) Unfortunately, I couldn't run it on my (Windows) machine (even with your latest version from today):
Yeah, that's the problem, I am really lacking the math here, so I cannot really contribute anything meaningful to this discussion... Running
Since I also need to reproduce results from a former package (but would still like to use Python instead of R), I wonder if one could tweak the class in such a way that you would get more similar results on the other variates as well?
Ah, good to know! Just as a side note (but really only a UX thing, so not really important): Wouldn't it then be a good idea to adopt the
@teekuningas agreed, so feel free to implement our test in your package :)
Thanks @JohannesWiesner. The deflation kwarg should be OK now (if not, do a pip install cca-zoo --upgrade or pull). I'll most likely take a look at the initialization problem some time today; I have an idea how I'd achieve it. That being said, your result above is super interesting for my own work (and lots of other work using sparse CCA) in that it shows just how much of a problem the lack of global convergence of the algorithm is, i.e. small differences in initialization result in substantial differences in weights (even though the value of the objective is similar!). Your UI point is interesting. Do you feel that it is more intuitive to have parameters entered between 0 and 1, with an error if c*#features < 1 or c*#features > sqrt(#features), or parameters entered between 1 and sqrt(#features)?
Yup, works now! However, with my current project data and your settings I get the following
Ha, interesting, glad I could generate interesting research questions!
Yup, exactly! It would not require calculating the number of features beforehand, but would just use error handling in case users don't provide a number between 0 and 1.
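The error-handling idea could look roughly like the sketch below. It assumes the fraction is scaled by sqrt(number of features), whereas the comment above writes the check as c*#features, so treat the scaling as an assumption. The function name is hypothetical and not part of cca-zoo.

```python
import math

def l1_bound_from_fraction(c, n_features):
    """Validate a penalty given as a fraction and return the corresponding L1 bound."""
    if not 0 < c <= 1:
        raise ValueError(f"penalty must be in (0, 1], got {c}")
    bound = c * math.sqrt(n_features)  # assumed scaling
    if bound < 1:
        # A vector with unit L2 norm always has an L1 norm of at least 1.
        raise ValueError(
            f"penalty {c} gives an L1 bound of {bound:.3f} < 1 for {n_features} features"
        )
    return bound

bound = l1_bound_from_fraction(0.5, 100)
```

The user only ever supplies a number between 0 and 1, and the number of features is looked up from the data when the model is fitted.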
I've done some tinkering on the dev branch. The hidden dev release pip install cca-zoo==1.10.2.dev1 contains my adjusted version. Now the initialization at the start of each loop is the same. There is still a bit of a difference, but the weights in all 3 modes are now correlated with the R package above 0.9, so a script of the form:
will work. This checks that each dimension has a correlation with the original weights greater than 0.9. I'll probably keep the changes I've made in a future version as they've actually sped things up a bit, but I've left them in dev for the moment until I get a chance to review.
This also contains the UI change - I'm torn on that change though!
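The script referred to above is not preserved here. A check of the kind described, written as a sketch with a hypothetical function name and synthetic weights, might look like this. Taking the absolute value of the correlation is an added assumption, since weight vectors are only defined up to sign.

```python
import numpy as np

def columns_match(weights_a, weights_b, threshold=0.9):
    """Check that each column of weights_a correlates with the same column of weights_b."""
    corrs = [
        abs(np.corrcoef(weights_a[:, k], weights_b[:, k])[0, 1])
        for k in range(weights_a.shape[1])
    ]
    return all(c > threshold for c in corrs), corrs

rng = np.random.default_rng(0)
reference = rng.standard_normal((10, 3))  # stand-in for weights from the R package
# Stand-in for the Python weights: sign-flipped copy with a little noise.
candidate = -reference + 0.05 * rng.standard_normal((10, 3))

ok, corrs = columns_match(reference, candidate)
```

In a real comparison, `reference` would hold the weights exported from PMA and `candidate` the weights from the Python model, with one column per component.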
Quick note to say I have now put all of these changes in the main version, i.e. from 1.10.3.
Closing for now
Hi! First of all, thank you for the great work! I stumbled upon your package because I was looking for a Python alternative to the CCA function as implemented in the PMA package from Witten et al.: https://github.com/bnaras/PMA/blob/e36a23c506b17be03d23c547075ac134861f7ba0/R/CCA.R#L197
I already found a package that can handle this, but I wonder if cca-zoo has also implemented it? The only method from Witten et al. that I could find was PMD