xmipp_matrix_dimred gives wrong mapping matrix with PCA and pPCA #315
Comments
Hi @MohamadHarastani,
@cossorzano, can you please have a look at the output, or assign somebody who knows this program?
Hi @DStrelak,
While it works for the remaining methods (LTSA, DM, kPCA, LE, HLLE)
I will have a look into this to see if I can find the problem.
Hi @MohamadHarastani,
I have gone over the PCA problem, and it is indeed a memory problem. The solution proposed by David is not really suitable, because it frees the memory inside the Xmipp function firstEigs, which is undesirable: within that function we do not know what else the matrix is used for. I am now checking LLTSA.
Indeed.
Hi @MohamadHarastani, I have gone over LLTSA and it is also a memory problem. I guess it is the same for all the methods that get killed. Apart from this being an interesting stress test, in which cryo-EM context do you have such high input dimensionality? I will now look at the mapping problem you reported first.
Hi @MohamadHarastani, the reason for the difference between the projected PCA and the one in Python is that in PCA you first have to subtract the mean by columns
from numpy import loadtxt, matmul, sum, mean, outer, ones
The result is
Cheers, and thank you for such detailed issues; they are very easy to reproduce.
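The mean-subtraction point can be illustrated with a small NumPy sketch. It is not the code from the comment above: the data is synthetic (a stand-in for X.txt with the same 200 x 3 shape), and the mapping M is computed here with an SVD rather than read from xmipp_matrix_dimred, so signs and ordering may differ from the program's output.

```python
import numpy as np

# Synthetic stand-in for X.txt (200 samples, 3 features); not the issue's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2]) + np.array([5.0, -2.0, 1.0])

# Column means, subtracted from every row before projecting.
col_mean = X.mean(axis=0)
Xc = X - col_mean

# Assumed mapping M: top-2 principal directions from an SVD of the centred data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
M = Vt[:2].T                      # shape (3, 2)

Y = Xc @ M                        # PCA projection of the centred data
Y_uncentred = X @ M               # what a naive "X times M" check computes

# The two results differ by a constant row offset equal to col_mean @ M.
offset = Y_uncentred - Y
print(np.allclose(offset, col_mean @ M))
print(np.allclose(Y, Y_uncentred))
```

Under these assumptions, comparing Y against X @ M without centring makes a correct mapping look wrong, because every row is shifted by the same vector.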
Hi @cossorzano and @DStrelak,
Each line of this data is an atomic structure (atom coordinates reshaped into a line).
I looked at how PCA is handled there, and they seem to use the method of Halko et al. (2009) for inputs larger than 500*500.
Thanks for your efforts
Thanks a lot @cossorzano for the solution. Thanks again,
I strongly recommend NOT using xmipp_matrix_dimred for this. I tried it with your dataset, and my patience ran out after 80 minutes :-D
Oh, 80 minutes! I never got that far before. If you try the sklearn PCA that I posted earlier, it works in seconds on the same data.
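For readers unfamiliar with the randomized approach mentioned above, here is a minimal NumPy sketch of a randomized PCA in the spirit of Halko et al. It is not the implementation of any particular library: the function name randomized_pca and the parameters oversample and n_iter are illustrative assumptions, and the data is synthetic low-rank data.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate top-k principal directions with a random range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # centre by columns
    d = Xc.shape[1]
    # Random test matrix: sample the column space of Xc.
    omega = rng.standard_normal((d, k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)
    # A few power iterations sharpen the captured subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD of the small projected matrix only.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k].T, s[:k]

# Synthetic data of (nearly) rank 3: 200 samples, 50 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
X += 1e-6 * rng.normal(size=X.shape)

V_approx, s_approx = randomized_pca(X, k=3)

# Reference: exact SVD of the centred data.
_, s_exact, Vt_exact = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
print(np.allclose(s_approx, s_exact[:3], rtol=1e-5))
```

The point of the design is that the expensive decomposition is applied only to a matrix with k + oversample rows, which is why such methods stay fast when the number of features is large.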
I want to close this issue so I did another check.
And here is the result:
Meaning that the description of "xmipp_matrix_dimred" only needs to explain what x, y and m are to resolve this confusion:
I will add a short description to the file and open a PR. Regards,
Hello,
xmipp_matrix_dimred has an option to save the mapping for linear methods.
X is the input matrix, Y is the output matrix and M is the mapping.
The function seems to give the correct output matrix (Y) but an incorrect mapping matrix (M) for two methods: PCA and pPCA.
Here is a sample (200 samples, 3 features per sample) for testing.
X.txt
The following lines generate all the matrices:
The following Python commands can be used to test the output:
Here is what the output looks like:
Another problem occurs when the input is large: some of the methods fail with the message "Killed".
Regards,
Mohamad