How to get eigenvectors of kernel pca #17171

Open
s0rel opened this issue May 9, 2020 · 7 comments

@s0rel

s0rel commented May 9, 2020

Hello,

I'm using kernel PCA to reduce dimensionality, and I need the eigenvalues and eigenvectors. For PCA, I know that pca.explained_variance_ holds the eigenvalues and pca.components_ holds the eigenvectors. I read the sklearn documentation and found the following for KernelPCA:

lambdas_ : array, (n_components,)
Eigenvalues of the centered kernel matrix in decreasing order.

alphas_ : array, (n_samples, n_components)
Eigenvectors of the centered kernel matrix.

Compared with the PCA documentation, I'm confused about why the shapes of the eigenvectors differ. In PCA the shape is (n_components, n_features), while in KernelPCA it is (n_samples, n_components). Here is the PCA documentation:

explained_variance_ : array, shape (n_components,)
The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the covariance matrix of X.

components_ : array, shape (n_components, n_features)
Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.

I know that if the kPCA kernel is linear, it is exactly PCA. So I want to know how to get the eigenvalues and eigenvectors in kPCA. Can you help me?

@TomDLT
Member

TomDLT commented May 12, 2020

The docstrings are correct, alphas_ contains the eigenvectors of the kernel PCA, and components_ the eigenvectors of the PCA.
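
To make the shapes concrete, here is a minimal sketch on random toy data (the data, `n_components`, and the helper `kpca_eigen` are assumptions made for illustration, not part of scikit-learn). As far as I know, more recent scikit-learn releases expose these attributes as `eigenvalues_` and `eigenvectors_` instead of `lambdas_` and `alphas_`, so the helper tries both names.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(30, 4)  # toy data: n_samples=30, n_features=4
n_components = 3

pca = PCA(n_components=n_components).fit(X)
kpca = KernelPCA(n_components=n_components, kernel="linear").fit(X)


def kpca_eigen(model):
    """Hypothetical helper: return (eigenvalues, eigenvectors) of a fitted
    KernelPCA, whichever attribute names the installed version provides."""
    if hasattr(model, "eigenvalues_"):
        return model.eigenvalues_, model.eigenvectors_
    return model.lambdas_, model.alphas_


eigenvalues, eigenvectors = kpca_eigen(kpca)

print(pca.components_.shape)  # (n_components, n_features)
print(eigenvectors.shape)     # (n_samples, n_components)

# With a linear kernel, the eigenvalues of the centered Gram matrix match
# PCA's explained_variance_ once divided by (n_samples - 1).
scaled = eigenvalues / (X.shape[0] - 1)
print(np.allclose(scaled, pca.explained_variance_))
```

The division by `n_samples - 1` only reflects how PCA normalizes the covariance matrix; the eigenvalues themselves carry the same information.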


PCA is based on the eigen-decomposition of the covariance matrix C = X.T @ X (for centered X, and up to a constant scaling factor), which is of shape (n_features, n_features). Therefore, the eigenvectors are vectors of length n_features.

KernelPCA(kernel="linear") is based on the eigen-decomposition of the Gram matrix G = X @ X.T, which is of shape (n_samples, n_samples). Therefore, the eigenvectors are vectors of length (n_samples).

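The equivalence comes from the fact that G has the same nonzero eigenvalues as C, and their eigenvectors are related through X: if v is an eigenvector of G, then X.T @ v is an eigenvector of C with the same eigenvalue, up to a normalization.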
About this equivalence, we would gladly accept improvement of our documentation, for instance in the user guide (/doc/modules/decomposition.rst).
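
The relation between the two decompositions can be checked numerically. The following is only a sketch with NumPy on random centered data (the data and variable names are assumptions for illustration): it compares the leading eigenvalues of C and G, then maps the eigenvectors of G into feature space and compares them with the eigenvectors of C up to sign.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)    # toy data: n_samples=20, n_features=3
X = X - X.mean(axis=0)  # center each feature

C = X.T @ X  # (n_features, n_features)
G = X @ X.T  # (n_samples, n_samples)

# eigh returns eigenvalues in ascending order, so reverse to get decreasing.
evals_C, evecs_C = np.linalg.eigh(C)
evals_G, evecs_G = np.linalg.eigh(G)
k = X.shape[1]  # G has at most n_features nonzero eigenvalues here
evals_C, evecs_C = evals_C[::-1], evecs_C[:, ::-1]
evals_G, evecs_G = evals_G[::-1][:k], evecs_G[:, ::-1][:, :k]

# The nonzero eigenvalues agree.
print(np.allclose(evals_C, evals_G))

# Map each eigenvector v of G to X.T @ v / sqrt(eigenvalue): this gives a
# unit-norm eigenvector of C in feature space.
mapped = X.T @ evecs_G / np.sqrt(evals_G)

# Eigenvectors are only defined up to sign, so compare absolute overlaps.
overlap = np.abs(np.sum(mapped * evecs_C, axis=0))
print(np.allclose(overlap, 1.0))
```

The remaining eigenvalues of G are zero up to rounding, which is why only the first `k` are kept.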

@Asami-1

Asami-1 commented Dec 6, 2020

take

@Asami-1

Asami-1 commented Jan 18, 2021

Hi, first time contributor here.

I've added some explanations about KPCA, as asked in the issue, and created a PR (#19201).
I went into the details of the approach, hoping to clarify the misunderstanding concerning the shapes.
I also added an image and wasn't sure where to put it, so I went with doc/images; I hope that's alright.

If there's anything that you think should be changed, don't hesitate to tell me. I'm still a student, so I wouldn't be surprised if I messed something up.

Thank you and have a nice day.

@Asami-1

Asami-1 commented Jan 21, 2021

@TomDLT Hi, could you take a look at this if you have time, please? Thanks.

@TomDLT
Member

TomDLT commented Jan 21, 2021

Hi Asami, thanks a lot for your pull request, it is much appreciated. I will take a look at it when I have time.
Note that reviewing takes quite some time, especially for documentation, which needs to be correct, clear, and concise.

@EmilioIppoliti

Hi, I would like to obtain the eigenvector matrix of shape (n_samples, n_features) from Kernel PCA. Is this possible?

@TsChala

TsChala commented Sep 30, 2021

> Hi, I would like to obtain the eigenvector matrix of shape (n_samples, n_features) from Kernel PCA. Is this possible?

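A KernelPCA fitted on X already stores eigenvectors of shape (n_samples, n_components). To get eigenvectors of shape (n_features, n_components) instead, you can compute the kPCA for the transpose of X instead of X. The kernel matrix then has shape (n_features, n_features), so the cost depends on n_features rather than n_samples.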
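
The shapes involved can be checked directly. This is a small sketch on random toy data, with a hypothetical helper `fitted_eigenvectors` (not a scikit-learn function) that falls back to the older `alphas_` attribute when `eigenvectors_` is not available:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(50, 5)  # toy data: n_samples=50, n_features=5


def fitted_eigenvectors(data, n_components=3):
    """Hypothetical helper: fit a linear KernelPCA on `data` and return
    the eigenvectors of its centered kernel matrix."""
    model = KernelPCA(n_components=n_components, kernel="linear").fit(data)
    if hasattr(model, "eigenvectors_"):
        return model.eigenvectors_
    return model.alphas_


shape_X = fitted_eigenvectors(X).shape     # one row per sample of X
shape_XT = fitted_eigenvectors(X.T).shape  # one row per feature of X

print(shape_X)   # (50, 3)
print(shape_XT)  # (5, 3)
```

Note that fitting on X.T centers the kernel over the features rather than over the samples, so the result is not expected to match PCA's components_ exactly.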
