Improvements for Principal Component Analysis #22

Open
5 tasks
bytesnake opened this issue Jul 14, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@bytesnake
Member

bytesnake commented Jul 14, 2020

A plain Principal Component Analysis algorithm was added in 7b6075e. The next steps should cover edge cases and add features.

  • implement Roweis Discriminant Analysis which mixes supervised and unsupervised models
  • implement sparse PCA. By adding a sparsity constraint (such as a LASSO penalty), each principal component is built from only a few of the input variables
  • implement robust PCA, which improves robustness to outliers by using an L1 norm instead of the usual Frobenius norm
  • (?) implement non-linear PCA (should be similar to diffusion maps except for scaling)
  • add edge-case tests for very large, sparse, or ill-behaved datasets
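
To make the sparse PCA item more concrete, here is a minimal sketch of one simple heuristic: power iteration on a covariance matrix with a soft-thresholding step (the proximal map of an L1/LASSO penalty) applied before each normalization. It is only an illustration under stated assumptions, not the linfa API and not necessarily the algorithm that would be implemented here. The names `soft_threshold`, `mat_vec`, `normalize` and `sparse_leading_component`, as well as the penalty `lambda`, are hypothetical. It uses plain `Vec<f64>` so that it has no dependencies.

```rust
/// Soft-thresholding: shrinks `x` towards zero by `lambda` and clamps
/// small values to exactly zero (proximal map of the L1 penalty).
fn soft_threshold(x: f64, lambda: f64) -> f64 {
    x.signum() * (x.abs() - lambda).max(0.0)
}

/// Multiply a square matrix (one `Vec` per row) by a vector.
fn mat_vec(a: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(v).map(|(x, y)| x * y).sum::<f64>())
        .collect()
}

/// Scale `v` to unit Euclidean length; returns `false` if `v` is all zeros.
fn normalize(v: &mut [f64]) -> bool {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm == 0.0 {
        return false;
    }
    for x in v.iter_mut() {
        *x /= norm;
    }
    true
}

/// Hypothetical helper: estimate a sparse leading loading vector of the
/// covariance matrix `cov` by soft-thresholded power iteration.
/// With `lambda = 0.0` this reduces to ordinary power iteration.
/// Returns `None` if the penalty zeroes out every entry.
fn sparse_leading_component(cov: &[Vec<f64>], lambda: f64, iters: usize) -> Option<Vec<f64>> {
    let n = cov.len();
    // Start from the uniform unit vector.
    let mut v = vec![1.0 / (n as f64).sqrt(); n];
    for _ in 0..iters {
        let mut w: Vec<f64> = mat_vec(cov, &v)
            .into_iter()
            .map(|x| soft_threshold(x, lambda))
            .collect();
        if !normalize(&mut w) {
            return None;
        }
        v = w;
    }
    Some(v)
}
```

Calling `sparse_leading_component(&cov, 0.0, 50)` gives the ordinary leading component, while a positive `lambda` sets weakly contributing variables to exactly zero. A penalty that is too large removes every entry, which is the kind of ill-behaved input the edge-case tests in the last item would need to cover.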
@bytesnake
Member Author

sparse PCA depends on #46

@sjaustirni

sjaustirni commented Jul 7, 2021

It seems that some sparse/robust PCA tests use the Yale face dataset. It is probably too big for linfa-datasets (it is a lot of image data), but I think it is too nice a "real-world" example to let slip away. Maybe we could have a separate repository for this test?

I am not sure about licensing, though: the page I linked above does not mention it, and the link it gives to the original dataset seems broken.

EDIT: Another page about the original dataset says:

NOTE: You are free to use the Yale Face Database B for research purposes. If experimental results are obtained that use images from within the database, all publications of these results should acknowledge the use of the "Yale Face Database B" and reference this paper. Without permission from Yale University, images from within the database cannot be incorporated into a larger database which is then publicly distributed.

@bytesnake
Member Author

Perhaps you could take a look at the mnist crate and implement something similar for the face dataset? There is also an open issue there to replace the downloader: davidMcneil/mnist#8

bytesnake added the enhancement label on Jul 22, 2021