sPCA

Scalable PCA (sPCA) is a scalable implementation of principal component analysis (PCA) on top of Spark and MapReduce. sPCA achieves scalability by employing efficient large-matrix operations, exploiting matrix sparsity, and minimizing intermediate data. The repository contains two README files that walk you through running sPCA on Spark and on MapReduce, respectively: the sPCA-Spark README and the sPCA-mapreduce README.
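To give a feel for what "leveraging matrix sparsity" can mean in a PCA setting, here is a minimal sketch in plain Python. It is not taken from the sPCA code base and is not sPCA's algorithm; it only illustrates one common idea, under the assumption that rows are stored as sparse maps: mean-centering a sparse matrix would make it dense, so the centering is applied afterwards through the identity X_c^T X_c = X^T X - n * mu * mu^T. The names `SparseRow`, `column_means`, and `covariance_sparse` are hypothetical and exist only for this example.

```python
from typing import Dict, List

# Hypothetical representation: one dict per row, column index -> non-zero value.
SparseRow = Dict[int, float]


def column_means(rows: List[SparseRow], n_cols: int) -> List[float]:
    """Column means computed by visiting only the stored non-zeros."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]


def covariance_sparse(rows: List[SparseRow], n_cols: int) -> List[List[float]]:
    """Sample covariance without ever building the centered (dense) rows.

    Uses X_c^T X_c = X^T X - n * mu mu^T, so the per-row work is
    proportional to the square of that row's non-zero count.
    Requires at least two rows (division by n - 1).
    """
    n = len(rows)
    mu = column_means(rows, n_cols)

    # Accumulate X^T X from the non-zero entries of each row.
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        items = list(row.items())
        for j, vj in items:
            for k, vk in items:
                gram[j][k] += vj * vk

    # Apply the mean correction once, at the end.
    return [
        [(gram[j][k] - n * mu[j] * mu[k]) / (n - 1) for k in range(n_cols)]
        for j in range(n_cols)
    ]


# Example: the dense equivalent is [[1, 0], [0, 2], [3, 4]].
rows = [{0: 1.0}, {1: 2.0}, {0: 3.0, 1: 4.0}]
cov = covariance_sparse(rows, n_cols=2)
```

Note that the result here is a dense d x d matrix, which is only practical when the number of columns is modest. The sketch shows the deferred-centering trick only; refer to the sPCA-Spark and sPCA-mapreduce READMEs and the publications below for what sPCA actually does.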

People

Ashraf Aboulnaga
Mohamed Hefeeda
Tarek Elgamal
Maysam Yabandeh
Waleed Mustafa

Publications

T. Elgamal, M. Yabandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda. sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. In Proc. of ACM SIGMOD’15, Melbourne, Australia, May 2015. [pdf] [bibtex]
T. Elgamal and M. Hefeeda. Analysis of PCA Algorithms in Distributed Environments. Technical Report arXiv:1503.05214. [pdf] [bibtex]

License

sPCA is released under the terms of the MIT License.

Contact

For any issues or enhancement requests, please use the issue tracker on GitHub, or contact us. We will do our best to help you sort it out.
