sPCA

Scalable PCA (sPCA) is a scalable implementation of Principal Component Analysis (PCA) on top of Spark and MapReduce. sPCA achieves scalability by employing efficient large-matrix operations, exploiting matrix sparsity, and minimizing intermediate data. The repository contains two README files that walk you through running sPCA on Spark and on MapReduce, respectively: the sPCA-Spark README and the sPCA-mapreduce README.
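To give a feel for what "leveraging matrix sparsity" can mean in PCA, here is a minimal sketch in plain Python. It is not taken from the sPCA code base; the function names and the dict-based sparse row format are assumptions made for illustration. PCA needs mean-centered data, but subtracting the column means from a sparse matrix makes it dense. The sketch avoids that by applying the mean as a correction term, so only the stored nonzeros are ever touched, and then uses power iteration to estimate the first principal component.

```python
import math

# Hypothetical sparse format: each row is a dict {column_index: value};
# zero entries are simply not stored.

def column_means(rows, n_cols):
    """Column means of the sparse matrix, visiting only nonzeros."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, x in row.items():
            sums[j] += x
    return [s / len(rows) for s in sums]

def centered_gram_times_vector(rows, means, v):
    """Return (Xc^T Xc) v for Xc = X - 1 m^T without ever building Xc."""
    mv = sum(m * x for m, x in zip(means, v))  # m . v, computed once
    out = [0.0] * len(v)
    total_u = 0.0
    for row in rows:
        # One entry of u = Xc v = X v - (m . v)
        u = sum(x * v[j] for j, x in row.items()) - mv
        total_u += u
        for j, x in row.items():
            out[j] += x * u  # accumulate X^T u
    # Xc^T u = X^T u - m * sum(u); the correction is zero in exact
    # arithmetic when m holds the exact column means.
    return [o - m * total_u for o, m in zip(out, means)]

def first_principal_component(rows, n_cols, iterations=100):
    """Estimate the leading principal direction by power iteration."""
    means = column_means(rows, n_cols)
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary nonzero start
    for _ in range(iterations):
        w = centered_gram_times_vector(rows, means, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Four points on the line y = 2x; the last row is all zeros.
rows = [{0: 1.0, 1: 2.0}, {0: 2.0, 1: 4.0}, {0: 3.0, 1: 6.0}, {}]
pc = first_principal_component(rows, n_cols=2)
```

For the four sample points, which all lie on the line y = 2x, the estimated direction is proportional to (1, 2). In a distributed setting the per-row loop is the part that would be spread across workers, with only the small vectors `out` and `means` shared between them; that mapping is an assumption of this sketch, not a description of sPCA's internals.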

People

Ashraf Aboulnaga
Mohamed Hefeeda
Tarek Elgamal
Maysam Yabandeh
Waleed Mustafa

Publications

T. Elgamal, M. Yabandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda. sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. In Proc. of ACM SIGMOD’15, Melbourne, Australia, May 2015. [pdf] [bibtex]
T. Elgamal and M. Hefeeda. Analysis of PCA Algorithms in Distributed Environments. Technical Report arXiv:1503.05214. [pdf] [bibtex]

License

sPCA is released under the terms of the MIT License.

Contact

For any issues or enhancement requests, please use the issues page on GitHub, or contact us. We will do our best to help you sort it out.
