sPCA

Scalable PCA (sPCA) is a scalable implementation of principal component analysis (PCA) on top of Spark and MapReduce. sPCA achieves scalability by employing efficient large-matrix operations, leveraging matrix sparsity, and minimizing intermediate data. The repository contains two README files that walk you through running sPCA on Spark and on MapReduce, respectively: sPCA-Spark README and sPCA-mapreduce README.
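
To give a rough idea of what "leveraging matrix sparsity" and "minimizing intermediate data" can mean for PCA, here is a minimal single-machine sketch. It is not sPCA's code and does not use Spark or MapReduce. The function names `sparse_covariance` and `top_component` are hypothetical, and power iteration is used only for brevity, not because sPCA is known to use it. The sketch assumes the input rows are sparse and relies on the identity cov = (XᵀX − n·μμᵀ) / (n − 1), which avoids building the dense mean-centered matrix.

```python
import math

def sparse_covariance(rows, dim):
    """Covariance of sparse rows (dicts of column -> value) without densifying.

    Hypothetical helper: accumulates X^T X and the column sums from the
    nonzero entries only, then applies the mean correction at the end.
    """
    n = len(rows)
    col_sum = [0.0] * dim
    gram = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        items = list(row.items())
        for j, v in items:
            col_sum[j] += v
            for k, w in items:
                gram[j][k] += v * w  # only nonzero pairs are visited
    mean = [s / n for s in col_sum]
    return [
        [(gram[j][k] - n * mean[j] * mean[k]) / (n - 1) for k in range(dim)]
        for j in range(dim)
    ]

def top_component(cov, iters=200):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(dim)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: no meaningful direction
        v = [x / norm for x in w]
    return v

# Toy data: four sparse rows in a 3-dimensional space.
rows = [{0: 2.0}, {0: 4.0, 1: 1.0}, {1: 3.0}, {0: 6.0, 2: 1.0}]
cov = sparse_covariance(rows, dim=3)
pc1 = top_component(cov)
```

Because the mean correction is applied to the small `dim × dim` result rather than to every row, the sparse input never has to be turned into a dense centered matrix. In a distributed setting the per-row accumulation would be the part that is spread across workers, but how sPCA actually does this is described in the two READMEs and the publications, not here.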

People

Publications

  • T. Elgamal, M. Yabandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda. sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. In Proc. of ACM SIGMOD’15, Melbourne, Australia, May 2015. [pdf] [bibtex]

  • T. Elgamal and M. Hefeeda. Analysis of PCA Algorithms in Distributed Environments. Technical Report arXiv:1503.05214. [pdf] [bibtex]

License

sPCA is released under the terms of the MIT License.

Contact

For any issues or enhancement requests, please use the issue tracker on GitHub, or contact us. We will do our best to help you sort it out.