# Principal component analysis (PCA)
Principal component analysis (PCA) is a statistical method that finds a rotation such that the first coordinate has the largest possible variance, and each succeeding coordinate in turn has the largest possible variance among directions orthogonal to the preceding ones. The columns of the rotation matrix are called principal components. PCA is widely used for dimensionality reduction.
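
One common way to make this definition concrete is through the eigendecomposition of the sample covariance matrix. The notation below ($X$, $C$, $V$, $\Lambda$, $v_j$, $\lambda_j$) is introduced here for illustration and does not come from the original text. Assume $X \in \mathbb{R}^{n \times d}$ holds $n$ observations as rows, with the column means already subtracted:

$$
C = \frac{1}{n-1} X^{\top} X, \qquad C = V \Lambda V^{\top}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d), \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
$$

$$
v_1 = \arg\max_{\lVert v \rVert = 1} v^{\top} C v, \qquad v_j = \arg\max_{\substack{\lVert v \rVert = 1 \\ v \perp v_1, \dots, v_{j-1}}} v^{\top} C v
$$

Under this formulation the columns $v_1, \dots, v_d$ of $V$ are the principal components, and the variance of the data along $v_j$ is $\lambda_j$. Keeping only the first $k$ columns of $V$ gives a $k$-dimensional representation $X V_{1:k}$.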

MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format (RowMatrix) and for any Vectors.
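
For the "any Vectors" case, a minimal sketch using the `PCA` transformer from `org.apache.spark.mllib.feature` might look like the following. It assumes an existing SparkContext named `sc`, as in the notebook cell below; the four data points and the choice of 2 components are made up purely for illustration.

```scala
import org.apache.spark.mllib.feature.PCA
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Tiny made-up dataset: 4 observations with 3 features each (illustrative only).
val points = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(2.0, 4.1, 5.9),
  Vectors.dense(3.0, 6.2, 9.1),
  Vectors.dense(4.0, 7.9, 12.0)
))

// Fit a PCA model that keeps the top 2 principal components.
val pcaModel = new PCA(2).fit(points)

// Project a single vector, then every vector in the RDD.
val one: Vector = pcaModel.transform(Vectors.dense(1.0, 2.0, 3.0))
val reduced = points.map(v => pcaModel.transform(v))

reduced.collect().foreach(println)
```

Each transformed vector has 2 entries instead of 3. The fitted model can be reused on new vectors with the same number of features.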

In [1]:
val PATH = "file:///Users/lzz/work/SparkML/"

import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data file: one space-separated row of doubles per line.
val rows = sc.textFile(PATH + "data/mllib/sample_lda_data.txt").map { line =>
  val values = line.split(' ').map(_.toDouble)
  Vectors.dense(values)
}
val mat = new RowMatrix(rows)

// Compute all principal components (one per column of the input matrix).
val pc = mat.computePrincipalComponents(mat.numCols().toInt)

println("Principal components are:\n" + pc)

Principal components are:
-0.10254174739276567    0.10636749833471082   ... (11 total)
-0.2701315901153436     -0.21148915503895177  ...
-0.05891637051343293    -0.05162499987903119  ...
0.617048999525624       0.6130842535978166    ...
-0.2520312598305243     0.2514896707404193    ...
-0.27063623897797384    0.08969537535819513   ...
0.16198717252363828     -0.47388476693087916  ...
-0.18451159969012262    0.11657129286233833   ...
0.039932516582478365    -0.19944081051727516  ...
-0.0020016435329844057  -0.22781881654639036  ...
0.577621667541641       -0.4053330068054373   ...
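
To use the components for dimensionality reduction, the rows can be projected onto the subspace spanned by the first few principal components. The sketch below assumes `mat` is the RowMatrix built in the cell above; `k = 3` is an arbitrary illustrative choice, not a recommendation.

```scala
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Number of components to keep (arbitrary value chosen for illustration).
val k = 3

// Top-k principal components, returned as a local matrix of size numCols x k.
val topPc: Matrix = mat.computePrincipalComponents(k)

// Project every row onto the k-dimensional subspace: (n x d) * (d x k) = (n x k).
val projected: RowMatrix = mat.multiply(topPc)

println(s"Projected matrix: ${projected.numRows()} x ${projected.numCols()}")
projected.rows.take(3).foreach(println)
```

Note that `multiply` is a plain matrix product applied to the rows as stored. It does not subtract the column means first, so center the data beforehand if mean-centered scores are needed.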
