# Principal Component Analysis

***
### `class PCA(componentCount: 0, whiten: false)`
***

## Parameters:
  #### `componentCount`: *Int, optional, default `0`*
  Number of components to keep. If `componentCount` is `0`, Minka’s MLE is used to select the number of components that best describes the dataset; otherwise, exactly `componentCount` components are kept.
  #### `whiten`: *Bool, optional, default `false`*
  When `true`, the `components` vectors are multiplied by the square root of the sample count and then divided by the singular values, so that the outputs are uncorrelated and each component has unit variance.
  Whitening removes some information from the transformed signal (the relative variance scales of the components), but it can sometimes improve the predictive accuracy of downstream estimators by making their data satisfy some hard-wired assumptions.
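
The effect of the whitening scale factor can be checked by hand. The sketch below uses plain Swift arrays rather than the library, and applies the factor described above (square root of the sample count, divided by the singular value) to the first-component scores printed in the example later in this document. Scaling the scores is equivalent to scaling the component vector, since projection is linear. Whether the library normalizes by the sample count or by the sample count minus one is an assumption here; the sketch follows the wording of this section.

```swift
import Foundation

// First-component scores and singular value, copied from the example output in this document.
let scores: [Double] = [1.383405778728807, 2.2218980166336806, 3.605303795362487,
                        -1.383405778728807, -2.2218980166336806, -3.605303795362487]
let singularValue = 6.300612319734661
let sampleCount = Double(scores.count)

// Whitening as described above: multiply by sqrt(sample count), divide by the singular value.
let whitened = scores.map { $0 * sampleCount.squareRoot() / singularValue }

// Population variance (divide by n); the scores are already centered on zero.
let variance = whitened.map { $0 * $0 }.reduce(0, +) / sampleCount
print(variance)  // very close to 1
```

After scaling, the component no longer carries its original variance of about 7.94, which is the information whitening discards.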


## Attributes: 
  #### `components`: *Tensor, shape [component count, feature count]*
  Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted in decreasing order of `explainedVariance`.
  #### `explainedVariance`: *Tensor, shape [component count]*
  The amount of variance explained by each of the selected components.
  #### `explainedVarianceRatio`: *Tensor, shape [component count]*
  Fraction of the total variance (a value between 0 and 1) explained by each of the selected components.
  #### `singularValues`: *Tensor, shape [component count]*
  The singular values corresponding to each of the selected components.
  #### `mean`: *Tensor, shape [feature count]*
  Per-feature empirical mean, estimated from the training set.
  #### `noiseVariance`: *Tensor*
  The estimated noise covariance.
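
The attributes above are related to one another. As a rough illustration in plain Swift (not the library's code), the sketch below recovers `explainedVariance` and `explainedVarianceRatio` from the `singularValues` printed in the example later in this document. It assumes the convention variance = (singular value)² / (sample count - 1), which is consistent with the printed numbers but is not confirmed by the source.

```swift
import Foundation

// Singular values copied from the two-component example in this document (6 samples).
let singularValues: [Double] = [6.300612319734661, 0.5498039617971033]
let sampleCount = 6.0

// Assumed convention: variance_i = s_i^2 / (n - 1).
let explainedVariance = singularValues.map { $0 * $0 / (sampleCount - 1) }

// Each variance divided by the total. Here every component is kept,
// so the sum over kept components equals the total variance of the data.
let totalVariance = explainedVariance.reduce(0, +)
let explainedVarianceRatio = explainedVariance.map { $0 / totalVariance }

print(explainedVariance)       // approximately [7.9395, 0.0605]
print(explainedVarianceRatio)  // approximately [0.9924, 0.0076]
```

When fewer components are kept than there are features, the ratio would need the variance of the discarded directions as well, so the sum over kept components is only a stand-in for the total in this sketch.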

  

***

## Methods

***

  ### `fit(data: Tensor)`:  Fit a Principal Component Analysis.

  ### Parameters:

  #### `data`: *Tensor, shape [sample count, feature count]*
  Training data.
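
To make concrete what fitting estimates, here is a minimal plain-Swift sketch for the two-feature dataset used in the example below. It is not the library's implementation (which may well use an SVD); it swaps in the closed-form eigen-decomposition of a 2x2 covariance matrix, which yields the same variances and, up to sign, the same first principal axis.

```swift
import Foundation

let data: [[Double]] = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
let n = Double(data.count)

// Per-feature mean.
let meanX = data.map { $0[0] }.reduce(0, +) / n
let meanY = data.map { $0[1] }.reduce(0, +) / n

// Entries of the 2x2 sample covariance matrix (n - 1 denominator).
var cxx = 0.0, cxy = 0.0, cyy = 0.0
for row in data {
    let dx = row[0] - meanX
    let dy = row[1] - meanY
    cxx += dx * dx
    cxy += dx * dy
    cyy += dy * dy
}
cxx /= n - 1
cxy /= n - 1
cyy /= n - 1

// Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
let halfTrace = (cxx + cyy) / 2
let determinant = cxx * cyy - cxy * cxy
let spread = (halfTrace * halfTrace - determinant).squareRoot()
let variances = [halfTrace + spread, halfTrace - spread]

// Unit eigenvector for the largest eigenvalue; its sign is arbitrary.
let vx = cxy
let vy = variances[0] - cxx
let norm = (vx * vx + vy * vy).squareRoot()
let firstAxis = [vx / norm, vy / norm]

print(variances)  // approximately [7.9395, 0.0605]
print(firstAxis)  // approximately [0.8385, 0.5449]
```

The variances agree with the `explainedVariance` values printed in the example, and the axis agrees with the first row of `components` apart from an overall sign flip.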

  ***

  ### `transformation(for: Tensor)`: Apply dimensionality reduction to input.

  ### Parameters:
  #### `for`: *Tensor, shape [sample count, feature count]*
  New data.

  ### Returns:
  Dimensionally reduced data. 
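
Without whitening, the reduction amounts to centering a sample and taking its dot product with each principal axis. The plain-Swift sketch below illustrates this with the `mean` and `components` values printed in the example later in this document. The helper `project` is hypothetical and is not part of swiftML.

```swift
// Values copied from the two-component example in this document.
let mean: [Double] = [0.0, 0.0]
let components: [[Double]] = [[-0.8384922379048738, -0.5449135408239331],
                              [ 0.5449135408239331, -0.8384922379048738]]

/// Hypothetical helper: projects one sample onto each principal axis.
func project(_ sample: [Double]) -> [Double] {
    // Subtract the per-feature mean, then take one dot product per axis.
    let centered = zip(sample, mean).map { $0 - $1 }
    return components.map { axis in
        zip(centered, axis).map { $0 * $1 }.reduce(0, +)
    }
}

print(project([-1, -1]))  // approximately [1.3834, 0.2936]
```

The result matches the first row of the transformed data shown in the example.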

  ***

  ### `inverseTransformation(for: Tensor)`: Transform data back to its original space.
  ### Parameters:
  #### `for`: *Tensor, shape [sample count, component count]*
  Data in the reduced space, such as the output of `transformation(for:)`.

  ### Returns:
  Data in the original feature space whose transform would be the input data.
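
As a rough illustration of the inverse mapping, assuming `whiten` is `false`, each score is multiplied by its principal axis and the mean is added back. The plain-Swift sketch below uses the values printed in the one-component example later in this document. The helper `reconstruct` is hypothetical and is not part of swiftML.

```swift
// Values copied from the one-component example in this document.
let mean: [Double] = [0.0, 0.0]
let components: [[Double]] = [[-0.8384922379048738, -0.5449135408239331]]

/// Hypothetical helper: maps reduced scores back to the original feature space.
func reconstruct(_ scores: [Double]) -> [Double] {
    var sample = mean
    // Add each axis, weighted by the matching score, on top of the mean.
    for (score, axis) in zip(scores, components) {
        for j in sample.indices {
            sample[j] += score * axis[j]
        }
    }
    return sample
}

print(reconstruct([1.383405778728807]))  // approximately [-1.1600, -0.7538]
```

With a single component kept, the reconstruction of the first sample is about `[-1.16, -0.75]` rather than the original `[-1, -1]`, because the variance along the discarded axis is lost.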


***

# Example

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/drive/1XaoIWnlgyLUJo6dtmHCRPWfAm1XpGinC"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/param087/swiftML/blob/master/Notebooks/Principal%20Component%20Analysis.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Install the swiftML package from GitHub.

In [1]:
%install '.package(url: "https://github.com/param087/swiftML", from: "0.0.4")' swiftML

Installing packages:
	.package(url: "https://github.com/param087/swiftML", from: "0.0.2")
		swiftML
With SwiftPM flags: []
Working in: /tmp/tmpzakxgcwi/swift-install
Fetching https://github.com/param087/swiftML
Completed resolution in 8.40s
Cloning https://github.com/param087/swiftML
Resolving https://github.com/param087/swiftML at 0.0.2
Compile Swift Module 'swiftML' (16 sources)

Compile Swift Module 'jupyterInstalledPackages' (1 sources)
Linking ./.build/x86_64-unknown-linux/debug/libjupyterInstalledPackages.so
Initializing Swift...
Installation complete!


## Import Swift packages

In [2]:
import TensorFlow
import swiftML

## Dataset

In [3]:
let data = Tensor<Double>([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

## Principal Component Analysis

In [4]:
let model = PCA(componentCount: 2)
model.fit(data: data)
let newData = model.transformation(for: data)
print(newData)

[[    1.383405778728807,   0.29357869708094075],
 [   2.2218980166336806,   -0.2513348437429923],
 [    3.605303795362487,  0.042243853337948334],
 [   -1.383405778728807,  -0.29357869708094075],
 [  -2.2218980166336806,    0.2513348437429923],
 [   -3.605303795362487, -0.042243853337948334]]


## Retrieve Original Dataset

In [5]:
let originalData = model.inverseTransformation(for: newData)
print(originalData)

[[-0.9999999999999997, -0.9999999999999998],
 [-1.9999999999999993, -0.9999999999999997],
 [-2.9999999999999987,  -1.999999999999999],
 [ 0.9999999999999997,  0.9999999999999998],
 [ 1.9999999999999993,  0.9999999999999997],
 [ 2.9999999999999987,   1.999999999999999]]


In [6]:
print("mean: ", model.mean)
print("components: ", model.components)
print("componentCount: ", model.componentCount)
print("explainedVariance: ", model.explainedVariance)
print("explainedVarianceRatio: ", model.explainedVarianceRatio)
print("singularValues: ", model.singularValues)

mean:  [[0.0, 0.0]]
components:  [[-0.8384922379048738, -0.5449135408239331],
 [ 0.5449135408239331, -0.8384922379048738]]
componentCount:  2
explainedVariance:  [ 7.9395431207184375, 0.06045687928155813]
explainedVarianceRatio:  [   0.9924428900898052, 0.0075571099101947705]
singularValues:  [ 6.300612319734661, 0.5498039617971033]


## Minka’s MLE is used to guess the dimension

In [7]:
let model = PCA()
model.fit(data: data)
let newData = model.transformation(for: data)
print(newData)

[[  1.383405778728807],
 [ 2.2218980166336806],
 [  3.605303795362487],
 [ -1.383405778728807],
 [-2.2218980166336806],
 [ -3.605303795362487]]


In [8]:
let XOriginal = model.inverseTransformation(for: newData)
print(XOriginal)

[[ -1.159975007336852, -0.7538365412834047],
 [-1.8630442403635754,  -1.210742315593533],
 [ -3.023019247700427, -1.9645788568769373],
 [  1.159975007336852,  0.7538365412834047],
 [ 1.8630442403635754,   1.210742315593533],
 [  3.023019247700427,  1.9645788568769373]]


In [9]:
print("mean: ", model.mean)
print("components: ", model.components)
print("componentCount: ", model.componentCount)
print("explainedVariance: ", model.explainedVariance)
print("explainedVarianceRatio: ", model.explainedVarianceRatio)
print("singularValues: ", model.singularValues)

mean:  [[0.0, 0.0]]
components:  [[-0.8384922379048738, -0.5449135408239331]]
componentCount:  1
explainedVariance:  [7.9395431207184375]
explainedVarianceRatio:  [0.9924428900898052]
singularValues:  [6.300612319734661]
