# Dimensionality Reduction in Julia

#### Métodos Intensivos de Computación Estadística

#### Juan Sebastián Corredor Rodriguez - jucorredorr@unal.edu.

This notebook presents several dimensionality reduction methods in Julia and evaluates their results and performance. Specifically, the methods are:
1. [PCA (Principal Component Analysis)](https://en.wikipedia.org/wiki/Principal_component_analysis).
2. [LLE (Locally Linear Embedding)](http://www.robots.ox.ac.uk/~az/lectures/ml/lle.pdf).
3. [Isomap (Isometric Feature Mapping)](https://en.wikipedia.org/wiki/Isomap). 

#### Principal Component Analysis (PCA)

See https://github.com/JuliaStats/MultivariateStats.jl/blob/master/docs/source/pca.rst.
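
For intuition about what PCA computes, here is a minimal sketch that uses only Julia's standard library. The helper names `pca_sketch`, `project`, and `backproject` are hypothetical and are not part of MultivariateStats; the toy matrix is made up for illustration. As in MultivariateStats, each column is one observation. This is one common way to obtain principal components (SVD of the centered data), not necessarily what the package does internally.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not part of MultivariateStats): PCA via SVD.
# X is d×n with one observation per column; k is the number of components kept.
function pca_sketch(X::AbstractMatrix{<:Real}, k::Integer)
    mu = mean(X, dims=2)                 # d×1 vector of feature means
    Xc = X .- mu                         # center each feature
    F = svd(Xc)                          # Xc = U * Diagonal(S) * Vt
    W = F.U[:, 1:k]                      # d×k orthonormal principal directions
    pvars = (F.S[1:k] .^ 2) ./ (size(X, 2) - 1)   # variance along each direction
    return mu, W, pvars
end

# Project onto the principal subspace, and map back to the original space.
project(mu, W, X) = W' * (X .- mu)
backproject(mu, W, Y) = W * Y .+ mu

# Toy data (made up for illustration): 3 features, 5 observations.
X = [2.0 0.0 1.0 3.0 4.0;
     1.0 0.5 0.8 1.7 2.0;
     0.1 0.3 0.2 0.0 0.4]

mu, W, pvars = pca_sketch(X, 2)
Y = project(mu, W, X)            # 2×5 low-dimensional representation
X_rec = backproject(mu, W, Y)    # 3×5 approximate reconstruction of X
```

Keeping all 3 components would make the reconstruction exact; keeping fewer discards the directions with the least variance.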

In [22]:
#Load the corresponding libraries

import Pkg
#Pkg.add("MultivariateStats") 
#Pkg.add("ManifoldLearning")
#Pkg.add("RDatasets")
#Pkg.add("Plots")
#Pkg.add("PlotlyJS")
#Pkg.add("ORCA")
#Pkg.add("Gadfly")
using Gadfly
using ORCA
using PlotlyJS
using ManifoldLearning
using Plots
using RDatasets
using MultivariateStats

In [23]:
plotly() # using plotly for interactive 3D plotting

#Loading iris dataset
iris = dataset("datasets", "iris")

#Use the odd-numbered rows (every other row) as the training set
X_train = convert(Matrix, iris[1:2:end,1:4])'
X_train_labels = convert(Vector, iris[1:2:end,5])

#Use the even-numbered rows (the other half) as the testing set
X_test = convert(Matrix, iris[2:2:end,1:4])'
X_test_labels = convert(Vector, iris[2:2:end,5])

print("Done!")

Done!

In [37]:
#Each observation is now a column
X_train

4×75 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 5.1  4.7  5.0  4.6  4.4  5.4  4.8  5.8  …  6.3  6.0  6.7  5.8  6.7  6.3  6.2
 3.5  3.2  3.6  3.4  2.9  3.7  3.0  4.0     3.4  3.0  3.1  2.7  3.3  2.5  3.4
 1.4  1.3  1.4  1.4  1.4  1.5  1.4  1.2     5.6  4.8  5.6  5.1  5.7  5.0  5.4
 0.2  0.2  0.2  0.3  0.2  0.2  0.1  0.2     2.4  1.8  2.4  1.9  2.5  1.9  2.3

In [25]:
#Train a PCA model, allowing up to 3 dimensions (parameter specified in maxoutdim)
pca_model = fit(PCA, X_train; maxoutdim=3)

#Apply PCA model to testing set
X_test_projected = transform(pca_model, X_test)
X_test_projected

3×75 Array{Float64,2}:
  2.72714    2.75491    2.32396   …  -1.92047   -1.74161   -1.37706 
 -0.230916  -0.406149   0.646374      0.246554   0.127625  -0.280295
 -0.253119  -0.0271266  0.230469      0.180044   0.123165   0.314992

In [26]:
#Reconstruct testing observations (approximately)
X_test_reconstructed = reconstruct(pca_model, X_test_projected) #Compare with the original X_test to see how much information was lost
X_test_reconstructed

4×75 Array{Float64,2}:
 4.86449  4.61087   5.40782   5.00775   …  6.79346  6.58825  6.46774  5.94384
 3.04262  3.08695   3.89061   3.39069      3.20785  3.13416  3.03873  2.94737
 1.46099  1.48132   1.68656   1.48668      5.91124  5.39197  5.25542  5.02469
 0.10362  0.229519  0.421233  0.221041     2.28224  1.99665  1.91243  1.91901

In [28]:
X_test 

4×75 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 4.9  4.6  5.4  5.0  4.9  4.8  4.3  5.7  …  6.4  6.9  6.9  6.8  6.7  6.5  5.9
 3.0  3.1  3.9  3.4  3.1  3.4  3.0  4.4     3.1  3.1  3.1  3.2  3.0  3.0  3.0
 1.4  1.5  1.7  1.5  1.5  1.6  1.1  1.5     5.5  5.4  5.1  5.9  5.2  5.2  5.1
 0.2  0.2  0.4  0.2  0.1  0.2  0.1  0.4     1.8  2.1  2.3  2.3  2.3  2.0  1.8
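
To put a number on how far the reconstruction is from the original data, one option is the root-mean-square error. The sketch below uses hypothetical helpers `rmse` and `column_rmse` (not provided by MultivariateStats) and two small made-up matrices so that it runs on its own.

```julia
using Statistics

# Hypothetical helper: root-mean-square error between two equally sized matrices.
rmse(A, B) = sqrt(mean(abs2.(A .- B)))

# Hypothetical helper: RMSE of each column (observation) separately.
column_rmse(A, B) = vec(sqrt.(mean(abs2.(A .- B), dims=1)))

# Made-up example: B differs from A in a single entry, by 2.
A = [1.0 2.0; 3.0 4.0]
B = [1.0 2.0; 3.0 6.0]

total_error = rmse(A, B)            # 1.0
per_column = column_rmse(A, B)      # first column 0.0, second column sqrt(2)
```

In this notebook the same helpers could be applied as `rmse(X_test, X_test_reconstructed)`.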

#### LLE and Isomap Methods

Let's see how to apply these methods using the ManifoldLearning package.

In [66]:
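#Fit Isomap and LLE with k = 3 neighbors and d = 3 output dimensions; each call returns a fitted model object
X_train_projected_isomap = transform(Isomap, X_train; k = 3, d = 3)
X_train_projected_lle = transform(LLE, X_train; k = 3, d = 3)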

└ @ ManifoldLearning /Users/JuanSebastianCorredorRodriguez/.julia/packages/ManifoldLearning/cj14P/src/utils.jl:45
└ @ ManifoldLearning /Users/JuanSebastianCorredorRodriguez/.julia/packages/ManifoldLearning/cj14P/src/utils.jl:45


LLE(outdim = 3, neighbors = 3)

In [74]:
#Extract the embedded coordinates from the fitted Isomap model
transform(X_train_projected_isomap)

3×50 Array{Float64,2}:
 -0.660178   -0.645426   -0.40219    …   2.45097   -0.268812    2.38143 
  1.57505     1.49209     0.360931      -0.295682   0.205135   -0.201306
 -0.0833845  -0.0751031   0.0248118     -0.344455   0.0522324  -0.90252 

In [71]:
#Extract the embedded coordinates from the fitted LLE model
transform(X_train_projected_lle)

3×50 LinearAlgebra.Transpose{Float64,Array{Float64,2}}:
  5.64286e-6  0.000159763  -0.0191072   …  -0.0110724   -0.0136918 
 -0.00011389  0.000929377  -0.00161377     -0.00576302   3.61576e-7
 -0.00330561  0.0346012    -0.0119724       0.299182    -1.74803e-6
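
Both Isomap and LLE start from a k-nearest-neighbor graph over the observations, and with a small k such as 3 that graph can split into several connected components. This may be relevant to the embeddings above having 50 columns while `X_train` has 75, but that is an assumption the notebook does not confirm. The sketch below, in plain Julia with hypothetical helper names and made-up toy data, counts the components of such a graph. It illustrates the neighborhood step only and does not describe how ManifoldLearning is implemented.

```julia
using LinearAlgebra

# Hypothetical helper: indices of the k nearest neighbors of each column of X.
function knn_indices(X::AbstractMatrix{<:Real}, k::Integer)
    n = size(X, 2)
    neighbors = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        dists = [norm(X[:, i] - X[:, j]) for j in 1:n]
        dists[i] = Inf                                        # exclude the point itself
        neighbors[i] = collect(partialsortperm(dists, 1:k))   # k smallest distances
    end
    return neighbors
end

# Hypothetical helper: label connected components of the symmetrized kNN graph.
function component_labels(neighbors::Vector{Vector{Int}})
    n = length(neighbors)
    adj = [Int[] for _ in 1:n]
    for i in 1:n, j in neighbors[i]
        push!(adj[i], j)
        push!(adj[j], i)                  # treat edges as undirected
    end
    labels = zeros(Int, n)
    current = 0
    for s in 1:n
        labels[s] != 0 && continue        # already assigned to a component
        current += 1
        labels[s] = current
        stack = [s]
        while !isempty(stack)             # depth-first traversal
            v = pop!(stack)
            for w in adj[v]
                if labels[w] == 0
                    labels[w] = current
                    push!(stack, w)
                end
            end
        end
    end
    return labels
end

# Toy data (made up): two well-separated clusters of 4 points on a line.
X = [0.0 0.1 0.2 0.3 10.0 10.1 10.2 10.3]

labels = component_labels(knn_indices(X, 2))
n_components = maximum(labels)    # 2: with k = 2 the clusters never connect
```

Raising k until the graph has a single component (k = 4 is enough for this toy data) is one way to make sure every observation takes part in the embedding.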