<a href="https://colab.research.google.com/github/maciejskorski/ml_examples/blob/master/RandomEmbeddings_vs_PCA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Random Embeddings vs PCA

In various settings, random projections have been observed to yield features of similar quality to PCA while being much cheaper to compute.
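
One intuition behind this observation is that a random sign matrix approximately preserves Euclidean distances between points. The sketch below checks this on synthetic Gaussian data, which is an assumption made for illustration and not the notebook's data. The `1/sqrt(n_components)` scaling and the helper `pairwise_distances` are also introduced here; the notebook itself skips the scaling because it standardizes the projected features afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 points in 784 dimensions,
# the same input size as a flattened 28x28 image.
X = rng.normal(size=(200, 784))
n_components = 100

# Random sign matrix, scaled so that squared norms are preserved in expectation.
R = rng.choice([-1.0, 1.0], size=(X.shape[1], n_components)) / np.sqrt(n_components)
X_embed = X @ R

def pairwise_distances(A):
    # Euclidean distances between all pairs of rows of A (illustrative helper).
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.sqrt(np.maximum(d2, 0.0))

# Compare distances before and after projection for every distinct pair.
iu = np.triu_indices(len(X), k=1)
ratios = pairwise_distances(X_embed)[iu] / pairwise_distances(X)[iu]
print("mean ratio:", ratios.mean(), "std of ratio:", ratios.std())
```

A mean ratio close to 1 with a small spread indicates that the geometry of the data largely survives the projection, which is what a downstream linear classifier relies on.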

This notebook compares the two techniques on the MNIST and Fashion-MNIST datasets by fitting a linear classifier on top of the extracted features.

In [1]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np
from sklearn.linear_model import LogisticRegression

def compare_pca_rp(X,y,n_components=100):
  ''' info: compares PCA and random projections by fitting a linear classifier on top of the features extracted by each method
      input:
        X is a matrix of shape [N,M] (N samples, M raw features)
        y is a vector of shape [N] with class labels
        n_components is the number of extracted features
      output: a tuple with training-set accuracies for random projections and PCA, respectively
  '''

  ## dimensionality reduction

  # by pca

  X_pca = PCA(n_components).fit_transform(X)
  X_pca = StandardScaler().fit_transform(X_pca)

  # by random embeddings

  embed_matrix = np.random.choice(a=[-1,1],p=[0.5,0.5],size=(X.shape[1],n_components))
  X_embed = X.dot(embed_matrix)
  X_embed = StandardScaler().fit_transform(X_embed)

  # score by linear model

  model = LogisticRegression(max_iter=500)

  outputs = []

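  # note: each score is the accuracy on the same data the model was fitted on
  for features in [X_embed,X_pca]:
    model.fit(features,y)
    outputs.append(round(float(model.score(features,y)),3))
  return tuple(outputs)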

In [6]:
import tensorflow as tf
import pandas as pd

result = {}

(X, y), _ = tf.keras.datasets.mnist.load_data()
X = X.reshape(X.shape[0],-1)
result['mnist'] = compare_pca_rp(X,y,n_components=100)

(X, y), _ = tf.keras.datasets.fashion_mnist.load_data()
X = X.reshape(X.shape[0],-1)
result['fashion_mnist'] = compare_pca_rp(X,y,n_components=100)

pd.DataFrame(result,index=['Random Embeddings','PCA']).T

| Dataset | Random Embeddings | PCA |
|---|---|---|
| mnist | 0.893 | 0.922 |
| fashion_mnist | 0.832 | 0.856 |
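
The scores above are computed on the same samples the classifier was fitted on. A minimal sketch of a held-out variant of the same comparison follows. It makes several assumptions that are not in the notebook: it uses scikit-learn's small bundled `load_digits` dataset so that it runs without downloads, a 75/25 stratified split, 16 components, and a hypothetical helper named `heldout_scores`. PCA and the scaler are fitted on the training split only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def heldout_scores(X, y, n_components=16, seed=0):
    # Hypothetical helper: same comparison as above, scored on a held-out split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)

    # Random sign embedding: data-independent, so nothing is fitted here.
    rng = np.random.default_rng(seed)
    embed_matrix = rng.choice([-1, 1], size=(X.shape[1], n_components))

    # PCA is fitted on the training split only, then applied to both splits.
    pca = PCA(n_components, random_state=seed).fit(X_train)

    features = {
        'Random Embeddings': (X_train @ embed_matrix, X_test @ embed_matrix),
        'PCA': (pca.transform(X_train), pca.transform(X_test)),
    }

    scores = {}
    for name, (F_train, F_test) in features.items():
        # The scaler is part of the pipeline, so it also sees training data only.
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        clf.fit(F_train, y_train)
        scores[name] = round(float(clf.score(F_test, y_test)), 3)
    return scores

X, y = load_digits(return_X_y=True)
scores = heldout_scores(X, y)
print(scores)
```

The same helper could be applied to the flattened MNIST arrays from the cell above; the numbers would then be test-style accuracies and are not directly comparable to the training accuracies in the table.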
