<img src="https://i.postimg.cc/52kKwKCx/illustration1.jpg" alt="Epic Fight">

In this notebook I'll experiment with several dimensionality reduction algorithms on the MNIST dataset and see how well they preserve the information in the data as the number of dimensions is progressively reduced.

<img src="https://i.postimg.cc/bwqyWT3c/pic-illustrated.png" align="right" width="800" height="400">

The MNIST dataset contains 28 * 28 pixels for each digit, which makes 784 dimensions in total.

I'll apply these algorithms to reduce the dimensions progressively, train a neural network on each reduced representation using 10-fold cross-validation, and see how the reduction affects model accuracy, as shown in the illustration.

The aim is to build an intuition about each algorithm's performance.
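As a rough sketch of the reduction schedule described above (an illustration, not the exact notebook code): start from the 784 pixel dimensions, keep halving until 3 dimensions are left, then add 2 for the 2D plot. The helper name `halving_schedule` and its parameters are hypothetical.

```python
def halving_schedule(n_features=784, n_halvings=8, extra=(2,)):
    """Return n_features, then n_features // 2**k for k = 1..n_halvings, then the extras."""
    schedule = [n_features]
    schedule += [n_features // 2 ** k for k in range(1, n_halvings + 1)]
    schedule += list(extra)
    return schedule


components = halving_schedule()
print(components)
```

Each value in this list is one target dimensionality on which the network is trained and cross-validated.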

<BR CLEAR="left" />

PS: I could not include t-SNE because it handles higher output dimensions badly: scikit-learn's default Barnes-Hut method only accepts fewer than 4 components.

Let's begin!
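A small sketch of that limitation, on made-up random data rather than MNIST. It assumes a recent scikit-learn; the `"exact"` method is shown only as a possible workaround, and it scales quadratically with the number of samples, so it would be slow on the 10k digits used here.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 10))  # toy data, not MNIST

# The default method, "barnes_hut", rejects n_components >= 4.
try:
    TSNE(n_components=5, perplexity=10, random_state=0).fit_transform(X_demo)
    barnes_hut_ok = True
except ValueError:
    barnes_hut_ok = False

# The exact method has no such limit, at a much higher computational cost.
embedding = TSNE(
    n_components=5, method="exact", perplexity=10, random_state=0
).fit_transform(X_demo)
print(barnes_hut_ok, embedding.shape)
```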

# Contents

* [<font size=4>Libraries for fun</font>](#1)
* [<font size=4>PCA</font>](#2)
* [<font size=4>IncrementalPCA</font>](#3)
* [<font size=4>NMF</font>](#4)
* [<font size=4>KernelPCA</font>](#5)
* [<font size=4>Isomap</font>](#6)
* [<font size=4>TruncatedSVD</font>](#7)
* [<font size=4>Gaussian Random Projection</font>](#8)
* [<font size=4>FastICA</font>](#9)
* [<font size=4>MiniBatch Dictionary Learning</font>](#10)
* [<font size=4>Sparse Random Projection</font>](#11)

# Libraries for fun <a id="1"></a>

In [None]:
# !jupyter labextension install jupyterlab-plotly

import numpy as np 
import pandas as pd 

# Use tensorflow.keras consistently instead of mixing it with standalone keras.
# keras.wrappers.scikit_learn.KerasClassifier is not used below and was removed
# from recent Keras releases, so it is no longer imported.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.utils import to_categorical

from sklearn.model_selection import cross_val_score
from collections import defaultdict
from sklearn.model_selection import StratifiedKFold
from tqdm.notebook import tqdm
import seaborn as sns
from sklearn.linear_model import LogisticRegression

import plotly.express as px
import matplotlib.pyplot as plt
import plotly.offline as pyo

from sklearn.manifold import LocallyLinearEmbedding
from sklearn.decomposition import PCA
from sklearn.decomposition import IncrementalPCA
from sklearn.decomposition import KernelPCA
from sklearn.decomposition import SparsePCA
from sklearn.decomposition import NMF
from sklearn.manifold import Isomap
from sklearn.manifold import TSNE
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import FastICA
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.random_projection import SparseRandomProjection

from sklearn.datasets import make_classification
from xgboost import XGBClassifier
import lightgbm as lgb

import gc

pyo.init_notebook_mode()

train = pd.read_csv('../input/digit-recognizer/train.csv')

train = train.sample(10000).reset_index(drop=True)  # use a random sample of 10k digits to speed things up

train.loc[:,'pixel0':] = train.loc[:,'pixel0':]/255

X_ = train.loc[:,'pixel0':]
y = train['label']

components = [784 ,int(785/2) ,int(785/4) ,int(785/8), int(785/16), int(785/32), int(785/64), int(785/128), int(785/256), 2]
# components = components[::-1]
batch_size = 501
epochs = 17
neurons = 958
optimizer = 'Adam'
random_state=42

In [None]:
def digit_show (df_train, number, name) :
    plt.figure(figsize=(20, 15))
    j = 1
    for i in range(10) :
        if number == 784 :
            plt.subplot(1,10,j)
            plt.gca().set_title('Image Reconstruction from Compressed Representation', fontsize=16)
        else :
            plt.subplot(1,10,j)
        j +=1
        plt.imshow(df_train[df_train['label'] == i].head(1).drop(labels = ["label"],axis = 1).values.reshape(28, 28), cmap='gray', interpolation='none')
        if number == 784 :
            plt.title("Original : {}".format(i))
        else :
            plt.title("{} {} Digit: {}".format(name, number, i))
    plt.tight_layout()


In [None]:
# This function takes an algorithm as input, uses it for dimensionality reduction, trains a NN on the result, and shows the 2D and 3D projections plus the accuracy.

def dimensionality_reduction_octagone(alg, name) :
    dim2 = []
    dim3 = []
    result = []
    names = []
    results = defaultdict(list)
    
    for i in tqdm(components) :
        if i == 784 :
            X= X_.values
        else :
            if name == 'KernelPCA' :
                alg_ = alg(n_components=i, fit_inverse_transform = True).fit(X_)
                X = alg_.transform(X_)
            else :
                alg_ = alg(n_components=i).fit(X_)
                X = alg_.transform(X_)
            
        if i == 2 :
            dim2 = X
        elif i ==3 :
            dim3 = X
            
        kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_state)
        cvscores = []
        for train, test in kfold.split(X, y):
            model = Sequential()
            model.add(Dense(neurons, input_dim=i, activation='relu'))
            model.add(Dropout(0.2))
            model.add(Dense(neurons, activation='relu'))
            model.add(Dropout(0.2))
            model.add(Dense(10, activation='softmax'))
            model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
            model.fit(X[train], to_categorical(y[train],num_classes = 10), epochs=epochs, batch_size=batch_size, verbose = 0)
            scores = model.evaluate(X[test], to_categorical(y[test],num_classes = 10), verbose=0)
            cvscores.append(scores[1] * 100)
        results[name + ' ' + str(i)].append(cvscores)
        
        if name not in ('Isomap', 'GaussianRandomProjection', 'MiniBatchDictionaryLearning', 'SparseRandomProjection') :
            if i == 784 :
                digit = pd.merge(pd.DataFrame(X), pd.DataFrame(y), left_index=True, right_index=True)
                digit_show(digit, i, name)
            else :
                digit = pd.merge(pd.DataFrame(alg_.inverse_transform(X)), pd.DataFrame(y), left_index=True, right_index=True)
                digit_show(digit, i, name)
            
    
    plt.figure(figsize=(30,10))
    
    for key, value in results.items():
        if key == name + ' ' + '784' :
            names.append('Original')
        else :
            names.append(key)
            
        result.append(value)

    plt.xticks(rotation=45)
    ax = sns.boxplot(x=names, y= result)
    ax.set(xlabel= name + ' Components (Dimensions)', ylabel='Accuracy %')
    ax.set_title('Accuracy Progression from '+ name + ' Compressed Representation')
   

        
    final_2D = pd.merge(pd.DataFrame(dim2), pd.DataFrame(y), left_index=True, right_index=True)
    final_2D.columns = ['X','Y','Label']
    final_2D.Label = final_2D.Label.astype('str')

    fig1 = px.scatter(final_2D, x='X', y='Y', color="Label", title= name + " 2 Components")
#     fig1.show()
    
    final_3D = pd.merge(pd.DataFrame(dim3), pd.DataFrame(y), left_index=True, right_index=True)
    final_3D.columns = ['X','Y','Z','Label']
    final_3D.Label = final_3D.Label.astype('str')

    fig2 = px.scatter_3d(final_3D, x='X', y='Y', z= 'Z', color="Label", size_max=0.2, title= name + " 3 Components")
#     fig2.update_traces(marker=dict(size=2,
#                               line=dict(width=0,
#                                         color='DarkSlateGrey')),
#                   selector=dict(mode='markers'))
#     fig2.show()
    
    
    fig1.show()
    fig2.show()
    gc.collect()

# PCA <a id="2"></a>

<img src="https://i.postimg.cc/WzsdN5C9/Principal-Component-Analysis-second-principal.gif" align="right" width="500" height="400">

<a href="https://builtin.com/data-science/step-step-explanation-principal-component-analysis">Principal Component Analysis</a>, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity, because smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster without extraneous variables to process.

So to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.

<BR CLEAR="left" />
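To make the idea concrete, here is a minimal NumPy sketch of PCA through the SVD of the centered data. It is an illustration on made-up data, not what `sklearn.decomposition.PCA` does internally in every detail; the helper names `pca_fit`, `pca_transform` and `pca_inverse_transform` are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the data mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value (that is, decreasing variance).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, axes):
    """Project X onto the principal axes."""
    return (X - mean) @ axes.T


def pca_inverse_transform(Z, mean, axes):
    """Map reduced coordinates back to the original space."""
    return Z @ axes + mean


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated toy data

errors = []
for k in (2, 5, 10):
    mean, axes = pca_fit(X_demo, k)
    X_rec = pca_inverse_transform(pca_transform(X_demo, mean, axes), mean, axes)
    errors.append(np.mean((X_demo - X_rec) ** 2))
print(errors)
```

The reconstruction error shrinks as more components are kept, and vanishes (up to rounding) when all 10 are kept. This is the same transform and inverse-transform round trip used for the digit reconstructions in this notebook.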

In [None]:
dimensionality_reduction_octagone(PCA, 'PCA')

* PCA decreases accuracy at 392 components, but accuracy then increases from 196 down to 24 components. That is interesting: it means PCA can improve performance while reducing the number of components.

* We can easily recognize the digits with 49 or more components.

* The 2D and 3D representations are not that bad.

# IncrementalPCA <a id="3"></a>

<a href="https://scikit-learn.org/stable/modules/decomposition.html#incrementalpca">IncrementalPCA</a>
<p>The PCA object is very useful, but has certain limitations for
large datasets. The biggest limitation is that PCA only supports
batch processing, which means all of the data to be processed must fit in main
memory. The IncrementalPCA object uses a different form of
processing and allows for partial computations which almost
exactly match the results of PCA while processing the data in a
minibatch fashion. IncrementalPCA makes it possible to implement
out-of-core Principal Component Analysis either by:</p>

<blockquote>
<div><ul class="simple">
<li><p>Using its partial_fit method on chunks of data fetched sequentially
from the local hard drive or a network database.</p></li>
<li><p>Calling its fit method on a sparse matrix or a memory mapped file using
numpy.memmap</p></li>
</ul>
</div></blockquote>

<p>IncrementalPCA only stores estimates of component and noise variances,
in order to update explained_variance_ratio_ incrementally. This is why
memory usage depends on the number of samples per batch, rather than the
number of samples to be processed in the dataset.</p>

<p>As in PCA, IncrementalPCA centers but does not scale the
input data for each feature before applying the SVD.</p>
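The `partial_fit` route described above can be sketched as follows. The random array stands in for data that would really be read chunk by chunk from disk or a database; the sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 20))  # stand-in for a dataset too large for memory

ipca = IncrementalPCA(n_components=5)
# Feed the data chunk by chunk; each chunk needs at least n_components rows.
for chunk in np.array_split(X_demo, 10):
    ipca.partial_fit(chunk)

X_reduced = ipca.transform(X_demo)
print(X_reduced.shape, ipca.n_samples_seen_)
```

Note that the notebook itself simply calls `fit` on the full 10k sample, which fits in memory, so this is only relevant for larger datasets.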

In [None]:
dimensionality_reduction_octagone(IncrementalPCA, 'IncrementalPCA')

* Same as PCA: IncrementalPCA decreases accuracy at 392 components, but accuracy then increases from 196 down to 24 components. That is interesting: it means IncrementalPCA can improve performance while reducing the number of components, but there is generally more variance across folds.

* We can easily recognize the digits with 49 or more components.

* The 2D and 3D representations are not that bad, same as PCA.

# NMF <a id="4"></a>

<a href="https://en.wikipedia.org/wiki/Non-negative_matrix_factorization#:~:text=Non%2Dnegative%20matrix%20factorization%20(NMF,matrices%20have%20no%20negative%20elements.">Non-negative matrix factorization</a>
<p>Non-negative matrix factorization (<b>NMF</b> or <b>NNMF</b>), also <b>non-negative matrix approximation</b>, is a group of algorithms in multivariate analysis and linear algebra where a matrix <span class="texhtml"><b>V</b></span> is factorized into (usually) two matrices <span class="texhtml"><b>W</b></span> and <span class="texhtml"><b>H</b></span>, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
</p>
<br />
<img src="https://i.postimg.cc/qRLpWxJs/wiki.png" width="500" height="400">
<br />
<br />
<p>NMF finds applications in such fields as astronomy, computer vision, document clustering, chemometrics, audio signal processing, recommender systems, and bioinformatics.</p>
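One common way to approximate the factorization numerically is the multiplicative update rule of Lee and Seung. The sketch below is a minimal NumPy version on made-up data; it is not the solver scikit-learn's `NMF` uses by default, and the function name `nmf_multiplicative` is hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative V (m x n) as W (m x k) @ H (k x n), all non-negative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, n_components)) + eps
    H = rng.random((n_components, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: every factor is non-negative, so W and H
        # can never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(1)
V_demo = rng.random((30, 4)) @ rng.random((4, 20))  # non-negative, exactly rank 4
W, H = nmf_multiplicative(V_demo, n_components=4)
rel_error = np.linalg.norm(V_demo - W @ H) / np.linalg.norm(V_demo)
print(rel_error)
```

Because the reduced coordinates in `W` are never negative, the embedded points can only live in the positive orthant.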

In [None]:
dimensionality_reduction_octagone(NMF, 'NMF')

* Below 24 components accuracy decreases slowly, and overall it performs badly.

* The digits are hard to recognize below 12 components.

* Strange 2D and 3D representations: the points look like they are stuck at the border.

# KernelPCA <a id="5"></a>

<a href="https://en.wikipedia.org/wiki/Kernel_principal_component_analysis">Kernel principal component analysis (kernel PCA)</a>
<p>In the field of multivariate statistics, kernel PCA is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space.
</p>
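A minimal NumPy sketch of the idea, using an RBF kernel on made-up data. The kernel choice, `gamma` value and the function name `rbf_kernel_pca` are assumptions for illustration; scikit-learn's `KernelPCA` defaults to a linear kernel, which is what the call in this notebook uses.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Project the rows of X onto the top kernel principal components (RBF kernel)."""
    # Pairwise squared Euclidean distances, then the RBF (Gaussian) kernel matrix.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Centering the kernel matrix centers the data in the implicit feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]

    # Coordinates of the training points: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
Z = rbf_kernel_pca(X_demo, n_components=2, gamma=0.5)
print(Z.shape)
```

Only the kernel matrix is ever decomposed; the high-dimensional feature space is never built explicitly.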

In [None]:
dimensionality_reduction_octagone(KernelPCA, 'KernelPCA')

* KernelPCA is very similar to PCA in performance; the progression of accuracy is nearly the same. This is not surprising, since scikit-learn's KernelPCA uses a linear kernel by default.

* The digits are all clear, but I think there is a bug in the inverse_transform function. In addition, to reverse the transform you have to pass `fit_inverse_transform=True` when constructing the estimator, which is a bit awkward.

* Nearly the same 2D and 3D representations as PCA.

# Isomap <a id="6"></a>

<p> <a href="https://en.wikipedia.org/wiki/Isomap">Isomap</a> is a nonlinear dimensionality reduction method. It is one of several widely used low-dimensional embedding methods. Isomap is used for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points.  The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point’s neighbors on the manifold. Isomap is highly efficient and generally applicable to a broad range of data sources and dimensionalities.
</p>
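The three steps behind Isomap (neighbor graph, shortest-path geodesic distances, classical MDS) can be sketched in plain NumPy. This is a small illustrative version with a hypothetical function name `isomap_sketch`; it uses a dense Floyd-Warshall loop, so it is only practical for a few hundred points and assumes the neighbor graph is connected.

```python
import numpy as np


def isomap_sketch(X, n_neighbors, n_components):
    n = X.shape[0]
    # 1. Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # 2. k-nearest-neighbor graph: keep only edges to the nearest neighbors.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
    graph = np.minimum(graph, graph.T)  # make the graph symmetric

    # 3. Geodesic distances = shortest paths in the graph (Floyd-Warshall).
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])

    # 4. Classical MDS on the geodesic distances.
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ (graph ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy manifold: points along a half circle, a 1D curve embedded in 2D.
theta = np.linspace(0.0, np.pi, 40)
X_arc = np.column_stack([np.cos(theta), np.sin(theta)])
Z = isomap_sketch(X_arc, n_neighbors=2, n_components=1)
corr = abs(np.corrcoef(Z[:, 0], theta)[0, 1])
print(corr)
```

On this toy curve the single Isomap coordinate follows the position along the arc almost perfectly, which is the "unrolling" behavior the method is designed for. The shortest-path step is also the main reason it gets slow on larger samples.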

In [None]:
dimensionality_reduction_octagone(Isomap, 'Isomap')

* Isomap takes a long time to finish.

* It performs well in 2D and 3D.

* There is no way to reconstruct the digits.

* Relatively large variance in accuracy and a big drop between 6 and 3 components, but overall it performs well, nearly the same as PCA.

# TruncatedSVD <a id="7"></a>

<p>In linear algebra, the <b>singular value decomposition</b> (<b>SVD</b>) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any $m \times n$ matrix via an extension of the polar decomposition.
</p>
<p>Specifically, the singular value decomposition of an $m \times n$ real or complex matrix $\mathbf{M}$ is a factorization of the form $\mathbf{U \Sigma V^{*}}$, where $\mathbf{U}$ is an $m \times m$ real or complex unitary matrix, $\mathbf{\Sigma}$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $\mathbf{V}$ is an $n \times n$ real or complex unitary matrix. If $\mathbf{M}$ is real, $\mathbf{U}$ and $\mathbf{V^{T}} = \mathbf{V^{*}}$ are real orthogonal matrices.
</p>
<p>Mathematical applications of the SVD include computing the pseudoinverse, matrix approximation, and determining the rank, range, and null space of a matrix. The SVD is also extremely useful in all areas of science, engineering, and statistics, such as signal processing, least squares fitting of data, and process control.
</p>

<br />
<img src="https://i.postimg.cc/9QWByG06/1024px-Singular-Value-Decomposition-svg.png" width="500" height="400">
<br />
<br />


<br />
<br />
<a>Truncated_SVD</a> 

<dl><dd>$\tilde{\mathbf{M}} = \mathbf{U}_{t} \boldsymbol{\Sigma}_{t} \mathbf{V}_{t}^{*}$</dd></dl>

<p>Only the <i>t</i> column vectors of <i>U</i> and <i>t</i> row vectors of <i>V*</i> corresponding to the <i>t</i> largest singular values Σ<sub><i>t</i></sub> are calculated. The rest of the matrix is discarded. This can be much quicker and more economical than the compact SVD if <i>t</i>≪<i>r</i>. The matrix <i>U</i><sub><i>t</i></sub> is thus <i>m</i>×<i>t</i>, Σ<sub><i>t</i></sub> is <i>t</i>×<i>t</i> diagonal, and <i>V</i><sub><i>t</i></sub>* is <i>t</i>×<i>n</i>.
</p>

<p>Of course the truncated SVD is no longer an exact decomposition of the original matrix <i>M</i>, but as discussed above, the approximate matrix $\tilde{\mathbf{M}}$ is in a very useful sense the closest approximation to <i>M</i> that can be achieved by a matrix of rank&nbsp;<i>t</i>.
</p>
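The "closest approximation" statement can be checked numerically. The sketch below, on a made-up matrix, computes the full SVD with NumPy and then truncates it, which is simpler but less economical than a solver that only computes the <i>t</i> largest singular triplets; the helper name `truncated_svd` is hypothetical.

```python
import numpy as np


def truncated_svd(M, t):
    """Keep the t largest singular triplets of M (computed from the full SVD)."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)  # s is sorted in descending order
    return U[:, :t], s[:t], Vh[:t, :]


rng = np.random.default_rng(0)
M_demo = rng.normal(size=(8, 6))
t = 3

U_t, s_t, Vh_t = truncated_svd(M_demo, t)
M_approx = U_t @ np.diag(s_t) @ Vh_t  # rank-t approximation of M_demo

# The Frobenius error equals the norm of the discarded singular values.
all_singular_values = np.linalg.svd(M_demo, compute_uv=False)
error = np.linalg.norm(M_demo - M_approx)
expected = np.sqrt(np.sum(all_singular_values[t:] ** 2))
print(error, expected)
```

Unlike PCA, this decomposition is applied to the data without centering it first, which is the main practical difference between `TruncatedSVD` and `PCA` in scikit-learn.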

In [None]:
dimensionality_reduction_octagone(TruncatedSVD, 'TruncatedSVD')

* Below 12 components accuracy decreases slowly, with large variance at the end.

* The digits are hard to recognize below 49 components.

* Relatively bad representation in 2D and 3D: the digits seem intermingled.

# Gaussian Random Projection <a id="8"></a>

In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. Random projection methods are known for their power, simplicity, and low error rates when compared to other methods. According to experimental results, random projection preserves distances well, but empirical results are sparse. Random projections have been applied to many natural language tasks under the name random indexing.

<a href="https://en.wikipedia.org/wiki/Random_projection#Gaussian_random_projection">Gaussian_random_projection</a>

<p>The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from $S^{d-1}$. The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, R is an orthogonal matrix (the inverse of its transpose), and the following properties are satisfied:
</p>

<ul><li>Spherical symmetry: For any orthogonal matrix $A \in O(d)$, RA and R have the same distribution.</li>
<li>Orthogonality: The rows of R are orthogonal to each other.</li>
<li>Normality: The rows of R are unit-length vectors.</li></ul>
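
The row-by-row construction described above can be sketched with a QR decomposition of a Gaussian matrix, which orthonormalizes the random directions in one step. This is only an illustration: the helper name `random_orthonormal_rows` and the sizes are assumptions, and it is not what sklearn's `GaussianRandomProjection` does internally (sklearn draws independent Gaussian entries without orthogonalizing them).

```python
import numpy as np

def random_orthonormal_rows(k, d, seed=0):
    """Return a k x d matrix whose rows are random orthonormal vectors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((d, k))  # d x k Gaussian matrix
    q, _ = np.linalg.qr(gaussian)           # q: d x k with orthonormal columns
    return q.T                              # rows are unit length and mutually orthogonal

R = random_orthonormal_rows(k=3, d=10)
gram = R @ R.T  # should be close to the 3 x 3 identity

# Project one d-dimensional point down to k dimensions.
x = np.ones(10)
projected = R @ x
```

Checking `gram` against the identity verifies both the orthogonality and the normality properties at once.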

In [None]:
dimensionality_reduction_octagone(GaussianRandomProjection, 'GaussianRandomProjection')

* Performs relatively badly; once the reduction goes past 49 components, accuracy decreases dramatically.

* No way to see digit reconstruction in the sklearn library.

* Relatively poor representation in 2D and 3D; the digits appear intermingled.

# FastICA <a id="9"></a>

<a href="https://en.wikipedia.org/wiki/FastICA">FastICA</a> is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration.
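
To make the fixed-point scheme more concrete, here is a minimal NumPy sketch of a one-unit FastICA iteration with the `tanh` nonlinearity, run on prewhitened data. The function names, the toy uniform sources, and the mixing matrix `A` are assumptions made for illustration; sklearn's `FastICA` is more complete (several components, other nonlinearities, convergence handling).

```python
import numpy as np

def whiten(X):
    """Center X and rescale it so that its covariance is the identity."""
    X = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    return (X @ eigvec) / np.sqrt(eigval)

def fastica_one_unit(Z, n_iter=200, tol=1e-10, seed=0):
    """Estimate one unmixing direction w on whitened data Z (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = Z @ w
        g = np.tanh(wx)          # nonlinearity g
        g_prime = 1.0 - g ** 2   # its derivative g'
        # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w, then renormalize.
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < tol  # sign flips are allowed
        w = w_new
        if converged:
            break
    return w

# Toy data: two independent non-Gaussian (uniform) sources, linearly mixed.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(5000, 2))
A = np.array([[1.0, 0.5], [0.3, 1.0]])  # assumed mixing matrix
Z = whiten(S @ A.T)

w = fastica_one_unit(Z)
y = Z @ w  # recovered component
corr = [abs(np.corrcoef(y, S[:, j])[0, 1]) for j in range(2)]
```

If the iteration behaves as expected, one entry of `corr` should be close to 1, meaning the recovered component matches one of the original sources up to sign and scale.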

In [None]:
dimensionality_reduction_octagone(FastICA, 'FastICA')

* Average separation in 2D and 3D.

* Digits are hardly recognizable once the reduction goes past 49 components.

* Big accuracy drop at 392 components, then it recovers before decreasing sharply; overall poor performance.

# MiniBatch Dictionary Learning <a id="10"></a>

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.MiniBatchDictionaryLearning.html">MiniBatch Dictionary Learning</a>

<p>MiniBatchDictionaryLearning implements a faster, but less accurate
version of the dictionary learning algorithm that is better suited for large
datasets.</p>
<p>By default, MiniBatchDictionaryLearning divides the data into
mini-batches and optimizes in an online manner by cycling over the mini-batches
for the specified number of iterations. However, at the moment it does not
implement a stopping condition.</p>
<p>The estimator also implements partial_fit, which updates the dictionary by
iterating only once over a mini-batch. This can be used for online learning
when the data is not readily available from the start, or for when the data
does not fit into the memory.</p>
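
As a hedged sketch of the `partial_fit` usage described above, the snippet below feeds mini-batches one at a time, as if the data arrived in a stream. The synthetic array `data`, the batch size, and the number of components are assumptions chosen to keep the example fast; they are not the settings used in this notebook.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
data = rng.random((200, 16))  # synthetic stand-in for flattened images

dico = MiniBatchDictionaryLearning(n_components=5, batch_size=50, random_state=0)

# Online learning: each call updates the dictionary using a single mini-batch.
for start in range(0, len(data), 50):
    dico.partial_fit(data[start:start + 50])

codes = dico.transform(data[:10])  # sparse codes, shape (10, 5)
approx = codes @ dico.components_  # approximate reconstruction, shape (10, 16)
```

Multiplying the sparse codes by `components_` gives an approximate reconstruction by hand, which may be useful when looking at how much of each digit survives the reduction.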

In [None]:
dimensionality_reduction_octagone(MiniBatchDictionaryLearning, 'MiniBatchDictionaryLearning')

* Flat representation of the digits in 2D and 3D, but still poor.

* No way to get digit reconstruction in the sklearn library.

* Big accuracy drop at 196 components, then it recovers well before decreasing sharply; overall poor performance.

# Sparse Random Projection <a id="11"></a>

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html">Sparse Random Projection</a>
<p>Sparse random matrix is an alternative to dense random
projection matrix that guarantees similar embedding quality while being
much more memory efficient and allowing faster computation of the
projected data.</p>

<p>If we denote s = 1 / density, the components of the random matrix are
drawn from:</p>

<blockquote>
<div><ul class="simple">
<li><p>-sqrt(s) / sqrt(n_components) with probability 1 / (2s)</p></li>
<li><p>0 with probability 1 - 1 / s</p></li>
<li><p>+sqrt(s) / sqrt(n_components) with probability 1 / (2s)</p></li>
</ul>
</div></blockquote>
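
The three-valued distribution above can be sampled directly with NumPy. This is a minimal sketch for intuition, not sklearn's implementation (which builds a sparse matrix); the helper name `sparse_random_matrix` and the sizes are assumptions.

```python
import numpy as np

def sparse_random_matrix(n_components, n_features, density, seed=0):
    """Draw a dense array following the sparse random projection distribution (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    s = 1.0 / density
    scale = np.sqrt(s) / np.sqrt(n_components)
    return rng.choice(
        [-scale, 0.0, scale],
        size=(n_components, n_features),
        p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)],
    )

n_components, n_features = 49, 784
density = 1 / np.sqrt(n_features)  # a common choice: 1 / sqrt(n_features)
R = sparse_random_matrix(n_components, n_features, density)

nonzero_fraction = np.count_nonzero(R) / R.size  # should be close to density
scale = np.sqrt(1 / density) / np.sqrt(n_components)
```

Each entry has variance 1 / n_components, so the squared norm of a projected vector matches the original in expectation, even though most entries of the matrix are zero.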

In [None]:
dimensionality_reduction_octagone(SparseRandomProjection, 'SparseRandomProjection')

* Poor representation of the digits in 2D and 3D.

* No way to get digit reconstruction in the sklearn library.

* High accuracy at the beginning, then a big decrease at 49 components and very poor performance at 3 and 2 components.

# Conclusion <a id="12"></a>

We can clearly distinguish the good approaches from the bad ones, even though they all score nearly the same and the differences are subtle. If I have to rank them from best to worst, I would do it like this:

1. Isomap
2. TruncatedSVD
3. Kernel PCA
4. PCA 
5. NMF
6. Mini Batch Dictionary Learning
7. FastICA
8. Gaussian Random Projection
9. Sparse Random Projection


It is true that they were tested only on MNIST data and with only one fixed neural network, but I have also tested them with xgb and lgbm and the results were nearly the same. In my opinion, MNIST digits are distinct enough from each other to evaluate an algorithm on them.

What do you think ?

In my opinion we can extrapolate from this fight and trust it when choosing an algorithm in the future.

I hope you enjoyed it. Please leave me an **upvote**, I would greatly appreciate it.

<img src="https://i.postimg.cc/6QsZhjs3/Getty-Images-630157424.jpg" alt="Epic Fight">