# Experiment MNIST - TSNE

This experiment tries to replicate the result reported in the paper https://arxiv.org/pdf/1906.00722.pdf, where dimensionality reduction was applied to the MNIST dataset and values of 0.946 for trustworthiness and 0.938 for continuity were obtained.

## Basic imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import tensorflow as tf
import sys
import numpy as np
import pandas as pd
from pathlib import Path
print(sys.path)
sys.path.append("../../")

2022-10-09 19:10:07.317684: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-09 19:10:07.556284: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-09 19:10:07.556326: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-09 19:10:07.609235: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-09 19:10:08.890641: W tensorflow/stream_executor/pla

['/home/darlinne.soto/librep-hiaac/experiments/Topological_ae', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']


## Loading the dataset

The dataset is temporarily loaded with Keras because there is no access to a local MNIST dataset. Future versions will load the dataset properly from a local copy.

In [3]:
from librep.transforms import TSNE
from librep.transforms import UMAP
from librep.datasets.multimodal import TransformMultiModalDataset, ArrayMultiModalDataset, WindowedTransform
from librep.metrics.dimred_evaluator import DimensionalityReductionQualityReport, MultiDimensionalityReductionQualityReport
from librep.datasets.har.loaders import MNISTView

In [4]:
# loader = MNISTView("../../data/old-views/MNIST/default/", download=False)
# train_val_mnist, test_mnist = loader.load(concat_train_validation=True)

In [5]:
# train_val_mnist, test_mnist

In [6]:
# train_val_pd_X = train_val_mnist.data.iloc[:,1:]
# train_val_pd_Y = train_val_mnist.data.iloc[:,0]
# test_pd_X = test_mnist.data.iloc[:,1:]
# test_pd_Y = test_mnist.data.iloc[:,0]

In [7]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
assert x_train.shape == (60000, 28, 28)
assert x_test.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_test.shape == (10000,)
train_val_pd_X = np.reshape(x_train, (-1, 28*28))
train_val_pd_Y = y_train
test_pd_X = np.reshape(x_test, (-1, 28*28))
test_pd_Y = y_test

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [8]:
test_pd_X.shape

(10000, 784)

In [9]:
# # Code to create new view for mnist
# columns = ['pixel-' + str(val) for val in range(784)]
# columns.insert(0, 'label')
# train_val_mnist.data.columns = columns
# train_val_mnist.data.to_csv('DATA_MNIST.csv', index=False)

## Reduce with TSNE

TSNE is applied to the MNIST train and test sets. For now it is applied directly to the NumPy arrays, and each set is embedded independently with its own `fit_transform` call. `DimensionalityReductionQualityReport` evaluates a list of datasets, where the first is the high-dimensional dataset and the second is its low-dimensional counterpart. Future versions of `DimensionalityReductionQualityReport` will evaluate a set of low-dimensional datasets to plot metrics over dimensions.
The evaluation on the 60000 training points is commented out until a more powerful machine is available.
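Until the full 60000-point evaluation is feasible, one option is to evaluate the metrics on a random subset. The sketch below is only an illustration: `subsample_pair` is a hypothetical helper (not part of librep), and the stand-in arrays replace the real MNIST data. It draws the same rows from the high- and low-dimensional arrays so that the pairing between both representations is preserved. Whether the `sampling_threshold` argument of `DimensionalityReductionQualityReport` does something equivalent is not verified here.

```python
import numpy as np


def subsample_pair(X_high, X_low, n_samples, seed=0):
    """Hypothetical helper: pick the same random rows from both arrays."""
    if X_high.shape[0] != X_low.shape[0]:
        raise ValueError("both arrays must have the same number of rows")
    rng = np.random.default_rng(seed)
    n = min(n_samples, X_high.shape[0])
    # Sampling without replacement keeps every selected point unique.
    idx = rng.choice(X_high.shape[0], size=n, replace=False)
    return X_high[idx], X_low[idx], idx


# Stand-in data: 1000 points in 784 dimensions and a fake 2-D "embedding"
# made of the first two columns, so row correspondence is easy to check.
demo_high = np.random.default_rng(1).normal(size=(1000, 784))
demo_low = demo_high[:, :2]

sub_high, sub_low, idx = subsample_pair(demo_high, demo_low, n_samples=100)
print(sub_high.shape, sub_low.shape)
```

In this notebook the call would presumably look like `subsample_pair(train_val_pd_X, train_val_pd_X_reduced, 10000)`, with the two returned arrays passed to `metrics_reporter.evaluate`. Note that this only reduces the cost of the metrics; the reducer itself still runs on all 60000 points.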

In [10]:
tsne_reducer = TSNE()
train_val_pd_X_reduced = tsne_reducer.fit_transform(train_val_pd_X)
test_pd_X_reduced = tsne_reducer.fit_transform(test_pd_X)



In [11]:
# train_x = np.array(train_val_mnist.data.iloc[:,1:])
# train_y = np.array(train_val_mnist.data.iloc[:,0])
# test_x = np.array(test_mnist.data.iloc[:,1:])
# test_y = np.array(test_mnist.data.iloc[:,0])

In [12]:
# mnist_dataset_train = ArrayMultiModalDataset(X=train_x, y=train_y, window_slices=[(0, 28*28)], 
#                                              window_names=["px"])
# mnist_dataset_test = ArrayMultiModalDataset(X=test_x, y=test_y, window_slices=[(0, 28*28)], 
#                                              window_names=["px"])

In [13]:
# transform_tsne = TSNE()
# transformer = TransformMultiModalDataset(transforms=[transform_tsne])
# train_applied_tsne = transformer(mnist_dataset_train)
# test_applied_tsne = transformer(mnist_dataset_test)

In [14]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_train_applied_tsne = metrics_reporter.evaluate([train_val_pd_X, train_val_pd_X_reduced])
# print(metrics_train_applied_tsne)

In [15]:
metrics_reporter = DimensionalityReductionQualityReport()
metrics_test_applied_tsne = metrics_reporter.evaluate([test_pd_X, test_pd_X_reduced])
print(metrics_test_applied_tsne)

{'residual variance (pearson)': 0.8257997496975751, 'residual variance (spearman)': 0.8549140049495447, 'trustworthiness': 0.9821331207797731, 'continuity': 0.9722094855483745, 'co k nearest neighbor size': 0.411459895989599, 'local continuity meta criterion': 0.4098595759255862, 'local property': 0.5234523452345234, 'global property': 0.6590193760049603}
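
To make the two headline metrics in this report concrete, here is a minimal NumPy sketch of the usual rank-based definitions of trustworthiness and continuity. It is not librep's implementation: the function names, the neighborhood size `k=5`, and the stand-in data are assumptions for illustration, and the value of `k` used by `DimensionalityReductionQualityReport` is not stated in this notebook. The sketch builds full pairwise distance matrices, so it is only suitable for small samples.

```python
import numpy as np


def neighbor_ranks(X):
    """ranks[i, j] = position of j when all points are sorted by distance to i.

    The point itself gets rank 0 and its nearest neighbor gets rank 1.
    """
    X = np.asarray(X, dtype=np.float64)  # avoid overflow with uint8 pixels
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(sq_dists, -np.inf)  # force each point first in its own row
    order = np.argsort(sq_dists, axis=1)
    n = X.shape[0]
    ranks = np.empty((n, n), dtype=np.int64)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def trustworthiness(X_high, X_low, k=5):
    """Penalize points that are k-neighbors in X_low but not in X_high."""
    n = np.asarray(X_high).shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_high = neighbor_ranks(X_high)
    r_low = neighbor_ranks(X_low)
    # Intruders: inside the low-dimensional neighborhood, outside the original one.
    intruders = (r_low >= 1) & (r_low <= k) & (r_high > k)
    penalty = np.sum((r_high - k)[intruders])
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


def continuity(X_high, X_low, k=5):
    """Same penalty with the roles of the two spaces swapped."""
    return trustworthiness(X_low, X_high, k=k)


# Stand-in data: a random cloud and an unrelated random 2-D "embedding".
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(200, 10))
demo_Y = rng.normal(size=(200, 2))
print(trustworthiness(demo_X, demo_Y), continuity(demo_X, demo_Y))
```

Both scores equal 1 when the neighborhoods are perfectly preserved and drop as neighbors are invented (trustworthiness) or lost (continuity) by the embedding.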


## Reduce with UMAP

In [16]:
umap_reducer = UMAP()
train_val_pd_X_reduced = umap_reducer.fit_transform(train_val_pd_X)
test_pd_X_reduced = umap_reducer.fit_transform(test_pd_X)

In [17]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_train_applied_umap = metrics_reporter.evaluate([train_val_pd_X, train_val_pd_X_reduced])
# print(metrics_train_applied_umap)

In [18]:
metrics_reporter = DimensionalityReductionQualityReport()
# Named after UMAP so the result is not confused with the TSNE report above.
metrics_test_applied_umap = metrics_reporter.evaluate([test_pd_X, test_pd_X_reduced])
print(metrics_test_applied_umap)

{'residual variance (pearson)': 0.8404193225039035, 'residual variance (spearman)': 0.8672144754557534, 'trustworthiness': 0.961414842651249, 'continuity': 0.9738931377110465, 'co k nearest neighbor size': 0.28272827282728275, 'local continuity meta criterion': 0.28112795276326996, 'local property': 0.4379871410880463, 'global property': 0.6596151426648349}
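
The two reports printed in this notebook can be put next to the reference values quoted in the introduction. The snippet below is a small sketch that copies the trustworthiness and continuity values from the outputs above (10000 test points) and prints the difference to the paper's numbers. The comparison is only indicative, since the paper's reduction method, sample, and neighborhood size may differ from what is used here.

```python
# Reference values quoted in the introduction (arXiv:1906.00722).
paper = {"trustworthiness": 0.946, "continuity": 0.938}

# Values copied from the two reports printed in this notebook.
results = {
    "TSNE": {"trustworthiness": 0.9821331207797731,
             "continuity": 0.9722094855483745},
    "UMAP": {"trustworthiness": 0.961414842651249,
             "continuity": 0.9738931377110465},
}


def differences(results, reference):
    """Return score minus reference for every method and metric."""
    return {
        method: {name: scores[name] - reference[name] for name in reference}
        for method, scores in results.items()
    }


diffs = differences(results, paper)
for method, per_metric in diffs.items():
    for name, delta in per_metric.items():
        print(f"{method:5s} {name:16s} {results[method][name]:.3f} "
              f"(paper {paper[name]:.3f}, diff {delta:+.3f})")
```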


In [19]:
# transform_umap = UMAP()
# transformer = TransformMultiModalDataset(transforms=[transform_umap])
# train_applied_umap = transformer(mnist_dataset_train)
# test_applied_umap = transformer(mnist_dataset_test)

In [20]:
# metrics_reporter = DimensionalityReductionQualityReport(sampling_threshold=60000)
# metrics_train_applied_umap = metrics_reporter.evaluate([mnist_dataset_train, train_applied_umap])
# print(metrics_train_applied_umap)

In [21]:
# metrics_reporter = DimensionalityReductionQualityReport(sampling_threshold=10000)
# metrics_test_applied_umap = metrics_reporter.evaluate([mnist_dataset_test, test_applied_umap])
# print(metrics_test_applied_umap)