# Experiment MNIST - TSNE

This experiment tries to replicate the results reported in the paper https://arxiv.org/pdf/1906.00722.pdf, where dimensionality reduction was applied to the MNIST dataset and values of 0.946 for trustworthiness and 0.938 for continuity were obtained.
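
For reference, trustworthiness and continuity are commonly defined as below, following Venna and Kaski. The notation is introduced here for illustration and is not taken from this notebook or from librep: $n$ is the number of points, $k$ the neighbourhood size, $r(i,j)$ the rank of point $j$ among the neighbours of $i$ in the high-dimensional space, $\hat{r}(i,j)$ the same rank in the low-dimensional space, $U_k(i)$ the points that are among the $k$ nearest neighbours of $i$ only in the low-dimensional space, and $V_k(i)$ the points that are among them only in the high-dimensional space.

$$
T(k) = 1 - \frac{2}{nk(2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in U_k(i)} \bigl(r(i,j) - k\bigr)
$$

$$
C(k) = 1 - \frac{2}{nk(2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in V_k(i)} \bigl(\hat{r}(i,j) - k\bigr)
$$

With this normalisation both scores lie in $[0, 1]$ for $k < n/2$, and a value of 1 means that no neighbourhood was altered by the reduction.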

## Basic imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import tensorflow as tf
import sys
import numpy as np
import pandas as pd
from pathlib import Path
sys.path.append("../../")

## Loading the dataset

The dataset is temporarily loaded through Keras because there is no access to a local MNIST dataset. Later versions will load it from a local copy instead.

In [3]:
from TopoAEMetrics import MeasureCalculator
from librep.transforms import TSNE
from librep.transforms import UMAP
from librep.datasets.multimodal import TransformMultiModalDataset, ArrayMultiModalDataset, WindowedTransform
from librep.metrics.dimred_evaluator import DimensionalityReductionQualityReport, MultiDimensionalityReductionQualityReport
from librep.datasets.har.loaders import MNISTView

In [4]:
# loader = MNISTView("../../data/old-views/MNIST/default/", download=False)
# train_val_mnist, test_mnist = loader.load(concat_train_validation=True)

In [5]:
# train_val_mnist, test_mnist

In [6]:
# train_val_pd_X = train_val_mnist.data.iloc[:,1:]
# train_val_pd_Y = train_val_mnist.data.iloc[:,0]
# test_pd_X = test_mnist.data.iloc[:,1:]
# test_pd_Y = test_mnist.data.iloc[:,0]

In [7]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
assert x_train.shape == (60000, 28, 28)
assert x_test.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_test.shape == (10000,)
train_val_pd_X = np.reshape(x_train, (-1, 28*28))
train_val_pd_Y = y_train
test_pd_X = np.reshape(x_test, (-1, 28*28))
test_pd_Y = y_test

In [8]:
test_pd_X.shape

(10000, 784)

In [9]:
# # Code to create new view for mnist
# columns = ['pixel-' + str(val) for val in range(784)]
# columns.insert(0, 'label')
# train_val_mnist.data.columns = columns
# train_val_mnist.data.to_csv('DATA_MNIST.csv', index=False)

## Reduce with TSNE

TSNE is applied to the MNIST train and test sets. For now, the reducer is called directly on the arrays instead of going through a dataset transform. DimensionalityReductionQualityReport evaluates a list of datasets in which the first is the high-dimensional dataset and the second is the low-dimensional one. Future versions of DimensionalityReductionQualityReport will evaluate several low-dimensional datasets so that metrics can be plotted against the number of dimensions.
The evaluation on the 60000 training points is commented out until a more powerful machine is available.
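
Until such a machine is available, one possible workaround is to evaluate on a class-balanced subsample. The sketch below is only an illustration under stated assumptions: `stratified_indices` is a hypothetical helper that is not part of librep, it uses only the standard library, and it assumes integer labels such as the MNIST digits. Scores computed on a subsample are not directly comparable to the values reported in the paper.

```python
import random
from collections import defaultdict


def stratified_indices(labels, n_samples, seed=0):
    """Return sorted indices of roughly n_samples items, keeping each label's share.

    Hypothetical helper for illustration; labels are assumed to be integers.
    """
    rng = random.Random(seed)

    # Group the positions of the samples by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[int(label)].append(index)

    total = len(labels)
    chosen = []
    for label in sorted(by_label):
        members = by_label[label]
        # Each class contributes in proportion to its size, and at least one item.
        quota = max(1, round(n_samples * len(members) / total))
        quota = min(quota, len(members))
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)
```

In this notebook the indices could be used as `idx = stratified_indices(train_val_pd_Y, 10000)` followed by `train_val_pd_X[idx]`, so that the reducer and the metrics both run on the smaller array. Because of per-class rounding, the number of returned indices may differ slightly from `n_samples`.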

In [10]:
tsne_reducer = TSNE()
# Each split is embedded on its own: fit_transform refits the reducer on the data it receives.
train_val_pd_X_reduced = tsne_reducer.fit_transform(train_val_pd_X)
test_pd_X_reduced = tsne_reducer.fit_transform(test_pd_X)



KeyboardInterrupt: 

In [None]:
# train_x = np.array(train_val_mnist.data.iloc[:,1:])
# train_y = np.array(train_val_mnist.data.iloc[:,0])
# test_x = np.array(test_mnist.data.iloc[:,1:])
# test_y = np.array(test_mnist.data.iloc[:,0])

In [None]:
# mnist_dataset_train = ArrayMultiModalDataset(X=train_x, y=train_y, window_slices=[(0, 28*28)], 
#                                              window_names=["px"])
# mnist_dataset_test = ArrayMultiModalDataset(X=test_x, y=test_y, window_slices=[(0, 28*28)], 
#                                              window_names=["px"])

In [None]:
# transform_tsne = TSNE()
# transformer = TransformMultiModalDataset(transforms=[transform_tsne])
# train_applied_tsne = transformer(mnist_dataset_train)
# test_applied_tsne = transformer(mnist_dataset_test)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_train_applied_tsne = metrics_reporter.evaluate([train_val_pd_X, train_val_pd_X_reduced])
# print(metrics_train_applied_tsne)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_test_applied_tsne = metrics_reporter.evaluate([test_pd_X, test_pd_X_reduced])
# print(metrics_test_applied_tsne)

In [None]:
mcalculator = MeasureCalculator(train_val_pd_X, train_val_pd_X_reduced, 15)
mcalculator.trustworthiness(15)

In [None]:
mcalculator = MeasureCalculator(test_pd_X, test_pd_X_reduced, 15)
mcalculator.trustworthiness(15)
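
The two cells above query a trustworthiness score at k = 15. As a rough, definition-level sketch of what such a score measures, the pure-Python functions below compute trustworthiness and continuity from neighbour ranks. They are not the TopoAEMetrics or librep implementation; the names `trustworthiness`, `continuity` and `_rank_matrix` are illustrative, ties in distance are broken by index, and the nested loops only suit a few hundred points.

```python
import math


def _rank_matrix(points):
    """ranks[i][j] is the position of j when the other points are sorted by distance from i (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        row = [0] * n
        for position, j in enumerate(order, start=1):
            row[j] = position
        ranks.append(row)
    return ranks


def trustworthiness(high, low, k):
    """Penalise points that are k-nearest neighbours in `low` but not in `high`."""
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    high_ranks = _rank_matrix(high)
    low_ranks = _rank_matrix(low)

    penalty = 0
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            # j intrudes into the low-dimensional neighbourhood of i.
            if low_ranks[i][j] <= k and high_ranks[i][j] > k:
                penalty += high_ranks[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(high, low, k):
    """Continuity is trustworthiness with the roles of the two spaces swapped."""
    return trustworthiness(low, high, k)
```

For example, points on a line embedded in their original order keep every neighbourhood intact and score 1.0, while a shuffled embedding scores lower:

```python
high = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
ordered = [(0,), (1,), (2,), (3,), (4,), (5,)]
shuffled = [(0,), (3,), (1,), (4,), (2,), (5,)]
print(trustworthiness(high, ordered, 2), trustworthiness(high, shuffled, 2))
```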

## Reduce with UMAP

In [None]:
umap_reducer = UMAP()
train_val_pd_X_reduced = umap_reducer.fit_transform(train_val_pd_X)
test_pd_X_reduced = umap_reducer.fit_transform(test_pd_X)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_train_applied_umap = metrics_reporter.evaluate([train_val_pd_X, train_val_pd_X_reduced])
# print(metrics_train_applied_umap)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport()
# metrics_test_applied_umap = metrics_reporter.evaluate([test_pd_X, test_pd_X_reduced])
# print(metrics_test_applied_umap)

In [None]:
mcalculator = MeasureCalculator(train_val_pd_X, train_val_pd_X_reduced, 15)
mcalculator.trustworthiness(15)

In [None]:
mcalculator = MeasureCalculator(test_pd_X, test_pd_X_reduced, 15)
mcalculator.trustworthiness(15)

In [None]:
# transform_umap = UMAP()
# transformer = TransformMultiModalDataset(transforms=[transform_umap])
# train_applied_umap = transformer(mnist_dataset_train)
# test_applied_umap = transformer(mnist_dataset_test)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport(sampling_threshold=60000)
# metrics_train_applied_umap = metrics_reporter.evaluate([mnist_dataset_train, train_applied_umap])
# print(metrics_train_applied_umap)

In [None]:
# metrics_reporter = DimensionalityReductionQualityReport(sampling_threshold=10000)
# metrics_test_applied_umap = metrics_reporter.evaluate([mnist_dataset_test, test_applied_umap])
# print(metrics_test_applied_umap)