In [1]:
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt

import fiftyone.brain as fob
import fiftyone.zoo as foz
import fiftyone as fo 

fo.config.dataset_zoo_dir = "./fiftyone_data/"

In [20]:
# Load the dataset from the FiftyOne zoo
DATASETNAME = "mnist"
dataset = foz.load_zoo_dataset(DATASETNAME)
print(dataset)

Split 'train' already downloaded
Split 'test' already downloaded
Loading existing dataset 'mnist'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use


Name:        mnist
Media type:  image
Num samples: 70000
Persistent:  False
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    ground_truth:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)

In [21]:
# Select the split of the dataset we want to visualize, using its FiftyOne tag.
data_split = dataset.match_tags("test")
print(data_split)

Dataset:     mnist
Media type:  image
Num samples: 10000
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    ground_truth:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
View stages:
    1. MatchTags(tags=['test'], bool=True, all=False)


### Read in images from files in FiftyOne zoo

Read the images into a data matrix for use in computing the t-SNE embeddings.

__NOTE:__ FiftyOne uses the name ``embeddings`` for the original dataset ($X \in \mathbb{R}^{n \times d}$), __not__ for the learned, lower-dimensional embedding. It calls the learned embedding ``points``.

__NOTE:__ For this to work, first run ``load_datasets.py`` so that the datasets are prepared in ``./data`` and the individual image files are stored on disk.

In [13]:
X = np.array([
    cv2.imread(f, cv2.IMREAD_UNCHANGED).ravel() for f in data_split.values("filepath")
])
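
The cell above assumes every file can be read and that all images have the same size. As far as I know, ``cv2.imread`` returns ``None`` for an unreadable file instead of raising, so a small guard can make failures easier to spot. The sketch below is illustrative only: ``build_data_matrix`` is a hypothetical helper (not part of FiftyOne or the LapTSNE code), and it is demonstrated on synthetic arrays rather than on the actual image files.

```python
import numpy as np

def build_data_matrix(images):
    """Flatten equally sized image arrays into an (n, d) data matrix.

    Hypothetical helper for illustration; not part of any library.
    """
    rows = []
    for i, img in enumerate(images):
        if img is None:
            # An unreadable file typically shows up as None here
            raise ValueError(f"image {i} could not be read")
        rows.append(np.asarray(img).ravel())
    # np.stack raises if the flattened images differ in length
    return np.stack(rows)

# Synthetic stand-ins for three 28x28 grayscale images
fake_images = [np.full((28, 28), k, dtype=np.uint8) for k in range(3)]
X_demo = build_data_matrix(fake_images)
print(X_demo.shape)  # (3, 784)
```

In the notebook, the same helper could be fed ``cv2.imread(f, cv2.IMREAD_UNCHANGED)`` for each ``f`` in ``data_split.values("filepath")``.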

### Load in the learned embeddings (``points``) from LapTSNE code

There are three ways to obtain the learned embedding ``Xlap``:
1. Compute the embedding outside the notebook with
```
python run.py --config <YOUR CONFIG FILE>
```
and then load the learned embedding into the notebook:
```
Xlap = np.load("./results/<PATH TO LEARNED EMBEDDING>.npy")
```
2. Call the LapTSNE class directly in the notebook. ``LaplacianTSNE`` must be imported from the project code, and the hyperparameters used below (``m``, ``knn_graph``, ``perplexity``, and so on) must be defined first:
```
from run import run_experiment
Lap_TSNE = LaplacianTSNE(n_components=m, knn_graph=knn_graph, perplexity=perplexity, k_eigen=k_eigen, approx_nn=approx_nn, learning_rate=learning_rate)
Lap_TSNE._prep_graph(X)
Xlap, run_time = run_experiment(X, LapTSNE=Lap_TSNE, repulsion_kernel=repulsion_kernel, num_landmarks=num_landmarks, hat_bandwidth=hat_bandwidth)
```
3. To visualize results that have already been computed, load the saved embedding as in the second half of option 1:
```
Xlap = np.load("./results/<PATH TO LEARNED EMBEDDING>.npy")
```


__NOTE:__ You can repeat this process for several learned LapTSNE embeddings. Change the ``brain_key`` string parameter each time so that it uniquely identifies the learned embedding you want to inspect.

In [5]:
# Example: load a learned embedding already saved to file
Xlap = np.load("./results/mnist_test/mnist_test_2_30.0_0_20_50_hat_100_0.5_0.05.npy")
print(Xlap.shape)

(10000, 2)
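
Before handing a loaded embedding to FiftyOne, it can help to confirm that it lines up row-for-row with the data matrix. The following is a minimal sketch under the assumption that ``X`` has shape ``(n, d)`` and the learned embedding has shape ``(n, 2)``. ``check_points`` is a hypothetical helper introduced only for this illustration, and the arrays are random stand-ins for ``X`` and ``Xlap``.

```python
import numpy as np

def check_points(X, points, num_dims=2):
    """Check that `points` is a finite (n, num_dims) array matching X's rows.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    X = np.asarray(X)
    points = np.asarray(points)
    if points.ndim != 2 or points.shape[1] != num_dims:
        raise ValueError(f"expected shape (n, {num_dims}), got {points.shape}")
    if points.shape[0] != X.shape[0]:
        raise ValueError(
            f"row mismatch: {X.shape[0]} samples vs {points.shape[0]} points"
        )
    if not np.isfinite(points).all():
        raise ValueError("points contain NaN or infinite values")
    return points

# Random stand-ins for the data matrix and a learned 2-D embedding
rng = np.random.default_rng(0)
X_demo = rng.random((5, 784))
points_demo = rng.random((5, 2))
checked = check_points(X_demo, points_demo)
```

A row mismatch usually means the embedding was computed on a different split (for example, ``train`` instead of ``test``) than the one selected with ``match_tags``.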


In [6]:
results_lap = fob.compute_visualization(data_split, embeddings=X, num_dims=2, method="manual", points=Xlap,
                                   brain_key="mnist_test_lap", verbose=True, seed=51)

#### Explanation of ``fob.compute_visualization``

The call above registers the embedding with the dataset so that it is available when the FiftyOne app is launched in the notebook. The arguments used here are:
* ``data_split`` : the FiftyOne ``DatasetView`` object whose samples correspond to the rows of ``X`` and of the learned embedding ``Xlap``
* ``embeddings`` : the numpy ndarray of original datapoints, which we refer to as ``X``
* ``method`` : must be "manual" when supplying our own embedding; otherwise FiftyOne runs one of its built-in dimensionality reduction methods (e.g., UMAP, t-SNE)
* ``points`` : the numpy ndarray of learned embeddings, which we refer to as ``Xlap``
* ``brain_key`` : the string identifier for this set of learned embeddings in the FiftyOne app

__NOTE:__ You can call ``fob.compute_visualization`` multiple times to attach several learned embeddings to this ``data_split`` object. When the FiftyOne app is launched, it loads all of the results connected to ``data_split``, each identified by its ``brain_key`` string.
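
When several saved embeddings are registered, each needs its own ``brain_key``. One option is to derive the key from the result filename. The sketch below rests on an assumption worth checking against your installed version: that FiftyOne expects brain keys to be valid Python identifiers, so characters such as ``.`` in a filename need replacing. ``brain_key_from_path`` and its ``prefix`` argument are hypothetical names used only for illustration.

```python
import re
from pathlib import Path

def brain_key_from_path(path, prefix="lap"):
    """Derive an identifier-style key from a results filename.

    Hypothetical helper; assumes keys must be valid Python identifiers.
    """
    stem = Path(path).stem  # drop the directory and the final ".npy"
    # Replace every non-word character (e.g. ".") with an underscore
    key = re.sub(r"\W", "_", f"{prefix}_{stem}")
    if not key.isidentifier():
        raise ValueError(f"could not build a valid key from {path!r}")
    return key

key = brain_key_from_path(
    "./results/mnist_test/mnist_test_2_30.0_0_20_50_hat_100_0.5_0.05.npy"
)
print(key)  # lap_mnist_test_2_30_0_0_20_50_hat_100_0_5_0_05
```

Note that replacing characters can make two different filenames collide (``30.0`` and ``30_0`` map to the same text), so it is worth checking that the generated keys are unique before passing them as ``brain_key``.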

Below, I've also included an example of running a UMAP visualization (as implemented in FiftyOne) for comparison.

In [7]:
include_umap = True
if include_umap:
    results_umap = fob.compute_visualization(data_split, embeddings=X, num_dims=2, method="umap",
                                   brain_key="mnist_test_umap", verbose=True, seed=51)

Generating visualization...




UMAP(n_jobs=1, random_state=51, verbose=True)
Tue Feb  4 11:49:40 2025 Construct fuzzy simplicial set
Tue Feb  4 11:49:40 2025 Finding Nearest Neighbors
Tue Feb  4 11:49:40 2025 Building RP forest with 10 trees
Tue Feb  4 11:49:42 2025 NN descent for 13 iterations
	 1  /  13
	 2  /  13
	 3  /  13
	 4  /  13
	Stopping threshold met -- exiting after 4 iterations
Tue Feb  4 11:49:46 2025 Finished Nearest Neighbor Search
Tue Feb  4 11:49:47 2025 Construct embedding



	completed  0  /  500 epochs
	completed  50  /  500 epochs
	completed  100  /  500 epochs
	completed  150  /  500 epochs
	completed  200  /  500 epochs
	completed  250  /  500 epochs
	completed  300  /  500 epochs
	completed  350  /  500 epochs
	completed  400  /  500 epochs
	completed  450  /  500 epochs
Tue Feb  4 11:49:53 2025 Finished embedding


# Launch FiftyOne App! 

Now we launch the FiftyOne app dashboard in the notebook to visualize the embeddings. Initially you will see only the individual images. To load the embeddings:
1. Click ``+`` to add a new tab (window), and select ``Embeddings`` for this new tab.
2. Press the split button (it looks like two rectangles side by side) to show the two tabs next to each other.
3. In the right-hand window, use the ``brain_key`` drop-down menu to select the embedding results you wish to visualize.

Some helpful hints on the visualization tools:
1. In the Embeddings window, click ``Color by`` and select ``ground_truth.label`` to color each embedded point by its class label.
2. Select a set of points with the lasso tool to see, in the ``Samples`` panel, the original images for that subset of embedded points.
3. Scroll to zoom in the Embeddings panel.


In [8]:
session = fo.launch_app(data_split)