# Running cytoself in Colab
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/royerlab/cytoself/blob/main/examples/simple_example.ipynb)
## Introduction
This Jupyter notebook shows a simple example of how to use *cytoself* with a few example images and a pre-trained model.
The pre-trained model was trained on images of shape (100, 100, 2), where the two channels are the fluorescent
protein and the nucleus distance.

## Example demo
Let's get started with a simple example.
It is highly recommended to use a GPU if you can.

Note: Errors occur occasionally. If a cell fails, please be patient and try running it again.
If the error persists, please restart the runtime.

First, install the dependencies and set the TensorFlow version to 1.15.2.

In [None]:
# Set tensorflow to 1.15.2
%tensorflow_version 1.x

# Install required packages
!pip install git+https://github.com/royerlab/cytoself.git
!pip uninstall -y matplotlib
!pip install matplotlib==3.1.3  # for saving png
!pip uninstall -y h5py
!pip install h5py==2.10.0  # for loading pre-trained model

(Optional) You can use wandb to monitor your compute resources.
Please note that this requires signing up for a wandb account.

In [None]:
# !pip install wandb
# import wandb
# wandb.init()

Import dependencies

In [None]:
import numpy as np
import pandas as pd
import gdown
from cytoself.data_loader.data_manager import DataManager
from cytoself.models import CytoselfFullModel
from cytoself.data_loader.data_generator import image_and_label_generator
from cytoself.analysis.analytics import Analytics

Let's download the example data and the trained model weights.
To save time and space, we only download a fraction of the test data that was used in our preprint.

In [None]:
# Download model weights (full model)
gdown.download(
    'https://drive.google.com/uc?id=1gkiEMKdadOel4Xh6KoS2U603JTkZhgDw',
    'pretrained_model.h5',
    quiet=True
)
# Download label data
gdown.download(
    'https://drive.google.com/uc?id=16-0bhKrUMbZ0DSz768Z_q13yNivHyfVO',
    'example_label.npy',
    quiet=True
)
# Download image data
gdown.download(
    'https://drive.google.com/uc?id=1znRLbYJJqd11Zqv-5_yUmNjarKcwIWMg',
    'example_image.npy',
    quiet=True
)
# Download localization table
gdown.download(
    'https://drive.google.com/uc?id=1RM654Qavcy8gG5uy3mCzi8EsOT_xOlVd',
    'protein_uniloc.csv',
    quiet=True
)

# Download dendrogram index to plot feature spectrum
gdown.download(
    'https://drive.google.com/uc?id=1WrxhGsSzivZVAlL_K2FLVsRmHrsfhyrI',
    'dgram_index1.npy',
    quiet=True
)
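Because the downloads above run with `quiet=True`, a failed download may go unnoticed until a later cell breaks. The following is a minimal sketch of a sanity check; the helper name `find_missing` is made up for this example and is not part of cytoself or gdown.

```python
import os

def find_missing(paths):
    """Return the paths that do not exist or that are empty files."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]

# File names used by the download cell above.
expected_files = [
    'pretrained_model.h5',
    'example_label.npy',
    'example_image.npy',
    'protein_uniloc.csv',
    'dgram_index1.npy',
]

missing = find_missing(expected_files)
if missing:
    print('Missing or empty files, try the download cell again:', missing)
else:
    print('All files are present.')
```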

Let's load the example data. The image and label data consist of proteins that each have a unique localization.
The localization table gives the unique localization of each protein.

In [None]:
image_data = np.load('example_image.npy')
label_data = np.load('example_label.npy', allow_pickle=True)
gt_table = pd.read_csv('protein_uniloc.csv')

# The image data has 3 channels which are protein label, nucleus and nucleus distance.
# In this example we only use protein label and nucleus distance channels.
image_data = image_data[:, ..., [0, 2]]

# Make sure that the label data has 2 dimensions.
label_data = label_data.reshape(-1, 1)
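The shape handling in the cell above can be checked on synthetic data. The following is a minimal sketch that uses a random array in place of `example_image.npy`; the array sizes and label names are made up for illustration and are not taken from the real files.

```python
import numpy as np

# Synthetic stand-in for the example data: 6 images of 100x100 pixels with 3 channels
# (protein label, nucleus, nucleus distance). The real files are not needed here.
rng = np.random.default_rng(0)
fake_images = rng.random((6, 100, 100, 3))
fake_labels = np.array(['A', 'A', 'B', 'B', 'C', 'C'], dtype=object)

# Keep channels 0 (protein label) and 2 (nucleus distance); drop channel 1 (nucleus).
two_channel = fake_images[..., [0, 2]]

# Turn the 1-D label vector into a column so that it has 2 dimensions.
label_column = fake_labels.reshape(-1, 1)

print(two_channel.shape)   # (6, 100, 100, 2)
print(label_column.shape)  # (6, 1)
```

The last axis of the result matches the `(100, 100, 2)` input shape mentioned in the introduction.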

Let's create a cytoself model object.
It wraps a TensorFlow Model object and provides some convenient auxiliary functions
(e.g. extracting latent representations).

In [None]:
model = CytoselfFullModel(input_image_shape=[100, 100, 2], num_fc_output_classes=len(np.unique(label_data)))
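The number of output classes above is derived from the labels with `np.unique`. As a small illustration with made-up protein names, `np.unique` flattens its input by default, so it works directly on the 2-dimensional label column:

```python
import numpy as np

# Hypothetical label column with one protein name per image.
labels = np.array(['ProtA', 'ProtB', 'ProtA', 'ProtC']).reshape(-1, 1)

# np.unique flattens its input by default, so the (N, 1) shape does not matter.
num_classes = len(np.unique(labels))
print(num_classes)  # 3
```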

Consolidate the datasets with DataManager. In this example we simply slice the same example dataset into train, validation and test sets.
Then compile the model.

Note: The data split here is only meant to show how to run cytoself. Please make sure the data is
split properly when you train on your real data.


In [None]:
data_manager = DataManager(
    train_data=image_data[:100],
    val_data=image_data[100:200],
    test_data=image_data[200:],
    train_label=label_data[:100],
    val_label=label_data[100:200],
    test_label=label_data[200:],
)

# Compile the model with data_manager
model.compile_with_datamanager(data_manager)
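For real data, one common starting point is a shuffled split with disjoint index sets. The sketch below uses only NumPy; the function name `split_indices` and the 10% fractions are assumptions chosen for illustration, and a plain random split may not be appropriate for every dataset.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Return disjoint, shuffled index arrays for train, validation and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_val = int(n_samples * val_fraction)
    n_test = int(n_samples * test_fraction)
    val_idx = shuffled[:n_val]
    test_idx = shuffled[n_val:n_val + n_test]
    train_idx = shuffled[n_val + n_test:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

The index arrays can then be used to slice both arrays consistently, e.g. `image_data[train_idx]` and `label_data[train_idx]`.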

Now we are ready to start training cytoself.
`train_with_datamanager` is a convenient method with very few arguments; other methods offer more flexibility.
Please check the codebase.

You can change the batch size depending on the available GPU memory.
The training history will be displayed after training. (Make sure the matplotlib version is 3.1.3.)

In [None]:
model.train_with_datamanager(data_manager, batch_size=64, max_epoch=1)

Training is done, but that was only for demonstration purposes.
As long as all parameters are left at their default settings, we can load a pre-trained model to stand in for a well-trained one.
(Make sure the h5py version is 2.10.0.)

In [None]:
model.load_model('pretrained_model.h5')

We use the `Analytics` module to perform various analyses.
For example, we use `Analytics` to compute the UMAP of the global representation.

In [None]:
analytics = Analytics(model, data_manager, gt_table)
analytics.calc_plot_umaps_gt("vec", titles="Unique localization")

Now let's compute a feature spectrum from the local representation.
We first compute a cluster map over the vq indices.
The cluster map groups similar vq indices, which gives an ordering of the vq indices in which similar features are close to each other.
We will use this ordering to plot the feature spectrum.

In [None]:
analytics.plot_clustermaps()
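To illustrate how a clustering can yield an ordering in which similar items sit next to each other, here is a minimal sketch using SciPy's hierarchical clustering on a toy matrix. This is not cytoself's implementation; the toy data, the `average` linkage and the Euclidean metric are assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy "feature" matrix: 6 rows in two well-separated groups.
# Rows 0, 2, 4 are near 0; rows 1, 3, 5 are near 10.
features = np.array([
    [0.0, 0.1],
    [10.0, 10.1],
    [0.2, 0.0],
    [10.2, 9.9],
    [0.1, 0.2],
    [9.9, 10.0],
])

# Build the hierarchical clustering tree and read off the leaf order of its dendrogram.
tree = linkage(features, method='average', metric='euclidean')
order = leaves_list(tree)

# Rows from the same group end up next to each other in the ordering.
print(order)
```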

The index order of the dendrogram in the cluster map can be saved and reloaded, so you don't need to compute the cluster map every time you want to get feature spectra.
Here we load a pre-computed dendrogram index that was computed with all the test data.

In [None]:
analytics.load_dendrogram_index('dgram_index1.npy')
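The save-and-reload idea can be sketched with plain NumPy, assuming the index order is stored as an integer array in a `.npy` file. The file name `my_dgram_index.npy` and the 8-element order below are hypothetical; please check the codebase for the method cytoself itself provides for saving.

```python
import os
import tempfile
import numpy as np

# Hypothetical index order for 8 vq indices (in practice this comes from the cluster map).
index_order = np.array([3, 0, 7, 1, 5, 2, 6, 4])

# Save the order once to a temporary directory...
path = os.path.join(tempfile.mkdtemp(), 'my_dgram_index.npy')
np.save(path, index_order)

# ...and reload it later instead of recomputing the cluster map.
reloaded = np.load(path)
print(np.array_equal(index_order, reloaded))  # True
```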

Let's plot a feature spectrum.

In [None]:
analytics.plot_feature_spectrum_from_image(image_data[:1])
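As a rough mental model, and assuming a feature spectrum can be thought of as a histogram of vq index usage with its bins rearranged by the dendrogram index, the reordering step might look like the sketch below. The codebook size, the vq indices and the index order are all made up; this is not the code behind `plot_feature_spectrum_from_image`.

```python
import numpy as np

n_codes = 8  # hypothetical codebook size, chosen small for readability

# Hypothetical vq indices assigned to the spatial positions of one image.
vq_indices = np.array([0, 0, 3, 3, 3, 5, 7, 7])

# Count how often each code is used.
histogram = np.bincount(vq_indices, minlength=n_codes)

# Reorder the bins with a hypothetical dendrogram index so that similar codes sit side by side.
index_order = np.array([3, 0, 7, 1, 5, 2, 6, 4])
spectrum = histogram[index_order]

print(histogram.tolist())  # [2, 0, 0, 3, 0, 1, 0, 2]
print(spectrum.tolist())   # [3, 2, 2, 0, 1, 0, 0, 0]
```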

We have now walked through the basics of how to get
localization clustering and a feature spectrum using example data.
Most of the intermediate computation results can be saved so that you can start downstream analysis from the middle.
Please check the codebase to see which arguments are currently available.
An API guide will be available in the future.