# Getting Started: Multi-Task Decoding (Task 3)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MTNeuro/MTNeuro/blob/notebook_cleanup/notebooks/task3_getting_started.ipynb)

### We will use the 🔄 icon to indicate places that you can change.
This **MTNeuro** Jupyter notebook walks you through running Task 3. It takes an encoder and computes R² scores between its embeddings and several semantic features of the images.

For more details on the tasks and dataset, please refer to our paper:

    "Quesada, J., Sathidevi, L., Liu, R., Ahad, N., Jackson, J.M., Azabou, M., ... & Dyer, E. L. (2022). MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track."


#### Install the required packages

In [None]:
!pip install 'intern[cloudvolume]' scikit-learn timm pretrainedmodels efficientnet_pytorch segmentation-models-pytorch
!git clone https://github.com/MTNeuro/MTNeuro && cd MTNeuro && pip install .
%cd MTNeuro

#### Import the required packages

In [2]:
# Import libraries
import os
import sys
import json
import math
import numpy as np
from matplotlib import cm
import matplotlib.pyplot as plt
import argparse
import umap

# PyTorch imports
import torch
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset, DataLoader
import torch.multiprocessing as mp
import torch.nn.functional as F

# Sci-kit learn imports
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn import preprocessing
from sklearn.decomposition import PCA

# MTNeuro modules
from MTNeuro.bossdbdataset import BossDBDataset
from MTNeuro.annots.features import extract_cell_stats, extract_axon_stats, extract_blood_stats
from MTNeuro.annots.get_cutouts import get_cutout_data
from MTNeuro.annots.latents import get_latents, get_unsup_latents

## Task Overview
![image.png](https://mtneuro.github.io/images/tasks.png)

#### Dataset Description and Task 3 Objectives

For Task 3, the "datasets" are actually the representations of the models we trained in Task 1! We will use a linear readout on the latent embeddings of the Task 1 models to predict properties of each image, such as blood vessel density, cell count and size, axon density, and the average distance between cells. To extract these latents, we will use all four cubes specified in Task 1 as the training data.

In this notebook, we will use the BYOL model [[1]](#1) as the encoder and load in the pretrained weights found here: [[Dropbox]](https://www.dropbox.com/sh/bhmkr6fphyxlils/AAAAcOBSWRoxowGzwu7Tp2LAa?dl=0).

<a id="1">[1]</a> Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., ... & Valko, M. (2020). Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33, 21271-21284.

## Loading the Data and Model

As with the other tasks, we can load all the parameters needed to access the data and build the model from the Task 3 config file found [here](https://github.com/MTNeuro/MTNeuro/tree/main/MTNeuro/taskconfig).

The available encoder types are:
- `ssl`: Get the latents from a pretrained BYOL model
- `supervised`: Get the latents from a pretrained supervised ResNet model
- `PCA`: Get the latents using principal component analysis
- `NMF`: Get the latents using non-negative matrix factorization

In [3]:
root = "./MTNeuro/"

## Load the task3 config file
with open(os.path.join(root, "taskconfig/task3.json")) as file:
    task_config = json.load(file)

encoder_type = task_config['encoder_type']
encoder_path = task_config['encoder_path']
print("We will be using the", encoder_type, "encoder type.")

We will be using the ssl encoder type.
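
Because the rest of the notebook branches on `encoder_type`, it can help to validate the value as soon as the config is loaded. The sketch below is only an illustration: the helper `read_encoder_settings` and the example path `weights.pt` are hypothetical, and only the two config keys read in this notebook are shown.

```python
import json

# Encoder types listed in this notebook.
VALID_ENCODER_TYPES = {"ssl", "supervised", "PCA", "NMF"}


def read_encoder_settings(config):
    """Return (encoder_type, encoder_path) after a basic sanity check."""
    encoder_type = config["encoder_type"]
    if encoder_type not in VALID_ENCODER_TYPES:
        raise ValueError(
            f"Unknown encoder_type {encoder_type!r}; "
            f"expected one of {sorted(VALID_ENCODER_TYPES)}"
        )
    return encoder_type, config["encoder_path"]


# Hypothetical config fragment standing in for the contents of task3.json.
example_config = json.loads('{"encoder_type": "ssl", "encoder_path": "weights.pt"}')
print(read_encoder_settings(example_config))  # ('ssl', 'weights.pt')
```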


#### Loading the Data

We will use the `BossDBDataset` class to download the training images and their ground-truth annotations.

In [4]:
## Use transforms.ToTensor() if no other transforms are needed
transform = transforms.ToTensor()

## Since there is no testing for this task, we only load the train data
train_data = BossDBDataset(task_config, None, mode='train', image_transform=transform, mask_transform=transform)

In [5]:
## Get a copy of the input images and annotations
slices = np.copy(train_data.image_array)
annots = np.copy(train_data.mask_array)

print("We have", len(slices), "training images.")

We have 1440 training images.


#### Loading the Pretrained Weights

You can download the pretrained weights manually from the Dropbox link mentioned earlier. Here, we instead use a bash script that automatically downloads the weights for the encoder specified in the task config. The script takes two arguments: the encoder type and the name of the file to save the weights to.

In [6]:
!bash notebooks/scripts/download_task3_weights.sh $encoder_type $encoder_path

Downloading pretrained weights for ssl encoder


#### Extracting image properties from ground-truth annotations

We can extract properties such as cell, axon, and blood vessel statistics using the functions provided by MTNeuro below. Each of these functions returns a pandas DataFrame whose first column is the image index and whose remaining columns are the statistics.
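
To give a feel for what a per-image statistic looks like, here is a minimal NumPy sketch of a "percent of pixels" measure computed from annotation masks. It is not the MTNeuro implementation: the function name, the label id `1`, and the choice to report a percentage rather than a fraction are all assumptions made for illustration, and it returns a plain array rather than a DataFrame.

```python
import numpy as np


def percent_of_pixels(masks, label):
    """Percentage of pixels equal to `label` in each 2D mask.

    masks: integer array of shape (n_images, height, width).
    label: class id to measure (hypothetical; check the dataset's label map).
    """
    masks = np.asarray(masks)
    flat = masks.reshape(masks.shape[0], -1)   # one row per image
    return 100.0 * (flat == label).mean(axis=1)


# Two toy 2x2 masks; label 1 stands in for a structure of interest.
toy_masks = np.array([[[0, 1], [1, 1]],
                      [[0, 0], [0, 1]]])
print(percent_of_pixels(toy_masks, label=1))  # [75. 25.]
```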

In [7]:
print('Extracting cell stats...')
stats_cell = extract_cell_stats(annots)

print('Extracting axon stats...')
stats_axon = extract_axon_stats(annots)

print('Extracting blood stats...')
stats_blood = extract_blood_stats(annots)

Extracting cell stats...
Extracting axon stats...
Extracting blood stats...


In [8]:
## Print the names of the statistics available for each annotated structure
print("List of cell stats: ", list(stats_cell))
print("List of axon stats: ", list(stats_axon))
print("List of blood stats: ", list(stats_blood))

List of cell stats:  ['Image Number', 'Number of Cells', 'Avg Distance to NN', 'Avg Distance to 3rd NN', 'Avg Cell Size', 'Cell Pixel count']
List of axon stats:  ['Image Number', 'Percent of Pixels']
List of blood stats:  ['Image Number', 'Percent of Pixels']


#### Extract latent embeddings

We can extract the latents of self-supervised and supervised models using `get_latents`, which takes the input slices, the path to the pretrained weights, and a boolean that is `True` when the model is self-supervised and `False` when it is supervised. For the unsupervised methods, we can use `get_unsup_latents`, which takes the input slices and a boolean that selects PCA (`True`) or NMF (`False`).
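
For intuition about the unsupervised option, the sketch below computes PCA latents directly with NumPy by flattening each image and projecting it onto the top principal directions. This is an illustrative stand-in, not the code behind `get_unsup_latents`; the function name `pca_latents`, the toy image size, and the number of components are assumptions.

```python
import numpy as np


def pca_latents(images, n_components):
    """Project flattened images onto their top principal components."""
    X = np.asarray(images, dtype=np.float64).reshape(len(images), -1)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


rng = np.random.default_rng(0)
toy_slices = rng.random((20, 8, 8))   # 20 hypothetical 8x8 images
latents = pca_latents(toy_slices, n_components=5)
print(latents.shape)  # (20, 5)
```

Each row of `latents` plays the same role as one row of `embeddings` in this notebook: a fixed-length vector describing one image.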

In [9]:
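## Extract latents from the encoder
if encoder_type in ('ssl', 'supervised'):
    ## The third argument is True for the self-supervised (BYOL) encoder
    embeddings = get_latents(slices, encoder_path, encoder_type == 'ssl')
elif encoder_type in ('PCA', 'NMF'):
    ## The second argument is True for PCA and False for NMF
    embeddings = get_unsup_latents(slices, encoder_type == 'PCA')
else:
    raise ValueError(f"Unknown encoder type: {encoder_type}")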

  "Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "


#### Perform linear readout

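We will use sklearn's `LinearRegression` to perform a linear readout, with the latent embeddings as the inputs and the image properties (cell/axon/blood stats) as the targets. Since there is no test split for this task, each score reported below is the R² of the fit on the training data.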
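
For reference, the value returned by `LinearRegression.score` is the coefficient of determination. Writing $y_i$ for the target of image $i$, $\hat{y}_i$ for the model's prediction, and $\bar{y}$ for the mean of the targets:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A score of 1 means the readout predicts the property exactly, while a score of 0 means it does no better than always predicting the mean.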

In [10]:
## Get linear readout scores
X = embeddings

## Predict the percent of blood vessels in the image by pixels
y = stats_blood['Percent of Pixels'].to_numpy()
reg = LinearRegression().fit(X, y)
blood_vsl_score = reg.score(X, y)
print(f"Blood Vessel Score: {blood_vsl_score:.4f}")

## Predict the number of cells in the input image
y = stats_cell['Number of Cells'].to_numpy()
reg = LinearRegression().fit(X, y)
numb_cell = reg.score(X, y)
print(f"Cell Count Score: {numb_cell:.4f}")

## Predict the average distance of a cell to its nearest neighbor
y = stats_cell['Avg Distance to NN'].to_numpy()
reg = LinearRegression().fit(X, y)
avg_dist_nn_cell = reg.score(X, y)
print(f"Avg Cell Distance Score: {avg_dist_nn_cell:.4f}")

## Predict the average cell size
y = stats_cell['Avg Cell Size'].to_numpy()
reg = LinearRegression().fit(X, y)
cell_size = reg.score(X, y)
print(f"Cell Size Score: {cell_size:.4f}")

## Predict the percent of axons in the image by pixels
y = stats_axon['Percent of Pixels'].to_numpy()
reg = LinearRegression().fit(X, y)
axon_rslt = reg.score(X, y)
print(f"Axon Score: {axon_rslt:.4f}")

Blood Vessel Score: 0.8583
Cell Count Score: 0.7551
Avg Cell Distance Score: 0.5098
Cell Size Score: 0.7119
Axon Score: 0.9479
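
The five readouts above all follow the same fit-then-score pattern, so they can be collapsed into a loop over named targets. The sketch below shows one way to do that using an ordinary least-squares fit with an intercept written in NumPy, which is intended to mirror `LinearRegression().fit(X, y).score(X, y)` but is not the notebook's code. The helper names and the toy data are assumptions for illustration only.

```python
import numpy as np


def linear_readout_r2(X, y):
    """Fit ordinary least squares with an intercept; return R^2 on (X, y)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = np.sum(residual ** 2)                 # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation
    return 1.0 - ss_res / ss_tot


def readout_scores(X, targets):
    """Score one linear readout per named target vector."""
    return {name: linear_readout_r2(X, y) for name, y in targets.items()}


# Toy stand-ins for the embeddings and the image statistics.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
targets_toy = {
    "exactly linear": X_toy @ np.array([1.0, -2.0, 0.5, 3.0]) + 1.0,
    "pure noise": rng.normal(size=200),
}
scores = readout_scores(X_toy, targets_toy)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

With the real data, `targets` would map names such as "Blood Vessel" to columns like `stats_blood['Percent of Pixels'].to_numpy()`. A target that is a linear function of the embeddings scores close to 1, while an unrelated target scores close to 0.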
