# Semantic Segmentation with convpaint and DINOv2

This notebook demonstrates how to run semantic segmentation on an image, using DINOv2 for feature extraction and a random forest for classification. It is based on the notebook provided by convpaint and runs independently from napari.


## Imports

In [1]:
%load_ext autoreload
%autoreload 2

# import napari and its screenshot function
import napari
from napari.utils.notebook_display import nbscreenshot

# import what we need from conv_paint
from napari_convpaint.conv_paint import ConvPaintWidget
# from napari_convpaint.conv_paint_utils import Hookmodel
from napari_convpaint.convpaint_sample import create_annotation_cell3d
from napari_convpaint.conv_paint_utils import (filter_image_multioutputs, get_features_current_layers,
get_multiscale_features, train_classifier, predict_image)
from napari_convpaint.conv_paint_utils import extract_annotated_pixels
 
# import the other general modules used
import numpy as np
import skimage
# import tifffile
from matplotlib import pyplot as plt

# import pytorch and pillow Image
import torch
from torchvision.transforms import Compose, Resize, ToTensor, Normalize
# from PIL import Image


## Load data

First, we load an image and the corresponding annotation. Both are cropped to be 128x128.

In [2]:
# Load 3D image with 2 channels (cell borders and nuclei)
image_original = skimage.data.cells3d()
# Take a layer in middle of cell (30 of 0-59) and take 2nd channel (nuclei)
image_original = image_original[30, 1]
# Load annotation defined in conv_paint
labels_original = create_annotation_cell3d()[0][0]

# Take crops of image and annotation
crop = ((60,188), (0,128))
image_original = image_original[crop[0][0]:crop[0][1], crop[1][0]:crop[1][1]]
labels_original = labels_original[crop[0][0]:crop[0][1], crop[1][0]:crop[1][1]]
image_original_shape = image_original.shape

# The cropped image is 128x128 pixels
# print(image_original_shape)
# The sum of the label values over all annotated pixels is 327
# print(np.sum(labels_original[labels_original > 0]))

Show the image and annotation as layers in napari.

In [3]:
# create a napari viewer; add the image to it; add the labels/annotation
viewer = napari.Viewer()
viewer.add_image(image_original)
viewer.add_labels(labels_original)

<Labels layer 'labels_original' at 0x1677f6ad2e0>

We can also show a napari screenshot.

In [4]:
# show a screenshot of the napari viewer here in the notebook
# nbscreenshot(viewer)

In [5]:
#tifffile.imwrite('label_cell3d.tiff', viewer.layers['Labels'].data)

## Create model

DINOv2 comes in 4 different versions (ViT-S/14, ViT-B/14, ViT-L/14 and ViT-g/14), each increasing in model size and power. Choose the desired model by assigning it to 'model'.

In [6]:
# model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
# model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
# model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')

Using cache found in C:\Users\roman/.cache\torch\hub\facebookresearch_dinov2_main


Define the model parameters. Note that the patch size that DINOv2 uses is 14x14. Therefore, the number of patches along each dimension is the input size divided by 14 (here 224 / 14 = 16, i.e. 16x16 = 256 patches).

In [7]:
# Store the patch size and input shape as global constants
PATCH_SIZE = (14, 14)
IN_SHAPE = (224, 224)
# Calculate the shape of the patched image (i.e. how many patches fit in the image)
if not (IN_SHAPE[0]%PATCH_SIZE[0] == 0 and IN_SHAPE[1]%PATCH_SIZE[1] == 0):
    raise ValueError('Input shape must be divisible by patch size')
else:
    patched_image_shape = (int(IN_SHAPE[0]/PATCH_SIZE[0]), int(IN_SHAPE[1]/PATCH_SIZE[1]))
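
The divisibility requirement above can also be handled for arbitrary image sizes. The following is a minimal sketch, not part of convpaint or DINOv2: the helper names `nearest_patch_multiple` and `patch_grid` are hypothetical and only illustrate how a valid input shape and the resulting patch grid could be derived.

```python
def nearest_patch_multiple(shape, patch_size=(14, 14)):
    """Round each dimension to the nearest multiple of the patch size (at least one patch)."""
    return tuple(
        max(p, int(round(s / p)) * p)
        for s, p in zip(shape, patch_size)
    )


def patch_grid(in_shape, patch_size=(14, 14)):
    """Return the number of patches along each dimension; raise if not divisible."""
    if any(s % p for s, p in zip(in_shape, patch_size)):
        raise ValueError('Input shape must be divisible by patch size')
    return tuple(s // p for s, p in zip(in_shape, patch_size))


# The 128x128 crop would be closest to a 126x126 input (9x9 patches)
print(nearest_patch_multiple((128, 128)))
# The 224x224 input used in this notebook gives a 16x16 patch grid
print(patch_grid((224, 224)))
```

In this notebook the input shape is fixed to 224x224, so such a helper is only needed if other input sizes are explored.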

## Convert & preprocess image

Resize the image to the input shape (which has to be a multiple of the patch size 14).

In [8]:
# Scale original image to input size of the model
image_scaled = skimage.transform.resize(image_original, IN_SHAPE, mode='edge', order=1, preserve_range=True)
labels_scaled = skimage.transform.resize(labels_original, IN_SHAPE, mode='edge', order=0, preserve_range=True)

# The sum of the label values over all annotated pixels in the resized annotation is 994
# print(np.sum(labels_scaled[labels_scaled > 0]))

Show the scaled version of the image and labels to ensure that they still look good.

In [9]:
viewer = napari.Viewer()
viewer.add_image(image_scaled)
viewer.add_labels(labels_scaled)

<Labels layer 'labels_scaled' at 0x16774c2a4c0>

Convert the image to RGB. Then preprocess it into a torch tensor, normalized according to the distribution expected by the model.

In [10]:
# Convert to RGB
image_rgb = np.stack((image_scaled,)*3, axis=-1)
napari.imshow(image_scaled)

# New shape is (224, 224, 3)
# print(image_rgb.shape)

# Preprocess image
preprocess = Compose([
    #Resize((224, 224)),  # Resize to the input size expected by the model
    ToTensor(),  # Convert to PyTorch tensor
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalize to the input distribution expected by the model
])
# TODO: Normalization with [0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] OK?

image_pre = preprocess(image_rgb).float()
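
Regarding the TODO above: `ToTensor` only rescales intensities to [0, 1] for PIL images and `uint8` arrays. The resized image here is a float array that keeps its original intensity range, so the ImageNet mean and standard deviation are applied to raw intensities. One possible remedy, sketched below in plain NumPy, is to min-max rescale the image to [0, 1] before normalizing. Whether min-max scaling is the right choice for this data is an assumption, and the function name `rescale_and_normalize` is hypothetical.

```python
import numpy as np

# ImageNet statistics, the same values as in the Normalize call above
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def rescale_and_normalize(image_2d):
    """Min-max rescale a grayscale image to [0, 1], replicate it to 3 channels,
    and normalize each channel. Returns a float32 array of shape (3, H, W)."""
    image = image_2d.astype(np.float32)
    value_range = image.max() - image.min()
    if value_range > 0:
        image = (image - image.min()) / value_range
    else:
        # Constant image: avoid division by zero
        image = np.zeros_like(image)
    rgb = np.stack((image,) * 3, axis=0)  # (3, H, W)
    return (rgb - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


# Hypothetical stand-in for image_scaled (float values in a 16-bit range)
demo = np.random.default_rng(0).integers(0, 65535, size=(224, 224)).astype(np.float64)
normalized = rescale_and_normalize(demo)
print(normalized.shape, normalized.dtype)
```

The result could be passed to the model via `torch.from_numpy(normalized).unsqueeze(0)` instead of `image_pre`.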

## Feature extraction

Now that the model is defined, we can run an image through it and extract features from it.

In [11]:
# Add an extra batch dimension 
image_batch = image_pre.unsqueeze(0)

# Move image to the GPU if available
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# image_batch = image_batch.to(device)
# model = model.to(device)

# Pass the image batch through the model and keep the normalized patch tokens
with torch.no_grad():
    # features=model.forward_features(torch.nn.functional.interpolate(image_batch,(448,448)))['x_norm_patchtokens']
    features = model.forward_features(image_batch)['x_norm_patchtokens']

Rearrange and reshape dimensions of the DINOv2 output.

In [12]:
# Read out the number of features, which is dependent on the model chosen (e.g. small = 384; large = 1024)
NUM_FEATURES = features.shape[2]

# The output shape is [batch_size, num_patches, features] = [1, 256, NUM_FEATURES]
# print(features.shape)

# Rearrange dimensions of the feature tensor to [batch_size, features, num_patches] = [1, NUM_FEATURES, 256]
features_perm = features.permute(0, 2, 1)
# print(features_perm.shape)

# Reshape linear patches (256) into 2D: [batch_size, features, patches_h, patches_w] = [1, NUM_FEATURES, 16, 16]
features_wh = features_perm.reshape(1, NUM_FEATURES, patched_image_shape[0], patched_image_shape[1])
# print(features_wh.shape)

# Alternative (not used): upsample directly to the original image size (128x128) with
# scaling_factor = (14 * image_original.shape[0] / 224, 14 * image_original.shape[1] / 224)

# Upsample to the size of the scaled image, i.e. [batch_size, features, image_h, image_w] = [1, NUM_FEATURES, 224, 224]
# (nearest-neighbour interpolation with scaling factor = patch size = 14)
features_int = torch.nn.functional.interpolate(features_wh, scale_factor=PATCH_SIZE)
# print(features_int.shape)

# Convert to numpy array and remove batch dimension to get [features, image_h, image_w] = [NUM_FEATURES, 224, 224]
features_np = features_int.numpy()
features_np = np.squeeze(features_np, axis=0)
# print(features_np.shape)
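
The rearrangement above can be checked on a small stand-in array. The sketch below uses plain NumPy and a hypothetical feature count of 8 instead of the real model output. It assumes the patch tokens are stored in row-major order (which is how the reshape above interprets them) and reproduces the nearest-neighbour upsampling by repeating each patch value 14 times along both axes.

```python
import numpy as np

num_features = 8          # hypothetical; the large DINOv2 model yields 1024
grid_h, grid_w = 16, 16   # patch grid for a 224x224 input
patch = 14

# Stand-in for the patch tokens without batch dimension: [num_patches, features]
tokens = np.arange(grid_h * grid_w * num_features, dtype=np.float32)
tokens = tokens.reshape(grid_h * grid_w, num_features)

# [num_patches, features] -> [features, num_patches] -> [features, grid_h, grid_w]
feature_grid = tokens.T.reshape(num_features, grid_h, grid_w)

# Nearest-neighbour upsampling: repeat every patch value along rows and columns
feature_map = feature_grid.repeat(patch, axis=1).repeat(patch, axis=2)
print(feature_map.shape)

# The patch at (row r, column c) corresponds to token number r * grid_w + c,
# and every pixel inside that patch carries the same feature vector
r, c = 3, 5
assert np.array_equal(feature_map[:, r * patch, c * patch], tokens[r * grid_w + c])
assert np.array_equal(feature_map[:, r * patch + 13, c * patch + 13], tokens[r * grid_w + c])
```

Because all 14x14 pixels of a patch share one feature vector, the pixel-level feature map carries no more spatial detail than the 16x16 patch grid.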


Show feature space in napari.

In [20]:

viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(image_scaled)
# add the loaded labels/annotation
viewer.add_labels(labels_scaled)
# add the feature space
viewer.add_image(features_np)

<Image layer 'features_np' at 0x1c7fb37aee0>

Extract features and target values (labels) where image is annotated.

In [13]:
features_annot, targets = extract_annotated_pixels(features_np, labels_scaled)
# features_annot.shape = (646, NUM_FEATURES)
# targets.shape = (646,)
print(features_annot.shape)
print(targets.shape)
# NOTE: in convpaint, we had
# features.shape = (218, 640)
# targets.shape = (218,)
# and the sum of the label values over the annotated pixels was 327

(646, 1024)
(646,)
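
Conceptually, this step selects the feature vectors of all pixels with a non-zero label. The sketch below illustrates the idea in plain NumPy. It is not the implementation of convpaint's `extract_annotated_pixels`, and the function name `extract_annotated` is hypothetical.

```python
import numpy as np


def extract_annotated(features, labels):
    """features: [F, H, W]; labels: [H, W] with 0 = not annotated.
    Returns X of shape [n_annotated, F] and y of shape [n_annotated]."""
    mask = labels > 0
    X = features[:, mask].T   # one row per annotated pixel
    y = labels[mask]          # matching class label per row
    return X, y


# Small stand-in data: 4 features on a 6x6 image with 3 + 4 annotated pixels
demo_features = np.random.default_rng(0).normal(size=(4, 6, 6))
demo_labels = np.zeros((6, 6), dtype=np.uint8)
demo_labels[0, :3] = 1
demo_labels[5, 2:] = 2

X, y = extract_annotated(demo_features, demo_labels)
print(X.shape, y.shape)
```

The number of rows equals the number of annotated pixels, which is why the shapes printed above report 646 samples.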


## Train and use Classifier
Finally we can train a classifier:

In [22]:
random_forest = train_classifier(features_annot, targets)

And do a prediction.

In [23]:
# Convert features to numpy array
all_features = features.numpy()
# Remove the batch dimension
all_features_lin = all_features.squeeze(0)

# Run predict of random forest on all features
predictions = random_forest.predict(all_features_lin)

# We have 256 predictions, which correspond to the 256 patches (16x16 in the image)
# print(predictions.shape)

Reshape and resize the predictions so we can show and overlay them on the image.

In [None]:
# Reshape the predictions to the shape of the image of patches
predicted_image = predictions.reshape(patched_image_shape[0], patched_image_shape[1])
# Resize to the size of the scaled input image
predicted_image = skimage.transform.resize(predicted_image, IN_SHAPE, mode='edge', order=0, anti_aliasing=False, preserve_range=True)
# Transform interpolated values to integer values
predicted_image = predicted_image.astype(np.uint8)

## Visualize Results
And finally we can visualize the output (and quantify its quality):

In [24]:
viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(image_scaled)
# add the loaded labels/annotation
viewer.add_labels(labels_scaled)
# add the prediction
viewer.add_labels(predicted_image)

<Labels layer 'predicted_image' at 0x1c80b69cbe0>
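
To quantify the quality, one simple option is to compare the prediction with the annotation on the annotated pixels. The sketch below is an assumption about how this could be done and is not part of convpaint; the function names are hypothetical. Since the classifier was trained on these same pixels, the resulting numbers are optimistic and a held-out annotation would be needed for a fair estimate.

```python
import numpy as np


def annotated_agreement(predicted, labels):
    """Fraction of annotated pixels (labels > 0) where prediction and annotation agree."""
    mask = labels > 0
    if not mask.any():
        raise ValueError('No annotated pixels')
    return float(np.mean(predicted[mask] == labels[mask]))


def per_class_agreement(predicted, labels):
    """Agreement computed separately for each annotated class."""
    classes = np.unique(labels[labels > 0])
    return {int(c): float(np.mean(predicted[labels == c] == c)) for c in classes}


# Small stand-in data: two annotated rows, one of them only half correct
demo_labels = np.zeros((4, 4), dtype=np.uint8)
demo_labels[0, :] = 1
demo_labels[3, :] = 2
demo_predicted = np.ones((4, 4), dtype=np.uint8)
demo_predicted[3, :2] = 2

print(annotated_agreement(demo_predicted, demo_labels))   # 0.75
print(per_class_agreement(demo_predicted, demo_labels))   # {1: 1.0, 2: 0.5}
```

With the variables of this notebook, the call would be `annotated_agreement(predicted_image, labels_scaled)`.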

In [None]:
# nbscreenshot(viewer)

## Tests Roman

In [None]:
# CREATE AND SHOW FULL OUTPUT OF A LAYER OF VGG16 (= 64 FEATURES)
# NOTE: requires Hookmodel, whose import is commented out in the Imports cell
def get_layer_features(image, layer, show_napari=False, interpolate=False):

    model = Hookmodel(model_name='vgg16')

    all_layers = list(model.module_dict.keys())
    # Choose just 1 layer (given by name or by index), and register a hook there
    if isinstance(layer, str):
        layers = [layer]
    elif isinstance(layer, int):
        layers = [all_layers[layer]]
    else:
        raise TypeError('layer must be a layer name (str) or a layer index (int)')
    
    # layers = ['features.30 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) avgpool AdaptiveAvgPool2d(output_size=(7, 7))']
    model.register_hooks(selected_layers=layers)

    # Get features using only this first layer and without scaling
    features, targets = get_features_current_layers(
        model=model, image=image, annotations=image, scalings=[1], use_min_features=False, order=interpolate)

    # Convert the DataFrame to a numpy array
    features_array = features.values
    # Get the shape of the image
    image_shape = image.shape
    # Reshape the features array to match the image shape and add the second dimension of features as the third dimension
    features_image = features_array.reshape(*image_shape, -1)

    # Move the last dimension to the first position
    features_image = np.moveaxis(features_image, -1, 0)
    # print(features.shape)
    # print(features_image.shape)

    # Now you can view the new_features using napari
    if show_napari:
        napari.view_image(features_image)
    return features_image

In [None]:
# RUN

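# Assumption: the tests run on the cropped original image loaded above
image = image_original
# image = image.T

# Get features of multiple (all) layers
conv_layers = [0, 2]  # ,5,7,10,12,14,17,19,21,24,26,28]
all_conv = [get_layer_features(image, layer) for layer in conv_layers]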


### Pad first dimension of the layers with fewer features and concatenate all layers into a 4D Image

# Get the shapes of all outputs
shapes = [output.shape for output in all_conv]
# Find the maximum shape in each dimension
max_shape = np.max(shapes, axis=0)
# Pad all outputs with zeros (at the end of each dimension) to have the max shape
all_conv_padded = np.array([
    np.pad(output, [(0, max_dim - dim) for dim, max_dim in zip(output.shape, max_shape)])
    for output in all_conv
])

# Show in Napari
napari.view_image(all_conv_padded)