# Semantic Segmentation with convpaint and DINOv2

This notebook demonstrates how to run semantic segmentation on an image using DINOv2 for feature extraction and a random forest for classification. It is based on the notebook provided by convpaint and runs independently of the napari plugin interface; napari is only used here to display the images, annotations and results.


## Imports

In [1]:
%load_ext autoreload
%autoreload 2

# import napari and its screenshot function
import napari
from napari.utils.notebook_display import nbscreenshot

# import what we need from conv_paint
from napari_convpaint.conv_paint import ConvPaintWidget
from napari_convpaint.conv_paint_utils import Hookmodel
from napari_convpaint.convpaint_sample import create_annotation_cell3d
from napari_convpaint.conv_paint_utils import (filter_image_multioutputs, get_features_current_layers,
get_multiscale_features, train_classifier, predict_image)
from napari_convpaint.conv_paint_utils import extract_annotated_pixels
 
# import the other general modules used
import numpy as np
import skimage
import tifffile

# import pytorch
import torch
from torchvision.transforms import Compose, Resize, ToTensor, Normalize

# import pillow.image
from PIL import Image



## Load data

First, we load an image and the corresponding annotation. Both are cropped to 128x128 pixels.

In [9]:
# Load 3D image with 2 channels (cell borders and nuclei)
original_image = skimage.data.cells3d()
# Take a layer in middle of cell (30 of 0-59) and take 2nd channel (nuclei)
original_image = original_image[30,1]
# Load annotation defined in conv_paint
original_labels = create_annotation_cell3d()[0][0]

# Take crops of image and annotation
crop = ((60,188), (0,128))
original_image = original_image[crop[0][0]:crop[0][1], crop[1][0]:crop[1][1]]
original_labels = original_labels[crop[0][0]:crop[0][1], crop[1][0]:crop[1][1]]
original_im_shape = original_image.shape
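# The cropped 'cells3d' image is 128x128 pixels
print(original_im_shape)
# Sum of the label values over all annotated pixels (327)
# Note: this adds up class ids; np.count_nonzero(original_labels) would count the annotated pixels
print(sum(original_labels[original_labels>0]))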

(128, 128)
327
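
The sum above adds up label values, so it only equals the number of annotated pixels when every label is 1. A minimal sketch of counting annotated pixels per class with NumPy follows; `count_annotated_pixels` and `toy_labels` are hypothetical names used for illustration, not part of convpaint or of the annotation loaded above.

```python
import numpy as np

def count_annotated_pixels(labels):
    """Return {class_id: pixel_count} for all labels > 0 (0 = unannotated)."""
    classes, counts = np.unique(labels[labels > 0], return_counts=True)
    return {int(c): int(n) for c, n in zip(classes, counts)}

# Hypothetical toy annotation: 0 = unannotated, 1 and 2 = two classes
toy_labels = np.array([[0, 1, 1],
                       [2, 2, 0],
                       [0, 2, 0]])

per_class = count_annotated_pixels(toy_labels)   # pixels per class
n_annotated = int(np.count_nonzero(toy_labels))  # total annotated pixels
label_sum = int(toy_labels.sum())                # sum of class ids, not a count
```

On the toy array the pixel count and the label sum differ, which is the same kind of mismatch to expect whenever class ids larger than 1 are present.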


Show the image and annotation as layers in napari.

In [10]:
# create a napari viewer
viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(original_image)
# add the loaded labels/annotation
viewer.add_labels(original_labels)

<Labels layer 'original_labels' at 0x12eea1494c0>

Alternatively, we can show a screenshot of the napari viewer inside the notebook.

In [None]:
# show a screenshot of the napari viewer here in the notebook
# nbscreenshot(viewer)

In [None]:
#tifffile.imwrite('label_cell3d.tiff', viewer.layers['Labels'].data)

## Create model

DINOv2 comes in four versions (ViT-S/14, ViT-B/14, ViT-L/14 and ViT-g/14) that increase in model size and, with it, in the number of features per patch (384, 768, 1024 and 1536).

In [38]:
# model = Hookmodel(model_name='vgg16')

# Load the small model; the reshaping further below assumes its 384 features per patch
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
# dinov2_vitb14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
# dinov2_vitl14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
# dinov2_vitg14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')

# Choose the model to use and switch it to inference mode
model = dinov2_vits14.eval()

Using cache found in C:\Users\roman/.cache\torch\hub\facebookresearch_dinov2_main
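
The choice of backbone fixes how many features each patch token carries, which matters for every reshape later on. The sketch below records the embedding dimensions of the four published backbones and computes the expected shape of the patch-token output for a given input size. `expected_token_shape` is a hypothetical helper, not part of DINOv2 or convpaint, and it assumes that the input height and width must be multiples of the 14-pixel patch size.

```python
# Features per patch token for each DINOv2 backbone
DINOV2_EMBED_DIM = {
    "dinov2_vits14": 384,
    "dinov2_vitb14": 768,
    "dinov2_vitl14": 1024,
    "dinov2_vitg14": 1536,
}
PATCH_SIZE = 14

def expected_token_shape(model_name, height, width, batch_size=1):
    """Return (batch, n_patches, n_features) expected for the patch tokens."""
    if height % PATCH_SIZE or width % PATCH_SIZE:
        raise ValueError("height and width must be multiples of the patch size")
    n_patches = (height // PATCH_SIZE) * (width // PATCH_SIZE)
    return (batch_size, n_patches, DINOV2_EMBED_DIM[model_name])

shape_small = expected_token_shape("dinov2_vits14", 224, 224)
shape_large = expected_token_shape("dinov2_vitl14", 224, 224)
```

Reading the feature count from such a table (or from the tensor shape itself) avoids hard-coding 384 when switching to a larger model.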


## Convert & preprocess image

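We first rescale the image and the annotation to 224x224 pixels, a multiple of the 14-pixel patch size used by DINOv2. The image is then converted to a Pillow RGB image and preprocessed into a torch tensor.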

In [39]:
in_shape = (224, 224)
scaling_factor = (original_im_shape[0] / in_shape[0], original_im_shape[1] / in_shape[1])
image_scaled = skimage.transform.resize(original_image, in_shape, mode='edge', order=1, anti_aliasing=True)
labels_scaled = skimage.transform.resize(original_labels, in_shape, mode='edge', order=0, anti_aliasing=False)

# create a napari viewer
viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(image_scaled)
# add the loaded labels/annotation
viewer.add_labels(labels_scaled)

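# Convert to pillow image and make RGB
# (image_scaled is a float image in [0, 1]; Pillow clips float data instead of
# rescaling it when converting to RGB, so stretch it to 8-bit first)
image_uint8 = (255 * image_scaled / image_scaled.max()).astype(np.uint8)
image_pillow = Image.fromarray(image_uint8)
image_pillow = image_pillow.convert("RGB")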

# Preprocess image
preprocess = Compose([
    # Resize((224, 224)),  # Resize to the input size expected by the model
    ToTensor(),  # Convert to PyTorch tensor
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalize to ImageNet mean and std
])
image_pre = preprocess(image_pillow)
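
To make the preprocessing explicit, here is a minimal NumPy sketch of what the RGB conversion followed by `ToTensor` and `Normalize` amounts to for an 8-bit grayscale image: replicate the channel three times, scale to [0, 1], and standardize each channel with the ImageNet statistics. `preprocess_numpy` is a hypothetical helper written for illustration; it is not the torchvision implementation.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_numpy(gray_uint8):
    """Grayscale uint8 image (H, W) -> normalized float32 array (3, H, W)."""
    # Replicate the single channel into R, G and B and scale to [0, 1]
    rgb = np.stack([gray_uint8] * 3, axis=0).astype(np.float32) / 255.0
    # Standardize each channel with its ImageNet mean and standard deviation
    return (rgb - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

# Hypothetical all-white test image of the size used in this notebook
white = np.full((224, 224), 255, dtype=np.uint8)
white_pre = preprocess_numpy(white)
```

Because the three channels are identical before normalization, they differ afterwards only through the per-channel mean and standard deviation.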

## Feature extraction

Now that the model is defined, we can run an image through it and extract features from it.

In [40]:
# Add an extra batch dimension and move image to the GPU if available
images = image_pre.unsqueeze(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
images = images.to(device)
model = model.to(device)

# Pass image through the model (assuming images is a batch of test images)
with torch.no_grad():
    # features=model.forward_features(torch.nn.functional.interpolate(images,(448,448)))['x_norm_patchtokens']
    features=model.forward_features(images)['x_norm_patchtokens']

Rearrange and reshape the dimensions of the DINOv2 output. DINOv2 uses a patch size of 14x14 pixels, so the number of patches along each axis is the input size divided by 14 (here 224 / 14 = 16, giving 16 x 16 = 256 patches).

In [41]:
# The output shape is [batch_size, patches, features] = [1, 256, 384]
# print(features.shape)

# Rearrange dimensions of the feature tensor to [batch_size, features, patches] = [1, 384, 256]
features_perm = features.permute(0,2,1)
# print(features_perm.shape)

# Reshape linear patches (256) into width and height --> [1, 384, 16, 16]
features_wh = features_perm.reshape(1,384,16,16)
print(features_wh.shape)

# Upsample by the patch size to the model input size, i.e. [batch_size, features, height, width] = [1, 384, 224, 224]
# (the commented factor below would instead upsample to the original 128x128 size)
# fact = (14 * original_im_shape[0] / 224, 14 * original_im_shape[1] / 224)
fact = (14, 14)
features_int = torch.nn.functional.interpolate(features_wh, scale_factor=fact)
print(features_int.shape)

# Convert to numpy array and remove batch dimension to get [features, height, width] = [384, 224, 224]
# (move the tensor back to the CPU first, in case the model ran on the GPU)
features_np = features_int.cpu().numpy()
features_np = np.squeeze(features_np, axis=0)
print(features_np.shape)


torch.Size([1, 384, 16, 16])
torch.Size([1, 384, 224, 224])
(384, 224, 224)
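
The permute, reshape and interpolate steps can be mirrored in plain NumPy, which makes the index bookkeeping easier to check on a tiny example. This is a sketch under two assumptions: patch tokens are ordered row by row over the image, and nearest-neighbour upsampling (the default mode of `torch.nn.functional.interpolate`) is intended. `tokens_to_feature_map` is a hypothetical helper name.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_hw, patch_size=14):
    """Turn patch tokens (n_patches, n_features) into a map (n_features, H, W)."""
    grid_h, grid_w = grid_hw
    n_patches, n_features = tokens.shape
    if n_patches != grid_h * grid_w:
        raise ValueError("number of tokens does not match the patch grid")
    # Features first, then unfold the linear patch index into rows and columns
    grid = tokens.T.reshape(n_features, grid_h, grid_w)
    # Nearest-neighbour upsampling: repeat every patch value patch_size times per axis
    return grid.repeat(patch_size, axis=1).repeat(patch_size, axis=2)

# Hypothetical toy input: a 2x2 patch grid with 3 features per patch
toy_tokens = np.arange(4 * 3, dtype=np.float32).reshape(4, 3)
toy_map = tokens_to_feature_map(toy_tokens, (2, 2), patch_size=14)
```

Every pixel inside a patch receives the same feature vector, so the upsampled map carries no more spatial detail than the 16x16 patch grid.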


Show feature space in napari.

In [None]:

viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(image_scaled)
# add the loaded labels/annotation
viewer.add_labels(labels_scaled)
# add the feature space
viewer.add_image(features_np)

Extract the features and target values (labels) at the pixels where the image is annotated.

In [42]:
features_annot, targets = extract_annotated_pixels(features_np, labels_scaled)
# features_annot.shape = (646, 384)
# targets.shape = (646,)
print(features_annot.shape)
print(targets.shape)
# NOTE: in convpaint, we had
# features.shape = (218, 640)
# targets.shape = (218,)
# while the sum of the label values computed when loading the data was 327

(646, 384)
(646,)
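
Conceptually, this step selects the feature vectors at annotated pixels and pairs them with their labels. The sketch below shows one way to do that with boolean masking in NumPy. It is an illustration only: `select_annotated` is a hypothetical function and is not claimed to match convpaint's `extract_annotated_pixels` internally.

```python
import numpy as np

def select_annotated(feature_map, labels):
    """feature_map: (n_features, H, W); labels: (H, W) with 0 = unannotated."""
    mask = labels > 0
    features = feature_map[:, mask].T  # (n_annotated, n_features)
    targets = labels[mask]             # (n_annotated,)
    return features, targets

# Hypothetical toy data: 2 features on a 2x3 image, two annotated pixels
toy_map = np.arange(2 * 2 * 3).reshape(2, 2, 3)
toy_labels = np.array([[0, 1, 0],
                       [2, 0, 0]])
X, y = select_annotated(toy_map, toy_labels)
```

The resulting `X` has one row per annotated pixel and one column per feature, the layout a scikit-learn style classifier expects.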


## Train and use Classifier
Finally we can train a classifier:

In [43]:
random_forest = train_classifier(features_annot, targets)

Then we predict a class for each patch and upsample the result to the image size.

In [44]:
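# Move the patch features back to the CPU (in case the model ran on the GPU) and convert to numpy
all_features = features.cpu().numpy()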
all_features_lin = all_features.squeeze(0)

predictions = random_forest.predict(all_features_lin)

# We have 256 predictions, which corresponds to the 256 patches
print(predictions.shape)

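# Reshape the patch predictions into the 16x16 patch grid
predicted_image = predictions.reshape(16,16)
# Upsample to the model input size with nearest-neighbour interpolation (order=0) to keep integer labels
predicted_image = skimage.transform.resize(predicted_image, in_shape, mode='edge', order=0, anti_aliasing=False)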
predicted_image = predicted_image.astype(np.uint8)

(256,)


## Visualize Results
And finally we can visualize the output (and quantify its quality):

In [46]:
viewer = napari.Viewer()
# add the loaded image to it
viewer.add_image(image_scaled)
# add the loaded labels/annotation
viewer.add_labels(labels_scaled)
# add the prediction
viewer.add_labels(predicted_image)

<Labels layer 'predicted_image' at 0x12f61b51b80>
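
One simple way to quantify the result is the fraction of annotated pixels whose predicted class matches the annotation. The sketch below assumes that prediction and annotation have the same shape and that 0 marks unannotated pixels; `annotated_accuracy` is a hypothetical helper. Since the annotated pixels were also used for training, a score computed on them is an optimistic estimate.

```python
import numpy as np

def annotated_accuracy(prediction, annotation):
    """Fraction of annotated pixels (annotation > 0) that are predicted correctly."""
    mask = annotation > 0
    if not mask.any():
        raise ValueError("annotation contains no labelled pixels")
    return float(np.mean(prediction[mask] == annotation[mask]))

# Hypothetical toy example with four annotated pixels, one of them misclassified
toy_annotation = np.array([[0, 1, 1],
                           [2, 2, 0]])
toy_prediction = np.array([[1, 1, 2],
                           [2, 2, 1]])
acc = annotated_accuracy(toy_prediction, toy_annotation)
```

In the notebook this could be called as `annotated_accuracy(predicted_image, labels_scaled)`, ideally on annotations that were held out from training.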

In [None]:
# nbscreenshot(viewer)

## Tests Roman

In [None]:
# CREATE AND SHOW FULL OUTPUT OF A LAYER OF VGG16 (= 64 FEATURES)
def get_layer_features(image, layer, show_napari = False, interpolate = False):
        
    model = Hookmodel(model_name='vgg16')


    all_layers = list(model.module_dict.keys())
    # Choose just 1 layer (by name or by index), and register a hook there
    if isinstance(layer, str):
        layers = [layer]
    elif isinstance(layer, int):
        layers = [all_layers[layer]]
    else:
        raise TypeError("layer must be a layer name (str) or an index (int)")
    
    # layers = ['features.30 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) avgpool AdaptiveAvgPool2d(output_size=(7, 7))']
    model.register_hooks(selected_layers=layers)

    # Get features using only this first layer and without scaling
    features, targets = get_features_current_layers(
        model=model, image=image, annotations=image, scalings=[1], use_min_features=False, order=interpolate)

    # Convert the DataFrame to a numpy array
    features_array = features.values
    # Get the shape of the image
    image_shape = image.shape
    # Reshape the features array to match the image shape and add the second dimension of features as the third dimension
    features_image = features_array.reshape(*image_shape, -1)

    # Move the last dimension to the first position
    features_image = np.moveaxis(features_image, -1, 0)
    # print(features.shape)
    # print(features_image.shape)

    # Now you can view the new_features using napari
    if show_napari: napari.view_image(features_image)
    return features_image

In [None]:
# RUN

# Use the cropped image loaded above (assumed; append .T to transpose it)
image = original_image

# Get features of multiple (all) layers
conv_layers = [0,2]#,5,7,10,12,14,17,19,21,24,26,28]
all_conv = [get_layer_features(image, l) for l in conv_layers]


### Pad first dimension of the layers with fewer features and concatenate all layers into a 4D Image

# Get the shapes of all outputs
shapes = [output.shape for output in all_conv]
# Find the maximum shape in each dimension
max_shape = np.max(shapes, axis=0)
# Pad all outputs to have the max shape
# (np.pad is used directly instead of importing pad from numpy.lib)
all_conv_padded = np.array([np.pad(output, [(0, max_dim - dim) for dim, max_dim in zip(output.shape, max_shape)]) for output in all_conv])

# Show in Napari
napari.view_image(all_conv_padded)