# Pre-processing your data for training

This Jupyter notebook shows how to pre-process your data so that you can train your own models.

## Downloading the package

Make sure that the notebook is running with Python>=3.10 and with a version of PyTorch >=1.13 installed (preferably with CUDA available).

To verify that PyTorch and CUDA are installed, run the following cell.

In [7]:
import torch
print(f"Current version of Pytorch: {torch.__version__}")
print(f"Cuda working properly: {torch.cuda.is_available()}")
device = "cuda" if torch.cuda.is_available() else "cpu"

Current version of Pytorch: 2.5.1+cu124
Cuda working properly: True
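
The cell above prints the installed versions but leaves the comparison against the stated minimums to the reader. As a minimal sketch, the same check can be done programmatically. `version_tuple` and `meets_minimum` are hypothetical helpers written for this illustration; they are not part of PyTorch or nagini3D.

```python
import sys

def version_tuple(version):
    """Turn a version string such as '2.5.1+cu124' into (2, 5, 1)."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu124'
    return tuple(int(part) for part in core.split(".")[:3] if part.isdigit())

def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (a tuple of ints)."""
    return version_tuple(version) >= tuple(minimum)

print(f"Python >= 3.10: {sys.version_info >= (3, 10)}")
# Replace the literal below with torch.__version__ in your own environment
print(f"PyTorch >= 1.13: {meets_minimum('2.5.1+cu124', (1, 13))}")
```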


If you have a suitable version of PyTorch and CUDA is working, you can run the following cell to install our package. Otherwise, fix your Python environment before proceeding.

In [2]:
!pip install nagini3D

Collecting nagini3D
  Downloading nagini3d-0.0.1-py3-none-any.whl.metadata (7.8 kB)
Collecting csbdeep==0.7.4 (from nagini3D)
  Downloading csbdeep-0.7.4-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting einops==0.7.0 (from nagini3D)
  Downloading einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Collecting omegaconf==2.3.0 (from nagini3D)
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting scikit-image==0.21.0 (from nagini3D)
  Downloading scikit_image-0.21.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Collecting scipy==1.11.3 (from nagini3D)
  Downloading scipy-1.11.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting antlr4-python3-runtime==4.9.* (from omegaconf==2.3.0->nagini3D)
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)

## Loading the tools

In [1]:
from nagini3D.data.format_dataset.data_reading_tools import (compute_barycenter, compute_radius, mask_to_contour,
                                bound_box, farthest_point_sampling, distance_to_center)

In [5]:
from glob import glob
from os.path import join, splitext, basename
import tifffile
import numpy as np

## Setting the parameters

In [None]:
input_dir = ""        # Path to the directory where the masks are stored
output_dir = ""       # Path where you want to store the corresponding spot maps and samplings
nb_sampling = 61      # Number of points to sample on the surface of each mask
verbose = True        # Set to True to print updates on the sampling process
anisotropy = [1,1,1]  # Anisotropy ratio; if your image is anisotropic, set each value to the ratio longest_dim/current_dim


mask_files = glob(join(input_dir,"*.tif"))

# If SAMPLING_TYPE is "erosion", the surface of the object consists of the mask voxels that are close to the background.
# If SAMPLING_TYPE is "dilation", the surface of the object consists of the background voxels that are close to the mask.
#SAMPLING_TYPE = "erosion"
SAMPLING_TYPE = "dilation"
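
To make the difference between the two sampling types concrete, here is a minimal sketch built on `scipy.ndimage`. It is only an illustration under the assumption that "close to" means one step of 6-connected erosion or dilation; `toy_contour` is a hypothetical helper and may differ from what `mask_to_contour` actually does in nagini3D.

```python
import numpy as np
from scipy import ndimage

def toy_contour(binary_mask, mode="dilation"):
    """Illustrative surface extraction; not the nagini3D implementation."""
    binary_mask = np.asarray(binary_mask).astype(bool)
    if mode == "erosion":
        # Surface = mask voxels removed by one erosion step (inside the object)
        return binary_mask & ~ndimage.binary_erosion(binary_mask)
    if mode == "dilation":
        # Surface = background voxels added by one dilation step (outside the object)
        return ndimage.binary_dilation(binary_mask) & ~binary_mask
    raise ValueError(f"Unknown mode: {mode}")

# Toy example: a 3x3x3 cube inside a 7x7x7 volume
toy = np.zeros((7, 7, 7), dtype=bool)
toy[2:5, 2:5, 2:5] = True
inner = toy_contour(toy, "erosion")
outer = toy_contour(toy, "dilation")
print(f"erosion surface voxels: {inner.sum()}, dilation surface voxels: {outer.sum()}")
```

In this sketch the "erosion" surface stays inside the object while the "dilation" surface lies just outside it, so sampled points sit slightly farther from the barycenter with "dilation".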

## Sampling the masks
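
The cell in this section samples `nb_sampling` points on each object's surface with a Farthest Point Sampling algorithm. As a rough illustration of the idea, the sketch below implements a greedy variant in NumPy on an explicit list of coordinates. `toy_farthest_point_sampling` and its `seed_index` parameter are assumptions made for this example; the nagini3D function `farthest_point_sampling` takes a contour mask and a device, and its internals may differ.

```python
import numpy as np

def toy_farthest_point_sampling(points, n_samples, anisotropy=(1, 1, 1), seed_index=0):
    """Greedy farthest point sampling on an (N, 3) array; illustrative only."""
    points = np.asarray(points, dtype=float)
    # Scale each axis so that distances account for anisotropic voxels
    scaled = points * np.asarray(anisotropy, dtype=float)
    n_samples = min(n_samples, len(points))
    chosen = [seed_index]
    # Distance from every point to its closest already-chosen point
    min_dist = np.linalg.norm(scaled - scaled[seed_index], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))  # the point farthest from the chosen set
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(scaled - scaled[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return points[chosen]

# Toy usage: pick 4 points among the 8 corners of a cube plus its center
corners = np.array([[x, y, z] for x in (0, 2) for y in (0, 2) for z in (0, 2)])
candidates = np.vstack([corners, [[1, 1, 1]]])
picked = toy_farthest_point_sampling(candidates, 4)
print(picked)
```

Because each new point maximizes its distance to the points already chosen, the samples spread over the surface instead of clustering, which is the property the loop relies on.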

In [None]:
for mask_f in mask_files:

  filename = basename(mask_f)
  name_no_ext = splitext(filename)[0]
  if verbose: print(f"Processing file : {filename}\n")

  # Loading the current mask
  mask = tifffile.imread(mask_f)

  # Creating meshgrid used to compute the barycenter
  nx,ny,nz = mask.shape
  vx,vy,vz = np.arange(nx)[:,None,None], np.arange(ny)[None,:,None], np.arange(nz)[None, None, :]
  mesh = (vx, vy, vz)

  # Extract all the labels in the mask, excluding the background (assumed to be labeled 0)
  # Try to avoid non-contiguous/missing labels in your data
  mask_idx = np.unique(mask)
  mask_idx = mask_idx[mask_idx != 0]

  nb_cells = len(mask_idx)

  # Initializing the spot map and the samplings list
  proba_map = np.zeros_like(mask, dtype=float)
  gaussian_mask = np.zeros_like(mask, dtype=int)

  center_list = list()
  radius_list = list()
  sampling_list = list()
  idx_list = list()

  cells_count = 0

  # Processing each label/object
  for i,idx in enumerate(mask_idx):
    crt_mask = (mask == idx)*1
    N = crt_mask.sum()
    if verbose: print(f"\rCell nb {i+1}/{nb_cells}", end="")

    # Computing its barycenter
    barycenter = compute_barycenter(crt_mask, mesh)

    # Creating a binary mask equal to 1 on the surface of the object
    crt_contour = mask_to_contour(crt_mask, mode = SAMPLING_TYPE)

    # Computing the radius of the object
    radius = compute_radius(crt_contour, mesh, barycenter)

    # Extracting the bounding box of the object (useful to avoid processing the whole image during sampling)
    bb = bound_box(crt_contour, mesh)

    x_min, x_max, y_min, y_max, z_min, z_max = bb

    # Cropping mask and contour mask to correspond to the bounding box
    small_contour = crt_contour[x_min:x_max+1, y_min:y_max+1, z_min:z_max+1]
    small_mask = crt_mask[x_min:x_max+1, y_min:y_max+1, z_min:z_max+1]

    # Creating the spot map using the distance of each voxel to the current barycenter
    bx, by, bz = barycenter
    dist = np.sqrt((anisotropy[0]*(vx-bx))**2+(anisotropy[1]*(vy-by))**2+(anisotropy[2]*(vz-bz))**2)
    masked_dist =  dist*crt_mask
    M = np.max(masked_dist)
    masked_dist = (M - masked_dist)*crt_mask
    den = np.max(masked_dist)
    if den>0:
        masked_dist = masked_dist/den
        proba_map += masked_dist

    # Sampling the surface of the current object using a Farthest Point Sampling algorithm
    sampling = farthest_point_sampling(small_contour, nb_sampling, anisotropy=anisotropy, device = device)
    centered_sampling = np.array(sampling) + np.array([x_min, y_min, z_min]) - np.array([barycenter])
    sampling_list.append(centered_sampling)

    # Storing the barycenter, radius and index of the current object
    center_list.append(barycenter)
    radius_list.append(radius)
    idx_list.append(cells_count)

    cells_count+=1

  # Saving the samplings and the spots map
  npz_path = join(output_dir, name_no_ext+".npz")
  np.savez(npz_path, centers = np.array(center_list), radius = np.array(radius_list), samplings = np.array(sampling_list))

  proba_path = join(output_dir, name_no_ext+".tif")
  tifffile.imwrite(proba_path, proba_map)

  print()
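
Once the loop has finished, each mask has a matching `.npz` file holding the `centers`, `radius` and `samplings` arrays. The sketch below shows one way to read such a file back and recover the sampled points in image coordinates. `load_samplings` and `absolute_samplings` are hypothetical helpers, and the code assumes that every object yielded the same number of points, so that `samplings` is a regular array of shape `(n_cells, n_points, 3)`. A small synthetic file stands in for real output so that the example runs on its own.

```python
import os
import tempfile
import numpy as np

def load_samplings(npz_path):
    """Read the three arrays stored under the keys used by the saving step."""
    with np.load(npz_path) as data:
        return data["centers"], data["radius"], data["samplings"]

def absolute_samplings(centers, samplings):
    """Shift barycenter-centered samplings back to image coordinates."""
    # centers: (n_cells, 3); samplings: (n_cells, n_points, 3) (assumed shapes)
    return samplings + centers[:, None, :]

# Synthetic stand-in for a file produced by the loop: one cell, two sampled points
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path,
         centers=np.array([[5.0, 5.0, 5.0]]),
         radius=np.array([2.0]),
         samplings=np.array([[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]]))

centers, radius, samplings = load_samplings(demo_path)
points = absolute_samplings(centers, samplings)
print(points.shape)
```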