Once you have organised all the files into `raw_image_info.csv`, the next step is to generate the label volumes used to train the model. These are computed from the point annotations stored in the files listed in the `raw_annotated_file_path` column.

# Load dependencies

In [1]:
%gui qt5
import torch
import torchio as tio
from torch.utils.data import DataLoader
import scipy.io
from scipy.ndimage import maximum_filter
import pandas as pd
import numpy as np
import napari
import h5py

# Generation of training data

We have volumes of micro-CT data that we want to label. However, the annotations we currently have are 3D point locations, which are not in a form that a deep learning model can relate spatially to the CT data.

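We therefore want to convert these 3D point locations into a "prediction volume": a label volume on the same voxel grid as the scan, in which each voxel holds a class (0 = nothing, 1 = cornea, 2 = rhabdom). This is what we ultimately want the model to produce. Because it shares the scan's voxel grid, the model can read it, relate it spatially to the input data and learn to predict it.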
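
As a minimal sketch of the idea, assume the points are already integer voxel indices that lie inside the volume. The conversion is then a single scatter into an array of zeros. The function name `points_to_label_volume` and the toy shape are illustrative only.

```python
import numpy as np

def points_to_label_volume(points, shape, label=1):
    """Scatter integer (N, 3) voxel coordinates into a zero-filled label volume.

    Hypothetical helper for illustration: column k of `points` indexes
    axis k of the returned array, and every annotated voxel is set to `label`.
    """
    volume = np.zeros(shape, dtype=np.uint8)
    points = np.asarray(points, dtype=int)
    volume[points[:, 0], points[:, 1], points[:, 2]] = label
    return volume

# Three annotated points in a toy 4x4x4 volume
vol = points_to_label_volume([[0, 1, 2], [3, 3, 3], [1, 0, 0]], (4, 4, 4))
print(vol.sum())  # 3 voxels are labelled
```

The notebook's own helpers below follow the same pattern and also grow each point into a small neighbourhood, so that the labels are not single voxels.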

## Let's load the information we need from the CSV file

In [2]:
info = pd.read_csv('raw_image_info.csv')

info

| | image_file_path | raw_annotated_file_path |
|---|---|---|
| 0 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | //home/jake/projects/mctv_resfiles/ants/diurna... |
| 1 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | //home/jake/projects/mctv_resfiles/fiddlercrab... |
| 2 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | //home/jake/projects/mctv_resfiles/fiddlercrab... |
| 3 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | //home/jake/projects/mctv_resfiles/fiddlercrab... |
| 4 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | //home/jake/projects/mctv_resfiles/fiddlercrab... |
| ... | ... | ... |
| 152 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | |
| 153 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | |
| 154 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | |
| 155 | //mnt/d37c99c5-3b94-47b9-9965-c66fd9a16e23/jak... | |
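
The last rows (152 to 155) have no entry in `raw_annotated_file_path`, so there are no point annotations to convert for those scans. One way to skip them is to drop those rows before looping. This is a minimal sketch that assumes empty CSV fields are read in as `NaN`, which is pandas' default. The toy DataFrame stands in for `info`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `info`; the real paths come from raw_image_info.csv
info = pd.DataFrame({
    "image_file_path": ["scan_a", "scan_b", "scan_c"],
    "raw_annotated_file_path": ["scan_a.mat", np.nan, "scan_c.mat"],
})

# Keep only scans that actually have an annotation file
annotated = info.dropna(subset=["raw_annotated_file_path"]).reset_index(drop=True)
print(len(annotated))  # 2
```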


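Let's inspect one of the annotation files to find where the point data is stored. MATLAB v7.3 `.mat` files are HDF5 files, which is why they can be opened with `h5py`.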

In [3]:
f = h5py.File(info.loc[0, 'raw_annotated_file_path'], 'r')  # open read-only

# print every group and dataset stored under 'save_dat'
f['save_dat'].visititems(lambda name, obj: print(name, obj))

ana <HDF5 group "/save_dat/ana" (1 members)>
ana/para <HDF5 group "/save_dat/ana/para" (11 members)>
ana/para/allow_splitting_rows_by_distance <HDF5 dataset "allow_splitting_rows_by_distance": shape (1, 1), type "<f8">
ana/para/distance_for_pointtext <HDF5 dataset "distance_for_pointtext": shape (1, 1), type "<f8">
ana/para/include_non_empty <HDF5 dataset "include_non_empty": shape (1, 1), type "<f8">
ana/para/main_marker_size <HDF5 dataset "main_marker_size": shape (1, 1), type "<f8">
ana/para/pt_th <HDF5 dataset "pt_th": shape (1, 1), type "<f8">
ana/para/row_th <HDF5 dataset "row_th": shape (1, 1), type "<f8">
ana/para/spline_stiff <HDF5 dataset "spline_stiff": shape (1, 1), type "<f8">
ana/para/spline_stiff_smooth <HDF5 dataset "spline_stiff_smooth": shape (1, 1), type "<f8">
ana/para/spline_stiff_soft_mult <HDF5 dataset "spline_stiff_soft_mult": shape (1, 1), type "<f8">
ana/para/spline_stiff_z_mult <HDF5 dataset "spline_stiff_z_mult": shape (1, 1), type "<f8">
ana/para/text_size


The annotations themselves are stored in the `marked` dataset, with one column per annotation:

In [4]:
pd.DataFrame(np.array(f['save_dat']['data']['marked']))

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 5170 | 5171 | 5172 | 5173 | 5174 | 5175 | 5176 | 5177 | 5178 | 5179 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 210.378869 | 208.71522 | 203.161893 | 201.773182 | 216.326074 | 211.377609 | 207.015189 | 202.492341 | 200.279329 | 199.304796 | ... | 745.026416 | 746.956193 | 749.140737 | 752.58752 | 755.880528 | 750.993639 | 750.950449 | 748.290773 | 750.992589 | 752.272633 |
| 1 | 610.125062 | 608.291494 | 617.583415 | 627.088487 | 597.115077 | 601.899662 | 603.774637 | 614.244095 | 624.459194 | 633.980652 | ... | 443.376608 | 430.567993 | 420.227072 | 411.458105 | 403.947978 | 401.98969 | 397.802645 | 439.233097 | 430.750563 | 421.386247 |
| 2 | 618.403081 | 605.82051 | 597.841257 | 598.72229 | 622.821452 | 613.526117 | 597.957832 | 588.549029 | 587.718511 | 594.389523 | ... | 417.117119 | 429.199554 | 438.165841 | 448.930036 | 460.621231 | 462.513364 | 472.234956 | 424.412174 | 436.990685 | 446.607978 |
| 3 | 11.0 | 11.0 | 11.0 | 11.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | ... | 76.0 | 76.0 | 76.0 | 76.0 | 76.0 | 76.0 | 76.0 | 77.0 | 77.0 | 77.0 |
| 4 | 52.0 | 53.0 | 54.0 | 55.0 | 51.0 | 52.0 | 53.0 | 54.0 | 55.0 | 56.0 | ... | 42.0 | 43.0 | 44.0 | 45.0 | 46.0 | 47.0 | 48.0 | 43.0 | 44.0 | 45.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


Here are the locations of the annotations in x, y and z, taken from rows 2, 1 and 0 respectively:

In [5]:
pd.DataFrame(np.array(f['save_dat']['data']['marked'])).iloc[[2, 1, 0], :]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 5170 | 5171 | 5172 | 5173 | 5174 | 5175 | 5176 | 5177 | 5178 | 5179 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 618.403081 | 605.82051 | 597.841257 | 598.72229 | 622.821452 | 613.526117 | 597.957832 | 588.549029 | 587.718511 | 594.389523 | ... | 417.117119 | 429.199554 | 438.165841 | 448.930036 | 460.621231 | 462.513364 | 472.234956 | 424.412174 | 436.990685 | 446.607978 |
| 1 | 610.125062 | 608.291494 | 617.583415 | 627.088487 | 597.115077 | 601.899662 | 603.774637 | 614.244095 | 624.459194 | 633.980652 | ... | 443.376608 | 430.567993 | 420.227072 | 411.458105 | 403.947978 | 401.98969 | 397.802645 | 439.233097 | 430.750563 | 421.386247 |
| 0 | 210.378869 | 208.71522 | 203.161893 | 201.773182 | 216.326074 | 211.377609 | 207.015189 | 202.492341 | 200.279329 | 199.304796 | ... | 745.026416 | 746.956193 | 749.140737 | 752.58752 | 755.880528 | 750.993639 | 750.950449 | 748.290773 | 750.992589 | 752.272633 |
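
Because `marked` stores one annotation per column, the same reordering can be done directly in NumPy. Select rows 2, 1 and 0, then transpose, to get an `(N, 3)` array with one `(x, y, z)` row per annotation. The sketch below uses a small synthetic array in place of `f['save_dat']['data']['marked']`. The values are rounded copies of the first four columns shown above.

```python
import numpy as np

# Synthetic stand-in for np.array(f['save_dat']['data']['marked']):
# 9 rows (attributes) x 4 columns (annotations)
marked = np.zeros((9, 4))
marked[0] = [210.4, 208.7, 203.2, 201.8]  # row 0 -> z
marked[1] = [610.1, 608.3, 617.6, 627.1]  # row 1 -> y
marked[2] = [618.4, 605.8, 597.8, 598.7]  # row 2 -> x

# Reorder rows to (x, y, z), transpose to (N, 3), round to the nearest voxel
points = np.round(marked[[2, 1, 0], :].T).astype(int)
print(points.shape)  # (4, 3)
print(points[0])     # [618 610 210]
```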


Row 5 holds the category of each annotation:

In [6]:
pd.DataFrame(np.array(f['save_dat']['data']['marked'])).iloc[5, :]

0       0.0
1       0.0
2       0.0
3       0.0
4       0.0
       ... 
5175    1.0
5176    1.0
5177    1.0
5178    1.0
5179    1.0
Name: 5, Length: 5180, dtype: float64
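
The helper functions below treat categories 0 and 2 as corneas and categories 1 and 3 as rhabdoms. That grouping can be expressed compactly as a boolean mask built with `np.isin`. Here it is applied to made-up category and point arrays:

```python
import numpy as np

# Made-up categories and (x, y, z) points; in the notebook these come from
# row 5 and rows [2, 1, 0] of the `marked` dataset
classification = np.array([0.0, 1.0, 2.0, 3.0, 0.0])
points = np.arange(15).reshape(5, 3)

cornea_mask = np.isin(classification, [0, 2])   # categories 0 and 2
rhabdom_mask = np.isin(classification, [1, 3])  # categories 1 and 3

cornea_locations = points[cornea_mask]
rhabdom_locations = points[rhabdom_mask]
print(len(cornea_locations), len(rhabdom_locations))  # 3 2
```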

## Let's set up some helper functions to generate annotated prediction volumes

In [44]:
def _load_point_data(path):
    print('loading point data from .mat files...')
    # MATLAB v7.3 .mat files are HDF5, so open read-only with h5py
    with h5py.File(path, 'r') as f:
        marked = np.array(f['save_dat']['data']['marked'])
    # row 5 holds the category of each annotation
    classification = marked[5, :]
    # rows 2, 1 and 0 hold x, y and z; transpose to one (x, y, z) row per point
    points = marked[[2, 1, 0], :].T
    # round to the nearest voxel
    points = np.round(points).astype(int)
    # corneas are categories 0 and 2, rhabdoms are categories 1 and 3
    cornea_indx = (classification == 0) | (classification == 2)
    rhabdom_indx = (classification == 1) | (classification == 3)
    cornea_locations = points[cornea_indx, :]
    rhabdom_locations = points[rhabdom_indx, :]
    return cornea_locations, rhabdom_locations

def _point_to_segmentation_vol(image, cornea_locations, rhabdom_locations):
    print('converting point data to segmentation volume...')
    # create an empty array the size of the original data; np.zeros_like avoids
    # zeroing `image` in place, which would also wipe the CT data whenever
    # `image` shares memory with it (as img.data.numpy() does)
    print('clearing image')
    empty = np.zeros_like(image)
    
    print('copying empty images')
    corneas = empty.copy()
    rhabdoms = empty.copy()
    
    print('assigning positions of corneas and rhabdoms')
    # `image` has shape (channel, ...), and the point columns (x, y, z)
    # index the three spatial axes in reverse order
    corneas[
        0,
        cornea_locations[:, 2],
        cornea_locations[:, 1],
        cornea_locations[:, 0]
    ] = 1
    rhabdoms[
        0,
        rhabdom_locations[:, 2],
        rhabdom_locations[:, 1],
        rhabdom_locations[:, 0]
    ] = 1
    
    # now use a maximum filter to grow each point into a slightly larger area;
    # size=(1, 3, 3, 3) restricts the 3x3x3 neighbourhood to the spatial axes.
    # note that a maximum filter makes each point a cube without rounded edges;
    # a gaussian filter may be more appropriate
    print('running maximum filter')
    corneas = maximum_filter(corneas, size=(1, 3, 3, 3))
    rhabdoms = maximum_filter(rhabdoms, size=(1, 3, 3, 3))
    
    print('merging cornea and rhabdom images into single volume')
    # now merge both into a single prediction volume
    # 0 = nothing
    # 1 = cornea
    # 2 = rhabdom (rhabdoms overwrite corneas where the two overlap)
    prediction = empty
    prediction[corneas > 0] = 1
    prediction[rhabdoms > 0] = 2
    
    return prediction

def create_annotated_volumes(path, image):
    cornea_locations, rhabdom_locations = _load_point_data(path)
    annotated_vol = _point_to_segmentation_vol(
        image,
        cornea_locations,
        rhabdom_locations
    )
    print('done.')
    return annotated_vol
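
The comment about the maximum filter in `_point_to_segmentation_vol` points at two limitations. The filter produces cubes rather than rounded blobs, and it has to visit every voxel of the full volume even though only a few thousand are annotated. One alternative is to dilate each point with a spherical neighbourhood directly. This is an assumption, not what the notebook does. Add every integer offset within a radius to each point, drop anything that falls outside the volume, then scatter the result. The names `dilate_points_spherical` and `radius` are hypothetical.

```python
import numpy as np

def dilate_points_spherical(points, shape, radius=1.5, label=1):
    """Label every voxel within `radius` (in voxels) of any point.

    Hypothetical helper: `points` is an integer (N, 3) array whose
    column k indexes axis k of a volume with the given 3D `shape`.
    """
    r = int(np.floor(radius))
    # all integer offsets in the bounding cube, kept if inside the sphere
    grid = np.arange(-r, r + 1)
    offsets = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1).reshape(-1, 3)
    offsets = offsets[(offsets ** 2).sum(axis=1) <= radius ** 2]

    # (N, 1, 3) + (1, M, 3) -> every point shifted by every offset
    voxels = (np.asarray(points)[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    # discard voxels that fall outside the volume
    inside = np.all((voxels >= 0) & (voxels < np.array(shape)), axis=1)
    voxels = voxels[inside]

    volume = np.zeros(shape, dtype=np.uint8)
    volume[voxels[:, 0], voxels[:, 1], voxels[:, 2]] = label
    return volume

vol = dilate_points_spherical(np.array([[5, 5, 5]]), (11, 11, 11), radius=1.5)
print(vol.sum())  # 19: centre, 6 face and 12 edge neighbours (corners are sqrt(3) > 1.5 away)
```

The work scales with the number of points rather than the size of the scan. To match `_point_to_segmentation_vol`, the location columns would need reversing (`locations[:, ::-1]`) and the result placed in channel 0.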
    

In [45]:
n_rows = info.shape[0]
i = 0
# for i in range(n_rows):  # loop over every scan once a single one works
img = info.loc[i, 'image_file_path']
label = info.loc[i, 'raw_annotated_file_path']

transform = tio.ToCanonical()
img = tio.ScalarImage(
    img
)
ann = tio.LabelMap(
    tensor=create_annotated_volumes(
        label,
        img.data.numpy()
    ),
    affine=img.affine  # put the labels in the same physical space as the scan
)

viewer = napari.Viewer()
viewer.dims.ndisplay = 3 # toggle 3 dimensional view
viewer.add_image(img.data.numpy())
viewer.add_image(ann.data.numpy())

loading point data from .mat files...
converting point data to segmentation volume...
clearing image
copying empty images
assigning positions of corneas and rhabdoms
running maximum filter
merging cornea and rhabdom images into single volume
done.


<Image layer 'Image [1]' at 0x7fa19c4fe7c0>

INFO - 2021-08-13 13:06:46,200 - acceleratesupport - No OpenGL_accelerate module loaded: %s


These helpers are quite slow: they rely only on NumPy indexing and filtering over arrays the size of the whole scan.

Let's instead try treating the annotations as a point cloud and voxelising it with Open3D.

In [8]:
import open3d as o3d
N = 2000

i = 0
img = info.loc[i, 'image_file_path']
label = info.loc[i, 'raw_annotated_file_path']

corneas, rhabdoms = _load_point_data(label)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(corneas)
# fit to unit cube
# pcd.scale(1 / np.max(pcd.get_max_bound() - pcd.get_min_bound()),
#           center=pcd.get_center())
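# Open3D only uses these colours if there is exactly one per point (N == len(corneas));
# otherwise they are ignored and the voxels end up with colour 0
pcd.colors = o3d.utility.Vector3dVector(np.random.uniform(0, 1, size=(N, 3)))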

aabb = pcd.get_axis_aligned_bounding_box()
aabb.color = (1, 0, 0)
obb = pcd.get_oriented_bounding_box()
obb.color = (0, 1, 0)

o3d.visualization.draw_geometries([pcd, aabb, obb])

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
loading point data from .mat files...

Now voxelise the point cloud:

In [40]:
voxel_grid = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd,
                                                            voxel_size=1)
o3d.visualization.draw_geometries([voxel_grid])

In [32]:
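import open3d.ml.tf as ml3d
import tensorflow as tf

# `corneas` holds voxel indices, whereas get_bounds() returns world
# coordinates, so use the scan's extent in voxels as the range instead;
# the point columns (x, y, z) index the spatial axes in reverse order
spatial_shape = tio.ScalarImage(img).spatial_shape

voxel = ml3d.ops.voxelize(
    corneas.astype(np.float32),
    voxel_size=tf.constant([1., 1., 1.]),
    points_range_min=tf.constant([0., 0., 0.]),
    points_range_max=tf.constant([float(s) for s in spatial_shape[::-1]])
)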



In [42]:
point_cloud_np = np.asarray([voxel_grid.origin + pt.grid_index*voxel_grid.voxel_size for pt in voxel_grid.get_voxels()])

In [24]:
voxels = voxel_grid.get_voxels()
voxel_indices = np.stack([voxels[i].grid_index for i in range(len(voxels))])
voxel_colours = np.stack([voxels[i].color for i in range(len(voxels))])

In [23]:
voxel_indices

array([[ 29, 198, 479],
       [ 34, 185, 481],
       [ 55, 137, 487],
       ...,
       [  3, 306, 386],
       [111, 251, 101],
       [  5, 316, 385]], dtype=int32)

In [26]:
np.unique(voxel_colours)

array([0.])

That was fast. Now let's plot this against the volume in napari.

In [57]:
def create_annotated_volume(path):
    corneas, rhabdoms = _load_point_data(path)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(corneas)
    voxel_grid = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size=1)
    return np.asarray(voxel_grid.points)

In [58]:
n_rows = info.shape[0]
i = 0
# for i in range(n_rows):  # loop over every scan once a single one works
img = info.loc[i, 'image_file_path']
label = info.loc[i, 'raw_annotated_file_path']

transform = tio.ToCanonical()
img = tio.ScalarImage(img)
ann = tio.LabelMap(tensor=create_annotated_volume(label))

viewer = napari.Viewer()
viewer.dims.ndisplay = 3 # toggle 3 dimensional view
viewer.add_image(img.data.numpy())
viewer.add_image(ann.data.numpy())

loading point data from .mat files...


AttributeError: 'open3d.cpu.pybind.geometry.VoxelGrid' object has no attribute 'points'

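That cell fails because Open3D's `VoxelGrid` has no `points` attribute; the voxel indices have to be read through `get_voxels()`, as in the `voxel_indices` cell above. Still, Open3D generates a voxel grid very quickly, so it is a promising replacement for the NumPy-based helpers.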
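
To display Open3D's result in napari, the voxel indices have to be turned back into a dense array. As we understand Open3D's convention, `grid_index` is measured from `voxel_grid.origin`. A voxel's centre therefore sits at `origin + (grid_index + 0.5) * voxel_size`. For `create_from_point_cloud`, the origin is placed half a voxel below the minimum point. The sketch below relies on those assumptions and is written in plain NumPy with hand-made indices. `voxel_indices_to_volume` is a hypothetical name.

```python
import numpy as np

def voxel_indices_to_volume(grid_indices, origin, voxel_size, shape, label=1):
    """Turn Open3D-style voxel grid indices into a dense label volume.

    Assumes a voxel's centre lies at origin + (grid_index + 0.5) * voxel_size
    and that those centres are integer voxel coordinates of the scan
    (the case for voxel_size=1 and integer input points). Hypothetical helper.
    """
    centres = np.asarray(origin) + (np.asarray(grid_indices) + 0.5) * voxel_size
    coords = np.round(centres).astype(int)
    volume = np.zeros(shape, dtype=np.uint8)
    volume[coords[:, 0], coords[:, 1], coords[:, 2]] = label
    return volume

# Points [2, 3, 4] and [4, 3, 2] voxelised with voxel_size=1: the origin is half
# a voxel below their minimum [2, 3, 2], i.e. [1.5, 2.5, 1.5]
grid_indices = np.array([[0, 0, 2], [2, 0, 0]])
vol = voxel_indices_to_volume(grid_indices, origin=[1.5, 2.5, 1.5], voxel_size=1.0, shape=(6, 6, 6))
print(np.argwhere(vol).tolist())  # [[2, 3, 4], [4, 3, 2]]
```

With the notebook's objects this would be called as `voxel_indices_to_volume(voxel_indices, voxel_grid.origin, voxel_grid.voxel_size, shape)`. To line up with `_point_to_segmentation_vol`, the index columns would also need reversing and a channel axis added.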

## Now, let's load our micro-CT volumes and point data to make segmentation labels for training

In [3]:
id = 'P_crassipes_FEG191022_077A'
img_dir = '//media/jake/1tb_ssd/mctv_analysis/head_scans/P_crassipes_FEG191022_077A_highpriority'
matlab_ann_dir = '//media/jake/1tb_ssd/mctv_analysis/mctv_resfiles/hyperiids/P_crassipes_FEG191022_077A/P_crassipes_FEG191022_077A.mat'
labels_dir = '//media/jake/1tb_ssd/mctv_analysis/labels/'

transform = tio.ToCanonical()
img = tio.ScalarImage(
    img_dir
)
ann = tio.LabelMap(
    tensor=create_annotated_volumes(
        matlab_ann_dir,
        img.data.numpy()
    ),
    affine=img.affine  # put the labels in the same physical space as the scan
)

ImageSeriesReader (0x5c86430): Non uniform sampling or missing slices detected,  maximum nonuniformity:7.46568e-06



loading point data from .mat files...
converting point data to segmentation volume...
done.


In [None]:
viewer = napari.Viewer()
viewer.dims.ndisplay = 3 # toggle 3 dimensional view
viewer.add_image(img.data.numpy())
viewer.add_image(ann.data.numpy())

Let's double-check that the image and label affine matrices are the same:

In [5]:
img.affine

array([[-0.00287747,  0.        ,  0.        ,  0.        ],
       [ 0.        , -0.00287747,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.00287747,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])


In [None]:
ann.affine
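
Comparing the two printed matrices by eye works, but a tolerance-based check is less error-prone. As a sketch, `np.allclose` on the two 4x4 arrays does this. As far as we know, a TorchIO `LabelMap` built from a tensor without an `affine` argument gets an identity affine by default, which would fail this check. The identity matrix below stands in for such a label map. In the notebook, the call would be `np.allclose(img.affine, ann.affine)`.

```python
import numpy as np

# Affine printed for `img` above
img_affine = np.array([
    [-0.00287747, 0.0, 0.0, 0.0],
    [0.0, -0.00287747, 0.0, 0.0],
    [0.0, 0.0, 0.00287747, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Stand-in for a label map created without passing affine=img.affine
default_affine = np.eye(4)

print(np.allclose(img_affine, img_affine))      # True
print(np.allclose(img_affine, default_affine))  # False
```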

Now let's plot the subject to make sure everything looks OK:

In [None]:
subject = tio.Subject(
    mct=img,
    labels=ann,
    id=id
)
subject.plot()

Now that we've generated our labels, let's save them to disk so they can be loaded by our model for training.

In [None]:
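import os

# save to the same path the labels are loaded from again below
label_path = 'data/' + id + '/' + id + '_label.nii.gz'
os.makedirs(os.path.dirname(label_path), exist_ok=True)
ann.save(label_path)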

Now we can load this data quickly during training and testing. This can be done with the approach below.

In [None]:
img = tio.ScalarImage(
    
)
ann = tio.LabelMap(
    'data/'+ id + '/' + id + '_label.nii.gz',
    affine=img.affine
)
subject = tio.Subject(
    mct=img,
    labels=ann,
    id=id
)
subject.plot()