# Tutorial on how to open, visualize and extract some features from a .mhd Image

This tutorial shows how to:
    - Open and read a .mhd image
    - Visualize a .mhd image
    - Read a list of candidates from a .csv file
    - Transform from world coordinates to voxel coordinates
    - Extract some features / patches of candidates and visualize them
To run this tutorial, the following Python libraries / modules need to be installed:
    - SimpleITK: a library for handling and processing medical images
    - NumPy: a fundamental package for scientific computing with Python
    - PIL (Python Imaging Library): a library that adds image processing capabilities to your Python interpreter
    - Matplotlib: a plotting library for the Python programming language

We start by importing the required modules / libraries with Python's import statement. The csv and os modules are part of the standard library.

In [4]:
import SimpleITK as sitk
import numpy as np
import csv
import os
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline

We now define a function to:
    - Open the image
    - Store it in a numpy array
    - Extract the following info: pixel spacing, origin
This function takes the file name of the image as input and returns:
    - The array corresponding to the image (numpyImage)
    - The origin (numpyOrigin)
    - The pixel spacing (numpySpacing)

In [5]:
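def load_itk_image(filename):
    # Read the .mhd header together with the raw data it points to
    itkimage = sitk.ReadImage(filename)
    # Convert to a numpy array, indexed as (z, y, x)
    numpyImage = sitk.GetArrayFromImage(itkimage)

    # ITK reports origin and spacing as (x, y, z); reverse them to match the array axes
    numpyOrigin = np.array(list(reversed(itkimage.GetOrigin())))
    numpySpacing = np.array(list(reversed(itkimage.GetSpacing())))

    return numpyImage, numpyOrigin, numpySpacing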
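
The reversal in load_itk_image can be illustrated without an image file. The sketch below uses plain tuples as stand-ins for what GetOrigin() and GetSpacing() would return; the values are assumed to be the ones printed later in this tutorial, put back into (x, y, z) order.

```python
import numpy as np

# Stand-ins for itkimage.GetOrigin() / itkimage.GetSpacing(), in (x, y, z) order.
# Assumption: these are the values printed later in this tutorial, un-reversed.
itk_origin_xyz = (-187.699997, -108.300003, -194.0)
itk_spacing_xyz = (0.54882801, 0.54882801, 1.25)

# Reverse to (z, y, x) so that entry k describes array axis k.
origin_zyx = np.array(list(reversed(itk_origin_xyz)))
spacing_zyx = np.array(list(reversed(itk_spacing_xyz)))

print(origin_zyx)
print(spacing_zyx)
```

After the reversal, the first entry of each array describes the slice axis (z), which is axis 0 of the numpy image.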

To open and read the list of candidates, we use the csv Python module.
We now define a function to:
    - Open a csv file
    - Read a csv file
    - Save each line of a csv file
This function takes the name of the csv file as input and returns:
    - A list containing each line of the csv

In [6]:
def readCSV(filename):
    lines = []
    with open(filename, "r") as f:
        csvreader = csv.reader(f)
        for line in csvreader:
            lines.append(line)
    return lines
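
As a quick check of what such a reader returns, the sketch below writes a two-line file to a temporary folder and reads it back. The helper name read_csv_rows, the temporary file name and the header names are assumptions made for illustration; the data row is the sample candidate shown later in this tutorial.

```python
import csv
import os
import tempfile

def read_csv_rows(filename):
    """Return every row of a CSV file as a list of strings."""
    with open(filename, "r", newline="") as f:
        return list(csv.reader(f))

# Write a tiny file: an assumed header line plus one sample candidate row.
tmp_path = os.path.join(tempfile.mkdtemp(), "candidates_demo.csv")
with open(tmp_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["seriesuid", "coordX", "coordY", "coordZ", "class"])  # assumed names
    writer.writerow(["1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860",
                     "-56.08", "-67.85", "-311.92", "0"])

rows = read_csv_rows(tmp_path)
header, first = rows[0], rows[1]

# csv.reader returns every field as a string, so numbers need explicit conversion.
coords = [float(v) for v in first[1:4]]
label = int(first[4])
print(header)
print(coords, label)
```

Because the first row is a header, the loops later in this tutorial start from cands[1:].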

The candidate coordinates are given in world coordinates, so we need to convert them to voxel coordinates.
We now define a function to do that. Note that the transformation below is only valid if the image's transformation matrix has no rotation component. None of the CT images in our dataset has one, so this formula can be used.
This function takes as inputs:
    - The world coordinates
    - The origin
    - The pixel spacing
This function returns:
    - Voxel coordinates (voxelCoord)

In [7]:
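def worldToVoxelCoord(worldCoord, origin, spacing):
    # Distance from the image origin along each axis, in world units
    stretchedVoxelCoord = np.absolute(worldCoord - origin)
    # Divide by the voxel size to get (fractional) voxel indices
    voxelCoord = stretchedVoxelCoord / spacing
    return voxelCoord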
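
To make the formula concrete, here is a small worked example using the origin and spacing printed for the scan in this tutorial. The world coordinate is made up, and voxel_to_world is a hypothetical inverse added only to check the round trip; it is a sketch, not part of the original script.

```python
import numpy as np

def world_to_voxel(world, origin, spacing):
    # Same formula as worldToVoxelCoord (no rotation assumed).
    return np.absolute(world - origin) / spacing

def voxel_to_world(voxel, origin, spacing):
    # Hypothetical inverse. Only valid when world >= origin on every axis,
    # because the absolute value above discards the sign.
    return voxel * spacing + origin

# Origin and spacing in (z, y, x) order, as printed for the scan in this tutorial.
origin = np.array([-194.0, -108.300003, -187.699997])
spacing = np.array([1.25, 0.54882801, 0.54882801])

# Made-up world coordinate (z, y, x), chosen to lie inside the scan.
world = np.array([-100.0, 0.0, 0.0])

voxel = world_to_voxel(world, origin, spacing)
index = voxel.astype(int)          # truncate to usable array indices
print(voxel)
print(index)
print(voxel_to_world(voxel, origin, spacing))
```

For the slice axis, the offset is 94 world units and the spacing is 1.25, which gives 75.2, so the candidate falls in slice 75.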

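We now want to extract some features from the candidates. To do so, we define a function that normalizes the image planes from which the candidate views are taken: intensities between -1000 and 400 HU are rescaled linearly to [0, 1], and values outside that range are clipped.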

In [8]:
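def normalizePlanes(npzarray):
    # Intensity window in Hounsfield units
    maxHU = 400.
    minHU = -1000.

    # Rescale [minHU, maxHU] to [0, 1], then clip values outside the window
    npzarray = (npzarray - minHU) / (maxHU - minHU)
    npzarray[npzarray>1] = 1.
    npzarray[npzarray<0] = 0.
    return npzarray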
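
The effect of this normalization is easiest to see on a few hand-picked values. The sketch below is an equivalent formulation using np.clip; the function name normalize_hu and the sample values are made up for illustration.

```python
import numpy as np

def normalize_hu(hu, min_hu=-1000.0, max_hu=400.0):
    # Linear rescale of [min_hu, max_hu] to [0, 1], then clip both tails.
    scaled = (np.asarray(hu, dtype=float) - min_hu) / (max_hu - min_hu)
    return np.clip(scaled, 0.0, 1.0)

# Made-up values: below the window, lower bound, midpoint, upper bound, above the window.
sample = np.array([-2000.0, -1000.0, -300.0, 400.0, 1000.0])
normalized = normalize_hu(sample)
print(normalized)
```

Values at or below -1000 HU (air) map to 0, values at or above 400 HU map to 1, and -300 HU, the midpoint of the window, maps to 0.5.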

Having defined these auxiliary functions, we can now write the main part of our script.
First we:
    - Specify the path to the image (img_path)
    - Specify the path to the file with the list of candidates (cand_path)

In [9]:
img_path  = './data/1.3.6.1.4.1.14519.5.2.1.6279.6001.109002525524522225658609808059.mhd'
cand_path = './data/candidates.csv'


Using the load_itk_image function defined above, we can:
    - Load the image
    - Extract the origin
    - Extract the pixel spacing

In [10]:
# load image
numpyImage, numpyOrigin, numpySpacing = load_itk_image(img_path)
print(numpyImage.shape)
print(numpyOrigin)
print(numpySpacing)

(161, 512, 512)
[-194.       -108.300003 -187.699997]
[1.25       0.54882801 0.54882801]


Using the readCSV function defined above, we can:
    - Load the csv file
    - Get the candidates
Using the worldToVoxelCoord function defined above, we can:
    - Transform from world to voxel coordinates

A candidate row has the following form (series UID, x, y and z world coordinates, class label):

['1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860',
 '-56.08',
 '-67.85',
 '-311.92',
 '0']

In [14]:
# load candidates
cands = readCSV(cand_path)
# get candidates
for cand in cands[1:]:
    worldCoord = np.asarray([float(cand[3]),float(cand[2]),float(cand[1])])
    voxelCoord = worldToVoxelCoord(worldCoord, numpyOrigin, numpySpacing)
    voxelWidth = 65

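Using the normalizePlanes function defined above, we can:
    - Extract a patch for each candidate in the list
    - Visualize each patch
    - Save each patch as an image in .tiff format
In the code below, the visualization and saving steps are commented out; the patches are collected in all_data instead.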
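
The patch extraction step can be tried on synthetic data before running it on real scans. The sketch below is a minimal version under stated assumptions: the volume is a made-up array in (z, y, x) order, and extract_axial_patch is a hypothetical helper, not the code used in this tutorial.

```python
import numpy as np

def extract_axial_patch(volume, voxel_zyx, width):
    """Cut a width x width square from the axial slice containing voxel_zyx."""
    z, y, x = (int(v) for v in voxel_zyx)
    half = width // 2
    return volume[z, y - half:y + half, x - half:x + half]

# Synthetic stand-in for a CT volume in (z, y, x) order, filled with air (-1000 HU).
volume = np.full((20, 128, 128), -1000.0)
volume[10, 60:68, 60:68] = 400.0   # a bright block playing the role of a candidate

patch = extract_axial_patch(volume, (10.4, 64.2, 64.7), width=64)

# Rescale from [-1000, 400] HU to [0, 1], as normalizePlanes does.
patch = np.clip((patch + 1000.0) / 1400.0, 0.0, 1.0)
print(patch.shape, patch.min(), patch.max())
```

A patch obtained this way can be displayed with plt.imshow(patch, cmap='gray') and saved with Image.fromarray(patch*255).convert('L').save(...), as in the commented-out lines of the loop below.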

In [11]:
INPUT_FOLDER = './data'
datapoints = os.listdir(INPUT_FOLDER)
datapoints = set((filter(lambda x: ".mhd" in x, datapoints)))

In [12]:
loaded_img = dict()

In [16]:
i = 0
all_data = []
for cand in cands[31300:]:
    fname = cand[0] + ".mhd"
    if fname not in datapoints:
        continue
    if i % 1000 == 0:
        print("Done %d" % i)
    
    i += 1

    img_path = './data/' + fname
    if img_path in loaded_img:
        numpyImage, numpyOrigin, numpySpacing = loaded_img[img_path]
    else:
        numpyImage, numpyOrigin, numpySpacing = load_itk_image(img_path)
        loaded_img[img_path] = (numpyImage, numpyOrigin, numpySpacing)
    
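    # Candidate position in (z, y, x) order, converted to voxel coordinates
    worldCoord = np.asarray([float(cand[3]),float(cand[2]),float(cand[1])])
    voxelCoord = worldToVoxelCoord(worldCoord, numpyOrigin, numpySpacing)
    voxelWidth = 100
    
    # Axial patch of voxelWidth x voxelWidth pixels centred on the candidate.
    # Patches that cross the image border come out smaller and are filtered out later.
    z = int(voxelCoord[0])
    yMin, yMax = int(voxelCoord[1]-voxelWidth/2), int(voxelCoord[1]+voxelWidth/2)
    xMin, xMax = int(voxelCoord[2]-voxelWidth/2), int(voxelCoord[2]+voxelWidth/2)
    patch = numpyImage[z, yMin:yMax, xMin:xMax]
    patch = normalizePlanes(patch)

    outputDir = 'patches/'
    # Row layout: [class label, flattened patch pixels]
    all_data.append(np.append([int(cand[4])], patch.flatten()))
    # Optional: display the patch and save it as a .tiff image
    #plt.imshow(patch, cmap='gray')
    #plt.show()
    #Image.fromarray(patch*255).convert('L').save(os.path.join(outputDir, 'patch_' + str(worldCoord[0]) + '_' + str(worldCoord[1]) + '_' + str(worldCoord[2]) + '.tiff'))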

Done 0
Done 1000
...
Done 54000


In [141]:
results = np.vstack(all_data)

ValueError: all the input array dimensions except for the concatenation axis must match exactly

In [143]:
all_data[0].shape

(10001,)
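
A full row holds 10001 values: one class label plus 100 x 100 patch pixels. A likely cause of the ValueError above is that candidates closer than 50 voxels to the slice border yield truncated or empty patches, so their rows are shorter. The sketch below reproduces this on a synthetic slice and shows zero-padding as one possible alternative to discarding those rows; crop_padded and all values are assumptions made for illustration.

```python
import numpy as np

width = 100
half = width // 2
slice_2d = np.zeros((512, 512))   # synthetic axial slice

def crop(slice_2d, y, x):
    # Plain slicing: silently truncated (or emptied) at the borders.
    return slice_2d[y - half:y + half, x - half:x + half]

def crop_padded(slice_2d, y, x, fill=0.0):
    # Hypothetical alternative: pad the slice so every crop is width x width.
    padded = np.pad(slice_2d, half, mode="constant", constant_values=fill)
    # After padding, original index y sits at position y + half.
    return padded[y:y + width, x:x + width]

centre = crop(slice_2d, 256, 256)
edge = crop(slice_2d, 256, 490)     # the slice stops at column 512
corner = crop(slice_2d, 256, 20)    # negative start index wraps around: empty result

print(centre.shape, edge.shape, corner.shape)
print(crop_padded(slice_2d, 256, 490).shape, crop_padded(slice_2d, 256, 20).shape)
```

Padding keeps border candidates in the data set at the cost of artificial pixels. The next cell instead keeps only the rows with the full length of 10001.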

In [17]:
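# Keep only complete rows (1 label + 100*100 pixels). np.vstack expects a sequence,
# so build a list rather than passing the lazy filter object.
f_all_data = np.vstack([x for x in all_data if x.shape[0] == 10001])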

In [18]:
np.save("./sample_data_2", f_all_data)
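
To use the saved array later, it can be loaded back and split into labels and patches. The sketch below does this with a small made-up array written to a temporary folder; for the real data the patches would be reshaped to 100 x 100 rather than 2 x 2.

```python
import os
import tempfile
import numpy as np

# Small synthetic stand-in for f_all_data: 3 rows of [label, 4 pixel values].
demo = np.array([[0, 0.1, 0.2, 0.3, 0.4],
                 [1, 0.5, 0.6, 0.7, 0.8],
                 [0, 0.0, 1.0, 0.0, 1.0]])

base = os.path.join(tempfile.mkdtemp(), "sample_demo")
np.save(base, demo)                 # np.save appends ".npy" to the file name

loaded = np.load(base + ".npy")
labels = loaded[:, 0].astype(int)   # first column: class label
pixels = loaded[:, 1:]              # remaining columns: flattened patch
patches = pixels.reshape(-1, 2, 2)  # would be (-1, 100, 100) for the real data
print(labels, patches.shape)
```

Note that the file written by the cell above is therefore named sample_data_2.npy.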