# BUILDING A TFRECORDS DATABASE

In this notebook, a TFRecords database is built from the MRI dataset that has already been organized in folders, according to the possible labels:

* CN
* MCI
* AD

The code for that organization is in the notebook named `ImagePreprocessing.ipynb`. This notebook is the first one executed on Google Colaboratory. Its objective is to create a series of `.tfrecords` files encoding the entire ADNI dataset. Because the dataset is very large and Google Colab offers only 12 GB of RAM, this format was mandatory: the alternative, native Python generators, would be far too slow for training.

First, mount Google Drive on this notebook.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


`SimpleITK` has to be installed, because it is not installed by default on Colab. `DLTK` will also be very useful for some preprocessing steps, like whitening. Using `pip` will do.

In [0]:
! pip install SimpleITK
! pip install dltk

Import all the needed libraries. A brief description of them:

* `Tensorflow` is the selected deep learning framework.
* `SimpleITK` for reading `.nii` images.
* `numpy` for working with numbers and matrices. It is also necessary for `SimpleITK` to work properly.
* `pandas` for loading the data description file.
* `keras` will be used for model construction, working on `tensorflow`.
* `dltk.io.preprocessing` for whitening the images
* `matplotlib.pyplot` for image visualization
* `os` for file interaction

In [0]:
import tensorflow as tf
import SimpleITK as sitk
import numpy as np
import pandas as pd

from tensorflow import keras
from dltk.io import preprocessing
from matplotlib import pyplot as plt

import os

---

## PREVIOUS STEPS

Implement methods to build a `tf.train.Feature` from a basic python or numpy datatype. This is a basic step towards building any TFRecords database.

In [0]:
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _float_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))
  
def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
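
A quick way to check these helpers is a round trip: build a `tf.train.Example`, serialize it, and decode it again. The following is only a minimal sketch, assuming a TensorFlow installation that exposes `tf.train.Example`; the toy image, the label and the subject string are made-up values for illustration.

```python
import numpy as np
import tensorflow as tf

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _float_feature(value):
    # expects an iterable of floats, e.g. a flattened image
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# toy 2x2x2 "image" (illustrative only, real images are much larger)
toy_image = np.arange(8, dtype=np.float32).reshape(2, 2, 2)

feature = {
    'label': _int64_feature(1),
    'subject': _bytes_feature('941_S_1311'.encode('utf-8')),
    'image': _float_feature(toy_image.ravel()),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
serialized = example.SerializeToString()

# decode the serialized bytes and read the fields back
decoded = tf.train.Example.FromString(serialized)
label = decoded.features.feature['label'].int64_list.value[0]
subject = decoded.features.feature['subject'].bytes_list.value[0].decode('utf-8')
image = np.array(decoded.features.feature['image'].float_list.value,
                 dtype=np.float32).reshape(toy_image.shape)
```

Note that the image is stored as a flat list of floats, so its shape has to be known when reading it back.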

Define all constants that allow us to access our data:

In [0]:
# basic RAW databases, with registered and skull-stripped images
DB_REG_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/ADNI/MRI/REGISTERED/'
DB_SS_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/ADNI/MRI/SKULL-STRIPPED/'

# the data description file
DESCRIPTION_FILE = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/ADNI/MRI/Description.csv'

# data subfolders (labels)
CLASS_SUBFOLDERS = ['MCI/', 'AD/', 'CN/']
BINARY_CLASS_SUBFOLDERS = ['AD/', 'CN/']

# database with the unsupervised learning data
DB_UL_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/UNSUPERVISED/REGISTERED/'
DB_UL_SS_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/UNSUPERVISED/SKULL-STRIPPED/'

# 3D supervised TFRecords database
DB_TF_3D_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/TFRecords/MRI/TFRecords3D/'
# tfrecords files - registered and skull stripped
TFREC_3D_REG_TRAIN = 'train.3D.registered.tfrecords'
TFREC_3D_SS_TRAIN = 'train.3D.skull_stripped.tfrecords'
TFREC_3D_REG_TEST = 'test.3D.registered.tfrecords'
TFREC_3D_SS_TEST = 'test.3D.skull_stripped.tfrecords'
TFREC_3D_REG_VAL = 'validation.3D.registered.tfrecords'
TFREC_3D_SS_VAL = 'validation.3D.skull_stripped.tfrecords'

# 2D supervised TFRecords database
DB_TF_2D_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/TFRecords/MRI/TFRecords2D/'
# tfrecords files - registered and skull stripped
# also created a binary tfrecord, which was intended to be used for binary classification (deprecated)
TFREC_2D_REG_TRAIN = 'train.2D.registered.tfrecords'
TFREC_2D_SS_TRAIN = 'train.2D.skull_stripped.tfrecords'
TFREC_2D_BIN_TRAIN = 'train.2D.binary.tfrecords'
TFREC_2D_REG_TEST = 'test.2D.registered.tfrecords'
TFREC_2D_SS_TEST = 'test.2D.skull_stripped.tfrecords'
TFREC_2D_BIN_TEST = 'test.2D.binary.tfrecords'
TFREC_2D_REG_VAL = 'validation.2D.registered.tfrecords'
TFREC_2D_SS_VAL = 'validation.2D.skull_stripped.tfrecords'
TFREC_2D_BIN_VAL = 'validation.2D.binary.tfrecords'


# 3D unsupervised TFRecords database
DB_TF_UL_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/TFRecords/MRI/TFRecordsUL/'
# tfrecords files - registered and skull stripped
TFREC_UL_REG_TRAIN = 'train.UL.registered.tfrecords'
TFREC_UL_SS_TRAIN = 'train.UL.skull_stripped.tfrecords'
TFREC_UL_REG_TEST = 'test.UL.registered.tfrecords'
TFREC_UL_SS_TEST = 'test.UL.skull_stripped.tfrecords'
TFREC_UL_REG_VAL = 'validation.UL.registered.tfrecords'
TFREC_UL_SS_VAL = 'validation.UL.skull_stripped.tfrecords'

Identifiers for the three different classes are needed. Also save the shape of the images, in case that information is needed.

In [0]:
# label mapping
LABELS = {'CN': 0, 'MCI': 1, 'AD': 2}
BINARY_LABELS = {'CN': 0, 'AD': 1}

# shape of the images, both 3D and 2D
IMG_SHAPE = (78, 110, 86)
IMG_2D_SHAPE = (IMG_SHAPE[1] * 4, IMG_SHAPE[2] * 4)


Define the fraction of the data that is going to be used for the test set and for the validation set. When using TFRecords, the data has to be separated into different files, because it cannot be split later, during training.

In [0]:
TEST_SPLIT = 0.15
VALIDATION_SPLIT = 0.15

---

## SUPERVISED DATA

### Train/Test supervised data split

Find a way to split the data. Load the path of every file in a list, and then split the list so the references of training, validation and test data are separated.

In [0]:
# array for saving the filenames
filenames = np.array([])

# iterate all three class folders in the db
for subf in CLASS_SUBFOLDERS:
  # using the skull stripped data
  path = DB_SS_PATH + subf
  for name in os.listdir(path):
    complete_name = os.path.join(path, name)
    if os.path.isfile(complete_name):
      filenames = np.concatenate((filenames, complete_name), axis=None)

In [0]:
filenames.shape

(1539,)

Now, shuffle and split the `ndarray`:

In [0]:
# a single shuffle already yields a uniformly random permutation
np.random.shuffle(filenames)
  
test_margin = int(len(filenames) * TEST_SPLIT)
training_set, test_set = filenames[test_margin:], filenames[:test_margin]

validation_margin = int(len(training_set) * VALIDATION_SPLIT)
training_set, validation_set = training_set[validation_margin:], training_set[:validation_margin]

print('Training set:', training_set.shape)
print('Validation set:', validation_set.shape)
print('Test set:', test_set.shape)

Training set: (1113,)
Validation set: (196,)
Test set: (230,)
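
The sizes printed above follow from applying the two fractions one after the other: the test set is taken from the whole list, and the validation set is taken from what remains. A small sketch of that arithmetic (the helper name `split_sizes` is hypothetical, not part of the notebook):

```python
TEST_SPLIT = 0.15
VALIDATION_SPLIT = 0.15

def split_sizes(n_files, test_split, validation_split):
    # the test set is a fraction of all the files
    n_test = int(n_files * test_split)
    # the validation set is a fraction of the remaining files
    n_remaining = n_files - n_test
    n_validation = int(n_remaining * validation_split)
    n_training = n_remaining - n_validation
    return n_training, n_validation, n_test

n_training, n_validation, n_test = split_sizes(1539, TEST_SPLIT, VALIDATION_SPLIT)
print('Training:', n_training, 'Validation:', n_validation, 'Test:', n_test)
```

Because the validation fraction is applied to the remaining files, the validation set ends up being roughly 12.7% of the whole dataset rather than 15%.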



### 3D TFRecords database for supervised learning

Let's build the 3D TFRecords database for supervised learning. Keep in mind that this code can be reused to create databases for both skull-stripped and non-skull-stripped data, just by modifying the referenced constants. The final work used the skull-stripped data.


Load the data description file.

In [0]:
description = pd.read_csv(DESCRIPTION_FILE)
description.head()

Unnamed: 0,Image Data ID,Subject,Group,Sex,Age,Visit,Modality,Description,Type,Acq Date,Format,Downloaded
0,97327,941_S_1311,MCI,M,69,1,MRI,MPR; GradWarp; B1 Correction; N3; Scaled,Processed,3/02/2007,NiFTI,4/04/2019
1,97341,941_S_1311,MCI,M,70,3,MRI,MPR-R; GradWarp; B1 Correction; N3; Scaled,Processed,9/27/2007,NiFTI,4/02/2019
2,112538,941_S_1311,MCI,M,70,4,MRI,MPR; GradWarp; B1 Correction; N3; Scaled,Processed,6/01/2008,NiFTI,4/02/2019
3,75150,941_S_1202,CN,M,78,3,MRI,MPR; GradWarp; B1 Correction; N3; Scaled,Processed,8/24/2007,NiFTI,4/02/2019
4,105437,941_S_1202,CN,M,79,4,MRI,MPR; GradWarp; B1 Correction; N3; Scaled,Processed,2/28/2008,NiFTI,4/02/2019


Now, design a method that loads a 3D `.nii` image and some of its information. Taking the absolute path, split the name by directories to get the image name. With that, obtain the class label. Also, obtain the subject ID from the image file name. Finally, read the image using `SimpleITK`, transform into a `numpy` array and return the image, the label and the subject ID.

In [0]:
def load_image_3D(abs_path):
  ''' Load an image (.nii) and its label, from its absolute path.
      
      Parameters:
        abs_path -- Absolute path, filename included
        
      Returns:
        img -- The .nii image, converted into a numpy array
        label -- The label of the image
        subject -- The ID of the subject, taken from the file name
        
  '''
  
  # obtain the label from the path (it is the last directory name)
  split_path = abs_path.split('/')
  label = LABELS[split_path[-2]]
  
  # obtain the ID of the subject
  img_name = split_path[-1]
  subject = '_'.join(img_name.split('_')[1:4])
  
  # load the image with SimpleITK
  sitk_image = sitk.ReadImage(abs_path)
  
  # transform into a numpy array
  img = sitk.GetArrayFromImage(sitk_image)
  
  return img, label, subject
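
The path handling in `load_image_3D` can be tried without reading any image. The sketch below assumes that file names follow the usual ADNI pattern (`ADNI_<subject>_MR_<description>_..._S<series>_I<image id>.nii`); the path and file name used here are illustrative, and the helper name is hypothetical.

```python
LABELS = {'CN': 0, 'MCI': 1, 'AD': 2}

def parse_label_and_subject(abs_path):
    split_path = abs_path.split('/')
    # the label is the name of the folder that contains the file
    label = LABELS[split_path[-2]]
    # the subject ID is made of the 2nd, 3rd and 4th tokens of the file name
    subject = '_'.join(split_path[-1].split('_')[1:4])
    return label, subject

# illustrative path, assuming the ADNI naming pattern
example_path = ('/data/SKULL-STRIPPED/MCI/'
                'ADNI_941_S_1311_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br'
                '_20080313130949784_S27408_I97327.nii')

label, subject = parse_label_and_subject(example_path)
print(label, subject)
```

This only works when paths use `/` as the separator and the class folder is the direct parent of the file, which is the layout used in this notebook.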

Now, create a new method for creating `.tfrecords` files. It needs the filenames of all the images that are going to be stored in the `.tfrecords` file, as well as the name of that file. It then follows the usual strategy for writing this type of file: build a `tf.train.Example` for each image, serialize it and write it to the file.

Besides the image and the label, some extra data are stored for each example (subject, age, sex, preprocessing and image ID). They were stored in case they were needed in later steps, although, in the end, they were not. This does not really matter, because the extra space needed is insignificant.

In [0]:
def create_tf_record(img_filenames, tf_rec_filename):
  ''' Create a TFRecord file, including the information
      of the specified images
      
      Parameters:
        img_filenames -- Array with the path to every
                         image that is going to be included
                         in the TFRecords file.
        tf_rec_filename -- Name of the TFRecords file.
  '''
  
  # open the file
  writer = tf.python_io.TFRecordWriter(tf_rec_filename)
  
  # iterate through all .nii files
  for meta_data in img_filenames:

    # load the image and label
    img, label, subject = load_image_3D(meta_data)
    
    # also save the preprocessing information and the subject age and sex
    meta_data_split = meta_data.split('/')
    filename_split = meta_data_split[-1].split('_')
    
    # save the preprocessing technique used
    preprocessing = '_'.join(filename_split[5:-3])
    
    # get the image ID
    if filename_split[-1].endswith('.gz'): image_ID = int(filename_split[-1][1:-7])
    else: image_ID = int(filename_split[-1][1:-4])
      
    # get the age and sex of the subject
    age_and_sex = description.loc[description['Image Data ID'] == image_ID, ['Age', 'Sex']].iloc[0]
    
    # create a feature
    feature = {'label': _int64_feature(label),
               'subject': _bytes_feature(subject.encode('utf-8')),
               'preprocessing': _bytes_feature(preprocessing.encode('utf-8')),
               'subject_age': _int64_feature(int(age_and_sex['Age'])),
               'subject_sex': _bytes_feature(age_and_sex['Sex'].encode('utf-8')),
               'image_id': _int64_feature(image_ID),
               'image': _float_feature(img.ravel())}

    # create an example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))

    # serialize to string and write on the file
    writer.write(example.SerializeToString())
    
  writer.close()
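
The metadata stored in each example comes from slicing the file name. The following sketch isolates that step so it can be checked on its own. It assumes the usual ADNI naming pattern, where the last token is `I<image id>` followed by `.nii` or `.nii.gz`; the file names and the helper name are illustrative.

```python
def parse_filename_metadata(filename):
    parts = filename.split('_')
    # tokens between the modality and the last three tokens
    # describe the preprocessing applied to the image
    preprocessing = '_'.join(parts[5:-3])
    # the last token is 'I<image id>' plus the extension
    last = parts[-1]
    extension = '.nii.gz' if last.endswith('.gz') else '.nii'
    image_id = int(last[1:-len(extension)])
    return preprocessing, image_id

# illustrative file names, assuming the ADNI naming pattern
name = ('ADNI_941_S_1311_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br'
        '_20080313130949784_S27408_I97327.nii')

preprocessing, image_id = parse_filename_metadata(name)
_, image_id_gz = parse_filename_metadata(name + '.gz')
print(preprocessing, image_id)
```

The image ID obtained this way is the value matched against the `Image Data ID` column of the description file.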

Define the complete path names for the `.tfrecords` files.

In [0]:
train_tfrec = os.path.join(DB_TF_3D_PATH, TFREC_3D_SS_TRAIN)
test_tfrec = os.path.join(DB_TF_3D_PATH, TFREC_3D_SS_TEST)
val_tfrec = os.path.join(DB_TF_3D_PATH, TFREC_3D_SS_VAL)

Finally, create the `.tfrecords` files. 

In [0]:
create_tf_record(training_set, train_tfrec)
create_tf_record(test_set, test_tfrec)
create_tf_record(validation_set, val_tfrec)

### 2D TFRecords database for supervised learning

Let's build the 2D TFRecords database for supervised learning. Keep in mind that this code can be reused to create databases for both skull-stripped and non-skull-stripped data, just by modifying the referenced constants. The final work used skull-stripped data.




In this case, images need to be transformed to 2D. The following method does exactly that, taking multiple horizontal slices and placing them in a 2D matrix. In the final version, 16 slices were used. Some considerations:

* The top slice was selected manually, after some tests. Higher cuts did not show any useful information.
* The same goes for the bottom slice. Slices below it only showed some of the brainstem.
* To obtain 16 cuts, every second slice from 60 down to 30 is selected.

In [0]:
def slices_matrix_2D(img):
  ''' Transform a 3D MRI image into a 2D image, by obtaining 16 slices 
      and placing them in a 4x4 two-dimensional grid.
      
      All 16 cuts are from a horizontal/axial view. They are selected
      from the 30th to the 60th level of the original 3D image.
      
      Parameters:
        img -- np.ndarray with the 3D image
        
      Returns:
        np.ndarray -- The resulting 2D image
  '''
  
  # create the final 2D image 
  image_2D = np.empty(IMG_2D_SHAPE)
  
  # set the limits and the step
  TOP = 60
  BOTTOM = 30
  STEP = 2
  N_CUTS = 16
  
  # iterator for the cuts
  cut_it = TOP
  # iterator for the rows of the 2D final image
  row_it = 0
  # iterator for the columns of the 2D final image
  col_it = 0
  
  for cutting_time in range(N_CUTS):
    
    # cut
    cut = img[cut_it, :, :]
    cut_it -= STEP
    
    # reset the row iterator and move the
    # col iterator when needed
    if cutting_time in [4, 8, 12]:
      row_it = 0
      col_it += cut.shape[1]
    
    # copy the cut to the 2D image
    for i in range(cut.shape[0]):
      for j in range(cut.shape[1]):
        image_2D[i + row_it, j + col_it] = cut[i, j]
    row_it += cut.shape[0]
  
  # return the final 2D image, with 3 channels
  # this is necessary for working with most pre-trained nets
  return np.repeat(image_2D[None, ...], 3, axis=0).T
  #return image_2D
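
The same grid can be built without the pixel-by-pixel loops, which makes the layout easier to see: every four consecutive cuts are stacked vertically, and the four resulting blocks are placed side by side. This is only a sketch of an equivalent layout using `np.vstack` and `np.hstack`, not the code used to generate the database; the function name `slices_grid` and the random test image are illustrative.

```python
import numpy as np

IMG_SHAPE = (78, 110, 86)
TOP, STEP, N_CUTS, GRID = 60, 2, 16, 4

def slices_grid(img):
    # axial levels 60, 58, ..., 30 (16 cuts in total)
    levels = [TOP - STEP * k for k in range(N_CUTS)]
    cuts = [img[level, :, :] for level in levels]
    # every 4 consecutive cuts are stacked vertically into one block
    blocks = [np.vstack(cuts[c * GRID:(c + 1) * GRID]) for c in range(GRID)]
    # the 4 blocks are placed side by side
    return np.hstack(blocks)

# random volume with the same shape as the MRI images
img = np.random.rand(*IMG_SHAPE)

grid = slices_grid(img)                        # shape (440, 344)
rgb = np.repeat(grid[None, ...], 3, axis=0).T  # shape (344, 440, 3)
print(grid.shape, rgb.shape)
```

Note that the final `.T` reverses all axes, so besides moving the channels to the last position it also transposes the two spatial axes of the grid.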

The following method loads a 3D image from disk, whitens it and applies the previous 2D transformation. It also returns the image label.

The label mapper argument was intended for modifying the class labels, but it was never really used for anything special. In practice, `LABELS` was always passed as the mapper.

In [0]:
def load_image_2D(abs_path, labels):
  ''' Load an image (.nii) and its label, from its absolute path.
      Transform it into a 2D image, by obtaining 16 slices and placing them
      in a 4x4 two-dimensional grid.
      
      Parameters:
        abs_path -- Absolute path, filename included
        labels -- Label mapper
        
      Returns:
        img -- The .nii image, converted into a numpy array
        label -- The label of the image (from argument 'labels')
        
  '''
  
  # obtain the label from the path (it is the last directory name)
  label = labels[abs_path.split('/')[-2]]
  
  # load the image with SimpleITK
  sitk_image = sitk.ReadImage(abs_path)
  
  # transform into a numpy array
  img = sitk.GetArrayFromImage(sitk_image)
  
  # apply whitening
  img = preprocessing.whitening(img)
  
  # make the 2D image
  img = slices_matrix_2D(img)
  
  return img, label

Define the complete filename for each one of the `.tfrecords`.

In [0]:
train_tfrec2D = os.path.join(DB_TF_2D_PATH, TFREC_2D_SS_TRAIN)
test_tfrec2D = os.path.join(DB_TF_2D_PATH, TFREC_2D_SS_TEST)
val_tfrec2D = os.path.join(DB_TF_2D_PATH, TFREC_2D_SS_VAL)

Design the method for creating a `.tfrecords` file, given the filenames of all the images, the name of the file and the label mapper (always the `LABELS` constant). It uses the previous two methods for loading the images.

In [0]:
def create_tf_record_2D(img_filenames, tf_rec_filename, labels):
  ''' Create a TFRecord file, including the information
      of the specified images, after converting them into 
      a 2D grid.
      
      Parameters:
        img_filenames -- Array with the path to every
                         image that is going to be included
                         in the TFRecords file.
        tf_rec_filename -- Name of the TFRecords file.
        labels -- Label mapper
  '''
  
  # open the file
  writer = tf.python_io.TFRecordWriter(tf_rec_filename)
  
  # iterate through all .nii files
  for meta_data in img_filenames:

    # load the image and label
    img, label = load_image_2D(meta_data, labels)

    # create a feature
    feature = {'label': _int64_feature(label),
               'image': _float_feature(img.ravel())}

    # create an example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))

    # serialize to string and write on the file
    writer.write(example.SerializeToString())
    
  writer.close()

Finally, create the `.tfrecords`.

In [0]:
create_tf_record_2D(training_set, train_tfrec2D, LABELS)
create_tf_record_2D(test_set, test_tfrec2D, LABELS)
create_tf_record_2D(validation_set, val_tfrec2D, LABELS)

---

### Train/Test unsupervised data split

Save all file names in an array. Keep in mind that the data we have for supervised learning can also be used for unsupervised learning, so we need to access both databases this time.

In [0]:
# array where we are going to save the filenames
filenames = np.array([])

# iterate all three class folders in the supervised db
for subf in CLASS_SUBFOLDERS:
  path = DB_SS_PATH + subf
  for name in os.listdir(path):
    complete_name = os.path.join(path, name)
    if os.path.isfile(complete_name): 
      filenames = np.concatenate((filenames, complete_name), axis=None)
    
for name in os.listdir(DB_UL_SS_PATH):
  complete_name = os.path.join(DB_UL_SS_PATH, name)
  if os.path.isfile(complete_name):
      filenames = np.concatenate((filenames, complete_name), axis=None)

Then shuffle and split:

In [0]:
# a single shuffle already yields a uniformly random permutation
np.random.shuffle(filenames)
  
test_margin = int(len(filenames) * TEST_SPLIT)
unsupervised_training_set, unsupervised_test_set = filenames[test_margin:], filenames[:test_margin]

validation_margin = int(len(unsupervised_training_set) * VALIDATION_SPLIT)
unsupervised_training_set, unsupervised_validation_set = unsupervised_training_set[validation_margin:], unsupervised_training_set[:validation_margin]

---

## UNSUPERVISED DATA (DEPRECATED)

### 3D TFRecords database for unsupervised learning

Let's build the 3D TFRecords database for unsupervised learning. Keep in mind that this code can be reused to create databases for both skull-stripped and non-skull-stripped data, just by modifying the referenced constants.



In [0]:
train_ul_tfrec = os.path.join(DB_TF_UL_PATH, TFREC_UL_SS_TRAIN)
test_ul_tfrec = os.path.join(DB_TF_UL_PATH, TFREC_UL_SS_TEST)
val_ul_tfrec = os.path.join(DB_TF_UL_PATH, TFREC_UL_SS_VAL)

In [0]:
def unsupervised_load_image_3D(abs_path):
  ''' Load an image (.nii) from its absolute path.
      The associated label will be the image itself.
      This method is used for unsupervised learning.
      
      Parameters:
        abs_path -- Absolute path, filename included
        
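      Returns:
        img -- The .nii image, converted into a numpy array
        subject -- The ID of the subject ('IXI' for IXI images)
        preprocessing -- The preprocessing description ('IXI' for IXI images)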
  '''
   
  # load the image with SimpleITK
  sitk_image = sitk.ReadImage(abs_path)
  # transform into a numpy array
  img = sitk.GetArrayFromImage(sitk_image)
  
  split_path = abs_path.split('/')
  if split_path[-3] != 'UNSUPERVISED':
    split_name = split_path[-1].split('_')
    subject = '_'.join(split_name[1:4])
    preprocessing = '_'.join(split_name[5:-3])
  else:
    # for those images that come from the IXI dataset
    subject = 'IXI'
    preprocessing = 'IXI'
  
  return img, subject, preprocessing
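
The check on `split_path[-3]` relies on the folder layout defined in the constants: files from the unsupervised database sit two levels below the `UNSUPERVISED` folder, while files from the supervised database sit inside a class folder. A minimal sketch of that check, using the paths defined earlier and made-up file names (the helper `image_origin` is hypothetical):

```python
DB_SS_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/ADNI/MRI/SKULL-STRIPPED/'
DB_UL_SS_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/UNSUPERVISED/SKULL-STRIPPED/'

def image_origin(abs_path):
    split_path = abs_path.split('/')
    # .../UNSUPERVISED/SKULL-STRIPPED/<file>   -> split_path[-3] == 'UNSUPERVISED'
    # .../SKULL-STRIPPED/<class folder>/<file> -> split_path[-3] == 'SKULL-STRIPPED'
    return 'IXI' if split_path[-3] == 'UNSUPERVISED' else 'ADNI'

# made-up file names, for illustration only
ixi_origin = image_origin(DB_UL_SS_PATH + 'some_ixi_image.nii.gz')
adni_origin = image_origin(DB_SS_PATH + 'CN/' + 'some_adni_image.nii')
print(ixi_origin, adni_origin)
```

If the folder structure changes, for example by adding subfolders to the unsupervised database, this positional check has to be adapted.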

In [0]:
def create_unsupervised_tf_record(img_filenames, tf_rec_filename):
  ''' Create a TFRecord file, including the information
      of the specified images
      
      Parameters:
        img_filenames -- Array with the path to every
                         image that is going to be included
                         in the TFRecords file.
        tf_rec_filename -- Name of the TFRecords file.
  '''
  
  exceptions = []
  
  # open the file
  writer = tf.python_io.TFRecordWriter(tf_rec_filename)
  
  # iterate through all .nii files
  for meta_data in img_filenames:
    
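    try:
      # load the image and its metadata
      img, subject, preprocessing = unsupervised_load_image_3D(meta_data)
    except Exception:
      # keep track of the files that could not be loaded and skip them
      exceptions.append(meta_data)
      continue

    # create a feature (strings have to be encoded as bytes)
    feature = {'subject': _bytes_feature(subject.encode('utf-8')),
               'preprocessing': _bytes_feature(preprocessing.encode('utf-8')),
               'image': _float_feature(img.ravel())}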

    # create an example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))

    # serialize to string and write on the file
    writer.write(example.SerializeToString())
    
  writer.close()
  return exceptions

In [0]:
exceptions = create_unsupervised_tf_record(unsupervised_training_set, train_ul_tfrec)
# extend (not append) to keep a flat list of failed filenames
exceptions.extend(create_unsupervised_tf_record(unsupervised_test_set, test_ul_tfrec))
exceptions.extend(create_unsupervised_tf_record(unsupervised_validation_set, val_ul_tfrec))

---

## PET IMAGES (DEPRECATED)

In [0]:
PET_DB_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/ADNI/PET/'
CLASS_SUBFOLDERS = ['MCI/', 'AD/', 'CN/']

DB_TF_PET_PATH = '/content/gdrive/My Drive/Education/Master/MIA/TFM/Data/TFRecords/PET/TFRecords2D/'
# files
TFREC_2D_PET_TRAIN = 'train.pet.2D.tfrecords'
TFREC_2D_PET_TEST = 'test.pet.2D.tfrecords'
TFREC_2D_PET_VAL = 'validation.pet.2D.tfrecords'

In [0]:
LABELS = {'CN': 0, 'MCI': 1, 'AD': 2}
PET_IMG_SHAPE = (69, 95, 79)
PET_2D_SHAPE = (PET_IMG_SHAPE[1] * 4, PET_IMG_SHAPE[2] * 4)

In [0]:
TEST_SPLIT = 0.15
VALIDATION_SPLIT = 0.15

### Train/Test split

In [0]:
# array where we are going to save the filenames
filenames = np.array([])

# iterate all three class folders in the db
for subf in CLASS_SUBFOLDERS:
  path = PET_DB_PATH + subf
  for name in os.listdir(path):
    complete_name = os.path.join(path, name)
    if os.path.isfile(complete_name):
      filenames = np.concatenate((filenames, complete_name), axis=None)

In [0]:
# a single shuffle already yields a uniformly random permutation
np.random.shuffle(filenames)
  
test_margin = int(len(filenames) * TEST_SPLIT)
training_set, test_set = filenames[test_margin:], filenames[:test_margin]

validation_margin = int(len(training_set) * VALIDATION_SPLIT)
training_set, validation_set = training_set[validation_margin:], training_set[:validation_margin]

### 2D transformation

In [0]:
def pet_to_2D(img):
  ''' Transform a 3D PET image into a 2D image, by obtaining 16 slices 
      and placing them in a 4x4 two-dimensional grid.
      
      All 16 cuts are from a horizontal/axial view. They are selected
      from the 50th down to the 20th level of the original 3D image.
      
      Parameters:
        img -- np.ndarray with the 3D image
        
      Returns:
        np.ndarray -- The resulting 2D image
  '''
  
  # create the final 2D image 
  image_2D = np.empty(PET_2D_SHAPE)
  
  # set the limits and the step
  TOP = 50
  BOTTOM = 20
  STEP = 2
  N_CUTS = 16
  
  # iterator for the cuts
  cut_it = TOP
  # iterator for the rows of the 2D final image
  row_it = 0
  # iterator for the columns of the 2D final image
  col_it = 0
  
  for cutting_time in range(N_CUTS):
    
    # cut
    cut = img[cut_it, :, :]
    cut_it -= STEP
    
    # reset the row iterator and move the
    # col iterator when needed
    if cutting_time in [4, 8, 12]:
      row_it = 0
      col_it += cut.shape[1]
    
    # copy the cut to the 2D image
    for i in range(cut.shape[0]):
      for j in range(cut.shape[1]):
        image_2D[i + row_it, j + col_it] = cut[i, j]
    row_it += cut.shape[0]
  
  # return the final 2D image
  return np.repeat(image_2D[None, ...], 3, axis=0).T
  #return image_2D

In [0]:
def load_PET_2D(abs_path):
  ''' Load an image (.nii) and its label, from its absolute path.
      Transform it into a 2D image, by obtaining 16 slices and placing them
      in a 4x4 two-dimensional grid.
      
      Parameters:
        abs_path -- Absolute path, filename included
        
      Returns:
        img -- The .nii image, converted into a numpy array
        label -- The label of the image (CN/MCI/AD = 0/1/2)
        
  '''
  
  # obtain the label from the path (it is the last directory name)
  label = LABELS[abs_path.split('/')[-2]]
  
  # load the image with SimpleITK
  sitk_image = sitk.ReadImage(abs_path)
  
  # transform into a numpy array
  img = sitk.GetArrayFromImage(sitk_image)
  
  # apply whitening
  img = preprocessing.whitening(img)
  
  # make the 2D image
  img = pet_to_2D(img)
  
  return img, label

### TFRecords creation

In [0]:
train_tfrecPET = os.path.join(DB_TF_PET_PATH, TFREC_2D_PET_TRAIN)
test_tfrecPET = os.path.join(DB_TF_PET_PATH, TFREC_2D_PET_TEST)
val_tfrecPET = os.path.join(DB_TF_PET_PATH, TFREC_2D_PET_VAL)

In [0]:
def tf_record_PET_2D(img_filenames, tf_rec_filename):
  ''' Create a TFRecord file, including the information
      of the specified images, after converting them into 
      a 2D grid.
      
      Parameters:
        img_filenames -- Array with the path to every
                         image that is going to be included
                         in the TFRecords file.
        tf_rec_filename -- Name of the TFRecords file.
  '''
  
  # open the file
  writer = tf.python_io.TFRecordWriter(tf_rec_filename)
  
  # iterate through all .nii files
  for meta_data in img_filenames:

    # load the image and label
    img, label = load_PET_2D(meta_data)

    # create a feature
    feature = {'label': _int64_feature(label),
               'image': _float_feature(img.ravel())}

    # create an example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))

    # serialize to string and write on the file
    writer.write(example.SerializeToString())
    
  writer.close()

In [0]:
tf_record_PET_2D(training_set, train_tfrecPET)
tf_record_PET_2D(test_set, test_tfrecPET)
tf_record_PET_2D(validation_set, val_tfrecPET)