# Creating TFRecords
## To train a deep learning model in TensorFlow, we should encode our data into .tfrecord files

### The basic idea of our record writer is to follow the tf.train.Example format presented here https://www.tensorflow.org/tutorials/load_data/tfrecord

### This format is similar to a dictionary, with each key being the string name of a particular feature. For example, an image might be {'image': np_array}, where np_array is converted into a tensor

## Creating a dictionary for record writing

### Let's start with an example of an image and a mask for image segmentation. If you already have an image and a mask, change the paths

In [None]:
import os
import numpy as np
from Base_Deeplearning_Code.Plot_And_Scroll_Images.Plot_Scroll_Images import plot_Image_Scroll_Bar_Image
import SimpleITK as sitk

In [1]:
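record_path = r'H:\TF_Record_Exports'  # Folder the .tfrecord files are written to
# nifti_path is used in the next cell and must be set to the folder holding your NIfTI files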

### If you have an image and annotation path, let's list them here

In [None]:
file_dictionary_list = []
files = [i for i in os.listdir(nifti_path) if i.startswith('Overall_Data')]
for file in files:
    index = file.split('_')[-1].split('.nii')[0]
    image_path = os.path.join(nifti_path, file)
    annotation_path = os.path.join(nifti_path, 'Overall_mask_Test_y{}.nii.gz'.format(index))
    temp_dict = {'image_path': image_path, 'annotation_path': annotation_path, 'out_name': '{}.tfrecord'.format(index)}
    file_dictionary_list.append(temp_dict)

This should be a list of dictionaries, each holding the paths to an image and its annotation, as well as an out_name for the TFRecord.
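As a quick check of the naming logic above, the sketch below applies the same string splitting to a few made-up file names. The file names, the folder `example_folder`, and the helper `build_file_dictionary` are assumptions for illustration only; they are not part of the original code.

```python
import os


def build_file_dictionary(file_names, nifti_path):
    """Pair each 'Overall_Data*' image with its mask and an output record name."""
    file_dictionary_list = []
    for file in file_names:
        if not file.startswith('Overall_Data'):
            continue  # Skip masks and unrelated files
        # e.g. 'Overall_Data_Test_12.nii.gz' -> '12'
        index = file.split('_')[-1].split('.nii')[0]
        file_dictionary_list.append({
            'image_path': os.path.join(nifti_path, file),
            'annotation_path': os.path.join(nifti_path, 'Overall_mask_Test_y{}.nii.gz'.format(index)),
            'out_name': '{}.tfrecord'.format(index),
        })
    return file_dictionary_list


# Hypothetical directory listing, standing in for os.listdir(nifti_path)
names = ['Overall_Data_Test_12.nii.gz', 'Overall_mask_Test_y12.nii.gz', 'notes.txt']
result = build_file_dictionary(names, 'example_folder')
print(result[0]['out_name'])
```

Only the file that starts with `Overall_Data` produces an entry, so a mask without a matching image is silently ignored.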

In [None]:
example = file_dictionary_list[-1]
example

### We are going to handle everything through a series of image processors. The base class, ImageProcessor, is given below

In [None]:
class ImageProcessor(object):
    def pre_process(self, input_features):
        return input_features

    def post_process(self, input_features):
        return input_features

## pre_process should perform any encoding, such as normalization
## post_process should perform any decoding, and is normally reserved for ensuring the image dimensions go back to what they were previously
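To make the encode/decode pairing concrete, here is a minimal sketch of a subclass. `AddChannelAxis` is a hypothetical processor written for this example and is not taken from the Processors module; it adds a trailing channel axis in pre_process and removes it again in post_process.

```python
import numpy as np


class ImageProcessor(object):
    def pre_process(self, input_features):
        return input_features

    def post_process(self, input_features):
        return input_features


class AddChannelAxis(ImageProcessor):
    """Hypothetical processor: adds a trailing channel axis, then removes it again."""
    def __init__(self, image_keys=('image_array',)):
        self.image_keys = image_keys

    def pre_process(self, input_features):
        for key in self.image_keys:
            input_features[key] = input_features[key][..., None]
        return input_features

    def post_process(self, input_features):
        for key in self.image_keys:
            input_features[key] = np.squeeze(input_features[key], axis=-1)
        return input_features


features = {'image_array': np.zeros((4, 8, 8), dtype='float32')}
processor = AddChannelAxis()
features = processor.pre_process(features)
encoded_shape = features['image_array'].shape
features = processor.post_process(features)
decoded_shape = features['image_array'].shape
print(encoded_shape, decoded_shape)
```

Every processor takes the feature dictionary and returns it, which is what allows the processors to be chained in a simple loop.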

### The first thing we need to do is load the images from the NIfTI files

In [None]:
from Base_Deeplearning_Code.Image_Processors_Module.src.Processors import MakeTFRecordProcessors as Processors

### We want to do two things: First, take a set of paths and load the images. Second, convert those handles into arrays

In [None]:
processors = [
    Processors.LoadNifti(nifti_path_keys=('image_path', 'annotation_path'),  # Loads a file path as a SimpleITK Image
                         out_keys=('image_handle', 'annotation_handle')),
    Processors.SimpleITKImageToArray(nifti_keys=('image_handle', 'annotation_handle'),  # Converts an Image to array
                                      out_keys=('image_array', 'annotation_array'), dtypes=('float32', 'int8'))
]

### Run through each processor

In [None]:
for p in processors:
    example = p.pre_process(example)

### View the keys
Note that we now have several keys: path information, image handles, arrays, and spacing information. NumPy arrays do not carry spacing information, so these keys are added automatically when converting a NIfTI image to an array.

In [None]:
example.keys()

In [None]:
plot_Image_Scroll_Bar_Image(example['image_array'])

In [None]:
plot_Image_Scroll_Bar_Image(example['annotation_array'])

### Now that we have our image and annotation, let's perform some pre-processing
### We do not need all of the keys present, and SimpleITK image handles cannot be encoded

In [None]:
normalizing_processors = [
    Processors.DeleteKeys(keys_to_delete=('image_handle', 'annotation_handle')),
    Processors.Threshold_Images(image_keys=('image_array',), lower_bound=-100, upper_bound=200),
    Processors.Box_Images(image_keys=('image_array',), annotation_key='annotation_array', wanted_vals_for_bbox=(1,),
                          bounding_box_expansion=(0, 0, 0), power_val_z=1, power_val_r=512,
                          power_val_c=512, min_images=None, min_rows=None, min_cols=None)
]

In [None]:
for p in normalizing_processors:
    example = p.pre_process(example)
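The NumPy sketch below illustrates the idea behind the two array operations above, assuming that thresholding means clipping intensities to [lower_bound, upper_bound] and that boxing means cropping to the extent of the wanted annotation value. The helper names are hypothetical, and the sketch ignores the `bounding_box_expansion`, `power_val_*`, and `min_*` arguments, so it is not a reimplementation of Threshold_Images or Box_Images.

```python
import numpy as np


def clip_intensities(image, lower_bound, upper_bound):
    """Clamp every voxel into [lower_bound, upper_bound]."""
    return np.clip(image, lower_bound, upper_bound)


def bounding_box(annotation, wanted_val=1):
    """Return (start, stop) index pairs along each axis that enclose wanted_val.

    Assumes wanted_val occurs at least once in the annotation.
    """
    indices = np.argwhere(annotation == wanted_val)
    starts = indices.min(axis=0)
    stops = indices.max(axis=0) + 1
    return tuple((int(a), int(b)) for a, b in zip(starts, stops))


# Synthetic volume and mask standing in for example['image_array'] / ['annotation_array']
image = np.linspace(-1000, 1000, 10 * 32 * 32).reshape(10, 32, 32).astype('float32')
annotation = np.zeros((10, 32, 32), dtype='int8')
annotation[3:6, 10:20, 12:18] = 1

clipped = clip_intensities(image, -100, 200)
box = bounding_box(annotation)
(z0, z1), (r0, r1), (c0, c1) = box
cropped_image = clipped[z0:z1, r0:r1, c0:c1]
cropped_annotation = annotation[z0:z1, r0:r1, c0:c1]
print(box, cropped_image.shape)
```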

### Let's make this a 2D model generator, which means we now need to distribute these images into 2D slices

In [None]:
distribution_processors = [
    Processors.DistributeInTo2DSlices(image_keys=('image_array', 'annotation_array'))
]

In [None]:
for p in distribution_processors:
    example = p.pre_process(example)

In [None]:
plot_Image_Scroll_Bar_Image(example[20]['image_array'])
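Conceptually, the slicing step turns one dictionary holding 3D arrays into a collection of dictionaries holding 2D arrays. The sketch below shows one way this could work, assuming the first axis is the slice axis and that the result is a list. The helper name and the `slice_index` key are hypothetical, and the container returned by DistributeInTo2DSlices may differ.

```python
import numpy as np


def distribute_into_2d_slices(input_features, image_keys=('image_array', 'annotation_array')):
    """Split 3D arrays along the first axis; every slice keeps the non-array keys."""
    num_slices = input_features[image_keys[0]].shape[0]
    shared = {k: v for k, v in input_features.items() if k not in image_keys}
    slices = []
    for z in range(num_slices):
        slice_features = dict(shared)  # Copy the shared keys, such as out_name
        for key in image_keys:
            slice_features[key] = input_features[key][z]
        slice_features['slice_index'] = z  # Hypothetical bookkeeping key
        slices.append(slice_features)
    return slices


volume = {
    'image_array': np.zeros((5, 8, 8), dtype='float32'),
    'annotation_array': np.zeros((5, 8, 8), dtype='int8'),
    'out_name': '12.tfrecord',
}
slices = distribute_into_2d_slices(volume)
print(len(slices), slices[2]['image_array'].shape)
```

Keeping `out_name` on every slice lets a later writing step know which record file each slice belongs to.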

## Write dictionary as .tfrecord

In [None]:
from Base_Deeplearning_Code.Image_Processors_Module.src.Processors import TFRecordWriter as Writer

In [None]:
record_writer = Writer.RecordWriter(out_path=record_path, file_name_key='out_name', rewrite=True)

In [None]:
record_writer.write_records(example)

## Writing the records in parallel
There is no reason to process each of these NIfTI files one at a time. Instead, we have a function that runs all of the steps listed above in parallel. Just pass along our list of dictionaries.

In [None]:
max_records = 2  # Just write two out
Writer.parallel_record_writer(dictionary_list=file_dictionary_list, max_records=max_records,
                              image_processors=processors + normalizing_processors + distribution_processors,
                              recordwriter=record_writer, verbose=False)
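For intuition, the general pattern of running a processor chain over many dictionaries at once can be sketched with the standard library. This is not the implementation of `parallel_record_writer`: `AddOne`, `run_processors`, and `process_in_parallel` are hypothetical names, the sketch uses threads, it skips the record writing itself, and it treats `max_records` as a cap on the number of input dictionaries, which is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


class AddOne(object):
    """Hypothetical processor standing in for the image processors above."""
    def pre_process(self, input_features):
        input_features['value'] = input_features['value'] + 1
        return input_features


def run_processors(input_features, image_processors):
    """Apply each processor's pre_process in order."""
    for p in image_processors:
        input_features = p.pre_process(input_features)
    return input_features


def process_in_parallel(dictionary_list, image_processors, max_records=None, workers=4):
    """Run the processor chain on each dictionary using a thread pool."""
    items = dictionary_list if max_records is None else dictionary_list[:max_records]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order; dict(d) avoids mutating the inputs
        return list(pool.map(lambda d: run_processors(dict(d), image_processors), items))


inputs = [{'value': i, 'out_name': '{}.tfrecord'.format(i)} for i in range(5)]
outputs = process_in_parallel(inputs, [AddOne(), AddOne()], max_records=2)
print(outputs)
```

Each input dictionary is independent of the others, which is why the work can be split across workers without any coordination between them.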