Created: 2020.07.24

Modified: 2020.08.05

# Consecutive steps in data analysis - short guide

## 0. Copy data from /data-nas/brains to Titan

**Bash script:** 0.1_copy_data_to_Titan.sh

The following databases are considered:
1. ADNI - (7 921 examinations)
1. AIBL - (726)
1. IXI  - (581)
1. PPMI - (752)
1. SALD - (493)
1. SLIM - (1 036)
    - In total: 11 509 exams * 3 = 34 527 3D images

1. HCP-WU-Minn - (1 782)
    

For all images in the FSL_outputs folders, only three sets of images with anisotropic voxel size were copied to the local folder **/data-10tb/marek/**:

- T1.nii.gz
- T1_biascorr.nii.gz
- T1_biascorr_mask.nii.gz
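A minimal Python sketch of this selective copy. It assumes that every examination is a subfolder of the source directory holding the three files; the function name `copy_selected` and its arguments are illustrative only, and the actual copy is done by the bash script.

```python
import shutil
from pathlib import Path

# The three files kept from every examination folder (names from the list above).
WANTED = ("T1.nii.gz", "T1_biascorr.nii.gz", "T1_biascorr_mask.nii.gz")


def copy_selected(src_root, dst_root, wanted=WANTED):
    """Copy only the wanted files, preserving the examination subfolders.

    Returns the list of destination paths that were written.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for exam_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        for name in wanted:
            src = exam_dir / name
            if not src.is_file():
                continue  # skip examinations that lack this file
            dst = dst_root / exam_dir.name / name
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```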



## 1.0 Info about 3D anisotropic images

**Notebooks:**
1. 1.0_ADNI_FSL_aniso_info.ipynb
1. 1.0_AIBL_FSL_aniso_info.ipynb
1. 1.0_IXI_FSL_aniso_info.ipynb
1. 1.0_PPMI_FSL_aniso_info.ipynb
1. 1.0_SALD_FSL_aniso_info.ipynb
1. 1.0_SLIM_FSL_aniso_info.ipynb
1. 1.0_HCP-WU-Minn_FSL_aniso_info.ipynb

The goal of this part is to check the diversity of voxel size, image dimensions and brightness range for individual images in every database, and then to compare the databases with each other. For each image type {T1, T1_biascorr, T1_biascorr_mask} two tables are created:
1. subject info (based on image path):
    - Examination,
    - Subject_id,
    - Image_id,
    - Modality,
    - Image,
    - Path
1. image info (based on metadata and image itself):
    - Examination,
    - Image,
    - Max (brightness),
    - Mean (brightness),
    - Min (brightness),
    - Dtype,
    - Size_1 (voxel count in dim 1),
    - Size_2 (voxel count in dim 2),
    - Size_3 (voxel count in dim 3),
    - Pixdim_1 (voxel size in dim 1),
    - Pixdim_2 (voxel size in dim 2),
    - Pixdim_3 (voxel size in dim 3).
    
    

All this information is stored in an Excel file with 6 tabs: three for path and name info {t1, t1_biascorr, t1_mask} and three for image info {t1_im, t1_bias_im, t1_mask_im}. Each database has its own Excel file (folder **1.0_results**).
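One row of the image-info table could be computed roughly as in the sketch below. The function names are hypothetical, and reading the file with `nibabel` is an assumption; the notebooks may use a different reader.

```python
import numpy as np


def image_info(data, pixdim):
    """Build one row of the image-info table from a voxel array and voxel sizes."""
    data = np.asarray(data)
    row = {
        "Max": float(data.max()),
        "Mean": float(data.mean()),
        "Min": float(data.min()),
        "Dtype": str(data.dtype),
    }
    # One Size_i / Pixdim_i pair per spatial dimension.
    for i, (size, pix) in enumerate(zip(data.shape, pixdim), start=1):
        row[f"Size_{i}"] = int(size)
        row[f"Pixdim_{i}"] = float(pix)
    return row


def image_info_from_nifti(path):
    """Read a NIfTI file with nibabel (assumed to be installed) and summarize it."""
    import nibabel as nib  # imported lazily so image_info works without nibabel

    img = nib.load(str(path))
    data = np.asanyarray(img.dataobj)  # voxel array as provided by nibabel
    return image_info(data, img.header.get_zooms()[:3])
```

A list of such rows can then be turned into a table and written to one Excel tab per image type.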

## 1.1 A naive statistical inspection of images from databases (max, mean values)

**Notebook:** 1.1_ALL_aniso_compare.ipynb

# 2 Conversion to isotropic voxel size and splitting 3D images into 2D slices

## 2.1 Image conversion from anisotropic to isotropic voxel size (c3d software)

**Bash script:** 2.1_c3d_convert_to_iso.sh


**Result folder:** 2.1_resample_jobs


Within the bash script a list of jobs is created, then those jobs are run in parallel from the local computer.
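The job list described above could be generated as in this sketch. The target spacing `1x1x1mm` and the helper names are assumptions made for illustration; the real commands are in the bash script and in the **2.1_resample_jobs** folder.

```python
from pathlib import Path


def make_resample_jobs(inputs, out_dir, spacing="1x1x1mm", interpolation="Linear"):
    """Return one c3d command line per input image.

    The spacing value is an assumption of this sketch, not taken from the script.
    """
    jobs = []
    for src in map(Path, inputs):
        # Strip the double extension so the suffix goes before ".nii.gz".
        stem = src.name[: -len(".nii.gz")] if src.name.endswith(".nii.gz") else src.stem
        dst = Path(out_dir) / f"{stem}_iso.nii.gz"
        jobs.append(
            f"c3d {src.as_posix()} -interpolation {interpolation} "
            f"-resample-mm {spacing} -o {dst.as_posix()}"
        )
    return jobs


def write_job_file(jobs, path):
    """Write the commands to a text file, one job per line."""
    Path(path).write_text("\n".join(jobs) + "\n")
```

A file written this way can be executed line by line with a tool such as GNU parallel (`parallel < jobs.txt`); which tool the bash script actually uses is not stated here.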


Isotropic images are stored in a folder **/data-10tb/shared/skull/train-3d-iso/** with subfolders:

In [2]:
! tree -d -L 1 /data-10tb/shared/skull/train-3d-iso/

/data-10tb/shared/skull/train-3d-iso/
├── ADNI
├── AIBL
├── HCP
├── IXI
├── PPMI
├── SALD
└── SLIM

7 directories


1. ADNI (23 762 files)
1. AIBL (2 173 files)
1. IXI (1 743 files)
1. PPMI (2 250 files)
1. SALD (1 472 files)
1. SLIM (3 103 files)
1. HCP (5 344 files)

### 2.1.1 <s>Mask binarization with c3d software</s>


**Bash script: 2.1.1_c3d_maks_binarization**

Voxel brightness interpolation during resampling of mask images results in voxel brightness values between 0 and 1. To make an image binary, a built-in c3d function is used, and the data type is changed from float to short:

<center>c3d in.nii.gz -binarize -type short -o out.nii.gz</center>


#### Remarks:

1. Output images have two values: 0 (background) and 2 (brain region). Those values are not accepted by the *fastai* library, so they are corrected to 0 and 1 when a 3D image is split into a set of 2D slices.

1. I am not sure which interpolation algorithm is used during resampling - linear or bicubic? There is an option to force bicubic interpolation, but I haven't applied it. TO DO!?

1. Some voxels on the brain mask edge are interpolated, then binarized (th>0) and classified as brain. This could enlarge the mask region. Next time apply th=0.5? TO DO!?

1. Maybe the mask should be computed for isotropic images? It is computed for anisotropic images now. Would it give a better result? TO CHECK!?
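The effect of the threshold mentioned in the remarks can be illustrated with a toy 1D profile across the mask. This is only a sketch with made-up data: it uses NumPy's linear interpolation as a stand-in for the c3d resampling.

```python
import numpy as np

# A 1D profile across the mask: 1 inside the brain, 0 outside.
mask = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], dtype=float)

# Linear interpolation onto a four times finer grid gives fractional edge values.
x_old = np.arange(mask.size)
x_new = np.arange(0, mask.size - 1 + 0.25, 0.25)
resampled = np.interp(x_new, x_old, mask)

grown = resampled > 0        # any non-zero voxel counts as brain
balanced = resampled >= 0.5  # only voxels that are at least half brain

print("voxels with th>0:   ", int(grown.sum()))
print("voxels with th>=0.5:", int(balanced.sum()))
```

With th>0 every partially covered edge voxel is kept, so the binarized mask is larger than with th=0.5.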

### 2.1.2 Rename of existing mask images

All existing images were prepared with the default linear interpolation and then binarized. To keep them on disk, we decided to add a *_lin* suffix to the existing filenames (e.g. /IXI256-HH-1723-T1.anat_T1_biascorr_brain_mask_iso.nii.gz ---->>> IXI256-HH-1723-T1.anat_T1_biascorr_brain_mask_iso_lin.nii.gz)

### 2.1.3 Nearest neighbour interpolation of mask images

**Input images:** anisotropic mask images (e.g. /data-10tb/marek/IXI/train_data/FSL_outputs/IXI002-Guys-0828-T1.anat/T1_biascorr_brain_mask.nii.gz)

**Output images:** isotropic, NN interpolation, names end with the \*\_nn.nii.gz suffix (e.g. /data-10tb/shared/skull/train-3d-iso/IXI/T1_biascorr_brain_mask_nn.nii.gz)

For each data set {IXI, AIBL, ADNI, PPMI, SALD, SLIM, HCP} a file with jobs to run in parallel, {DATA}\_resample_NN_jobx.txt, is created and run from the local computer (dell PRECISION-7540); then those txt files are copied to the **2.1_resample_jobs** folder.

### 2.1.4 3D isotropic images info

**Notebook:** 2.1.4_get_3D_iso_resolution.ipynb

Get image shape information about 3D iso images in each brain database. The csv file for each database is created with the following information (folder **2.1_resample_jobs**):

1. file name
1. shape
1. shape id (number of different shapes in this database)
1. shape cnt (number of images with this shape)
1. max_dims (maximum dimension in each direction from all images in a database).
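The columns listed above could be derived from the image shapes as in the following sketch. Here "shape id" is read as an index of the distinct shape (most frequent first), which is an interpretation, and the function name is hypothetical.

```python
from collections import Counter


def shape_table(shapes):
    """shapes: dict mapping file name -> (nx, ny, nz) tuple.

    Returns (rows, max_dims); each row holds file name, shape, shape id and shape cnt.
    """
    counts = Counter(shapes.values())
    # Give every distinct shape an id, most frequent shape first.
    shape_ids = {shape: i for i, (shape, _) in enumerate(counts.most_common())}
    rows = [
        {"file": name, "shape": shape,
         "shape_id": shape_ids[shape], "shape_cnt": counts[shape]}
        for name, shape in sorted(shapes.items())
    ]
    # Largest extent along each axis over all images in the database.
    max_dims = tuple(max(dim) for dim in zip(*shapes.values()))
    return rows, max_dims
```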

## 2.2  Split data between train and valid sets (Sathiesh)

#### New set 2020.08.07

**Notebook: 2.2_train_val_mk_iso_table.ipynb**

Sathiesh split the data between training and validation sets based on the *age balance* of patients. Here Sathiesh's paths to the images are updated to the paths on my computer. Additionally, paths to the mask and biascorr files were added.

**folder:** 2.2_train_valid_test_sets

**file:** train_val_mk_3d.csv

## 2.3 Split 3D isotropic images into set of 2D png slices

Based on the file *train_val_mk_3d.csv* (from point 2.2) all images {t1, t1_biascorr, t1_biascorr_mask} are split into 2D slices and saved to png files. Image brightness is stored as float (in the range between 0 and 1) and masks are converted to uint8 with brain voxels set to 1.

To save png files as uint8 with one channel, the *imageio* library is used for masks:
 <center>io.imsave(path_png, np.where(im3d[:,:,k]>=1, 1, 0).astype(np.uint8))</center>

To save png files as float (0 to 1) with one channel, the *matplotlib.pyplot* library is used for images:
 <center>plt.imsave(path_png, im3d[:,:,k], cmap='gray')</center>
            
Images are saved in 3 main anatomical planes:
1. axial
1. coronal
1. sagittal
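Slicing a volume along the three planes could be sketched as below. The mapping from plane name to array axis is an assumption (it depends on the image orientation), and the helper names are illustrative.

```python
import numpy as np

# Assumed axis order for this sketch; it depends on the image orientation.
PLANE_AXIS = {"sagittal": 0, "coronal": 1, "axial": 2}


def iter_slices(im3d, plane):
    """Yield (index, 2D slice) pairs along the axis assigned to the given plane."""
    axis = PLANE_AXIS[plane]
    for k in range(im3d.shape[axis]):
        yield k, np.take(im3d, k, axis=axis)


def to_binary_uint8(mask_slice):
    """Map mask values >= 1 to 1 and everything else to 0, as uint8."""
    return np.where(mask_slice >= 1, 1, 0).astype(np.uint8)
```

Each yielded slice can then be written with the `io.imsave` or `plt.imsave` call shown above.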

To reduce number of files in each folder the output images were saved in subfolders, as follows:

In [61]:
!tree -d -L 1 /data-10tb/shared/skull/

/data-10tb/shared/skull/
├── axial-2d
├── coronal-2d
├── sagittal-2d
└── train-3d-iso

4 directories


In [62]:
!tree -d -L 2 /data-10tb/shared/skull/axial-2d/

/data-10tb/shared/skull/axial-2d/
├── models
├── test
│   ├── AIBL
│   └── HCP
├── train
│   ├── ADNI
│   ├── IXI
│   ├── PPMI
│   ├── SALD
│   └── SLIM
└── valid
    ├── ADNI
    ├── IXI
    ├── PPMI
    ├── SALD
    └── SLIM

16 directories


In [63]:
!tree -d -L 1 /data-10tb/shared/skull/axial-2d/train/

/data-10tb/shared/skull/axial-2d/train/
├── ADNI
├── IXI
├── PPMI
├── SALD
└── SLIM

5 directories


All images {t1, t1_biascorr, t1_biascorr_mask} are saved in each examination subfolder, e.g. **/data-10tb/shared/skull/axial-2d/train/IXI/IXI002-Guys-0828-T1.anat/**

## 2.4 df for 2D images

Based on the df that splits the 3D isotropic images between *train* and *valid* sets, a corresponding df needs to be built for all 2D png slices in the **axial, coronal** and **sagittal** cross-sections. The df's contain the paths needed to create a DataBunch in the fastai library. The df's are saved as csv files in the folder **2.4_train_val_3d_path_tables**.
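A path table of this kind could be assembled as in the sketch below. The file-name prefixes (in particular the mask prefix), the column names and the function names are assumptions; the real tables are built in the notebook from *train_val_mk_3d.csv*.

```python
import csv
from pathlib import Path


def build_slice_table(plane_root, image_prefix="t1_biascorr_iso",
                      mask_prefix="t1_biascorr_mask_iso"):
    """Pair every image slice with the mask slice that has the same index.

    Expects plane_root/{train,valid}/DATABASE/EXAMINATION/<prefix>_<index>.png.
    """
    rows = []
    for split in ("train", "valid"):
        split_dir = Path(plane_root) / split
        if not split_dir.is_dir():
            continue
        for img in sorted(split_dir.glob(f"*/*/{image_prefix}_*.png")):
            index = img.stem.rsplit("_", 1)[1]
            mask = img.with_name(f"{mask_prefix}_{index}.png")
            if mask.is_file():  # keep only slices that have a matching mask
                rows.append({"image": str(img), "mask": str(mask),
                             "is_valid": split == "valid"})
    return rows


def save_table(rows, csv_path):
    """Write the rows to a csv file with a header line."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["image", "mask", "is_valid"])
        writer.writeheader()
        writer.writerows(rows)
```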

The size of those csv files is over 150 MB, so I couldn't upload them to a (free) GitHub repository; here are the links to the files:

1. [bias_mask_test_val_axial_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/bias_mask-test-val-axial-2d.csv) - paths to t1_biascorr & t1_biascorr_mask images in axial cross-section
1. [bias_mask_test_val_coronal_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/bias_mask-test-val-coronal-2d.csv) - paths to t1_biascorr & t1_biascorr_mask images in coronal cross-section
1. [bias_mask_test_val_sagittal_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/bias_mask-test-val-sagittal-2d.csv) - paths to t1_biascorr & t1_biascorr_mask images in sagittal cross-section

1. [t1_mask_test_val_axial_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/t1_mask-test-val-axial-2d.csv) - paths to t1 & t1_biascorr_mask images in axial cross-section
1. [t1_mask_test_val_coronal_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/t1_mask-test-val-coronal-2d.csv) - paths to t1 & t1_biascorr_mask images in coronal cross-section
1. [t1_mask_test_val_sagittal_2d.csv](https://129.177.233.24:8888/lab/tree/2.4_train_val_3d_path_tables/t1_mask-test-val-sagittal-2d.csv) - paths to t1 & t1_biascorr_mask images in sagittal cross-section


The total number of 2D files {t1, t1_biascorr, t1_biascorr_mask} in each cross-section is as follows:

- axial ------> 1 277 298 files
- coronal ----> 1 808 145 files
- sagittal ---> 1 374 648 files
- -------------------------------
- total ------> 4 460 091

# 3 Training a NN with fastai library

Possible training options (to choose):
 
 1. training: pretrained = True/False,
 1. axial/sagittal/coronal on t1 OR t1_biascorr,
 1. table
 
 | training | inference |
|-|-|
| axial  | axial|
| coronal | coronal |
| sagittal | sagittal |
| ax + cor + sag | ax OR cor OR sag |

### 3.01 2D axial, t1_biascorr

- resnet34 pre-trained model
- train+val: all 2D axial images ({ADNI + IXI + PPMI + SALD + SLIM} = 425766 png files)
- one epoch time : 24-27 min. (max 1h)
- test sets: {aibl, hcp}

### 3.02 2D axial, t1

- resnet34 pre-trained model
- train+val: all 2D axial images ({ADNI + IXI + PPMI + SALD + SLIM} = 425766 png files)
- one epoch time : 24-27 min. (max 1h)
- test sets: {aibl, hcp}

### 3.03 coronal, t1_biascorr

- resnet34 pre-trained model
- train+val: all 2D coronal images ({ADNI + IXI + PPMI + SALD + SLIM} = 425766 png files)
- one epoch time : 34-72 min.
- test sets: {aibl, hcp}


# 4 Test sets preparation: AIBL & HCP and reference functions, data,... (in progress)

This section mainly contains function templates for the algorithms described in 5.xx (testing) and 3.xx (training).

## 4.1 Conversion of test sets to 2D images

**Notebook:** 4.1_split_3d_train_iso_to_2d_slices.ipynb

The t1 & t1_biascorr images were split into sets of 2D png images. Those images will act as the test set for inference with the models trained in section 3.xx, and the results will be saved in the corresponding section in 5.xx. The structure of the folder tree is shown in the table below:

| axial / coronal / sagittal  | train / valid / test | AIBL / HCP| subfolder | files |
| :-: | :-: | :-: | :-: | :-: |
| axial-2d | test | AIBL | examination | t1_iso_001.png |
|           |      |      | examination | t1_biascorr_iso_001.png |
|           |      | HCP  |  examination | t1_iso_001.png |
|           |      |      |  examination | t1_biascorr_iso_001.png |
| coronal-2d | test | AIBL | examination | t1_iso_001.png |
|           |      |      | examination | t1_biascorr_iso_001.png |
|           |      | HCP  |  examination | t1_iso_001.png |
|           |      |      |  examination | t1_biascorr_iso_001.png |
| sagittal-2d | test | AIBL | examination | t1_iso_001.png |
|           |      |      | examination | t1_biascorr_iso_001.png |
|           |      | HCP  |  examination | t1_iso_001.png |
|           |      |      |  examination | t1_biascorr_iso_001.png |

## 4.2 Get reference image mask size (redundant)

**Notebook:** 4.2_reference_3d_mask_size.ipynb

Counts the number of "white" voxels in reference images (e.g. FSL_outputs). Results are saved in the folder **4.0_reference** with appropriate names, e.g.:
1. AIBL_FSL_masks.csv
1. HCP_FSL_masks.csv

I've changed the structure of this part of the algorithm, so it seems this step is not needed any more :((((

# 5 Results - NN models from 3.xx vs. FSL_outputs (in progress)

To do in 5.xx:
1. run inference with a model on 2D images,
1. save as 3D nifti image,
1. calculate Dice/Jaccard coeff's and save in csv file.
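For two binary masks A (prediction) and B (reference), the Dice coefficient is 2|A∩B| / (|A| + |B|) and the Jaccard coefficient is |A∩B| / |A∪B|. A minimal NumPy sketch of these steps follows; the function names and the convention for two empty masks are choices made for this example, not taken from the notebooks.

```python
import numpy as np


def dice_jaccard(pred, ref):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0, 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total, intersection / union


def stack_slices(slices, axis=2):
    """Stack equally sized 2D predictions back into a 3D volume along one axis."""
    return np.stack(list(slices), axis=axis)
```

The stacked volume can be compared against the reference FSL mask with `dice_jaccard`, and the two values written to the csv file per examination.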

### 5.01 Inference (model 3.01)

- axial slices,
- training set: {ADNI, IXI, PPMI, SLIM, SALD},
- test sets: {AIBL, HCP}
- output folders:
    - images: /data-10tb/shared/skull/predictions/3.01/{AIBL, HCP}
    - csv files with Dice coef's (github repo): ~/fastai/5.0/3.01
- runs sequentially, so it is really slow :(
 

# 6 Questions / issues

1. Calculating one epoch on the Titan server takes about 24 minutes (up to 1h).  Is it possible to get an account on the second, more powerful server?
1. Is it OK to store all images in the folder /data-10tb/shared, which is open to all Titan users? Or should I move those images to /data-10tb/marek/ with access limited to our team only?
1. I store all my notebook files on a free GitHub account. I hope this is OK with regard to e.g. the ADNI licence. There are no images, only paths (to them on Titan) and some information about geometry, brightness...