In [3]:
# Get Required Libraries
!pip install matplotlib
!pip install seaborn
!pip install nibabel

Collecting nibabel
  Downloading nibabel-3.0.2-py3-none-any.whl (3.3 MB)
Installing collected packages: nibabel
Successfully installed nibabel-3.0.2


# Introduction

While the model we loaded before does a good job, we should take a closer look at the problem itself. Let's take a histogram of the mask data and see how many voxels are part of the lesion and how many are not.

In [4]:
import numpy as np
import nibabel as nib

In [7]:
dataRoot='/tf/data/scans/'
cohort = 'c0001'
cohortRoot = dataRoot + cohort + "/"
subject = 'c0001s0004t01'
subjectRoot = cohortRoot + subject + "/"

maskFilePath = subjectRoot + subject + "_maskresampled.nii.gz"

In [10]:
mask_nib = nib.load(maskFilePath)

In [12]:
mask_np = mask_nib.get_fdata()

In [17]:
(unique, counts) = np.unique(mask_np, return_counts=True)
frequencies = np.asarray((unique, counts)).T

In [18]:
print(frequencies)

[[0.0000000e+00 1.6775852e+07]
 [2.5500000e+02 1.3640000e+03]]


In [27]:
percent_nonLesion = counts[0] / (counts[0] + counts[1]) * 100
percent_Lesion = counts[1] / (counts[0] + counts[1]) * 100
print("nonLesion Voxels: %.2f %% | Lesion Voxels: %.2f %% " % (percent_nonLesion, percent_Lesion))

nonLesion Voxels: 99.99 % | Lesion Voxels: 0.01 % 


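It looks like if our network guessed that every voxel was non-lesion, it would be right more than 99.9% of the time on this scan. This isn't ideal: a high accuracy score tells us almost nothing about whether the lesion was actually found. In other words, this is an unbalanced dataset.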
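To make this concrete, here is a minimal sketch that compares plain accuracy with the Dice overlap score for a "predict non-lesion everywhere" baseline. It uses a small synthetic mask rather than the scan loaded above, and the names `mask`, `pred`, `accuracy`, and `dice` are only for illustration. Dice is swapped in here as one common overlap measure; it is not necessarily the metric our model was trained with.

```python
import numpy as np

# Synthetic stand-in for a binary lesion mask (NOT the scan loaded above):
# a 64x64x64 volume containing a small 4x4x4 "lesion".
mask = np.zeros((64, 64, 64), dtype=bool)
mask[30:34, 30:34, 30:34] = True

# Trivial baseline: predict "non-lesion" for every voxel.
pred = np.zeros_like(mask)

# Fraction of voxels where the prediction matches the mask.
accuracy = np.mean(pred == mask)

# Dice overlap between prediction and mask.
# The small epsilon only guards against division by zero.
intersection = np.logical_and(pred, mask).sum()
dice = 2.0 * intersection / (pred.sum() + mask.sum() + 1e-8)

print("accuracy: %.4f | dice: %.4f" % (accuracy, dice))
```

The baseline scores almost perfectly on accuracy while its overlap with the lesion is zero, which is why accuracy alone is a poor guide on data this skewed.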

How could we improve this? Instead of accurately outlining the lesion, we could simply detect whether a patch of the image contains only healthy brain tissue or some unhealthy tissue. Then we could "highlight" an area of interest for a doctor to take a look at. This idea of detecting outliers falls under a separate domain of deep learning called Anomaly Detection. Furthermore, we could use the highlighted areas to build a dataset that contains a more balanced distribution of healthy and lesion tissue. In theory, we could then use this technique to augment our dataset and improve training.
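One way the patch idea could be sketched is shown below. This is only an illustration under some assumptions: the helper `label_patches`, the patch size of 16, and the toy volume are all hypothetical, and the volume is cropped so that each axis divides evenly into patches. The only thing taken from the notebook is that lesion voxels are non-zero (255) in the mask.

```python
import numpy as np

def label_patches(mask, patch_size=16):
    """Split a 3D mask into non-overlapping cubic patches and return a
    boolean array that is True for patches containing any lesion voxel.

    Hypothetical helper for illustration; patch_size is an assumed value.
    """
    # Crop so every axis divides evenly by patch_size (a simplification).
    dims = [(s // patch_size) * patch_size for s in mask.shape]
    cropped = mask[:dims[0], :dims[1], :dims[2]] > 0

    # Reshape so each patch gets its own (patch_size)^3 block of axes.
    nx, ny, nz = (d // patch_size for d in dims)
    blocks = cropped.reshape(nx, patch_size, ny, patch_size, nz, patch_size)

    # A patch is "lesion" if any voxel inside it is non-zero.
    return blocks.any(axis=(1, 3, 5))

# Toy volume standing in for a real mask; lesion voxels are 255 as above.
toy_mask = np.zeros((64, 64, 64))
toy_mask[30:34, 30:34, 30:34] = 255

patch_labels = label_patches(toy_mask)
print("lesion patches: %d of %d" % (patch_labels.sum(), patch_labels.size))
```

On this toy volume the small lesion straddles a patch boundary on every axis, so 8 of the 64 patches are flagged. That is still unbalanced, but far less so than at the voxel level, and the flagged patches could be sampled more often when building a training set.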