<a href="https://colab.research.google.com/github/retico/cmepda_medphys/blob/master/L10_code/Lecture10_autoenc.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Convolutional autoencoder for mass segmentation

## Reading data from Google Drive

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

In [0]:
!unzip -q /content/gdrive/My\ Drive/cmepda_medphys_dataset/IMAGES/Mammography_masses/large_sample_Im_segmented_ref.zip -d /content/


## Overview of the dataset

In [0]:
import os
import PIL.Image  # importing the package alone does not guarantee PIL.Image is loaded

In [0]:
dataset_path = "/content/large_sample_Im_segmented_ref"

The dataset contains two kinds of images: files ending in `_resized` are the mass images, and files ending in `_mass_mask` are the corresponding segmentation masks.
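Since each image is matched to its mask purely by file name, it can be worth checking that every `_resized` file has a `_mass_mask` partner before loading anything. The following is a minimal sketch: `split_pairs` is a hypothetical helper, not part of the original notebook, the second file name in the example is made up, and the `.pgm` naming convention is assumed to hold for the whole dataset.

```python
import os

def split_pairs(fnames, x_id="_resized", y_id="_mass_mask", ext=".pgm"):
    """Return (paired, unpaired) lists of image file names.

    Hypothetical helper: an image counts as paired when the same name,
    with x_id swapped for y_id, is also present in fnames.
    """
    names = set(os.path.basename(f) for f in fnames)
    images = sorted(n for n in names if n.endswith(x_id + ext))
    paired = [n for n in images if n.replace(x_id, y_id) in names]
    unpaired = [n for n in images if n.replace(x_id, y_id) not in names]
    return paired, unpaired

# Example with file names that follow the dataset convention;
# the second image is invented and deliberately has no mask.
example = [
    "0001p1_1_1_2_resized.pgm",
    "0001p1_1_1_2_mass_mask.pgm",
    "0002p1_1_1_1_resized.pgm",
]
paired, unpaired = split_pairs(example)
```

On the real data the same check could be run as `split_pairs(os.listdir(dataset_path))`.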

In [0]:
!ls /content/large_sample_Im_segmented_ref | head -n 4

In [0]:
PIL.Image.open(os.path.join(dataset_path, "0001p1_1_1_2_resized.pgm"))



In [0]:
PIL.Image.open(os.path.join(dataset_path, "0001p1_1_1_2_mass_mask.pgm"))

## Reading the images in memory

In [0]:
import glob
import math

import numpy as np
import matplotlib.pyplot as plt

from skimage.io import imread

In [0]:
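def read_dataset(dataset_path, x_id="_resized", y_id="_mass_mask"):
    """Load every image and its mask as float32 arrays with a channel axis."""
    fnames = glob.glob(os.path.join(dataset_path, f"*{x_id}.pgm"))
    X = []
    Y = []
    for fname in fnames:
        # Drop the first row and column and add a trailing channel axis.
        X.append(imread(fname)[1:, 1:, np.newaxis])
        # The mask shares the image's name, with x_id swapped for y_id.
        Y.append(imread(fname.replace(x_id, y_id))[1:, 1:, np.newaxis])
    return np.array(X, dtype='float32'), np.array(Y, dtype='float32')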

In [0]:
X,Y = read_dataset(dataset_path)


In [0]:
X /= 255
Y /= 255

In [0]:
X.min(), X.max()

In [0]:
X.shape, Y.shape

## Defining the model

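We now define a convolutional autoencoder: an encoder made of strided convolutions that downsample the image, followed by a decoder made of transposed convolutions that bring it back to the input resolution. The following figure is just an example of a possible architecture.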
![Convolutional autoencoder](http://indexsmart.mirasmart.com/ISMRM2017/PDFfiles/images/8249/ISMRM2017-008249_Fig1.png)

In [0]:
from keras.layers import Conv2D, Conv2DTranspose, Input
from keras.models import Model, load_model

In [0]:
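def make_model(shape=(124, 124, 1)):
    input_tensor = Input(shape=shape)
    # Encoder: each strided convolution halves the spatial size (124 -> 62 -> 31 -> 16).
    x = Conv2D(32, (5, 5), strides=2, padding='same', activation='relu')(input_tensor)
    x = Conv2D(64, (3, 3), strides=2, padding='same', activation='relu')(x)
    x = Conv2D(128, (3, 3), strides=2, padding='same', activation='relu')(x)

    # Decoder: each transposed convolution doubles the spatial size (16 -> 32 -> 64 -> 128).
    x = Conv2DTranspose(64, (3, 3), strides=2, padding='same', activation='relu')(x)
    x = Conv2DTranspose(32, (3, 3), strides=2, padding='same', activation='relu')(x)
    x = Conv2DTranspose(32, (3, 3), strides=2, padding='same', activation='relu')(x)
    # A 'valid' 5x5 convolution trims 128 back to 124. Sigmoid keeps the output
    # in [0, 1], as binary cross-entropy expects (tanh would allow negative values).
    out = Conv2D(1, (5, 5), padding='valid', activation='sigmoid')(x)
    model = Model(input_tensor, out)

    return model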

In [0]:
model = make_model()
model.summary()
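The spatial sizes reported by the summary can be checked by hand. The sketch below assumes the usual Keras rules: a `'same'` convolution with stride `s` gives `ceil(n / s)`, a `'same'` transposed convolution with stride `s` gives `n * s`, and a `'valid'` convolution with kernel `k` and stride 1 gives `n - k + 1`. The helper names are hypothetical and used only for this illustration.

```python
import math

def conv_same(size, stride):
    # Conv2D with padding='same': output is ceil(size / stride).
    return math.ceil(size / stride)

def deconv_same(size, stride):
    # Conv2DTranspose with padding='same': output is size * stride.
    return size * stride

def conv_valid(size, kernel):
    # Conv2D with padding='valid' and stride 1: output is size - kernel + 1.
    return size - kernel + 1

size = 124
sizes = [size]
for _ in range(3):          # three strided convolutions (encoder)
    size = conv_same(size, 2)
    sizes.append(size)
for _ in range(3):          # three transposed convolutions (decoder)
    size = deconv_same(size, 2)
    sizes.append(size)
size = conv_valid(size, 5)  # final 5x5 'valid' convolution
sizes.append(size)

print(sizes)  # [124, 62, 31, 16, 32, 64, 128, 124]
```

The odd intermediate size (31 rounds up to 16) is why the decoder overshoots to 128, and the final `'valid'` 5x5 convolution is what brings the output back to the 124x124 input size.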

In [0]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['MAE'])

In [0]:
model.fit(X,Y, epochs=250)

In [0]:
!cp /content/gdrive/My\ Drive/cmepda_medphys_dataset/autoenc_mammo_mass.h5 /content/

In [0]:
model = load_model("/content/autoenc_mammo_mass.h5")

In [0]:
idx=75
xtest = X[idx][np.newaxis,...]
ytest = Y[idx][np.newaxis,...]

plt.figure(figsize=(14,4))
plt.subplot(1,3,3)
plt.imshow(model.predict(xtest).squeeze()>0.3)
plt.subplot(1,3,2)
plt.imshow(ytest.squeeze())
plt.subplot(1,3,1)
plt.imshow(xtest.squeeze())
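The visual comparison above can be complemented with a number. One common choice for segmentation is the Dice coefficient, 2|A∩B| / (|A| + |B|), computed between the thresholded prediction and the reference mask. This is only a sketch: `dice_score` is a hypothetical helper that the original notebook does not use, and the tiny arrays below are invented to make the arithmetic easy to follow.

```python
import numpy as np

def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks (1.0 means perfect overlap)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / total

# Invented 2x2 example: the prediction is thresholded at 0.3 as in the plot above.
pred = np.array([[0.1, 0.8], [0.6, 0.2]]) > 0.3   # two pixels above threshold
true = np.array([[0, 1], [0, 0]])                 # one pixel in the reference mask
score = dice_score(pred, true)                    # 2 * 1 / (2 + 1)
```

With the notebook's variables, a call along the lines of `dice_score(model.predict(xtest).squeeze() > 0.3, ytest.squeeze())` would score the image shown above.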

## Out-of-memory dataset

Quite often we deal with datasets which cannot be fully loaded into memory.

What is the best strategy to achieve high performance when loading data from disk?

There is no universal answer to this question.

In the following lines we will briefly discuss the usage of keras sequences.
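The core idea of a sequence is that only the file names are kept in memory, and each batch is read from disk when it is requested by index. The index arithmetic behind this can be sketched in plain Python. `batch_slices` is a hypothetical helper used only for illustration, and the numbers (20 items, batches of 8) are arbitrary.

```python
import math

def batch_slices(n_items, batch_size):
    """Return the (start, stop) index pair covered by each batch."""
    # The number of batches is rounded up so the last, partial batch is kept.
    n_batches = math.ceil(n_items / batch_size)
    return [
        (idx * batch_size, min((idx + 1) * batch_size, n_items))
        for idx in range(n_batches)
    ]

slices = batch_slices(20, 8)
print(slices)  # [(0, 8), (8, 16), (16, 20)]
```

A list slice past the end is simply truncated in Python, so a sequence can compute `start` and `stop` without clamping and still get a shorter final batch.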

In [0]:
from keras.utils import Sequence

class MassesSequence(Sequence):
    """Yield (image, mask) batches that are read from disk on demand."""

    def __init__(self, dataset_path, batch_size, x_id="_resized", y_id="_mass_mask"):
        super().__init__()
        # Only file names are stored; pixels are read batch by batch.
        self.y = glob.glob(os.path.join(dataset_path, f"*{y_id}.pgm"))
        self.x = [fname.replace(y_id, x_id) for fname in self.y]
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last one may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def _process(self, img_fname):
        # Drop the first row and column, add a channel axis, divide by 255.
        img = imread(img_fname)[1:, 1:]
        return img[:, :, np.newaxis] / 255

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = (idx + 1) * self.batch_size
        X = [self._process(fname) for fname in self.x[start:stop]]
        Y = [self._process(fname) for fname in self.y[start:stop]]
        return np.array(X, dtype='float32'), np.array(Y, dtype='float32')

In [0]:
mass_gen = MassesSequence(dataset_path, 8)

In [0]:
batch = mass_gen[7]
batch[0].shape

In [0]:
plt.imshow(np.squeeze(batch[0][3]))
plt.figure()
plt.imshow(np.squeeze(batch[1][3]))

In [0]:
model = make_model()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['MAE'])
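# Recent Keras versions accept a Sequence directly in fit (fit_generator is deprecated);
# without steps_per_epoch, one epoch covers len(mass_gen) batches.
model.fit(mass_gen, epochs=10)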