![image](https://drive.google.com/u/0/uc?id=15DUc09hFGqR8qcpYiN1OajRNaASmiL6d&export=download)

# **Workshop No. 13 - ISIS4825**

## **Semantic Segmentation and Segmentation Architectures**

## **Contents**
1. [**Objectives**](#id1)
2. [**Problem**](#id2)
3. [**Importing the Libraries Required for the Lab**](#id3)
4. [**Visualization and Exploratory Analysis**](#id4)
5. [**Data Preparation**](#id5)
6. [**Modeling**](#id6)
    - [**Autoencoder**](#id6-1)
    - [**U-Net**](#id6-2)
7. [**Validation**](#id7)
8. [**Feature Extraction**](#id8)

## **Objectives**<a name="id1"></a>
- Become familiar with low-level training.
- Learn about segmentation architectures such as the Autoencoder and U-Net.
- Learn PyTorch as an alternative to TensorFlow.
- Visualize models using TensorBoard.
- Learn additional training patterns.
- Extract region features.

## **Problem**<a name="id2"></a>
- In a dataset of abdominal computed tomography (CT) scans, several patients have kidney tumors. The goal is to detect them using artificial intelligence.

## **Notebook Configuration**

In [None]:
!shred -u setup_colab.py
!shred -u setup_colab_general.py
!wget -q "https://github.com/jpcano1/python_utils/raw/main/setup_colab_general.py" -O setup_colab_general.py
!wget -q "https://github.com/jpcano1/python_utils/raw/main/ISIS_4825/setup_colab.py" -O setup_colab.py
import setup_colab as setup
setup.setup_workshop_13()

## **Importing the Libraries Required for the Lab**<a name="id3"></a>

In [None]:
# Basic Data Analysis Libraries
import numpy as np
import pandas as pd

# Basic OS Libraries
import copy
import os

# Basic Graphic Functions
import matplotlib.pyplot as plt
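# Matplotlib 3.6 renamed "seaborn-deep" to "seaborn-v0_8-deep"; use whichever exists
plt.style.use("seaborn-deep" if "seaborn-deep" in plt.style.available
              else "seaborn-v0_8-deep")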
import seaborn as sns

# Util Functions
from utils import general as gen
from utils import torch_utils
from utils import visualization_utils as vis
from utils import train_utils

# Loaders
from tqdm.auto import tqdm

# Data Augmentation Libraries
import albumentations as A

# PyTorch Libraries
import torch
from torch import nn
from torch import optim
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.nn import functional as F
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs/KiTS')
%load_ext tensorboard

# Torchvision Functions
from torchvision.transforms.functional import to_tensor

# Summary Functions
from torchsummary import summary

# Dataset Creation and Splitting Functions
from sklearn.model_selection import ShuffleSplit
from torch.utils.data import Dataset, DataLoader, Subset

# Computer Vision Libraries
import cv2

# Image Processing Libraries
from skimage import measure

# Scikit-Learn Libraries
from sklearn.cluster import KMeans

### **Data Loading**

In [None]:
train_dir = gen.create_and_verify(".", "data", "train_data", list_=True)

In [None]:
train_data_dir = gen.read_listdir(train_dir[0])
train_labels_dir = gen.read_listdir(train_dir[1])

## **Visualization and Exploratory Analysis**<a name="id4"></a>
- This time we work with the Kidney Tumor Segmentation (KiTS) dataset, where we detect kidney tumors using semantic segmentation.
- CT intensities are expressed on the Hounsfield scale:

![image](http://1.bp.blogspot.com/-apELBiPpN0g/UCa1sYUjT9I/AAAAAAAAABc/BcFCZE_eHbg/s400/4a11f2.jpg)

> Taken from http://modulotecguana.blogspot.com/2012/08/la-escala-de-hounsfield.html
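
On the Hounsfield scale water sits at 0 HU and air at about -1000 HU, so a CT slice is usually clipped to a window of interest before it is displayed or fed to a network. The sketch below illustrates that windowing step on synthetic values. The function name `window_hu` and the window (center 40, width 400, a commonly quoted soft-tissue setting) are assumptions for illustration; the `.npy` files used in this workshop may already be preprocessed, so this step may not be needed for them.

```python
import numpy as np

def window_hu(image_hu, center=40.0, width=400.0):
    """Clip a Hounsfield-unit image to a window and rescale it to [0, 1]."""
    low = center - width / 2.0   # lower bound of the window, in HU
    high = center + width / 2.0  # upper bound of the window, in HU
    clipped = np.clip(image_hu.astype(np.float32), low, high)
    return (clipped - low) / (high - low)

# Approximate reference values: air, fat, water, soft tissue, dense bone.
sample = np.array([-1000.0, -100.0, 0.0, 40.0, 1000.0])
windowed = window_hu(sample)
print(windowed)
```

Everything below the window collapses to 0 and everything above it to 1, which spends the full display range on the tissue of interest.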

In [None]:
np.random.seed(1999)
random_sample = np.random.choice(range(len(train_data_dir)), 3)
random_sample

In [None]:
plt.figure(figsize=(9, 9))

index = 1
for i in random_sample:
    path2img = train_data_dir[i]
    path2lab = train_labels_dir[i]
    
    X = np.load(path2img)
    y = np.load(path2lab)[..., 0]
    labeled_X = vis.get_labeled_image(X, y, (0, 1, 0), 
                                      (0, 1, 0), "thick")

    plt.subplot(3, 3, index)
    gen.imshow(X, color=False, cmap="bone")

    plt.subplot(3, 3, index+1)
    gen.imshow(y, color=False)

    plt.subplot(3, 3, index+2)
    gen.imshow(labeled_X)

    index += 3

In [None]:
plt.figure(figsize=(9, 9))

index = 1
for i in random_sample:
    path2img = train_data_dir[i]
    path2lab = train_labels_dir[i]
    
    X = np.load(path2img)
    y = np.load(path2lab)[..., 1]
    labeled_X = vis.get_labeled_image(X, y, (0, 0, 1), 
                                      (0, 0, 1), "thick")

    plt.subplot(3, 3, index)
    gen.imshow(X, color=False, cmap="bone")

    plt.subplot(3, 3, index+1)
    gen.imshow(y, color=False)

    plt.subplot(3, 3, index+2)
    gen.imshow(labeled_X)

    index += 3

## **Data Preparation**<a name="id5"></a>

In [None]:
transform_train = A.Compose([
    A.Resize(128, 128),
    A.RandomBrightness(p=0.8),
    A.RandomGamma(p=0.65),
    A.RandomContrast(p=0.7),
    A.VerticalFlip(p=0.7)
])

# Stage 2
# transform_train = A.Resize(128, 128)

transform_val = A.Resize(128, 128)

In [None]:
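class KidneyDataset(Dataset):
    """Loads image/mask pairs by index and applies an optional transform."""

    def __init__(self, path2data, transform=None, *args, **kwargs):
        # path2data[0] holds the images and path2data[1] the masks
        self.data_dir = gen.read_listdir(path2data[0])
        self.labels_dir = gen.read_listdir(path2data[1])

        self.transform = transform

    def __len__(self):
        return len(self.data_dir)

    def __getitem__(self, index):
        path2img = self.data_dir[index]
        path2lab = self.labels_dir[index]

        X = np.load(path2img)
        y = np.load(path2lab)

        if self.transform:
            # Albumentations applies the same spatial transform to image and mask
            augmented = self.transform(image=X, mask=y)
            X = augmented["image"]
            y = augmented["mask"]
        # to_tensor converts HWC arrays to CHW and scales uint8 input to [0, 1]
        X = to_tensor(X)
        # Multiplying by 255 undoes that scaling so the mask keeps its label
        # values (this assumes the masks are stored as uint8)
        y = 255. * to_tensor(y)
        return X, y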

In [None]:
kidney_ds1 = KidneyDataset(train_dir, transform=transform_train)
kidney_ds2 = KidneyDataset(train_dir, transform=transform_val)

In [None]:
X, y = kidney_ds1[259]

In [None]:
gen.imshow(X[0], color=False)

In [None]:
gen.imshow(y[1], color=False)

In [None]:
ss_data = ShuffleSplit(n_splits=2, test_size=0.2, random_state=1234)

In [None]:
indices = range(len(kidney_ds1))

In [None]:
# ss_data yields two splits (n_splits=2); the loop keeps the indices of the last one
for train_index, val_index in ss_data.split(indices):
    pass

In [None]:
train_data = Subset(kidney_ds1, train_index)
val_data = Subset(kidney_ds2, val_index)

In [None]:
train_dl = DataLoader(train_data, batch_size=32, shuffle=True)
val_dl = DataLoader(val_data, batch_size=16)

## **Modeling**<a name="id6"></a>

- In this lab we model two neural architectures for semantic segmentation.

![image](https://docs.google.com/uc?export=download&id=1XAMvojRsVynBrqwcYLZgwzOnaFvmP2vc)

![image](https://docs.google.com/uc?export=download&id=1p1aDB6jtU9MDs25PiIwoDUjvYezc2llR)

> Taken from: Atienza, R., 2020. *Advanced Deep Learning With Tensorflow 2 And Keras*. (513 pages) 2nd ed. United Kingdom: Packt Publishing.

### **Autoencoder**<a name="id6-1"></a>
![image](https://miro.medium.com/max/1000/0*uq2_ZipB9TqI9G_k)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pretrained = False

In [None]:
model = torch_utils.Autoencoder(1, 2, 16, 4, bn=True, jump=2)

model = model.to(device)

In [None]:
summary(model, (1, 128, 128))

In [None]:
writer.add_graph(model, torch.rand((2, 1, 128, 128)).to(device))
writer.close()

In [None]:
%tensorboard --logdir=runs

In [None]:
lr = 5e-3

if pretrained:
    weights_dir = gen.create_and_verify(".", "models", "autoencoder.pt")
    # map_location lets GPU-saved weights load on a CPU-only runtime
    model.load_state_dict(torch.load(weights_dir, map_location=device))
    lr = 8e-4
    print("Weights Loaded")

#### **Training**<a name="id6-1-1"></a>

In [None]:
opt = optim.Adam(model.parameters(), lr=lr)
lr_scheduler = ReduceLROnPlateau(opt, mode="min", factor=0.5,
                                 patience=6, verbose=1)
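
To make the scheduler settings above concrete, here is a pure-Python sketch of the plateau rule with `factor=0.5` and `patience=6`. It is a simplified illustration, not PyTorch's implementation: it assumes the documented defaults of a relative improvement threshold of `1e-4` and no cooldown, and the function name `simulate_plateau` is hypothetical. It also assumes the scheduler is stepped once per epoch with the validation loss, which presumably happens inside `train_utils.train`.

```python
def simulate_plateau(losses, lr=5e-3, factor=0.5, patience=6, threshold=1e-4):
    """Return the learning rate in effect after each epoch's validation loss."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best * (1.0 - threshold):
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # More than `patience` epochs without improvement: shrink the rate.
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history

# Two improving epochs followed by seven epochs stuck at the same loss.
history = simulate_plateau([1.0, 0.8] + [0.8] * 7)
print(history)
```

In this run the rate stays at `5e-3` until the seventh stagnant epoch, where it is halved.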

In [None]:
os.makedirs("models", exist_ok=True)

weights_dir = "models/"

args_train = (
    100, train_utils.loss_func, opt, train_dl, val_dl, False,
    lr_scheduler, weights_dir + "weights.pt", device
)

kwargs_train = {
    "metric": train_utils.jaccard,
    "best_loss": 0.1261334,
    "best_acc": 89.61
}

In [None]:
out_model, loss_history, acc_history = train_utils.train(model, *args_train,
                                                         **kwargs_train)
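
The training call above tracks `train_utils.jaccard` as its metric. That function's code is not shown in this notebook, so the following is only a sketch of the standard Jaccard index (intersection over union) on binary masks, written with NumPy. The name `jaccard_index` and the smoothing term `eps` are assumptions for illustration.

```python
import numpy as np

def jaccard_index(pred, target, eps=1e-7):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps avoids division by zero when both masks are empty.
    return (intersection + eps) / (union + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
score = jaccard_index(pred, target)
print(round(score, 3))
```

Here the masks share 2 pixels and together cover 4, so the score is about 0.5.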

In [None]:
from google.colab import drive
drive.mount('/gdrive')

In [None]:
%cd "/gdrive/My Drive/Datasets Limpios/KiTS/models"
!cp /content/models/weights.pt .
%cd /content

#### **Prediction**<a name="id6-1-2"></a>

In [None]:
weights_dir = gen.create_and_verify(".", "models", "autoencoder.pt")

model.load_state_dict(torch.load(weights_dir, map_location=device))
model = model.eval().to(device)

In [None]:
vis.predict(model, device, val_data, mode="inner", 
            random_state=2020)

In [None]:
vis.predict(model, device, val_data, class_="tumor", mode="thick", 
            random_state=2020)

### **U-Net**<a name="id6-2"></a>
![image](https://www.researchgate.net/profile/Olaf_Ronneberger/publication/276923248/figure/fig4/AS:639578838929408@1529498886425/U-net-architecture-example-for-32x32-pixels-in-the-lowest-resolution-Each-blue-box.png)
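
What distinguishes U-Net from a plain encoder-decoder is the skip connection: each decoder level upsamples its feature map and concatenates it, along the channel axis, with the encoder feature map of the same resolution. The NumPy sketch below shows only that shape bookkeeping. The tensor sizes are hypothetical, and nearest-neighbor repetition stands in for the learned up-convolution of the original architecture; how `torch_utils.UNet` implements this step is not shown in this notebook.

```python
import numpy as np

# Hypothetical feature maps in (batch, channels, height, width) layout.
encoder_feat = np.ones((1, 32, 32, 32), dtype=np.float32)
decoder_feat = np.arange(64 * 16 * 16, dtype=np.float32).reshape(1, 64, 16, 16)

# Nearest-neighbor 2x upsampling: repeat every row and every column twice.
upsampled = decoder_feat.repeat(2, axis=2).repeat(2, axis=3)

# Skip connection: stack encoder and decoder features along the channel axis.
merged = np.concatenate([encoder_feat, upsampled], axis=1)
print(upsampled.shape, merged.shape)
```

The merged tensor carries both the fine spatial detail from the encoder and the coarser context from the decoder, which is why the following convolution sees 32 + 64 input channels.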

In [None]:
model = torch_utils.UNet(1, 2, 16, 5, bn=True, jump=2)

model = model.to(device)

In [None]:
summary(model, (1, 128, 128))

In [None]:
writer.add_graph(model, torch.rand((2, 1, 128, 128)).to(device))
writer.close()

In [None]:
%tensorboard --logdir=runs

In [None]:
lr = 5e-3

if pretrained:
    weights_dir = gen.create_and_verify(".", "models", "unet.pt")
    # map_location lets GPU-saved weights load on a CPU-only runtime
    model.load_state_dict(torch.load(weights_dir, map_location=device))
    lr = 8e-4
    print("Weights Loaded")

#### **Training**<a name="id6-2-1"></a>

In [None]:
opt = optim.Adam(model.parameters(), lr=lr)
lr_scheduler = ReduceLROnPlateau(opt, mode="min", factor=0.5,
                                 patience=6, verbose=1)

In [None]:
os.makedirs("models", exist_ok=True)

weights_dir = "models/"

args_train = (
    100, train_utils.loss_func, opt, train_dl, val_dl, False,
    lr_scheduler, weights_dir + "weights.pt", device
)

kwargs_train = {
    "metric": train_utils.jaccard,
    "best_loss": 0.090871,
    "best_acc": 91.77
}

In [None]:
out_model, loss_history, acc_history = train_utils.train(model, *args_train,
                                                         **kwargs_train)

In [None]:
from google.colab import drive
drive.mount('/gdrive')

In [None]:
%cd "/gdrive/My Drive/Datasets Limpios/KiTS/models"
!cp /content/models/weights.pt .
%cd /content

#### **Prediction**<a name="id6-2-2"></a>

In [None]:
weights_dir = gen.create_and_verify(".", "models", "unet.pt")

model.load_state_dict(torch.load(weights_dir, map_location=device))
model = model.eval().to(device)

In [None]:
vis.predict(model, device, val_data, 
            random_state=2020)

In [None]:
vis.predict(model, device, val_data, class_="tumor", 
            random_state=2020)

## **Validation**<a name="id7"></a>

In [None]:
test_dir = gen.create_and_verify(".", "data", "test_data", list_=True)

In [None]:
test_ds = KidneyDataset(test_dir, transform=transform_val)
test_dl = DataLoader(test_ds, batch_size=16)

### **Autoencoder**<a name="id7-1"></a>

In [None]:
model = torch_utils.Autoencoder(1, 2, 16, 4, bn=True, jump=2)

weights_dir = gen.create_and_verify(".", "models", "autoencoder.pt")

model.load_state_dict(torch.load(weights_dir, map_location=device))
model = model.eval().to(device)

In [None]:
train_utils.evaluate(model, train_utils.loss_func, test_dl, device, 
                     sanity_check=False)

### **U-Net** <a name="id7-2"></a>

In [None]:
model = torch_utils.UNet(1, 2, 16, 5, bn=True, jump=2)

weights_dir = gen.create_and_verify(".", "models", "unet.pt")

model.load_state_dict(torch.load(weights_dir, map_location=device))
model = model.eval().to(device)

In [None]:
train_utils.evaluate(model, train_utils.loss_func, test_dl, device, 
                     sanity_check=False)

## **Feature Extraction**<a name="id8"></a>
- Extracting features is very useful here for deciding how to treat a kidney tumor.

In [None]:
test_generator = train_utils.SimpleGenerator(test_dir)

In [None]:
kidney_data = np.zeros((len(test_generator), 224, 224), dtype="uint8")
tumor_data = np.zeros((len(test_generator), 224, 224), dtype="uint8")

In [None]:
index = 0
for X, y in tqdm(test_generator):
    kidney_data[index] = X
    tumor_data[index] = y[..., 1]
    index += 1

In [None]:
random_choice = np.random.choice(len(test_generator), 100)

In [None]:
plt.figure(figsize=(20, 20))

index = 1
for i in random_choice:
    plt.subplot(10, 10, index)
    gen.imshow(kidney_data[i], color=False, 
               title=f"Index: {i}", cmap="bone")

    index += 1

In [None]:
plt.figure(figsize=(20, 20))

index = 1
for i in random_choice:
    plt.subplot(10, 10, index)
    gen.imshow(tumor_data[i], color=False, 
               title=f"Index: {i}")

    index += 1

In [None]:
segmentation_data = np.zeros((len(test_generator), 224, 224))

In [None]:
for i in range(kidney_data.shape[0]):
    segmentation_data[i] = kidney_data[i] * tumor_data[i]

In [None]:
data_dict = {
    "Area/Mass": [measure.moments(array, 0)[0, 0] for 
                  array in segmentation_data],
    "Rugosity": [vis.rugosity(array) for array in segmentation_data]
}
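
The "Area/Mass" descriptor above is the zeroth-order raw moment, which is simply the sum of all pixel values. On a binary mask that sum is the area in pixels; on an image multiplied by its mask, as in `segmentation_data`, it is an intensity-weighted mass. The NumPy sketch below spells the definition out on a toy example. The helper `raw_moment` is hypothetical and is not the scikit-image implementation.

```python
import numpy as np

def raw_moment(image, p, q):
    """Raw image moment M_pq = sum over (r, c) of r**p * c**q * I[r, c]."""
    rows, cols = np.indices(image.shape)
    return float(np.sum((rows ** p) * (cols ** q) * image))

mask = np.zeros((5, 5))
mask[1:4, 1:3] = 1                  # 3 x 2 block, so 6 foreground pixels
intensity = np.full((5, 5), 10.0)   # constant toy image

area = raw_moment(mask, 0, 0)               # pixel count of the mask
mass = raw_moment(intensity * mask, 0, 0)   # intensity-weighted sum
print(area, mass)
```

Higher-order moments follow the same formula; for example, `raw_moment(mask, 1, 0) / area` gives the row coordinate of the centroid.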

In [None]:
data = pd.DataFrame(data_dict)

In [None]:
plt.scatter(x=data["Area/Mass"], y=data["Rugosity"])
plt.xlabel("Area of Tumor")
plt.ylabel("Rugosity Coefficient")
plt.grid(linestyle="--")
plt.show()

In [None]:
# n_jobs was removed from KMeans in scikit-learn 1.0, so it is no longer passed
kmeans = KMeans(n_clusters=2, random_state=1234)

In [None]:
kmeans = kmeans.fit(data)

In [None]:
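# predict returns the cluster label of each sample, not the centroids
labels = kmeans.predict(data)

In [None]:
# Cluster ids are arbitrary: check which cluster holds the larger tumors
# before trusting the legend text below
plt.scatter(x=data["Area/Mass"][labels==0],
            y=data["Rugosity"][labels==0],
            label="Don't Remove Kidney")
plt.scatter(x=data["Area/Mass"][labels==1],
            y=data["Rugosity"][labels==1],
            label="Remove Kidney")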
plt.legend(loc="best")
plt.xlabel("Area of Tumor")
plt.ylabel("Rugosity Coefficient")
plt.grid(linestyle="--")
plt.show()

## **Asynchronous Work**
1. Based on the code already written for the autoencoder, vary the `bn` parameter and compare the results with those presented in class.
2. Again based on the code already written for U-Net, vary only the `jump` parameter and compare the results with those presented in class. In addition to the metric used in class, read about the [*Dice*](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) similarity coefficient, implement it, and report metrics with it.
3. Finally, compute two other descriptors from those covered in class on the test set, build another clustering model, and analyze the results obtained.
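
As a starting point for item 2, these are the standard set-based definitions of the Dice coefficient and the Jaccard index for a predicted mask $A$ and a ground-truth mask $B$, together with the identity that links them. They are given for reference only; how the class metric is actually computed inside `train_utils` is not shown here.

```latex
\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
J(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
\mathrm{Dice}(A, B) = \frac{2\,J(A, B)}{1 + J(A, B)}
```

The last identity follows from $|A \cup B| = |A| + |B| - |A \cap B|$, so Dice is never smaller than Jaccard on the same pair of masks.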