<h2> Notebook Contents </h2>

This is a little notebook with code for implementing a `torch` DataLoader when the data is spread across multiple files, each with its own number of targets. The full dataset is too large to load into a notebook at once, which is why we load only one batch of it at a time.

The inspiration for the notebook came from [this](https://www.kaggle.com/c/indoor-location-navigation) beautiful challenge and the problem of dealing with more than 55 GB of data, as you can see [here](https://www.kaggle.com/c/indoor-location-navigation/data).

<div id="toc_container" style="background: #f9f9f9; border: 1px solid #aaa; display: table; font-size: 95%;
                               margin-bottom: 1em; padding: 20px; width: auto;">
<p class="toc_title" style="font-weight: 700; text-align: center">Notebook Contents</p>
<ul class="toc_list">
  <li><a href="#background">0. Some Background</a></li>
  <li><a href="#dataloader">1. DataLoader from Multiple Files (in progress)</a></li>
  <li><a href="#example">2. Example on Floor Images</a></li>
    <ul>
        <li><a href="#image_info">2.0. Info on Images size</a></li>
        <li><a href="#example_images">2.1. Example Images</a></li>
    </ul>
  <li><a href="#unet">3. Unet Implementation on Floor Images (in progress)</a></li>
    <ul>
        <li><a href="#model_train">3.0. Model Train</a></li>
        <li><a href="#unet_predictions">3.1. Unet Predictions</a></li>
    </ul>
</ul>
</div>

<h3> Props </h3>

**All my work** was born out of [this](https://medium.com/speechmatics/how-to-build-a-streaming-dataloader-with-pytorch-a66dd891d9dd) wonderful article by David MacLeod, please clap it! 

Props also to [this](https://amaarora.github.io/2020/09/13/unet.html) for the Unet implementation, clearly explained.

Let me thank also my friend [Brasnold](https://www.kaggle.com/brasnold) who's always full of bright machine learning ideas.

<h6> Edit </h6>

I'll probably restrict this notebook to the Unet application on floor images and then create another one with a deep dive on Torch datasets and dataloaders.

<a id = "background"></a>
<h4> 0. Some Background </h4>
    
In this notebook I'll use some of the core classes of PyTorch, in particular:

- `Dataset`
- `IterableDataset`
- `DataLoader`
    
I think that `torch` documentation is all you need to read: have a look [here](https://pytorch.org/docs/stable/data.html) to quickly understand what all of the above are. 
    
Other than the article mentioned in the Props section I suggest also [this](https://medium.com/swlh/how-to-use-pytorch-dataloaders-to-work-with-enormously-large-text-files-bbd672e955a0) one which stresses how we need different classes to deal with crazy large files. 
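
To make the difference between the two dataset flavours concrete, here is a minimal sketch in plain Python. `MapStyle`, `IterableStyle` and `batched` are hypothetical stand-ins written for illustration, not the `torch` classes: they only mimic the protocols (`__getitem__` plus `__len__` for a map-style dataset, `__iter__` for an iterable one) and the batching that a DataLoader performs on top of them.

```python
from itertools import islice

class MapStyle:
    """Map-style protocol: random access through __getitem__ and __len__."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # A real dataset would read the file here; we only tag the path.
        return f"loaded:{self.paths[idx]}"

class IterableStyle:
    """Iterable-style protocol: sequential access through __iter__ only."""
    def __init__(self, paths):
        self.paths = paths

    def __iter__(self):
        for path in self.paths:
            yield f"loaded:{path}"

def batched(stream, batch_size):
    """Group any iterable into lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

paths = ["a.png", "b.png", "c.png", "d.png", "e.png"]
map_ds = MapStyle(paths)

# Map-style: items are fetched by index. Iterable-style: items are streamed.
map_batches = list(batched((map_ds[i] for i in range(len(map_ds))), 2))
iter_batches = list(batched(IterableStyle(paths), 2))
print(map_batches)
```

Both routes produce the same batches here. The practical difference is that the map-style version needs to know the length and supports indexing (so it can be shuffled), while the iterable version only needs to be able to produce the next item, which suits very large or streamed files.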

<a id = "dataloader"></a>
<h4> 1. DataLoader from Multiple Files </h4>

Let's get into it quick. 

To be done in general

<a id = "example"></a>

<h4> 2. Example on Floor Images from the Indoor Location and Navigation Kaggle Challenge </h4>

What I'll show you here is how you can use the DataLoader to load just a batch of images. 

I'll use the floor images of the Indoor Location and Navigation Kaggle Challenge.

<img src = "https://i.imgur.com/TSiP6rA.png" width = "30%"></img>

Each site/building has its own floors and each floor has some files linked to it, including the floor image.
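
As a small sketch of that layout, the site and floor can be read straight from the path of a floor image, assuming the `metadata/<site>/<floor>/floor_image.png` structure used later in the notebook. The site and floor names below (`site_A`, `F1`) are made up for illustration, and `site_and_floor` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def site_and_floor(floor_image_path):
    """Return (site, floor) from a path ending in <site>/<floor>/floor_image.png."""
    parts = PurePosixPath(floor_image_path).parts
    return parts[-3], parts[-2]

# Hypothetical path following the assumed folder structure
example_path = "/kaggle/input/indoor-location-navigation/metadata/site_A/F1/floor_image.png"
site, floor = site_and_floor(example_path)
print(site, floor)
```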

In [None]:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 50
pd.options.display.max_colwidth  = 200
import os
import multiprocessing
from multiprocessing import Pool
import time

import colorama
from colorama import Fore, Back, Style

y_ = Fore.YELLOW
r_ = Fore.RED
g_ = Fore.GREEN
b_ = Fore.BLUE
m_ = Fore.MAGENTA
c_ = Fore.CYAN
sr_ = Style.RESET_ALL

import torch
from itertools import chain, islice, cycle
from torch.utils.data import Dataset, IterableDataset, DataLoader

def get_device():
    if torch.cuda.is_available():
        device = 'cuda:0'
    else:
        device = 'cpu'
    return device

DEVICE = get_device()

models_path = os.path.join(os.getcwd(), "models")

import json
import re
from sys import getsizeof
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.max_open_warning': 0})
plt.style.use('fivethirtyeight')
import warnings # Suppress warnings 
warnings.filterwarnings('ignore')

import cv2
import glob
import tqdm

root_path = '/kaggle/input/indoor-location-navigation'
metadata_path = '/kaggle/input/indoor-location-navigation/metadata'

BINARY_THRESHOLD = 30
N_CPUS = multiprocessing.cpu_count()
N_CPUS

In [None]:
floor_images_paths = glob.glob(metadata_path+"/*/*/floor_image.png")

N_floors = len(floor_images_paths)
N_floors

train_paths = np.random.choice(floor_images_paths, size = int(0.7*N_floors), replace = False)
N_train = len(train_paths)
val_paths = np.random.choice(list(set(floor_images_paths)-set(train_paths)), size = int(0.5*(N_floors-N_train)), replace = False)
test_paths = list(set(floor_images_paths)-set(val_paths)-set(train_paths))

print(N_train, len(val_paths), len(test_paths))

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
def read_image(image):
    return plt.imread(image).shape

def read_info(info):
    return pd.read_json(info).transpose()

shapes = []

# Guard against a zero chunk size when there are fewer images than CPUs
chunk_size = max(1, len(floor_images_paths) // N_CPUS)

with Pool(N_CPUS) as p:
    for chunk in tqdm.tqdm(chunks(floor_images_paths, chunk_size)):
        shapes += p.map(read_image, chunk)

<a id = "image_info"></a>

In [None]:
df_sizes = pd.DataFrame(shapes, columns = ['height', 'width', 'channels'])
print("{} a sample of images sizes:".format(b_))
display(df_sizes.sample(3))
MAX_HEIGHT = int(df_sizes.height.max())
MAX_WIDTH = int(df_sizes.width.max())
MAX_CHANNELS = int(df_sizes.channels.max())
print("{}Each image has its own height. All images have 800 width and 4 channels (png)".format(b_))
print("{}Number of floors images: {}".format(b_,len(df_sizes)))

<a id = "example_images"></a>
<h6> Let's see some images </h6>

In [None]:
random_images = np.random.choice(np.arange(len(floor_images_paths)), 6).tolist()

def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])

fig, axes = plt.subplots(3, 2, figsize = (40, 40))
ax = axes.ravel()

heights = []
widths = []

for j, image_path in enumerate(random_images):
    
    floor_image = plt.imread(floor_images_paths[image_path])
    site, path = floor_images_paths[image_path].split("/")[-3:-1]
    shape = floor_image.shape
    heights.append(shape[0])
    widths.append(shape[1])
    
    floor_gray = rgb2gray(floor_image[:, :, :3])

    ax[j].imshow(floor_image)
    ax[j].set_title('-'.join([site, path, str(floor_image.shape[0]), str(floor_image.shape[1])]), fontdict={'fontsize': 15, 'fontweight': 'medium'})
    
fig.suptitle("Some Floors images", fontsize = 30, fontweight = 'medium')

Each image has its own size. What I'll implement is a DataLoader that reads each image, resizes it so that all images have the same shape, and yields the corresponding tensor.
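
The idea of bringing differently sized images to one common shape can be sketched with NumPy alone. The `resize_nearest` helper below is a hypothetical nearest-neighbour stand-in for `cv2.resize`, written only to keep the example dependency-light; it is not the interpolation OpenCV uses by default. One detail worth remembering with the real function: `cv2.resize` takes its target size as `(width, height)`, the opposite order of a NumPy shape.

```python
import numpy as np

def resize_nearest(image, out_height, out_width):
    """Nearest-neighbour resize of a (height, width) array to (out_height, out_width)."""
    in_height, in_width = image.shape[:2]
    # For every output row/column, pick the source row/column it maps back to
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    return image[rows[:, None], cols]

# Two toy "floor images" with different shapes
tall = np.arange(6 * 4).reshape(6, 4)
wide = np.arange(2 * 8).reshape(2, 8)

# After resizing, the images can be stacked into a single batch array
batch = np.stack([resize_nearest(img, 4, 4) for img in (tall, wide)])
print(batch.shape)
```

Once every image has the same shape, stacking them into a batch is straightforward, which is what the DataLoader's default collation relies on.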

<h5> Custom Dataset </h5>

In [None]:
import torch
from itertools import chain, islice, cycle
from torch.utils.data import Dataset, IterableDataset, DataLoader

class MultiFilesDataset(Dataset):
    
    def __init__(self, data_list, proper_shape = (3336, 800, 4)):
        
        self.data_list = data_list
        self.proper_shape = proper_shape
        
    def read_and_resize(self, image_path, mask = True):
        
        # cv2.resize expects dsize as (width, height), while proper_shape is (height, width, channels)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        image = np.expand_dims(cv2.resize(image, (self.proper_shape[1], self.proper_shape[0])), 2)
        
        if mask:
            mask = cv2.threshold(image, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)[1][:, :]
            mask = np.where(mask==255, 1, mask) 
            return (torch.from_numpy(image).permute(2, 0, 1)), (torch.from_numpy(np.expand_dims(mask, 2)).permute(2, 0, 1))
        else:
            return torch.from_numpy(image).permute(2, 0, 1)
            
    def __len__(self):
        return (len(self.data_list))
    
    def __getitem__(self, idx):
        
        return self.read_and_resize(self.data_list[idx])
    
custom_dataset = MultiFilesDataset(floor_images_paths[:10], proper_shape = (256, 256, 1))
data_loader = DataLoader(custom_dataset, batch_size = 3)

for batch in data_loader:
    print(batch[0].size(), batch[1].size())

<h5> Custom Iterable Dataset </h5>

In [None]:
import torch
from itertools import chain, islice, cycle
from torch.utils.data import Dataset, IterableDataset, DataLoader

class MultiFilesIterableDataset(IterableDataset):
    
    def __init__(self, data_list, proper_shape = (3336, 800, 4)):
        
        self.data_list = data_list
        self.proper_shape = proper_shape
        
    def read_and_resize(self, image_path, mask = True):
        
        # cv2.resize expects dsize as (width, height), while proper_shape is (height, width, channels)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        image = np.expand_dims(cv2.resize(image, (self.proper_shape[1], self.proper_shape[0])), 2)
        
        if mask:
            mask = cv2.threshold(image, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)[1][:, :]
            mask = np.where(mask==255, 1, mask) 
            yield (torch.from_numpy(image).permute(2, 0, 1)), (torch.from_numpy(np.expand_dims(mask, 2)).permute(2, 0, 1))
        else:
            yield torch.from_numpy(image).permute(2, 0, 1)
            
    def get_stream(self, data_list):
        return chain.from_iterable(map(lambda x: self.read_and_resize(x), data_list))
        
    def __iter__(self):
        return self.get_stream(self.data_list)
    
iterable_dataset = MultiFilesIterableDataset(floor_images_paths[:10], proper_shape = (256, 256, 1))
data_loader = DataLoader(iterable_dataset, batch_size = 3)

for batch in data_loader:
    print(batch[0].size(), batch[1].size())

<a id = "unet"></a>
<h4> 3. Unet Implementation on Floor Images </h4>

Unet is a neural network architecture well suited to segmentation tasks. *Segmentation* is the task of classifying each pixel in an image.  

Look [here](https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47#:~:text=The%20UNET%20was%20developed%20by,The%20architecture%20contains%20two%20paths.&text=Thus%20it%20is%20an%20end,accept%20image%20of%20any%20size.) for a Unet explanation. 

<img src = "https://www.researchgate.net/profile/Alan-Jackson-2/publication/323597886/figure/fig2/AS:601386504957959@1520393124691/Convolutional-neural-network-CNN-architecture-based-on-UNET-Ronneberger-et-al.png" width = "550px" height = "100px" margin-left="100px"></img>


In this section I'll provide code to train a Unet to segment floor images.
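
Before reading the model code, it can help to trace how the feature maps change size through the network. The sketch below is plain arithmetic, not the model itself, and `unet_shapes` is a hypothetical helper. It assumes what the model definition in this notebook uses: 3x3 convolutions with padding 1 (which keep the spatial size), 2x2 max pooling (which halves it) and 2x2 transposed convolutions with stride 2 (which double it).

```python
def unet_shapes(input_size, enc_chs, dec_chs):
    """Trace channels and side length of the feature maps of a Unet."""
    size = input_size
    encoder = []
    for out_ch in enc_chs[1:]:
        encoder.append((out_ch, size))  # block output, kept as a skip connection
        size //= 2                      # max pooling halves each side

    # The decoder starts from the deepest encoder feature map
    _, size = encoder[-1]
    skips = encoder[::-1][1:]
    decoder = []
    for (skip_ch, skip_size), out_ch in zip(skips, dec_chs[1:]):
        size *= 2                       # transposed convolution doubles each side
        concat_ch = out_ch + skip_ch    # channels after concatenating the skip
        decoder.append((concat_ch, out_ch, size))
    return encoder, decoder

encoder, decoder = unet_shapes(256, (1, 16, 32, 64, 128, 256), (256, 128, 64, 32, 16))
for concat_ch, out_ch, size in decoder:
    print(f"concat channels: {concat_ch:3d} -> block output: {out_ch:3d} at {size}x{size}")
```

With a 256x256 input the last decoder block returns a 256x256 map, so the prediction has the same spatial size as the input and no cropping of the skip connections is actually needed.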


**Disclaimer:** since this is a walkthrough, I will create artificial floor masks by taking the binary thresholded image. 
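
As a minimal sketch of what such an artificial mask looks like, the thresholding can be written with NumPy alone. The `threshold_mask` helper is hypothetical; it mirrors the rule of a binary threshold, where pixels strictly above the threshold become 1 and all others become 0. The pixel values below are made up.

```python
import numpy as np

BINARY_THRESHOLD = 30

def threshold_mask(gray_image, threshold=BINARY_THRESHOLD):
    """Return a 0/1 mask: 1 where the pixel is strictly brighter than threshold."""
    return (gray_image > threshold).astype(np.uint8)

# Toy grayscale image
gray = np.array([[0, 30, 31],
                 [200, 12, 255]], dtype=np.uint8)
mask = threshold_mask(gray)
print(mask)
```

A pixel exactly equal to the threshold stays 0, so with a threshold of 30 only values from 31 upwards are marked as foreground.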


<a id = "model_def"></a>
<h6> Model definition </h6>

In [None]:
from torch import nn
import torchvision

class Block(nn.Module):
    def __init__(self, in_ch, out_ch, ks = 3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, ks, padding = 1)
        self.relu  = nn.ReLU()
        self.conv2 = nn.Conv2d(out_ch, out_ch, ks, padding = 1)
    
    def forward(self, x):
        return self.relu(self.conv2(self.relu(self.conv1(x))))

class Encoder(nn.Module):
    def __init__(self, chs=(3, 64, 128, 256, 512, 1024)):
        super().__init__()
        self.enc_blocks = nn.ModuleList([Block(chs[i], chs[i+1]) for i in range(len(chs)-1)])
        self.pool       = nn.MaxPool2d(2)
    
    def forward(self, x):
        ftrs = []
        for block in self.enc_blocks:
            x = block(x)
            ftrs.append(x)
            x = self.pool(x)
        return ftrs

class Decoder(nn.Module):
    def __init__(self, chs=(1024, 512, 256, 128, 64)):
        super().__init__()
        self.chs         = chs
        self.upconvs    = nn.ModuleList([nn.ConvTranspose2d(chs[i], chs[i+1], 2, 2) for i in range(len(chs)-1)])
        self.dec_blocks = nn.ModuleList([Block(chs[i], chs[i+1]) for i in range(len(chs)-1)]) 
        
    def forward(self, x, encoder_features):
        for i in range(len(self.chs)-1):
            x        = self.upconvs[i](x)
            enc_ftrs = self.crop(encoder_features[i], x)
            x        = torch.cat([x, enc_ftrs], dim=1)
            x        = self.dec_blocks[i](x)
        return x
    
    def crop(self, enc_ftrs, x):
        _, _, H, W = x.shape
        enc_ftrs   = torchvision.transforms.CenterCrop([H, W])(enc_ftrs)
        return enc_ftrs    
    
class UNet(nn.Module):
    def __init__(self, enc_chs=(3, 64, 128, 256, 512, 1024), dec_chs=(1024, 512, 256, 128, 64), num_class=1, retain_dim=True, out_sz=(256,256)):
        super().__init__()
        self.encoder     = Encoder(enc_chs)
        self.decoder     = Decoder(dec_chs)
        self.head        = nn.Conv2d(dec_chs[-1], num_class, 1)
        self.retain_dim  = retain_dim
        self.out_sz = out_sz

    def forward(self, x):
        enc_ftrs = self.encoder(x)
        out      = self.decoder(enc_ftrs[::-1][0], enc_ftrs[::-1][1:])
        out      = self.head(out)

        return out
 

In [None]:
PROPER_SHAPE = (256, 256, 1)
model = UNet(enc_chs = (1, 16, 32, 64, 128, 256), dec_chs = (256, 128, 64, 32, 16), out_sz = PROPER_SHAPE[:2])
#Loss function
criterion = nn.BCELoss()
#Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
model.to(DEVICE)

<a id ="model_train"></a>
<h6> Model Train</h6>

In [None]:
SKIP_TRAIN = True #change to False if you wish to train the network
EARLY_STOPPING_STEPS = 5
N_EPOCHS = 20
if SKIP_TRAIN:
    N_EPOCHS = 0
BATCH_SIZE = 32

train_dataset = MultiFilesDataset(train_paths, proper_shape = PROPER_SHAPE)
val_dataset = MultiFilesDataset(val_paths, proper_shape = PROPER_SHAPE)
test_dataset = MultiFilesDataset(test_paths, proper_shape = PROPER_SHAPE)

train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE)
val_dataloader = DataLoader(val_dataset, batch_size = len(val_paths))
test_dataloader = DataLoader(test_dataset, batch_size = len(test_paths))

In [None]:
early_stopping_steps = EARLY_STOPPING_STEPS
EARLY_STOP = True  # set to False to disable early stopping
early_step = 0
best_loss = np.inf
os.makedirs(models_path, exist_ok = True)  # torch.save does not create missing folders

def compute_loss(outputs, mask):
    # BCELoss expects probabilities, so the raw outputs go through a sigmoid first
    if isinstance(criterion, torch.nn.MSELoss):
        return criterion(outputs, mask)
    return criterion(torch.sigmoid(outputs), mask)

for epoch in range(1, N_EPOCHS+1):
    # monitor training loss
    train_loss = 0.0
    n_samples = 0
    model.train()
    # Training
    for images, mask in tqdm.tqdm(train_dataloader):
        images = images.to(DEVICE, dtype=torch.float32)
        mask = mask.to(DEVICE, dtype=torch.float32)
        optimizer.zero_grad()
        loss = compute_loss(model(images), mask)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*images.size(0)
        n_samples += images.size(0)

    # average loss per training sample
    train_loss = train_loss/n_samples

    # Validation: the loader yields one batch holding the whole validation set
    model.eval()
    with torch.no_grad():
        for images, mask in val_dataloader:
            images = images.to(DEVICE, dtype=torch.float32)
            mask = mask.to(DEVICE, dtype=torch.float32)
            val_loss = compute_loss(model(images), mask).item()
            print('Epoch: {} \tValidation Loss: {:.6f}'.format(epoch, val_loss))

    if val_loss < best_loss:
        best_loss = val_loss
        early_step = 0  # reset the patience counter after an improvement
        torch.save(model.state_dict(), os.path.join(models_path, 'unet_grayscale_{}_{}_{}'.format(BINARY_THRESHOLD, PROPER_SHAPE, PROPER_SHAPE)))
    elif EARLY_STOP:
        early_step += 1
        if early_step >= early_stopping_steps:
            break

    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))

In [None]:
#LOADING TRAINED MODEL
trained_model = UNet(enc_chs = (1, 16, 32, 64, 128, 256), dec_chs = (256, 128, 64, 32, 16), out_sz = PROPER_SHAPE[:2])
trained_model.load_state_dict(torch.load("../input/unet-trained/unet_grayscale_30", map_location = DEVICE))
trained_model.to(DEVICE)

<a id = "unet_predictions"></a>
<h6>Unet Predictions </h6>

In [None]:
predictions_example_paths = np.random.choice(test_paths, size =3, replace = False)

fig, axes = plt.subplots(3, 3, figsize = (20, 12))
ax = axes.ravel()


for j, path in enumerate(predictions_example_paths): 
    image, mask = test_dataset.read_and_resize(path)
    image_color = plt.imread(path)
    
    ax[3*j].imshow(image_color)
    ax[3*j].set_title('original floor image', fontdict={'fontsize': 10, 'fontweight': 'medium'})
    ax[3*j+1].imshow(mask[0, :, :], cmap = 'gray')
    ax[3*j+1].set_title('binary thresholded image', fontdict={'fontsize': 10, 'fontweight': 'medium'})
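    trained_model.eval()
    with torch.no_grad():
        # add the batch dimension, apply the sigmoid used in training, then move to the CPU for plotting
        out = torch.sigmoid(trained_model(image.unsqueeze(0).to(DEVICE, dtype=torch.float32))).cpu().numpy()
    ax[3*j+2].imshow(out[0, 0, :, :], cmap = 'gray')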
    ax[3*j+2].set_title('unet prediction', fontdict={'fontsize': 10, 'fontweight': 'medium'})

I hope you found the notebook useful! I will tidy it up in the coming days!