## Introduction

### In this notebook we perform land cover classification from satellite imagery on the [DeepGlobe Land Cover Classification Dataset](https://www.kaggle.com/balraj98/deepglobe-land-cover-classification-dataset) using the fastai DataBlock API

We begin by importing the requisite libraries.

In [None]:
from fastai.vision.all import *
from fastai.data.all import *
from pathlib import Path

The structure of this dataset is a bit unusual. It has 'train', 'test' and 'valid' folders, but only the 'train' folder has the '.png' masks. I think the idea is to predict masks for the '.jpg' images in the 'test' and 'valid' folders, but for the purpose of this notebook I use only the 'train' folder and split it into training and validation sets. The 'train' folder has 1606 image files: 803 are '.jpg' and 803 are '.png'. The '.jpg' files are the satellite images and the '.png' files are the masks, which assign each pixel to a class.

In [None]:
path = Path('../input/deepglobe-land-cover-classification-dataset/train')
path.ls()

The class_dict.csv file has the RGB values for each class of the masks. The values are as follows:

In [None]:
df1 = pd.read_csv('../input/deepglobe-land-cover-classification-dataset/class_dict.csv')
codes = df1['name']

codes=array(codes, dtype=str)

df1

The fastai DataBlock API has a PILMask.create method which prepares the masks for segmentation. This method opens the 3-channel '.png' image file using the 'L' mode of the [PIL module](https://pillow.readthedocs.io/en/4.1.x/reference/Image.html). This mode computes the luminance from the RGB values and maps it onto a 1-channel image. The 'pixel_value' column of the dataframe below reproduces this calculation.

In [None]:
df1['pixel_value'] =  round(df1['r'] * 299/1000 + df1['g'] * 587/1000 + df1['b'] * 114/1000,0).astype(int, copy=False)
df1.sort_values(by='pixel_value')
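
As a quick sanity check of the formula, the sketch below applies it in plain Python to seven class colors. The RGB triples in `assumed_colors` are an assumption (they are what I expect class_dict.csv to contain, not values read from the file), and `luminance` is a hypothetical helper name. If the file differs, the resulting values will differ too.

```python
# Assumed class colors (name -> (r, g, b)); verify against class_dict.csv.
assumed_colors = {
    "urban_land": (0, 255, 255),
    "agriculture_land": (255, 255, 0),
    "rangeland": (255, 0, 255),
    "forest_land": (0, 255, 0),
    "water": (0, 0, 255),
    "barren_land": (255, 255, 255),
    "unknown": (0, 0, 0),
}

def luminance(r, g, b):
    """Weighted sum used for the 'L' mode: 0.299 R + 0.587 G + 0.114 B, rounded."""
    return round((r * 299 + g * 587 + b * 114) / 1000)

luma = {name: luminance(*rgb) for name, rgb in assumed_colors.items()}
sorted_values = sorted(luma.values())
print(sorted_values)  # [0, 29, 105, 150, 179, 226, 255]
```

Under these assumptions the seven classes get seven distinct gray levels, so no two classes collapse onto the same pixel value after the conversion.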

As far as I understand from [this notebook](https://colab.research.google.com/github/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/07_Binary_Segmentation.ipynb#scrollTo=9SuB5y-DIZuT), fastai does not work with non-consecutive values for the segmentation classes, so these values have to be mapped onto a list of corresponding consecutive values. For this, I created a dictionary 'p2d' as follows:

In [None]:
vals = [0,29,105,150,179,226,255]
p2d = dict()
for i, val in enumerate(vals):
    p2d[val] = i
p2d
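
One detail worth checking: later cells look classes up by their position in `codes` (for example `name2id`), so each pixel value should map to the row position of its class in class_dict.csv, which is not necessarily its rank among the sorted pixel values. The sketch below compares the two mappings in plain Python. The row order and pixel values in `assumed_rows` are assumptions standing in for `df1`, and `p2d_by_row` and `p2d_sorted` are hypothetical names.

```python
# Assumed (name, pixel_value) pairs in class_dict.csv row order; a stand-in for df1.
assumed_rows = [
    ("urban_land", 179),
    ("agriculture_land", 226),
    ("rangeland", 105),
    ("forest_land", 150),
    ("water", 29),
    ("barren_land", 255),
    ("unknown", 0),
]
codes_in_order = [name for name, _ in assumed_rows]

# Mapping that follows the row order, so index i means codes_in_order[i].
p2d_by_row = {pixel: idx for idx, (_, pixel) in enumerate(assumed_rows)}

# Mapping that follows the sorted pixel values, as built from `vals`.
p2d_sorted = {pixel: idx for idx, pixel in enumerate(sorted(p2d_by_row))}

# Pixel values whose class index differs between the two mappings.
mismatched = [pixel for pixel in p2d_by_row if p2d_by_row[pixel] != p2d_sorted[pixel]]
print(mismatched)
```

If `mismatched` is non-empty for the real dataframe, class names looked up through `codes` would be attached to the wrong pixels, and either `codes` or the dictionary would need reordering.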

All the '.jpg' files are collected as the items and the corresponding '.png' files are collected as masks. 

In [None]:
items = partial(get_files, extensions='.jpg')
def masks(o): return path/f'{o.stem[:-4]}_mask.png'
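
To see what the `[:-4]` slice does, here is a small stand-alone sketch with `pathlib`. The file name `100694_sat.jpg` and the `train` folder are assumed examples of the dataset's naming scheme, and `mask_for` is a hypothetical copy of `masks` that takes the folder as an argument instead of using the global `path`.

```python
from pathlib import Path

def mask_for(image_path, folder):
    # Drop the last four characters of the stem ("_sat") and append "_mask.png".
    return folder / f"{image_path.stem[:-4]}_mask.png"

folder = Path("train")
image_path = folder / "100694_sat.jpg"
mask_path = mask_for(image_path, folder)
print(mask_path.name)  # 100694_mask.png
```

The slice only works if every image stem ends in a four-character suffix, so a file named differently would silently point to a mask that does not exist.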

The get_msk function maps the pixel values in the mask tensors to the consecutive values defined in the p2d dictionary above. The DataBlock API can then link the consecutive numerical pixel values to the segmentation class names shown in the 'name' column of the dataframe df1.

In [None]:
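def get_y(clas_dic):
    # Returns a labelling function that turns an image path into a mask of class indices.
    def get_msk(fn):
        mask = masks(fn)                   # path of the matching '_mask.png' file
        mask_img = PILMask.create(mask)    # opened in 'L' mode: one luminance channel
        mask_tensor = tensor(mask_img)
        # Replace each luminance value with its consecutive class index.
        for old, new in clas_dic.items():
            mask_tensor[mask_tensor == old] = new
        return mask_tensor
    return get_msk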
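
The loop above rewrites the mask in place, one value at a time. That is safe here because every new index is smaller than the pixel values still waiting to be processed, but it could overwrite earlier results with a different mapping. A more general alternative, sketched below with NumPy rather than fastai tensors, is a 256-entry lookup table that remaps all values in a single indexing step. The names `build_lut` and `remap` are hypothetical, and sending unlisted pixel values to `fill` is an arbitrary choice made for this sketch.

```python
import numpy as np

# Same mapping as the p2d dictionary built from `vals`.
p2d = {0: 0, 29: 1, 105: 2, 150: 3, 179: 4, 226: 5, 255: 6}

def build_lut(mapping, fill=0):
    """Return a 256-entry table with lut[pixel_value] = class index."""
    lut = np.full(256, fill, dtype=np.uint8)
    for pixel_value, class_idx in mapping.items():
        lut[pixel_value] = class_idx
    return lut

def remap(mask, lut):
    """Remap a uint8 mask through the table; the input array is left untouched."""
    return lut[mask]

mask = np.array([[0, 29, 105], [150, 179, 255]], dtype=np.uint8)
remapped = remap(mask, build_lut(p2d))
print(remapped)
```

Because the table is indexed by the original values only, the order of the dictionary entries no longer matters.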

In [None]:
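def custom_split(pct):
    # Hold out 10% of the items for validation, then keep only a fraction
    # `pct` of the remaining training items.
    def fn(name_list):
        train_x, valid_x = RandomSplitter(valid_pct=0.1)(name_list)
        np.random.shuffle(train_x)
        train_idx = int(len(train_x)*pct)
        train_ = train_x[0:train_idx]
        return train_, valid_x
    return fn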
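
In words, `custom_split(pct)` holds out 10% of the items for validation and then keeps only a fraction `pct` of the remaining training indices. The sketch below mimics that logic with the standard library so the resulting sizes can be checked without fastai. `split_indices` is a hypothetical name, and the shuffling here only approximates what `RandomSplitter` does.

```python
import random

def split_indices(n_items, pct, valid_pct=0.1, seed=0):
    """Return (train, valid) index lists, keeping a fraction `pct` of the train part."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)         # size of the validation set
    valid, train = idxs[:cut], idxs[cut:]
    keep = int(len(train) * pct)           # number of training items to keep
    return train[:keep], valid

train_idx, valid_idx = split_indices(803, pct=0.5)
print(len(train_idx), len(valid_idx))  # 361 80
```

With the 803 images in the 'train' folder and `pct=0.5`, this logic gives 361 training items and 80 validation items; the remaining items are simply not used.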

In [None]:
dblock = DataBlock(blocks=(ImageBlock, MaskBlock(codes=codes)),
                    get_items = items,
                    get_y = get_y(p2d),
                    splitter = custom_split(0.5),
                    item_tfms=[Resize(128)],
                    batch_tfms =[*aug_transforms(), Normalize.from_stats(*imagenet_stats)])

In [None]:
dsets = dblock.datasets(path)

In [None]:
len(dsets.train)

In [None]:
dls = dblock.dataloaders(path, bs=4)

In [None]:
dls.show_batch(max_n = 4)

In [None]:
dls.vocab = codes

In [None]:
name2id = {v:k for k,v in enumerate(codes)}
void_code= name2id['unknown']
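
A typical use for an index like `void_code` is to leave 'unknown' pixels out of the accuracy. The following is a minimal NumPy sketch of that idea, not a fastai metric. `accuracy_ignoring_void` is a hypothetical helper, and `void_code=6` together with the small arrays are made-up values for illustration.

```python
import numpy as np

def accuracy_ignoring_void(pred, target, void_code):
    """Fraction of correctly predicted pixels among those not labelled void."""
    keep = target != void_code
    if not keep.any():
        return float("nan")   # every pixel is void, accuracy is undefined
    return float((pred[keep] == target[keep]).mean())

target = np.array([[0, 1, 6], [2, 6, 3]])   # class index per pixel
pred = np.array([[0, 1, 0], [1, 0, 3]])     # predicted class index per pixel
acc = accuracy_ignoring_void(pred, target, void_code=6)
print(acc)  # 0.75
```

Four pixels are not void in this example and three of them are predicted correctly, which gives 0.75.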

In [None]:
learn = unet_learner(dls, resnet34)

In [None]:
learn.fit_one_cycle(2, lr_max=3.9e-4)

In [None]:
learn.export(fname = Path("/kaggle/working/export.pkl"))