# Cloud and sea tile classifier

ESRGAN, in both PSNR pretraining mode and GAN mode, has been found to struggle with super-resolving satellite image tiles that are completely covered by either sea surface or opaque clouds. In addition, both areas of interest, the harbors of Toulon and La Spezia, contain a lot of sea surface (approaching 50%). Without intervention, around 50% of tiles will consist only of sea and opaque clouds. This is assumed to create an unwanted imbalance in the training data, skewing the model away from the content we want it to perform well on.

There are several ways to mitigate this imbalance. One way would be to manually draw a sea surface polygon in GIS software and undersample tiles extracted from within this polygon. A downside of this approach is that interesting features (ships) within the sea surface polygon would also be undersampled.

Another approach would be to train a cloud and sea tile classifier to detect the unwanted tiles and discard all, or a significant proportion, of them before training. This approach has the benefit of addressing the problem head-on. Its main downside is that labeling tiles might be time-consuming. However, it is hypothesized that relatively little training data is needed to train a modern neural net classifier on such a *simple* classification task.
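The discard step described above can be sketched in a few lines. This is only an illustration and not the project's pipeline: the function name `subsample_unwanted`, the `keep_fraction` parameter, and the toy flags are all hypothetical, and the flags stand in for the predictions of a classifier that has not been trained yet.

```python
import random


def subsample_unwanted(tiles, is_unwanted, keep_fraction=0.1, seed=0):
    """Keep every wanted tile and a random fraction of the flagged tiles.

    tiles         -- any sequence of tile identifiers (hypothetical)
    is_unwanted   -- booleans, True where a tile is flagged as sea/cloud
    keep_fraction -- share of flagged tiles to keep (0.0 discards them all)
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    kept = []
    for tile, flagged in zip(tiles, is_unwanted):
        # Wanted tiles always pass; flagged tiles pass with probability keep_fraction
        if not flagged or rng.random() < keep_fraction:
            kept.append(tile)
    return kept


# Toy data: pretend that tiles with even ids were flagged as sea or cloud
tile_ids = list(range(10))
flags = [tile_id % 2 == 0 for tile_id in tile_ids]

kept_tiles = subsample_unwanted(tile_ids, flags, keep_fraction=0.0)
print(kept_tiles)
```

Setting `keep_fraction` slightly above zero would keep a small share of sea and cloud tiles in the training set instead of removing them entirely.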

In [None]:
import pickle
import geopandas
import pandas as pd

from modules.tile_generator import *

In [None]:
with open('metadata_df.pickle', 'rb') as file:
    meta = pickle.load(file)

# Directory containing the individual satellite images
DATA_PATH = 'data/toulon-laspezia/'
DATA_PATH_TILES = 'data/toulon-laspezia-cloud-sea-classifier/'

SENSORS = ['WV02', 'GE01', 'WV03_VNIR']
AREAS = ['La_Spezia', 'Toulon']
meta = meta.loc[meta['sensorVehicle'].isin(SENSORS)]
meta = meta.loc[meta['area_name'].isin(AREAS)]

N_IMAGES = len(meta.index)

# Tile sizes: 96x96, 128x128, 196x196, 384x384 (all tiles are square)
TILE_SIZES = [96, 128, 196, 384]
# Number of tiles to generate at each tile size
N_TILES = {96: 500, 128: 500, 196: 500, 384: 500}
N_TILES_TOTAL = sum(N_TILES.values())
#meta

In [None]:
# Add one n_tiles_<size> column per tile size
for tile_size in TILE_SIZES:
    meta = allocate_tiles(meta, by_partition=False,
                          n_tiles_total=N_TILES[tile_size],
                          new_column_name=f'n_tiles_{tile_size}')
meta
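
`allocate_tiles` is defined in `modules.tile_generator` and its implementation is not shown here. As a rough illustration of what distributing a fixed tile budget across a set of images can look like, the sketch below spreads the budget as evenly as possible. The helper `allocate_evenly` is hypothetical and is not the project's implementation, which may weight images differently (for example by image area).

```python
def allocate_evenly(n_images, n_tiles_total):
    """Split n_tiles_total across n_images as evenly as possible (hypothetical helper)."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    base, remainder = divmod(n_tiles_total, n_images)
    # The first `remainder` images get one extra tile so the counts sum exactly
    return [base + 1 if i < remainder else base for i in range(n_images)]


# Example: a budget of 500 tiles spread over 7 images
counts = allocate_evenly(n_images=7, n_tiles_total=500)
print(counts, sum(counts))
```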