## Data preparation

To get data for the segmentation step:

 1. Download data from Sentinel-2 using the DataRequest/download_data.py script
 2. Filter out cloudy images using the preprocessing/filtering.py script
 3. Set aside 20% of the data for validation
 4. Because few images remain after filtering, use the Augmentor library for data augmentation (this notebook)

Augmentor: https://github.com/mdbloice/Augmentor

In [12]:
import os
import sys
import glob
import random
sys.path.append('../util')
import myaugmentor

In [13]:
IMG_DIR = '../data/tulips/bloom/filtered/'
ext = '.png'

Let's set aside a fraction of the images for validation. These images must not be used as source images for the augmentations.

In [27]:
root = os.path.abspath(os.path.join(IMG_DIR, os.pardir))
val_dir = os.path.join(root, 'val')
os.makedirs(val_dir, exist_ok=True)

In [28]:
image_list = glob.glob(IMG_DIR + '*' + ext)

In [29]:
val = random.sample(image_list, int(len(image_list)*0.2))

In [30]:
for img in val:
    os.rename(img, os.path.join(val_dir, os.path.basename(img)))
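
The split above draws a different sample on every run because `random.sample` is not seeded. A minimal sketch of a reproducible variant follows. The helper name `split_validation`, the `seed` parameter, and the throwaway demo directories are hypothetical and not part of the original notebook.

```python
import os
import random
import shutil
import tempfile


def split_validation(img_dir, val_dir, ext=".png", fraction=0.2, seed=0):
    """Move a random `fraction` of the images in img_dir to val_dir.

    Hypothetical helper: the notebook does the same with glob and os.rename.
    """
    os.makedirs(val_dir, exist_ok=True)
    # Sort first so the sample depends only on the seed, not on listing order.
    images = sorted(f for f in os.listdir(img_dir) if f.endswith(ext))
    rng = random.Random(seed)
    chosen = rng.sample(images, int(len(images) * fraction))
    for name in chosen:
        shutil.move(os.path.join(img_dir, name), os.path.join(val_dir, name))
    return chosen


# Demonstration on throwaway directories with empty placeholder files.
demo_root = tempfile.mkdtemp()
demo_img_dir = os.path.join(demo_root, "filtered")
demo_val_dir = os.path.join(demo_root, "val")
os.makedirs(demo_img_dir)
for i in range(10):
    open(os.path.join(demo_img_dir, f"img_{i}.png"), "w").close()

moved = split_validation(demo_img_dir, demo_val_dir)
print(len(moved), len(os.listdir(demo_img_dir)))
```

Using a dedicated `random.Random(seed)` instance keeps the split independent of any other code that touches the global random state.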

In [31]:
p = myaugmentor.MyPipeline(IMG_DIR, output_directory=os.path.join(root, "train"))

Initialised with 3828 image(s) found.
Output directory set to /home/ANT.AMAZON.COM/jlcont/tulip-fields/data/tulips/bloom/train.

In [32]:
p.ground_truth(os.path.join(root, "masks"))

                                                         

3828 ground truth image(s) found.
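
The ground-truth count matches the image count here, which suggests every image has a mask. When the counts differ, a quick check like the sketch below can list the images that lack one. It assumes masks share the file name of their image, which is how the stock Augmentor pipeline pairs them; `myaugmentor.MyPipeline` may behave differently. The helper name `images_without_mask` and the demo files are hypothetical.

```python
import os
import tempfile


def images_without_mask(img_dir, mask_dir, ext=".png"):
    """Return image file names in img_dir with no same-named file in mask_dir."""
    images = sorted(f for f in os.listdir(img_dir) if f.endswith(ext))
    return [f for f in images if not os.path.isfile(os.path.join(mask_dir, f))]


# Demonstration on throwaway directories with empty placeholder files.
check_root = tempfile.mkdtemp()
check_img_dir = os.path.join(check_root, "filtered")
check_mask_dir = os.path.join(check_root, "masks")
os.makedirs(check_img_dir)
os.makedirs(check_mask_dir)
for name in ("a.png", "b.png", "c.png"):
    open(os.path.join(check_img_dir, name), "w").close()
for name in ("a.png", "b.png"):  # c.png deliberately has no mask
    open(os.path.join(check_mask_dir, name), "w").close()

missing = images_without_mask(check_img_dir, check_mask_dir)
print(missing)
```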




Define the transformations to apply to the images. I chose those I expect to benefit satellite image segmentation, in particular those that preserve the character of crop shapes: the outlines can change, but crops remain polygons with straight sides.

Details of the transformations here:
https://github.com/mdbloice/Augmentor#main-features

In [33]:
p.skew(probability=0.5, magnitude=0.5)
p.shear(probability=0.3, max_shear_left=15, max_shear_right=15)
p.flip_left_right(probability=0.5)
p.flip_top_bottom(probability=0.5)
p.rotate_random_90(probability=0.75)
p.rotate(probability=0.75, max_left_rotation=20, max_right_rotation=24)
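
To get a feel for how aggressively this pipeline augments, the sketch below estimates how often a sample would pass through with no operation applied, and how many operations a sample receives on average. It assumes each operation fires independently with exactly its stated probability; the library's actual sampling may deviate slightly from that idealisation.

```python
import math

# Probabilities copied from the pipeline definition above.
probabilities = {
    "skew": 0.5,
    "shear": 0.3,
    "flip_left_right": 0.5,
    "flip_top_bottom": 0.5,
    "rotate_random_90": 0.75,
    "rotate": 0.75,
}

# Assumption: each operation fires independently of the others.
p_untouched = math.prod(1 - p for p in probabilities.values())
expected_ops = sum(probabilities.values())

print(f"P(no operation applied) = {p_untouched:.4f}")
print(f"Expected operations per sample = {expected_ops:.2f}")
```

Under that assumption, only about 0.5% of the generated samples would be unmodified copies of a source image, and a typical sample receives a little over three operations.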

Define the number of images to generate

In [34]:
N = 20000

In [35]:
p.sample(N)

Processing <PIL.Image.Image image mode=RGB size=256x256 at 0x7F42E0793860>: 100%|██████████| 20000/20000 [01:20<00:00, 249.26 Samples/s]                 


Generated images are saved in the `train/` directory next to `IMG_DIR` (`../data/tulips/bloom/train/`).