# Training set creation

## Getting started

This notebook walks through how we created the training set used to train and validate instances of sealnet, from generating a vector database to extracting patches from rasters. To recreate the training set, you will need QGIS (tested with 2.8), Python 3.6, the WV03 raster image catalog, and a shapefile containing a database of seal occurrences.

## Exporting points 

Before we can extract patches from raster files, we need to export our database of seal points to a .csv file. This operation requires the MMQGIS plugin: open QGIS with the seal points shapefile and select the following option:

<img src="jupyter_notebook_images/export_geometry.png">

If you selected the correct shapefile as the input layer, you will be prompted to save a .csv file. Keep the default name and move the .csv file to the root directory of this repository.
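
Before moving on, it can be useful to confirm that the exported file is where the next step expects it and to look at its columns. The following is a minimal sketch using only the standard library. The helper name `preview_points_csv` is hypothetical and not part of this repository, and the file name `seal_points_espg3031.csv` is taken from the `--shape_file` argument used later in this notebook; the actual column names depend on your shapefile.

```python
import csv
import os
from itertools import islice


def preview_points_csv(csv_path, n_rows=3):
    """Return (column_names, first n_rows rows) of an exported points CSV.

    Hypothetical helper for a quick sanity check; it makes no assumption
    about which columns the export contains.
    """
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(
            "{} not found; move the exported .csv to the repository root".format(csv_path)
        )
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows


# Only preview the file if it is already in the current directory.
points_csv = "seal_points_espg3031.csv"
if os.path.isfile(points_csv):
    columns, rows = preview_points_csv(points_csv)
    print("columns:", columns)
    for row in rows:
        print(dict(row))
```

If the printed columns do not include the point coordinates and a class label, the export most likely used the wrong input layer.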

## Extracting patches

Once the output from the geometry export is in the repository root, we are ready to extract patches from the raster files. Run the following cell to extract patches and create the training sets. This process takes around 20 minutes and requires at least 16 GB of RAM.

In [1]:
# point to folder with raster images
raster_dir = '/home/bento/imagery'
# point to the .csv file exported from the seal points shapefile
shape_file = 'seal_points_espg3031.csv'
# specify training labels, separated by '_'; classes must match the ones in the shapefile
labels = 'crabeater_crack_emperor_glacier_ice-sheet_marching-emperor_open-water_other_pack-ice_rock_weddell'
# specify labels used for detection
det_classes = 'crabeater_weddell'

# create vanilla training set (scale bands = 450, 450, 450)
%run create_trainingset.py --det_classes=$det_classes --rasters_dir=$raster_dir --scale_bands='450_450_450' --out_folder='training_set_vanilla' --labels=$labels --shape_file=$shape_file 

# create multiscale training set (scale bands = 450, 1350, 4000)
%run create_trainingset.py --det_classes=$det_classes --rasters_dir=$raster_dir --scale_bands='450_1350_4000' --out_folder='training_set_multiscale' --labels=$labels --shape_file=$shape_file


Creating training_set_vanilla:

Checking input folder for invalid files:


  Untitled Document is not a valid scene.
  other.qpj is not a valid scene.
  seal_points.shp is not a valid scene.
  other.shx is not a valid scene.
  ae.qpj is not a valid scene.
  ae.shp is not a valid scene.
  seals_wv3.qgs~ is not a valid scene.
  other.prj is not a valid scene.
  ae.dbf is not a valid scene.
  ae.shx is not a valid scene.
  other.dbf is not a valid scene.
  seal_points.prj is not a valid scene.
  seal_points.qpj is not a valid scene.
  seal_points.shx is not a valid scene.
  seal_points.dbf is not a valid scene.
  other.shp is not a valid scene.
  ae.prj is not a valid scene.
  seals_wv3.qgs is not a valid scene.
  WV03_20141120201053_1040010004D3DB00_14NOV20201053-P1BS-500231411140_01_P001_u08rf3031.tif is not an annotated scene.
  WV03_20151120214007_1040010014257C00_15NOV20214007-P1BS-500659007010_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20141125114425_104001000455930

  patch_sizes=patch_sizes, labels=labels)


MemoryError: 


Creating training_set_multiscale:

Checking input folder for invalid files:


  Untitled Document is not a valid scene.
  other.qpj is not a valid scene.
  seal_points.shp is not a valid scene.
  other.shx is not a valid scene.
  ae.qpj is not a valid scene.
  ae.shp is not a valid scene.
  seals_wv3.qgs~ is not a valid scene.
  other.prj is not a valid scene.
  ae.dbf is not a valid scene.
  ae.shx is not a valid scene.
  other.dbf is not a valid scene.
  seal_points.prj is not a valid scene.
  seal_points.qpj is not a valid scene.
  seal_points.shx is not a valid scene.
  seal_points.dbf is not a valid scene.
  other.shp is not a valid scene.
  ae.prj is not a valid scene.
  seals_wv3.qgs is not a valid scene.
  WV03_20141120201053_1040010004D3DB00_14NOV20201053-P1BS-500231411140_01_P001_u08rf3031.tif is not an annotated scene.
  WV03_20151120214007_1040010014257C00_15NOV20214007-P1BS-500659007010_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20141125114425_104001000455

  WV03_20141008171556_104001000281A100_14OCT08171556-P1BS-500258406090_01_P003_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151005095617_1040010012382500_15OCT05095617-P1BS-500652418050_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151024095526_10400100123DE100_15OCT24095526-P1BS-500656020090_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20141030211711_10400100036F5800_14OCT30211711-P1BS-500231417160_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151024193848_1040010013779B00_15OCT24193848-P1BS-500656046010_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151021104725_10400100124A0000_15OCT21104725-P1BS-500652455030_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151009060426_10400100126E8100_15OCT09060426-P1BS-500652410030_01_P001_u08rf3031.tif.aux.xml is not a valid scene.
  WV03_20151015203404_1040010012A43300_15OCT15203404-P1BS-500652486050_01_P001_u08rf3031.tif is not an annotated scene.
  WV03_201410081715

MemoryError: 
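
Both runs in the output above end in a `MemoryError`, which Python raises when an allocation fails for lack of memory. Before rerunning the cell, it may help to check whether the machine meets the 16 GB requirement stated earlier. The following is a minimal sketch, assuming a Linux or other POSIX system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. The helper names `total_ram_gb` and `ram_message` are hypothetical and not part of this repository.

```python
import os

REQUIRED_GB = 16  # requirement quoted in this notebook


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the names are unsupported.
        return None
    if page_size <= 0 or n_pages <= 0:
        return None
    return page_size * n_pages / 1024 ** 3


def ram_message(total_gb, required_gb=REQUIRED_GB):
    """Build a human-readable message comparing detected and required RAM."""
    if total_gb is None:
        return "Could not determine total RAM on this platform."
    if total_gb >= required_gb:
        return "OK: {:.1f} GiB of RAM detected.".format(total_gb)
    return "WARNING: only {:.1f} GiB of RAM detected, {} GB recommended.".format(
        total_gb, required_gb
    )


print(ram_message(total_ram_gb()))
```

Treat the result as a rough guide: the operating system usually reports slightly less than the nominal installed memory, and other running processes reduce what is actually available to the script.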

## Creating scene bank

One step in evaluating sealnet instances is measuring how well the models identify scenes that contain seals. To measure that, we need to create a 'scene bank', which records which scenes count as positives (i.e. contain seals) and which count as negatives. To generate the scene banks, run the cell below:

In [1]:
# create one scene bank per positive class
for label in ['crabeater', 'weddell', 'emperor', 'marching-emperor']:
    out_file = "{}_scene_bank.csv".format(label)
    %run create_scene_bank.py --positive_classes=$label --out_file=$out_file
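
To make the idea of a scene bank concrete, here is a minimal sketch of how scenes could be marked as positive or negative for a given set of classes. It is an illustration only and not the implementation in `create_scene_bank.py`: the function names, the `(scene, label)` input format, and the output columns `scene` and `has_positive` are all assumptions.

```python
import csv


def build_scene_bank(annotations, all_scenes, positive_classes):
    """Map every scene to True if it has at least one point of a positive class.

    annotations: iterable of (scene_name, label) pairs (assumed format).
    all_scenes: every scene under consideration, including unannotated ones.
    positive_classes: set of labels that make a scene count as positive.
    """
    positives = {scene for scene, label in annotations if label in positive_classes}
    return {scene: scene in positives for scene in sorted(all_scenes)}


def write_scene_bank(bank, out_file):
    """Write the scene bank to a CSV file with hypothetical column names."""
    with open(out_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["scene", "has_positive"])
        for scene, flag in bank.items():
            writer.writerow([scene, int(flag)])


# Toy example with made-up scene names.
annotations = [
    ("scene_a.tif", "crabeater"),
    ("scene_a.tif", "rock"),
    ("scene_b.tif", "pack-ice"),
]
scenes = {"scene_a.tif", "scene_b.tif", "scene_c.tif"}
bank = build_scene_bank(annotations, scenes, {"crabeater"})
print(bank)
```

In this toy example only `scene_a.tif` is positive for `crabeater`; `scene_b.tif` has annotations but none of the positive class, and `scene_c.tif` has no annotations at all, so both count as negatives.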


