# xView3 SAR - Object detection on giga-pixel satellite imagery

## Background

In this notebook, we demonstrate how to train, evaluate, and deploy a custom object detection model for xView3 SAR.

## Setup

#### [Optional] Configure docker 
Below we configure Docker in our Amazon SageMaker environment to increase the shared memory size and specify a root directory located on the instance EBS volume.

In [None]:
%%writefile /home/ec2-user/SageMaker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-shm-size": "200G",
    "data-root": "/home/ec2-user/SageMaker/docker"
} 

Run the shell script below to make changes to Docker.

In [None]:
%%bash
sudo service docker stop
mkdir -p /home/ec2-user/SageMaker/docker
sudo rsync -aqxP /var/lib/docker/ /home/ec2-user/SageMaker/docker
sudo mv /var/lib/docker /var/lib/docker.old
sudo mv /home/ec2-user/SageMaker/daemon.json /etc/docker/
sudo service docker start

#### Imports

In [None]:
%reload_ext autoreload
%autoreload 2

In [None]:
import sys
from datetime import datetime
from pathlib import Path

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.processing import ProcessingInput, ProcessingOutput, Processor, ScriptProcessor

sys.path.append('tools/')
from docker_utils import build_and_push_docker_image

Define the execution role, the S3 bucket in the account, and the SageMaker session.

In [None]:
role = get_execution_role()
region = boto3.Session().region_name
bucket = 'xview3-blog-sagemaker'
sagemaker_session = sagemaker.Session(default_bucket=bucket)
account = sagemaker_session.account_id()
tags = [{'Key': 'project', 'Value': 'xview3-blog'}]

Set the boolean flag `USE_TINY` to run the notebook using a tiny subset of data.

In [None]:
USE_TINY = True

## Dataset Creation with SageMaker Processing
In this section we create the following from the xView3 challenge dataset:
1. New `train` and `valid` splits, created by merging the train and validation sets provided by the challenge and re-splitting them.
2. Detectron2-compatible dataset dicts to be used for training.


#### Merge & split data labels. 
The xView3 challenge provided detection labels for each scene in `train.csv`, `validation.csv`, and `public.csv`. 
We merge `train.csv` and `validation.csv` and create new `train` and `validation` sets for training. The `public` leaderboard set remains fixed.
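
The merge-and-split step can be sketched with the standard library alone. This is a minimal illustration, not the logic in `tools/create_xview3_dataset_dict.py`: it assumes each label row carries a `scene_id` column and splits by scene so that all detections from one scene land in the same set. The function name `merge_and_split`, the 80/20 ratio, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def merge_and_split(train_rows, valid_rows, valid_fraction=0.2, seed=0):
    """Merge two lists of label rows and re-split them by scene."""
    by_scene = defaultdict(list)
    for row in list(train_rows) + list(valid_rows):
        by_scene[row["scene_id"]].append(row)

    # Shuffle the scene ids deterministically so the split is reproducible.
    scene_ids = sorted(by_scene)
    random.Random(seed).shuffle(scene_ids)

    # Hold out a fraction of the scenes (at least one) for validation.
    n_valid = max(1, int(len(scene_ids) * valid_fraction))
    valid_ids = set(scene_ids[:n_valid])

    new_train = [r for s in scene_ids if s not in valid_ids for r in by_scene[s]]
    new_valid = [r for s in scene_ids if s in valid_ids for r in by_scene[s]]
    return new_train, new_valid


# Toy rows standing in for the contents of train.csv and validation.csv.
train_rows = [{"scene_id": f"scene{i}", "detect_id": f"scene{i}_{j}"}
              for i in range(8) for j in range(3)]
valid_rows = [{"scene_id": f"scene{i}", "detect_id": f"scene{i}_0"}
              for i in range(8, 10)]

new_train, new_valid = merge_and_split(train_rows, valid_rows,
                                       valid_fraction=0.2, seed=46998886)
print(len(new_train), len(new_valid))
```

Splitting on scene ids rather than on individual rows is one way to avoid leaking detections from the same scene into both sets.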

#### Create Detectron2 Datasets
Here we create the Detectron2-compatible [dataset dicts](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html) used for training models in Detectron2. The format of the dataset is a list of dictionaries with each dict containing information for one image with at least the following fields:
- `file_name`: str
- `height`: int
- `width`: int
- `image_id`: str or int
- `annotations`: list[dict]

For more information on how to generate the dataset dict, see the [Detectron2 docs](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html#standard-dataset-dicts).
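
To make the format concrete, here is a minimal sketch of one record in plain Python. Detectron2's standard dataset dicts name the path field `file_name`, and each annotation carries at least `bbox`, `bbox_mode`, and `category_id`. The helper `make_record`, the file path, and the box values are hypothetical. The integer `0` stands in for `BoxMode.XYXY_ABS` so that the snippet runs without Detectron2 installed.

```python
# Integer value of detectron2.structures.BoxMode.XYXY_ABS (absolute x0, y0, x1, y1).
XYXY_ABS = 0


def make_record(file_name, image_id, height, width, boxes):
    """Build one dataset-dict record from (x0, y0, x1, y1, category_id) tuples."""
    annotations = [
        {"bbox": [x0, y0, x1, y1], "bbox_mode": XYXY_ABS, "category_id": category_id}
        for x0, y0, x1, y1, category_id in boxes
    ]
    return {
        "file_name": file_name,
        "image_id": image_id,
        "height": height,
        "width": width,
        "annotations": annotations,
    }


# A dataset is simply a list of such records, one per image (hypothetical values).
dataset_dicts = [
    make_record(
        file_name="imagery/example_scene.h5",
        image_id="example_scene",
        height=2560,
        width=2560,
        boxes=[(100, 120, 110, 130, 0), (900, 40, 915, 60, 1)],
    )
]
print(dataset_dicts[0]["annotations"][0])
```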

Our dataset dict is generated from the information provided in the label `csv` files used in the previous section. Depending on whether we train our models with inputs originating from image chips (tiles) or from the full scene, we use one of two functions in the `xview3_d2` package: `create_xview3_chipped_scene_annotations` or `create_xview3_full_scene_annotations`, respectively.

### Build base image for SM Processing tasks.
For convenience, we build a base processing container which handles package installations. We can build a subsequent image from this base container to include the code we want to run. Here is what the base processing container looks like.

To build and push the containers, we use the helper function `build_and_push_docker_image` in `tools/docker_utils.py`.

In [None]:
!pygmentize -l docker docker/processing/base.Dockerfile

Let's build and push the base processing container.

In [None]:
processing_base_name = 'xview3-processing:base'
base_image = build_and_push_docker_image(processing_base_name, 
                                         dockerfile='docker/processing/base.Dockerfile')

Here's the main processing container, which copies the `.py` scripts in `tools/`. In each processing job that follows, we can specify which `.py` script to run as the entrypoint.

In [None]:
!pygmentize -l docker docker/processing/main.Dockerfile

#### Build and push main processing container.

In [None]:
processing_base_image = f'{account}.dkr.ecr.{region}.amazonaws.com/xview3-processing:base'

In [None]:
processing_main_name = 'xview3-processing:main'
processing_main_image = build_and_push_docker_image(processing_main_name, 
                                                    dockerfile='docker/processing/main.Dockerfile', 
                                                    base_image=base_image)

In [None]:
processing_main_image = f'{account}.dkr.ecr.{region}.amazonaws.com/xview3-processing:main'

#### Launch SageMaker Processing job for dataset preparation. 
The SageMaker Processing task will run `tools/create_xview3_dataset_dict.py`. This script creates a Detectron2-compatible dataset dict for full scene imagery or chipped scenes. Optionally, this script will merge the train and validation csvs and create a new split.

Let's see the arguments required by this script:

In [None]:
!pygmentize -l python tools/create_xview3_dataset_dict.py

#### Initialize SM Processing job. 
We only need 1 instance for this task.

In [None]:
instance_type = 'ml.t3.xlarge'
volume_size_in_gb = 30 
instance_count = 1
base_job_name = 'xview3-dataset-prep'
                      
dataset_processor = Processor(image_uri=processing_main_image,
                              role=role,
                              instance_count=instance_count,
                              base_job_name=base_job_name,
                              instance_type=instance_type, 
                              volume_size_in_gb=volume_size_in_gb, 
                              entrypoint=['python3', 'create_xview3_dataset_dict.py'],
                              sagemaker_session=sagemaker_session, 
                              tags=tags)

#### Specify inputs and run processing job. 

`tools/create_xview3_dataset_dict.py` has several defaults, which can be overridden by passing the relevant flags in the processor `arguments`. The cell below launches a processing job that creates a new data split and a dataset dict for full scenes. To create a dataset dict for chipped scenes, change `--dataset-type` to `chipped` and provide additional inputs and/or arguments such as `--shoreline-dir`.

In [None]:
override = False
current_timestamp = '202207250702'
SEED = 46998886

if override:
    current_timestamp = datetime.now().strftime("%Y%m%d%H%M")


In [None]:
input_labels = ProcessingInput(source='data/labels/', 
                               destination='/opt/ml/processing/input/labels',
                               input_name='labels')
input_stats = ProcessingInput(source='data/scene-stats.csv', 
                              destination='/opt/ml/processing/input/scene-stats',
                              input_name='stats')

job_output = ProcessingOutput(source='/opt/ml/processing/output/prepared/',  
                              destination=f's3://xview3-blog/data/processing/{current_timestamp}',
                              output_name='prepared-dataset')

dataset_processor.run(inputs=[input_labels, input_stats], 
              outputs=[job_output],
              arguments=["--dataset-type", "full", 
                         "--train-labels-csv", f"{input_labels.destination}/train.csv",
                         "--valid-labels-csv", f"{input_labels.destination}/validation.csv",
                         "--tiny-labels-csv", f"{input_labels.destination}/tiny.csv",
                         "--scene-stats-csv", f"{input_stats.destination}/scene-stats.csv",
                         "--seed", str(SEED), 
                         "--output-dir", job_output.source,
              ],
              wait=True,
              logs=True)
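
SageMaker Processing passes the `arguments` list to the container entrypoint, so the script receives them as ordinary command-line flags. The parser below is a hypothetical sketch of how such flags could be read with `argparse`. Only the flag names come from the cell above; the defaults, choices, and description are assumptions, and the real `create_xview3_dataset_dict.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags passed in `arguments`."""
    parser = argparse.ArgumentParser(description="Sketch of a dataset-prep CLI")
    parser.add_argument("--dataset-type", choices=["full", "chipped"], default="full")
    parser.add_argument("--train-labels-csv")
    parser.add_argument("--valid-labels-csv")
    parser.add_argument("--tiny-labels-csv")
    parser.add_argument("--scene-stats-csv")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--output-dir", required=True)
    return parser


# The same strings that the processor would pass to the entrypoint.
args = build_parser().parse_args([
    "--dataset-type", "full",
    "--train-labels-csv", "/opt/ml/processing/input/labels/train.csv",
    "--valid-labels-csv", "/opt/ml/processing/input/labels/validation.csv",
    "--seed", "46998886",
    "--output-dir", "/opt/ml/processing/output/prepared/",
])
print(args.dataset_type, args.seed, args.output_dir)
```

Because every argument arrives as a string, numeric values such as the seed need an explicit `type=int` (or equivalent) on the receiving side.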

#### [Optional] Run processing job to create dataset dict for chipped scenes.

In [None]:
processing_image_name = f'{account}.dkr.ecr.{region}.amazonaws.com/xview3-processing:main'

In [None]:
instance_type = 'ml.t3.xlarge'
volume_size_in_gb = 30 
instance_count = 1
base_job_name = 'xview3-dataset-prep'
                      
dataset_processor = Processor(image_uri=processing_image_name,
                              role=role,
                              instance_count=instance_count,
                              base_job_name=base_job_name,
                              instance_type=instance_type, 
                              volume_size_in_gb=volume_size_in_gb, 
                              entrypoint=['python3', 'create_xview3_dataset_dict.py'],
                              sagemaker_session=sagemaker_session, 
                              tags=tags)

In [None]:
s3_destination_uri = f's3://xview3-blog/data/processing/{current_timestamp}'

input_stats = ProcessingInput(source='data/scene-stats.csv', 
                              destination='/opt/ml/processing/input/scene-stats',
                              input_name='stats')
input_label_trn = ProcessingInput(source=f'{s3_destination_uri}/labels/train.csv',
                                  destination='/opt/ml/processing/input/labels/train',
                                  input_name='trn-labels')
input_labels_tiny = ProcessingInput(source=f'{s3_destination_uri}/labels/tiny-train.csv',
                                    destination='/opt/ml/processing/input/labels/tiny',
                                    input_name='tiny-labels')
inputs_shoreline = ProcessingInput(source='s3://xview3-blog/data/shoreline/trainval/', 
                                   destination='/opt/ml/processing/input/shoreline/')

job_output = ProcessingOutput(source='/opt/ml/processing/output/prepared/',  
                              destination=s3_destination_uri,
                              output_name='prepared-dataset')

dataset_processor.run(inputs=[input_label_trn, input_labels_tiny, input_stats, inputs_shoreline], 
                      outputs=[job_output],
                      arguments=["--dataset-type", "chipped", 
                                 "--scene-stats-csv", f"{input_stats.destination}/scene-stats.csv",
                                 "--seed", str(SEED), 
                                 "--output-dir", job_output.source, 
                                 "--shoreline-dir", inputs_shoreline.destination,
                                 "--gt-labels-dir", str(Path(input_label_trn.destination).parent)],
                      wait=True,
                      logs=True)

## Imagery Preparation with SageMaker Processing
We use SageMaker Processing to prepare our imagery for training. 
The imagery data will be uploaded to the SageMaker session S3 bucket under `imagery`.

### a. Save native scene imagery in file storage
For dynamically sampling from full scene imagery, we observed that we can speed up training and evaluation by a factor of 10 if the scene imagery is stored in `hdf5` format, compared to loading the provided GeoTIFF (a TIFF file with embedded georeferencing metadata) imagery with `rasterio`. This is also useful during inference for evaluation.

Let's kick off a SageMaker Processing job to convert the imagery to `hdf5`. This only needs to be done once.

In [None]:
instance_type = 'ml.t3.xlarge'
volume_size_in_gb = 300 
instance_count = 1 if USE_TINY else 75
                      
s3_uri_source = 's3://xview3-blog/data/raw'
s3_uri_imagery = f'{s3_destination_uri}/imagery'

storage_processor = Processor(image_uri=processing_image_name,
                              role=role,
                              instance_count=instance_count, 
                              base_job_name='xview3-storage',
                              instance_type=instance_type, 
                              volume_size_in_gb=volume_size_in_gb,
                              entrypoint=['python3', 'store_xview3_imagery.py'],
                              sagemaker_session=sagemaker_session,
                              tags=tags,)

storage_processor.run(inputs=[ProcessingInput(source=s3_uri_source, 
                                              destination='/opt/ml/processing/input/',
                                              s3_data_distribution_type='ShardedByS3Key')], 
                      outputs=[ProcessingOutput(source='/opt/ml/processing/output/imagery/', 
                                                destination=s3_uri_imagery,
                                                output_name='imagery',
                                                s3_upload_mode="Continuous")],
                      arguments=["--store-format", "hdf5"],
                      wait=False,
                      logs=False)

### b. [Optional] Image chipping 
If we decide to train with image chips, we can also use SageMaker Processing to generate image chips using the dataset dict created in the previous section.
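
As a rough illustration of what chipping involves, the sketch below enumerates non-overlapping windows over a scene. It is not the logic in `chip_scenes_from_annotations.py`: the helper name `chip_windows`, the scene size, and the choice to leave edge chips smaller instead of padding them are assumptions. The 2560-pixel chip size is taken from the dataset file name used later in this section.

```python
def chip_windows(height, width, chip_size=2560):
    """Yield (row_off, col_off, chip_h, chip_w) windows that tile the scene."""
    for row in range(0, height, chip_size):
        for col in range(0, width, chip_size):
            # Chips on the bottom and right edges are clipped to the scene bounds.
            yield row, col, min(chip_size, height - row), min(chip_size, width - col)


# Hypothetical 6000 x 5000 pixel scene: 3 rows x 2 columns of chips.
windows = list(chip_windows(6000, 5000))
print(len(windows))   # 6
print(windows[-1])    # (5120, 2560, 880, 2440)
```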


In [None]:
s3_destination_uri = f's3://xview3-blog/data/processing/{current_timestamp}'

In [None]:
s3_uri_imagery = f'{s3_destination_uri}/imagery'
s3_uri_imagery

In [None]:
processing_image_name = f'{account}.dkr.ecr.{region}.amazonaws.com/xview3-processing:main'

In [None]:
s3_uri_destination_base = f"{s3_uri_imagery}/chipped-scenes"
s3_uri_source_base = "s3://xview3-blog/data/raw"
s3_uri_d2_datasets = f'{s3_destination_uri}/detectron2_dataset/'


d2_dataset_fn = f"xview3-chipped_2560x2560-{'tiny' if USE_TINY else 'train'}.dataset"
num_instances = 2 if USE_TINY else 50 
s3_uri_imagery_source = f"{s3_uri_source_base}/{'tiny' if USE_TINY else 'trainval'}"
s3_uri_destination = f"{s3_uri_destination_base}/{'tiny' if USE_TINY else 'train'}"

# specify local input data for SageMaker Processing job.
input_scenes = ProcessingInput(source=s3_uri_imagery_source, 
                               destination='/opt/ml/processing/input/scenes/', 
                               s3_data_distribution_type='ShardedByS3Key')

input_d2_dataset = ProcessingInput(source=s3_uri_d2_datasets, 
                                   destination='/opt/ml/processing/input/datasets/',)
                                                
job_output = ProcessingOutput(source='/opt/ml/processing/output/', 
                              destination=s3_uri_destination, 
                              s3_upload_mode="Continuous",)

This job needs a CPU instance with at least 32 GB of memory.

In [None]:
chip_processor = Processor(image_uri=processing_image_name,
                           role=role,
                           instance_count=num_instances, 
                           base_job_name=f"xview3-chip-scenes-{'tiny' if USE_TINY else 'train'}", 
                           instance_type='ml.t3.2xlarge',  # alternative: 'ml.r5.xlarge'
                           volume_size_in_gb=1024, 
                           entrypoint=['python3', 'chip_scenes_from_annotations.py'],
                           sagemaker_session=sagemaker_session, 
                           tags=tags)

chip_processor.run(inputs=[input_scenes, input_d2_dataset], 
                   outputs=[job_output],
                   arguments=['--scenes-input-dir', input_scenes.destination,
                              '--d2-dataset', f"{input_d2_dataset.destination}/{d2_dataset_fn}",],
                   wait=USE_TINY,
                   logs=USE_TINY)

## Train

In [None]:
from dataclasses import dataclass

from sagemaker.inputs import TrainingInput

In [None]:
USE_CHIPPED = False
LOCAL = False 

In [None]:
base_train_dockerfile = str(Path("docker/training/base.Dockerfile").resolve())
train_dockerfile = str(Path("docker/training/main.Dockerfile").resolve())

In [None]:
!pygmentize -l docker {base_train_dockerfile}

#### Build Base Training Container

In [None]:
training_base_name = 'xview3-training:base'

base_image_uri = build_and_push_docker_image(training_base_name,  
                                             dockerfile=str(base_train_dockerfile),)
print(f'Base image: {base_image_uri}')

#### Build Training Container

In [None]:
!pygmentize -l docker {train_dockerfile}

In [None]:
training_base_name = 'xview3-training:base'
base_image_uri = f'{account}.dkr.ecr.{region}.amazonaws.com/{training_base_name}'
training_main_name = 'xview3-training:train'

In [None]:
training_image_uri = build_and_push_docker_image(training_main_name, 
                                                 dockerfile=str(train_dockerfile),
                                                 base_image=base_image_uri)
print(f'Training image: {training_image_uri}')

In [None]:
training_image_uri = f'{account}.dkr.ecr.{region}.amazonaws.com/xview3-training:train'

In [None]:
output_dir='/opt/ml/model/FRCNN/auto'
shoreline_dir  = '/opt/ml/input/data/shoreline/'

metrics = [
    {"Name": "training:loss", "Regex": "total_loss: ([0-9\\.]+)",},
    {"Name": "training:loss_cls", "Regex": "loss_cls: ([0-9\\.]+)",},
    {"Name": "training:loss_box_reg", "Regex": "loss_box_reg: ([0-9\\.]+)",},
    {"Name": "training:loss_rpn_cls", "Regex": "loss_rpn_cls: ([0-9\\.]+)",},
    {"Name": "training:loss_rpn_loc", "Regex": "loss_rpn_loc: ([0-9\\.]+)",},
    {"Name": "training:loss_length_reg", "Regex": "loss_length_reg: ([0-9\\.]+)",},
    {"Name": "training:lr", "Regex": "lr: ([0-9\\.]+)"},
    {"Name": "training:dataloader_time", "Regex": "data_time: ([0-9\\.]+)"},
    {"Name": "training:time", "Regex": "time: ([0-9\\.]+)"},
    {"Name": "validation:aggregate", "Regex": "aggregate=([0-9\\.]+)",},
    {"Name": "validation:loc_fscore", "Regex": "loc_fscore=([0-9\\.]+)",},
    {"Name": "validation:loc_fscore_shore", "Regex": "loc_fscore_shore=([0-9\\.]+)",},
    {"Name": "validation:vessel_fscore", "Regex": "vessel_fscore=([0-9\\.]+)",},
    {"Name": "validation:fishing_fscore", "Regex": "fishing_fscore=([0-9\\.]+)",},
    {"Name": "validation:length_acc", "Regex": "length_acc=([0-9\\.]+)",},
]
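
Each `Regex` in a metric definition is matched against the training job's log output, and the captured group becomes the metric value. The sketch below applies three of the patterns above to log lines with Python's `re` module. The log lines are hypothetical and only loosely modeled on Detectron2-style output, and `extract_metrics` is an illustrative helper, not part of SageMaker.

```python
import re

# A subset of the patterns from the metric definitions above.
metric_regex = {
    "training:loss": r"total_loss: ([0-9\.]+)",
    "training:lr": r"lr: ([0-9\.]+)",
    "validation:aggregate": r"aggregate=([0-9\.]+)",
}


def extract_metrics(line):
    """Return {metric name: captured string} for every pattern found in the line."""
    found = {}
    for name, pattern in metric_regex.items():
        match = re.search(pattern, line)
        if match:
            found[name] = match.group(1)
    return found


# Hypothetical log lines.
print(extract_metrics("iter: 20  total_loss: 1.234  loss_cls: 0.456  lr: 0.0049"))
print(extract_metrics("aggregate=0.512 loc_fscore=0.71"))
print(extract_metrics("lr: 5e-05"))
```

Note that `[0-9\.]+` stops at the first character that is not a digit or a dot, so a value printed in scientific notation such as `5e-05` is captured as `5`. If the learning rate is logged that way, a broader pattern such as `([0-9\.e\-]+)` may be more suitable.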

In [None]:
def compute_iterations_from_epochs(epochs, bs, num_annotations, max_evals, warmup_prop, num_gpus=1):
    iter_max = int(num_annotations / (num_gpus * bs) * epochs)
    eval_period = iter_max//max_evals
    iter_warmup = int(iter_max * warmup_prop)
    
    return iter_max, eval_period, iter_warmup

In [None]:
@dataclass(order=True)
class Instances:
    name: str
    num_gpus: int = 1
    instance_limit: int = 1
    num_workers: int = 4
    batch_size: int = 12
    volume: int = 2048

In [None]:
instance_members = [Instances('local_gpu', num_gpus=4),
                    Instances('ml.p3.2xlarge'), 
                    Instances('ml.p3.8xlarge', 4, 4, 16), 
                    Instances('ml.p3.16xlarge', 8, 2, 32),
                    Instances('ml.p3dn.24xlarge', 8, num_workers=48, batch_size=24, volume=1800)]

In [None]:
NUM_ANNOTS = {'tiny': 1679, 
              'train': 54360}

if USE_CHIPPED:
    NUM_ANNOTS['tiny'] = 1907
    NUM_ANNOTS['train'] = 62766

In [None]:
instance = instance_members[-2]
instance

In [None]:
epochs = 6
num_annotations = NUM_ANNOTS['tiny'] if USE_TINY else NUM_ANNOTS['train']
bs = instance.batch_size
num_gpus = 1  # alternatively: instance.num_gpus
max_evals = 5
max_checkpoints = max_evals * 2
warmup_prop = 0.2

max_iter, eval_period, warmup_iter = compute_iterations_from_epochs(
    epochs, bs, num_annotations, max_evals, warmup_prop, num_gpus=num_gpus)
checkpoint_period = eval_period // 2
print(max_iter, eval_period, warmup_iter, checkpoint_period)
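
As a worked check of the arithmetic, the snippet below repeats the helper and evaluates it for the full-scene annotation counts listed earlier (1679 for `tiny`, 54360 for `train`), with 6 epochs, a batch size of 12, 5 evaluations, a warmup proportion of 0.2, and `num_gpus` left at its default of 1.

```python
def compute_iterations_from_epochs(epochs, bs, num_annotations, max_evals, warmup_prop, num_gpus=1):
    # Total iterations needed to see `num_annotations` samples `epochs` times.
    iter_max = int(num_annotations / (num_gpus * bs) * epochs)
    # Evaluate `max_evals` times, evenly spaced over training.
    eval_period = iter_max // max_evals
    # Warm up the learning rate over the first `warmup_prop` of training.
    iter_warmup = int(iter_max * warmup_prop)
    return iter_max, eval_period, iter_warmup


# Tiny subset: 1679 / 12 * 6 = 839.5, truncated to 839 iterations.
tiny = compute_iterations_from_epochs(6, 12, 1679, max_evals=5, warmup_prop=0.2)
print(tiny)  # (839, 167, 167)

# Full training set: 54360 / 12 * 6 = 27180 iterations.
full = compute_iterations_from_epochs(6, 12, 54360, max_evals=5, warmup_prop=0.2)
print(full)  # (27180, 5436, 5436)
```

With these settings the checkpoint period (`eval_period // 2`) comes out to 83 iterations for the tiny subset and 2718 for the full set.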

In [None]:
# Datasets
mode = "tiny" if USE_TINY else "trainval"
imagery_s3_uri = f's3://xview3-blog/data/processing/202207250702/imagery/hdf5/{mode}/'

if USE_CHIPPED:
    imagery_s3_uri = f's3://xview3-blog/data/processing/202207250702/imagery/chipped-scenes/{mode}/xview3_chipped_2560x2560_{mode.replace("val", "")}/'
    val_imagery_s3_uri = f's3://xview3-blog/data/processing/202207250702/imagery/hdf5/{mode}/'
    s3_channel_valid_imagery = TrainingInput(val_imagery_s3_uri, 
                                   distribution='FullyReplicated', 
                                   s3_data_type='S3Prefix',
                                   input_mode='FastFile')
    
shoreline_s3_uri = 's3://xview3-blog/data/shoreline/trainval/'
datasets_s3_uri = 's3://xview3-blog/data/processing/202207250702/detectron2_dataset/'

s3_channel_imagery = TrainingInput(imagery_s3_uri, 
                                   distribution='FullyReplicated', 
                                   s3_data_type='S3Prefix',
                                   input_mode='FastFile')
s3_channel_shoreline = TrainingInput(shoreline_s3_uri, 
                                     distribution='FullyReplicated', 
                                     s3_data_type='S3Prefix', 
                                     input_mode='FastFile')
s3_channel_datasets = TrainingInput(datasets_s3_uri, 
                                    distribution='FullyReplicated', 
                                    s3_data_type='S3Prefix',
                                    input_mode='FastFile')

train_inputs = {'imagery': s3_channel_imagery, 
                'shoreline': s3_channel_shoreline, 
                'datasets': s3_channel_datasets}
if USE_CHIPPED:
    train_inputs['valid_imagery'] = s3_channel_valid_imagery

# Use EFS if local
if LOCAL:
    train_inputs['imagery'] = 'file:///home/ec2-user/SageMaker/xview3-blog/data/imagery/hdf5/tiny/'
    train_inputs['shoreline'] = 'file:///home/ec2-user/SageMaker/xview3-blog/data/shoreline/trainval/'
    train_inputs['datasets'] = 'file:///home/ec2-user/SageMaker/xview3-blog/data/detectron2_datasets/new/'
    

In [None]:
# Alternatives: 'frcnn_R101_FPN_full.yaml', 'frcnn_R101_FPN_full_VH3.yaml'
config_file = 'frcnn_X101_32x8d_FPN_full.yaml'
if USE_CHIPPED:
    config_file = 'frcnn_R101_FPN_chipped_histeq.yaml'

config_params = [f'OUTPUT_DIR {output_dir}',
                 f'TEST.INPUT.SHORELINE_DIR {shoreline_dir}',
                 f'INPUT.DATA.SHORELINE_DIR {shoreline_dir}',
                 f"SOLVER.IMS_PER_BATCH {bs}",
                 f"TEST.EVAL_PERIOD {eval_period}",
                 f"SOLVER.WARMUP_ITERS {warmup_iter}",
                 f"SOLVER.MAX_ITER {max_iter}",
                 f"SOLVER.CHECKPOINT_PERIOD {checkpoint_period}",
                 f"DATALOADER.NUM_WORKERS {instance.num_workers}",
                 "SOLVER.LR_SCHEDULER_NAME WarmupCosineLR",
                 "SOLVER.BASE_LR 0.005",
                ]

training_job_hp = {'config-file': f'/opt/ml/code/configs/{config_file}',
                   'imagery-dir': '/opt/ml/input/data/imagery',
                   'd2-dataset-dir': '/opt/ml/input/data/datasets',
                   'zopts': ' '.join(config_params)}

if USE_CHIPPED:
    training_job_hp['valid-imagery-dir'] = '/opt/ml/input/data/valid_imagery'
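
The config overrides are packed into a single space-separated `zopts` hyperparameter. One plausible way for a training entrypoint to unpack it, sketched below, is to split the string back into a flat `KEY VALUE KEY VALUE` list, which is the shape that Detectron2's `cfg.merge_from_list` accepts. Whether the training script in this repository does exactly this is an assumption, and `zopts_to_opts` is a hypothetical helper.

```python
# A few overrides in the same "KEY VALUE" form used above (values are examples).
config_params = [
    "SOLVER.IMS_PER_BATCH 12",
    "SOLVER.MAX_ITER 839",
    "SOLVER.LR_SCHEDULER_NAME WarmupCosineLR",
]
zopts = " ".join(config_params)


def zopts_to_opts(zopts):
    """Split a space-separated 'KEY VALUE KEY VALUE' string into a flat list."""
    tokens = zopts.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (KEY VALUE pairs)")
    return tokens


opts = zopts_to_opts(zopts)
print(opts)
# With Detectron2 installed, a list like this could be applied with
# cfg.merge_from_list(opts).
```

This encoding only works while neither keys nor values contain spaces, which holds for the overrides used in this notebook.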

In [None]:
config_params

In [None]:
#base_job_name = f"xview3-{'chipped' if USE_CHIPPED else 'full'}-{'tiny' if USE_TINY else 'trainval'}"
base_job_name = f"xview3-{config_file.split('.')[0].replace('_', '-')}"

training_instance = instance.name
num_instances = 1
training_session = sagemaker_session


if training_instance.startswith("local"):
    training_session = sagemaker.LocalSession()
    training_session.config = {"local": {"local_code": True}}
    LOCAL = True

d2_estimator = Estimator(image_uri=training_image_uri,
                         role=role, 
                         sagemaker_session=training_session, 
                         instance_count=num_instances, 
                         instance_type=training_instance, 
                         volume_size=instance.volume,
                         metric_definitions=metrics, 
                         hyperparameters=training_job_hp,
                         base_job_name=base_job_name, 
                         max_retry_attempts=30, 
                         max_run=432000,
                         checkpoint_local_path=None if LOCAL else '/opt/ml/checkpoints/' ,
                         checkpoint_s3_uri=None if LOCAL else 's3://xview3-blog-sagemaker/checkpoints/',
                         disable_profiler=True,
                         debugger_hook_config=False,
                         tags=tags)

d2_estimator.fit(inputs=train_inputs, 
                 wait=USE_TINY, 
                 logs="All")

