# Sample Patchlets from EOPatches

```
#
# Copyright (c) Sinergise, 2019 -- 2021.
#
# This file belongs to subproject "field-delineation" of project NIVA (www.niva4cap.eu).
# All rights reserved.
#
# This source code is licensed under the MIT license found in the LICENSE
# file in the root directory of this source tree.
#
```

This notebook shows how to sample image chips (patchlets) from the larger `EOPatches`. Up to a maximum number of patchlets is sampled at random from each `EOPatch`, depending on the fraction of reference `EXTENT` pixels. A buffer zone from which no patchlets are sampled can also be specified. Only patchlets that contain exclusively valid image data and whose cloud coverage is below a threshold are kept.
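
The sampling loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in `fd.sampling`: the function name `sample_chip_origins` and its acceptance rule are assumptions, the buffer is assumed to run along the patch edges, and the parameter names simply mirror the `SamplingConfig` fields used in this notebook.

```python
import numpy as np


def sample_chip_origins(mask, patch_size, buffer, num_samples,
                        max_retries, fraction_valid, seed=None):
    """Return top-left (row, col) corners of accepted chips.

    Hypothetical helper: `mask` is a 2D array that is non-zero where
    reference labels exist.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = mask.shape
    # Last admissible corner so the chip stays outside the buffer zone.
    max_row = n_rows - buffer - patch_size
    max_col = n_cols - buffer - patch_size
    if max_row < buffer or max_col < buffer:
        return []  # the patch is too small for this buffer and chip size

    origins = []
    for _ in range(num_samples):
        for _ in range(max_retries):
            row = int(rng.integers(buffer, max_row + 1))
            col = int(rng.integers(buffer, max_col + 1))
            chip = mask[row:row + patch_size, col:col + patch_size]
            # Accept the chip only if enough of it is covered by labels.
            if np.count_nonzero(chip) / chip.size >= fraction_valid:
                origins.append((row, col))
                break
    return origins


# Toy example: a 1000 x 1000 mask whose central block is labelled.
mask = np.zeros((1000, 1000), dtype=np.uint8)
mask[200:800, 200:800] = 1
origins = sample_chip_origins(mask, patch_size=256, buffer=50, num_samples=10,
                              max_retries=10, fraction_valid=0.4, seed=0)
```

At most `num_samples` corners come back; fewer are returned when `max_retries` random draws fail to find a chip with enough labelled pixels.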

The same interface can sample both positive patchlets (where reference labels are present) and negative patchlets (where reference labels are absent). Negative samples should be added only when the reference ground-truth masks are of high quality, so that a lack of reference data means there are actually no target labels in the area. This is often not the case, and it is then preferable to use positive samples only, even though this leads to a larger number of false positives.
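
One way to picture the difference between the two modes is as a different acceptance test applied to the same mask chip. The sketch below is an assumption about the logic, not the `fd.sampling` implementation: `chip_qualifies` is a hypothetical name, and because this notebook does not say how `fraction_valid` applies to negative samples, the negative branch simply requires a chip with no labelled pixels.

```python
import numpy as np


def chip_qualifies(mask_chip, sample_positive, fraction_valid=0.4):
    """Hypothetical acceptance test for one mask chip."""
    labelled_fraction = np.count_nonzero(mask_chip) / mask_chip.size
    if sample_positive:
        # Positive sample: enough reference labels inside the chip.
        return labelled_fraction >= fraction_valid
    # Negative sample (assumed rule): no reference labels at all.
    return labelled_fraction == 0.0


labelled = np.ones((256, 256), dtype=np.uint8)
empty = np.zeros((256, 256), dtype=np.uint8)
```

Under this rule an empty chip counts as a negative sample, which is only trustworthy when missing labels really mean that no target is present.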

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from functools import partial
from concurrent.futures import ProcessPoolExecutor

from tqdm.auto import tqdm

from fd.sampling import sample_patch, SamplingConfig, prepare_eopatches_paths
from fd.utils import multiprocess

# Define configs

In [3]:
positive_examples_config = SamplingConfig(
    bucket_name='',
    aws_access_key_id='',
    aws_secret_access_key='',
    aws_region='',
    eopatches_location='data/Castilla/2020-04/eopatches',
    output_path='data/Castilla/2020-04/patchlets',
    sample_positive=True,
    mask_feature_name='EXTENT',
    buffer=50,
    patch_size=256,
    num_samples=10,
    max_retries=10,
    fraction_valid=0.4,
    sampled_feature_name='BANDS',
    cloud_coverage=0.05)
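
The `cloud_coverage=0.05` setting, together with the requirement that chips contain only valid data, can be read as a per-timestamp filter on each sampled chip. The following is a rough sketch of that idea under assumed array layouts (`is_data` and `cloud_mask` as boolean arrays of shape `(time, height, width)`); the function name `valid_timestamps` is hypothetical and the actual check in `fd.sampling` may differ.

```python
import numpy as np


def valid_timestamps(is_data, cloud_mask, cloud_coverage=0.05):
    """Indices of timestamps that are fully valid and nearly cloud-free."""
    # Fraction of cloudy pixels per timestamp.
    cloud_fraction = cloud_mask.reshape(cloud_mask.shape[0], -1).mean(axis=1)
    # A timestamp is usable only if every pixel holds valid data.
    fully_valid = is_data.reshape(is_data.shape[0], -1).all(axis=1)
    return np.flatnonzero(fully_valid & (cloud_fraction < cloud_coverage))


# Three timestamps of a 256 x 256 chip: clear, cloudy, and partly missing.
is_data = np.ones((3, 256, 256), dtype=bool)
cloud_mask = np.zeros((3, 256, 256), dtype=bool)
cloud_mask[1, :64, :] = True   # 25 % cloud cover at timestamp 1
is_data[2, 0, 0] = False       # one invalid pixel at timestamp 2
kept = valid_timestamps(is_data, cloud_mask)
```

In this toy example only the first timestamp passes both conditions.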

In [4]:
negatives_examples_config = SamplingConfig(
    bucket_name='bucket-name',
    aws_access_key_id='',
    aws_secret_access_key='',
    aws_region='eu-central-1',
    eopatches_location='data/Castilla/2020/eopatches/with-overlap/',
    output_path='data/Castilla/2020-04/patchlets-neg',
    sample_positive=False,
    grid_definition_file='../../input-data/cyl-grid-definition.gpkg',
    area_geometry_file='../../input-data/cyl-province-border.geojson',
    fraction_valid=0.1,
    mask_feature_name='EXTENT',
    patch_size=256,
    num_samples=10,
    max_retries=10,
    sampled_feature_name='BANDS',
    cloud_coverage=0.05)

# Positive examples sampling

In [5]:
eopatches_paths = prepare_eopatches_paths(positive_examples_config)

In [6]:
len(eopatches_paths)

1083

In [7]:
process_fn = partial(sample_patch, sampling_config=positive_examples_config)

In [None]:
_ = multiprocess(process_fun=process_fn, arguments=eopatches_paths, max_workers=24)
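
For readers without access to `fd.utils`, a helper of this kind can be approximated with the standard library. The sketch below is an assumption about what such a helper does (map a function over the paths in a pool of workers), not a copy of `fd.utils.multiprocess`; `run_parallel` and its `executor_cls` parameter are hypothetical names, and the progress bar that `tqdm` would provide is left out.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_parallel(process_fun, arguments, max_workers=4,
                 executor_cls=ProcessPoolExecutor):
    """Apply `process_fun` to every item and return results in input order."""
    arguments = list(arguments)
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map preserves the order of `arguments`.
        return list(executor.map(process_fun, arguments))


# Threads are used here only to keep the toy example lightweight.
lengths = run_parallel(len, ['a', 'bb', 'ccc'], max_workers=2,
                       executor_cls=ThreadPoolExecutor)
```

With the default `ProcessPoolExecutor`, `process_fun` must be picklable. A `functools.partial` around a module-level function, as used above, satisfies this; a lambda would not.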

# Negative examples sampling

Negative samples are not used in this case because the reference data is quite noisy. To add negative samples, uncomment and run the following cells.

In [None]:
# eopatches_paths_neg = prepare_eopatches_paths(negatives_examples_config)

In [None]:
# process_fn_neg = partial(sample_patch, sampling_config=negatives_examples_config)
# _ = multiprocess(process_fun=process_fn_neg, arguments=eopatches_paths_neg, max_workers=4)