Exercise 3 - Matching Imagery
---

<a style="display: inline-block;" href="https://mybinder.org/v2/gh/RadiantMLHub/ml4eo-bootcamp-2021/main?filepath=Lecture%202%2Fexercises%2F3_matching_imagery.ipynb"><img src="https://mybinder.org/badge_logo.svg" alt="Launch in Binder"/></a>

For this exercise we will match our input data with Sentinel-2 imagery that covers the same spatial and temporal extent. We will use the same temporal range that we explored in the first exercise, but you can modify it to whatever you like. The next cell shows that all of our input data falls within a single Sentinel-2 tile (34HCH).

In [1]:
import fiona
from shapely.geometry import shape, GeometryCollection, box
import arrow
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import os


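# Collect every crop geometry (buffer(0) is a common way to repair invalid polygons),
# then reduce the collection to its bounding box for a cheap intersection test.
with fiona.open('../data/south_africa_crops/south_africa_crops.shp') as input_data:
    crop_collection = GeometryCollection([shape(crop['geometry']).buffer(0) for crop in input_data])
    b = crop_collection.bounds
    crop_collection = box(b[0], b[1], b[2], b[3])
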
date_range = [arrow.get('2017-05-15', 'YYYY-MM-DD'), arrow.get('2018-03-10', 'YYYY-MM-DD')]
tiles = []

with fiona.open('../data/sentinel_2_tiles.geojson') as sentinel_2_tiles:
    for tile in sentinel_2_tiles:
        tile_name = tile['properties']['Name']
        tile_shape = shape(tile['geometry'])
        
        if tile_shape.intersects(crop_collection):
            tiles.append(tile_name)

In [2]:
tiles

['34HCH']
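
The spatial match above reduces the crop data to a bounding box and tests it against each tile footprint. As a rough illustration of that idea, here is a minimal sketch that assumes both shapes are simplified to axis-aligned rectangles in the same `(minx, miny, maxx, maxy)` order that `bounds` returns. The helper `bbox_intersects` and the coordinates are hypothetical; real tile footprints are polygons, which is why the notebook uses `intersects` from shapely instead.

```python
def bbox_intersects(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    # Two boxes intersect when their extents overlap on both axes.
    return (a_minx <= b_maxx and b_minx <= a_maxx and
            a_miny <= b_maxy and b_miny <= a_maxy)


# Illustrative coordinates only (not the real extents of the crop data or of tile 34HCH)
crop_bbox = (18.5, -33.5, 19.0, -33.0)
overlapping_tile = (18.0, -34.0, 19.2, -32.9)
distant_tile = (25.0, -30.0, 26.0, -29.0)

print(bbox_intersects(crop_bbox, overlapping_tile))
print(bbox_intersects(crop_bbox, distant_tile))
```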

Searching for Imagery
---

These next few cells find all of the Sentinel-2 scenes that fall within our temporal and spatial ranges; 58 scenes match our requirements. The S3 bucket we pull imagery from does not require authentication, but other buckets, such as the official [Sentinel-2 bucket](https://registry.opendata.aws/sentinel-2/), require authentication and are requester pays, meaning you will be billed for the data transfer costs.

In [3]:
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

def get_matching_s3_keys(bucket, prefix='', suffix=''):
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        resp = s3.list_objects_v2(**kwargs)
        # 'Contents' is missing from the response when no objects match the prefix
        for obj in resp.get('Contents', []):
            key = obj['Key']
            if key.endswith(suffix):
                yield key

        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break
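
The loop above follows continuation tokens by hand. boto3 also provides paginators that handle the tokens for us, so an alternative could look like the sketch below. The function name `iter_matching_keys` is hypothetical, and the client is passed in as an argument so the function makes no assumption about how it was configured.

```python
def iter_matching_keys(client, bucket, prefix='', suffix=''):
    """Yield keys under `prefix` that end with `suffix`, one page at a time."""
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no objects has no 'Contents' entry, so default to an empty list.
        for obj in page.get('Contents', []):
            if obj['Key'].endswith(suffix):
                yield obj['Key']
```

With the unsigned client created in this cell, it would be called as `iter_matching_keys(s3, 'sentinel-cogs', prefix=...)`.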

In [4]:
matching_scenes = []

for tile in tiles:
    keys = list(get_matching_s3_keys('sentinel-cogs', prefix=f'sentinel-s2-l2a-cogs/{tile[0:2]}/{tile[2]}/{tile[3:5]}/'))
    for key in keys:
        scene_id = key.split('/')[-2]
        scene_date = arrow.get(scene_id.split('_')[2], 'YYYYMMDD')
        if scene_id not in matching_scenes and date_range[0] <= scene_date <= date_range[1]:
            matching_scenes.append(scene_id)

In [5]:
print(f'Found {len(matching_scenes)} matching scenes')
print(matching_scenes)

Found 58 matching scenes
['S2A_34HCH_20171008_0_L2A', 'S2A_34HCH_20171018_0_L2A', 'S2A_34HCH_20171028_0_L2A', 'S2B_34HCH_20171003_0_L2A', 'S2B_34HCH_20171013_0_L2A', 'S2B_34HCH_20171023_0_L2A', 'S2A_34HCH_20171107_0_L2A', 'S2A_34HCH_20171117_0_L2A', 'S2A_34HCH_20171127_0_L2A', 'S2A_34HCH_20171127_1_L2A', 'S2B_34HCH_20171102_0_L2A', 'S2B_34HCH_20171112_0_L2A', 'S2B_34HCH_20171112_1_L2A', 'S2B_34HCH_20171112_2_L2A', 'S2B_34HCH_20171122_0_L2A', 'S2A_34HCH_20171207_0_L2A', 'S2A_34HCH_20171217_0_L2A', 'S2A_34HCH_20171227_0_L2A', 'S2B_34HCH_20171202_0_L2A', 'S2B_34HCH_20171212_0_L2A', 'S2B_34HCH_20171222_0_L2A', 'S2A_34HCH_20170521_0_L2A', 'S2A_34HCH_20170531_0_L2A', 'S2A_34HCH_20170610_0_L2A', 'S2A_34HCH_20170610_1_L2A', 'S2A_34HCH_20170620_0_L2A', 'S2A_34HCH_20170630_0_L2A', 'S2A_34HCH_20170710_0_L2A', 'S2A_34HCH_20170720_0_L2A', 'S2A_34HCH_20170730_0_L2A', 'S2B_34HCH_20170705_0_L2A', 'S2B_34HCH_20170705_1_L2A', 'S2B_34HCH_20170715_0_L2A', 'S2B_34HCH_20170715_1_L2A', 'S2B_34HCH_20170725_0_
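
The filtering in this section relies on the structure of the scene IDs. Here is a minimal sketch of that parsing using only the standard library. The field names (`platform`, `tile`, `date`, `sequence`, `level`) are our own reading of the IDs printed above, not an official specification, and `parse_scene_id` and `in_date_range` are hypothetical helpers.

```python
from datetime import date, datetime


def parse_scene_id(scene_id):
    """Split an ID such as 'S2A_34HCH_20171008_0_L2A' into its parts."""
    platform, tile, acquired, sequence, level = scene_id.split('_')
    return {
        'platform': platform,
        'tile': tile,
        'date': datetime.strptime(acquired, '%Y%m%d').date(),
        'sequence': int(sequence),
        'level': level,
    }


def in_date_range(scene_id, start, end):
    """Return True if the scene date falls within [start, end], inclusive."""
    return start <= parse_scene_id(scene_id)['date'] <= end


# Same temporal range as date_range in the first cell
start, end = date(2017, 5, 15), date(2018, 3, 10)
print(parse_scene_id('S2A_34HCH_20171008_0_L2A'))
print(in_date_range('S2A_34HCH_20171008_0_L2A', start, end))
```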

Listing Available Assets
---

The code below lists the assets available for download in the first scene. The listing includes a True Color composite image (TCI.tif), which is the file we will download.

In [6]:
import json
for scene in matching_scenes:
    tile = scene.split('_')[1]
    date = scene.split('_')[2]
    year = date[0:4]
    month = date[4:6]
    # The month component of the key has no leading zero, hence int(month)
    prefix = f'sentinel-s2-l2a-cogs/{tile[0:2]}/{tile[2]}/{tile[3:5]}/{year}/{int(month)}/{scene}'
    keys = [ k.split('/')[-1] for k in list(get_matching_s3_keys('sentinel-cogs', prefix=prefix)) ]
    
    print(json.dumps(keys, indent=4))
    break

[
    "AOT.tif",
    "B01.tif",
    "B02.tif",
    "B03.tif",
    "B04.tif",
    "B05.tif",
    "B06.tif",
    "B07.tif",
    "B08.tif",
    "B09.tif",
    "B11.tif",
    "B12.tif",
    "B8A.tif",
    "L2A_PVI.tif",
    "S2A_34HCH_20171008_0_L2A.json",
    "SCL.tif",
    "TCI.tif",
    "WVP.tif"
]
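
The prefix used in this cell is assembled from the tile name (UTM zone, latitude band, grid square) and the acquisition date. The sketch below pulls that logic into a hypothetical helper, `scene_prefix`, so the layout is easier to see. It adds a trailing slash, which the cell above does not, to keep the prefix from matching any other key that merely starts with the same scene ID.

```python
BASE_PREFIX = 'sentinel-s2-l2a-cogs'


def scene_prefix(scene_id):
    """Build the key prefix for a scene ID such as 'S2A_34HCH_20171008_0_L2A'."""
    _, tile, acquired = scene_id.split('_')[:3]
    zone, band, square = tile[0:2], tile[2], tile[3:5]
    # int() drops the leading zero from the month, matching the key layout used above
    year, month = acquired[0:4], int(acquired[4:6])
    return f'{BASE_PREFIX}/{zone}/{band}/{square}/{year}/{month}/{scene_id}/'


print(scene_prefix('S2A_34HCH_20171008_0_L2A'))
print(scene_prefix('S2A_34HCH_20170521_0_L2A'))
```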


Downloading Imagery
---

For this example we will download only the first matching scene, but in a real use case you would download all of the scenes. We are also downloading just the True Color composite image for this first scene; you can modify this code to download all of the individual bands.

In [7]:
for scene in matching_scenes:
    print(scene)
    tile = scene.split('_')[1]
    date = scene.split('_')[2]
    year = date[0:4]
    month = date[4:6]
    
    os.makedirs(f'../data/imagery/{scene}', exist_ok=True)
    
    files = ['TCI.tif']
    
    for f in files:
        key = f'sentinel-s2-l2a-cogs/{tile[0:2]}/{tile[2]}/{tile[3:5]}/{year}/{int(month)}/{scene}/{f}'
    
        fname = key.split('/')[-1]
        s3.download_file('sentinel-cogs', key, f'../data/imagery/{scene}/{fname}')
    break # Remove this line to download all of the scenes

S2A_34HCH_20171008_0_L2A
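
To download individual bands instead of the True Color image, the `files` list can be extended with names from the asset listing. The sketch below only builds the list of (S3 key, local path) pairs for one scene and does not touch the network. `download_plan` is a hypothetical helper, and the choice of bands is just an example.

```python
def download_plan(scene_id, files, out_dir='../data/imagery'):
    """Return (s3_key, local_path) pairs for the requested files of one scene."""
    tile, acquired = scene_id.split('_')[1:3]
    year, month = acquired[0:4], int(acquired[4:6])
    prefix = (f'sentinel-s2-l2a-cogs/{tile[0:2]}/{tile[2]}/{tile[3:5]}/'
              f'{year}/{month}/{scene_id}')
    return [(f'{prefix}/{name}', f'{out_dir}/{scene_id}/{name}') for name in files]


# Example selection: blue, green, red and near-infrared bands from the listing
bands = ['B02.tif', 'B03.tif', 'B04.tif', 'B08.tif']
plan = download_plan('S2A_34HCH_20171008_0_L2A', bands)
for key, path in plan:
    print(key, '->', path)
```

Each pair can then be passed to `s3.download_file('sentinel-cogs', key, path)` once the scene directory exists.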
