<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# CV4A ICLR Crop Type Classification Challenge
# A Guide to Accessing the Data on Radiant MLHub


This notebook walks you through getting access to Radiant MLHub and downloading the data for the crop type classification competition organized as part of the [CV4A](https://www.cv4gc.org/cv4a2020/) workshop at ICLR 2020.

### Radiant MLHub API


The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

Full documentation for the API is available at [docs.mlhub.earth](https://docs.mlhub.earth).

Each item in a collection is described in JSON that complies with the [STAC](https://stacspec.org/) [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label) definition.
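To make the item structure concrete, here is a minimal sketch of how such an item can be inspected once it has been parsed into a Python dictionary. The `example_item` below is hand-written and heavily trimmed: the `href` values and the `datetime` are placeholders, not real API output, and `list_asset_keys` is a hypothetical helper introduced only for this illustration.

```python
# A hand-written, trimmed stand-in for a STAC item (placeholder values, not real API output).
example_item = {
    "type": "Feature",
    "id": "ref_african_crops_kenya_02_tile_00_label",
    "properties": {"datetime": "2019-01-01T00:00:00Z"},  # placeholder date
    "assets": {
        "labels": {"href": "https://example.com/labels"},        # placeholder link
        "field_ids": {"href": "https://example.com/field_ids"},  # placeholder link
    },
}

def list_asset_keys(item):
    """Return the sorted asset keys of a STAC item dictionary (hypothetical helper)."""
    return sorted(item.get("assets", {}).keys())

print(list_asset_keys(example_item))
```

Each key under `assets` names one downloadable file, and its `href` is the link that the download steps later in this notebook request.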

In [1]:
# Required libraries
import requests
from urllib.parse import urlparse
from pathlib import Path
from datetime import datetime



In [2]:
# output path where you want to download the data
output_path = Path("data/")

## Authentication

Access to the Radiant MLHub API requires an access token. To get your access token, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. Under **Usage**, you'll see your access token, which you will need. *Do not share* your access token with others: your usage may be limited and sharing your access token is a security risk.

Copy the access token and paste it into the cell below. The resulting `headers` dictionary is sent with every API call.

In [3]:
# copy your access token from dashboard.mlhub.earth and paste it in place of the placeholder below
# (never commit or publish a real token)
ACCESS_TOKEN = 'PASTE_YOUR_ACCESS_TOKEN_HERE'

# these headers will be used in each request
headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}',
    'Accept':'application/json'
}
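Because the token should not be shared, one option is to keep it out of the notebook entirely and read it from an environment variable. The following is a minimal sketch under that assumption; the variable name `MLHUB_ACCESS_TOKEN` and the helper `build_headers` are hypothetical and not part of the Radiant MLHub API.

```python
import os

def build_headers(token=None, env_var="MLHUB_ACCESS_TOKEN"):
    """Build the request headers from an explicit token or an environment variable.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = token or os.environ.get(env_var)
    if not token:
        raise ValueError(f"No access token given and {env_var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
```

With the variable set in your shell before starting Jupyter, `headers = build_headers()` produces the same dictionary as the cell above without the token ever appearing in the saved notebook.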

## Retrieving the competition dataset

Datasets are stored as collections in the Radiant MLHub catalog. A collection represents the top-most data level. Typically this means the data comes from the same source for the same geography. It might include different years or sub-geographies.

The two collections for this competition are:
- `ref_african_crops_kenya_02_source`: includes the multi-temporal bands of Sentinel-2
- `ref_african_crops_kenya_02_labels`: includes the labels and field IDs
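As a small sketch, the two collection IDs can be kept in one place and turned into the items endpoint that the rest of this notebook queries. The names `API_ROOT`, `COLLECTIONS`, and `items_url` are hypothetical conveniences; the URL pattern itself is the one used in the request cells of this notebook.

```python
# Base URL as used in the request cells of this notebook.
API_ROOT = "https://api.radiant.earth/mlhub/v1"

# The two competition collections, keyed by a short (hypothetical) nickname.
COLLECTIONS = {
    "source": "ref_african_crops_kenya_02_source",
    "labels": "ref_african_crops_kenya_02_labels",
}

def items_url(collection_id, api_root=API_ROOT):
    """Return the URL that lists the items of a collection."""
    return f"{api_root}/collections/{collection_id}/items"

print(items_url(COLLECTIONS["labels"]))
```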

In [4]:
def get_download_url(item, asset_key, headers):
    """Return the redirect URL for an asset of an item, or None if it is unavailable."""
    asset = item.get('assets', {}).get(asset_key, None)
    if asset is None:
        print(f'Asset "{asset_key}" does not exist in this item')
        return None
    # read the redirect target from the response instead of following it
    r = requests.get(asset.get('href'), headers=headers, allow_redirects=False)
    return r.headers.get('Location')

def download_file(url, outpath):
    """Stream the file at `url` into the directory `outpath`."""
    if url is None:
        print('No download URL available, skipping')
        return
    filename = urlparse(url).path.split('/')[-1]
    outpath.mkdir(parents=True, exist_ok=True)

    # stream the response so large rasters are not held in memory,
    # and close both the connection and the file even if an error occurs
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(outpath/filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=512 * 1024):
                if chunk:
                    f.write(chunk)
    print(f'Downloaded {filename}')

def download_label(url, output_path, tileid):
    """Save a label raster under <output_path>/<tileid>/."""
    download_file(url, output_path/tileid)

def download_imagery(url, output_path, tileid, date):
    """Save an imagery band under <output_path>/<tileid>/<date>/."""
    download_file(url, output_path/tileid/date)

### Downloading Labels


The `assets` property of each item in a collection lists all the assets associated with that item, along with links to download them. The labels for an item are always stored in the asset with the key `labels`. The following code goes through every item in the collection and downloads the `labels` and `field_ids` rasters.

In [5]:
# paste the id of the labels collection:
collectionId = 'ref_african_crops_kenya_02_labels'

# these optional parameters can be used to control what items are returned. 
# Here, we want to download all the items so:
limit = 100 
bounding_box = []
date_time = []

# retrieve the items and their metadata in the collection
r = requests.get(f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items', params={'limit':limit, 'bbox':bounding_box,'datetime':date_time}, headers=headers)
r.raise_for_status()  # fail early, e.g. if the access token is missing or expired
collection = r.json()
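The request above returns at most `limit` items. If a collection held more items than that, the remaining ones would need further requests. The sketch below assumes the API follows the common STAC API convention of including a link with `rel` set to `next` in each response page; whether Radiant MLHub does so for these collections is an assumption to verify against the API documentation. `iter_items` and `fetch_page` are hypothetical names.

```python
def iter_items(first_page, fetch_page):
    """Yield every feature, following rel="next" links until none remain.

    first_page: the parsed JSON of the first response.
    fetch_page: a callable that takes a URL and returns the parsed JSON of that page.
    """
    page = first_page
    while page is not None:
        yield from page.get("features", [])
        # look for a link to the next page (assumed STAC API convention)
        next_href = next(
            (link.get("href") for link in page.get("links", [])
             if link.get("rel") == "next"),
            None,
        )
        page = fetch_page(next_href) if next_href else None
```

In this notebook it could be used as `iter_items(collection, lambda url: requests.get(url, headers=headers).json())`. Passing the fetch step in as a function keeps the paging logic separate from the network code.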

In [6]:
# retrieve list of features (in this case tiles) in the collection
for feature in collection.get('features', []):
    assets = feature.get('assets').keys()
    print("Feature", feature.get('id'), 'with the following assets', list(assets))

Feature ref_african_crops_kenya_02_tile_02_label with the following assets ['field_train_test_ids', 'field_ids', 'labels']
Feature ref_african_crops_kenya_02_tile_03_label with the following assets ['field_train_test_ids', 'field_ids', 'labels']
Feature ref_african_crops_kenya_02_tile_01_label with the following assets ['field_train_test_ids', 'field_ids', 'labels']
Feature ref_african_crops_kenya_02_tile_00_label with the following assets ['field_train_test_ids', 'field_ids', 'labels']


In [7]:
for feature in collection.get('features', []):
    
    tileid = feature.get('id').split('tile_')[-1][:2]

    # download labels
    download_url = get_download_url(feature, 'labels', headers)
    download_label(download_url, output_path, tileid)
    
    #download field_ids
    download_url = get_download_url(feature, 'field_ids', headers)
    download_label(download_url, output_path, tileid)

Downloaded 2_label.tif
Downloaded 2_field_id.tif
Downloaded 3_label.tif
Downloaded 3_field_id.tif
Downloaded 1_label.tif
Downloaded 1_field_id.tif
Downloaded 0_label.tif
Downloaded 0_field_id.tif
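The download loop derives the tile ID by slicing the item ID after `tile_`, which relies on the ID always being two characters long. A slightly more defensive alternative, sketched here with a hypothetical helper `parse_tile_id`, uses a regular expression and fails loudly when the ID does not match the expected pattern.

```python
import re

def parse_tile_id(item_id):
    """Extract the digits that follow 'tile_' in an item ID (hypothetical helper)."""
    match = re.search(r"tile_(\d+)", item_id)
    if match is None:
        raise ValueError(f"No tile ID found in {item_id!r}")
    return match.group(1)

item_id = "ref_african_crops_kenya_02_tile_02_label"
# Same result as the slicing used in the download loop.
print(parse_tile_id(item_id), item_id.split("tile_")[-1][:2])
```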


### Downloading Imagery

As with the `labels` and `field_ids` rasters, the imagery is stored in a collection, and the `assets` property of each item provides the links for downloading it.

In [8]:
# paste the id of the imagery collection:
collectionId = 'ref_african_crops_kenya_02_source'

# these optional parameters can be used to control what items are returned. 
# Here, we want to download all the items so:
limit = 100 
bounding_box = []
date_time = []

# retrieve the items and their metadata in the collection
r = requests.get(f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items', params={'limit':limit, 'bbox':bounding_box,'datetime':date_time}, headers=headers)
r.raise_for_status()  # fail early, e.g. if the access token is missing or expired
collection = r.json()

In [9]:
# List assets of the items
for feature in collection.get('features', []):
    assets = feature.get('assets').keys()
    print(list(assets))
    break  # all features have the same asset types, so inspecting the first one is enough

['B11', 'B01', 'B12', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B8A', 'CLD']
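If you only need some of these assets, you can filter the keys before downloading. The sketch below uses a hypothetical helper `select_assets`; the choice of `B02`, `B03`, `B04`, and `B08` (the Sentinel-2 blue, green, red, and near-infrared bands) is only an example, not a recommendation from the competition organizers.

```python
def select_assets(item, wanted):
    """Return the asset keys from `wanted` that the item actually has, in the given order."""
    available = item.get("assets", {})
    return [key for key in wanted if key in available]

# Example selection: visible and near-infrared Sentinel-2 bands.
WANTED_BANDS = ["B02", "B03", "B04", "B08"]

# A stand-in item with only the keys that matter for this illustration.
demo_item = {"assets": {"B02": {}, "B03": {}, "B04": {}, "B11": {}, "CLD": {}}}
print(select_assets(demo_item, WANTED_BANDS))
```

Requested keys that an item does not provide are skipped silently, so the same list can be applied to every item in the collection.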


In [10]:
# This cell downloads all the multi-spectral images throughout the growing season for this competition.
# The size of data is about 1.5 GB, and download time depends on your internet connection. 
# Note that you only need to run this cell and download the data once.

for feature in collection.get('features', []):
    assets = feature.get('assets').keys()
    tileid = feature.get('id').split('tile_')[-1][:2]
    date = datetime.strftime(datetime.strptime(feature.get('properties')['datetime'], "%Y-%m-%dT%H:%M:%SZ"), "%Y%m%d")
    for asset in assets:
        download_url = get_download_url(feature, asset, headers)
        download_imagery(download_url, output_path, tileid, date)

Downloaded 0_B11_20191103.tif
Downloaded 0_B01_20191103.tif
Downloaded 0_B12_20191103.tif
Downloaded 0_B02_20191103.tif
Downloaded 0_B03_20191103.tif
Downloaded 0_B04_20191103.tif
Downloaded 0_B05_20191103.tif
Downloaded 0_B06_20191103.tif
Downloaded 0_B07_20191103.tif
Downloaded 0_B08_20191103.tif
Downloaded 0_B09_20191103.tif
Downloaded 0_B8A_20191103.tif
Downloaded 0_CLD_20191103.tif
Downloaded 1_B11_20191103.tif
Downloaded 1_B01_20191103.tif
Downloaded 1_B12_20191103.tif
Downloaded 1_B02_20191103.tif
Downloaded 1_B03_20191103.tif
Downloaded 1_B04_20191103.tif
Downloaded 1_B05_20191103.tif
Downloaded 1_B06_20191103.tif
Downloaded 1_B07_20191103.tif
Downloaded 1_B08_20191103.tif
Downloaded 1_B09_20191103.tif
Downloaded 1_B8A_20191103.tif
Downloaded 1_CLD_20191103.tif
Downloaded 3_B11_20191103.tif
Downloaded 3_B01_20191103.tif
Downloaded 3_B12_20191103.tif
Downloaded 3_B02_20191103.tif
Downloaded 3_B03_20191103.tif
Downloaded 3_B04_20191103.tif
Downloaded 3_B05_20191103.tif
...

Downloaded 3_B01_20190825.tif
Downloaded 3_B12_20190825.tif
Downloaded 3_B02_20190825.tif
Downloaded 3_B03_20190825.tif
Downloaded 3_B04_20190825.tif
Downloaded 3_B05_20190825.tif
Downloaded 3_B06_20190825.tif
Downloaded 3_B07_20190825.tif
Downloaded 3_B08_20190825.tif
Downloaded 3_B09_20190825.tif
Downloaded 3_B8A_20190825.tif
Downloaded 3_CLD_20190825.tif
Downloaded 0_B11_20190825.tif
Downloaded 0_B01_20190825.tif
Downloaded 0_B12_20190825.tif
Downloaded 0_B02_20190825.tif
Downloaded 0_B03_20190825.tif
Downloaded 0_B04_20190825.tif
Downloaded 0_B05_20190825.tif
Downloaded 0_B06_20190825.tif
Downloaded 0_B07_20190825.tif
Downloaded 0_B08_20190825.tif
Downloaded 0_B09_20190825.tif
Downloaded 0_B8A_20190825.tif
Downloaded 0_CLD_20190825.tif
Downloaded 2_B11_20190825.tif
Downloaded 2_B01_20190825.tif
Downloaded 2_B12_20190825.tif
Downloaded 2_B02_20190825.tif
Downloaded 2_B03_20190825.tif
Downloaded 2_B04_20190825.tif
Downloaded 2_B05_20190825.tif
Downloaded 2_B06_20190825.tif
...

Downloaded 1_B12_20190706.tif
Downloaded 1_B02_20190706.tif
Downloaded 1_B03_20190706.tif
Downloaded 1_B04_20190706.tif
Downloaded 1_B05_20190706.tif
Downloaded 1_B06_20190706.tif
Downloaded 1_B07_20190706.tif
Downloaded 1_B08_20190706.tif
Downloaded 1_B09_20190706.tif
Downloaded 1_B8A_20190706.tif
Downloaded 1_CLD_20190706.tif
Downloaded 2_B11_20190706.tif
Downloaded 2_B01_20190706.tif
Downloaded 2_B12_20190706.tif
Downloaded 2_B02_20190706.tif
Downloaded 2_B03_20190706.tif
Downloaded 2_B04_20190706.tif
Downloaded 2_B05_20190706.tif
Downloaded 2_B06_20190706.tif
Downloaded 2_B07_20190706.tif
Downloaded 2_B08_20190706.tif
Downloaded 2_B09_20190706.tif
Downloaded 2_B8A_20190706.tif
Downloaded 2_CLD_20190706.tif
Downloaded 1_B11_20190701.tif
Downloaded 1_B01_20190701.tif
Downloaded 1_B12_20190701.tif
Downloaded 1_B02_20190701.tif
Downloaded 1_B03_20190701.tif
Downloaded 1_B04_20190701.tif
Downloaded 1_B05_20190701.tif
Downloaded 1_B06_20190701.tif
Downloaded 1_B07_20190701.tif
...
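The imagery loop stores each file under `<output_path>/<tileid>/<date>/`, where the date is the item's `datetime` reformatted as `YYYYMMDD`. The following sketch reproduces that path logic in one hypothetical helper, `imagery_path`, which can be handy for checking whether a file already exists before downloading it again. The item ID in the example is made up for illustration; only its `tile_` part matters here.

```python
from datetime import datetime
from pathlib import Path

def imagery_path(output_path, item_id, item_datetime, filename):
    """Return the local path at which the imagery loop would store a file."""
    tileid = item_id.split("tile_")[-1][:2]
    date = datetime.strptime(item_datetime, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y%m%d")
    return Path(output_path) / tileid / date / filename

# Hypothetical item ID and datetime, used only to show the resulting layout.
path = imagery_path("data", "example_tile_00_item", "2019-11-03T00:00:00Z", "0_B11_20191103.tif")
print(path, path.exists())
```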