<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

How to use the Radiant MLHub API
=====

The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at [docs.mlhub.earth](http://docs.mlhub.earth).

We'll show you how to set up your authorization, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections. 

Each item in a collection is described in JSON that complies with the [STAC](https://stacspec.org/) [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label) definition.

Authentication
-----

Access to the Radiant MLHub API requires an access token. To get one, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, sign up and create a new account; otherwise, sign in. Your access token is shown under **Usage**. *Do not share* your access token with others: your usage may be limited, and sharing the token is a security risk.

Copy the access token and paste it into the code cell below in place of `PASTE_YOUR_ACCESS_TOKEN_HERE`. The resulting `headers` dictionary is reused for every API call in this notebook.

Click **Run** or press `SHIFT` + `ENTER` before moving on to run this first piece of code.

In [1]:
# only the requests module is required to access the API
import requests

# copy your access token from dashboard.mlhub.earth and paste it in the following
ACCESS_TOKEN = 'PASTE_YOUR_ACCESS_TOKEN_HERE'

# these headers will be used in each request
headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}',
    'Accept':'application/json'
}
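
If you would rather not paste the token into a notebook that you might share, one option is to read it from an environment variable. The sketch below assumes a variable named `MLHUB_ACCESS_TOKEN`; that name and the helper `build_headers` are hypothetical choices for this example, not something the API defines.

```python
import os

def build_headers(env_var='MLHUB_ACCESS_TOKEN'):
    # MLHUB_ACCESS_TOKEN is a hypothetical variable name, not defined by the API
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f'Set the {env_var} environment variable to your access token')
    # same header layout as the cell above
    return {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json'
    }
```

With the variable set before the notebook starts, `headers = build_headers()` would replace the hard-coded token.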

Search for data collections
-----

To see what training data is available, list the collections available through the API.

A collection represents the top-most data level. Typically this means the data comes from the same source for the same geography. It might include different years or sub-geographies.

To find data with specific parameters, see the [API documentation](http://docs.mlhub.earth/?python#the-feature-collections-in-the-dataset).

To see the list, simply run the following cell. The returned list shows the collection id values.

In [2]:
# get list of all collections
r = requests.get('https://api.radiant.earth/mlhub/v1/collections', headers=headers)
h = r.json()
collections = h['collections']

# print the list of collections 
for c in collections:
    print(c['id'])

ref-african-crops-tanzania-01
ref-african-crops-uganda-01
ref-african-crops-kenya-01


Retrieve properties of a collection
----

Once you have found the collection that you want to access, you can get its properties from the API.

You can limit what data you get in the response using the optional parameters:
* **Limit** limits how many items will be returned, with a minimum of 1 and maximum of 10000.
* **Bounding box** limits the returned items to a specific geographic area. 
* **Date time** limits the returned items to those that fall within a specific time-frame.

See the [get features](http://docs.mlhub.earth/#getfeatures) API documentation for more information.
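
As a rough illustration of the optional parameters above, the helper below builds the query parameters and leaves out any that are unset. It assumes the STAC API conventions, where `bbox` is a comma-separated `min_lon,min_lat,max_lon,max_lat` string and `datetime` is a `start/end` interval; check the API documentation to confirm the formats this API expects. The name `build_item_params` and the sample coordinates and dates are made up for this example.

```python
def build_item_params(limit=None, bounding_box=None, date_time=None):
    """Build a params dict, skipping any option that was not provided."""
    params = {}
    if limit is not None:
        # the text above gives 1 and 10000 as the allowed range
        if not 1 <= limit <= 10000:
            raise ValueError('limit must be between 1 and 10000')
        params['limit'] = limit
    if bounding_box:
        if len(bounding_box) != 4:
            raise ValueError('bounding_box needs four values')
        # assumed format: min_lon,min_lat,max_lon,max_lat
        params['bbox'] = ','.join(str(v) for v in bounding_box)
    if date_time:
        # assumed format: start/end interval of ISO 8601 timestamps
        params['datetime'] = '/'.join(date_time)
    return params

# illustrative values only
params = build_item_params(
    limit=10,
    bounding_box=[34.0, -1.0, 35.0, 0.5],
    date_time=['2019-01-01T00:00:00Z', '2019-12-31T23:59:59Z']
)
print(params)
```

The resulting dictionary can be passed as the `params` argument of `requests.get`.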

Paste the collection id below for `collectionId`, and enter any desired parameters, then run the cell.

In [3]:
# paste the id of the collection you are interested in here:
collectionId = 'ref-african-crops-kenya-01'
# use these optional parameters to control what items are returned. maximum limit is 10000
limit = 10
bounding_box = []
date_time = []

# retrieves the items and their metadata in the collection
r = requests.get(
    f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items',
    params={'limit': limit, 'bbox': bounding_box, 'datetime': date_time},
    headers=headers
)
collection = r.json()

In [4]:
selected_item = None
for feature in collection.get('features', []):
    assets = feature.get('assets').keys()
    selected_item = feature
    print(list(assets))
    # For demo purposes we only want the first item
    break

['2019_04_07_b02', '2019_04_07_b03', '2019_01_27_b8a', '2019_04_07_b01', '2019_01_17_b09', '2019_01_17_b08', '2019_01_17_b07', '2019_01_17_b06', '2019_01_17_b05', '2019_01_17_b04', '2019_01_17_b03', '2019_01_17_b02', '2019_01_17_b01', '2019_11_03_tci', '2019_04_17_b8a', '2019_09_09_b10', '2019_09_09_b11', '2019_04_07_b08', '2019_09_09_b12', '2019_04_07_b09', '2019_04_07_b06', '2019_04_07_b07', '2019_04_07_b04', '2019_04_07_b05', '2019_09_09_b07', '2019_09_09_b08', '2019_09_09_b09', '2019_05_12_b8a', '2019_09_19_b8a', '2019_09_09_b01', '2019_09_09_b02', '2019_09_09_b03', '2019_09_09_b04', '2019_09_09_b05', '2019_09_09_b06', '2019_05_02_b02', '2019_05_02_b03', '2019_05_02_b04', '2019_05_02_b05', '2019_05_02_b06', '2019_05_02_b07', '2019_05_02_b08', '2019_12_08_b08', '2019_05_02_b09', '2019_12_08_b09', '2019_12_08_b06', '2019_10_24_tci', '2019_12_08_b07', '2019_12_08_b04', '2019_02_06_b8a', '2019_12_08_b05', '2019_12_08_b02', '2019_12_08_b03', '2019_02_21_b8a', '2019_05_02_b01', '2019_12_
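
The asset keys printed above appear to follow a `YYYY_MM_DD_band` pattern, for example `2019_04_07_b02`. Assuming that pattern holds (it is inferred from the output, not taken from the API documentation), a small helper can group the keys by acquisition date, which makes it easier to see which bands exist for which day. The function names here are hypothetical.

```python
from collections import defaultdict
from datetime import date

def parse_asset_key(key):
    """Split a key such as '2019_04_07_b02' into (date, band), or return None."""
    parts = key.split('_')
    if len(parts) != 4:
        return None
    try:
        year, month, day = (int(p) for p in parts[:3])
        return date(year, month, day), parts[3]
    except ValueError:
        # not a date-prefixed key
        return None

def group_bands_by_date(asset_keys):
    """Map each acquisition date to the list of bands available for it."""
    grouped = defaultdict(list)
    for key in asset_keys:
        parsed = parse_asset_key(key)
        if parsed is not None:
            acquired, band = parsed
            grouped[acquired].append(band)
    return dict(grouped)

# a few keys copied from the output above, plus the 'labels' key
sample_keys = ['2019_04_07_b02', '2019_04_07_b03', '2019_01_27_b8a', 'labels']
print(group_bands_by_date(sample_keys))
```

Keys that do not match the pattern, such as `labels`, are skipped.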

Download labels
----

The `assets` property of each item in a collection lists all the assets associated with that item, along with links to download them. The labels for an item are always stored under the asset key `labels`. The following code defines two helper functions and uses them to download the labels GeoJSON file for the item selected above.

In [5]:
from urllib.parse import urlparse

def get_download_url(item, asset_key, headers):
    asset = item.get('assets', {}).get(asset_key, None)
    if asset is None:
        print(f'Asset "{asset_key}" does not exist in this item')
        return None
    r = requests.get(asset.get('href'), headers=headers, allow_redirects=False)
    return r.headers.get('Location')

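def download_file(url):
    filename = urlparse(url).path.split('/')[-1]
    # stream the response so large files are not held in memory all at once
    r = requests.get(url, stream=True)
    r.raise_for_status()
    # the with block closes the file even if a write fails
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:
                f.write(chunk)
    print(f'Downloaded {filename}')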

In [6]:
download_url = get_download_url(selected_item, 'labels', headers)
download_file(download_url)

Downloaded african-crops-kenya-01-001.geojson
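
To extend this to every item returned by the query, one approach is to loop over `features` and pick out the `labels` asset of each item. The sketch below shows only that selection step on a small hand-made dictionary; the helper name `iter_label_assets`, the item ids, and the `example.com` links are placeholders, not real API values.

```python
def iter_label_assets(collection):
    """Yield (item id, labels asset href) for each item that has a 'labels' asset."""
    for feature in collection.get('features', []):
        asset = feature.get('assets', {}).get('labels')
        if asset is not None:
            yield feature.get('id'), asset.get('href')

# hand-made stand-in for the JSON returned by the items request
sample_collection = {
    'features': [
        {'id': 'item-1', 'assets': {'labels': {'href': 'https://example.com/item-1/labels'}}},
        {'id': 'item-2', 'assets': {}},
    ]
}

for item_id, href in iter_label_assets(sample_collection):
    print(item_id, href)
```

In the notebook itself, the same loop over `collection['features']` could call `get_download_url(feature, 'labels', headers)` followed by `download_file` for each item.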


Download source imagery
---

For this example, we'll query the API for the download URL of a Sentinel-2 true color scene associated with the selected item. Since the Sentinel-2 S3 bucket is in Requester Pays mode, the transfer is billed to your AWS account, so you must provide your AWS Access Key ID and Secret Key.

In [7]:
import boto3
AWS_ACCESS_KEY_ID = 'YOUR_AWS_ACCESS_KEY_ID'
AWS_SECRET_KEY = 'YOUR_AWS_SECRET_KEY'

def download_s3_file(url, access_key, secret_key):
    parsed_url = urlparse(url)
    
    bucket = parsed_url.hostname.split('.')[0]
    path = parsed_url.path[1:]
    filename = path.split('/')[-1]
    
    # use the credentials passed in, not the module-level constants
    s3 = boto3.client(
        's3',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key
    )
    
    s3.download_file(bucket, path, filename, ExtraArgs={'RequestPayer': 'requester'})
    print(f'Downloaded s3://{bucket}/{path}')

In [8]:
true_color_asset_url = get_download_url(selected_item, '2019_07_31_tci', headers)
download_s3_file(true_color_asset_url, AWS_ACCESS_KEY_ID, AWS_SECRET_KEY)

Downloaded s3://sentinel-s2-l1c/tiles/36/N/XF/2019/7/31/0/TCI.jp2
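
To make the URL handling in `download_s3_file` concrete, the sketch below isolates the parsing step. It assumes a virtual-hosted-style URL, where the bucket name is the first label of the hostname. The sample URL is reconstructed from the output above for illustration and is not necessarily the exact URL the API returns; `split_s3_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse

def split_s3_url(url):
    """Return (bucket, key, filename) for a virtual-hosted-style S3 URL."""
    parsed = urlparse(url)
    # bucket is the first label of the hostname, e.g. 'my-bucket.s3.amazonaws.com'
    bucket = parsed.hostname.split('.')[0]
    # drop the leading slash to get the object key
    key = parsed.path[1:]
    filename = key.split('/')[-1]
    return bucket, key, filename

# assumed URL shape, rebuilt from the printed s3:// path above
sample_url = 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/36/N/XF/2019/7/31/0/TCI.jp2'
print(split_s3_url(sample_url))
```

A path-style URL, where the bucket is the first path segment, would need different parsing.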
