<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

How to use the Radiant MLHub API
=====

The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at [docs.mlhub.earth](http://docs.mlhub.earth).

We'll show you how to set up your authorization, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections. 

Each item in a collection is described in JSON format that complies with the [STAC](https://stacspec.org/) [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label) definition.

Authentication
-----

Access to the Radiant MLHub API requires an API key. To get one, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. In the **API Keys** tab, you can create the API key you will need. *Do not share* your API key with others: your usage may be limited, and sharing your API key is a security risk.

Copy the API key and paste it into the cell below.

Click **Run** or press `SHIFT` + `ENTER` before moving on to run this first piece of code.

In [None]:
# only the requests module is required to access the API
import requests

# copy your API key from dashboard.mlhub.earth and paste it in the following
API_KEY = 'PASTE_YOUR_API_KEY_HERE'
API_BASE = 'https://api.radiant.earth/mlhub/v1'
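
As an alternative to pasting the key into the notebook, where it could be shared by accident along with the file, the key can be read from an environment variable. The sketch below assumes a variable named `MLHUB_API_KEY`, which is a hypothetical name chosen for this example and not something the API defines; `load_api_key` is likewise an illustrative helper.

```python
import os

def load_api_key(env_var='MLHUB_API_KEY', placeholder='PASTE_YOUR_API_KEY_HERE'):
    """Return the API key stored in an environment variable.

    MLHUB_API_KEY is a hypothetical variable name chosen for this sketch.
    Falls back to the placeholder string when the variable is not set.
    """
    return os.environ.get(env_var, placeholder)

API_KEY = load_api_key()
if API_KEY == 'PASTE_YOUR_API_KEY_HERE':
    print('No API key found; set MLHUB_API_KEY or paste your key above')
```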

Search for data collections
-----

To see what training data is available, start by listing the collections that the API exposes.

A collection represents the top-most data level. Typically this means the data comes from the same source for the same geography. It might include different years or sub-geographies.

To find data with specific parameters, see the [API documentation](http://docs.mlhub.earth/?python#the-feature-collections-in-the-dataset).

To see the list, simply run the following cell. The returned list shows the collection id values, collection license, and data source citation (if available).

In [None]:
# get list of all collections
r = requests.get(f'{API_BASE}/collections?key={API_KEY}')
h = r.json()
collections = h['collections']

# print the list of collections 
for c in collections:
    print(f'ID:       {c["id"]}\nLicense:  {c.get("license", "N/A")}\nCitation: {c.get("sci:citation", "N/A")}\n')
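
When the list is long, it may help to narrow it down by a keyword in the collection id. A minimal sketch, assuming the `collections` list has the shape used above (a list of dictionaries with an `id` key); `find_collections` is a hypothetical helper, and the sample ids are made up for illustration.

```python
def find_collections(collections, keyword):
    """Return the collections whose id contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [c for c in collections if keyword in c.get('id', '').lower()]

# Made-up sample data with the same shape as the API response used above
sample_collections = [
    {'id': 'example_crops_region_a_labels', 'license': 'CC-BY-4.0'},
    {'id': 'example_landcover_region_b_labels'},
]

for c in find_collections(sample_collections, 'crops'):
    print(c['id'])
```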

Retrieve properties of a collection
----

Once you have found the collection that you want to access, you can get its properties from the API.

You can limit what data you get in the response using the optional parameters:
* **Limit** limits how many items will be returned, with a minimum of 1 and maximum of 10000.
* **Bounding box** limits the returned items to a specific geographic area. 
* **Date time** limits the returned items to those that fall within a specific time-frame.

See the [get features](http://docs.mlhub.earth/#getfeatures) API documentation for more information.

Paste the collection id below for `collectionId`, and enter any desired parameters, then run the cell.

In [None]:
# paste the id of the collection you are interested in here:
collectionId = 'ref_african_crops_kenya_01_labels'

# retrieves the items and their metadata in the collection
r = requests.get(f'{API_BASE}/collections/{collectionId}/items?key={API_KEY}')
collection = r.json()
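
The request above uses none of the optional parameters. The sketch below shows one way they could be added to the URL. It assumes the query parameter names are `limit`, `bbox`, and `datetime`, following STAC API conventions; these names are not confirmed by the text above, so check the get features documentation before relying on them. `build_items_url` is a hypothetical helper, and the bounding box and date range are arbitrary example values.

```python
from urllib.parse import urlencode

def build_items_url(api_base, collection_id, api_key,
                    limit=None, bbox=None, date_range=None):
    """Build an items URL with optional filters (parameter names are assumed)."""
    params = {'key': api_key}
    if limit is not None:
        # The text above gives a minimum of 1 and a maximum of 10000
        if not 1 <= limit <= 10000:
            raise ValueError('limit must be between 1 and 10000')
        params['limit'] = limit
    if bbox is not None:
        # Assumed order: min lon, min lat, max lon, max lat
        params['bbox'] = ','.join(str(v) for v in bbox)
    if date_range is not None:
        params['datetime'] = date_range
    # Keep commas, slashes, and colons readable in the query string
    query = urlencode(params, safe=',/:')
    return f'{api_base}/collections/{collection_id}/items?{query}'

url = build_items_url(
    'https://api.radiant.earth/mlhub/v1',
    'ref_african_crops_kenya_01_labels',
    'PASTE_YOUR_API_KEY_HERE',
    limit=10,
    bbox=[34.0, 0.0, 35.0, 1.0],
    date_range='2019-01-01T00:00:00Z/2019-12-31T23:59:59Z',
)
print(url)
```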

Selecting an Item to Download
---

For the purposes of this demo we will only download the assets of one item. The next cell selects the first item in the collection. If you wish to download the assets for all of the items in the collection then the following cells should be repeated for every item in the collection.


In [None]:
selected_item = None
assets = None
for feature in collection.get('features', []):
    selected_item = feature
    assets = list(feature.get('assets').keys())
    # For demo purposes we only want the first item
    break
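
To handle every item in the response instead of only the first, the same loop can collect the asset keys per item. A minimal sketch, assuming each feature carries an `id` and an `assets` dictionary as in the response above; `list_item_assets` is a hypothetical helper, and the sample data is made up.

```python
def list_item_assets(collection):
    """Map each item's id to the list of its asset keys."""
    result = {}
    for feature in collection.get('features', []):
        # Fall back to an empty dict so items without assets do not raise
        result[feature.get('id')] = list(feature.get('assets', {}).keys())
    return result

# Made-up sample data shaped like the items response
sample_collection = {
    'features': [
        {'id': 'item_1', 'assets': {'labels': {}, 'documentation': {}}},
        {'id': 'item_2'},
    ]
}
print(list_item_assets(sample_collection))
```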

Listing Available Assets
---

Source imagery assets follow the pattern `year_month_day_type` so we'll loop through the list of assets and only print the ones which don't match that pattern.

In [None]:
import re

# List all assets which don't match the pattern "year_month_day_*"
for asset in assets:
    if not re.match(r'\d{4}_\d{2}_\d{2}_.*', asset):
        print(asset)

As you can see, three assets meet this criterion: `labels`, `documentation`, and `property_descriptions`.
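
The same pattern can also separate the two groups in one pass, which is handy when both lists are needed later. A sketch under the assumption that asset keys are plain strings; `split_assets` is a hypothetical helper, and the sample keys are made up for illustration.

```python
import re

# Same pattern as above, compiled once
SOURCE_PATTERN = re.compile(r'\d{4}_\d{2}_\d{2}_.*')

def split_assets(asset_keys):
    """Split asset keys into (source imagery keys, other keys)."""
    source_keys, other_keys = [], []
    for key in asset_keys:
        if SOURCE_PATTERN.match(key):
            source_keys.append(key)
        else:
            other_keys.append(key)
    return source_keys, other_keys

# Made-up asset keys for illustration
source_keys, other_keys = split_assets(['2019_06_06_B01', 'labels', 'documentation'])
print(other_keys)
```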

Downloading Assets
---
We'll need to set up some functions to download assets first.

In [None]:
from urllib.parse import urlparse

def get_download_url(item, asset_key):
    asset = item.get('assets', {}).get(asset_key, None)
    if asset is None:
        print(f'Asset "{asset_key}" does not exist in this item')
        return None
    r = requests.get(asset.get('href'), allow_redirects=False)
    return r.headers.get('Location')

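def download_file(url):
    # Use the last path segment of the URL as the local filename
    filename = urlparse(url).path.split('/')[-1]
    # Stream the response so large files are written chunk by chunk
    r = requests.get(url, stream=True)
    # The context manager closes the file even if a write fails
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:
                f.write(chunk)
    print(f'Downloaded {filename}')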
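
`download_file` names the local file after the last path segment of the URL, so any query string (for example, a signature on a temporary download link) is left out of the filename. A small sketch of that step on its own; the URL is a made-up example, and `filename_from_url` is a hypothetical helper.

```python
from urllib.parse import urlparse

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    return urlparse(url).path.split('/')[-1]

# Made-up URL for illustration only
print(filename_from_url('https://example.com/data/labels.geojson?signature=abc123'))
```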

Downloading Labels
---

We can download the `labels` asset of the `selected_item` by calling the following function: 

In [None]:
download_file(get_download_url(selected_item, 'labels'))

Downloading Metadata
---

Likewise, we can download the documentation PDF and the property descriptions CSV.

In [None]:
download_file(get_download_url(selected_item, 'documentation'))
download_file(get_download_url(selected_item, 'property_descriptions'))

Downloading Source Imagery
---

For this example, we'll query the API for the download URLs of three bands of a Sentinel-2 scene associated with this item.

In [None]:
for link in selected_item['links']:
    if link['rel'] == 'source':
        r = requests.get(f'{link["href"]}?key={API_KEY}')
        source_item = r.json()
        break
download_file(get_download_url(source_item, 'B01'))
download_file(get_download_url(source_item, 'B02'))
download_file(get_download_url(source_item, 'B03'))
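
The loop above stops at the first `source` link. If an item has several, a small helper can collect all of them. A sketch assuming links are dictionaries with `rel` and `href` keys, as in the code above; `find_links` is a hypothetical helper, and the sample item is made up.

```python
def find_links(item, rel):
    """Return the href of every link in the item with the given rel value."""
    return [link['href'] for link in item.get('links', []) if link.get('rel') == rel]

# Made-up item for illustration only
sample_item = {
    'links': [
        {'rel': 'self', 'href': 'https://example.com/items/a'},
        {'rel': 'source', 'href': 'https://example.com/items/b'},
        {'rel': 'source', 'href': 'https://example.com/items/c'},
    ]
}
print(find_links(sample_item, 'source'))
```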