<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# CV4A ICLR Crop Type Classification Challenge
# A Guide to Accessing the Data on Radiant MLHub


This notebook walks you through the steps to get access to Radiant MLHub and download the data for the crop type classification competition organized as part of the [CV4A](https://www.cv4gc.org/cv4a2020/) workshop at ICLR 2020.

## Radiant MLHub API


The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

Full documentation for the API is available at [https://mlhub.earth/docs](https://mlhub.earth/docs).

Each item in our collection is described in JSON format compliant with the [STAC](https://stacspec.org/) [label extension](https://github.com/stac-extensions/label) definition.
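
As a rough illustration of what such a description looks like, the sketch below builds a minimal STAC-style item as a Python dictionary and reads its label fields. The `label:` field names come from the label extension; the `id`, dates, description, and class names are invented for illustration and are not taken from the competition data. `label_summary` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical item: every value here is made up for illustration.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example_label_item",
    "geometry": None,
    "properties": {
        "datetime": "2019-06-06T00:00:00Z",
        "label:type": "vector",
        "label:description": "Example crop type labels",
        # Name of the feature property that holds the class of each field.
        "label:properties": ["crop"],
        "label:classes": [
            {"name": "crop", "classes": ["crop_a", "crop_b"]},
        ],
    },
    "assets": {},
    "links": [],
}


def label_summary(item):
    """Collect the label type and the flat list of class names from an item."""
    props = item["properties"]
    classes = [
        cls
        for group in props.get("label:classes", [])
        for cls in group["classes"]
    ]
    return {
        "id": item["id"],
        "label_type": props["label:type"],
        "classes": classes,
    }


summary = label_summary(example_item)
print(json.dumps(summary, indent=2))
```

Real items from the API carry more fields (geometry, assets, links to the source imagery), so treat this only as a guide to where the label information lives.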

## Dependencies

This notebook uses the [`radiant-mlhub` Python client](https://pypi.org/project/radiant-mlhub/) to interact with the API. If you are running this notebook using Binder, this dependency has already been installed. If you are running this notebook locally, you will need to install it yourself.

See the official [`radiant-mlhub` docs](https://radiant-mlhub.readthedocs.io/) for more documentation of the full functionality of that library.

In [None]:
# Required libraries
from radiant_mlhub import Dataset, client
import tarfile
from pathlib import Path

## Authentication

### Create an API Key

Access to the Radiant MLHub API requires an API key. To get one, go to [mlhub.earth/profile](https://mlhub.earth/profile). If you have not used Radiant MLHub before, you will need to sign up and create a new account; otherwise, sign in. In the **API Keys** tab, you can create the API key(s) you need. *Do not share* your API key with others: your usage may be limited, and sharing your API key is a security risk.

### Configure the Client

Once you have your API key, you need to configure the `radiant_mlhub` library to use that key. There are a number of ways to configure this (see the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html) for details). 

For these examples, we will set the `MLHUB_API_KEY` environment variable. Run the cell below to save your API key as an environment variable that the client library will recognize.

*If you are running this notebook locally and have configured a profile as described in the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html), then you do not need to execute this cell.*


In [None]:
import os

os.environ['MLHUB_API_KEY'] = 'PASTE_YOUR_API_KEY_HERE'
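
Pasting the key into a cell makes it easy to share by accident along with the notebook. One possible alternative, sketched below, is to prompt for the key only when `MLHUB_API_KEY` is not already set. `ensure_api_key` is a hypothetical helper written for this guide, not part of `radiant_mlhub`; its `env` and `prompt` parameters exist only so the function can be exercised without a real terminal.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass, name="MLHUB_API_KEY"):
    """Return the API key from `env`, prompting for it only if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        # getpass hides the typed value, so the key never appears in cell output.
        key = prompt(f"Enter your {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        env[name] = key
    return key
```

Calling `ensure_api_key()` in place of the assignment above would leave an existing environment variable untouched and ask for the key otherwise.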

## Retrieving the competition dataset

Datasets are stored as collections in the Radiant MLHub catalog. A collection represents the top-most data level. Typically this means the data comes from the same source for the same geography. It might include different years or sub-geographies.

The dataset for this competition is `ref_african_crops_kenya_02`.

In [None]:
dataset = Dataset.fetch('ref_african_crops_kenya_02')

print(f'ID: {dataset.id}')
print(f'Title: {dataset.title}')
print('Collections:')
for collection in dataset.collections:
    print(f'* {collection.id}')

The two collections associated with this dataset are:
- `ref_african_crops_kenya_02_source`: includes the multi-temporal bands of Sentinel-2
- `ref_african_crops_kenya_02_labels`: includes the labels and field IDs

### Downloading the Data

Radiant MLHub provides archives that contain all the assets for a given collection. We will download these archives for the `ref_african_crops_kenya_02_source` and `ref_african_crops_kenya_02_labels` collections.

In [None]:
# output path where you want to download the data
output_path = Path("./data/").resolve()

In [None]:
archive_paths = dataset.download(output_dir=output_path)
for archive_path in archive_paths:
    print(f'Extracting {archive_path}...')
    with tarfile.open(archive_path) as tfile:
        tfile.extractall(path=output_path)

print('Done\n')
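
After extraction it can be useful to check what actually landed on disk. The sketch below counts the files under a directory, grouped by extension. `summarize_files` is a hypothetical helper written for this guide, and it makes no assumption about which file types the archives contain.

```python
from collections import Counter
from pathlib import Path


def summarize_files(root):
    """Count the files under `root` (recursively), grouped by file extension."""
    root = Path(root)
    counts = Counter(
        path.suffix.lower() or "(no extension)"
        for path in root.rglob("*")
        if path.is_file()
    )
    return dict(counts)
```

For example, `summarize_files(output_path)` returns a dictionary mapping each extension found under the download directory to the number of files with that extension.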
