<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# A Guide to Access the Data on Radiant MLHub


This notebook walks you through getting access to Radiant MLHub, then downloading and exploring the Kenya Crop dataset. These notebooks were originally developed for the CV4A ICLR Crop Type Classification Challenge, which was part of the [CV4A](https://www.cv4gc.org/cv4a2020/) workshop at ICLR 2020.

To explore the data after download, please see the `cv4a-crop-challenge-load-data` notebook in this same repository.

## Radiant MLHub API


The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

Full documentation for the API is available at [https://mlhub.earth/docs](https://mlhub.earth/docs).

Each item in our collections is described in JSON format that complies with the [STAC](https://stacspec.org/) [label extension](https://github.com/stac-extensions/label) definition.
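To make that more concrete, the sketch below shows roughly what a label item can look like once its JSON is loaded into a Python dictionary. The item ID, description, class names, and property name are hypothetical placeholders rather than values copied from the Kenya dataset; only the `label:*` field names come from the label extension. The item is also heavily trimmed: a real one carries `geometry`, `bbox`, `links`, and `assets` as well.

```python
# Hypothetical, heavily trimmed STAC item. All values are placeholders for
# illustration and are NOT taken from the Kenya dataset.
label_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example_labels_item_00",
    "properties": {
        "datetime": "2019-06-01T00:00:00Z",
        "label:type": "vector",
        "label:tasks": ["classification"],
        "label:description": "Crop type labels (placeholder description)",
        "label:properties": ["crop_type"],
        "label:classes": [
            {"name": "crop_type", "classes": ["crop_a", "crop_b"]},
        ],
    },
}


def label_fields(item):
    """Collect the label-extension fields (keys starting with 'label:') of an item."""
    return {
        key: value
        for key, value in item["properties"].items()
        if key.startswith("label:")
    }


# Show which label-extension fields the item defines.
print(sorted(label_fields(label_item)))
```

Filtering on the `label:` prefix is a quick way to see what a label item declares about its classes and tasks without reading the whole document.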

## Dependencies

This notebook uses the [`radiant-mlhub` Python client](https://pypi.org/project/radiant-mlhub/) to interact with the API. See the official [`radiant-mlhub` docs](https://radiant-mlhub.readthedocs.io/) for the full functionality of that library.

Please see the [`mlhub-tutorials README.md`](https://github.com/radiantearth/mlhub-tutorials/blob/Fix/version-pinning/README.md) for information on how to install dependencies for the notebooks in this repository.

In [None]:
# Import required libraries
from radiant_mlhub import Dataset
from pathlib import Path
import os

## Authentication

### Create an API Key

Access to the Radiant MLHub API requires an API key. To get your API key, go to [mlhub.earth/profile](https://mlhub.earth/profile). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. In the **API Keys** tab, you can create the API key(s) you need. *Do not share* your API key with others: sharing it is a security risk, and your usage may be limited as a result.

### Configure the Client

Once you have your API key, you need to configure the `radiant_mlhub` library to use that key. There are a number of ways to configure this (see the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html) for details). 

For these examples, we will set the `MLHUB_API_KEY` environment variable. Run the cell below to save your API key as an environment variable that the client library will recognize.

*If you are running this notebook locally and have configured a profile as described in the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html), then you do not need to execute this cell.*


In [None]:
os.environ['MLHUB_API_KEY'] = 'YOUR API KEY'
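It is easy to run the cell above with the placeholder still in place. As a minimal sketch, the check below confirms that `MLHUB_API_KEY` holds something other than the placeholder before any request is made. The helper `require_api_key` is hypothetical and not part of `radiant_mlhub`.

```python
import os

PLACEHOLDER = "YOUR API KEY"  # the placeholder string used in the cell above


def require_api_key(environ=None):
    """Return the key stored in MLHUB_API_KEY, or raise if it is missing.

    Hypothetical helper for illustration; it is not part of radiant_mlhub.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("MLHUB_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set MLHUB_API_KEY to your own key before continuing.")
    return key
```

Calling `require_api_key()` with no arguments reads from the real environment. If you would rather not type the key into a cell at all, the standard library's `getpass.getpass()` can prompt for it without echoing the value into the notebook.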

## Retrieving the crop type dataset

The dataset for this competition is `ref_african_crops_kenya_02`. Within each Radiant MLHub dataset, the data is stored in thematic collections. A collection represents the top-most data level in [STAC](https://stacspec.org/). Let's take a look at the dataset and its collections. 


In [None]:
dataset = Dataset.fetch('ref_african_crops_kenya_02')

print(f'ID: {dataset.id}')
print(f'Title: {dataset.title}')
print('Collections:')
for collection in dataset.collections:
    print(f'* {collection.id}')

The two collections associated with this dataset are:
- `ref_african_crops_kenya_02_source`: includes the multi-temporal bands of Sentinel-2
- `ref_african_crops_kenya_02_labels`: includes the labels and field IDs

In [None]:
catalog_size = dataset.stac_catalog_size
estimated_dataset_size = dataset.estimated_dataset_size
print("Catalog Size: ", catalog_size) 
print("Estimated Dataset Size: ", estimated_dataset_size) 

The dataset is estimated to be around 16 GB. If you are working locally, make sure you have enough storage space available before moving on to the next step. If you have limited storage, consider one of our other tutorials that works with a smaller dataset.
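One way to check free space before downloading is the standard library's `shutil.disk_usage`. The sketch below is an illustration under stated assumptions: the 16 GB threshold is taken from the estimate quoted above, and the helper name `has_enough_space` is hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: roughly 16 GB, taken from the estimate quoted above.
REQUIRED_BYTES = 16 * 1024**3


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    path = Path(path).resolve()
    # The download folder may not exist yet, so walk up to the nearest existing parent.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


# Check the current working directory against the assumed threshold.
print(has_enough_space("."))
```

Leaving some headroom above the estimate is sensible, since archives may need extra room while they are being extracted.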

### Downloading the Data

We will download the `ref_african_crops_kenya_02` dataset. As we learned in our dataset exploration above, this dataset includes the `ref_african_crops_kenya_02_source` and `ref_african_crops_kenya_02_labels` collections.

NOTE: If you modify the data download location below, be sure to also update this path in the following tutorial notebook `cv4a-crop-challenge-load-data`.

In [None]:
# Set output path for data download
output_path = Path("./data/").resolve()

In [None]:
dataset.download(output_dir=output_path)

print('Done\n')
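After the download finishes, a quick sanity check is to count the files and total bytes under the output folder. The sketch below uses only `pathlib`. The helper name `summarize_directory` is hypothetical, and it makes no assumption about how the client lays out the downloaded files.

```python
from pathlib import Path


def summarize_directory(root):
    """Return (file_count, total_bytes) for all regular files under `root`."""
    files = [path for path in Path(root).rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)
```

For example, `file_count, total_bytes = summarize_directory(output_path)` followed by `print(file_count, round(total_bytes / 1024**3, 2), "GB")` gives a figure you can compare against the estimated dataset size printed earlier.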