<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# 2019 Mali Crop Type Dataset Tutorial
# A Guide to Accessing the Data on Radiant MLHub


This notebook walks you through the steps to get access to Radiant MLHub and download the 2019 Mali crop type dataset, which will be used for model training to develop a baseline model.

## Radiant MLHub API


The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

Full documentation for the API is available at [docs.mlhub.earth](https://docs.mlhub.earth).

Each item in our collections is described in JSON format that complies with the [STAC](https://stacspec.org/) [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label) definition.
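As an illustration of what the label extension adds, here is a minimal sketch that pulls the `label:type`, `label:properties` and `label:classes` fields out of a STAC Item that has already been loaded into a Python dictionary (for example with `json.load`). The helper name `summarize_label_item` and the example item are hypothetical: the item ID, property name and class values are placeholders, not values from the Mali dataset.

```python
from typing import Any


def summarize_label_item(item: dict[str, Any]) -> dict[str, Any]:
    """Collect the STAC label-extension fields from a STAC Item dictionary."""
    props = item.get("properties", {})
    # label:classes is a list of {"name": <property name>, "classes": [<values>]}
    classes = {
        entry.get("name"): entry.get("classes", [])
        for entry in props.get("label:classes", [])
    }
    return {
        "id": item.get("id"),
        "label_type": props.get("label:type"),  # "vector" or "raster"
        "label_properties": props.get("label:properties"),
        "classes": classes,
    }


# Hypothetical item for illustration only; not taken from the Mali dataset
example_item = {
    "type": "Feature",
    "id": "example_label_item",
    "properties": {
        "label:type": "vector",
        "label:properties": ["label_property"],
        "label:classes": [{"name": "label_property", "classes": [1, 2, 3]}],
    },
}
print(summarize_label_item(example_item))
```

Missing fields simply come back as `None` or empty, so the same helper can be pointed at items that only use part of the extension.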

## Dependencies

This notebook uses the [`radiant-mlhub` Python client](https://pypi.org/project/radiant-mlhub/) to interact with the API. If you are running this notebook using Binder, this dependency is already installed. If you are running this notebook locally, you will need to install it yourself (for example, with `pip install radiant-mlhub`).

See the official [`radiant-mlhub` docs](https://radiant-mlhub.readthedocs.io/) for documentation of the library's full functionality.

The notebook uses [`rasterio`](https://rasterio.readthedocs.io/en/latest/) to read the bands from Sentinel-2 images and [`numpy`](https://numpy.org/doc/stable/) for array computations.
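If you are running locally, a quick way to confirm that these libraries are importable is to look them up with the standard library's `importlib.util.find_spec`. This is a small sketch; the helper name `missing_modules` is hypothetical, and the list uses import names (the `radiant-mlhub` package is imported as `radiant_mlhub`).

```python
import importlib.util

# Import names of the libraries used in this notebook
REQUIRED_MODULES = ["radiant_mlhub", "rasterio", "numpy"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```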

```python
# Required libraries
from radiant_mlhub import Dataset, client
import tarfile
from pathlib import Path
import rasterio
import os
import numpy as np
```

## Authentication

### Create an API Key

Access to the Radiant MLHub API requires an API key. To get your API key, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. In the **API Keys** tab, you'll be able to create API key(s), which you will need. *Do not share* your API key with others: your usage may be limited and sharing your API key is a security risk.

### Configure the Client

Once you have your API key, you need to configure the `radiant_mlhub` library to use that key. There are a number of ways to configure this (see the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html) for details). 

For these examples, we will set the `MLHUB_API_KEY` environment variable. Run the cell below to save your API key as an environment variable that the client library will recognize.

*If you are running this notebook locally and have configured a profile as described in the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html), then you do not need to execute this cell.*


```python
import os
os.environ['MLHUB_API_KEY'] = 'paste your api key'  # replace with your own API key
```
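Before calling the API, it can help to fail fast if the key was never filled in. The sketch below only checks the environment variable; it does not validate the key against Radiant MLHub. The helper name `api_key_is_set` is hypothetical, and the placeholder string is the one used in the cell above.

```python
import os

PLACEHOLDER = "paste your api key"  # placeholder text from the cell above


def api_key_is_set(env=os.environ):
    """Return True if MLHUB_API_KEY holds something other than the placeholder."""
    key = env.get("MLHUB_API_KEY", "").strip()
    return bool(key) and key != PLACEHOLDER


if not api_key_is_set():
    print("Set MLHUB_API_KEY to your own API key before calling the API.")
```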

## Retrieving the competition dataset

Datasets are stored as collections in the Radiant MLHub catalog. A collection represents the top-most data level. Typically, this means its data comes from the same source and covers the same geography, although it may include different years or sub-geographies.

The dataset for this competition is `umd_mali_crop_type`.

```python
dataset = Dataset.fetch('umd_mali_crop_type')

print(f'ID: {dataset.id}')
print(f'Title: {dataset.title}')
print('Collections:')
for collection in dataset.collections:
    print(f'* {collection.id}')
```

```
ID: umd_mali_crop_type
Title: 2019 Mali CropType Training Data
Collections:
* umd_mali_crop_type_labels
* umd_mali_crop_type_source
```


The two collections associated with this dataset are:
- `umd_mali_crop_type_source`: includes the multi-temporal bands of Sentinel-2
- `umd_mali_crop_type_labels`: includes the labels and field IDs
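Because the two collections differ only in their `_source` / `_labels` suffix, you can sort collection IDs into the two roles programmatically. This is a sketch that assumes that suffix convention, which holds for the IDs printed above but is not guaranteed for every dataset; `split_collections` is a hypothetical helper name.

```python
def split_collections(collection_ids):
    """Group collection IDs by their trailing '_source' / '_labels' suffix."""
    groups = {"source": [], "labels": [], "other": []}
    for cid in collection_ids:
        if cid.endswith("_source"):
            groups["source"].append(cid)
        elif cid.endswith("_labels"):
            groups["labels"].append(cid)
        else:
            groups["other"].append(cid)
    return groups


# The collection IDs printed for umd_mali_crop_type
groups = split_collections(["umd_mali_crop_type_labels", "umd_mali_crop_type_source"])
print(groups)
```

In the notebook you could pass `(c.id for c in dataset.collections)` instead of the hard-coded list.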

### Downloading Labels and Source Imagery

Radiant MLHub provides archives that contain all the assets for a given collection. We will download the archives for the `umd_mali_crop_type_source` and `umd_mali_crop_type_labels` collections and extract them into the output folder.

```python
# output path where you want to download the data
output_path = Path("./data/").resolve()
```

```python
archive_paths = dataset.download(output_dir=output_path)
for archive_path in archive_paths:
    print(f'Extracting {archive_path}...')
    with tarfile.open(archive_path) as tfile:
        tfile.extractall(path=output_path)

print('Done\n')
```

```
Extracting /Users/mac/Downloads/umd_mali_crop_type/data/umd_mali_crop_type_labels.tar.gz...
Extracting /Users/mac/Downloads/umd_mali_crop_type/data/umd_mali_crop_type_source.tar.gz...
Done
```
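`tarfile.extractall` writes members wherever their stored paths point, so it is good practice to check an archive before extracting it. The sketch below is one conservative option, not part of the original workflow: it rejects link members and any member whose path would resolve outside the output folder. The helper name `safe_extract` is hypothetical; recent Python releases also offer a built-in `filter="data"` argument to `extractall` for the same purpose.

```python
import tarfile
from pathlib import Path


def safe_extract(archive_path, output_dir):
    """Extract a tar archive, refusing links and members that escape output_dir."""
    output_dir = Path(output_dir).resolve()
    with tarfile.open(archive_path) as tfile:
        for member in tfile.getmembers():
            if member.issym() or member.islnk():
                raise ValueError(f"Refusing link member: {member.name}")
            target = (output_dir / member.name).resolve()
            if target != output_dir and output_dir not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        # every member passed the checks, so extract them all
        tfile.extractall(path=output_dir)
```

In the download cell above, `safe_extract(archive_path, output_path)` would replace the `with tarfile.open(...)` block.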



In this tutorial, we will use the RGB bands from Sentinel-2, where
- Red is band `B04`
- Green is band `B03`
- Blue is band `B02`

We then use `rasterio` to stack the three bands of each source item into a single three-band image and write it to the `rgb_source` folder.
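The stacked array has shape `(3, height, width)` and keeps the bands' original data type. Sentinel-2 reflectance is commonly distributed as 16-bit integers, so for viewing you usually need to rescale it to 8 bits and move the channel axis last, which is the layout plotting libraries such as matplotlib expect. The sketch below uses a per-band 2nd to 98th percentile stretch; that stretch, the helper name `to_display_rgb`, and the synthetic arrays are assumptions for illustration, not part of the original tutorial.

```python
import numpy as np


def to_display_rgb(stack, low=2, high=98):
    """Turn a (3, height, width) band stack into an 8-bit (height, width, 3) image.

    Each band is clipped to its own [low, high] percentile range and rescaled to 0-255.
    """
    stack = stack.astype(np.float64)
    out = np.empty(stack.shape, dtype=np.uint8)
    for i, band in enumerate(stack):
        lo, hi = np.percentile(band, [low, high])
        # a constant band has no range to stretch, so it maps to 0
        scaled = (band - lo) / (hi - lo) if hi > lo else np.zeros_like(band)
        out[i] = (np.clip(scaled, 0, 1) * 255).round().astype(np.uint8)
    return np.transpose(out, (1, 2, 0))  # channels last


# Synthetic 16-bit values standing in for B04, B03 and B02 (in R, G, B order)
rng = np.random.default_rng(0)
red, green, blue = (rng.integers(0, 4000, size=(4, 4), dtype=np.uint16) for _ in range(3))
rgb = to_display_rgb(np.stack([red, green, blue]))
print(rgb.shape, rgb.dtype)
```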

```python
# source and label file paths
data = Path("./data").resolve()
source = data / "umd_mali_crop_type_source"
label = data / "umd_mali_crop_type_labels"
```

```python
# create the folder for storing the RGB images
(data / "rgb_source").mkdir(parents=True, exist_ok=True)
```

```python
# Stack the red, green and blue bands of every source item into one 3-band GeoTIFF
rgb_dir = data / "rgb_source"
band_files = ["B04.tif", "B03.tif", "B02.tif"]  # red, green, blue

item_dirs = sorted(p for p in source.iterdir() if p.is_dir())
for item_dir in item_dirs:
    bands = []
    profile = None
    for band_file in band_files:
        with rasterio.open(item_dir / band_file) as src:
            bands.append(src.read(1))  # first (and only) band of each file
            profile = src.profile
    # shape (3, height, width); keeps the source dtype instead of truncating to uint8
    all_bands = np.stack(bands)
    profile.update(count=len(band_files))

    # write the stacked image to disk as <item folder name>.tif
    with rasterio.open(rgb_dir / f"{item_dir.name}.tif", "w", **profile) as dst:
        dst.write(all_bands)
```
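After the loop finishes, every source item folder should have a matching `<item folder name>.tif` in `rgb_source`. The sketch below lists any folders that were skipped, using only `pathlib`; the helper name `unconverted_items` is hypothetical.

```python
from pathlib import Path


def unconverted_items(source_dir, rgb_dir):
    """List source item folders that have no matching <name>.tif in rgb_dir."""
    source_dir, rgb_dir = Path(source_dir), Path(rgb_dir)
    return sorted(
        item.name
        for item in source_dir.iterdir()
        if item.is_dir() and not (rgb_dir / f"{item.name}.tif").exists()
    )
```

In this notebook, `unconverted_items(source, data / "rgb_source")` should return an empty list once every item has been written.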