# FarmVibes.AI Crop Segmentation - Azure Machine Learning Training

This notebook demonstrates how to train a neural network with Azure Machine Learning (AML) to segment crops on NDVI timeseries and [Crop Data Layer](https://data.nal.usda.gov/dataset/cropscape-cropland-data-layer#:~:text=The%20Cropland%20Data%20Layer%20%28CDL%29%2C%20hosted%20on%20CropScape%2C,as%20well%20as%20boundary%2C%20water%20and%20road%20layers.) (CDL) maps provided by the FarmVibes.AI platform.


### Conda environment setup
Before running this notebook, let's build a conda environment. If you do not have conda installed, please follow the instructions in the [Conda User Guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html).

```
$ conda env create -f ./crop_env.yaml
$ conda activate crop-seg
```

-----------

### Azure Machine Learning workspace setup
Before running this notebook, please make sure to define the subscription ID, resource group, and AML workspace that will be used to run model training. If you do not have a workspace configured, please follow the instructions in the [Azure ML Quickstart](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources).

In [1]:
SUBSCRIPTION_ID = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP_NAME = "<RESOURCE_GROUP>"
WORKSPACE_NAME = "<AML_WORKSPACE_NAME>"

-------

### Notebook outline
In this notebook, we will train the segmentation model using FarmVibes.AI data through AML. Our first step is to create chips/patches from the NDVI stacks and CDL maps, and upload them to an AML workspace. We will then set up the AML environment and compute resource, and train the segmentation model on the uploaded chips. Once trained, we export the model as an ONNX file that can be used within a FarmVibes.AI cluster, as shown in the [Inference Notebook](./04_inference.ipynb).


Below are the main libraries used for this example and other useful links:
- **Geospatial data manipulation**:
    - [Shapely](https://github.com/shapely/shapely) is a library for manipulating geometric shapes.
    - [xarray](https://github.com/pydata/xarray) and the extension [rioxarray](https://github.com/corteva/rioxarray) are used for merging and visualizing predictions.
- **Model definition, training and exportation**:
    - [Pytorch](https://github.com/pytorch/pytorch) is used as our deep learning framework.
    - [TorchGeo](https://github.com/microsoft/torchgeo) is a library built for training models on geospatial data. We use it to dynamically sample fixed-size chips to train/evaluate our model. We define the TorchGeo datasets in `notebook_lib/datasets.py`.
    - [Pytorch-Lightning](https://github.com/Lightning-AI/lightning) is a wrapper over PyTorch that reduces boilerplate code for training and evaluating models. We define lightning modules in `notebook_lib/modules.py`.
    - [onnx](https://onnx.ai/get-started.html) is a library for exporting machine learning models to an interoperable format.
- **Azure Machine Learning**:
    - [AzureML Python SDK](https://learn.microsoft.com/en-us/azure/machine-learning/) is a package that offers multiple ways to interact with the AML environment. In this notebook, we use it to connect to the AML workspace, upload data generated by the FarmVibes.AI platform, and submit a training job.

### Code organization 
The training code is mainly organized into:

- The datasets (`notebook_lib/datasets.py`) containing the code for loading and preparing the data produced by FarmVibes.AI.
- The lightning data module (`notebook_lib/modules.py`) contains the code for the data loaders. It is responsible for loading the NDVI and CDL rasters from FarmVibes.AI and pre-generating the chips to be uploaded to the AML workspace. It also includes the lightning data module used within the AML pipeline to load and preprocess the training and validation chips.
- The lightning module (`notebook_lib/models.py`) contains the code for running/training/evaluating the neural network: instantiating the neural network, training steps, computing metrics, and others. If you want to change the architecture, the loss, or generally how the model is trained, this is probably where you should look. Also check the [pytorch-lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/).
- Two utility modules (`notebook_lib/utils.py` and `notebook_lib/constants.py`) with supporting code for monitoring the workflow execution, defining crop index constants, etc.

----------

### Imports & Constants

General and utility imports:

In [2]:
from datetime import datetime
from shapely import wkt
import os

Dataset generation imports (FarmVibes.AI and data modules):

In [3]:
# FarmVibes.AI client
from vibe_core.client import get_default_vibe_client

# notebook_lib imports
from notebook_lib.modules import CropSegDataModule, save_chips_locally
import notebook_lib.constants as constants

# Dataset constants
CHIP_SIZE = 256
EPOCH_SIZE = 1024
BATCH_SIZE = 32
NDVI_STACK_BANDS = 37
VAL_RATIO = 0.3



AML imports and training definitions:

In [16]:
# AML imports
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute, Environment, Data, Model
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml import command, Input, Output
from azure.core.exceptions import ResourceNotFoundError

# ONNX package for inference
import onnx


# AML constants
AML_ROOT_DIR = "./aml"
AML_ENV_PATH = os.path.join(AML_ROOT_DIR, "aml_env.yaml")
AML_DATASET_DIR = os.path.join(AML_ROOT_DIR, "dataset")
AML_CODE_DIR = "./notebook_lib"

# AML Compute Instance name and VM Size (will reuse if CI with name exists)
AML_COMPUTE_INFO = {
    "name": "crop-seg-compute",
    "size": "Standard_NC6"
}

# AML Environment name and path to the conda yaml file
AML_ENV_INFO = {
    "name": "crop-seg-env",
    "path": os.path.join(AML_ROOT_DIR, "aml_env.yaml")
}

# AML Chip dataset name, version, and description
AML_DATASET_INFO = {
    "name": "dataset_crop_seg",
    "version": "1",
    "description": "Crop Segmentation Dataset"
}

# Registration name and version of model that will be trained and stored in AML
AML_MODEL_INFO = {
    "name": "crop_seg_model",
    "version": "1",
    "aml_root_path": "azureml://datastores/workspaceblobstore/paths/"
}

# Training hyperparameters
LR = 1e-3  # Learning rate
WD = 1e-2  # Weight decay
MAX_EPOCHS = 10  # How many epochs to train for in AML

# Change the number of workers depending on the available shared memory
NUM_WORKERS = 4 
SHARED_MEMORY = "16g"
NUM_GPUS = 1

### Retrieve the dataset with FarmVibes.AI platform

We will retrieve the dataset from FarmVibes.AI cache by running the dataset generation workflow once again:

In [7]:
input_geometry_path = "./input_region.wkt"
time_range = (datetime(2020, 1, 1), datetime(2020, 12, 31))

# Reading the geometry file 
with open(input_geometry_path) as f:
    geometry = wkt.load(f)


# Instantiate the client
client = get_default_vibe_client()

# Run the workflow
wf_run = client.run("ml/dataset_generation/datagen_crop_segmentation", 
                    "Retrieve dataset cached outputs",
                    geometry=geometry, 
                    time_range=time_range
                    )

wf_run.monitor()

cdl_rasters = wf_run.output["cdl"]
ndvi_rasters = wf_run.output["ndvi"]


### Preprocess data and generate chips

With both the CDL and NDVI rasters, we will use the `CropSegDataModule` (from `notebook_lib/modules.py`) to preprocess them and save the chips that will be uploaded to the AML workspace.

As each CDL map represents a single year, we will combine multiple NDVI rasters along the year, stacking in the channel dimension a number of rasters equal to `NDVI_STACK_BANDS`. In this notebook, we set `NDVI_STACK_BANDS = 37`, which corresponds to an interval of roughly 10 days between consecutive NDVI rasters of a year.
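
The arithmetic behind that interval can be checked with a short sketch. It assumes the rasters are spread evenly over the year starting on January 1; the sampling actually used by the FarmVibes.AI workflow may differ, and `stack_dates` is only an illustrative name.

```python
from datetime import date, timedelta

NDVI_STACK_BANDS = 37  # same value as in this notebook
YEAR = 2020            # year covered by the time range used in this notebook

# Length of the year in days (2020 is a leap year).
days_in_year = (date(YEAR + 1, 1, 1) - date(YEAR, 1, 1)).days

# Spacing between consecutive rasters if they are spread evenly over the year.
interval_days = days_in_year / NDVI_STACK_BANDS

# Assumption: one raster per rounded interval, starting on January 1.
step = round(interval_days)
stack_dates = [
    date(YEAR, 1, 1) + timedelta(days=step * i) for i in range(NDVI_STACK_BANDS)
]

print(f"{interval_days:.1f} days between rasters, rounded to {step}")
print(f"first raster: {stack_dates[0]}, last raster: {stack_dates[-1]}")
```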

Preprocessing consists of stacking the NDVI rasters and upsampling the CDL maps. For this notebook, the data module also splits the RoI into two disjoint regions and extracts chips within each of them for training and validation.
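
As a rough illustration of a disjoint spatial split, the sketch below cuts a bounding box in two according to a validation ratio. This is not the code in `notebook_lib`. The vertical cut, the `split_bounds` helper, and the example coordinates are all assumptions made for the example.

```python
def split_bounds(bounds, val_ratio):
    """Split (minx, miny, maxx, maxy) into disjoint train and validation boxes.

    Assumption: the cut is a vertical line and the validation box is on the right.
    """
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be strictly between 0 and 1")
    minx, miny, maxx, maxy = bounds
    cut_x = maxx - (maxx - minx) * val_ratio
    train_box = (minx, miny, cut_x, maxy)
    val_box = (cut_x, miny, maxx, maxy)
    return train_box, val_box


def area(box):
    """Area of a (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = box
    return (maxx - minx) * (maxy - miny)


# Hypothetical RoI bounds in projected coordinates (metres).
roi = (0.0, 0.0, 10_000.0, 5_000.0)
train_box, val_box = split_bounds(roi, val_ratio=0.3)

print("train:", train_box)
print("val:  ", val_box)
print("validation share of the area:", area(val_box) / area(roi))
```

Because the two boxes only share the cut line, chips sampled inside one box cannot overlap pixels from the other, which is what keeps the validation data independent of the training data.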

`CropSegDataModule` has the following arguments:

- `ndvi_rasters`: NDVI rasters generated by FarmVibes.AI workflow.
- `cdl_rasters`: CDL maps downloaded by FarmVibes.AI workflow.
- `ndvi_stack_bands`: how many daily NDVI maps will be stacked to be used as input for training. Default: 37
- `img_size`: tuple that defines the size of each chip that is fed to the network. Default: (256, 256)
- `epoch_size`: how many samples are sampled during training for one epoch (this is for the random sampler used in training). Default: 1024
- `batch_size`: how many samples are fed to the network in a single batch. Default: 16
- `num_workers`: how many worker processes to use in the data loader. Default: 4
- `val_ratio`: how much of the data to separate for validation. Default: 0.2
- `positive_indices`: which CDL indices are considered as positive samples. Crop types with a minimum of 1e5 pixels in the RoI are available in the module `notebook_lib.constants`. You can combine multiple constants by adding them (e.g., `constants.POTATO_INDEX + constants.CORN_INDEX`). Default: `constants.CROP_INDICES`
- `train_years`: years used for training. Default: [2020]
- `val_years`: years used for validation. Default: [2020]
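
To make the `positive_indices` argument more concrete, here is a minimal sketch of how a list of CDL codes could turn a CDL tile into a binary target mask. The constants below are hypothetical stand-ins for those in `notebook_lib.constants` (the real names and values may differ), and `binarize` is an illustrative helper, not part of the library.

```python
# Hypothetical stand-ins for notebook_lib.constants. Assumption: each constant is a
# list of CDL codes, which is why two constants can be combined with "+".
CORN_INDEX = [1]     # CDL code for corn
POTATO_INDEX = [43]  # CDL code for potatoes

positive_indices = POTATO_INDEX + CORN_INDEX  # list concatenation


def binarize(cdl_tile, positive):
    """Return a mask with 1 where the CDL code is a positive class, else 0."""
    positive_set = set(positive)
    return [[1 if code in positive_set else 0 for code in row] for row in cdl_tile]


# Toy 3x3 CDL tile; codes other than 1 and 43 stand for other land-cover classes.
cdl_tile = [
    [1, 1, 5],
    [43, 176, 5],
    [121, 43, 1],
]

mask = binarize(cdl_tile, positive_indices)
for row in mask:
    print(row)
```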

In [8]:
data = CropSegDataModule(
    ndvi_rasters,
    cdl_rasters,
    ndvi_stack_bands=NDVI_STACK_BANDS,
    img_size=(CHIP_SIZE, CHIP_SIZE),
    epoch_size=EPOCH_SIZE,
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    positive_indices=constants.CROP_INDICES,
    val_ratio=VAL_RATIO,
)

data.setup()

Converting CDLMask CRS from EPSG:5070 to EPSG:32611
Converting CDLMask resolution from 30.0 to 10.0


Next, we generate the training and validation chips and save them to a local folder:

In [None]:
save_chips_locally(data.train_dataloader(), os.path.join(AML_DATASET_DIR, "train"))
save_chips_locally(data.val_dataloader(), os.path.join(AML_DATASET_DIR, "val"))
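
To get a feel for what one pass over the training loader produces, the sketch below draws `EPOCH_SIZE` random chip windows inside a raster and groups them into batches. It only mimics the idea of random fixed-size sampling. It is not the TorchGeo sampler, and the raster dimensions and the `sample_chip_windows` helper are hypothetical.

```python
import random

CHIP_SIZE = 256    # same values as the dataset constants in this notebook
EPOCH_SIZE = 1024
BATCH_SIZE = 32


def sample_chip_windows(height, width, chip_size, n, seed=0):
    """Draw n random (row_start, col_start, row_stop, col_stop) windows.

    Every window has shape chip_size x chip_size and lies fully inside the raster.
    """
    if chip_size > height or chip_size > width:
        raise ValueError("chip_size must fit inside the raster")
    rng = random.Random(seed)
    windows = []
    for _ in range(n):
        row = rng.randint(0, height - chip_size)  # randint is inclusive on both ends
        col = rng.randint(0, width - chip_size)
        windows.append((row, col, row + chip_size, col + chip_size))
    return windows


# Hypothetical raster size in pixels for the training region.
windows = sample_chip_windows(height=4000, width=6000, chip_size=CHIP_SIZE, n=EPOCH_SIZE)

# Group the windows into batches, as a data loader would.
batches = [windows[i:i + BATCH_SIZE] for i in range(0, len(windows), BATCH_SIZE)]

print(f"{len(windows)} chips in {len(batches)} batches of up to {BATCH_SIZE}")
```

With the constants used in this notebook, one epoch corresponds to 1024 / 32 = 32 batches of training chips.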

------------

## AML

The remainder of this notebook differs from the [Local Training Notebook](./03_local_training.ipynb) by leveraging AML capabilities to train the segmentation model on the dataset generated so far. To do so, we will upload the chips to an AML workspace and submit a training job.

As a first step, let's log in to Azure through the Azure CLI:

In [None]:
!az login --use-device-code

We will instantiate an `MLClient` connected to the workspace in the Azure subscription:

In [10]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work.
    # This will open a browser page to sign in interactively.
    credential = InteractiveBrowserCredential()


ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION_ID,
    resource_group_name=RESOURCE_GROUP_NAME,
    workspace_name=WORKSPACE_NAME,
)




With the client set up, we can upload the training and validation chips stored in `AML_DATASET_DIR` to the workspace:

In [None]:
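my_data = Data(
    path=AML_DATASET_DIR,
    type=AssetTypes.URI_FOLDER,
    name=AML_DATASET_INFO["name"],
    version=AML_DATASET_INFO["version"],
    # Reuse the description already defined in AML_DATASET_INFO
    description=AML_DATASET_INFO["description"],
)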

ml_client.data.create_or_update(my_data)

AML Compute is a managed compute infrastructure that allows us to easily create compute resources within the workspace. For information on the compute types and VM sizes available, refer to the [AML Documentation](https://learn.microsoft.com/en-us/azure/machine-learning/concept-compute-target).

In this example, we will create an AML compute cluster with a GPU. If a cluster with the same name already exists, we will reuse it:

In [12]:
try:
    # let's see if the compute target already exists
    compute_resource = ml_client.compute.get(AML_COMPUTE_INFO["name"])
    print(f"You already have a cluster named {AML_COMPUTE_INFO['name']}, we'll reuse it as is.")
except ResourceNotFoundError:
    print("Creating a new compute target...")

    # Let's create the Azure ML compute object with the intended parameters
    compute_resource = AmlCompute(
        # Name assigned to the compute cluster
        name=AML_COMPUTE_INFO["name"],
        # Azure ML Compute is the on-demand VM service
        type="amlcompute",
        # VM Family
        size=AML_COMPUTE_INFO["size"],
        # Minimum running nodes when there is no job running
        min_instances=0,
        # Nodes in cluster
        max_instances=4,
        # How many seconds a node keeps running after the job terminates
        idle_time_before_scale_down=300,
        # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination
        tier="Dedicated",
    )

    # Now, we pass the object to MLClient's begin_create_or_update method and wait for the result
    compute_resource = ml_client.begin_create_or_update(compute_resource).result()

print(
    f"AMLCompute with name {compute_resource.name} is created, the compute size is {compute_resource.size}"
)

You already have a cluster named crop-seg-compute, we'll reuse it as is.
AMLCompute with name crop-seg-compute is created, the compute size is STANDARD_NC6


With the compute cluster in place, we need to configure an execution environment with the packages required to run our training script. We provide a minimal conda environment YAML file that we will use in this example:

In [13]:
pipeline_job_env = Environment(
    name=AML_ENV_INFO["name"],
    description="Custom environment for the crop segmentation pipeline",
    conda_file=AML_ENV_INFO["path"],
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)

Environment with name crop-seg-env is registered to workspace, the environment version is 13


We can now submit the training job to AML. To do so, we need to get a reference to the uploaded dataset and build the command with all the necessary parameters:

In [17]:
registered_data_asset = ml_client.data.get(name=AML_DATASET_INFO["name"], version=AML_DATASET_INFO["version"])

my_job_inputs = {
    "dataset": Input(type=AssetTypes.URI_FOLDER, path=registered_data_asset.id, mode=InputOutputModes.DOWNLOAD),
    "ndvi_stack_bands": NDVI_STACK_BANDS,
    "batch_size": BATCH_SIZE,
    "max_epochs": MAX_EPOCHS,
    "learning_rate": LR,
    "weight_decay": WD,
    "num_workers": NUM_WORKERS,
    "num_gpus": NUM_GPUS
}

my_jobs_outputs = {
    "onnx_model_path": Output(type=AssetTypes.URI_FILE, 
                              path = os.path.join(AML_MODEL_INFO["aml_root_path"], f"{AML_MODEL_INFO['name']}_{AML_MODEL_INFO['version']}.onnx"), 
                              mode = "upload")
}

command_str = (
    "python aml_train_script.py "
    "--dataset ${{inputs.dataset}} "
    "--onnx_model_path ${{outputs.onnx_model_path}} "
    "--ndvi_stack_bands ${{inputs.ndvi_stack_bands}} "
    "--batch_size ${{inputs.batch_size}} "
    "--max_epochs ${{inputs.max_epochs}} "
    "--learning_rate ${{inputs.learning_rate}} "
    "--weight_decay ${{inputs.weight_decay}} "
    "--num_workers ${{inputs.num_workers}} "
    "--num_gpus ${{inputs.num_gpus}}"
)

job = command(
    code=AML_CODE_DIR,
    command=command_str,
    inputs=my_job_inputs,
    outputs=my_jobs_outputs,
    environment=f"{AML_ENV_INFO['name']}@latest",
    compute=AML_COMPUTE_INFO["name"],
    experiment_name="crop_segmentation",
    display_name="crop_segmentation",
    shm_size=SHARED_MEMORY
)

# submit the command
returned_job = ml_client.create_or_update(job)
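
The `${{inputs.*}}` and `${{outputs.*}}` placeholders in `command_str` are resolved by AML when the job runs, and the training script then receives ordinary command-line flags. The sketch below imitates that flow locally so the mapping is easier to follow. The `render` helper, the resolved paths, and the argument parser are illustrative assumptions; the actual `aml_train_script.py` may parse its arguments differently.

```python
import argparse
import re
import shlex

command_str = (
    "python aml_train_script.py "
    "--dataset ${{inputs.dataset}} "
    "--onnx_model_path ${{outputs.onnx_model_path}} "
    "--ndvi_stack_bands ${{inputs.ndvi_stack_bands}} "
    "--batch_size ${{inputs.batch_size}} "
    "--max_epochs ${{inputs.max_epochs}} "
    "--learning_rate ${{inputs.learning_rate}} "
    "--weight_decay ${{inputs.weight_decay}} "
    "--num_workers ${{inputs.num_workers}} "
    "--num_gpus ${{inputs.num_gpus}}"
)

# Hypothetical resolved values. On AML, the two paths would point to locations
# on the compute node chosen by the service, not to these placeholders.
values = {
    "inputs": {
        "dataset": "/tmp/dataset",
        "ndvi_stack_bands": 37,
        "batch_size": 32,
        "max_epochs": 10,
        "learning_rate": 1e-3,
        "weight_decay": 1e-2,
        "num_workers": 4,
        "num_gpus": 1,
    },
    "outputs": {"onnx_model_path": "/tmp/outputs/crop_seg_model_1.onnx"},
}


def render(template, values):
    """Replace every ${{inputs.x}} / ${{outputs.x}} placeholder with its value."""
    def substitute(match):
        group, name = match.group(1), match.group(2)
        return str(values[group][name])

    return re.sub(r"\$\{\{(inputs|outputs)\.(\w+)\}\}", substitute, template)


def build_parser():
    """Parser accepting the flags used in command_str (assumed types)."""
    parser = argparse.ArgumentParser(description="Crop segmentation training (sketch)")
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--onnx_model_path", type=str, required=True)
    parser.add_argument("--ndvi_stack_bands", type=int, default=37)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--max_epochs", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    parser.add_argument("--weight_decay", type=float, default=1e-2)
    parser.add_argument("--num_workers", type=int, default=4)
    parser.add_argument("--num_gpus", type=int, default=1)
    return parser


rendered = render(command_str, values)
argv = shlex.split(rendered)[2:]  # drop "python aml_train_script.py"
args = build_parser().parse_args(argv)

print(rendered)
print(args)
```

Keeping hyperparameters as job inputs, rather than hard-coding them in the script, means each submitted job records the values it was trained with.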

To monitor the status of the job, the returned job object exposes a link to its page in Azure ML Studio:

In [15]:
returned_job.services["Studio"].endpoint

'https://ml.azure.com/runs/quiet_library_zq0vncy8p6?wsid=/subscriptions/360f1ea9-ce0e-4441-ab8b-a18aae98809b/resourcegroups/eywa/workspaces/farmvibes-ai-dev&tid=72f988bf-86f1-41af-91ab-2d7cd011db47'

### Registering and exporting the model after training

Our code exports the model in the [Open Neural Network Exchange](https://onnx.ai/) (ONNX) format. ONNX is an open-source format that represents machine learning models, both deep learning and traditional ML. It is supported by many frameworks, tools, and hardware, which makes it easy to move models between different components. Exporting our trained model as an ONNX file allows us to load it and perform inference over new data under different hardware setups and even within the FarmVibes.AI platform. For additional resources, refer to the [ONNX](https://onnx.ai/get-started.html) or [PyTorch](https://pytorch.org/docs/master/onnx.html) documentation.

Once the training job completes, we will register the output ONNX file in AML:


In [None]:
run_model = Model(
    path=returned_job.outputs["onnx_model_path"].path,
    name=AML_MODEL_INFO["name"],
    version=AML_MODEL_INFO["version"],
    description="Exported ONNX model for crop segmentation",
    type="custom_model"
)

ml_client.models.create_or_update(run_model) 

Once registered, we can download it locally:

In [None]:
ml_client.models.download(AML_MODEL_INFO["name"], version=AML_MODEL_INFO["version"], download_path=AML_ROOT_DIR)

Let's use the ONNX checker to verify that the model was exported successfully:

In [20]:
onnx_output_path = os.path.join(AML_ROOT_DIR, 
                                AML_MODEL_INFO["name"], 
                                f"{AML_MODEL_INFO['name']}_{AML_MODEL_INFO['version']}.onnx"
                                )

onnx_model = onnx.load(onnx_output_path)
onnx.checker.check_model(onnx_model)

---------

### Next steps

With the model trained and exported into an ONNX file, we recommend continuing to the [Inference Notebook](./04_inference.ipynb) to see how the model can be used within a FarmVibes.AI cluster for segmenting new regions.
We also recommend checking the [Local Training Notebook](./03_local_training.ipynb) for an example of how to train the segmentation model locally.