# FarmVibes.AI Crop Segmentation - Dataset Generation

This notebook demonstrates how to generate a dataset for crop land segmentation with the FarmVibes.AI platform. The workflow outputs NDVI timeseries and [Crop Data Layer](https://data.nal.usda.gov/dataset/cropscape-cropland-data-layer#:~:text=The%20Cropland%20Data%20Layer%20%28CDL%29%2C%20hosted%20on%20CropScape%2C,as%20well%20as%20boundary%2C%20water%20and%20road%20layers.) (CDL) maps that are used for training a segmentation model in the following notebooks in this repository.

As provided, this notebook retrieves and preprocesses a region of ~5,000 km² over a 1-year period. **We recommend having at least 500 GB of disk space available. The workflow may take multiple days to run, depending on the number of workers and your VM spec.**
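To put the disk recommendation in perspective, the following back-of-envelope sketch estimates the size of the daily NDVI stack alone. It assumes uncompressed 32-bit values and one raster per day covering exactly the region area; both are illustrative assumptions, not properties confirmed by the workflow. Intermediate Sentinel-2 and SpaceEye products are not counted.

```python
# Rough size estimate for the daily NDVI rasters only (assumptions flagged).
from datetime import date

AREA_KM2 = 5_000        # region size quoted in this notebook
PIXEL_SIZE_M = 10       # NDVI resolution produced by the workflow
BYTES_PER_PIXEL = 4     # ASSUMPTION: float32 storage without compression


def ndvi_stack_bytes(area_km2, pixel_size_m, n_days, bytes_per_pixel):
    """Bytes needed for one raster per day over the given area."""
    pixels_per_raster = area_km2 * 1_000_000 / (pixel_size_m ** 2)
    return pixels_per_raster * n_days * bytes_per_pixel


# Inclusive day count for the 2020 time range used later in this notebook.
n_days = (date(2020, 12, 31) - date(2020, 1, 1)).days + 1
total_gb = ndvi_stack_bytes(AREA_KM2, PIXEL_SIZE_M, n_days, BYTES_PER_PIXEL) / 1e9
print(f"{n_days} daily rasters, roughly {total_gb:.1f} GB uncompressed")
```

Under these assumptions the NDVI output is only a fraction of the recommended space; the remainder leaves room for the intermediate products and the cache.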


### Conda environment setup
Before running this notebook, let's build a conda environment. If you do not have conda installed, please follow the instructions from [Conda User Guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html). 

```
$ conda env create -f ./crop_env.yaml
$ conda activate crop-seg
```

### Notebook outline
The user provides a geographical region and a date range of interest, which are used as input to a FarmVibes.AI workflow that generates the dataset for this task. The workflow downloads and preprocesses Sentinel-2 data, runs SpaceEye to obtain cloud-free imagery, and computes daily NDVI values at 10m resolution. It also downloads CDL maps for the same time frame at 30m resolution and upsamples them to 10m resolution via nearest-neighbor interpolation so that they can be used as ground-truth labels.
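The two per-pixel operations mentioned above can be illustrated with a minimal pure-Python sketch. It is not the workflow's implementation: the function names and the small grid of class codes are hypothetical, and the real workflow operates on georeferenced rasters rather than nested lists.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (nearest neighbor)."""
    out = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


# Hypothetical 2x2 patch of class codes at 30m resolution.
labels_30m = [[1, 5],
              [24, 1]]

# A factor of 3 turns each 30m cell into a 3x3 block of 10m cells.
labels_10m = upsample_nearest(labels_30m, 3)
print(len(labels_10m), len(labels_10m[0]))
print(round(ndvi(nir=0.5, red=0.1), 3))
```

Nearest-neighbor interpolation suits categorical labels because it only copies existing class codes; averaging-based methods would produce values that correspond to no class.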

In this notebook, we will:
- Instantiate the FarmVibes.AI client
- Load the input geometry and time range from which the workflow will create the dataset
- Run the dataset generation workflow

--------

### Imports & Constants

In [1]:
# Utility imports
from datetime import datetime
from shapely import wkt

# FarmVibes.AI imports
from vibe_core.client import get_default_vibe_client

# FarmVibes.AI workflow name and description
WORKFLOW_NAME = "ml/dataset_generation/datagen_crop_segmentation"
RUN_NAME = "dataset generation for crop segmentation task"

### Generate the dataset with FarmVibes.AI platform

Let's define the region and the time range to consider for this task:
- **Region:** the FarmVibes.AI platform expects a `.wkt` file containing the polygon of the region of interest (ROI); an example `input_region.wkt` is already provided.
- **Time Range:** we define the range as a tuple of two datetimes (start and end dates).

In [2]:
input_geometry_path = "./input_region.wkt"
time_range = (datetime(2020, 1, 1), datetime(2020, 12, 31))

# Reading the geometry file 
with open(input_geometry_path) as f:
    geometry = wkt.load(f)
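If you want to try a different region, a WKT polygon is plain text and can be written by hand or generated. The sketch below builds a rectangular polygon from a bounding box and sanity-checks a time range using only the standard library. The helper names and the coordinates are hypothetical, and the longitude-latitude ordering is an assumption you should verify against the provided `input_region.wkt`.

```python
from datetime import datetime


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed rectangular WKT polygon (ASSUMES lon lat ordering)."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # WKT rings must end where they start
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


def check_time_range(time_range):
    """Validate a (start, end) tuple and return the number of days it spans."""
    start, end = time_range
    if start >= end:
        raise ValueError("start date must be earlier than end date")
    return (end - start).days


# HYPOTHETICAL coordinates, for illustration only.
example_wkt = bbox_to_wkt(-119.5, 46.5, -119.0, 47.0)
span_days = check_time_range((datetime(2020, 1, 1), datetime(2020, 12, 31)))
print(example_wkt)
print(span_days)
```

Writing `example_wkt` to a file and pointing `input_geometry_path` at it would be one way to swap in a custom region, keeping in mind that larger regions and longer ranges increase run time and disk usage.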

For the crop segmentation task, we will run the `ml/dataset_generation/datagen_crop_segmentation` workflow.

To build the dataset, we will instantiate the FarmVibes.AI remote client and run the workflow:

In [3]:
# Instantiate the client
client = get_default_vibe_client()

In [4]:
client.document_workflow(WORKFLOW_NAME)

In [5]:
# Run the workflow
wf_run = client.run(WORKFLOW_NAME, RUN_NAME, geometry=geometry, time_range=time_range)

`wf_run` is a `VibeWorkflowRun` that holds the information about the workflow execution. A few of its important attributes:
- `wf_run.id`: the ID of the run
- `wf_run.status`: indicates the status of the run (pending, running, failed, or done)
- `wf_run.workflow`: the name of the workflow being executed (i.e., `WORKFLOW_NAME`)
- `wf_run.name`: the description provided by `RUN_NAME`
- `wf_run.output`: the dictionary with outputs produced by the workflow, indexed by sink names

To retrieve a previous workflow run, use `client.list_runs()` to list all existing executions and find the ID of the desired run, then recover it with `wf_run = client.get_run_by_id("ID-of-the-run")`.

We can also use the `monitor` method of `VibeWorkflowRun` to track the progress of each op and inner workflow in our run.

In [6]:
wf_run.monitor()
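Because this workflow may run for days, an unattended script may prefer to poll the run status instead of keeping an interactive monitor open. The following is a minimal sketch of such a loop, not part of the FarmVibes.AI API: `wait_for_terminal_status` is a hypothetical helper, and it assumes you supply a `get_status` callable that returns one of the status names listed above as a lowercase string.

```python
import time

# Terminal states, taken from the status values listed in this notebook.
TERMINAL_STATES = {"done", "failed"}


def wait_for_terminal_status(get_status, poll_seconds=60.0, max_polls=None,
                             sleep=time.sleep):
    """Call `get_status` until it returns a terminal state, then return it."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"still '{status}' after {polls} polls")
        sleep(poll_seconds)


# Stand-in status source so the sketch runs without a cluster.
fake_statuses = iter(["pending", "running", "running", "done"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

In a real script, `get_status` would wrap a fresh lookup of the run (for example through `client.get_run_by_id`) and convert its status to the string form expected here; how the status is represented depends on your installed version, so adapt that conversion accordingly.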

Once finished, we can access the generated outputs through `wf_run.output`.

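The outputs of the dataset generation workflow are indexed by sink name; the two used in this tutorial are `cdl` (the CDL maps) and `ndvi` (the NDVI rasters).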

To access a specific output, we can do:

In [7]:
cdl_rasters = wf_run.output["cdl"]
ndvi_rasters = wf_run.output["ndvi"]
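Before moving on to training, it can be useful to confirm that the expected sinks are present and see how many items each one holds. The helper below is a hypothetical sketch that works on any dictionary of sequences; it assumes each sink maps to a list-like collection, and it is demonstrated with placeholder strings rather than real rasters.

```python
def summarize_outputs(outputs, expected=("cdl", "ndvi")):
    """Check that the expected sinks exist and count the items in every sink."""
    missing = [name for name in expected if name not in outputs]
    if missing:
        raise KeyError(f"missing sinks: {missing}")
    return {name: len(items) for name, items in outputs.items()}


# Placeholder data standing in for `wf_run.output` (ASSUMED to map sink
# names to list-like collections).
fake_output = {
    "cdl": ["cdl_2020"],
    "ndvi": [f"ndvi_day_{i}" for i in range(3)],
}
summary = summarize_outputs(fake_output)
print(summary)
```

With a finished run, calling `summarize_outputs(wf_run.output)` would be the analogous check, provided the assumption about list-like sinks holds.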

---------

### Next steps

Now that we have run the workflow, future runs will retrieve the cached results, which makes experimenting with the data easier and faster.
With the dataset generated, we recommend the following notebooks:
- [Visualization Notebook](./02_visualize_dataset.ipynb) shows the intermediate outputs of the workflow, exploring how Sentinel-2 data is processed into the NDVI rasters
- [Local Training Notebook](./03_local_training.ipynb) shows how we can perform a local training with the data generated by FarmVibes.AI
- [AML Training Notebook](./03_aml_training.ipynb) leverages the computing capabilities of Azure Machine Learning to train the segmentation model