# FarmVibes.AI Weed Detection

This notebook demonstrates how to run the weed detection workflow on a raster image collected from a drone or satellite. We assume the image is saved in a remote location such as [Azure File Storage](https://azure.microsoft.com/en-us/products/storage/files/#overview).


### Conda environment setup
Before running this notebook, create and activate a conda environment from `weed_detection.yaml`. If you do not have conda installed, follow the instructions in the [Conda User Guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html).

```
$ conda env create -f ./weed_detection.yaml
$ conda activate weed_detection
```

### Notebook outline
The code in this notebook downloads a remote raster image and generates shapefiles marking similar regions. This is useful for detecting portions of land affected by weeds. The workflow accepts a signed URL to a remote raster image, such as a [shared access signature](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) (SAS) for images stored in Azure, and a geometry for the area of interest. It downloads the raster, trains a Gaussian Mixture Model (GMM) to group similar regions, and outputs a zip archive containing one shapefile per region class.
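
To give a feel for the clustering step, here is a minimal sketch that fits a GMM to a random sample of pixels with scikit-learn's `GaussianMixture` and then labels every pixel. It assumes the per-pixel band values are the model's only features; the workflow's actual preprocessing, features, and defaults may differ, and `cluster_pixels` is a hypothetical helper, not part of FarmVibes.AI.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def cluster_pixels(image, clusters=4, samples=1000, seed=0):
    """Label each pixel of a (bands, rows, cols) array with one of `clusters` classes.

    Hypothetical helper for illustration; not the FarmVibes.AI implementation.
    """
    bands, rows, cols = image.shape
    # One row per pixel, one column per band
    pixels = image.reshape(bands, -1).T.astype(np.float64)
    # Fit on a random subset of pixels instead of the whole image
    rng = np.random.default_rng(seed)
    n = min(samples, pixels.shape[0])
    subset = pixels[rng.choice(pixels.shape[0], size=n, replace=False)]
    gmm = GaussianMixture(n_components=clusters, random_state=seed).fit(subset)
    # Predict a class for every pixel and restore the image layout
    return gmm.predict(pixels).reshape(rows, cols)


# Synthetic 3-band image: dark left half, bright right half, small noise
rng = np.random.default_rng(42)
image = np.zeros((3, 20, 20))
image[:, :, 10:] = 1.0
image += rng.normal(scale=0.05, size=image.shape)
labels = cluster_pixels(image, clusters=2, samples=200)
print(labels)
```

Fitting on a sample keeps the cost of training independent of the image size, while prediction still covers every pixel.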


Below are the main libraries used for this example and other useful links:
- [NumPy](https://github.com/numpy/numpy) is a Python package that provides a powerful N-dimensional array object, broadcasting functions, and useful linear algebra, Fourier transform, and random number capabilities.
- [pandas](https://github.com/pandas-dev/pandas) is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
- [rasterio](https://github.com/rasterio/rasterio) is a library for reading and writing geospatial raster data. It is used by TorchGeo and rioxarray and is a good option for reading and writing GeoTIFFs.
- [scikit-learn](https://github.com/scikit-learn/scikit-learn) is a Python package for machine learning built on top of SciPy. It provides simple and efficient tools for predictive data analysis.
- [Shapely](https://github.com/shapely/shapely) is a library for manipulating geometric shapes.

### Imports & API Client

In [None]:
import geopandas as gpd
from vibe_core.client import get_default_vibe_client
from vibe_core.data import ExternalReferenceList
from datetime import datetime
from shapely import geometry as shpg

client = get_default_vibe_client()

### Running the workflow
This workflow requires a URL for a raster image and a geometry defining the boundary of the region of interest within that raster. This cell reads the geometry from a shapefile and combines it with the URL to build the input object passed to the weed detection workflow.

In [None]:
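import hashlib

# Signed URL (e.g., an Azure SAS URL) to the input raster
url = "<SAS URL>"
# Path to a shapefile with the boundary of the region of interest
boundary_shape_file = ""

now = datetime.now()
# Reproject the boundary to WGS 84 longitude/latitude
data_frame = gpd.read_file(boundary_shape_file).to_crs("epsg:4326")
assert not data_frame.empty, "The boundary shapefile contains no geometries"
# Use the first geometry in the file as the region of interest
geometry = shpg.mapping(data_frame.geometry.iloc[0])
# Stable ID derived from the URL (the built-in hash() of a str changes between Python sessions)
url_hash = hashlib.sha256(url.encode()).hexdigest()

inputs = ExternalReferenceList(id=url_hash, time_range=(now, now), geometry=geometry, assets=[], urls=[url])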
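
If you do not have a boundary shapefile, a rectangular area of interest can also be described directly in longitude/latitude. The sketch below assumes a simple bounding box is acceptable for your field; the corner coordinates are hypothetical placeholders. `shpg.mapping` returns the same kind of GeoJSON-like dictionary that the cell above passes as `geometry`.

```python
from shapely import geometry as shpg

# Hypothetical bounding box in EPSG:4326 (longitude, latitude); replace with your field's extent
min_lon, min_lat, max_lon, max_lat = -122.15, 47.63, -122.13, 47.65
# box() builds a rectangular Polygon; mapping() converts it to a GeoJSON-like dict
geometry = shpg.mapping(shpg.box(min_lon, min_lat, max_lon, max_lat))
print(geometry["type"])
```

Note the coordinate order: GeoJSON-style geometries in EPSG:4326 list longitude (x) before latitude (y).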

In [None]:
run = client.run(workflow='farm_ai/agriculture/weed_detection', name="weed_detection_example", input_data=inputs)

### Optional Parameters
You can specify optional parameters to tailor the computational complexity or the result smoothing to your needs. Options are supplied via the `parameters` dictionary when running the workflow. These parameters are:
- samples: [int] The number of pixels to sample from the input image. Rather than building the model from every pixel in the image, the workflow randomly selects this many pixels to build it.
- buffer: [int] An offset from each edge of the raster that should be ignored by the pipeline.
- grid_size: [int] A tunable parameter for categorizing pixels into regions. It affects workflow speed and memory usage, not the output. Larger grids are generally faster until you reach the device’s memory limit.
- clusters: [int] The number of region classes you would like to identify within the raster. Pixels of the same class may or may not be colocated.
- sieve_size: [int] The cutoff for determining whether a region is too small to be included.
- simplify: [str] One of "none", "simplify", or "convex". The method used to simplify the borders of regions.
- tolerance: [float] The tolerance used in simplification. All parts of a simplified geometry will be no more than `tolerance` distance from the original. See `GeoSeries.simplify` for more documentation.
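
The three `simplify` options can be illustrated with Shapely on a single toy polygon. This sketch assumes the workflow's options behave like Shapely's `simplify` and `convex_hull`, in line with the `GeoSeries.simplify` reference above; the polygon and tolerance are made up. The region is a square with a tiny outward bump on its bottom edge (smaller than the tolerance) and a deep notch in its top edge (larger than the tolerance).

```python
from shapely.geometry import Point, Polygon

# Toy region: a 10 x 10 square with a 0.1-unit bump at the bottom and a 2-unit notch at the top
region = Polygon([(0, 0), (5, -0.1), (10, 0), (10, 10), (5, 8), (0, 10)])
tolerance = 0.25

options = {
    "none": region,                          # borders left as they are
    "simplify": region.simplify(tolerance),  # vertices within `tolerance` of the new border are dropped
    "convex": region.convex_hull,            # region replaced by its convex hull (notch filled in)
}
for name, shape in options.items():
    print(f"{name:>8}: {len(shape.exterior.coords) - 1} vertices, area {shape.area:.2f}")
```

Here "simplify" removes the small bump but keeps the notch, while "convex" keeps the bump and fills the notch, so the two options smooth borders in quite different ways.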



In [None]:
# Uncomment to run with configurable parameters
# run = client.run(
#     workflow='farm_ai/agriculture/weed_detection',
#     name="weed_detection_example",
#     input_data=inputs,
#     parameters={"buffer": -100, "grid_size": 250, "clusters": 4, "sieve_size": 1000, "tolerance": 0.25, "samples": 150000},
# )
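
Before launching a run, it can help to check the `parameters` dictionary locally. The helper below is hypothetical (it is not part of `vibe_core`); it only encodes the names, types, and `simplify` choices listed above and knows nothing about the workflow's defaults or valid ranges.

```python
INT_PARAMETERS = {"samples", "buffer", "grid_size", "clusters", "sieve_size"}
SIMPLIFY_CHOICES = {"none", "simplify", "convex"}


def build_weed_detection_parameters(**params):
    """Check optional weed detection parameters and return them as a new dict (hypothetical helper)."""
    unknown = set(params) - INT_PARAMETERS - {"simplify", "tolerance"}
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    for name in INT_PARAMETERS & set(params):
        value = params[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {value!r}")
    if "simplify" in params and params["simplify"] not in SIMPLIFY_CHOICES:
        raise ValueError(f"simplify must be one of {sorted(SIMPLIFY_CHOICES)}")
    if "tolerance" in params:
        params["tolerance"] = float(params["tolerance"])
    return dict(params)


parameters = build_weed_detection_parameters(clusters=4, sieve_size=1000, simplify="simplify", tolerance=0.25)
# run = client.run(workflow='farm_ai/agriculture/weed_detection', name="weed_detection_example",
#                  input_data=inputs, parameters=parameters)
```

Catching a typo such as `simplify="smooth"` this way fails fast in the notebook instead of after the run has been submitted.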


### Collecting results
The output of the workflow is a zip archive containing one shapefile for each cluster of pixels grouped by the GMM. This code extracts the archive path from the `DataVibeDict` returned by the workflow.
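
A shapefile is stored as several files that share a base name (at least `.shp`, `.shx`, and `.dbf`), so the archive holds more members than there are clusters. Here is a minimal standard-library sketch for listing only the `.shp` members; `list_cluster_shapefiles` is a hypothetical helper, and the sort is plain string order.

```python
from zipfile import ZipFile


def list_cluster_shapefiles(archive_path):
    """Return the names of the .shp members of a zip archive, sorted as strings."""
    with ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".shp"))


# Example, once `archive_path` has been set by the cell below:
# shapefiles = list_cluster_shapefiles(archive_path)
# print(len(shapefiles), "clusters:", shapefiles)
```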

In [None]:
run.monitor()
# Output is a DataVibeDict
output = run.output
# There was only one input raster to the weed detection workflow so there is only one DataVibe in the result
dv = output['result'][0]
# The DataVibe output by a weed detection workflow instance has only one asset
asset = dv.assets[0]
# Get the asset path containing the generated shape files
archive_path = asset.path_or_url

### Visualizing Output
We use matplotlib and geopandas to display the results of the workflow overlaid on the input raster. This gives a quick, high-level view of the workflow output. For a more detailed look, we recommend opening the generated files and the input raster in a geographic analysis platform such as [QGIS](https://www.qgis.org/en/site/), where you can zoom in and examine the results more closely.
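
To inspect the results in QGIS, it is convenient to unpack the archive first so that each `.shp` sits next to its sidecar files and can be added as a vector layer. A minimal standard-library sketch; the helper name and the default output directory are hypothetical.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_archive(archive_path, output_dir="weed_detection_output"):
    """Extract every member of the archive into output_dir and return the .shp paths found."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with ZipFile(archive_path) as archive:
        archive.extractall(out)
    return sorted(out.rglob("*.shp"))


# Example, once `archive_path` has been set:
# shapefile_paths = extract_archive(archive_path)
```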

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import rasterio
import rasterio.plot
from zipfile import ZipFile

# SAS URL or local path to the input raster
raster_file = ""

# Count the clusters: the archive holds one .shp file (plus sidecar files) per cluster
with ZipFile(archive_path) as archive:
    num_clusters = len([name for name in archive.namelist() if name.endswith(".shp")])

# One panel for the raster alone plus one panel per cluster, three panels per row
width = 3
height = num_clusters // width + 1
fig, axs = plt.subplots(height, width, figsize=(30, 30))
axes = axs.reshape(-1)

# Remove the axis labels from all subplots
for ax in axes:
    ax.axis("off")

# Open the raster image and display it in every panel that will be used
with rasterio.open(raster_file) as raster:
    for ax in axes[: num_clusters + 1]:
        rasterio.plot.show(raster, ax=ax)

# Read each cluster's shapefile directly from the zip archive and overlay it on its own panel
for cluster_num in range(num_clusters):
    filename = f"cluster{cluster_num}.shp"
    cluster = gpd.read_file(f"zip:///{archive_path}!{filename}")
    ax = axes[cluster_num + 1]
    cluster.plot(ax=ax, color=f"C{(9 - cluster_num) % 10}")
    ax.set_title(filename)
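
The layout above needs `num_clusters + 1` panels (the raster alone, then one per cluster), and `num_clusters // width + 1` rows is exactly enough: writing `num_clusters = q * width + r` with `0 <= r < width` gives `(num_clusters + 1) / width = q + (r + 1) / width`, whose ceiling is `q + 1`. A small sketch that checks this arithmetic; `grid_rows` is a hypothetical helper mirroring the `height` formula.

```python
import math


def grid_rows(num_clusters, width=3):
    """Rows needed for one raster-only panel plus one panel per cluster (same formula as `height`)."""
    return num_clusters // width + 1


# The integer formula agrees with ceil((num_clusters + 1) / width) for every count checked here
for n in range(50):
    assert grid_rows(n) == math.ceil((n + 1) / 3)

print(grid_rows(4))
```

For example, 4 clusters need 5 panels, which fit in 2 rows of 3 and leave one panel blank.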
