# Post-process for Entire District

## Description
Once you are happy with the post-processed results from the previous notebook, you can apply the same post-processing steps to the entire district of interest. To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

## Load Packages

In [None]:
%matplotlib inline
import os
import datacube
import warnings
import numpy as np
import geopandas as gpd
import xarray as xr
import rioxarray
from rasterio.enums import Resampling
from datacube.utils.cog import write_cog
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.plotting import rgb
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.coastal import get_coastlines
from skimage.morphology import binary_dilation,disk
from skimage.filters.rank import modal
from odc.algo import xr_reproject
import matplotlib.pyplot as plt
import subprocess
from matplotlib.colors import ListedColormap,BoundaryNorm
from matplotlib.patches import Patch

## Analysis parameters
* `prediction_map_path`: File path and name of the classification map for the entire district.
* `dict_map`: A dictionary map of class names corresponding to pixel values.
* `output_crs`: Coordinate reference system for output raster files.

In [None]:
prediction_map_path='Results/Land_cover_prediction_entire_district.tif'
dict_map={'Tree crops':11,'Field crops':12,'Forest plantations':21,'Grassland':31,
                 'Aquatic or regularly flooded herbaceous vegetation':41,'Water body':44,
                 'Settlements':51,'Bare soils':61,'Mangrove':70,'Mecrusse':71,
                'Broadleaved (Semi-) evergreen forest':72,'Broadleaved (Semi-) deciduous forest':74,'Mopane':75}
output_crs='epsg:32736' # WGS84/UTM Zone 36S
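
When inspecting pixel values later on, it can help to go from a value back to its class name. The following is a minimal sketch, assuming a shortened copy of `dict_map`; the names `dict_map_example` and `value_to_class` are hypothetical and not part of the original notebook:

```python
# shortened copy of dict_map, for illustration only
dict_map_example = {'Field crops': 12, 'Water body': 44, 'Settlements': 51}

# invert the mapping: pixel value -> class name
value_to_class = {value: name for name, value in dict_map_example.items()}

# look up the class name for pixel value 44
print(value_to_class[44])
```

The inversion is only safe because every class has a unique pixel value, which is the case for `dict_map` as defined above.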

## External Layers
The country boundary shapefile and a few external layers were sourced and prepared in the 'Data/' folder. They provide helpful information on specific classes, e.g. Settlements and Water Body. The data includes:
* `country_boundary_shp`: Country boundary shapefile.
* `hand_raster`: Hydrologically adjusted elevations, i.e. Height Above the Nearest Drainage (hand) derived from the [MERIT Hydro dataset](https://developers.google.com/earth-engine/datasets/catalog/MERIT_Hydro_v1_0_1#description).
* `river_network_shp`: OSM river network shapefile. The OSM layers were sourced from the [Humanitarian OpenStreetMap Team (HOT)](https://data.humdata.org/organization/hot) website.
* `road_network_shp`: OSM road network shapefile.
* `google_building_raster`: A rasterised layer of [Google Open Building polygons](https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_Research_open-buildings_v2_polygons), which consist of outlines of buildings derived from high-resolution 50 cm satellite imagery. As there are many polygons in the original vector layer, we rasterised the layer to 10 m resolution to reduce disk storage and memory required for processing.
* `wsf2019_raster`: 2019 [World Settlement Footprint (WSF) layer](https://gee-community-catalog.org/projects/wsf/), a 10m resolution binary mask outlining the extent of human settlements globally derived by means of 2019 multitemporal Sentinel-1 and Sentinel-2 imagery.

> Note: The data for this notebook has been prepared for you so that you can run through the demonstration. If you would like to apply it to your own project, you may need to spend some time sourcing the datasets and pre-processing them as needed, e.g. clipping to your study area, filtering, rasterisation or vectorisation. Alternatively, you can revise this notebook to suit your data format.

In [None]:
country_boundary_shp='Data/Mozambique_boundary.shp' # country boundary shapefile
river_network_shp='Data/hotosm_moz_waterways_lines_filtered.shp' # OSM river network data
road_network_shp='Data/hotosm_moz_roads_lines_filtered.shp' # OSM road network data
google_building_raster='Data/GoogleBuildingLayer_Mozambique_rasterised.tif' # rasterised Google building mask layer
hand_raster='Data/hand_Mozambique_UInt16.tif' # Hydrologically adjusted elevations, i.e. height above the nearest drainage (hand)
wsf2019_raster='Data/WSF2019_v1_Mozambique_clipped.tif' # 2019 World Settlement Footprint layer

## Load land cover map and external layers
First let's load the land cover map:

In [None]:
# load the 2021 land cover map as a 2D uint8 array (no reprojection is needed here)
lc_map=rioxarray.open_rasterio(prediction_map_path).astype(np.uint8).squeeze()

As the external raster layers cover the entire country, they can be too large to load into your Sandbox memory (especially if you are using the default instance). Therefore, in this notebook we first clip the layers to the extent of the district using GDAL commands: we create a shapefile of the extent of the predicted map using [`gdaltindex`](https://gdal.org/programs/gdaltindex.html), then clip the raster layers using [`gdalwarp`](https://gdal.org/programs/gdalwarp.html).

In [None]:
google_building_clipped=google_building_raster[:-4]+'_clipped.tif' # clipped Google building mask layer
hand_raster_clipped=hand_raster[:-4]+'_clipped.tif' # clipped hand layer
wsf2019_raster_clipped=wsf2019_raster[:-4]+'_clipped.tif' # clipped WSF 2019 layer
tile_shp='Results/Mozambique_district_extent.shp' # output region extents
subprocess.run(['gdaltindex',tile_shp,prediction_map_path])
subprocess.run(['gdalwarp','-cutline',tile_shp,'-crop_to_cutline', google_building_raster,google_building_clipped,'-overwrite'])
subprocess.run(['gdalwarp','-cutline',tile_shp,'-crop_to_cutline', hand_raster,hand_raster_clipped,'-overwrite'])
subprocess.run(['gdalwarp','-cutline',tile_shp,'-crop_to_cutline', wsf2019_raster,wsf2019_raster_clipped,'-overwrite'])
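
The three `gdalwarp` calls differ only in their input and output paths, and `subprocess.run` does not raise an error by default when a command fails. As a hedged sketch, the argument list could be built by a small helper (`build_clip_command` is a hypothetical name, not part of the original notebook) and run with `check=True` so that a failed clip stops the notebook:

```python
def build_clip_command(cutline_shp, src_raster, dst_raster):
    """Return the gdalwarp argument list that clips src_raster to cutline_shp."""
    return ['gdalwarp', '-cutline', cutline_shp, '-crop_to_cutline',
            src_raster, dst_raster, '-overwrite']

# build the command for the HAND layer, using the paths defined in this notebook
cmd = build_clip_command('Results/Mozambique_district_extent.shp',
                         'Data/hand_Mozambique_UInt16.tif',
                         'Data/hand_Mozambique_UInt16_clipped.tif')
print(' '.join(cmd))

# To execute the command and raise CalledProcessError on failure
# (requires GDAL to be installed and on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```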

We then load other layers. The OSM road network layer contains multi-lines with various surface attributes. We'll select some major road types and buffer them by 10 metres:

In [None]:
# import OSM road network data and reproject
road_network=gpd.read_file(road_network_shp).to_crs(output_crs) 
road_network=road_network.loc[road_network['surface'].isin(['asphalt', 'paved', 'compacted', 'cobblestone', 
                                                             'concrete', 'metal', 'paving_stones', 
                                                             'paving_stones:30'])] # select road network by attributes
road_network.geometry=road_network.geometry.buffer(10) # buffer the road network by 10m

Similarly we load and select major waterways from the OSM river network layer:

In [None]:
river_network=gpd.read_file(river_network_shp).to_crs(output_crs) # import OSM river network data and reproject
river_network=river_network.loc[river_network['waterway'].isin(['canal','river'])] # select river network by attribute

Query the coastline layer and select the 2021 shoreline (it is buffered later, when the rules are applied):

In [None]:
# import mozambique boundary and get bounding box
country_boundary=gpd.read_file(country_boundary_shp).to_crs(output_crs)
# load coastline layer for the country bounding box
shorelines_gdf = get_coastlines(country_boundary.bounds.iloc[0],crs=output_crs,layer='shorelines').to_crs(output_crs)
# select only 2021
shorelines_gdf_2021=shorelines_gdf[shorelines_gdf['year']=='2021'] 

## Morphological filtering and apply all rules
We now apply the filtering and all the rules as implemented in the previous notebook:

In [None]:
# convert to numpy array
np_lc_map=lc_map.squeeze().to_numpy()
# majority filtering for a smoother classification map
np_lc_map_postproc=modal(np_lc_map,footprint=disk(2),mask=np_lc_map!=0)
# get geobox
ds_geobox=lc_map.geobox
# load and reproject hand layer
hand=xr.open_dataset(hand_raster_clipped,engine="rasterio").squeeze()
hand=xr_reproject(hand, ds_geobox, resampling="average")
# convert to numpy array
np_hand=hand.to_array().squeeze().to_numpy()
# rasterise river network layer
river_network_mask=xr_rasterize(gdf=river_network,
                                  da=lc_map.squeeze(),
                                  transform=ds_geobox.transform,
                                  crs=output_crs)
# convert to numpy array
np_river_network_mask=river_network_mask.to_numpy()
# apply the rule
np_lc_map_postproc[((np_lc_map==dict_map['Water body'])&(np_hand<=45))
                   |(np_river_network_mask==1)]=dict_map['Water body']
# get bounding box
bbox=ds_geobox.extent.boundingbox
dc = datacube.Datacube(app='s2_geomedian')
query_geomedian= {
    'time': ('2021'),
    'x': (bbox[0],bbox[2]),
    'y': (bbox[1],bbox[3]),
    'resolution':(-10, 10),
    'crs':output_crs,
    'output_crs': output_crs,
    'measurements':['green','swir_1']
}
ds_geomedian = dc.load(product="gm_s2_annual", **query_geomedian)
ds_MNDWI = calculate_indices(ds=ds_geomedian, index='MNDWI', satellite_mission='s2',drop=True).squeeze()
# reassign water using MNDWI calculated from annual S2 geomedian
np_MNDWI=ds_MNDWI['MNDWI'].to_numpy()
np_lc_map_postproc[np_MNDWI>=0]=dict_map['Water body']

# load and reproject google buildings raster
google_buildings=xr.open_dataset(google_building_clipped,engine="rasterio").squeeze()
google_buildings_mask=xr_reproject(google_buildings, ds_geobox, resampling="average")
# convert to numpy array
np_google_buildings_mask=google_buildings_mask.to_array().squeeze().to_numpy()
# load and reproject WSF 2019 layer
wsf2019=xr.open_dataset(wsf2019_raster_clipped,engine="rasterio").squeeze()
wsf2019=xr_reproject(wsf2019, ds_geobox, resampling="nearest")
# convert to numpy array
np_wsf2019=wsf2019.to_array().squeeze().to_numpy()
# apply rule
np_lc_map_postproc[(np_google_buildings_mask==1)|(np_wsf2019==1)]=dict_map['Settlements']
# buffer Settlements areas
urban_buffered=binary_dilation(np_lc_map==dict_map['Settlements'],footprint=disk(5))
# apply rule
np_lc_map_postproc[(urban_buffered==1)&(np_lc_map==dict_map['Aquatic or regularly flooded herbaceous vegetation'])]=dict_map['Field crops']
# rasterise road network layer
road_network_mask=xr_rasterize(gdf=road_network,
                              da=lc_map.squeeze(),
                              transform=ds_geobox.transform,
                              crs=output_crs)
# convert to numpy
np_road_network_mask=road_network_mask.to_numpy()
# apply the rule
np_lc_map_postproc[np_road_network_mask==1]=dict_map['Settlements']


# buffer the 2021 shoreline by 50 km
shorelines_gdf_2021.geometry=shorelines_gdf_2021.geometry.buffer(50000) 
# rasterise shoreline layer
shorelines_2021_mask=xr_rasterize(gdf=shorelines_gdf_2021,da=lc_map.squeeze(),
                                  transform=ds_geobox.transform,crs=output_crs) 
# clip shoreline mask layers
shorelines_2021_mask_tile=xr_reproject(shorelines_2021_mask, ds_geobox, resampling="nearest") 


# convert to numpy
np_shorelines_2021_mask=shorelines_2021_mask_tile.squeeze().to_numpy()
# apply rule
np_lc_map_postproc[(np_shorelines_2021_mask==0)&(np_lc_map==dict_map['Mangrove'])]=dict_map['Forest plantations']

# reconstruct dataArray
lc_map_postproc=xr.DataArray(data=np_lc_map_postproc,dims=['y','x'],
                         coords={'y':lc_map.y.to_numpy(), 'x':lc_map.x.to_numpy()})
# set spatial reference
lc_map_postproc.rio.write_crs(output_crs, inplace=True)
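
Every rule in the cell above follows the same pattern: build a boolean mask from the initial map and the external layers, then assign a class value to the masked pixels of the filtered map. A minimal sketch of the water rule on toy 2x3 arrays (the array values are invented for illustration and stand in for `np_lc_map`, `np_hand` and `np_river_network_mask`):

```python
import numpy as np

WATER = 44  # 'Water body' value from dict_map

# toy stand-ins for the initial map, the HAND layer and the river mask
lc = np.array([[44, 12, 44],
               [12, 12, 51]], dtype=np.uint8)
hand = np.array([[10, 10, 80],
                 [10, 10, 10]])
river = np.array([[0, 0, 0],
                  [1, 0, 0]])

# pretend the mode filter replaced the water pixel at [0, 0] with 'Field crops'
post = lc.copy()
post[0, 0] = 12

# rule: initial water with HAND <= 45, or any river pixel, becomes water
post[((lc == WATER) & (hand <= 45)) | (river == 1)] = WATER
print(post)
```

Note that the rule only assigns water and never removes it: the pixel at `[0, 2]` has a high HAND value, so it is not selected by the mask and simply keeps whatever class the filter gave it.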

We can now compare the post-processed result with the initial prediction:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

colours = {11:'gold', 12:'yellow', 21:'darkred',31:'bisque',41:'lightgreen',44:'blue',51:'black',61:'gray',
           70:'red',71:'steelblue',72:'blueviolet',74:'green',75:'chocolate'}
patches_list=[Patch(facecolor=colour) for colour in colours.values()]

# set color legends and color maps parameters
prediction_values=np.unique(lc_map)
cmap=ListedColormap([colours[k] for k in prediction_values])
norm = BoundaryNorm(list(prediction_values)+[np.max(prediction_values)+1], cmap.N)

# Plot initial classified image
lc_map.plot.imshow(ax=axes[0], 
               cmap=cmap,
               norm=norm,
               add_labels=True, 
               add_colorbar=False,
               interpolation='none')

# Plot post-processed classified image
lc_map_postproc.plot.imshow(ax=axes[1], 
               cmap=cmap,
               norm=norm,
               add_labels=True, 
               add_colorbar=False,
               interpolation='none')

# Hide the y-axis on the right plot
axes[1].get_yaxis().set_visible(False)
# add colour legends
axes[1].legend(patches_list, list(dict_map.keys()),
    loc='upper center', ncol =4, bbox_to_anchor=(0.5, -0.1))
# Add plot titles
axes[0].set_title('Classified Image')
axes[1].set_title('Classified Image - Postprocessed')
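
Alongside the visual comparison, a pixel count can indicate how much the post-processing changed. This is a minimal sketch on invented toy arrays; in the notebook, `before` and `after` would correspond to `np_lc_map` and `np_lc_map_postproc`:

```python
import numpy as np

# toy stand-ins for the initial and post-processed maps
before = np.array([[44, 12, 12],
                   [12, 51, 12]], dtype=np.uint8)
after = np.array([[44, 12, 44],
                  [12, 51, 51]], dtype=np.uint8)

# number of pixels whose class changed
changed = int((before != after).sum())

# pixel count per class value in the post-processed map
values, counts = np.unique(after, return_counts=True)
class_counts = dict(zip(values.tolist(), counts.tolist()))
print(changed, class_counts)
```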

## Save as geotiff
We can now export our post-processed result to the Sandbox disk as a Cloud-Optimised GeoTIFF:

In [None]:
write_cog(lc_map_postproc, 'Results/Land_cover_prediction_postprocessed_entire_district.tif', overwrite=True)