## Accessing Deltares global flood data on Azure

[Deltares](https://www.deltares.nl/en/) has produced a series of global inundation maps of flood depth using a geographic information systems (GIS)-based inundation model that takes into account water level attenuation and is forced by sea level. Multiple datasets were created using various digital elevation models (DEMs) at multiple resolutions under two different sea level rise (SLR) conditions: current (2018) and 2050. 

This notebook provides an example of accessing global flood data from blob storage on Azure.

This dataset is stored in the West Europe Azure region, so this notebook will run most efficiently on Azure compute located in the same region. If you are using this data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Complete documentation for this dataset is available at https://aka.ms/ai4edata-deltares-gfm.

### Environment setup

In [None]:
# Install a conda package in the current Jupyter kernel
# import sys
# !conda install --yes --prefix {sys.prefix} openpyxl

In [None]:
import math
import warnings

warnings.simplefilter("ignore")

import fsspec
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Not used directly, but needs to be installed to read NetCDF files with xarray
import h5py
import h5netcdf

from dask.distributed import Client

from shapely.geometry import shape
import geopandas as gpd
import pandas as pd
import openpyxl

### Define an area of interest

The data has a resolution of 90 m (3 arcseconds) at global scale, but it is most relevant in coastal areas. Let's zoom in on a flood-prone region of South Africa (the City of Cape Town) by defining a bounding box and clipping our xarray dataset.

In [None]:
coct_geojson = {
    "type": "Polygon",
    "coordinates": [
        [
            [-33.459480262765084, 18.44984306000178],
            [-34.0349865125429, 18.287794733823475],
            [-34.36664252903477, 18.491041786996263],
            [-34.075946082570105, 19.012892328926387]
        ]
    ],
}

poly = shape(coct_geojson)
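# The vertices above are listed as (latitude, longitude), so the bounds come back as
# (min lat, min lon, max lat, max lon)
miny, minx, maxy, maxx = poly.bounds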
print("AoI bounds:", poly.bounds)
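
The ring above lists each vertex as (latitude, longitude), whereas GeoJSON normally uses (longitude, latitude). This is why the first bound is unpacked into `miny`. A minimal sketch using only the standard library and the same four vertices shows the unpacking; the helper name `bounds_lat_first` is hypothetical and not part of shapely.

```python
# Vertices copied from the cell above, in (latitude, longitude) order.
ring = [
    (-33.459480262765084, 18.44984306000178),
    (-34.0349865125429, 18.287794733823475),
    (-34.36664252903477, 18.491041786996263),
    (-34.075946082570105, 19.012892328926387),
]


def bounds_lat_first(vertices):
    """Return (min_lat, min_lon, max_lat, max_lon) for (lat, lon) vertices."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return min(lats), min(lons), max(lats), max(lons)


lat_min, lon_min, lat_max, lon_max = bounds_lat_first(ring)
print(f"latitude:  {lat_min:.3f} to {lat_max:.3f}")
print(f"longitude: {lon_min:.3f} to {lon_max:.3f}")
```

These four values are what the notebook later passes to `ds.sel` as the latitude and longitude slices.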

### Create a local Dask cluster

Enable parallel reads and processing of data using Dask and xarray.

In [None]:
client = Client(processes=False)
print(f"/proxy/{client.scheduler_info()['services']['dashboard']}/status")

### File access

The entire dataset is made up of several dozen individual netCDF files, each representing an entire global inundation map, but derived from a different source DEM, sea level rise condition, or return period. A return period expresses how likely a flood of a given magnitude is: a "100 year flood", for example, has a 1-in-100 chance of being equalled or exceeded in any given year.
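
As a worked example of the return period concept, the sketch below converts a return period into an annual exceedance probability and into the chance of seeing at least one such flood over a planning horizon. It assumes that years are statistically independent, which is a simplification, and both function names are hypothetical. The zero return period that also appears in this dataset is not covered by the formula.

```python
def annual_exceedance_probability(return_period_years):
    """Chance that the flood level is equalled or exceeded in any single year."""
    if return_period_years < 1:
        raise ValueError("return period must be at least 1 year")
    return 1.0 / return_period_years


def chance_within_horizon(return_period_years, horizon_years):
    """Chance of at least one exceedance over the horizon, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


for rp in (2, 5, 10, 25, 100, 250):
    print(
        f"{rp:>3}-year flood: {annual_exceedance_probability(rp):.3f} per year, "
        f"{chance_within_horizon(rp, 30):.3f} over 30 years"
    )
```

Under this assumption, a 100 year flood has roughly a one-in-four chance of occurring at least once in 30 years.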

We load the inundation data produced from the 90 m NASADEM and MERITDEM elevation models at return periods of 0, 2, 5, 10, 25, 100, and 250 years, under both 2018 and 2050 sea level rise conditions.

In [None]:
# We define the set of parameters to iterate over

dem_source_list = ["NASADEM", "MERITDEM"]
return_period_list = ["0000", "0002", "0005", "0010", "0025", "0100", "0250"]

In [None]:
# We now import the reference grid shapefile

grid = gpd.read_file("grid_reference_500.shp")

In [None]:
# And we convert it to the same coordinate reference system as the input flood map (EPSG:4326)
grid_coord = grid.to_crs(4326)

# We also add the area of each pixel and clean the resulting data frame
# (areas are in squared degrees because the CRS is geographic; only area ratios are used below)
grid_coord['pixel_area'] = grid_coord.geometry.area
grid_coord = grid_coord.loc[:, ['ID', 'geometry', 'pixel_area']]

### Geographic treatment

We first convert the imported flood data to the same format as CoCT's grid.
Then, for each grid pixel, we compute the share of its area that is flood-prone and a flood depth, taken as the average depth of the flood cells intersecting the pixel, weighted by their intersection area.
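
To make these two quantities concrete, here is a toy calculation in plain Python. All numbers are invented for illustration: two grid pixels of area 100 (arbitrary units), intersected by three hypothetical flood squares. The notebook does the same aggregation with a geopandas overlay and pandas `groupby`.

```python
from collections import defaultdict

# Hypothetical overlay rows: (pixel ID, pixel area, intersection area, flood depth in m).
overlay_rows = [
    ("A", 100.0, 20.0, 0.5),
    ("A", 100.0, 30.0, 1.5),
    ("B", 100.0, 10.0, 2.0),
]

flooded_area = defaultdict(float)    # sum of intersection areas per pixel
weighted_depth = defaultdict(float)  # sum of depth * intersection area per pixel
pixel_area = {}

for pixel_id, area, inter_area, depth in overlay_rows:
    flooded_area[pixel_id] += inter_area
    weighted_depth[pixel_id] += depth * inter_area
    pixel_area[pixel_id] = area

summary = {}
for pixel_id in flooded_area:
    # Share of the pixel that is flood-prone, capped at 1
    share = min(flooded_area[pixel_id] / pixel_area[pixel_id], 1.0)
    # Area-weighted average depth over the flooded part of the pixel
    depth = weighted_depth[pixel_id] / flooded_area[pixel_id]
    summary[pixel_id] = (share, depth)
    print(pixel_id, f"share={share:.2f}", f"depth={depth:.2f} m")
```

Pixel A is half flooded with an average depth of 1.1 m, because the deeper square covers more of the pixel than the shallower one.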

In [None]:
def make_url(year, dem_source, return_period):
    root = (
        "https://deltaresfloodssa.blob.core.windows.net/floods/v2021.06"
    )
    path = f"{root}/global/{dem_source}/90m"
    file_name = f"GFM_global_{dem_source}90m_{year}slr_rp{return_period}_masked.nc"

    return f"{path}/{file_name}"
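
As a quick illustration of the naming scheme, the sketch below lists every URL that the parameter combinations in this notebook produce. It repeats the `make_url` template so that it runs on its own, and it only builds strings: it does not check that each file actually exists in the container.

```python
from itertools import product


def make_url(year, dem_source, return_period):
    # Same template as the notebook's make_url
    root = "https://deltaresfloodssa.blob.core.windows.net/floods/v2021.06"
    path = f"{root}/global/{dem_source}/90m"
    file_name = f"GFM_global_{dem_source}90m_{year}slr_rp{return_period}_masked.nc"
    return f"{path}/{file_name}"


years = [2018, 2050]
dem_sources = ["NASADEM", "MERITDEM"]
return_periods = ["0000", "0002", "0005", "0010", "0025", "0100", "0250"]

# One URL per (year, DEM, return period) combination
urls = [make_url(y, d, rp) for y, d, rp in product(years, dem_sources, return_periods)]
print(len(urls), "files")
print(urls[0])
```

With two years, two DEM sources, and seven return periods, the loop at the end of the notebook processes 28 files.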

In [None]:
def geo_treatment(dem_source, return_period, flooded, grid_coord, year):

    # We keep only relevant data from source

    flooded_inun = flooded['inun']
    da = flooded_inun.to_dataframe().reset_index()

    # We convert associated points to a GeoDataFrame for geographic treatment

    gdf = gpd.GeoDataFrame(
        da.inun, geometry=gpd.points_from_xy(da.lon,da.lat), crs="EPSG:4326")

    # We clean the output

    gdf = gdf.loc[~np.isnan(gdf['inun'])]

    # We create a square buffer around our centroids to approximate the 90m cells of the flood map
    # Note that the buffer distance is half the side of the square: a distance of 0.03*(1/60) degrees
    # yields squares that are 0.001 degrees (3.6'') wide
    gdf_buffer = gdf.buffer(0.03*(1/60), cap_style=3)

    # We plot the output for validation
    fig, ax1 = plt.subplots()
    gdf_buffer.boundary.plot(ax=ax1, color = 'slategrey')
    gdf.plot(ax = ax1, color = 'red')

    # We update the geometry of our GeoDataFrame with the buffer polygons
    # (working on a copy so that the original points are left untouched)

    gdf_new = gdf.copy()
    gdf_new['geometry'] = gdf_buffer

    # We save the associated shapefile

    gdf_new.to_file(dem_source + '_' + return_period + '.shp')

    # We get the intersection of the two data frames and get the associated area
    grid_overlay = gpd.overlay(gdf_new, grid_coord)
    grid_overlay['inter_area'] = grid_overlay.geometry.area

    # We plot the output for validation
    fig, ax1 = plt.subplots()
    grid_coord.boundary.plot(ax = ax1, color = 'slategrey')
    grid_overlay.boundary.plot(ax=ax1, color = 'red')

    # Note that the grid perimeter is a bit wider than CoCT's territorial extent, hence some coastal
    # points appear to be inland: this is not a problem as sea points have zero land availability
    # in the model

    # We first compute the proportion of flood-prone area in each pixel
    sum_flood_area = grid_overlay.groupby('ID')['inter_area'].agg('sum')
    pixel_area = grid_overlay.groupby('ID')['pixel_area'].agg('mean')
    prop_flood_prone = pd.Series(sum_flood_area / pixel_area, name='prop_flood_prone')
    # The following correction only applies to two pixels (out of 401)
    prop_flood_prone[prop_flood_prone > 1] = 1

    # We then get the flood depth per pixel as a weighted average of flood depth levels in each
    # pixel intersections with original flood data
    grid_overlay['w_inun'] = grid_overlay['inun'] * grid_overlay['inter_area']
    sum_w_inun = grid_overlay.groupby('ID')['w_inun'].agg('sum')
    flood_depth = pd.Series(sum_w_inun / sum_flood_area, name='flood_depth')

    output = pd.merge(flood_depth, prop_flood_prone, on='ID')
    print(output)

    # We merge back the output with CoCT's grid to cover the full extent of the city
    # (not only the flooded pixels)
    
    slr = ""
    if year == 2050:
        slr = "1"
    elif year == 2018:
        slr = "0"
    
    result = pd.merge(grid_coord, output, left_on='ID', right_index=True, how='outer')
    result.to_file("C_" + dem_source + "_" + slr + "_" + return_period + ".shp")

    # We also export the results to the same format as for FATHOM data

    # Pixels without flooding get a flood depth and a flood-prone share of 0
    result_export = result.loc[:, ['flood_depth', 'prop_flood_prone']].fillna(0)
    result_export.to_excel("C_" + dem_source + "_" + slr + "_" + return_period + ".xlsx", index=False)
    
    return result
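
As a rough check on the buffer size used in `geo_treatment`, the sketch below converts angular distances to metres near Cape Town. It assumes a spherical Earth with a radius of 6,371 km and a latitude of 34°S; both are approximations, and the helper `degrees_to_metres` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean spherical radius
LATITUDE_DEG = -34.0          # assumption: roughly the latitude of Cape Town


def degrees_to_metres(angle_deg, latitude_deg):
    """Return (north_south, east_west) ground lengths in metres for an angle in degrees."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = angle_deg * metres_per_degree
    # Meridians converge towards the poles, so east-west lengths shrink with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# One 3-arcsecond cell of the flood map
cell_ns, cell_ew = degrees_to_metres(3 / 3600, LATITUDE_DEG)
# Square produced by buffer(0.03 * (1 / 60)): its side is twice the buffer distance
side_ns, side_ew = degrees_to_metres(2 * 0.03 / 60, LATITUDE_DEG)

print(f"3'' cell:      {cell_ns:.0f} m x {cell_ew:.0f} m")
print(f"buffer square: {side_ns:.0f} m x {side_ew:.0f} m")
```

Under these assumptions the buffered squares come out somewhat wider than one cell, so neighbouring squares overlap slightly. This may be one reason why a few pixels need the cap at 1 on `prop_flood_prone`.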

### Loop over parameters

In [None]:
results = []
i = 0

for year in [2018, 2050]:
    for dem_source in dem_source_list:
        for return_period in return_period_list:
        
            with fsspec.open(make_url(year, dem_source, return_period)) as f:
                ds = xr.open_dataset(f, chunks={"lat": 5000, "lon": 5000})
                ds_coct = ds.sel(lat=slice(miny, maxy), lon=slice(minx, maxx))
                # Select only flooded area
                flooded = ds_coct.where(ds_coct.inun > 0).isel(time=0).compute()
            result = geo_treatment(dem_source, return_period, flooded, grid_coord, year)
            results.append(result)

            i = i + 1