# Water Indices - Sentinel-2 Full Resolution

This notebook examines how well water indices such as NDWI can detect flooding, specifically in Dakar. The default resolution is 10m, expressed in degrees as `(-0.00009, 0.00009)`, but it can be changed. Sentinel-2 has the highest resolution of the datasets used here: 10m for the red, green, blue, and nir bands and 20m for the other bands loaded (swir_1 and swir_2).

The hope is that a resolution higher than the 30m used in the [Water Indices notebook](../water_inds.ipynb) may allow these water indices to detect the flooding.
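As a rough check on the resolution quoted above, the sketch below converts a pixel size in degrees to metres. It assumes a spherical Earth with about 111,320 m per degree of latitude and a centre latitude of 14.75 for the study area; the constant `METERS_PER_DEGREE` and the helper `pixel_size_m` are illustrative and not part of the notebook.

```python
import math

# Approximate length of one degree of latitude in metres. This is an
# assumed spherical-Earth constant, not a value taken from the notebook.
METERS_PER_DEGREE = 111_320.0

def pixel_size_m(res_deg, latitude_deg):
    """Return the approximate (north-south, east-west) pixel size in metres."""
    north_south = res_deg * METERS_PER_DEGREE
    # Meridians converge towards the poles, so east-west spacing shrinks
    # with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

# 14.75 is roughly the centre latitude of the Dakar study area (assumed).
ns_m, ew_m = pixel_size_m(0.00009, 14.75)
print(f"north-south: {ns_m:.2f} m, east-west: {ew_m:.2f} m")
```

Under these assumptions the north-south spacing is close to 10m and the east-west spacing is slightly smaller, so 0.00009 degrees is a reasonable stand-in for the native 10m bands.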

Images are exported to an `images` directory within this directory.

The only image created is `haz_map_water_median_minus_median_fig.png`, which shows the hazard map alongside the differences between the medians of the water indices for the flood dates and the medians of the same water indices over 2009-2019.

# Index

* Import dependencies, setup Dask client, and connect to the data cube
* Load flood hazard data from World Bank
* Show area to load data for
* Load geospatial data
    * Sentinel-2
* Merge data
* Mask out the ocean and lakes and obtain the flood hazard map as an xarray
* Show medians of flood dates water indices minus medians for 2009-2019

## Import dependencies, setup Dask client, and connect to the data cube

In [2]:
from collections import ChainMap

import matplotlib.pyplot as plt
import geopandas as gpd
import xarray as xr
import pandas as pd
import numpy as np
import joblib
import os

import sys
sys.path.append('../..')
from utils.ceos_utils.dc_display_map import display_map
from utils.deafrica_utils.deafrica_bandindices import \
    calculate_indices
from utils.deafrica_utils.deafrica_datahandling import load_ard

import datacube
dc = datacube.Datacube()

In [3]:
# from utils.ceos_utils.dask import create_local_dask_cluster
# client = create_local_dask_cluster()

In [64]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:46759")
client

Client   Scheduler: tcp://127.0.0.1:46759  Dashboard: /user/jcrattz/proxy/34655/status
Cluster  Workers: 31  Cores: 62  Memory: 515.40 GB


## Load flood hazard data from World Bank

In [65]:
dakar_flood_hazard = gpd.read_file('../../floodareas/eo4sd_dakar_fhazard_2018/EO4SD_DAKAR_FHAZARD_2018.shp')

**Remove records with no geometry data**

In [66]:
# Keep only rows whose geometry is present; this does not rely on the index labels being 0..n-1.
dakar_flood_hazard = dakar_flood_hazard[dakar_flood_hazard.geometry.notna()]

**Change the CRS to EPSG:4326**

In [67]:
dakar_flood_hazard = dakar_flood_hazard.to_crs("EPSG:4326")

**Get the bounding box of the data**

In [68]:
dakar_bounds = dakar_flood_hazard.bounds
min_lon = dakar_bounds.minx.min()
max_lon = dakar_bounds.maxx.max()
min_lat = dakar_bounds.miny.min()
max_lat = dakar_bounds.maxy.max()
lat = (min_lat, max_lat)
lon = (min_lon, max_lon)
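The bounding-box step above can be sketched without geopandas as the union of per-polygon bounds. The coordinates below are invented for illustration and are not read from the shapefile.

```python
# Each tuple is (minx, miny, maxx, maxy) for one polygon; the numbers are
# made up for illustration and are not taken from the shapefile.
polygon_bounds = [
    (-17.50, 14.70, -17.45, 14.74),
    (-17.47, 14.66, -17.40, 14.72),
    (-17.30, 14.78, -17.25, 14.80),
]

# The overall box takes the smallest minimum and the largest maximum
# along each axis.
min_lon = min(b[0] for b in polygon_bounds)
min_lat = min(b[1] for b in polygon_bounds)
max_lon = max(b[2] for b in polygon_bounds)
max_lat = max(b[3] for b in polygon_bounds)

lat = (min_lat, max_lat)
lon = (min_lon, max_lon)
print(lat, lon)
```

geopandas also exposes `total_bounds` on a `GeoDataFrame`, which returns the same four values in one call. Note that the next cell assigns `lat` and `lon` again, so the hard-coded "Full" extent is what is actually loaded.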

## Show area to load data for

In [69]:
## Dakar, Senegal
# Small test
# lat = (14.8270, 14.8422)
# lon = (-17.2576, -17.2172)
# Citizen Science Study Area
# lat = (14.7711, 14.7993)
# lon = (-17.3706, -17.3366)
# Tip
# lat = (14.6433, 14.7892)
# lon = (-17.5408, -17.4158)
# Full
lat = (14.6285, 14.8725)
lon = (-17.5348, -17.2068)

## Coast of Senegal
# North
# lat = (14.3559, 16.0974)
# lon = (-17.5683, -16.4543)
# Full
# lat = (12.3016, 16.1810)
# lon = (-17.8198, -16.3257)

In [70]:
display_map(lat, lon)

## Load geospatial data

**Specify time range and common load parameters**

In [71]:
years = range(2009, 2020) # (inclusive, exclusive)
time_range = [f"{years[0]}-01-01", f"{years[-1]}-12-31"]

### Flood Times ###

## EO4SD Hazard Map Flood Times ##

# The actual acquisition dates are listed below. The ranges in
# `eo4sd_hazard_map_time_ranges` are widened around each date
# to pick up more data where some may be missing.
# Landsat 5: [2009-10-22, 2010-10-25, 2011-10-12]
# Landsat 8: [2013-10-01, 2014-11-21, 2015-11-08]
# Sentinel-2: [2016-10-30, 2017-10-10, 2018-10-15]
eo4sd_hazard_map_times = np.array([
    "2009-10-22", "2010-10-25", "2011-10-12",
    "2013-10-01", "2014-11-21", "2015-11-08",
    "2016-10-30", "2017-10-10", "2018-10-15"
])

eo4sd_hazard_map_time_ranges = \
[("2009-10-21", "2009-10-24"), ("2010-10-24", "2010-10-27"),
 ("2011-10-11", "2011-10-14"), ("2013-09-30", "2013-10-03"),
 ("2014-11-20", "2014-11-23"), ("2015-11-07", "2015-11-10"),
 ("2016-10-29", "2016-11-01"), ("2017-10-09", "2017-10-13"),
 ("2018-10-14", "2018-10-17")]
# [("2009-10-22", "2009-10-23"), ("2010-10-25", "2010-10-26"),
#  ("2011-10-12", "2011-10-13"), ("2013-10-01", "2013-10-02"),
#  ("2014-11-21", "2014-11-22"), ("2015-11-08", "2015-11-09"),
#  ("2016-10-30", "2016-11-01"), ("2017-10-10", "2017-10-13"),
#  ("2018-10-15", "2018-10-16")]

## End EO4SD Hazard Map Flood Times ##

## Citizen Science Data Times ##

cit_sci_times = \
[]
# [("2009-03-01", "2009-03-30"), ("2009-10-11", "2009-10-18"),
#  ("2012-03-01", "2012-03-15"), ("2012-08-01", "2012-09-30")]

## End Citizen Science Data Times ##

time_ranges_floods = sorted(eo4sd_hazard_map_time_ranges + cit_sci_times)

### End Flood Times ###

common_load_params = \
    dict(output_crs="EPSG:4326",
         # 10m resolution for Sentinel-2 red, green, blue, nir bands
         resolution=(-0.00009,0.00009),
         latitude=lat, longitude=lon,
         group_by='solar_day',
         dask_chunks={'time':40, 
                      'latitude':2000, 
                      'longitude':2000})
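To get a feel for what these load parameters imply, the sketch below estimates the grid dimensions and the number of spatial Dask chunks from the extent, resolution, and chunk length used above. The 4-byte value size is an assumption (the real dtype depends on the product and masking), and the data cube may align the extent to its own grid, so the true dimensions can differ by a pixel or two.

```python
import math

lat = (14.6285, 14.8725)
lon = (-17.5348, -17.2068)
res_deg = 0.00009   # pixel size in degrees, as in common_load_params
chunk = 2000        # latitude/longitude chunk length, as in dask_chunks

# Approximate pixel counts along each axis.
n_rows = math.ceil((lat[1] - lat[0]) / res_deg)
n_cols = math.ceil((lon[1] - lon[0]) / res_deg)

# Number of spatial chunks covering one time chunk.
n_chunks = math.ceil(n_rows / chunk) * math.ceil(n_cols / chunk)

# Size of one time step of one band, assuming 4 bytes per value.
mb_per_step = n_rows * n_cols * 4 / 1e6

print(n_rows, n_cols, n_chunks, round(mb_per_step, 1))
```

The estimate agrees to within a pixel with the `(2712, 3646)` dimensions reported by the rasterization step later in this notebook.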

### Sentinel-2

In [76]:
s2_data = load_ard(dc=dc, products=['s2_l2a'], 
                       measurements=[
                           'blue', 'green', 'red', 'nir', 'swir_1', 'swir_2'],
                       time=time_range,
                       **common_load_params)
s2_water_data = []
s2_water_median = []
water_inds = ['NDWI', 'AWEI_sh', 'AWEI_ns', 'MNDWI', 'TCW']
for water_ind in water_inds:
    s2_water_ind_data = calculate_indices(s2_data, index=water_ind, collection='s2')[water_ind]
    s2_water_data.append(s2_water_ind_data)
    s2_water_median.append(s2_water_ind_data.median('time'))
s2_water_data = xr.merge(s2_water_data)
s2_water_median = xr.merge(s2_water_median)

s2_water_floods = xr.concat([s2_water_data.sel(time=slice(*time_range_flood)) for time_range_flood in time_ranges_floods], dim='time')
s2_water_floods_median_minus_median = (s2_water_floods.median('time') - \
                                       s2_water_median).persist()

Using pixel quality parameters for Sentinel 2
Finding datasets
    s2_l2a
Applying pixel quality/cloud mask
Returning 199 time steps as a dask array
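The cell above computes, for every pixel, the median index value over the flood windows minus the median over the whole time range. A minimal single-pixel sketch in plain Python follows, using NDWI in its usual form `(green - nir) / (green + nir)`. The reflectance values are hypothetical, and `None` stands in for observations removed by the cloud mask; the xarray code applies the same idea to every pixel at once.

```python
from statistics import median

def ndwi(green, nir):
    """NDWI as (green - nir) / (green + nir)."""
    return (green - nir) / (green + nir)

# Hypothetical reflectances for one pixel: (date, green, nir).
# None marks an observation removed by the cloud mask.
observations = [
    ("2016-07-02", 0.10, 0.30),
    ("2016-10-30", 0.12, 0.10),
    ("2017-03-14", 0.11, 0.28),
    ("2017-10-10", 0.13, 0.09),
    ("2018-05-21", None, None),
    ("2018-10-15", 0.12, 0.11),
]
flood_windows = [("2016-10-29", "2016-11-01"),
                 ("2017-10-09", "2017-10-13"),
                 ("2018-10-14", "2018-10-17")]

# Index value per valid observation.
series = [(d, ndwi(g, n)) for d, g, n in observations if g is not None]

# Baseline: median over the full time range (flood dates included).
baseline = median(v for _, v in series)

# Flood median: only observations inside a flood window. ISO date
# strings compare correctly as plain text.
flood = median(v for d, v in series
               if any(start <= d <= end for start, end in flood_windows))

anomaly = flood - baseline
print(f"baseline={baseline:.3f} flood={flood:.3f} anomaly={anomaly:.3f}")
```

A positive anomaly means the pixel looked wetter on the flood dates than it typically does, which is what the final figure visualizes.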


## Merge data

**Median of water indices for flood dates minus medians for 2009 - 2019**

In [77]:
water_floods_median_minus_median = \
    s2_water_floods_median_minus_median.compute()

**Water for flood dates**

In [78]:
water_floods = s2_water_floods.compute()

## Mask out the ocean and lakes and obtain the flood hazard map as an xarray

In [79]:
s2_land_mask = s2_water_floods.NDWI.mean('time') < 0.05

In [80]:
from utils.deafrica_utils.deafrica_spatialtools import xr_rasterize

flood_hazard_enc = {0:'No Risk', 1:'Low Risk', 2:'Medium Risk', 3:'High Risk'}
# `np.bool` was removed in NumPy 1.24; the builtin `bool` is equivalent here.
flood_hazard_masks = \
{risk_code: xr_rasterize(dakar_flood_hazard[dakar_flood_hazard['RISKCODE_H']==risk_code], 
                         water_floods).astype(bool).where(s2_land_mask, False)
 for risk_code in flood_hazard_enc}

Rasterizing to match xarray.DataArray dimensions (2712, 3646)
Rasterizing to match xarray.DataArray dimensions (2712, 3646)
Rasterizing to match xarray.DataArray dimensions (2712, 3646)
Rasterizing to match xarray.DataArray dimensions (2712, 3646)


In [81]:
flood_hazard_map = None
for val, mask in flood_hazard_masks.items():
    if flood_hazard_map is None:
        flood_hazard_map = xr.full_like(mask, val)
    else:
        flood_hazard_map = flood_hazard_map.where(~mask, val)
flood_hazard_map = flood_hazard_map.where(s2_land_mask)
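The layering logic above can be sketched on a toy grid with nested lists: start from the lowest risk code, let each higher code overwrite the pixels its mask covers, then blank out water pixels. The grid layout and NDWI values are invented for illustration, and `None` plays the role of the NaN that `.where(s2_land_mask)` produces.

```python
# Toy 2x4 grid. 1 marks pixels inside a polygon of the given risk code;
# the layout is invented for illustration.
risk_masks = {
    0: [[1, 1, 0, 0], [0, 0, 0, 0]],
    1: [[0, 0, 1, 0], [1, 0, 0, 0]],
    2: [[0, 0, 0, 0], [0, 1, 1, 0]],
    3: [[0, 0, 0, 1], [0, 0, 1, 0]],
}
# Mean NDWI per pixel; values of 0.05 or more are treated as open water.
mean_ndwi = [[-0.2, -0.1, 0.3, -0.3], [-0.4, 0.0, -0.2, 0.6]]

n_rows, n_cols = 2, 4
hazard = [[0] * n_cols for _ in range(n_rows)]   # start from the lowest code
for code in sorted(risk_masks):
    for r in range(n_rows):
        for c in range(n_cols):
            if risk_masks[code][r][c]:
                hazard[r][c] = code              # later (higher) codes overwrite

# Blank out water pixels, mirroring .where(s2_land_mask).
hazard = [[hazard[r][c] if mean_ndwi[r][c] < 0.05 else None
           for c in range(n_cols)] for r in range(n_rows)]
print(hazard)
```

Because the map starts filled with code 0, pixels outside every polygon are also labelled 0, and where polygons overlap the highest risk code wins.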

## Show medians of flood dates water indices minus medians for 2009-2019

In [82]:
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 400

In [90]:
def create_flood_date_figs_agg(dataset, base_file_name, agg='mean'):
    # `agg` only labels the panel titles; `dataset` must already be aggregated.
    assert agg in ['mean', 'median'], "The variable agg must be one of ['mean', 'median']"
    
    # One panel for the hazard map plus one per data variable, in two rows.
    nrows = 2
    ncols = 1 + len(dataset.data_vars) // nrows
    
    fig, ax = plt.subplots(nrows, ncols, figsize=(7*ncols, 5*nrows))
    
    current_ax = ax[0,0]
    flood_hazard_map.plot.imshow(ax=current_ax)
    current_ax.set_title('Given Flood Hazard Map')
    for data_var_ind, data_var in enumerate(dataset.data_vars):
        # Show only positive differences, capped at the 95th percentile.
        vmin = 0
        vmax = dataset[data_var].quantile(0.95).values
        current_ax = ax[(data_var_ind+1)//ncols, (data_var_ind+1)%ncols]
        dataset[data_var].where(s2_land_mask).plot.imshow(ax=current_ax, cmap='Blues', vmin=vmin, vmax=vmax)
        current_ax.set_title(f'{agg.capitalize()} of flood dates {data_var} minus median')
    plt.tight_layout()
    os.makedirs('images', exist_ok=True)
    plt.savefig(f'images/{base_file_name}.png')
    plt.clf()
    return None
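The panel placement in the function above is plain index arithmetic: the hazard map takes flat position 0, so index panel `k` lands at flat position `k + 1` in a row-major grid. A small sketch for the five indices used in this notebook:

```python
n_vars = 5                      # NDWI, AWEI_sh, AWEI_ns, MNDWI, TCW
nrows = 2
ncols = 1 + n_vars // nrows     # one extra column leaves room for the hazard map

# Panel 0 is the hazard map; index panel k sits at flat position k + 1.
# divmod turns the flat position into (row, column).
positions = [divmod(k + 1, ncols) for k in range(n_vars)]
print(ncols, positions)
```

With two rows, `nrows * ncols` is always at least `n_vars + 1`, so every index gets a panel; an even number of indices leaves one panel empty.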

In [91]:
create_flood_date_figs_agg(water_floods_median_minus_median, 
                           'haz_map_water_median_minus_median_fig', 
                           agg='median')

<Figure size 8400x4000 with 0 Axes>