# CoCliCo --- Orléans Notebook Demonstration (October 2022)

Coastal Climate Core Services (CoCliCo) is a European effort to develop an open web platform to aid decision-making on coastal risk (2021-2025). Please have a look at our website to find out more about the project: https://coclicoservices.eu/

During this project several datasets will be made available, which can be explored on the platform as well as accessed via cloud-storage buckets. In this notebook, some examples are provided on how to interact with the data using Python. 

- Notebook authors: Floris Calkoen & Etiënne Kras, 12 May 2022
- Notebook update: Etiënne Kras, 16 Sep 2022

## Requirements

The following Python libraries are required:

- zarr 
- xarray
- gcsfs

Additional use of the following libraries is recommended: 

- Dask
- GeoPandas
- Rioxarray

These libraries have several dependencies, including: 
- Numpy
- Matplotlib
- Pandas
- gdal (C library)

## JRC Storm Surge Dataset

The data is hosted in cloud buckets, mostly in zarr format. [Zarr](https://zarr.readthedocs.io/en/stable) is a format for storage of chunked, compressed, N-dimensional arrays. 
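
To make "chunked" more concrete, the sketch below works out how a regularly chunked array is split into a grid of chunks, and which chunk holds a given element. It uses only the standard library and does not call zarr itself. The shape and chunk sizes are hypothetical and are not taken from the CoCliCo store.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension of a regularly chunked array."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))


def chunk_index(index, chunks):
    """Grid position of the chunk that holds the element at `index`."""
    return tuple(i // c for i, c in zip(index, chunks))


# Hypothetical array: 1000 stations x 3 scenarios x 8 return periods,
# stored in chunks of 250 x 3 x 8 elements.
shape = (1000, 3, 8)
chunks = (250, 3, 8)

grid = chunk_grid(shape, chunks)
print(grid)                              # (4, 1, 1)
print(math.prod(grid))                   # 4 chunks in total
print(chunk_index((620, 2, 5), chunks))  # (2, 0, 0)
```

Reading a single station then only requires fetching the one chunk that contains it, which is what makes the format suitable for cloud storage.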

Here, we use the JRC Storm Surge dataset as an example of how to work with Zarr datasets. The dataset ([Vousdoukas et al. 2016](https://link.springer.com/article/10.1007/s00382-016-3019-5)) describes historical and future storm surge levels under different RCP scenarios (RCP 4.5 and 8.5) at European scale for the 21st century.

## (OPTIONAL): Install required packages (required for Google Colab)

Running the following cell installs the required packages. If this notebook is run locally, it is recommended to create an isolated environment first. Zarr is required to work with Zarr-format data, xarray to read the data into Python, gcsfs to work with data stored in Google Cloud buckets, and ipytree to show the directory tree of the Zarr store.
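
Before installing anything, it can help to check which packages are already present. The following is a minimal sketch using only the standard library. The list of import names is an assumption based on the install cell, and `missing_packages` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Import names of the packages used in this notebook. Note that the import
# name can differ from the pip name (pystac-client -> pystac_client).
REQUIRED = ["zarr", "xarray", "gcsfs", "ipytree", "pystac_client"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```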

In [None]:
# !pip install zarr -q
# !pip install xarray[complete] -q
# !pip install gcsfs -q
# !pip install ipytree -q
# !pip install pystac-client -q
# !pip install eccodes -q

## Imports 

In [None]:
# Xarray is a Python library to work with labelled multi-dimensional arrays. The
# library is able to load data directly from a URL.

import warnings

import matplotlib.pyplot as plt
import pystac_client
import xarray as xr

# load packages
import zarr

warnings.filterwarnings("ignore")

## Connect to the CoCliCo STAC catalog

In [None]:
catalog = pystac_client.Client.open(
    "https://storage.googleapis.com/dgds-data-public/coclico/coclico-stac/catalog.json"
)
catalog

In [None]:
list(catalog.get_children())

## Get the Hypertext Reference (href)

In [None]:
href = catalog.get_child("ssl").assets["data"].href
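
In STAC, a collection document lists its assets in an `assets` object keyed by name, and each asset carries an `href`. The sketch below shows the same lookup on plain JSON, without pystac-client. The document is a hypothetical, heavily trimmed example: the id and asset key mirror the cell above, but the href is a placeholder and not the real one.

```python
import json

# Hypothetical, heavily trimmed STAC collection document. The href is a
# placeholder for illustration only.
collection_json = """
{
  "type": "Collection",
  "id": "ssl",
  "assets": {
    "data": {
      "href": "https://example.com/path/to/store.zarr",
      "title": "placeholder asset"
    }
  }
}
"""

collection = json.loads(collection_json)

# Equivalent of `.assets["data"].href` on the parsed JSON
href_from_json = collection["assets"]["data"]["href"]
print(href_from_json)
```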

## Read data with xarray using zarr backend

This step might fail in Colab with `AttributeError: 'EntryPoints' object has no attribute 'get'`. Colab has no package manager, so package versions are hard to control. Since this is a relatively [new issue](https://github.com/python/importlib_metadata/issues/409), we still need to find a workaround. In a local environment this shouldn't be an issue.
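
One way to diagnose this is to check which version of the `importlib-metadata` backport is installed. The sketch below assumes the error comes from `importlib-metadata` 5 or later, which removed the dict-style `get` interface on `EntryPoints`. Pinning an older version is a possible workaround under that assumption, not a confirmed fix for this notebook. `installed_major_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package):
    """Return the major version of a package, or None if it cannot be determined."""
    try:
        return int(version(package).split(".")[0])
    except (PackageNotFoundError, ValueError):
        return None


# Assumption: the error stems from importlib-metadata >= 5, which dropped the
# dict-style `EntryPoints.get` interface used by some older packages.
major = installed_major_version("importlib-metadata")
if major is not None and major >= 5:
    print('Possible workaround: pip install "importlib-metadata<5"')
else:
    print("importlib-metadata backport absent or older than 5.")
```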

In [None]:
# Printing the dataset object shows that the storm surge level has three dimensions.
# Every storm surge level is associated with a certain station, scenario and return period.
ds = xr.open_zarr(href)
ds

## Preparing a plot example

Here we replicate one of the plots from the paper. 

In [None]:
# return period (x-axis) will be the same for all scenarios
return_period = ds["rp"].values

In [None]:
# function to extract ssl values at a certain station for a certain scenario
def f_ssl_values(station, scenario):
    return ds.isel(stations=station).sel(nscenarios=scenario)["ssl"].values

In [None]:
# Here the function defined in the previous cell is used to extract storm surge
# levels for station x. Matplotlib is used to generate a figure and axes object.
# The matplotlib axes object (ax) contains a method to add data to the axis. This
# method is used to add the data for all three different scenarios, hence, the
# f_ssl_values function is used three times.

station = 1  # select station

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(return_period, f_ssl_values(station=station, scenario=0), label="Historical")
ax.plot(return_period, f_ssl_values(station=station, scenario=1), label="RCP45")
ax.plot(return_period, f_ssl_values(station=station, scenario=2), label="RCP85")
ax.set_xlabel("Return period [years]")
ax.set_ylabel("SSL [m]")
ax.set_title(
    "Extreme Storm Surge Level over return period per RCP scenario for Station %s "
    % station
)
plt.grid()
plt.xlim(0, return_period[-1])
plt.legend();

## Some additional info for working with Zarr data 

Instead of using xarray directly, here we load the data with the zarr library to explain some of its aspects and features.
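
The store in the next cell is addressed with a `gs://` URL, while the STAC catalog earlier in this notebook is addressed over `https://storage.googleapis.com/`. For public Google Cloud Storage objects the two forms map onto each other, as the sketch below illustrates with the standard library only. `gs_to_https` is a hypothetical helper, and whether the resulting URL matches the href stored in the catalog is not verified here.

```python
from urllib.parse import urlparse


def gs_to_https(url):
    """Convert a gs://bucket/key URL to its public HTTPS equivalent."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"Expected a gs:// URL, got: {url}")
    # netloc holds the bucket name, path holds the object key
    return f"https://storage.googleapis.com/{parsed.netloc}{parsed.path}"


print(gs_to_https("gs://dgds-data-public/coclico/europe_storm_surge_level.zarr"))
# https://storage.googleapis.com/dgds-data-public/coclico/europe_storm_surge_level.zarr
```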

In [None]:
# Like with xarray, the data can be read directly from a URL.
zarr_store = zarr.open(r"gs://dgds-data-public/coclico/europe_storm_surge_level.zarr")

In [None]:
# The zarr store object contains an attribute which describes the structure of the store.
zarr_store.info

In [None]:
# The directory tree can also be printed...
print(zarr_store.tree())

In [None]:
# The zarr store can also be inspected per group
zarr_store["ssl"].info


## Important notes on Zarr from Xarray perspective

Please note: 

1. Xarray can only read/write Zarr groups, not individual Zarr arrays.
2. Zarr arrays do not have dimension names like NetCDF. Therefore, each Zarr array requires a special array attribute `_ARRAY_DIMENSIONS`. Without this attribute, data cannot be loaded from Zarr into xarray.

For an extensive discussion, see [this entry of the xarray documentation](https://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html).
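
To illustrate where `_ARRAY_DIMENSIONS` lives, the sketch below writes and reads the attribute file of an array by hand, using only the standard library. In Zarr v2, the attributes of an array are stored as JSON in a `.zattrs` file inside the array's directory. The store name, the dimension names and their order are assumptions for illustration, so check the real store for the actual values.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical dimension names for illustration; check the real store's
# attributes for the actual names and order.
dims = ["stations", "nscenarios", "rp"]

with tempfile.TemporaryDirectory() as tmp:
    array_dir = Path(tmp) / "example.zarr" / "ssl"
    array_dir.mkdir(parents=True)

    # In Zarr v2, user attributes of an array live in a `.zattrs` JSON file
    (array_dir / ".zattrs").write_text(json.dumps({"_ARRAY_DIMENSIONS": dims}))

    # Reading the file back recovers the dimension names xarray relies on
    attrs = json.loads((array_dir / ".zattrs").read_text())

print(attrs["_ARRAY_DIMENSIONS"])  # ['stations', 'nscenarios', 'rp']
```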

The attributes of a given array can be listed as follows:

In [None]:
dict(zarr_store["ssl"].attrs)