# CoCliCo --- Delft Notebook Demonstration (October 2023)

Coastal Climate Core Services (CoCliCo) is a European effort to develop an open web platform to aid decision making on coastal risk (2021-2025). Please have a look at our [website](https://coclicoservices.eu/) to find out more about the project.

During this project several datasets will be made available, which can be explored on the platform as well as accessed via cloud-storage buckets. In this notebook, some examples are provided on how to interact with the data using Python, specifically for sea level rise projections from the Intergovernmental Panel on Climate Change (IPCC) at national scales.

- Notebook author: Etiënne Kras, 26 July 2023

### Software environment

- Please install the latest mambaforge package manager by downloading the executable from the [MiniForge GitHub repo](https://github.com/conda-forge/miniforge#mambaforge) for your Operating System. 
- Clone the [coclicodata GitHub repo](https://github.com/openearth/coclicodata) on your local desktop using Git or GitHub Desktop.
- Open a Miniforge Prompt (terminal) on your local desktop. 
- Navigate to the coclicodata GitHub repo in the Miniforge Prompt and run `mamba env create -f environment.yml`. 
- This may take a few minutes to complete but once it is finished you will have all required packages to run this notebook installed in your ‘coclico’ environment. 

## IPCC AR5 & AR6 sea level rise projections 

Here, we use relative Sea Surface Height (SSH) data from the [IPCC's](https://www.ipcc.ch/) Fifth Assessment Report (AR5, 2013), processed by the [Integrated Climate Data Center (ICDC, CEN, University of Hamburg)](https://www.cen.uni-hamburg.de/en/icdc/data/ocean/ar5-slr.html). The data includes 10 geophysical sources that drive long-term changes in relative sea level: 5 ice components, 3 ocean-related components, land water storage and glacial isostatic adjustment. We also consider the IPCC's latest medium-confidence median projections of regional relative sea level, published in the Sixth Assessment Report (AR6, 2021) and processed by [NASA's Jet Propulsion Laboratory](https://podaac.jpl.nasa.gov/announcements/2021-08-09-Sea-level-projections-from-the-IPCC-6th-Assessment-Report). This data includes the Antarctic ice sheet, the Greenland ice sheet, glaciers, land-water storage, ocean dynamics and vertical land motion as geophysical sources that drive long-term changes.

The data is hosted in cloud buckets as Cloud Optimized GeoTIFFs (COGs). A [COG](https://www.cogeo.org/) is a regular GeoTIFF file (viewable in, for instance, QGIS) that is designed to be hosted on an HTTP file server, with an internal organization that enables more efficient workflows in the cloud. A client can request only the parts of the file it needs, which makes post-processing routines very fast.
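
To make the idea of requesting only part of a file concrete, the sketch below computes which rows and columns of a north-up raster cover a bounding box. The grid origin and 1-degree resolution are hypothetical values chosen for illustration, not the actual layout of the CoCliCo COGs, and `bbox_to_window` is an illustrative helper, not part of any library.

```python
import math


def bbox_to_window(bbox, origin, res):
    """Return (row_start, row_stop, col_start, col_stop) covering a bounding box.

    bbox   = (min_lon, min_lat, max_lon, max_lat)
    origin = (west, north), the upper-left corner of the raster
    res    = (dx, dy), positive cell sizes in degrees
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    west, north = origin
    dx, dy = res
    col_start = math.floor((min_lon - west) / dx)
    col_stop = math.ceil((max_lon - west) / dx)
    # rows count downwards from the northern edge
    row_start = math.floor((north - max_lat) / dy)
    row_stop = math.ceil((north - min_lat) / dy)
    return row_start, row_stop, col_start, col_stop


# hypothetical global grid: 1-degree cells, 180 rows x 360 columns
window = bbox_to_window((4.4, 52.6, 5.4, 53.0), origin=(-180.0, 90.0), res=(1.0, 1.0))
n_cells = (window[1] - window[0]) * (window[3] - window[2])
print(window, n_cells)
```

On this hypothetical grid only a handful of the 64,800 cells overlap the box. A COG reader fetches the internal blocks that contain the requested cells instead of downloading the whole file.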



### Imports

In [None]:
import warnings

# import holoviews as hv
import cartopy.crs as ccrs
import cartopy.feature as cf
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np
import pandas as pd
import pystac_client

# import xarray as xr
import rioxarray as rio
from tqdm import tqdm

# ignore warnings
warnings.filterwarnings("ignore")

### CoCliCo STAC catalog

In [None]:
# load the CoCliCo STAC catalog
catalog = pystac_client.Client.open(
    "https://storage.googleapis.com/dgds-data-public/coclico/coclico-stac/catalog.json"
)
# catalog

# list the datasets present in the catalog, we are interested in the slp5 and slp6 sets
list(catalog.get_children())

### Define parameter space

In [None]:
yr = 2100  # set year
ens = 50  # set ensemble [0-100]
var = "slr"  # set variable
ccs5 = "26"  # set climate change scenario for AR5
ccs6 = "1-26"  # set climate change scenario for AR6

# CONTINUE
# open href items with rio xarray to display them (first as normal plot then with holoviews?)
# use parameter space to change the opened item for the plot
# use the catalog content to make the plot nice (i.e. same bounding boxes, etc)
# import land-water line (rough) to display land?

# look at py-sense to have some examples
# do we need a zarr file to easily query temporal information or can we do that by loading all cog information easily?
# if so, then we would need an algorithm that resets the clicked point to the closest cell center to query from the zarr

# can we do this all in an html or something that people can just slide without running the code??

### Geospatial plot

In [None]:
%%time
# get data
# TODO: how to speed this up for datapoints at the end to the ar5/6_col?

# get AR5 collection and item href
ar5_col = catalog.get_child("slp5")
ar5_item_href = (
    ar5_col.get_item(
        r"rcp=%s/%s_ens%s/%s-01-01_%s-01-01.tif" % (ccs5, var, int(ens), yr, yr + 1)
    )
    .assets["data"]
    .href
)
ar5_item = rio.open_rasterio(ar5_item_href, masked=True)

# get AR6 collection and item href
ar6_col = catalog.get_child("slp6")
ar6_item_href = (
    ar6_col.get_item(r"ssp=%s/%s_ens%s/%s.tif" % (ccs6, var, float(ens), yr))
    .assets["data"]
    .href
)
ar6_item = rio.open_rasterio(ar6_item_href, masked=True)
ar6_item_corr = ar6_item / 1000

# cbar limits
vmin = max(
    min(np.nanmin(ar5_item), np.nanmin(ar6_item_corr)), -0.2
)  # bound to -0.2 if smaller than this value
vmax = max(np.nanmax(ar5_item), np.nanmax(ar6_item_corr))
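
The item identifiers used above follow a fixed naming pattern, so they can be wrapped in small helpers. This is a minimal sketch that assumes the same patterns as in the cell above; `ar5_item_id` and `ar6_item_id` are hypothetical helper names, not part of the catalog or of pystac.

```python
def ar5_item_id(ccs, var, ens, yr):
    """Build an AR5 item id; the ensemble is written as an integer."""
    return "rcp=%s/%s_ens%s/%s-01-01_%s-01-01.tif" % (ccs, var, int(ens), yr, yr + 1)


def ar6_item_id(ccs, var, ens, yr):
    """Build an AR6 item id; the ensemble is written as a float."""
    return "ssp=%s/%s_ens%s/%s.tif" % (ccs, var, float(ens), yr)


ar5_id = ar5_item_id("26", "slr", 50, 2100)
ar6_id = ar6_item_id("1-26", "slr", 50, 2100)
print(ar5_id)
print(ar6_id)
```

Note the difference in how the ensemble is formatted between the two collections (`ens50` versus `ens50.0`), which is easy to overlook when changing the parameter space.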

In [None]:
%matplotlib ipympl
# %matplotlib inline

# TODO: zoom with ipyleaflet bbox converted to this plot?
# TODO: zoom to same extent (sharex, sharey does not work properly) when selecting a bounding box in ipympl

# define figure
fig, (ax1, ax2) = plt.subplots(
    1, 2, figsize=(13, 4), subplot_kw={"projection": ccrs.PlateCarree()}
)  # , sharex=True, sharey=True)
fig.tight_layout()
# plt.gcf().subplots_adjust(bottom=0.05)
plt.gcf().subplots_adjust(left=0.05)

# populate AR5 plot
ax1.set_facecolor("pink")
im5 = ar5_item.plot(
    ax=ax1,
    add_colorbar=False,
    vmin=round(vmin, 2),
    vmax=round(vmax, 2),
    cmap=plt.cm.afmhot_r,
)
ax1.set_title("%s \nRCP%s, %s, %sth percentile" % (ar5_col.title, ccs5, yr, ens))
# ax1.set_xlabel("Longitude [Degrees East]") # TODO: possibly import from file?
# ax1.set_ylabel("Latitude [Degrees North]") # TODO: possibly import from file?
cbar5 = plt.colorbar(im5, shrink=0.675, aspect=30 * 0.675, pad=0.02)
cbar5.set_label(
    "sea level rise [%s]" % ar5_col.extra_fields["deltares:units"]
)  # TODO: possibly import from file?
ax1.add_feature(cf.LAND, facecolor="lightgrey", zorder=15)
ax1.add_feature(cf.COASTLINE, linewidth=0.2, zorder=16)
ax1.add_feature(cf.BORDERS, linewidth=0.1, zorder=16)
gl1 = ax1.gridlines(
    crs=ccrs.PlateCarree(),
    draw_labels=True,
    linewidth=1,
    color="gray",
    alpha=0.2,
    linestyle="--",
)
gl1.top_labels = False
gl1.right_labels = False
ax1.text(
    -0.08,
    0.5,
    "latitude",
    va="bottom",
    ha="center",
    rotation="vertical",
    rotation_mode="anchor",
    transform=ax1.transAxes,
)
ax1.text(
    0.5,
    -0.15,
    "longitude",
    va="bottom",
    ha="center",
    rotation="horizontal",
    rotation_mode="anchor",
    transform=ax1.transAxes,
)

# populate AR6 plot
ax2.set_facecolor("pink")
im6 = ar6_item_corr.plot(
    ax=ax2,
    add_colorbar=False,
    vmin=round(vmin, 2),
    vmax=round(vmax, 2),
    cmap=plt.cm.afmhot_r,
)
ax2.set_title("%s \nSSP%s, %s, %sth percentile" % (ar6_col.title, ccs6, yr, ens))
# ax2.set_xlabel("Longitude [Degrees East]") # TODO: possibly import from file?
# ax2.set_ylabel("") # leave empty
cbar6 = plt.colorbar(im6, shrink=0.675, aspect=30 * 0.675, pad=0.02)
cbar6.set_label(
    "sea level rise [%s]" % ar6_col.extra_fields["deltares:units"]
)  # TODO: possibly import from file?
ax2.add_feature(cf.LAND, facecolor="lightgrey", zorder=15)
ax2.add_feature(cf.COASTLINE, linewidth=0.2, zorder=16)
ax2.add_feature(cf.BORDERS, linewidth=0.1, zorder=16)
gl2 = ax2.gridlines(
    crs=ccrs.PlateCarree(),
    draw_labels=True,
    linewidth=1,
    color="gray",
    alpha=0.2,
    linestyle="--",
)
gl2.top_labels = False
gl2.right_labels = False
gl2.left_labels = False
ax2.text(
    0.5,
    -0.15,
    "longitude",
    va="bottom",
    ha="center",
    rotation="horizontal",
    rotation_mode="anchor",
    transform=ax2.transAxes,
);

### Temporal plot

### General feedback (Floris) 

1) Put attributes like ENS, YRS and CCS in the STAC. Floris's most advanced example of adding COGs to a STAC collection can be found [here](https://github.com/FlorisCalkoen/coastmonitor/blob/main/notebooks/typology/02_dynamic_world_stac.ipynb). Have a look at the `create_item()` and `create_asset()` functions.
2) By reading the STAC items with a library like odc-stac, stackstac or xpystac you should be able to have the band dimension as time with labeled coordinates, which should be faster for indexing.
3) If that's still too slow, you can consider rechunking the data spatially instead of temporally. You will then have spatial partitions that each contain all the timestamps. That probably requires putting the data in a Zarr store. So that would be the other option: put all data in a Zarr store and consider whether it's more useful to chunk along the spatial or the temporal dimensions.

### Indexing Xarray datasets

Xarray datasets can be indexed geospatially with the rio accessor, using GeoJSON-like geometries.

In [None]:
import geopandas as gpd
import shapely

# select a point to plot timeseries of SLR projections for
point_location = [4.2, 52.8]  # easting, northing

# TODO: enable computing the mean of multiple cells in a polygon or specified set of points and show the SLR projection timeseries
area_location = [4.4, 52.6, 5.4, 53]

point_geom = shapely.Point(point_location)
area_geom = shapely.box(*area_location)

# geometry as a GeoJSON-like dict that can be used to index an xarray dataset
# using the rio accessor
shapely.geometry.mapping(area_geom)
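
For reference, the GeoJSON-like structure of such a geometry can also be written out by hand. The sketch below builds a polygon for the same bounding box using only the standard library; `bbox_to_geojson` is a hypothetical helper, and the vertex order is an arbitrary choice that may differ from what shapely returns.

```python
import json


def bbox_to_geojson(min_x, min_y, max_x, max_y):
    """Return a GeoJSON Polygon dict for an axis-aligned bounding box."""
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],  # a GeoJSON ring is closed: first point == last point
    ]
    return {"type": "Polygon", "coordinates": [ring]}


area_geojson = bbox_to_geojson(4.4, 52.6, 5.4, 53)
print(json.dumps(area_geojson))

# With an opened raster `da` (assumption: it has a CRS set), a dict like this
# can be passed to the rio accessor, e.g.:
# clipped = da.rio.clip([area_geojson], crs="EPSG:4326")
```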

## Read STAC as Pandas Dataframe

In [None]:
from typing import Dict, List
from copy import deepcopy

def items_to_dataframe(items: List[Dict]) -> pd.DataFrame:
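    """Convert STAC items to a Pandas DataFrame.

    Args:
        items (List[Dict]): STAC items as dictionaries (e.g. from `Item.to_dict()`).

    Returns:
        pd.DataFrame: one row per item with flattened (dot-separated) columns,
            sorted by `properties.datetime`.
    """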
    _items = []
    for i in items:
        _i = deepcopy(i)
        # _i['geometry'] = shape(_i['geometry'])
        # ...  # for example, drop some attributes that you're not interested in
        _items.append(_i)
    df = pd.DataFrame(pd.json_normalize(_items))
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df

items = list(ar5_col.get_items())
items_df = items_to_dataframe([i.to_dict() for i in items])

## TODO:

Add more metadata to the STAC catalog that we can use to find the items that we are looking for. So instead of inferring the CCS, ENS and SLR from the links in the STAC they should be present as properties. Maybe under the Deltares/CoCliCo prefix or as additional properties. Then you can do something like this:

```python
indices = items_df.loc[(items_df["SLR"] == ...) & (items_df["CCS"] == ...)].index.to_list()
selected_items = [items[i] for i in indices]
```
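
Until such properties exist in the catalog, they could be derived from the item identifiers. This is a minimal sketch that assumes the identifier patterns used earlier in this notebook (`rcp=26/slr_ens50/2100-01-01_2101-01-01.tif` and `ssp=1-26/slr_ens50.0/2100.tif`); the function name and the property keys are hypothetical.

```python
import re

# scenario prefix, variable, ensemble and the first four digits of the year
ITEM_ID = re.compile(
    r"^(?:rcp|ssp)=(?P<ccs>[^/]+)/(?P<var>[^/_]+)_ens(?P<ens>\d+(?:\.\d+)?)/(?P<year>\d{4})"
)


def parse_item_id(item_id):
    """Split an item id into scenario, variable, ensemble and year."""
    match = ITEM_ID.match(item_id)
    if match is None:
        raise ValueError("unexpected item id: %r" % item_id)
    return {
        "ccs": match["ccs"],
        "var": match["var"],
        "ens": float(match["ens"]),
        "year": int(match["year"]),
    }


ids = [
    "rcp=26/slr_ens50/2100-01-01_2101-01-01.tif",
    "rcp=45/slr_ens95/2050-01-01_2051-01-01.tif",
    "ssp=1-26/slr_ens50.0/2100.tif",
]
parsed = [parse_item_id(i) for i in ids]

# select the median ensemble for the year 2100
selected = [p for p in parsed if p["ens"] == 50.0 and p["year"] == 2100]
print(selected)
```

Storing these values as item properties when the catalog is built would make the string parsing unnecessary.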

## Make the line plot

### Option 1: doing the indexing yourself

In [None]:
# Sample indices from the DataFrame
sample_indices = items_df.sample(n=2).index.tolist()

# Get the items at those indices
selected_items = [items[i] for i in sample_indices]

In [None]:
import xarray as xr
import rioxarray
def preprocess(ds):
    point_location = [4.2, 52.8]  # easting, northing
    point_geom = shapely.Point(point_location)
    point_mapping = shapely.geometry.mapping(point_geom)
    
    ds = (
        ds.rio.clip([point_geom])
        .sel(x=point_location[0], y=point_location[1], method="nearest")
        # .rename({"band": "time"})
    )

    return ds


selected_ds = xr.open_mfdataset(
    [i.assets["data"].href for i in selected_items],
    engine="rasterio",
    concat_dim="band",
    combine="nested",
    preprocess=preprocess,  # this indexes every dataset before merging them
    parallel=True, # this is for Dask
)
# manually add the datetimes to dataset because they are not contained in the tiff
times = [i.properties["datetime"] for i in selected_items]
selected_ds = selected_ds.rename({"band":"time"}).assign_coords(time=times)

# the plot looks a bit weird because we took two random samples
selected_ds["band_data"].plot.line(x="time")

### Option 2:

Use a library like odc-stac, stackstac or xpystac. The plot below ignores a lot of data, probably because the library combines the data based on the metadata in the STAC. This metadata contains duplicate timestamps because the items cover different scenarios.

In [None]:
%%time
from odc.stac import stac_load

stac_load(items, chunks={}, lon=[4.19, 4.21], lat=[52.69, 52.71]).squeeze()[
    "data"
].plot()
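
One way to avoid the duplicate timestamps is to split the items per scenario and ensemble before loading them. The sketch below shows the grouping step on hypothetical item summaries; the keys `ccs`, `ens` and `datetime` are assumptions, since the catalog does not yet expose scenario and ensemble as properties.

```python
from collections import Counter, defaultdict

# Hypothetical item summaries: two scenarios that share the same timestamps.
records = [
    {"ccs": "26", "ens": 50, "datetime": "2100-01-01T00:00:00Z"},
    {"ccs": "45", "ens": 50, "datetime": "2100-01-01T00:00:00Z"},
    {"ccs": "26", "ens": 50, "datetime": "2110-01-01T00:00:00Z"},
    {"ccs": "45", "ens": 50, "datetime": "2110-01-01T00:00:00Z"},
]

# timestamps that occur more than once across the whole collection
counts = Counter(r["datetime"] for r in records)
duplicated = sorted(t for t, n in counts.items() if n > 1)

# group per (scenario, ensemble) so that each group has unique timestamps
groups = defaultdict(list)
for r in records:
    groups[(r["ccs"], r["ens"])].append(r)

unique_per_group = all(
    len({m["datetime"] for m in members}) == len(members)
    for members in groups.values()
)
print(duplicated, sorted(groups), unique_per_group)
```

Each group could then be loaded separately, so that the time axis of every resulting dataset is free of duplicates.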

### Below is previous nb from

In [None]:
# retrieve AR5 and AR6 data at above-defined location
# TODO: check if Zarr usage is easier here, takes quite some time to index a position from CoG files (https://github.com/intake/intake-stac/issues/66)..
# TODO: what takes more time? get_items or reading the point in the tiff image? If the former, might work to use get_item_links and add string storage location manually

# define variables
ens_list = ["5", "50", "95"]  # ensemble list to look into
yrs_list = np.arange(1970, 2200, 10)  # years to look into (step of 10 years from 1970)

# loop over tifs to obtain data for AR5
key_list = ["CCS", "YRS", "ENS", "SLR"]
AR5_dict = {key: [] for key in key_list}
for idx, (i, j) in tqdm(enumerate(zip(ar5_col.get_items(), ar5_col.get_item_links()))):
    enss = str(j).split("/")[1].split("ens")[-1]  # ensemble
    yrs = int(str(j).split("/")[2][0:4])  # yrs
    if enss in ens_list and yrs in yrs_list:  # constraining read ensembles and years
        # print(i.assets["data"].href)
        AR5_dict["CCS"].append(
            str(i).split("/")[0].split("=")[-1]
        )  # climate change scenario for AR5
        AR5_dict["YRS"].append(yrs)  # year, at start of year similar to AR6
        AR5_dict["ENS"].append(enss)  # append ensemble
        ar5_item = rio.open_rasterio(i.assets["data"].href, masked=True)  # open item
        AR5_dict["SLR"].append(
            ar5_item.sel(
                x=point_location[0], y=point_location[1], method="nearest"
            ).values[0]
        )  # match coordinate at center of raster cells

# append to dataframe
df5 = pd.DataFrame(data=AR5_dict)

# loop over tifs to obtain data for AR6
AR6_dict = {key: [] for key in key_list}
for idx, (i, j) in tqdm(enumerate(zip(ar6_col.get_items(), ar6_col.get_item_links()))):
    enss = str(j).split("/")[1].split("ens")[-1]  # ensemble
    yrs = int(str(j).split("/")[2][0:4])  # yrs
    if (
        enss in [str(float(x)) for x in ens_list] and yrs in yrs_list
    ):  # constraining read ensembles and years
        # print(i.assets["data"].href)
        AR6_dict["CCS"].append(
            str(i).split("/")[0].split("=")[-1]
        )  # climate change scenario for AR6
        AR6_dict["YRS"].append(yrs)  # year
        AR6_dict["ENS"].append(enss)  # append ensemble
        ar6_item = rio.open_rasterio(i.assets["data"].href, masked=True)  # open item
        AR6_dict["SLR"].append(
            ar6_item.sel(
                x=point_location[0], y=point_location[1], method="nearest"
            ).values[0]
            / 1000
        )  # match coordinate at center of raster cells

# append to dataframe
df6 = pd.DataFrame(data=AR6_dict)

In [None]:
%matplotlib ipympl
# %matplotlib inline

# TODO: trial to make use of ChatGPT functionality to make plot with text explanation (https://github.com/gventuri/pandas-ai/blob/main/Notebooks/Getting%20Started.ipynb)

# define figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 4), sharey=True, sharex=True)
fig.tight_layout()
plt.gcf().subplots_adjust(bottom=0.15)
plt.gcf().subplots_adjust(left=0.05)
ax = plt.gca()

# specify colors for plots, different for AR5 & AR6 following official IPCC figures
colorsAR5 = ["purple", "cyan", "red"]
colorsAR6 = ["darkblue", "orange", "darkred"]

# AR5
for idx, (scen, grp) in enumerate(df5.groupby(["CCS"])):  # group per scenario
    ens_list = list(grp.groupby("ENS"))
    ens_list[1][1].plot(
        kind="line",
        x="YRS",
        y="SLR",
        label="RCP%s" % (scen[0][0] + "." + scen[0][1]),
        color=colorsAR5[idx],
        ax=ax1,
        alpha=0.5,
    )  # median (50th percentile)
    ax1.fill_between(
        ens_list[0][1].YRS,
        ens_list[0][1].SLR,
        ens_list[2][1].SLR,
        alpha=0.1,
        color=colorsAR5[idx],
        interpolate=True,
    )  # 5-95th percentile shading

ax1.set_title(
    "AR5 sea level rise projections at location (lon, lat): %s, %s"
    % (point_location[0], point_location[1])
)
ax1.xaxis.set_minor_locator(tck.AutoMinorLocator())
ax1.yaxis.set_minor_locator(tck.AutoMinorLocator())
ax1.axvline(2020, linestyle="--", color="k", linewidth=1, alpha=0.2)
ax1.axvline(2100, linestyle="--", color="k", linewidth=1, alpha=0.2)
ax1.set_xlim(2000, 2150)
ax1.grid(alpha=0.2)
ax1.set_xlabel("time [year]")
ax1.set_ylabel("sea level rise [m]")

# AR6
for idx, (scen, grp) in enumerate(df6.groupby(["CCS"])):  # group per scenario
    ens_list = list(grp.groupby("ENS"))
    ens_list[1][1].plot(
        kind="line",
        x="YRS",
        y="SLR",
        label="SSP%s" % (scen[0][0:3] + "." + scen[0][3]),
        color=colorsAR6[idx],
        ax=ax2,
        alpha=0.5,
    )  # median (50th percentile)
    ax2.fill_between(
        ens_list[0][1].YRS,
        ens_list[0][1].SLR,
        ens_list[2][1].SLR,
        alpha=0.1,
        color=colorsAR6[idx],
        interpolate=True,
    )  # 5-95th percentile shading

ax2.set_title(
    "AR6 sea level rise projections at location (lon, lat): %s, %s"
    % (point_location[0], point_location[1])
)
ax2.xaxis.set_minor_locator(tck.AutoMinorLocator())
ax2.yaxis.set_minor_locator(tck.AutoMinorLocator())
ax2.axvline(2020, linestyle="--", color="k", linewidth=1, alpha=0.2)
ax2.axvline(2100, linestyle="--", color="k", linewidth=1, alpha=0.2)
ax2.set_xlim(2000, 2150)
ax2.grid(alpha=0.2)
ax2.set_xlabel("time [year]")
ax2.set_ylabel("sea level rise [m]");

In [None]:
# TODO: comparison of AR5 / AR6 to the global mean at the selected location?

### Export notebook

In [None]:
# TODO: export to HTML to try out an interactive example in which you can alter the clicked point to look at SLR projections