## Onboard Joint Research Centre (JRC) data to OS-C S3

The dataset contains historical flood depths for a set of return periods and is available from the [JRC data catalog](https://data.jrc.ec.europa.eu/dataset/1d128b6c-a4ee-4858-9e34-6210707f3c81). The methodology is described in ["A new dataset of river flood hazard maps for Europe and the Mediterranean Basin" by Francesco Dottori, Lorenzo Alfieri, Alessandra Bianchi, Jon Skoien, and Peter Salamon](https://essd.copernicus.org/articles/14/1549/2022/).

The dataset provides six return periods: 10, 20, 50, 100, 200 and 500 years.

The coordinate reference system of the map is EPSG:3035 (ETRS89-extended / LAEA Europe), so coordinates need to be converted to latitude-longitude. See the official EPSG website for more information.
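
As a minimal sketch of that conversion, assuming `pyproj` is installed (the notebook already imports from it), a `Transformer` can map EPSG:3035 coordinates to longitude and latitude. The sample point is only illustrative: it is the false easting/northing of EPSG:3035, which corresponds to the projection origin at 10°E, 52°N.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order to (x, y) in and (lon, lat) out
to_lonlat = Transformer.from_crs("EPSG:3035", "EPSG:4326", always_xy=True)

# Illustrative point: the false easting/northing of EPSG:3035
x, y = 4321000.0, 3210000.0
lon, lat = to_lonlat.transform(x, y)
print(f"lon={lon:.4f}, lat={lat:.4f}")
```

The same `Transformer` accepts arrays of coordinates, so a grid of pixel centres could be converted in one call.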

To obtain the bounds exactly rather than approximately, we can take the latitude-longitude bounds of peninsular Spain and transform them to EPSG:25830, then repeat the process for the Canary Islands.

## Create Zarr from shape and Affine transformation
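
The affine transformation maps pixel indices (column, row) to coordinates in the raster CRS. The sketch below illustrates this with a row-major 3x3 matrix, the same layout this notebook later stores as `transform_mat3x3`. The helper name `apply_mat3x3` and the numeric values are hypothetical and chosen only for illustration; they are not the real JRC transform.

```python
def apply_mat3x3(mat3x3, col, row):
    """Map pixel indices (col, row) to coordinates using a row-major 3x3 affine matrix."""
    a, b, c, d, e, f, g, h, i = mat3x3
    x = a * col + b * row + c
    y = d * col + e * row + f
    w = g * col + h * row + i  # equals 1 when the last row is (0, 0, 1)
    return x / w, y / w

# Hypothetical north-up transform: 100 m pixels, upper-left corner at (1000000, 5000000)
mat3x3 = [100.0, 0.0, 1000000.0, 0.0, -100.0, 5000000.0, 0.0, 0.0, 1.0]

print(apply_mat3x3(mat3x3, 0, 0))    # upper-left corner of the first pixel
print(apply_mat3x3(mat3x3, 10, 20))  # 10 columns to the right, 20 rows down
```

The constant last row `(0, 0, 1)` carries no information of its own; it completes the six affine coefficients into a square matrix so that transforms can be composed and inverted with ordinary matrix operations.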

<span style="color:blue">Note: the file must be located in /hazard/src/ for the dependencies to work</span>

In [1]:
import sys
import os
import s3fs
import zarr
import numpy as np
import rasterio
import math
import xarray as xr

from pyproj.crs import CRS
from affine import Affine



In [2]:
from hazard.sources.osc_zarr import OscZarr

In [3]:
# https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/sandbox/secrets/physrisk-s3-keys
# default_staging_bucket = 'redhat-osc-physical-landing-647521352890'
# OSC_S3_ACCESS_KEY, OSC_S3_SECRET_KEY

# Hazard indicators bucket
default_staging_bucket = "physrisk-hazard-indicators"
prefix = "hazard"

# The access key and secret key are stored in the env vars OSC_S3_HI_ACCESS_KEY and OSC_S3_HI_SECRET_KEY, respectively.
s3 = s3fs.S3FileSystem(
    anon=False,
    key=os.environ["OSC_S3_HI_ACCESS_KEY"],
    secret=os.environ["OSC_S3_HI_SECRET_KEY"],
)
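
If either variable is unset, `os.environ[...]` fails with a bare `KeyError`. A small pre-flight check can give a clearer message. The helper `require_env` below is hypothetical (it is not part of `hazard` or `s3fs`) and is only a sketch.

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, or raise if any is missing."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(os.environ[name] for name in names)


# Usage with the variable names from the cell above:
# key, secret = require_env("OSC_S3_HI_ACCESS_KEY", "OSC_S3_HI_SECRET_KEY")
```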

In [22]:
group_path = os.path.join(
    default_staging_bucket, prefix, "riverflood_JRC_RP_hist.zarr"
).replace("\\", "/")
store = s3fs.S3Map(root=group_path, s3=s3, check=False)
root = zarr.group(store=store, overwrite=True)
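
S3 keys always use forward slashes, which is why the cell above replaces backslashes after `os.path.join`. One possible alternative, shown here only as a sketch, is `posixpath.join`, which produces `/`-separated paths on every operating system.

```python
import posixpath

default_staging_bucket = "physrisk-hazard-indicators"
prefix = "hazard"

# posixpath.join always uses "/", regardless of the operating system
group_path = posixpath.join(default_staging_bucket, prefix, "riverflood_JRC_RP_hist.zarr")
print(group_path)
```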

In [23]:
s3.ls("physrisk-hazard-indicators/hazard")

['physrisk-hazard-indicators/hazard/riverflood_JRC_RP010_hist.zarr',
 'physrisk-hazard-indicators/hazard/riverflood_JRC_RP_hist.zarr']

In [24]:
# Check the zarr file was created
group_path in s3.ls("physrisk-hazard-indicators/hazard")

True

In [25]:
oscZ = OscZarr(bucket=default_staging_bucket, prefix="hazard", s3=s3, store=store)

In [26]:
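return_periods_str = ["010", "020", "050", "100", "200", "500"]
return_periods = [int(rt) for rt in return_periods_str]

# Read the grid metadata from the first raster; all return periods are
# assumed to share the same grid.
return_period = return_periods_str[0]
path_to_file = r"E:/JRC_RP/floodMap_RP{}/floodmap_EFAS_RP{}_C.tif".format(
    return_period, return_period
)

with rasterio.open(path_to_file) as src:
    transform = src.transform
    width = src.width
    height = src.height

crs = CRS.from_epsg(3035)
# Note: rasterio returns each band as a (height, width) array, whereas this
# shape is passed as (width, height); check the order before writing data.
shape = (width, height)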

In [27]:
oscZ._zarr_create(
    path=group_path,
    shape=shape,
    transform=transform,
    crs=str(crs),
    overwrite=True,
    return_periods=return_periods,
)

<zarr.core.Array '/physrisk-hazard-indicators/hazard/riverflood_JRC_RP_hist.zarr' (6, 63976, 45242) float32>

In [None]:
# Create xr.DataArray from s3 stored zarr object

# This raises a MemoryError, presumably because the whole array is loaded into memory
z = oscZ.root[group_path]
da = xr.DataArray(data=z)
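
A quick back-of-the-envelope calculation shows why this fails. Using the shape and `float32` dtype reported by `_zarr_create` above, the uncompressed array needs roughly 65 GiB:

```python
import math

# Shape and dtype reported by _zarr_create above
shape = (6, 63976, 45242)
bytes_per_value = 4  # float32

total_bytes = math.prod(shape) * bytes_per_value
print(f"{total_bytes / 2**30:.1f} GiB uncompressed")
```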

In [11]:
# Read the file
# da = oscZ.read(path=group_path)
# da

# Raises a RuntimeError because of the coords when creating the DataArray

## Steps to populate riverflood_JRC_RP_hist.zarr for 100m resolution

### Step 1: Read JRC flood data

The function below reads one GeoTIFF file and returns the flood depth.

In [12]:
def read_one_file(path_to_file):
    """
    Read JRC data.

    Parameters:
        path_to_file (str): full path to tif file.

    Returns:
        fld_depth (numpy array): 2D array (height, width) with the flood depth
            on the EPSG:3035 grid.

    """

    # The context manager closes the file once the data has been read
    with rasterio.open(path_to_file) as src:
        fld_depth = src.read(1)  # first (and only) band

    return fld_depth

### Step 2: Populate the raster file for every return period

In [26]:
for rt_i, rt in enumerate(return_periods_str):

    path_to_file = r"E:/JRC_RP/floodMap_RP{}/floodmap_EFAS_RP{}_C.tif".format(rt, rt)
    fld_depth = read_one_file(path_to_file)

    da.data[rt_i, :, :] = fld_depth
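
Since each raster is large, it can help to confirm that all six files exist before starting the loop. The sketch below rebuilds the paths from the same pattern used above; `build_paths` is a hypothetical helper, and the base directory is the local path from this notebook.

```python
from pathlib import Path

return_periods_str = ["010", "020", "050", "100", "200", "500"]


def build_paths(base_dir, periods):
    """Build the expected GeoTIFF path for each return period."""
    base = Path(base_dir)
    return [base / f"floodMap_RP{rt}" / f"floodmap_EFAS_RP{rt}_C.tif" for rt in periods]


paths = build_paths("E:/JRC_RP", return_periods_str)
missing = [p for p in paths if not p.exists()]
print(f"{len(paths) - len(missing)} of {len(paths)} files found")
```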

In [27]:
oscZ.write(path=group_path, da=da)
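
Because the full array may not fit in memory, one possible alternative (not what this notebook does) is to process the raster block by block. The sketch below only generates the row and column ranges of each block; reading a block with rasterio and writing it to the Zarr array is left out. The helper `iter_windows` and the block size of 1000 are assumptions made for illustration.

```python
def iter_windows(height, width, block=1000):
    """Yield (row_start, row_stop, col_start, col_stop) tiles covering a height x width grid."""
    for row_start in range(0, height, block):
        row_stop = min(row_start + block, height)
        for col_start in range(0, width, block):
            col_stop = min(col_start + block, width)
            yield row_start, row_stop, col_start, col_stop


# Small illustrative grid; tiles on the edges are clipped to the grid size
windows = list(iter_windows(height=2500, width=1800, block=1000))
print(len(windows))
```

Each tuple can be used to slice both the source band and the target array, so only one block is held in memory at a time.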

In [None]:
# Example using the root object directly. Using the oscZ object is preferable.

"""
create_dataset(name, **kwargs) method of zarr.hierarchy.Group instance
    Create an array.
    
    Arrays are known as "datasets" in HDF5 terminology. For compatibility
    with h5py, Zarr groups also implement the require_dataset() method.
    
    Parameters
    ----------
    name : string
        Array name.
    data : array-like, optional
        Initial data.
    shape : int or tuple of ints
        Array shape.
    chunks : int or tuple of ints, optional
        Chunk shape. If not provided, will be guessed from `shape` and
        `dtype`.
    dtype : string or dtype, optional
        NumPy dtype.
    compressor : Codec, optional
        Primary compressor.
    fill_value : object
        Default value to use for uninitialized portions of the array.



root.create_dataset(name='prueba',
                    data = np.array([[0,1], [1,6]]),
                    shape = (2,2),
                    chunks = (1000, 1000),
                    dtype = 'f4')

trans_members = [
    transform.a,
    transform.b,
    transform.c,
    transform.d,
    transform.e,
    transform.f,
]
mat3x3 = [x * 1.0 for x in trans_members] + [0.0, 0.0, 1.0]  # append the constant last row (0, 0, 1) of the 3x3 affine matrix
root.attrs["crs"] = str(crs)
root.attrs["transform_mat3x3"] = mat3x3 
if return_periods is not None:
    root.attrs["index_values"] = return_periods
    root.attrs["index_name"] = "return period (years)"

# Read the file
root['prueba']
"""

In [None]:
# Code to remove a file inside a bucket

"""
import boto3
boto_c = boto3.client('s3', aws_access_key_id=os.environ["OSC_S3_ACCESS_KEY"], aws_secret_access_key=os.environ["OSC_S3_SECRET_KEY"])

to_remove = boto_c.list_objects_v2(Bucket=default_staging_bucket, Prefix='hazard/hazard_MV_prueba.zarr')['Contents']

keys = [item['Key'] for item in to_remove]

for key_ in keys:
    boto_c.delete_object(Bucket=default_staging_bucket, Key=key_)
"""