# RUCIO and STAC Integration Sample

This notebook is the second part of a pilot showing how STAC and RUCIO integrate for the InterTwin Project. It shows how to generate STAC JSONs with the Raster2STAC library from datasets already extracted from the InterTwin DESI datalake, which serves as a data federation for all datasets used within the project across the different thematic use cases. The ESA WORLDCOVER dataset is used as the example. Further details accompany the code cells below.

#### Quick Resources
- [Available datasets in the InterTwin DataLake](https://confluence.egi.eu/display/interTwin/Data+Samples+from+the+Use+Cases)
- [Tutorial on how to use Rucio](https://confluence.egi.eu/display/interTwin/Tutorial+on+how+to+interact+with+Rucio+and+the+data+lake)
- [Raster2STAC library from EURAC Research](https://pypi.org/project/raster2stac/)

NOTE: To download datasets with Rucio on a Debian-based OS, use Docker and make sure the right certificates are installed. See the Dockerfile at `/mnt/CEPH_PROJECTS/InterTwin/stac/dev/Dockerfile` for an up-to-date list of compatible certificates for a Debian-based OS.

In [1]:
import xarray as xr
import rioxarray
import numpy as np
import pathlib

WORLDCOVER Datasets

In [2]:
# Collect the WorldCover GeoTIFF tiles and open them lazily as a single dataset.
esa_worldcover_tiffs = sorted(pathlib.Path("/home/rbalogun/intertwin/RUCIO_STAC/ESA_WorldCover/ESA_WorldCover/").glob("*.tif"))
esa_worldcover = xr.open_mfdataset(esa_worldcover_tiffs, engine="rasterio", parallel=True)
# Give the data variable a meaningful name and remove the singleton band dimension.
esa_worldcover = esa_worldcover.rename_vars({"band_data": "world_cover"})
esa_worldcover = esa_worldcover.drop_vars("band").squeeze("band")
# Add a leading time dimension with a single time step.
esa_worldcover = esa_worldcover.expand_dims(dim={"time": ["2020-12-31"]}, axis=0)
esa_worldcover

| | Array | Chunk |
|---|---|---|
| Bytes | 19.31 GiB | 4.00 MiB |
| Shape | (1, 72000, 72000) | (1, 1024, 1024) |
| Dask graph | 5184 chunks in 22 graph layers | |
| Data type | float32 numpy.ndarray | |
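
The sizes in the array summary above can be checked with a few lines of arithmetic. The sketch below uses only the reported shape, chunk size, and data type. The chunk count additionally assumes that the mosaic is a 2 x 2 grid of 36000 x 36000 pixel tiles, each chunked on its own; the notebook does not state this, so treat it as one plausible explanation of the 5184 figure.

```python
import math

# Values read from the array summary above.
shape = (1, 72000, 72000)
chunk = (1, 1024, 1024)
itemsize = 4  # float32 uses 4 bytes per value

array_gib = math.prod(shape) * itemsize / 2**30
chunk_mib = math.prod(chunk) * itemsize / 2**20

# Assumption (not stated in the notebook): a 2 x 2 grid of
# 36000 x 36000 pixel tiles, each split into 1024 x 1024 chunks.
tile_size = 36000
tiles_per_axis = shape[1] // tile_size
chunks_per_tile_axis = math.ceil(tile_size / chunk[1])
n_chunks = (tiles_per_axis * chunks_per_tile_axis) ** 2

print(f"{array_gib:.2f} GiB, {chunk_mib:.2f} MiB, {n_chunks} chunks")
# prints "19.31 GiB, 4.00 MiB, 5184 chunks"
```

At roughly 19 GiB as float32, the full array is large enough that keeping it lazy and chunked is the practical choice.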


The Raster2STAC library generates the STAC JSON from the dataset and uploads the COGs (Cloud Optimized GeoTIFFs) to an S3 bucket. The STAC JSON then links to these files as the publicly available copy of the dataset.
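
To make the idea of "linking" concrete, here is a minimal sketch of what a STAC Item pointing at one COG could look like. It is not the output of Raster2STAC: the item id, bounding box, and href are hypothetical placeholders, and only the top-level field names follow the STAC Item specification.

```python
import json

# Hypothetical placeholder; the real href is produced by Raster2STAC on upload.
cog_href = "https://example.com/WORLDCOVER/tile_20201231.tif"

# Hypothetical footprint (west, south, east, north) in degrees.
west, south, east, north = 9.0, 45.0, 12.0, 48.0

item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "tile_20201231",  # hypothetical item id
    "collection": "WORLDCOVER",
    "bbox": [west, south, east, north],
    "geometry": {
        "type": "Polygon",
        # Closed ring: the first and last positions are identical.
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    },
    "properties": {"datetime": "2020-12-31T00:00:00Z"},
    "links": [],
    "assets": {
        "world_cover": {
            "href": cog_href,
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
}

print(json.dumps(item, indent=2))
```

A client that reads this JSON only needs the `href` of the asset to fetch the file, which is why a publicly readable bucket is enough for open access.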

In [4]:
import os
import sys

# Make the local checkout of raster-to-stac importable.
sys.path.append("/home/rbalogun/raster-to-stac")

from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data = esa_worldcover,
    title = "World Cover",
    description = "WorldCover provides the first global land cover products for 2020 and 2021 at 10 m resolution, developed and validated in near-real time based on Sentinel-1 and Sentinel-2 data.",
    keywords = ["land use", "land cover", "world cover", "sentinel-1", "sentinel-2"],
    providers=[
        {
            "url": "https://esa-worldcover.org/en",
            "name": "European Space Agency",
            "roles": [
                "producer"
            ]
        },
        {
            "url": "http://www.eurac.edu",
            "name": "Eurac Research - Institute for Earth Observation",
            "roles": [
                "host"
            ]
        }
    ],
    license="CC-BY-NC-4.0",
    sci_doi = "10.5281/zenodo.5571936",
    sci_citation = "Zanaga, D., Van De Kerchove, R., De Keersmaecker, W., Souverijns, N., Brockmann, C., Quast, R., Wevers, J., Grosu, A., Paccini, A., Vergnaud, S., Cartus, O., Santoro, M., Fritz, S., Georgieva, I., Lesiv, M., Carter, S., Herold, M., Li, Linlin, Tsendbazar, N.E., Ramoino, F., Arino, O., 2021. ESA WorldCover 10 m 2020 v100. https://doi.org/10.5281/zenodo.5571936.",
    collection_id = "WORLDCOVER", 
    collection_url = "http://10.8.244.74:8082/collections/",
    output_folder = "/mnt/CEPH_PROJECTS/InterTwin/stac/WORLDCOVER",
    s3_upload = True,
    bucket_name = "eurac-eo",
    bucket_file_prefix = "WORLDCOVER",
    aws_access_key = os.environ.get("AWS_ACCESS_KEY"),
    aws_secret_key = os.environ.get("AWS_SECRET_KEY"),
    aws_region = "s3-eu-west-1"
)

rs2stac.generate_cog_stac()
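
The cell above reads the S3 credentials with `os.environ.get`, which silently returns `None` when a variable is not set. A small check before constructing the object can surface that earlier. The helper below, `require_env`, is a hypothetical name introduced for illustration and is not part of Raster2STAC.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for the given variables.

    Raises RuntimeError if any variable is unset or empty.
    `env` defaults to the process environment; a plain dict can be
    passed instead, which is convenient for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Usage with a stand-in mapping rather than the real environment.
creds = require_env(
    ["AWS_ACCESS_KEY", "AWS_SECRET_KEY"],
    env={"AWS_ACCESS_KEY": "example-id", "AWS_SECRET_KEY": "example-secret"},
)
print(sorted(creds))
```

In the notebook, the returned values could be passed as `aws_access_key` and `aws_secret_key` instead of calling `os.environ.get` inline.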

The aim of this integration is to provide a STAC JSON that contains links to the publicly accessible S3 bucket storage and, as an alternative, access to the datalake through a Rucio DID. This gives the team multiple access points to the dataset and lets it co-develop methods for providing authenticated access to the Rucio datalake when datasets are requested through STAC.
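
One way to picture the two access points is an asset that carries both the public S3 link and the Rucio DID. The sketch below is an illustration under stated assumptions, not the project's agreed schema: the field name `rucio:did`, the helper `add_rucio_access`, and the example href and DID are all hypothetical. The only Rucio convention relied on is that a DID has the form `scope:name`.

```python
import copy


def add_rucio_access(item, asset_key, did):
    """Return a copy of a STAC item whose asset also records a Rucio DID.

    The DID is stored under the hypothetical field "rucio:did".
    The input item is left unchanged.
    """
    scope, sep, name = did.partition(":")
    if not (scope and sep and name):
        raise ValueError(f"Expected a DID of the form 'scope:name', got {did!r}")
    updated = copy.deepcopy(item)
    updated["assets"][asset_key]["rucio:did"] = did
    return updated


# Hypothetical item fragment with a public S3 link.
item = {
    "id": "tile_20201231",
    "assets": {
        "world_cover": {"href": "https://example.com/WORLDCOVER/tile_20201231.tif"}
    },
}

# Hypothetical DID; real scopes and names come from the datalake.
dual = add_rucio_access(item, "world_cover", "example_scope:tile_20201231.tif")
print(dual["assets"]["world_cover"])
```

Keeping the DID next to the public href means a client can pick whichever route it is authorised for without a second catalogue lookup.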