<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [7]</a>'.</span>

In [1]:
from edc import check_compatibility
check_compatibility("user-0.21.0")

## xcube-gen and xcube-geodb Hands-On Workshop &Phi;-Week 2020
### Alicja Balfanz and Helge Dzierzon from Brockmann Consult

In this Notebook we present how to:
* Access data via xcube-sh with a short excursion to the xcube Generator User Interface
* Use xcube-geodb

__Before getting started:__ Set up the environment variables

In [2]:
from edc import setup_environment_variables
setup_environment_variables()

API credentials have automatically been injected for your active subscriptions.  
The following environment variables are now available:

The following additional environment variables have been loaded from `~/custom.env`:
* `AWS_BUCKET`
* `DAPA_URL`
* `DB_HOST`, `DB_NAME`, `DB_PASSWORD`, `DB_USER`
* `OGC_EDC_URL`
* `REFERENCE_DATA`


##  1) Data access via xcube-sh

Please Note!

The following example shows how to use xcube and the xcube Generator User Interface by accessing Sentinel-2 data from Sentinel Hub. However, Sentinel Hub is not the only available source! 
Currently, three data stores are accessible: 
* Climate Data Store 
* SENTINEL Hub  
* ESA CCI Open Data Portal

__Step 1__: Import packages for xcube-sh 

In [3]:
# xcube_sh imports
from xcube_sh.cube import open_cube
from xcube_sh.config import CubeConfig

# various utilities
import json
import xarray as xr
import shapely.geometry
import IPython.display
import zarr

import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

__Step 2__: Setting CubeConfig parameters for generating S2 cubes

In [4]:
dataset_name = 'S2L2A' # dataset name
band_names = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B08', 'SCL', 'CLD'] # band names
tile_size = [500, 500] # tile size 
# geometry (bounding box for the area covering Hamburg)
x1 = 9.86
y1 = 53.45
x2 = 10.14
y2 = 53.62
bbox = x1, y1, x2, y2
spatial_res = 0.00018 # spatial resolution (approx. 20 m in degree)
time_tolerance='2H'

A description of the dataset and its bands can be found here: https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l2a/ .
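
As a rough plausibility check of the values chosen in Step 2, the sketch below converts the resolution from degrees to metres and estimates how many pixels cover the bounding box. The constant of 111,320 m per degree of latitude and both helper names are assumptions made for illustration, and the grid that xcube-sh actually produces may differ from this estimate.

```python
import math

# Assumption: approximate length of one degree of latitude in metres
METRES_PER_DEGREE_LAT = 111_320


def resolution_in_metres(res_deg, lat_deg):
    """Return the (north-south, east-west) pixel size in metres."""
    north_south = res_deg * METRES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def grid_size(bbox, res_deg):
    """Estimate the pixel count (width, height) covering the bounding box."""
    x1, y1, x2, y2 = bbox
    return round((x2 - x1) / res_deg), round((y2 - y1) / res_deg)


bbox = 9.86, 53.45, 10.14, 53.62
spatial_res = 0.00018

centre_lat = (bbox[1] + bbox[3]) / 2
ns, ew = resolution_in_metres(spatial_res, centre_lat)
width, height = grid_size(bbox, spatial_res)
print(f"pixel size: {ns:.1f} m north-south, {ew:.1f} m east-west")
print(f"estimated grid: {width} x {height} pixels")
```

At the latitude of Hamburg the east-west extent of a pixel is noticeably smaller than the north-south extent, so "approx. 20 m" describes the north-south direction.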

__Step 3:__ Setting cube configurations for 2018 summer season (May - August)

In [5]:
cube_config_2018 = CubeConfig(dataset_name = dataset_name,
                              band_names = band_names,
                              tile_size = tile_size, 
                              bbox = bbox,
                              spatial_res = spatial_res,
                              time_range = ['2018-05-01', '2018-08-31'],
                              time_tolerance = time_tolerance)

__Step 4:__ View bounding box to check selected area

In [6]:
IPython.display.GeoJSON(shapely.geometry.box(*bbox).__geo_interface__)

<IPython.display.GeoJSON object>

__Step 5:__ Open data cube

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [7]:
cube_S2_2018 = open_cube(cube_config_2018)
cube_S2_2018

ValueError: Both client_id and client_secret must be provided.
Consider setting environment variables SH_CLIENT_ID and SH_CLIENT_SECRET.
For more information refer to https://docs.sentinel-hub.com/api/latest/#/API/authentication
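
The error above is raised because the Sentinel Hub credentials are not set in the environment. A minimal sketch of a check that could be run before `open_cube`, using the two variable names from the error message; the helper `missing_credentials` is hypothetical and not part of xcube-sh:

```python
import os

# Variable names taken from the error message above
REQUIRED_VARIABLES = ("SH_CLIENT_ID", "SH_CLIENT_SECRET")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Missing Sentinel Hub credentials:", ", ".join(missing))
else:
    print("Sentinel Hub credentials found.")
```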

__Step 6:__ Plot a single scene for a specific timestamp and band

In [None]:
cube_S2_2018.B04.sel(time='2018-07-25 12:00:00', method='nearest').plot.imshow(vmin=0, vmax=0.2, cmap='Greys_r', figsize=(14, 10))

__Step 7:__ Plot a timeseries for a certain location

In [None]:
cube_S2_2018.B04.sel(lat=53.56, lon=10.00, method='nearest').plot()
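
Both plotting cells select with `method='nearest'`, which picks the grid coordinate closest to the requested value instead of requiring an exact match. The idea can be sketched in plain Python; the latitude axis and the helper `nearest_index` are made up for illustration and do not reflect how xarray implements the lookup.

```python
def nearest_index(coords, target):
    """Return the index of the coordinate closest to the target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Hypothetical latitude axis with the 0.00018 degree spacing used above
lats = [53.5596 + i * 0.00018 for i in range(5)]

i = nearest_index(lats, 53.56)
print(f"requested 53.56, nearest grid latitude is {lats[i]:.5f} (index {i})")
```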

If you want a subset, e.g. because the region of interest is smaller, you can create one with the xcube Python API: 

In [None]:
# xcube imports
from xcube.core.compute import compute_cube
from xcube.core.maskset import MaskSet
from xcube.core.geom import mask_dataset_by_geometry
from xcube.core.geom import clip_dataset_by_geometry

There are two ways of selecting the area of interest:
* By masking - this keeps only the data of the WKT Polygon or shapefile
* By clipping - this creates a new bounding box, which ensures that the area of interest is within the subset but keeps the surrounding data (*this is commented out in the following cells*)
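
The difference between the two approaches can be illustrated on a toy grid in plain Python. This is only a sketch of the concept with made-up coordinates and hypothetical helper names, not the implementation used by `mask_dataset_by_geometry` or `clip_dataset_by_geometry`.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_grid(values, xs, ys, ring):
    """Keep the grid shape; cells outside the polygon become NaN."""
    return [[v if point_in_polygon(x, y, ring) else math.nan
             for v, x in zip(row, xs)]
            for row, y in zip(values, ys)]


def clip_grid(values, xs, ys, ring):
    """Crop to the polygon's bounding box; keep every value inside the box."""
    min_x, max_x = min(p[0] for p in ring), max(p[0] for p in ring)
    min_y, max_y = min(p[1] for p in ring), max(p[1] for p in ring)
    cols = [i for i, x in enumerate(xs) if min_x <= x <= max_x]
    rows = [j for j, y in enumerate(ys) if min_y <= y <= max_y]
    return [[values[j][i] for i in cols] for j in rows]


# 4 x 4 grid of cell centres and a triangular area of interest
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0.5, 1.5, 2.5, 3.5]
values = [[1.0] * len(xs) for _ in ys]
triangle = [(1.0, 1.0), (3.2, 1.0), (1.0, 3.2)]

masked = mask_grid(values, xs, ys, triangle)
clipped = clip_grid(values, xs, ys, triangle)
print("masked :", len(masked), "x", len(masked[0]), "grid, NaN outside the triangle")
print("clipped:", len(clipped), "x", len(clipped[0]), "grid, all values kept")
```

Masking preserves the original grid shape and blanks out everything outside the polygon, whereas clipping yields a smaller grid that still contains data outside the polygon but inside its bounding box.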

In [None]:
# Great tool to quickly generate WKT Polygons: https://arthur-e.github.io/Wicket/sandbox-gmaps3.html
polygon = "POLYGON((9.994461289534694 53.580963332925194,10.0002977763511 53.57280950628081,9.995147935042507 53.5625130540729,9.987766495833522 53.55374380185022,9.993946305403835 53.55088830422569,10.002529374251491 53.5553754282651,10.016605607161647 53.56190130476089,10.018493882308132 53.568528117753054,10.00853752244485 53.57535774603685,10.016090623030788 53.578211592240635,10.013000718245632 53.579740359179034,10.00682090867532 53.579230776342285,10.003387681136257 53.58249200035297,9.994461289534694 53.580963332925194))"

In [None]:
cube_S2_2018_masked = mask_dataset_by_geometry(cube_S2_2018, geometry=polygon)
# cube_S2_2018_clipped = clip_dataset_by_geometry(cube_S2_2018, geometry=polygon)

In [None]:
cube_S2_2018_masked
# cube_S2_2018_clipped

In [None]:
cube_S2_2018_masked.B04.sel(time='2018-07-25 12:00:00', method='nearest').plot.imshow(vmin=0, vmax=0.2, cmap='Greys_r', figsize=(14, 10))
# cube_S2_2018_clipped.B04.sel(time='2018-07-25 12:00:00', method='nearest').plot.imshow(vmin=0, vmax=0.2, cmap='Greys_r', figsize=(14, 10))

Save cube as zarr:

In [None]:
# cube_S2_2018_masked.to_zarr('Alster_cube_2018_masked.zarr')
# cube_S2_2018_clipped.to_zarr('Alster_cube_2018.zarr')

## How to generate a datacube with the help of xcube Generator user interface

__This section is not hands-on!__

If you want to check out the xcube Generator yourself later, please take a look at the [xcube Generator User Guide](https://eurodatacube.com/documentation/xcube-gen-ui-guide) for more detail. 

__Open a cube from s3 bucket:__  
The cell below shows how to use xcube to open a generated data cube that is stored in a public bucket. 

In [None]:
from xcube.core.dsio import open_cube
cube_from_s3 = open_cube('s3://edc-phi-week-2020/xcube-gen-1b11e7c1-94f9-4246-833e-a2a12d301a3e.zarr')

In [None]:
cube_from_s3

__Open a cube stored locally, here in the workspace:__

In [None]:
cube = xr.open_zarr('Alster_cube_2018.zarr')

Discover which flags are available within the datacube:

In [None]:
mask = MaskSet(cube.SCL)
mask

In [None]:
mask.clouds_high_probability

Make a datacube with all clouds masked out:

In [None]:
cube_wo_clouds = cube.where(xr.ufuncs.logical_not(mask.cloud_shadows | mask.snow_or_ice | 
                                                  mask.saturated_or_defective | mask.cirrus | 
                                                  mask.clouds_low_probability_or_unclassified | 
                                                  mask.clouds_medium_probability | mask.clouds_high_probability))

Keep only the water bodies of the datacube:

In [None]:
cube_water_wo_clouds = cube_wo_clouds.where(mask.water)
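
The flag names exposed by `MaskSet` correspond to the integer classes of the Sentinel-2 scene classification band `SCL`. The sketch below applies the same two steps, removing cloud-affected classes and then keeping water, to a handful of made-up pixels in plain Python. The class codes follow the Sentinel-2 L2A scene classification; the pixel values and helper names are illustrative assumptions.

```python
import math

# SCL classes masked out in the cell above (codes per the Sentinel-2 L2A scene classification)
EXCLUDED_CLASSES = {
    1: "saturated_or_defective",
    3: "cloud_shadows",
    7: "clouds_low_probability_or_unclassified",
    8: "clouds_medium_probability",
    9: "clouds_high_probability",
    10: "cirrus",
    11: "snow_or_ice",
}
WATER = 6


def mask_out(values, scl, excluded):
    """Replace values whose SCL class is in `excluded` with NaN."""
    return [math.nan if c in excluded else v for v, c in zip(values, scl)]


def keep_only(values, scl, wanted):
    """Keep values whose SCL class equals `wanted`; everything else becomes NaN."""
    return [v if c == wanted else math.nan for v, c in zip(values, scl)]


# Made-up B04 reflectances with their SCL classes: water, cloud, vegetation, shadow
b04 = [0.03, 0.45, 0.08, 0.02]
scl = [6, 9, 4, 3]

cloud_free = mask_out(b04, scl, EXCLUDED_CLASSES)
water_only = keep_only(cloud_free, scl, WATER)
print(cloud_free)
print(water_only)
```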

If you want to take a look at the water mask, please uncomment the following line: 

In [None]:
# mask.water.sel(time='2018-07-20 12:00:00', method='nearest').plot.imshow(figsize=(14, 10))

Take a look at a time stamp without any masking:

In [None]:
cube.B04.sel(time='2018-07-20 12:00:00', method='nearest').plot.imshow(vmin=0, vmax=0.2, cmap='Greys_r', figsize=(14, 10))

Take a look at the same time stamp with only the water bodies kept and all clouds masked out:

In [None]:
cube_water_wo_clouds.B04.sel(time='2018-07-20 12:00:00', method='nearest').plot.imshow(vmin=0, vmax=.2, figsize=(14, 10))

## 2) Using the xcube-geodb

The xcube-geodb consists of a RESTful service and a Python client. Let's import the Python dependencies to access the geodb service.

__Step 1__: Import the geodb client

In [None]:
from xcube_geodb.core.geodb import GeoDBClient

__Step 2__: Instantiate the client

In [None]:
geodb = GeoDBClient()

__Step 3__: As a subscriber, you have a user name associated with your account. Check your user name.

In [None]:
geodb.whoami

__Step 4__: Let's see what collections you have access to

In [None]:
ds = geodb.get_my_collections()
ds

__Step 5__: In the next step you will create a collection. The properties are defined using OGC standards for properties. However, types are given as PostgreSQL types.

In [None]:
#geodb.drop_collection("alster")

collections = {
        "alster": 
        {
            "crs": 4326,
            "properties": 
            {
                "date": "date", 
                "chl": "float",
                "chl_min": "float",
                "chl_max": "float",
                "status": "text"
            }
        }
    }


geodb.create_collections(collections)

__Step 6__: Let's check whether the collection has been created

In [None]:
ds = geodb.get_my_collections()
ds

__Step 7__: No collection is useful without data. We have created a sample dataset for you, which you now load from an AWS S3 object store and open as a GeoDataFrame. The data was obtained from a local water authority (HU).

In [None]:
import geopandas
import pandas
gdf = geopandas.read_file('https://edc-phi-week-2020.s3.eu-central-1.amazonaws.com/hands-on/alster.geojson')
gdf = gdf.where(pandas.notnull(gdf), None)
gdf

In [None]:
geodb.insert_into_collection('alster', gdf)

__Step 8__: Load the inserted data from the geoDB

In [None]:
# geodb.get_collection('alster', database='phi_week')

In [None]:
gdf = geodb.get_collection('alster')
gdf['date'] = pandas.to_datetime(gdf['date'])
gdf

In [None]:
feat = gdf['geometry'][0]

__Step 9__: Combine with data from Sentinel-2: compute band ratios and extract them at the location of the first feature

In [None]:
cube_S2_2018['B3o1'] = cube_S2_2018.B03/ cube_S2_2018.B01
cube_S2_2018['B3o2'] = cube_S2_2018.B03/ cube_S2_2018.B02
cube_S2_2018

In [None]:
cube_S2_2018_point_low = cube_S2_2018.where((cube_S2_2018.B3o1 < 10) & (cube_S2_2018.SCL==6)).sel(lat=feat.y, lon=feat.x, method='nearest')
cube_S2_2018_point_high = cube_S2_2018.where((cube_S2_2018.B3o2 < 10) & (cube_S2_2018.SCL==6)).sel(lat=feat.y, lon=feat.x, method='nearest')
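
The two cells above derive band ratios and keep only pixels that are classified as water (`SCL == 6`) and whose ratio stays below 10. For the time series of a single pixel, the same logic can be sketched in plain Python; the reflectance values are made up and the helper `filtered_ratio` is hypothetical.

```python
import math


def filtered_ratio(numerator, denominator, scl, max_ratio=10, water_class=6):
    """Return numerator/denominator per time step, NaN where the filter rejects it."""
    result = []
    for num, den, cls in zip(numerator, denominator, scl):
        ratio = num / den if den else math.nan
        # NaN compares as False, so a zero denominator is rejected as well
        keep = cls == water_class and ratio < max_ratio
        result.append(ratio if keep else math.nan)
    return result


# Made-up time series for one pixel: B03, B01 and the scene classification
b03 = [0.06, 0.05, 0.30, 0.04]
b01 = [0.03, 0.04, 0.02, 0.02]
scl = [6, 9, 6, 6]

b3o1 = filtered_ratio(b03, b01, scl)
print(b3o1)
```

The second time step is dropped because it is classified as cloud, the third because its ratio exceeds the threshold.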

__Step 10__: Group point time series data by date

In [None]:
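# numeric_only=True averages only the numeric columns and skips text and geometry columns
agg = gdf.query("date >= '2018-05-01' and date <= '2018-09-01'").groupby(['date']).mean(numeric_only=True)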
agg
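
The aggregation in Step 10 restricts the measurements to the summer window and averages all values that share a date. A minimal plain-Python sketch of the same idea follows; the records are made up and `daily_mean` is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean(records, start, end):
    """Average the values per date for records with start <= date <= end."""
    by_date = defaultdict(list)
    for day, value in records:
        # Missing measurements (None) are skipped, as pandas skips NaN by default
        if start <= day <= end and value is not None:
            by_date[day].append(value)
    return {day: mean(values) for day, values in sorted(by_date.items())}


# Made-up (date, chl_max) measurements
records = [
    (date(2018, 5, 14), 12.0),
    (date(2018, 5, 14), 18.0),
    (date(2018, 7, 2), 40.0),
    (date(2018, 7, 2), None),   # missing measurement
    (date(2018, 10, 1), 9.0),   # outside the window
]

daily = daily_mean(records, date(2018, 5, 1), date(2018, 9, 1))
for day, value in daily.items():
    print(day, value)
```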

__Step 11__: Plot result

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, ax1 = plt.subplots(constrained_layout=True, figsize=(10, 10))
ax1.set_ylabel('Sentinel 2 colour ratio', color='red')  
ax1.plot(cube_S2_2018_point_low.time, cube_S2_2018_point_low.B3o1.values, label = "B3o1", color='red', marker='.', linestyle = 'None', markersize=10)
# Use a second colour so that B3o2 can be told apart from B3o1 in the plot and legend
ax1.plot(cube_S2_2018_point_high.time, cube_S2_2018_point_high.B3o2.values, label = "B3o2", color='darkorange', marker='.', linestyle = 'None', markersize=10)

ax2 = ax1.twinx()  
ax2.plot(agg.index, agg.chl_max, label = "Chlorophyll max. (tot) [µg/l]", color='green', marker='.', linestyle = 'None',  markersize=10)
ax2.set_ylabel('in-situ chl max', color='green')  


fig.legend()

