# Options for interactive exploration in openEO

An important use case for the openEO platform is interactive exploration and processing of data.
This notebook illustrates what already works and which improvements would be useful.
It is mainly based on feedback from users and on work with actual EO use cases.

## Collection viewing

A use case often starts with a purely visual exploration of the source data, which is very relevant to get a feel for:
- spatial/temporal coverage: a lot of data providers do not mirror the full archive, so this is super important!
- spatial resolution: what type of spatial features are visible in the source datasets
- temporal resolution and the effect of clouds on our area of interest
- general data quality: will it need masking, or filtering?

For this, I often fall back on existing viewers.

In the example below, the second 'alternate' link in the collection metadata points to a viewer that allows this.

_Improvement proposal_: a convention for specifying viewer (WMTS?) links, so that our viewing components can pick them up?

_relevant docs_: https://open-eo.github.io/openeo-python-client/data_access.html#data-discovery

In [2]:
import openeo
o = openeo.connect("https://openeo.cloud").authenticate_basic()

In [4]:
o.describe_collection("TERRASCOPE_S2_TOC_V2")
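
To pick such links out of the metadata programmatically, a small helper along these lines could be used. This is a minimal sketch: `find_links` is a hypothetical name, and it assumes that the result of `describe_collection` behaves like a plain dict with a STAC-style `links` list. The example metadata is made up for illustration only.

```python
def find_links(metadata, rel="alternate"):
    """Return the href of every link in `metadata` whose relation type is `rel`."""
    return [
        link["href"]
        for link in metadata.get("links", [])
        if link.get("rel") == rel and "href" in link
    ]


# Made-up metadata snippet, only to show the expected structure.
example = {
    "id": "EXAMPLE_COLLECTION",
    "links": [
        {"rel": "license", "href": "https://example.org/license"},
        {"rel": "alternate", "href": "https://example.org/catalogue"},
        {"rel": "alternate", "href": "https://example.org/viewer"},
    ],
}
print(find_links(example))
```

With a live connection, the same helper could be called as `find_links(o.describe_collection("TERRASCOPE_S2_TOC_V2"))`.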

## Basic principles

If basic information is not clear from the metadata above, the documentation explains how to find it:

https://open-eo.github.io/openeo-python-client/data_access.html#exploring-collections

Working in an interactive manner always depends on some form of data reduction:

https://open-eo.github.io/openeo-python-client/data_access.html#data-reduction

So basically, you just start with a relatively small area, and grow it when you have a working graph. For interactive use, don't use batch jobs! They are much slower by design.
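
One way to "start small and grow" is to derive the spatial extent from a centre point and a size, so that only one number has to change when the graph works. The helper below is a rough sketch: `bbox_around` is a hypothetical name, and the conversion from kilometres to degrees is an approximation that is only meant for picking test areas.

```python
import math


def bbox_around(lon, lat, size_km):
    """Approximate square bounding box of `size_km` per side, centred on (lon, lat)."""
    half_lat = (size_km / 2) / 111.32  # roughly 111.32 km per degree of latitude
    half_lon = half_lat / math.cos(math.radians(lat))  # longitude degrees shrink with latitude
    return {
        "west": lon - half_lon,
        "east": lon + half_lon,
        "south": lat - half_lat,
        "north": lat + half_lat,
        "crs": "EPSG:4326",
    }


# Start with a 2 km box while developing, then grow to 20 km once the graph works.
small = bbox_around(12.27, 42.02, size_km=2)
larger = bbox_around(12.27, 42.02, size_km=20)
```

The resulting dict has the same keys as the `spatial_extent` argument of `load_collection`.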

The maximum output size that a backend can handle depends a lot on the use case. For processing at 10 m resolution, we aim to support the area of one Sentinel-2 tile (100 km x 100 km), using batch jobs.
The result can then be downloaded and visualized in a tool like QGIS.
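
To get a feeling for what such a tile means in terms of output size, a back-of-the-envelope estimate helps. The sketch below uses a hypothetical helper `estimate_size_mb` and assumes uncompressed 32-bit values; actual file sizes depend on data type, compression and format.

```python
import math


def estimate_size_mb(width_m, height_m, resolution_m, bands=1, timesteps=1, bytes_per_value=4):
    """Rough uncompressed size in megabytes of a raster cube (assumes 4-byte values by default)."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows * bands * timesteps * bytes_per_value / 1e6


# One band, one date, 100 km x 100 km at 10 m: 10000 x 10000 pixels.
one_layer = estimate_size_mb(100_000, 100_000, 10)
# Four bands over ten dates for the same area.
small_stack = estimate_size_mb(100_000, 100_000, 10, bands=4, timesteps=10)
print(one_layer, small_stack)
```

Under these assumptions a single band and date is already about 400 MB, and a modest stack of bands and dates quickly reaches many gigabytes.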

### Improvements needed

- Memory improvements for NetCDF in the Terrascope backend: support exporting 1 MGRS tile without running out of memory
- Alternative solution: support Zarr, as a faster way to export array data

## Timeseries case
For temporal trajectories over an area of interest, NetCDF is flexible, but can also be slower.

A very good alternative is to reduce the spatial dimensions on the backend already, which can be parallelized and drastically shrinks the dataset:

https://open-eo.github.io/openeo-python-client/basics.html#example-retrieving-aggregated-timeseries

This relies on:

https://processes.openeo.org/#aggregate_spatial

In the VITO backend, we also made this work for e.g. 100000 polygons. So you can send a Shapefile or GeoJSON file to the backend. This enables easy exploration and analysis, while avoiding the generation, processing and download of pixels that are never used.
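
As a minimal sketch of this approach, the snippet below builds a small GeoJSON FeatureCollection and passes it to `aggregate_spatial` with a `mean` reducer. The helper names `polygons_from_bboxes` and `mean_timeseries` are hypothetical, the rectangles stand in for real field polygons, and `cube` is assumed to be a data cube obtained from `load_collection`.

```python
def polygons_from_bboxes(bboxes):
    """Build a GeoJSON FeatureCollection with one rectangle per (west, south, east, north) tuple."""
    features = []
    for i, (west, south, east, north) in enumerate(bboxes):
        # A GeoJSON ring is closed: the first and last positions are identical.
        ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
        features.append({
            "type": "Feature",
            "properties": {"id": i},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


def mean_timeseries(cube, geometries):
    """Ask the backend for the mean value per band, per polygon and per date."""
    return cube.aggregate_spatial(geometries=geometries, reducer="mean")


fields = polygons_from_bboxes([(5.00, 51.20, 5.01, 51.21), (5.05, 51.22, 5.06, 51.23)])
# With a live connection (assumption: `cube` comes from o.load_collection(...)):
# timeseries = mean_timeseries(cube, fields).execute()
```

Only the aggregated values travel back to the client, so the cost of adding more polygons is much lower than downloading the pixels that cover them.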

### Improvements needed

The JSON-based output format of aggregate_spatial is not yet standardized. We also support some NetCDF-based alternatives at VITO, but we are unsure how good the tooling support for those is.
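
Because the format is not standardized, any client-side parsing has to assume a layout. The sketch below assumes a mapping from date to a list with one entry per polygon, each entry holding one value per band, and assumes that polygons without data come back as an empty list. Both the layout and the helper name `timeseries_to_rows` are assumptions, and the numbers are made up; check the actual response of your backend first.

```python
def timeseries_to_rows(timeseries, band_names):
    """Flatten {date: [[band values] per polygon]} into one dict per date and polygon."""
    rows = []
    for date in sorted(timeseries):
        for polygon_index, values in enumerate(timeseries[date]):
            if not values:  # assumed convention: empty list means no data for this polygon
                continue
            row = {"date": date, "polygon": polygon_index}
            row.update(zip(band_names, values))
            rows.append(row)
    return rows


# Made-up response for two polygons and two bands.
sample = {
    "2021-03-15T00:00:00Z": [[0.12, 0.03], []],
    "2021-03-10T00:00:00Z": [[0.10, 0.02], [0.20, 0.05]],
}
rows = timeseries_to_rows(sample, band_names=["VV", "VH"])
for row in rows:
    print(row)
```

A list of flat rows like this can be handed directly to `pandas.DataFrame` for plotting or further analysis.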




## Data reduction by resampling

One of the reasons why Google Earth Engine can generate results quickly is that it downsamples data by default. If a user requests a relatively large area, instead of trying to generate a multi-gigabyte result, it resamples it to a size of e.g. 1024x1024 pixels, which of course solves a lot of problems with generating an output file.

In openEO, we explicitly decided not to have such automatic behaviour, because it makes it harder to ensure that backends return the same result. Instead, users have to explicitly resample.

Note that I do not use this in general, but it does make the example of computing backscatter over a large area more robust. Also, the original GeoTIFFs in full resolution will still be part of the result.
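
Since the resampling is explicit, the user has to choose the target resolution. One possible way to mimic the "fit into 1024x1024 pixels" behaviour is sketched below. The helper name `resolution_for_max_pixels` and the 1024 pixel target are assumptions for illustration, and the extent is assumed to be known in metres.

```python
import math


def resolution_for_max_pixels(width_m, height_m, native_resolution_m, max_pixels=1024):
    """Pick a resolution (in metres) so that neither side of the output exceeds `max_pixels`."""
    needed = math.ceil(max(width_m, height_m) / max_pixels)
    # Never go finer than the native resolution of the data.
    return max(native_resolution_m, needed)


# A full 100 km x 100 km tile with 10 m source data needs about 98 m pixels.
tile_resolution = resolution_for_max_pixels(100_000, 100_000, native_resolution_m=10)
# A 5 km x 5 km area already fits, so the native 10 m is kept.
small_resolution = resolution_for_max_pixels(5_000, 5_000, native_resolution_m=10)
print(tile_resolution, small_resolution)
```

The chosen value can then be passed on explicitly, for example as `cube.resample_spatial(resolution=tile_resolution, method="average")`, which keeps the behaviour visible in the process graph.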


In [None]:
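# Collection, area and period of interest
collection = "SENTINEL1_GRD"
spatial_extent = {"west": 12.03762, "east": 12.511386, "south": 41.908324, "north": 42.133792, "crs": "EPSG:4326"}
temporal_extent = ["2021-02-21", "2021-03-22"]
bands = ["VV", "VH"]

s1 = o.load_collection(collection, spatial_extent=spatial_extent, bands=bands, temporal_extent=temporal_extent)

# Compute analysis-ready normalized radar backscatter with the COPERNICUS_30 elevation model
s1bs = s1.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")
# Explicitly downsample by averaging, to keep the output small
s1bs = s1bs.resample_spatial(resolution=1000, method="average")
# Run as a batch job and wait for it to finish
job = s1bs.execute_batch(out_format="NetCDF")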

In [10]:
job = o.job('9e38f2ea-cfd9-4d46-a1b0-a81d1fd4e352')
job.get_results()

In [12]:
job.get_results().download_files()

[PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/out'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031858_N41E012_2021_03_10_MULTIBAND.tif'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031858_N41E012_2021_03_10_metadata.json'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031858_N42E012_2021_03_10_MULTIBAND.tif'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031858_N42E012_2021_03_10_metadata.json'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031AC9_N41E012_2021_03_15_MULTIBAND.tif'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031AC9_N41E012_2021_03_15_metadata.json'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031AC9_N42E012_2021_03_15_MULTIBAND.tif'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_notebooks/s1_rtc_031AC9_N42E012_2021_03_15_metadata.json'),
 PosixPath('/data/users/Public/driesj/openeo/SRR1_not