# Single step join of monthly Sentinel-2 data to points of interest

This notebook shows how to use the Geo Engine to join cloud-free monthly Sentinel-2 data to points of interest in a single workflow.

The following packages are used:

In [1]:
import geopandas as gpd
import geoengine as ge




Initialise the Geo Engine session:

In [2]:
ge.initialize("http://localhost:3030/api", credentials=("admin@localhost", "admin1234"))

In [3]:
session = ge.get_session()
user_id = session.user_id
session

Server:              http://localhost:3030/api
User Id:             d5328854-6190-4af9-ad69-4e74b0961ac9
Session Id:          b936567b-981a-4741-ad15-12b38058e3cc
Session valid until: 2023-04-01T10:29:22.661Z

To track the quota usage, first get the current quota:

In [4]:
used_quota_start = ge.get_quota(user_id)['used']
used_quota_start

0
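The before/after pattern used for the quota can be wrapped in a small context manager. This is a minimal sketch and not part of the `geoengine` package: `track_quota` is a hypothetical helper that accepts any zero-argument callable returning the used quota. In this notebook it would be called with `lambda: ge.get_quota(user_id)['used']`.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def track_quota(get_used: Callable[[], int]) -> Iterator[Dict[str, int]]:
    """Record how much quota is consumed inside the `with` block.

    Hypothetical helper: `get_used` is any zero-argument callable that
    returns the currently used quota.
    """
    result = {"start": get_used(), "used": 0}
    try:
        yield result
    finally:
        # Quota consumed = used quota after the block minus used quota before it
        result["used"] = get_used() - result["start"]
```

Wrapping the query cell in `with track_quota(...) as q:` would leave the consumed quota in `q["used"]`, replacing the manual subtraction in the last cell.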

Set the area of interest. It is defined as a bounding box in EPSG:32632 (WGS 84 / UTM zone 32N).
It is located in NRW, Germany, and covers the area between Willingen, Lippstadt and Werl.

In [5]:
bounds_array = [421395, 5681078, 476201, 5727833]
xmin, ymin, xmax, ymax = bounds_array

(xmin, ymin, xmax, ymax)

(421395, 5681078, 476201, 5727833)
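To get a feel for the size of this area, its extent and the number of 10 m pixels per band and time step can be computed directly from the bounds. This is a back-of-the-envelope sketch: `grid_shape` is a hypothetical helper, and the Geo Engine may align the pixel grid to its own origin, so the actual pixel counts can differ slightly.

```python
import math


def grid_shape(xmin, ymin, xmax, ymax, resolution):
    """Return (width_m, height_m, columns, rows) of a grid covering the box."""
    width = xmax - xmin
    height = ymax - ymin
    # A partial pixel at the edge still needs a whole pixel
    cols = math.ceil(width / resolution)
    rows = math.ceil(height / resolution)
    return width, height, cols, rows


width, height, cols, rows = grid_shape(421395, 5681078, 476201, 5727833, 10.0)
print(f"{width} m x {height} m -> {cols} x {rows} pixels ({cols * rows:,} per band and time step)")
```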

Using the bounding box, a time interval and a spatial resolution of 10 m, we define the area of interest as a raster query rectangle, i.e. a space-time cube.

In [6]:
from datetime import datetime
time_start = datetime(2021, 1, 1)
time_end = datetime(2021, 12, 31)

study_area = ge.api.RasterQueryRectangle(
    spatialBounds=ge.SpatialPartition2D(xmin, ymin, xmax, ymax).to_api_dict(),
    timeInterval=ge.TimeInterval(time_start, time_end).to_api_dict(),
    spatialResolution=ge.SpatialResolution(10.0, 10.0).to_api_dict(),
)
study_area

{'spatialBounds': {'upperLeftCoordinate': {'x': 421395, 'y': 5727833},
  'lowerRightCoordinate': {'x': 476201, 'y': 5681078}},
 'timeInterval': {'start': '2021-01-01T00:00:00.000+00:00',
  'end': '2021-12-31T00:00:00.000+00:00'},
 'spatialResolution': {'x': 10.0, 'y': 10.0}}

Read the prepared point data from the GeoPackage (gpkg) file, then upload it to the Geo Engine.

In [7]:
points_df = gpd.read_file("group_sample_frac1_inspireId_utm32n.gpkg")
points_id = ge.upload_dataframe(points_df, "group_sample_frac1_inspireId")
points_id

ca5a82ce-05a1-4834-bb8f-d2f58284a8a4
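Points that fall outside the area of interest will not receive raster values, so it can be worth checking how many of them lie inside the bounding box. Below is a plain-Python sketch with a hypothetical helper `points_in_bounds`, assuming the points use the same CRS as the bounds (EPSG:32632). With GeoPandas, the same filter could be written with the coordinate indexer `points_df.cx[xmin:xmax, ymin:ymax]`.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def points_in_bounds(points: Iterable[Point], xmin, ymin, xmax, ymax) -> List[Point]:
    """Keep only points whose coordinates fall inside the closed bounding box."""
    return [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]


bounds = (421395, 5681078, 476201, 5727833)
# Made-up sample points: the second one lies east of xmax
sample = [(430000.0, 5700000.0), (500000.0, 5700000.0)]
inside = points_in_bounds(sample, *bounds)
```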

Create a point source operator that reads the point data:

In [8]:
points_source_operator = ge.unstable.workflow_operators.OgrSource(str(points_id))
points_source_operator.to_workflow_dict()

{'type': 'Vector',
 'operator': {'type': 'OgrSource',
  'params': {'data': {'type': 'internal',
    'datasetId': 'ca5a82ce-05a1-4834-bb8f-d2f58284a8a4'},
   'attributeProjection': None,
   'attributeFilters': None}}}

Now use the convenience function `s2_cloud_free_aggregated_band` to create a workflow for each band (B02, B03, B04, B08 and NDVI) that loads the Sentinel-2 data of that band and computes cloud-free monthly means.
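The per-pixel logic behind a cloud-free monthly mean can be sketched in plain Python. The `Expression` operator in the workflow output below replaces a band value with no data when `B == 3 || (B >= 7 && B <= 11)`. Assuming `B` is the Sentinel-2 L2A scene classification layer (SCL), whose layer id is truncated in the output, these classes are cloud shadows (3), unclassified (7), medium and high probability clouds (8, 9), thin cirrus (10) and snow or ice (11). The `TemporalRasterAggregation` then averages each month with `ignoreNoData: True`. The helpers below are hypothetical illustrations, not the Geo Engine implementation.

```python
from statistics import mean
from typing import List, Optional, Sequence

# Assumption: SCL classes treated as invalid, mirroring the expression
# `B == 3 || (B >= 7 && B <= 11)` in the workflow output
MASKED_SCL_CLASSES = {3, 7, 8, 9, 10, 11}


def mask_pixel(value: int, scl: int) -> Optional[int]:
    """Return the band value, or None (no data) if its SCL class is masked."""
    return None if scl in MASKED_SCL_CLASSES else value


def monthly_mean(values: Sequence[int], scl_classes: Sequence[int]) -> Optional[float]:
    """Mean of all unmasked acquisitions of one pixel within one month.

    Masked acquisitions are ignored; if none remain, the result is no data.
    """
    valid: List[int] = [
        v for v, s in zip(values, scl_classes) if mask_pixel(v, s) is not None
    ]
    return mean(valid) if valid else None
```

For example, three acquisitions `[100, 200, 900]` with SCL classes `[4, 5, 9]` average to 150, because the third one is flagged as cloud.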

Then use the workflows as input to the `RasterVectorJoin` operator to create a workflow that attaches the Sentinel-2 data to the points.

In [9]:
s2_agg_operators = {}
sentinel_bands = ["B02", "B03", "B04", "B08", "NDVI"]

# One cloud-free, monthly mean aggregation workflow per band
for band in sentinel_bands:
    s2_agg_operators[band] = ge.unstable.workflow_blueprints.s2_cloud_free_aggregated_band(
        band,
        granularity="months",
        window_size=1,
        aggregation_type="mean"
    )

# Attach the values of every band raster to the points as new columns named after the bands.
# The point file is already in UTM zone 32N (EPSG:32632), so no reprojection is needed.
points_with_s2_cloud_free = ge.unstable.workflow_operators.RasterVectorJoin(
    raster_sources=list(s2_agg_operators.values()),
    vector_source=points_source_operator,
    new_column_names=list(s2_agg_operators.keys()),
)

points_with_s2_cloud_free.to_dict()

{'type': 'RasterVectorJoin',
 'params': {'names': ['B02', 'B03', 'B04', 'B08', 'NDVI'],
  'temporalAggregation': 'none',
  'featureAggregation': 'mean'},
 'sources': {'vector': {'type': 'OgrSource',
   'params': {'data': {'type': 'internal',
     'datasetId': 'ca5a82ce-05a1-4834-bb8f-d2f58284a8a4'},
    'attributeProjection': None,
    'attributeFilters': None}},
  'rasters': [{'type': 'TemporalRasterAggregation',
    'params': {'aggregation': {'type': 'mean', 'ignoreNoData': True},
     'window': {'granularity': 'months', 'step': 1}},
    'sources': {'raster': {'type': 'Expression',
      'params': {'expression': ' if (B == 3 || (B >= 7 && B <= 11)) { NODATA } else { A }',
       'outputType': 'U16',
       'mapNoData': False},
      'sources': {'a': {'type': 'GdalSource',
        'params': {'data': {'type': 'external',
          'providerId': '5779494c-f3a2-48b3-8a2d-5fbba8c5b6c5',
          'layerId': 'UTM32N:B02'}}},
       'b': {'type': 'GdalSource',
        'params': {'data': {'t
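Conceptually, `RasterVectorJoin` looks up, for each point, the pixel of each band raster that contains it. The sketch below illustrates this for a single raster stored as a list of rows with a known upper-left corner and pixel size. `sample_raster` and `join_feature` are hypothetical helpers: `join_feature` averages the values of a multi-point feature, which is our reading of `'featureAggregation': 'mean'` in the output above, and it skips points that fall outside the raster.

```python
import math
from typing import Optional, Sequence, Tuple

Raster = Sequence[Sequence[Optional[float]]]


def sample_raster(raster: Raster, origin_x: float, origin_y: float,
                  resolution: float, x: float, y: float) -> Optional[float]:
    """Return the value of the pixel containing (x, y), or None outside the raster.

    (origin_x, origin_y) is the upper-left corner; rows run from north to south.
    """
    col = math.floor((x - origin_x) / resolution)
    row = math.floor((origin_y - y) / resolution)
    if 0 <= row < len(raster) and 0 <= col < len(raster[row]):
        return raster[row][col]
    return None


def join_feature(points: Sequence[Tuple[float, float]], raster: Raster,
                 origin_x: float, origin_y: float, resolution: float) -> Optional[float]:
    """Mean raster value over all points of one (multi-)point feature."""
    values = [sample_raster(raster, origin_x, origin_y, resolution, x, y) for x, y in points]
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Tiny 2 x 2 raster with 10 m pixels and its upper-left corner at (0, 20)
raster = [[1.0, 2.0], [3.0, 4.0]]
```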

Register the workflow with the Geo Engine:

In [10]:
workflow = ge.register_workflow(points_with_s2_cloud_free.to_workflow_dict())
workflow

e952913a-a46c-5949-9d49-33c572bc913c

Define the start and end of the time interval to query:

In [11]:
from datetime import datetime

start_dt = datetime(2021, 1, 1)
end_dt = datetime(2022, 1, 1)

start_dt, end_dt

(datetime.datetime(2021, 1, 1, 0, 0), datetime.datetime(2022, 1, 1, 0, 0))
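With `granularity="months"` and `window_size=1`, the query interval is split into calendar months. Assuming the interval end is exclusive, `[2021-01-01, 2022-01-01)` yields exactly twelve monthly windows, which is why `end_dt` is the first day of 2022 rather than 2021-12-31. The sketch below enumerates these windows in plain Python; `monthly_windows` is a hypothetical helper, not a Geo Engine function.

```python
from datetime import datetime
from typing import List, Tuple


def monthly_windows(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Split the half-open interval [start, end) into calendar-month windows.

    Assumes `start` is midnight on the first day of a month.
    """
    windows = []
    current = start
    while current < end:
        # First day of the following month
        if current.month == 12:
            nxt = current.replace(year=current.year + 1, month=1)
        else:
            nxt = current.replace(month=current.month + 1)
        windows.append((current, min(nxt, end)))
        current = nxt
    return windows


windows = monthly_windows(datetime(2021, 1, 1), datetime(2022, 1, 1))
```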

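Now query the workflow with the area of interest and the time interval, and finally store the result in a GeoPackage (gpkg) file. The top-level `await` works inside a Jupyter notebook; in a plain Python script, wrap the call in an `async` function and run it with `asyncio.run`: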

In [12]:
gp_res = await workflow.vector_stream_into_geopandas(
    ge.QueryRectangle(
        spatial_bounds=ge.BoundingBox2D(
            xmin=xmin,
            ymin=ymin,
            xmax=xmax,
            ymax=ymax,
        ),
        time_interval=ge.TimeInterval(
            start=start_dt,
            end=end_dt,
        ),
        resolution=ge.SpatialResolution(10.0, 10.0),
        srs="EPSG:32632",
    )
)

gp_res.to_file("gp_res_10_frac1_monthly_utm32n_one_workflow.gpkg", driver="GPKG")
gp_res

Get the current quota and compute how much of it this notebook used:

In [None]:
used_quota_total = ge.get_quota(user_id)['used'] - used_quota_start
used_quota_total

322910