
Unexpected use of GroupBy's sort_key to label output dimensions when applying custom grouping #1153

Closed
robbibt opened this issue Jul 13, 2021 · 0 comments · Fixed by #1157
Expected behaviour

I would like to use a custom GroupBy object to specify custom sorting when grouping data. For example, when loading datasets from multiple annual geomedian products, I would like datasets to be sorted by Landsat platform (e.g. LANDSAT-5, LANDSAT-7, LANDSAT-8) when data is grouped, so that I can prioritise data from one satellite over the others.

In the example below, I create a GroupBy that uses a custom sort_by_platform function to prioritise data in alphabetical platform order:

def sort_by_platform(ds):
    return ds.metadata.platform

# GroupBy and _extract_time_from_ds are defined in the full
# example under "Steps to reproduce" below
platform_grouper = GroupBy(dimension='time',
                           group_by_func=_extract_time_from_ds,
                           units='seconds since 1970-01-01 00:00:00',
                           sort_key=sort_by_platform)

I expect to be able to use this custom GroupBy to load data with a normal time dimension, with the only difference being the internal sorting used when combining datasets.
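To make the expected semantics concrete, here is a plain-Python sketch (using hypothetical (center_time, platform) tuples in place of real datacube Datasets) of what I mean: grouping and axis labelling by time, with the platform sort only affecting the order within each group:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-ins for datacube Datasets: (center_time, platform) pairs
datasets = [
    (datetime(2015, 1, 1), "LANDSAT_8"),
    (datetime(2015, 1, 1), "LANDSAT_7"),
    (datetime(2016, 1, 1), "LANDSAT_8"),
]

# Group by time (the axis label), then sort *within* each group by platform
groups = defaultdict(list)
for ds in datasets:
    groups[ds[0]].append(ds)
for key in groups:
    groups[key].sort(key=lambda ds: ds[1])

# The output axis should still be labelled by time...
axis = sorted(groups)
assert axis == [datetime(2015, 1, 1), datetime(2016, 1, 1)]

# ...while LANDSAT_7 is preferred over LANDSAT_8 within the 2015 group
assert groups[datetime(2015, 1, 1)][0][1] == "LANDSAT_7"
```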

Actual behaviour

When I supply this custom GroupBy to dc.load, the output of my custom sorting function now replaces the time dimension in my dataset:

[Screenshot: the printed `ds.time` coordinate contains the sort function's output in place of time values]

It does not appear possible to specify custom sorting without the output of this function also being used to relabel the time dimension.

This coupling of the axis values to the group sort order appears to have been flagged previously in this TODO: https://github.com/opendatacube/datacube-core/blob/develop/datacube/api/core.py#L464
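My guess at what is happening (a simplified sketch, not the actual core.py implementation) is that a single key is used both to sort the groups and to label the output axis, so supplying a custom sort_key changes the axis values too:

```python
from datetime import datetime

# Hypothetical stand-ins for datacube Datasets: (center_time, platform) pairs
datasets = [
    (datetime(2015, 1, 1), "LANDSAT_8"),
    (datetime(2015, 1, 1), "LANDSAT_5"),
]

def sort_key(ds):        # my custom sort_by_platform
    return ds[1]

def group_by_func(ds):   # _extract_time_from_ds
    return ds[0]

# Suspected (coupled) behaviour: sort_key both orders the groups
# and supplies the axis values, so the axis holds platform names
coupled_axis = sorted({sort_key(ds) for ds in datasets})
assert coupled_axis == ["LANDSAT_5", "LANDSAT_8"]  # not timestamps!

# Decoupled behaviour would label the axis with group_by_func instead
decoupled_axis = sorted({group_by_func(ds) for ds in datasets})
assert decoupled_axis == [datetime(2015, 1, 1)]
```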

Steps to reproduce the behaviour

import datacube
import datetime
import collections
from datacube.model import Dataset
from datacube.utils.dates import normalise_dt

dc = datacube.Datacube()

#################
# Find datasets #
#################

# Set query
query = {
    'x': (146.592962, 146.712962),
    'y': (-38.80321, -38.683209999999995),
    'time': ('1987', '2020')
}

# Find datasets for all three geomedians
dss_ls5 = dc.find_datasets(product='ls5_nbart_geomedian_annual', **query)
dss_ls7 = dc.find_datasets(product='ls7_nbart_geomedian_annual', **query)
dss_ls8 = dc.find_datasets(product='ls8_nbart_geomedian_annual', **query)

##################
# Custom GroupBy #
##################

def _extract_time_from_ds(ds: Dataset) -> datetime.datetime:
    return normalise_dt(ds.center_time)

def sort_by_platform(ds):
    return ds.metadata.platform


GroupBy = collections.namedtuple(
    'GroupBy', ['dimension', 'group_by_func', 'units', 'sort_key'])
platform_grouper = GroupBy(dimension='time',
                           group_by_func=_extract_time_from_ds,
                           units='seconds since 1970-01-01 00:00:00',
                           sort_key=sort_by_platform)

#############
# Load data #
#############

ds = dc.load(datasets=dss_ls5 + dss_ls7 + dss_ls8,
             measurements=['swir1'],
             dask_chunks={},
             group_by=platform_grouper,
             **query)

ds.time

Environment information

  • Which datacube --version are you using? '1.8.4.dev81+g80d466a2'
  • What datacube deployment/environment are you running against? DEA Sandbox, standard image