CF expansion and alignment#8

Merged
m-mohr merged 32 commits into stac-extensions:main from MetServiceDev:az_cfexpansion
Mar 17, 2026

Conversation

@drandyziegler
Contributor

Addressed the following

  • aligned field names with what CF convention uses for attributes (e.g. standard_name, units, ...) making the elements more recognisable for people familiar with CF
  • added long_name and cell_methods fields
  • added a CF Object to clearly describe the vertical dimension that the elements belong to. This can be just the definition of the vertical axis but can also include a list of the vertical values that the data is available for.
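For illustration only, here is a hedged sketch of what an element description under this proposal might look like; the exact property names and nesting are assumptions made for this sketch, so check the schema in this PR for the actual definition:

```json
{
  "standard_name": "air_temperature",
  "long_name": "hourly mean air temperature at 2 m above ground level",
  "units": "degC",
  "cell_methods": "time: mean (interval: -1 h)",
  "vertical_dimension": {
    "standard_name": "height",
    "units": "m",
    "values": [2]
  }
}
```

The field names mirror the CF attribute names (standard_name, units, ...) as described above, and the vertical_dimension object can list the vertical values that the data is available for.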

Updated the schema accordingly but noticed that the test does not pick up on typos in the "vertical_dimension" object. No idea why.

@drandyziegler drandyziegler marked this pull request as ready for review March 25, 2025 22:50
@drandyziegler
Contributor Author

Hi @Fred-Leclercq @m-mohr
This is what I suggest.
Just realised that we could probably keep both approaches active by retaining the cf:parameter definition and adding, for example, a cf:element schema in parallel.

@m-mohr
Collaborator

m-mohr commented Mar 25, 2025

Why is it so important to follow the property names in CF exactly? Can't we just use name and unit for backward compatibility?

Are all these properties required in the metadata? I'm not a fan of just dumping everything into STAC without a use case to justify it. If there's a good use case for every field, it's fine to have them all. But otherwise the STAC philosophy is to keep things as small and simple as possible.

@drandyziegler
Contributor Author

drandyziegler commented Mar 26, 2025

Thanks for your feedback @m-mohr .
I found it very confusing to use "name" as a field name when it corresponds to the attribute "standard_name" defined in CF. Why would we not use the exact same label if nothing prevents us from doing so? The same goes for "unit", which is called "units" in CF. If this extension is to embed the CF convention, then my expectation would be that we follow the CF names as closely as possible and have a good reason whenever we don't.

We need a STAC catalogue that we can search for the correct field. "air_temperature" as a standard_name is insufficient: it can be an instantaneous measurement, an hourly mean/maximum/minimum, or a 24-hour mean/maximum/minimum. The same goes for rainfall and many other fields. The cell_methods field allows filtering for the required "air_temperature" field.

Not everybody is deeply familiar with reading the "cell_methods" field, so the long name is a detailed layman's description of what the element represents. This describes the data to a wider audience than CF experts.

Staying with air_temperature as an example, the vertical dimension can be in metres (above ground, above model level, above mean sea level), in pressure units (Pascal), or in model sigma levels. We need to find the data that has elements at the right vertical height, and these additional fields allow filtering for this.

As suggested, happy to add this structure as a cf:elements or cf:exact next to the cf:parameter definition to maintain backward compatibility.

@Fred-Leclercq
Collaborator

I’m inclined to accept this pull request, but I’d like to review it in more detail. Most likely by the end of this week or early next week. @m-mohr any objections or remarks?

@m-mohr
Collaborator

m-mohr commented Apr 1, 2025

Yeah, I've never been a fan of just dumping every information possible into STAC without any kind of alignment.

For example, STAC defines a unit field that is known by the ecosystem. Why does this need a different unit field with a different name and semantics?

Similarly for long_name, which seems to be meant as a title or description ("for example, be used for labeling plots"): we have the title and description fields in common metadata. Why is this not aligned?

standard_name breaks compatibility just for the sake of a different name. Why does name not work, to stay compatible with existing implementations? The values can be as in CF; it's really just the key in the JSON...

vertical_dimension: looks like a use case for the datacube extension.

asset_variable_name: If something only applies to a specific asset, the metadata should be in the asset.

(Then there's weirdness creeping in from CF's design: if cell_methods is a list ("comprising a list"), why is it a string and not an array?)

So right now I just see CF information being dumped into STAC without an attempt to properly align it with STAC, so in its current state I'd be -1 on this PR, sorry. I'm happy to add the information that is required for your use case, but it should fit into the STAC ecosystem. If people don't align, in the end everybody just has a JSON encoding of their proprietary stuff and STAC becomes more or less useless. The intention is that everyone can easily get familiar with what we describe in STAC, not just those already familiar with CF.

@m-mohr
Collaborator

m-mohr commented Apr 1, 2025

An alternative idea that just came to mind is to actually just define individual fields, since search was mentioned in the PR introduction.
Search in arrays of objects has not been well supported in the API tooling. Having my previous comment in mind, I'm wondering whether a different approach would make sense: we could potentially also embed the CF-specific fields separately.

For example, we could define the following fields

  • cf:standard_name - maps to standard_name in CF
  • cf:cell_methods - maps to cell_methods I guess, this one is not so clear to me (docs?)

And reuse existing fields:

  • title (or description) - maps to long_name in CF
  • unit - maps to units in CF

asset_variable_name is not needed any longer.

These fields can now be used in any place, for example in items, bands, the datacube extension, assets, etc. So if a specific item uses one specific standard name, just embed cf:standard_name in the properties. Easily searchable.
If an asset describes a specific standard name, use it in assets. I think we'd need to check the approach with a couple of examples.
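As an illustrative sketch of this idea (the field values are assumed; title and unit are reused common metadata, while cf:standard_name and cf:cell_methods are the newly proposed fields):

```json
{
  "type": "Feature",
  "properties": {
    "datetime": "2020-12-11T22:38:32Z",
    "cf:standard_name": "air_temperature",
    "cf:cell_methods": "time: mean (interval: -1 h)",
    "title": "hourly mean air temperature at 2 m agl",
    "unit": "degC"
  }
}
```

A flat property like cf:standard_name can then be matched directly by standard STAC search, without descending into arrays of objects.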

I want to enable your usecases, I'm just not sure whether the proposed approach really is well-suited for STAC.

I think it could be easier to discuss and try this in a call. Maybe the STAC community call?

@drandyziegler
Contributor Author

drandyziegler commented Apr 2, 2025

Thanks for the feedback @m-mohr . Very helpful to get your insights into the considerations for STAC extensions as a whole.
Happy to discuss in person in a call. How can I find out when and where the STAC community calls are?

A few points for further discussion:

  • I'd see a difference between a general-purpose extension (e.g. auth, or eo for any type of electro-optical device) and an extension that embeds a particular well-defined scheme (e.g. cmip6, xarray, landsat). The latter tries to embed an external standard/convention into STAC and sticks to the exact wording/labels used within that convention/standard. For example, xarray-stac uses "open_kwargs" instead of "open_keyword_arguments", which would be much more user-friendly. However, user-friendliness is not the aim here; the aim is making an external labelling scheme accessible within STAC without ambiguity. I'd see the same apply to CF, where words have a particular meaning. If you are familiar with CF, there is clarity about what STAC is describing. If you are unfamiliar with CF, then after going to the CF site and learning about it there is, again, a one-to-one match, making it intuitive and unambiguous how to use it, even without having to consult the STAC extension specification for each mapping.
  • Even though the data I'm interested in cataloguing is not organised on a cube, and each data point could be at a different vertical position, the datacube extension might work. However, I need to describe the variables precisely using the CF convention, and the datacube extension as currently defined does not provide CF fields. There can be different types of "heights" in the same asset and collection.

Maybe I can list a set of variables that we typically find in a dataset, for thoughts on how best to catalogue it in STAC.
As you can see, the first 5 variables are indistinguishable by standard_name and units. Heights for these are measured in metres, but there is nothing that explicitly defines this as the vertical unit. The last three also cannot be separated by standard_name and units, only by height, which is in metres for the first and in hPa for the latter two. Again, there are currently no units defined for the vertical dimension.
We need to search a catalogue for any of these parameters, which only together (standard_name, cell_methods, vertical placement) define an environmental variable precisely, to know which assets to get and open.

f10 1h min

  • standard_name: wind_speed
  • long_name: minimum wind speed in 1 hour at 10 m agl
  • units: kt
  • height: 10.
  • cell_methods: "time: minimum (interval: -1 h)"

f10 1h mean

  • standard_name: wind_speed
  • long_name: mean wind speed in 1 hour at 10 m agl
  • units: kt
  • height: 10.
  • cell_methods: "time: mean (interval: -1 h)"

f10 24h min

  • standard_name: wind_speed
  • long_name: minimum wind speed in 24 hour at 10 m agl
  • units: kt
  • height: 10.
  • cell_methods: "time: minimum (interval: -24 h)"

f10 24h mean

  • standard_name: wind_speed
  • long_name: mean wind speed in 24 hour at 10 m agl
  • units: kt
  • height: 10.
  • cell_methods: "time: mean (interval: -24 h)"

f60

  • standard_name: wind_speed
  • long_name: wind speed at 60 m agl
  • units: kt
  • height: 60.

t2m

  • standard_name: air_temperature
  • long_name: air temperature at 2 m agl
  • units: degC
  • height: 2.

t850hPa

  • standard_name: air_temperature
  • long_name: air temperature at 850 hPa
  • units: degC
  • air_pressure: 850.

t500hPa

  • standard_name: air_temperature
  • long_name: air temperature at 500 hPa
  • units: degC
  • air_pressure: 500.

Hopefully this relevant example can help with guiding a STAC solution.

@m-mohr
Collaborator

m-mohr commented Apr 3, 2025

Happy to discuss in person in a call. How can I find out when and where the STAC community calls are?

The STAC community calls are every other Monday, 17:00 CEST. Join https://groups.google.com/a/cloudnativegeo.org/g/stac-community to get an invite.

I'd see a difference between a general-purpose extension (e.g. auth or eo for any type of electro-optical device) and an extension that embeds a particular well-defined scheme (e.g. cmip6, xarray, landsat).

That's the point, I guess. All extensions should be general-purpose in the core. If a more specific variant is needed, it should inherit from the general-purpose extension. I'm not a fan of the cmip6, xarray and landsat extensions.

For example xarray-stac uses "open_kwargs" instead of "open_keyword_arguments" which would be much more user friendly.

Yes, I fully agree with you. The xarray extension is not well-designed; it's just a lazy way to dump "proprietary" stuff into STAC. If every programming language starts doing this, we end up with open_kwargs, open_args_for_r, parameters_in_js, etc. Not good.

However, user-friendliness is not the aim here but making an external labelling scheme accessible within STAC without ambiguity.

There's no ambiguity if the extension is written properly and specifies a mapping to the existing scheme.
For non-CFers it's not necessarily user-friendly. For CFers it may be, but why are they not sticking to CF, where the metadata is today? Just my opinion though...

I appreciate the examples. Is each of the variables a separate asset? In that case I think my previous proposal would work well, listing the properties independently, not in an array.
If it's all in a single netCDF asset, I'd think the datacube extension should be able to describe it.

So if for example f10 1h min and t500hPa are each an asset, it could/should look more like this:

{
	"assets": {
		"f10_1h_min": {
			"href": "f10_1h_min.nc",
			"type": "application/netcdf",
			"cf:standard_name": "wind_speed",
			"cf:height": 10, # this should probably be generalized, height is not CF specific
			"description": "minimum wind speed in 1 hour at 10 m agl",
			"unit": "kt",
			"cf:cell_methods": "time: minimum (interval: -1 h)"
		},
		"t500hPa": {
			"href": "t500hPa.nc",
			"type": "application/netcdf",
			"cf:standard_name": "air_temperature",
			"cf:air_pressure": 500, # this should probably be generalized, air pressure is not CF specific
			"description": "air temperature at 500 hPa",
			"unit": "degC"
		}
	}
}

If it's a single netCDF asset, you'd use the datacube extension and provide the fields similarly in variables instead of in assets. Datacube extension variables are open to be extended by the CF extension.

A bit of discussion is probably needed for height and air_pressure, they should probably not be defined in the CF extension.

@drandyziegler
Contributor Author

drandyziegler commented Apr 16, 2025

If it's a single netCDF asset, you'd use the datacube extension and provide the fields similarly in variables instead of in assets. Datacube extension variables are open to be extended by the CF extension.

The different variables all sit in a single NetCDF. But even if they didn't, it's probably not ideal to define the different vertical dimensions with a "cf:" prefix, since there are many (and there could be more in the future), and they are all defined as standard_names themselves anyway.

However, trying to use the datacube extension, I would get something like the below to define the elements, their vertical position, and the time interval that the cell_methods is applied over.

What do you think of the below @m-mohr ?
And will this still work efficiently in a search to find the variable of interest with the correct cf standard_name, height, cell_methods and time interval?

  "properties": {
    "datetime": "2020-12-11T22:38:32Z",
    "cube:variables": [
      {
        "sea_surface_temperature": {
          "type": "data",
          "cf:standard_name": "sea_surface_temperature",
          "description": "Average temperature on sea surface for preceding 24 hours",
          "unit": "K",
          "cf:cell_methods": "time: mean",
          "cube:dimensions": {
            "time_interval": {
              "type": "temporal",
              "description": "time interval that cell_methods is applied over",
              "values": [-24],
              "unit": "h"
            }
          }
        },
        "wind_speed_at_10m": {
          "type": "data",
          "cf:standard_name": "wind_speed",
          "description": "minimum wind speed in 1 hour at 10 m agl",
          "units": "kt",
          "cf:cell_methods": "time: minimum",
          "cube:dimensions": {
            "height": {
              "type": "spatial",
              "axis": "z",
              "cf:standard_name": "height",
              "description": "Height above ground level",
              "unit": "m",
              "values": [10]
            },
            "time_interval": {
              "type": "temporal",
              "description": "time interval that cell_methods is applied over",
              "values": [-60],
              "unit": "min"
            }
          }
        },
        "temp_at_500hPa": {
          "type": "data",
          "cf:standard_name": "air_temperature",
          "description": "air temperature at 500 hPa",
          "units": "degC",
          "cube:dimensions": {
            "height": {
              "type": "spatial",
              "axis": "z",
              "cf:standard_name": "air_pressure",
              "description": "Air pressure",
              "unit": "hPa",
              "values": [500]
            }
          }
        }
      }
    ]
  }

@m-mohr
Collaborator

m-mohr commented Apr 16, 2025

The different variables all sit in a single NetCDF.

Sure, that's fine and would work similarly to what you are showing in your example.

But even if they didn't, it's probably not ideal to define the different vertical dimensions with a "cf:" prefix, since there are many (and there could be more in the future), and they are all defined as standard_names themselves anyway.

I'm not sure I understand this sentence...

Example

The example looks good, except that you embedded the dimension information directly into the variables; those are listed externally in the datacube extension. I'm confused how the dimensions can have the same name but different values and units at the same time.

And will this still work efficiently in a search to find the variable of interest with the correct cf standard_name, height, cell_methods and time interval?

No, but it would also not work with the current CF extension or your proposal in this PR. STAC search has a hard time searching through arrays of objects or assets. So neither of the proposed solutions would work well for search yet.

@drandyziegler
Contributor Author

drandyziegler commented Apr 17, 2025

The example looks good, except that you embedded the dimension information directly into the variables; those are listed externally in the datacube extension. I'm confused how the dimensions can have the same name but different values and units at the same time.

The name of the vertical definition is meaningless; only the CF fields within the object define the vertical dimension precisely. I gave these neutral names below to reflect this.

But how would I pull out the time or vertical dimension, which is specific to each of the cube:variables? If I pull these out, then I would need a field to link the vertical dimension with the variable. That would make a search even more difficult, since the label used for the vertical dimension is meaningless without the CF fields that define it. Hope you can help @m-mohr .

"properties": {
   "datetime": "2020-12-11T22:38:32Z",
   "cube:variables": [
     {
       "sea_surface_temperature": {
         "type": "data",
         "cf:standard_name": "sea_surface_temperature",
         "description": "Average temperature on sea surface for preceding 24 hours",
         "unit": "K",
         "cf:cell_methods": "time: mean",
         "cube:dimensions": {
           "time_interval1": {
             "type": "temporal",
             "description": "time interval that cell_methods is applied over",
             "values": [-24],
             "unit": "h"
           }
         }
       },
       "wind_speed_at_10m": {
         "type": "data",
         "cf:standard_name": "wind_speed",
         "description": "minimum wind speed in 1 hour at 10 m agl",
         "units": "kt",
         "cf:cell_methods": "time: minimum",
         "cube:dimensions": {
           "vertical_dimension1": {
             "type": "spatial",
             "axis": "z",
             "cf:standard_name": "height",
             "description": "Height above ground level",
             "unit": "m",
             "values": [10]
           },
           "time_interval2": {
             "type": "temporal",
             "description": "time interval that cell_methods is applied over",
             "values": [-60],
             "unit": "min"
           }
         }
       },
       "temp_at_500hPa": {
         "type": "data",
         "cf:standard_name": "air_temperature",
         "description": "air temperature at 500 hPa",
         "units": "degC",
         "cube:dimensions": {
           "vertical_dimension2": {
             "type": "spatial",
             "axis": "z",
             "cf:standard_name": "air_pressure",
             "description": "Air pressure",
             "unit": "hPa",
             "values": [500]
           }
         }
       }
     }
   ]
 }

@m-mohr
Collaborator

m-mohr commented Apr 17, 2025

Not sure whether this solves your issue, but this is at least compliant with the datacube extension:

{
  "datetime": "2020-12-11T22:38:32Z",
  "cube:dimensions": {
    "time_interval1": {
      "type": "temporal",
      "description": "time interval that cell_methods is applied over",
      "values": [-24],
      "unit": "h"
    },
    "vertical_dimension1": {
      "type": "spatial",
      "axis": "z",
      "cf:standard_name": "height",
      "description": "Height above ground level",
      "unit": "m",
      "values": [10]
    },
    "time_interval2": {
      "type": "temporal",
      "description": "time interval that cell_methods is applied over",
      "values": [-60],
      "unit": "min"
    },
    "vertical_dimension2": {
      "type": "spatial",
      "axis": "z",
      "cf:standard_name": "air_pressure",
      "description": "Air pressure",
      "unit": "hPa",
      "values": [500]
    }
  },
  "cube:variables": [
    {
      "sea_surface_temperature": {
        "type": "data",
        "cf:standard_name": "sea_surface_temperature",
        "description": "Average temperature on sea surface for preceding 24 hours",
        "unit": "K",
        "cf:cell_methods": "time: mean",
        "dimensions": ["time_interval1"]
      },
      "wind_speed_at_10m": {
        "type": "data",
        "cf:standard_name": "wind_speed",
        "description": "minimum wind speed in 1 hour at 10 m agl",
        "units": "kt",
        "cf:cell_methods": "time: minimum",
        "dimensions": ["vertical_dimension1",  "time_interval2"]
      },
      "temp_at_500hPa": {
        "type": "data",
        "cf:standard_name": "air_temperature",
        "description": "air temperature at 500 hPa",
        "units": "degC",
        "dimensions": ["vertical_dimension2"]
      }
    }
  ]
}

@drandyziegler
Contributor Author

drandyziegler commented May 28, 2025

Sorry for not coming back for a while. I'm juggling multiple roles at work at the moment.

Agree that normalising the vertical and time dimensions into separate definitions is cleaner and in line with the datacube extension.
I might be mistaken, but does this mean I would need to search through the catalogue for these deeply nested fields two or three times to find the datasets that have the right variable, vertical height and time interval?
If so, that's not very practical or performant, I'd expect, and might defeat the purpose of a catalogue with a cleaner metadata model.
Or is there a way to do a "joint" search via the API that I'm not aware of, @m-mohr ?

@m-mohr
Collaborator

m-mohr commented May 28, 2025

That depends on your database implementation; the STAC API specification itself would allow it via custom queryables.
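As a rough sketch of that idea (assuming the queryables mechanism from the STAC API Filter extension, which advertises searchable fields as a JSON Schema; the URL and field set here are illustrative, not from the PR):

```json
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://example.com/collections/forecasts/queryables",
  "type": "object",
  "title": "Queryables",
  "properties": {
    "cf:standard_name": { "type": "string" },
    "cf:cell_methods": { "type": "string" }
  }
}
```

How the server resolves such a queryable against nested cube:variables internally is up to the implementation.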

Alternatively, duplicate the information into the existing cf:parameter property, I think that was the primary purpose of the original extension anyway.

@drandyziegler
Contributor Author

drandyziegler commented May 28, 2025

Hm. But then we are back full circle to where we started, since the original extension does not provide the ability to define a time aggregation dimension for the cell_methods, nor does it allow specifying the vertical dimension and value. Together, these are all essential for finding the right environmental variable. Unless I misunderstand what you are suggesting for how to use cf:parameter @m-mohr .

Re: API custom queryables. Could you sketch what the custom query would look like? Somehow it would need to search for the variable first, then from there read the dimension labels, and then look up the dimensions with the correct labels to filter for the relevant parameters for time and height. And since the labels are arbitrary, this does not look like an easy thing to specify. How would you see this working?

@m-mohr
Collaborator

m-mohr commented Jul 3, 2025

Hm. But then we are back full circle to where we started, since the original extension does not provide the ability to define a time aggregation dimension for the cell_methods, nor does it allow specifying the vertical dimension and value.

No, I don't think so. Due to the common metadata model, you can use the fields in multiple places, and as such can summarize them at a higher level than the variables for search purposes.
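For example (an illustrative sketch, not part of the PR; summarizing cf:standard_name as a list at the item level is an assumption made here):

```json
{
  "properties": {
    "cf:standard_name": ["wind_speed", "air_temperature"]
  }
}
```

Search can then hit this flat summary, while the per-variable detail stays in cube:variables.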

Re: API custom queryables. Could you sketch what the custom query would look like? Somehow it would need to search for the variable first, then from there read the dimension labels, and then look up the dimensions with the correct labels to filter for the relevant parameters for time and height. And since the labels are arbitrary, this does not look like an easy thing to specify. How would you see this working?

Could you sketch how it would work with your proposal? I'm not 100% sure I understand it yet.
Queryables are just names for search criteria, which can be arbitrarily complex and do various things in the background. It depends pretty much on your underlying implementation.

Co-authored-by: Emmanuel Mathot <emmanuel.mathot@gmail.com>
@drandyziegler
Contributor Author

drandyziegler commented Jul 7, 2025

Could you sketch how it would work with your proposal? I'm not 100% sure I understand it yet. Queryables are just names for search criteria, which can be arbitrarily complex and do various things in the background. It depends pretty much on your underlying implementation.

I'll try, @m-mohr .
Let's use this example:

{
  "datetime": "2020-12-11T22:38:32Z",
  "cube:dimensions": {
    "time_interval1": {
      "type": "temporal",
      "description": "time interval that cell_methods is applied over",
      "values": [-24],
      "unit": "h"
    },
    "vertical_dimension1": {
      "type": "spatial",
      "axis": "z",
      "cf:standard_name": "height",
      "description": "Height above ground level",
      "unit": "m",
      "values": [10]
    }
  },
  "cube:variables": [
    {
      "wind_speed_at_10m": {
        "type": "data",
        "cf:standard_name": "wind_speed",
        "description": "minimum wind speed in 1 hour at 10 m agl",
        "units": "kt",
        "cf:cell_methods": "time: minimum",
        "dimensions": ["vertical_dimension1", "time_interval1"]
      }
    }
  ]
}

For finding which collections have a minimum wind speed in 1 hour at 10 metres above ground level, I would need to extract information in the following order:

  1. search through all collections that have cube:variables.*.cf:standard_name=wind_speed (assuming I can do a "*/wildcard" search like that, since the label of the variable object in a datacube could be anything)
  2. for each found collection with a wind_speed standard_name I would need to do the following reads in sequence
    a. read cube:variables.wind_speed_at_10m.dimensions to get the dimension labels associated with the wind_speed variable (in this example, vertical_dimension1 and time_interval1)
    b. then read cube:dimensions.vertical_dimension1.unit and cube:dimensions.vertical_dimension1.values and see if they are equivalent to 10 m
    c. then read cube:dimensions.time_interval1.unit and cube:dimensions.time_interval1.values and see if they are equivalent to the required time interval

How would a queryable look that embeds this kind of logic and extracts some of the necessary information (like the label of the variables object and the dimension labels) per collection and datacube?
Hope this helps.
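For contrast, if the relevant criteria were exposed as flat queryables, the whole lookup could collapse into a single filter. A hedged CQL2-Text sketch, where vertical_value and vertical_unit are hypothetical queryable names invented for this illustration:

```text
cf:standard_name = 'wind_speed'
  AND cf:cell_methods LIKE 'time: minimum%'
  AND vertical_value = 10 AND vertical_unit = 'm'
```

How such queryables map onto the nested datacube structure would then be the server implementation's job.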

Collaborator

@m-mohr m-mohr left a comment


Thanks, but please clean up the PR. There are many weird files in the extension that don't belong there (.jsonbackup and other odd extensions). I've paused the review until it's cleaned up.

@drandyziegler
Contributor Author

Thanks for the feedback @m-mohr
The weird files were test files, and I didn't notice that my IDE had added them to the repo. They are gone now.
I also adjusted the schema as suggested.

@m-mohr
Collaborator

m-mohr commented Dec 1, 2025

@drandyziegler I can't push my changes to your branch.
Can you enable this checkbox when you edit the PR?

(screenshot of the checkbox in the PR settings)

(I'm not sure if you can enable this after creating the PR; in the worst case I can just merge it into a branch here, make my edits, and create a new PR from that branch.)

@drandyziegler
Contributor Author

@m-mohr Can't find this option in this PR. Might be something that needs to be set when creating the PR as you suspect.

@drandyziegler
Contributor Author

@m-mohr I added you as a collaborator to the project. Hope this helped.

@m-mohr
Collaborator

m-mohr commented Dec 1, 2025

Thanks, the CI errors are fixed.

Remaining todos:

  • Update the schema, especially to include validation in bands and the datacube extension
  • Clarify null in cf:cell_methods or fix the example

@drandyziegler
Contributor Author

Thanks, the CI errors are fixed.

Remaining todos:

  • Update the schema, especially to include validation in bands and the datacube extension
  • Clarify null in cf:cell_methods or fix the example

Thanks @m-mohr .
I think I straightened out the cell_methods fields, but I'm out of my depth on what you mean by including validation in the bands and datacube extensions. Do you have an example of another extension that does that (or something similar) that I can compare against and copy?

@m-mohr
Collaborator

m-mohr commented Dec 4, 2025

Thanks. I can try to do that in the next days.

Can you add a bit more info about what null means in this case?
Especially the example [null, 'minimum'] is confusing to me. What does this mean?
Is that first axis no method, second axis minimum value?

@drandyziegler
Contributor Author

Yes @m-mohr . A cell_methods is applied over a dimension. @emmanuelmathot suggested in the comment here #8 (comment) to have this as an array that lines up with the order in the dimensions array. I added a bit more to the README to explain this better.

This array approach has two shortcomings:

  • some methods (like bilinear interpolation) are applied over both spatial dimensions simultaneously
  • the order in which methods are applied matters, but there is no requirement to sort dimensions this way.

In these cases, a string representation is the alternative way of describing cell_methods. I also added an example of that to the README.

Collaborator

@m-mohr m-mohr left a comment


I'm approving for now, I still need to update the schema but will do in a separate PR.

@m-mohr m-mohr requested a review from emmanuelmathot January 26, 2026 13:03
@m-mohr m-mohr mentioned this pull request Jan 26, 2026
@drandyziegler
Contributor Author

Hi @emmanuelmathot
Would you have some time to review this PR?
Thanks.
Andy

Member

@emmanuelmathot emmanuelmathot left a comment


Fine with me for now as well. Need a schema update for a proper release

@m-mohr
Collaborator

m-mohr commented Mar 17, 2026

Okay, let's merge this for now.

The JSON Schema issue is tracked in #9. It is assigned to me, but I'm pretty busy the next months so if anyone gets to it before me, feel free to take over.

@m-mohr m-mohr merged commit 60b499e into stac-extensions:main Mar 17, 2026
1 check passed
@m-mohr m-mohr deleted the az_cfexpansion branch March 17, 2026 12:22