
nodata for product definition doesn't match dtype #1348

Open · pindge opened this issue Nov 7, 2022 · 15 comments


pindge commented Nov 7, 2022

related to: GeoscienceAustralia/dea-config#1110

The stored product definition (row 247, `ga_srtm_dem1sv1_0`, added 2020-09-25, last updated 2021-07-01 by ows):

    {
      "name": "ga_srtm_dem1sv1_0",
      "description": "DEM 1sec Version 1.0",
      "metadata_type": "eo",
      "metadata": {"platform": {"code": "Space Shuttle Endeavour"}, "instrument": {"name": "SIR"}, "product_type": "DEM"},
      "storage": {"crs": "EPSG:4326", "resolution": {"latitude": -0.00027777777778, "longitude": 0.00027777777778}},
      "measurements": [
        {"name": "dem",   "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
        {"name": "dem_s", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
        {"name": "dem_h", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000}
      ]
    }

eo3-validate's type handling needs updating.

@pindge pindge transferred this issue from opendatacube/datacube-core Nov 7, 2022

jeremyh commented Nov 7, 2022

Our validator is correctly failing this product. Should this issue go to ODC core?


pindge commented Nov 7, 2022

@SpacemanPaul suggested this is not core's issue; it was just transferred here from core.


SpacemanPaul commented Nov 7, 2022

I'm not that strongly opinionated on the matter.

What would the metadata fix be, though? -340282350000000000000000000000000000000 -> -3.4028235e+38?

What does a nodata value on a floating point variable even mean?
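For the first question, a plausible corrected measurement entry might look like the following. This is a sketch following the product schema visible in the DB row above, not a verified patch; whether the catalogue should carry the full-precision value or a NaN marker is exactly what the rest of this thread debates:

```yaml
measurements:
  - name: dem
    dtype: float32
    units: metre
    # Full-precision float32 min, which round-trips through float32 exactly...
    nodata: -3.4028234663852886e+38
    # ...or, since float32 has a native missing-value bit pattern:
    # nodata: .nan
```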

pindge commented Nov 9, 2022

#1347

@pindge pindge transferred this issue from opendatacube/eo-datasets Nov 9, 2022
@Kirill888

> What does a nodata value on a floating point variable even mean?

Same as for an int: a bit pattern reserved to mean the absence of a valid observation. Float32 already has standard bit patterns that mean exactly that (NaN), but that doesn't stop some products from using finite values instead, sometimes alongside proper floating-point NaNs. All masking code is significantly complicated by float nodata that could be NaN, some finite float value, or absent (which is treated the same as NaN).
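The branching that sentence describes can be sketched as follows. This is a hypothetical helper for illustration, not datacube's implementation:

```python
import numpy as np

def mask_nodata(arr, nodata):
    """Return a float32 copy of arr with nodata pixels replaced by NaN.

    Illustrates the three cases float nodata handling must cover:
    absent (None), NaN, or some finite sentinel value.
    """
    arr = arr.astype("float32", copy=True)
    if nodata is None or (isinstance(nodata, float) and np.isnan(nodata)):
        # Absent or NaN nodata: the data already uses NaN as the marker,
        # so there is nothing to rewrite.
        return arr
    # Finite nodata: equality comparison works, then replace with NaN.
    arr[arr == np.float32(nodata)] = np.nan
    return arr

data = np.array([1.0, np.finfo(np.float32).min, 2.0], dtype="float32")
masked = mask_nodata(data, float(np.finfo(np.float32).min))
print(np.isnan(masked[1]), masked[0])  # True 1.0
```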


Kirill888 commented Nov 9, 2022

The real value is probably -340282346638528859811704183484516925440, which is numpy.finfo(numpy.float32).min; it lost precision when it was converted to a string via scientific notation.
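A quick numpy check of that precision-loss claim (a sketch; the 8-significant-digit formatting is an assumption about how the truncated value was produced):

```python
import numpy as np

# float32 min is an exact integer; formatting it with 8 significant
# digits in scientific notation drops the trailing digits.
exact = int(np.finfo(np.float32).min)
print(exact)                                      # -340282346638528859811704183484516925440
print(f"{float(np.finfo(np.float32).min):.8g}")   # -3.4028235e+38
```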

> Our validator is correctly failing the product --- should this issue go to ODC core?

@jeremyh

The test in numpy_value_fits_dtype,

        return np.all(np.array([value], dtype=dtype) == [value])

doesn't check what it claims: it checks that the value can round-trip to numpy and back, and that is only appropriate for integer types. The value as supplied is probably incorrect because of precision loss, but it is certainly not outside the valid range of a float32.
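The distinction can be demonstrated outside the validator (values taken from this thread; the check mirrors the one-liner quoted above):

```python
import numpy as np

value = -340282350000000000000000000000000000000  # from the product definition

# The validator's round-trip test: cast to the dtype and compare back.
fits = bool(np.all(np.array([float(value)], dtype="float32") == [float(value)]))
print(fits)  # False: float32 cannot represent all those digits exactly

# But the value is not out of range: it rounds to the nearest finite
# float32, which happens to be float32 min.
cast = np.float32(float(value))
print(np.isfinite(cast), cast == np.finfo(np.float32).min)  # True True
```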


pindge commented Nov 9, 2022

Checking a dataset in the middle of the ocean:

```python
import datacube

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)
ds
```

The loaded output is filled with the stored nodata value:

dem (time, latitude, longitude) float32, -3.403e+38 ... -3.403e+38:

    array([[[-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
             -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
            ...,
            [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
             -3.4028235e+38, -3.4028235e+38, -3.4028235e+38]]], dtype=float32)

dem_s and dem_h contain the same -3.4028235e+38 fill values throughout.


pindge commented Nov 9, 2022

    >>> ds.dem[0][1][1]
    <xarray.DataArray 'dem' ()>
    array(-3.4028235e+38, dtype=float32)
    Coordinates:
        time         datetime64[ns] 2014-12-15T14:58:44
        latitude     float64 -37.23
        longitude    float64 136.4
        spatial_ref  int32 4326
    Attributes:
        units:         metre
        nodata:        -340282350000000000000000000000000000000
        crs:           EPSG:4326
        grid_mapping:  spatial_ref


robbibt commented Nov 9, 2022

Here is an example of datacube's mask_invalid_data function failing to mask nodata for this product. If this worked as intended, I would expect all -3.4028235e+38 values in the dataset to be masked out and set to np.NaN:

```python
import datacube
from datacube.utils.masking import mask_invalid_data

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)

# Attempt to mask invalid data
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
```

Nodata values are not correctly replaced with np.NaN:

[screenshot: masked output still contains -3.4028235e+38 values]
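The failure can be reproduced outside datacube with the two values from this thread. Treating the match as an elementwise equality test is a simplifying assumption about how masking identifies nodata pixels:

```python
import numpy as np

# The data is filled with true float32 min, but the nodata attribute
# holds the truncated product-definition value.
data = np.full((3, 3), np.finfo(np.float32).min, dtype=np.float32)
nodata_attr = -340282350000000000000000000000000000000  # from the product definition

# Compared in float64, the two values differ, so no pixels match
# and nothing gets masked.
mask = data == np.float64(nodata_attr)
print(mask.any())  # False

# Cast to float32 first, the value rounds back to float32 min and
# every fill pixel matches.
print((data == np.float32(float(nodata_attr))).all())  # True
```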

@Kirill888

@robbibt, can you try patching the .nodata value on the xarray variable to -340282346638528859811704183484516925440 and then masking again? That value might also pass validation, since it should round-trip to/from numpy without loss.

@Kirill888

What about the nodata tags in the imagery itself, what are they set to? Also, is the nodata coming from the files or from load? load will apply whatever incorrectly rounded value is recorded in the product spec, and separately from that, the data itself might contain non-rounded nodata values.


robbibt commented Nov 10, 2022

@Kirill888 That works!

```python
import datacube
from datacube.utils.masking import mask_invalid_data

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)

# Patch nodata, then mask invalid data
ds.dem_h.attrs['nodata'] = -340282346638528859811704183484516925440
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
```

[screenshot: masked output with nodata values replaced by NaN]

> What about nodata tags in the imagery itself

Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?

@Kirill888

> Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?

Is there a nodata marker on the TIFF image itself (rio info path_to_image), and is it set to the correct value, or is it truncated like in the product definition?


stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Mar 18, 2023