
nodata for product definition doesn't match dtype #1348

Open · pindge opened this issue Nov 7, 2022 · 15 comments


pindge commented Nov 7, 2022

related to: GeoscienceAustralia/dea-config#1110

The stored product definition (row 247, `ga_srtm_dem1sv1_0`, added 2020-09-25, last updated 2021-07-01 by ows):

    {
      "name": "ga_srtm_dem1sv1_0",
      "description": "DEM 1sec Version 1.0",
      "metadata_type": "eo",
      "metadata": {"platform": {"code": "Space Shuttle Endeavour"}, "instrument": {"name": "SIR"}, "product_type": "DEM"},
      "storage": {"crs": "EPSG:4326", "resolution": {"latitude": -0.00027777777778, "longitude": 0.00027777777778}},
      "measurements": [
        {"name": "dem",   "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
        {"name": "dem_s", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
        {"name": "dem_h", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000}
      ]
    }

eo3-validate's type handling needs updating.

@pindge pindge transferred this issue from opendatacube/datacube-core Nov 7, 2022

jeremyh commented Nov 7, 2022

Our validator is correctly failing this product. Should this issue go to ODC core?


pindge commented Nov 7, 2022

@SpacemanPaul suggested this is not core's issue; it was just transferred here from core.


SpacemanPaul commented Nov 7, 2022

I'm not that strongly opinionated on the matter.

What would the metadata fix be, though? -340282350000000000000000000000000000000 -> -3.4028235e+38?

What does a nodata value on a floating point variable even mean?
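For the first question, a plausible corrected measurement entry might look like the following. This is a sketch following the product schema visible in the DB row above, not a verified patch; whether the catalogue should carry the full-precision value or a NaN marker is exactly what the rest of this thread debates:

```yaml
measurements:
  - name: dem
    dtype: float32
    units: metre
    # Full-precision float32 min, which round-trips through float32 exactly...
    nodata: -3.4028234663852886e+38
    # ...or, since float32 has a native missing-value bit pattern:
    # nodata: .nan
```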

pindge commented Nov 9, 2022

#1347

@pindge pindge transferred this issue from opendatacube/eo-datasets Nov 9, 2022
@Kirill888

> What does a nodata value on a floating point variable even mean?

Same as for an int: a bit pattern reserved to mean the absence of a valid observation. Float32 already has standard bit patterns that mean exactly that (NaN), but that doesn't stop some products from using finite values instead, sometimes alongside proper floating-point NaNs. All masking code is significantly complicated by float nodata that could be NaN, some finite float value, or absent (which is treated the same as NaN).
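The branching that sentence describes can be sketched as follows. This is a hypothetical helper for illustration, not datacube's implementation:

```python
import numpy as np

def mask_nodata(arr, nodata):
    """Return a float32 copy of arr with nodata pixels replaced by NaN.

    Illustrates the three cases float nodata handling must cover:
    absent (None), NaN, or some finite sentinel value.
    """
    arr = arr.astype("float32", copy=True)
    if nodata is None or (isinstance(nodata, float) and np.isnan(nodata)):
        # Absent or NaN nodata: the data already uses NaN as the marker,
        # so there is nothing to rewrite.
        return arr
    # Finite nodata: equality comparison works, then replace with NaN.
    arr[arr == np.float32(nodata)] = np.nan
    return arr

data = np.array([1.0, np.finfo(np.float32).min, 2.0], dtype="float32")
masked = mask_nodata(data, float(np.finfo(np.float32).min))
print(np.isnan(masked[1]), masked[0])  # True 1.0
```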


Kirill888 commented Nov 9, 2022

The real value is probably -340282346638528859811704183484516925440, which is numpy.finfo(numpy.float32).min; it lost precision when it was converted to a string via scientific notation.
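A quick numpy check of that precision-loss claim (a sketch; the 8-significant-digit formatting is an assumption about how the truncated value was produced):

```python
import numpy as np

# float32 min is an exact integer; formatting it with 8 significant
# digits in scientific notation drops the trailing digits.
exact = int(np.finfo(np.float32).min)
print(exact)                                      # -340282346638528859811704183484516925440
print(f"{float(np.finfo(np.float32).min):.8g}")   # -3.4028235e+38
```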

> Our validator is correctly failing the product --- should this issue go to ODC core?

@jeremyh

The test in numpy_value_fits_dtype,

        return np.all(np.array([value], dtype=dtype) == [value])

doesn't check what it claims: it checks that the value can round-trip to numpy and back, and that is only appropriate for integer types. The value as supplied is probably incorrect because of precision loss, but it is certainly not outside the valid range of a float32.
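The distinction can be demonstrated outside the validator (values taken from this thread; the check mirrors the one-liner quoted above):

```python
import numpy as np

value = -340282350000000000000000000000000000000  # from the product definition

# The validator's round-trip test: cast to the dtype and compare back.
fits = bool(np.all(np.array([float(value)], dtype="float32") == [float(value)]))
print(fits)  # False: float32 cannot represent all those digits exactly

# But the value is not out of range: it rounds to the nearest finite
# float32, which happens to be float32 min.
cast = np.float32(float(value))
print(np.isfinite(cast), cast == np.finfo(np.float32).min)  # True True
```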


pindge commented Nov 9, 2022

Checking a dataset in the middle of the ocean:

```python
import datacube

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)
ds
```

The loaded output is filled with the stored nodata value:

dem (time, latitude, longitude) float32, -3.403e+38 ... -3.403e+38:

    array([[[-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
             -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
            ...,
            [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
             -3.4028235e+38, -3.4028235e+38, -3.4028235e+38]]], dtype=float32)

dem_s and dem_h contain the same -3.4028235e+38 fill values throughout.


pindge commented Nov 9, 2022

    >>> ds.dem[0][1][1]
    <xarray.DataArray 'dem' ()>
    array(-3.4028235e+38, dtype=float32)
    Coordinates:
        time         datetime64[ns] 2014-12-15T14:58:44
        latitude     float64 -37.23
        longitude    float64 136.4
        spatial_ref  int32 4326
    Attributes:
        units:         metre
        nodata:        -340282350000000000000000000000000000000
        crs:           EPSG:4326
        grid_mapping:  spatial_ref


robbibt commented Nov 9, 2022

Here is an example of datacube's mask_invalid_data function failing to mask nodata for this product. If this worked as intended, I would expect all -3.4028235e+38 values in the dataset to be masked out and set to np.NaN:

```python
import datacube
from datacube.utils.masking import mask_invalid_data

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)

# Attempt to mask invalid data
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
```

Nodata values are not correctly replaced with np.NaN:

[screenshot: masked output still contains -3.4028235e+38 values]
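The failure can be reproduced outside datacube with the two values from this thread. Treating the match as an elementwise equality test is a simplifying assumption about how masking identifies nodata pixels:

```python
import numpy as np

# The data is filled with true float32 min, but the nodata attribute
# holds the truncated product-definition value.
data = np.full((3, 3), np.finfo(np.float32).min, dtype=np.float32)
nodata_attr = -340282350000000000000000000000000000000  # from the product definition

# Compared in float64, the two values differ, so no pixels match
# and nothing gets masked.
mask = data == np.float64(nodata_attr)
print(mask.any())  # False

# Cast to float32 first, the value rounds back to float32 min and
# every fill pixel matches.
print((data == np.float32(float(nodata_attr))).all())  # True
```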

@Kirill888

@robbibt, can you try patching the .nodata value on the xarray variable to -340282346638528859811704183484516925440 and then masking again? That value might also pass validation, since it should round-trip to/from numpy without loss.

@Kirill888

What about the nodata tags in the imagery itself, what are they set to? Also, is the nodata coming from the files or from load? load will apply whatever incorrectly rounded value is recorded in the product spec, and separately from that, the data itself might contain non-rounded nodata values.


robbibt commented Nov 10, 2022

@Kirill888 That works!

```python
import datacube
from datacube.utils.masking import mask_invalid_data

dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
    product="ga_srtm_dem1sv1_0",
    x=(x - buffer_x, x + buffer_x),
    y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)

# Patch nodata, then mask invalid data
ds.dem_h.attrs['nodata'] = -340282346638528859811704183484516925440
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
```

[screenshot: masked output with nodata values replaced by NaN]

> What about nodata tags in the imagery itself

Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?

@Kirill888

> Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?

Is there a nodata marker on the TIFF image itself (rio info path_to_image), and is it set to the correct value, or is it truncated like in the product definition?


stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Mar 18, 2023