YAML area configuration does not allow specifying the dtype, and dtype is ignored in equality comparisons #590

Open
gerritholl opened this issue Mar 15, 2024 · 6 comments

@gerritholl
Collaborator

Today I learned that an AreaDefinition has a dtype attribute. It seems to be neither well documented nor well supported, but it is relevant: it is used at least in get_lonlats(), which is used in resampling, so changing the dtype can have considerable memory implications and can make the difference between having enough RAM or not.
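
To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It assumes only that get_lonlats() yields one longitude array and one latitude array of the area's shape. The helper name `lonlat_nbytes` and the 10000 x 10000 example area are made up for illustration, and temporaries created during resampling are not counted.

```python
def lonlat_nbytes(height: int, width: int, itemsize: int) -> int:
    """Bytes needed for one lon array plus one lat array of shape (height, width)."""
    return 2 * height * width * itemsize


# Hypothetical 10000 x 10000 area; itemsize is 4 for float32, 8 for float64.
for name, itemsize in (("float32", 4), ("float64", 8)):
    nbytes = lonlat_nbytes(10_000, 10_000, itemsize)
    print(f"{name}: {nbytes / 1e9:.1f} GB")
```

Halving the item size halves the footprint of the coordinate arrays, which is where the choice of dtype can decide whether a job fits in RAM.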

Code Sample, a minimal, complete, and verifiable piece of code

import pyresample

ar = pyresample.AreaDefinition(
    "area_id", "descr", "proj_id", 4326, 100, 100,
    [-100, -100, 100, 100], dtype="float32")
s = ar.dump()
print(s)
ar2 = pyresample.area_config.load_area_from_string(s)
print("equals?", ar == ar2)
print("dtypes", ar.dtype, ar2.dtype)
s2 = """area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]
  dtype: float32
"""
ar3 = pyresample.area_config.load_area_from_string(s2)
print(ar3.dtype)

Problem description

Executing the code reveals several problems:

  • ar.dump() does not output the dtype
  • ar and ar2 are considered equal despite having different dtypes
  • ar3, loaded from a string that does encode the dtype, gets float64 despite the YAML definition stating float32.

It does not appear to be possible to specify the dtype in the YAML configuration (nor in pyresample.create_area_def()).
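
The expected behaviour can be illustrated with a toy sketch that does not touch pyresample at all. `ToyArea` is a hypothetical stand-in, not pyresample's AreaDefinition, and its `dump()`/`load()` work on plain dicts rather than YAML. It only shows the idea: if dtype takes part in both the serialised form and the equality comparison, a round trip preserves it, and two areas that differ only in dtype compare unequal.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToyArea:
    """Hypothetical stand-in for an area definition (not pyresample's class)."""

    area_id: str
    height: int
    width: int
    area_extent: tuple
    dtype: str = "float64"

    def dump(self) -> dict:
        # Every field, including dtype, goes into the serialised form.
        return asdict(self)

    @classmethod
    def load(cls, params: dict) -> "ToyArea":
        # Rebuild the area from the serialised form, dtype included.
        return cls(**params)


ar = ToyArea("area_id", 100, 100, (-100, -100, 100, 100), dtype="float32")
ar2 = ToyArea.load(ar.dump())
ar3 = ToyArea("area_id", 100, 100, (-100, -100, 100, 100))  # default float64

print("round trip equal?", ar == ar2)
print("same geometry, different dtype, equal?", ar == ar3)
```

The dataclass-generated `__eq__` compares all fields, so dtype is part of the comparison by construction.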

Expected Output

area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]
  dtype: float32

equals? True
dtypes float32 <class 'numpy.float32'>
<class 'numpy.float32'>

Actual Result, Traceback if applicable

area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]

equals? True
dtypes float32 <class 'numpy.float64'>
/data/gholl/checkouts/pyresample/pyresample/area_config.py:91: UserWarning: Unused/unexpected area definition parameter(s) for area_id: params={'dtype': 'float32'}
  area_list = parse_area_file(area_file_name, *regions)
<class 'numpy.float64'>

Versions of Python, package at hand and relevant dependencies

pyresample main (v1.28.2-2-g711f354)

@djhoese
Member

djhoese commented Mar 15, 2024

I would consider dtype in the __init__ method deprecated. It will not be included in future versions of the area definition class, or at least not in the ones I have planned.

@gerritholl
Collaborator Author

How would you recommend that users control the dtype when resampling? With a keyword argument to resample?

@djhoese
Member

djhoese commented Mar 15, 2024

I'm tempted to say resampling should always use 64-bit floats for projected areas and optionally/possibly do 32-bit floats for lon/lats (degrees) coordinate systems.

@gerritholl
Collaborator Author

I run out of memory with 64-bit floats, but my code completes fine when I force them to be 32 bits (projected area).

@djhoese
Member

djhoese commented Mar 15, 2024

Are the lon/lats being forced to 32-bit in your changes, or the x/y coordinates? We (Panu and I, at least) have discussed on Slack in the past that 64-bit is required for meter-level accuracy of x/y coordinates in most projections, but is way overkill for lon/lat degrees. At least that's how I remember it.
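
The precision question can be checked with a little arithmetic on the floating-point grid. The sketch below uses only the standard library; `float32_spacing` is a hypothetical helper, and the coordinate magnitudes (5,000,000 m for a projected x/y value, 180 degrees for a longitude) are illustrative assumptions rather than values from this issue.

```python
import math
import struct


def float32_spacing(value: float) -> float:
    """Gap between positive `value` (rounded to float32) and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


x_meters = 5_000_000.0  # illustrative projected coordinate
lon_degrees = 180.0     # largest longitude magnitude

print("float32 step at 5e6 m:  ", float32_spacing(x_meters), "m")
print("float64 step at 5e6 m:  ", math.ulp(x_meters), "m")
print("float32 step at 180 deg:", float32_spacing(lon_degrees), "deg")
```

Under these assumptions, float32 can only resolve projected coordinates of that size in half-meter steps, while float64 resolves them far below a millimeter. For longitudes, the float32 step near 180 degrees is 2**-16 degrees; whether that is fine enough depends on the pixel size of the area.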

@djhoese
Member

djhoese commented Mar 15, 2024

Also, if this is running out of memory (OOM) with the "nearest" resampler then it is likely the generation of the KDTree that is the major contributor to the memory usage as the entire thing has to exist in memory at once (no dask chunking). The KDTree involves 3 axes (x, y, z) of geocentric coordinates...but I thought that was always 64-bit so I might be wrong about the dtype of the lon/lats contributing to this.
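
As a rough illustration of the geocentric step, here is a sketch that converts lon/lat to (x, y, z) and estimates the size of the coordinate array handed to a KD-tree. It assumes a spherical Earth with a 6,371 km radius, which may differ from what pyresample actually does; both function names are hypothetical, and the tree's own internal structures are not counted.

```python
import math


def lonlat_to_xyz(lon_deg: float, lat_deg: float, radius: float = 6_371_000.0):
    """Convert lon/lat in degrees to geocentric (x, y, z) in meters on a sphere."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z


def tree_input_nbytes(n_points: int, itemsize: int = 8) -> int:
    """Bytes for an (n_points, 3) coordinate array, 64-bit floats by default."""
    return n_points * 3 * itemsize


print(lonlat_to_xyz(0.0, 0.0))
# Hypothetical 10000 x 10000 source grid, all points in memory at once.
print(tree_input_nbytes(10_000 * 10_000) / 1e9, "GB")
```

If the three geocentric axes are always 64-bit, this array costs the same regardless of the dtype chosen for the lon/lats.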
