
blackbody functions not dask friendly, trigger early dask computation #201

Closed
gerritholl opened this issue Oct 24, 2023 · 3 comments

@gerritholl

The functions in the blackbody module are not dask-friendly. Their use with dask arrays leads to an early dask computation.

Code Sample, a minimal, complete, and verifiable piece of code

import dask.config
import dask.array as da
from pyspectral.blackbody import blackbody_rad2temp
from satpy.tests.utils import CustomScheduler

with dask.config.set(scheduler=CustomScheduler(max_computes=0)):
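    # CustomScheduler(max_computes=0) raises a RuntimeError as soon as any dask computation is triggered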
    blackbody_rad2temp(3.9e-6, da.array([324000, 325000]))

Problem description

Fails with a RuntimeError because blackbody_rad2temp computes the dask array.

Expected Output

I expect no output.

Actual Result, Traceback if applicable

Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/mwe/pyspectral-compute.py", line 7, in <module>
    blackbody_rad2temp(3.9e-6, da.array([324000, 325000]))
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/pyspectral/blackbody.py", line 51, in blackbody_rad2temp
    rad = np.array(radiance, dtype='float64')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 1700, in __array__
    x = self.compute()
        ^^^^^^^^^^^^^^
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gholl/checkouts/satpy/satpy/tests/utils.py", line 288, in __call__
    raise RuntimeError("Too many dask computations were scheduled: "
RuntimeError: Too many dask computations were scheduled: 1

Versions of Python, package at hand and relevant dependencies

Python 3.11, pyspectral latest main (v0.12.5-9-gfd32fe9).

This limitation causes pytroll/satpy#2613.

@djhoese
Member

djhoese commented Oct 24, 2023

It seems this pattern of np.array(X, dtype=Y) is used a lot in blackbody.py. This is bad for multiple reasons.

  1. As mentioned, this requires converting dask arrays into numpy arrays. The same applies to anyone trying to pass cupy (CUDA-based GPU) arrays or any other array type that isn't a plain numpy array.
  2. It copies the data being passed to it even if it is already a numpy array.

Does anyone know whether these arrays need to be 64-bit floats? I'm tempted to say this should just be np.asarray with no floating-point dtype, or, if 64-bit floats are needed, np.asarray(X, dtype=np.float64), which only copies and converts the type when necessary (my understanding at least).
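
For illustration, a minimal sketch (plain numpy, not pyspectral code) of the copy/convert behaviour described above:

import numpy as np

x = np.arange(5, dtype=np.float64)

# np.array copies by default, even when the input is already a float64 numpy array
a = np.array(x, dtype=np.float64)
print(np.shares_memory(a, x))  # False

# np.asarray returns the input unchanged when no dtype conversion is needed,
# and only copies when a conversion is actually required
b = np.asarray(x, dtype=np.float64)
print(b is x)  # True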

@simonrp84
Member

As an aside, another issue here (which I've mentioned on Slack) is that some of these functions (tb2radiance, for example) perform computations for each pixel at each spectral response function point, and only afterwards integrate over the SRF.

For Himawari, this means there's an internal array of 5500x5500x3370 points, which is far too big to fit in memory. I meant to open an issue for it but forgot, and since it's related, this seems like a good place to note it :-)
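
For scale, a rough back-of-the-envelope estimate of that intermediate array, assuming float64 (8 bytes per element):

# pixels (5500 x 5500) times SRF points (3370), 8 bytes per float64 element
n_bytes = 5500 * 5500 * 3370 * 8
print(n_bytes / 1e9)  # roughly 815 GB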

@pnuu
Copy link
Member

pnuu commented Nov 23, 2023

Closed by #203
