Warning Message: FutureWarning: The numpy.moveaxis function is not implemented by Dask array #690

Closed
shanicetbailey opened this issue Aug 9, 2019 · 19 comments

@shanicetbailey

I keep getting a warning message that repeats itself until either 1) I am kicked out of the server when using clusters, or 2) when not using clusters, the kernel keeps running and repeating the warning until I have to interrupt the cell. I think it might have something to do with some updates that went through yesterday affecting the numpy/dask interface, but I'm not completely sure.

Reproducible Code:

import xarray as xr
import dask
import dask.array as dsa
import numpy as np
import intake
from xmitgcm.llcreader.llcmodel import faces_dataset_to_latlon

I also get a warning message after I run the cell importing all the necessary packages:

/srv/conda/envs/notebook/lib/python3.7/site-packages/tqdm/autonotebook/__init__.py:18: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  " (e.g. in jupyter console)", TqdmExperimentalWarning)
/srv/conda/envs/notebook/lib/python3.7/site-packages/intake/source/discovery.py:136: FutureWarning: The drivers ['esm_metadatastore'] do not specify entry_points and were only discovered via a package scan. This may break in a future release of intake. The packages should be updated.
  FutureWarning)
Continuing the reproducible code:

ecco_url = 'https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml'
ecco_cat = intake.Catalog(ecco_url)
ds = ecco_cat["ECCOv4r3"].to_dask()

ds_ll = faces_dataset_to_latlon(ds)
ds_ll

Here is the warning message (the cell runs for a while if not using clusters):

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/core.py:1263: FutureWarning: The `numpy.moveaxis` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  FutureWarning,
/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/core.py:1263: FutureWarning: The `numpy.moveaxis` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  FutureWarning,
/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/core.py:1263: FutureWarning: The `numpy.moveaxis` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  FutureWarning,

I'd appreciate some insight into how to resolve this issue, thank you.

@TomAugspurger
Member

TomAugspurger commented Aug 9, 2019

I think the warnings can be ignored. intake/intake-esm#121 is solving the one from intake.

It looks like dask/dask#4822 is implementing moveaxis on dask.array. I'll see where that's at.

@shanicetbailey
Author

You mean the warnings after importing the packages? Yeah, I figured, since I am able to run subsequent cells.

Ok, any updates would be appreciated, thanks!

@TomAugspurger
Member

Sorry, yes, I meant the warnings on import.

For now, you can also probably safely ignore the moveaxis warning. I'm guessing it'll be fixed in Dask soon.

@shanicetbailey
Author

Unfortunately, I'm not able to run subsequent cells since the kernel seems to be preoccupied resolving the ds_ll cell and eventually shuts me out of the server each time it is run - so I'm not able to move on until this gets resolved. But yes, hopefully someone will fix it soon.

@TomAugspurger
Member

Oh, sorry, I missed that part of your post. I'm trying this out on ocean.pangeo.io.

"I am kicked out of the server when using clusters"

My notebook kernel died on the line ds_ll = faces_dataset_to_latlon(ds) too.

"I think it might have something to do with some updates that went through yesterday affecting the numpy/dask interface, but I'm not completely sure."

Yeah, you're right. I was completely wrong about https://github.com/pangeo-data/pangeo-cloud-federation/issues/364#issuecomment-520064580. In NumPy 1.16, np.moveaxis(dask.array.Array) returned a Dask Array.

In [11]: np.__version__
Out[11]: '1.16.0'

In [12]: a = np.random.random((4, 4, 4))

In [13]: np.moveaxis(da.from_array(a, 2), 1, 2)
Out[13]: dask.array<transpose, shape=(4, 4, 4), dtype=float64, chunksize=(2, 2, 2)>

With NumPy 1.17, that returns a NumPy array.

In [6]: a = np.random.random((4, 4, 4))

In [7]: np.moveaxis(da.from_array(a, 2), 1, 2)
/Users/taugspurger/sandbox/dask/dask/array/core.py:1264: FutureWarning: The `numpy.moveaxis` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  FutureWarning,
Out[7]:
array([[[0.883594  , 0.83361276, 0.11596388, 0.42493785],
        [0.29075857, 0.3312683 , 0.70986969, 0.76634831],
        [0.61024485, 0.038276  , 0.14124975, 0.20009608],
        [0.74891671, 0.28027278, 0.62557011, 0.32603486]],

       [[0.45846013, 0.65317719, 0.14381856, 0.67333014],
        [0.18534854, 0.53083362, 0.01030157, 0.8822557 ],
        [0.55225587, 0.45671406, 0.58132645, 0.72099828],
        [0.64439194, 0.01546631, 0.136054  , 0.45866154]],

       [[0.9110986 , 0.71479734, 0.41174671, 0.63004493],
        [0.90519822, 0.07737934, 0.72285197, 0.25865702],
        [0.49462467, 0.56716872, 0.8396765 , 0.63395948],
        [0.58644267, 0.62561324, 0.00824153, 0.90913008]],

       [[0.51209298, 0.11582602, 0.89098367, 0.95992173],
        [0.35492695, 0.8645212 , 0.53640816, 0.12354237],
        [0.80328269, 0.50222311, 0.93996505, 0.23952077],
        [0.57965991, 0.00851389, 0.71330849, 0.20458262]]])

So I think we end up trying to load all 134GB of the data onto the worker running your notebook. Not good.

It may end up breaking things, but if you need a quick solution, you can export the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0. What cluster are you running this on?

@shanicetbailey
Author

I am using ocean.pangeo.io as well; I just use the Dask dashboard provided in JupyterLab and scale anywhere from 4 to 6 workers.

@TomAugspurger
Member

TomAugspurger commented Aug 9, 2019

OK, if you add

import os

# Turn off NumPy's __array_function__ dispatch in the notebook process
os.environ['NUMPY_EXPERIMENTAL_ARRAY_FUNCTION'] = '0'

before importing numpy (not sure if before or after matters), and then

from dask_kubernetes import KubeCluster

# Pass the same environment variable through to the worker pods
cluster = KubeCluster(n_workers=10, env={"NUMPY_EXPERIMENTAL_ARRAY_FUNCTION": "0"})

when you create the cluster, things will hopefully work. Trying it out now.

@TomAugspurger
Member

Oh, yeah that definitely worked, since ds_ll = faces_dataset_to_latlon(ds) returned immediately.

@rabernat
Member

rabernat commented Aug 10, 2019

I am so pleased that

  1. @stb2145 provided such a great bug report which allowed us to easily reproduce the error
  2. @TomAugspurger responded so quickly to find a workaround

In general, I am quite puzzled by this behavior from numpy. The 1.17 release seems like a step in the wrong direction: duck-array functionality that worked in 1.16 is now broken in 1.17 without this special environment variable. The numpy docs seem to suggest the opposite:

Dispatch with the __array_function__ protocol has been implemented but is not yet enabled by default:

  • In NumPy 1.16, you need to set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 before importing NumPy to test NumPy function overrides.
  • In NumPy 1.17, the protocol will be enabled by default, but can be disabled with NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0.
  • Eventually, expect __array_function__ to always be enabled.

This seems backwards from what we are experiencing:

  • In NumPy 1.16, np.moveaxis dispatches to dask by default
  • In NumPy 1.17, np.moveaxis dispatches to numpy by default

Over in dask/dask#2559, @shoyer noted that

np.rollaxis and np.moveaxis already work on dask arrays, since they are defined in terms of the transpose method

Something related to the new __array_function__ protocol caused that reasoning to no longer work, potentially breaking any downstream user code that relied on lazy evaluation of np.rollaxis and np.moveaxis.
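
To make the transpose-based dispatch concrete, here is a minimal sketch (not NumPy's actual implementation, and handling only a single source/destination axis) of the older duck-typed style: because it simply delegates to the input's own .transpose method, a Dask array passes through lazily.

import numpy as np
import dask.array as da

def moveaxis_duck(a, source, destination):
    # Build the new axis order, then delegate to the input's own
    # .transpose method -- the duck-typed dispatch that pre-1.17
    # np.moveaxis relied on, which keeps a Dask array lazy.
    order = [ax for ax in range(a.ndim) if ax != source]
    order.insert(destination, source)
    return a.transpose(order)

x = da.ones((4, 4, 4), chunks=2)
print(type(moveaxis_duck(x, 1, 2)))  # dask.array.core.Array -- still lazy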

I wonder if it is worth opening an issue in numpy to alert the broader community to this.

@shanicetbailey
Author

I think perhaps we should; it wouldn't hurt! Though @TomAugspurger provided an easy workaround, the root problem is still active, and we should let the numpy community know about it so someone can potentially provide more insight or a fix.

@shoyer

shoyer commented Aug 10, 2019

np.moveaxis used an older form of dispatching (common in many simple NumPy functions): checking for the existence of particular methods/attributes by name.

__array_function__ sidesteps all of that and requires implementing dispatching for all NumPy functions again. We had a very long discussion about trying to make falling back to NumPy's own implementations work, but unfortunately couldn't. See this section of NEP-18, especially bullet 3.

Inside Dask, we chose to issue a warning and fall back to casting to NumPy arrays when unwrapped functions are encountered. The alternative would be to raise an error, but I don't know how much more useful that would be here. To avoid warnings or errors, dask will need to reimplement this function. This is discussed in the relevant PR (dask/dask#4822), but the issue itself is out of date now.
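
As a toy illustration of how __array_function__ dispatch works (a minimal sketch, not Dask's actual code): a duck-array class opts in by defining __array_function__ and mapping the NumPy functions it supports to its own implementations; anything it has not registered is reported back to NumPy as unsupported (Dask instead chose to warn and cast, as described above).

import numpy as np

HANDLED_FUNCTIONS = {}

def implements(np_function):
    # Register a DuckArray implementation for a public NumPy function.
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class DuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)
        self.ndim = self.data.ndim

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy then raises TypeError
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.moveaxis)
def _moveaxis(a, source, destination):
    return DuckArray(np.moveaxis(a.data, source, destination))

x = DuckArray(np.ones((2, 3, 4)))
print(type(np.moveaxis(x, 0, -1)))  # DuckArray -- dispatched via __array_function__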

@jhamman transferred this issue from pangeo-data/pangeo-cloud-federation Aug 11, 2019
@TomAugspurger
Member

FYI dask/dask#4822 was just merged, so the next version of dask (like 2.2.1) will work nicely with np.moveaxis, regardless of the numpy version.
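
Once the cluster picks up a Dask release that includes that PR, a quick sanity check (a sketch assuming dask >= 2.2.1 and NumPy 1.17) is that np.moveaxis on a Dask array is lazy again and emits no warning:

import numpy as np
import dask.array as da

x = da.ones((4, 4, 4), chunks=2)
result = np.moveaxis(x, 1, 2)  # dispatches to dask's moveaxis via __array_function__
print(type(result))            # dask.array.core.Array -- still lazy, no FutureWarning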

FWIW, as a library maintainer, I'm happy to have all of NumPy's dispatching unified under __array_function__, rather than scattered around various hasattr(object, 'moveaxis')-type checks. In theory, this breakage could have been avoided if that Dask PR had been merged before NumPy 1.17 came out. It's unfortunate that this one slipped through the cracks, but I hope we'll get through the __array_function__ growing pains relatively quickly.

@rabernat
Member

Meta comment: this discussion would have been impossible (or at least a lot slower) on Discourse (see #677).

@shanicetbailey
Author

@TomAugspurger, your workaround of creating the cluster after importing os doesn't work when I include the env={"NUMPY_EXPERIMENTAL_ARRAY_FUNCTION": "0"} part. It seems to block the workers from starting. But I am able to run the ds_ll cell if I omit that specification and run the Kubernetes cluster cell!

@TomAugspurger
Member

Hmm that's strange. I just tried it out on ocean.pangeo.io with cluster = KubeCluster(n_workers=10, env={"NUMPY_EXPERIMENTAL_ARRAY_FUNCTION": "0"}), and my workers came up fine.

Do you specify anything else when you create the KubeCluster?

FYI, this is fixed on Dask master now. We're doing a 2.2.1 release today hopefully, so the workaround won't be necessary once the cluster is updated to use it.

@shanicetbailey
Author

No, I just copied and pasted the same code. Will try again now.

That's great to hear!

@shanicetbailey
Author

I tried it again (with the environment specification) and waited around 10 minutes for the workers to load, and the Dask dashboard was still blank; when I omitted that part, it took about 2 minutes for the workers to load.

Looking forward to that dask update!

@stale

stale bot commented Oct 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Oct 15, 2019
@stale

stale bot commented Oct 22, 2019

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

stale bot closed this as completed Oct 22, 2019