Numpy raises warning in xarray.coding.times.cast_to_int_if_safe #7942

Closed
4 tasks done
mx-moth opened this issue Jun 26, 2023 · 2 comments · Fixed by #7827

Comments

@mx-moth
Contributor

mx-moth commented Jun 26, 2023

What happened?

In recent versions of numpy (1.24 and later), calling numpy.asarray(arr, dtype=numpy.int64) emits a RuntimeWarning ("invalid value encountered in cast") if the input array contains numpy.nan values. This call is made in xarray.coding.times.cast_to_int_if_safe(num):

def cast_to_int_if_safe(num) -> np.ndarray:
    int_num = np.asarray(num, dtype=np.int64)
    if (num == int_num).all():
        num = int_num
    return num

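The function still returns the correct result regardless of the warning: NaN never compares equal to anything, so (num == int_num).all() is False and the original float array is returned unchanged.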
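The behaviour described above can be checked in isolation with a small sketch. It assumes NumPy 1.24 or later for the warning to appear (older releases cast silently), and it also shows np.errstate(invalid="ignore") as one possible way to silence the cast. This is an illustration only, not the fix xarray adopted.

```python
import warnings

import numpy as np

num = np.array([0.0, 1.0, np.nan, 3.0])

# Cast the same way cast_to_int_if_safe does, recording any warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    int_num = np.asarray(num, dtype=np.int64)

# On NumPy >= 1.24 this is expected to list "invalid value encountered in cast".
print([str(w.message) for w in caught])

# NaN never compares equal, so the safety check rejects the cast and the
# function would hand back the original float array.
print(bool((num == int_num).all()))  # False

# Casts report floating point errors through np.errstate, so the same cast
# can be made silent; any RuntimeWarning here would be raised as an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", RuntimeWarning)
    with np.errstate(invalid="ignore"):
        silent = np.asarray(num, dtype=np.int64)
```

Suppressing the warning this way keeps the existing equality check as the safety test; it hides the symptom rather than avoiding the NaN-to-integer cast.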

What did you expect to happen?

No warning should be printed.

Minimal Complete Verifiable Example

import numpy
import xarray

one_day = numpy.timedelta64(1, 'D')
nat = numpy.timedelta64('nat')

timedelta_values = (numpy.arange(5) * one_day).astype('timedelta64[ns]')
timedelta_values[2] = nat
timedelta_values[4] = nat

dataset = xarray.Dataset(data_vars={
    'timedeltas': xarray.DataArray(data=timedelta_values, dims=['x'])
})
dataset.to_netcdf('out.nc')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

$ python3 safe_cast.py
/home/hea211/projects/emsarray/.conda/lib/python3.10/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
  int_num = np.asarray(num, dtype=np.int64)

$ ncdump out.nc
netcdf out {
dimensions:
        x = 5 ;
variables:
        double timedeltas(x) ;
                timedeltas:_FillValue = NaN ;
                timedeltas:units = "days" ;
data:

 timedeltas = 0, 1, _, 3, _ ;
}
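The ncdump output shows the timedeltas stored as double with a NaN fill value. The NumPy-only sketch below assumes the encoder converts to float days in a broadly similar way (it is not xarray's actual encoding code) and shows how NaT becomes NaN during that conversion, which is the kind of array that then reaches cast_to_int_if_safe.

```python
import numpy as np

one_day = np.timedelta64(1, "D")
nat = np.timedelta64("NaT")

# Same values as the example above.
timedelta_values = (np.arange(5) * one_day).astype("timedelta64[ns]")
timedelta_values[2] = nat
timedelta_values[4] = nat

# Dividing timedelta64 by timedelta64 gives float64; NaT entries become NaN.
days = timedelta_values / one_day
print(days.dtype, np.isnan(days).tolist())
# float64 [False, False, True, False, True]
```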

Anything else we need to know?

I saw the numpy.can_cast function and tried to use it to solve the issue (see PR #7834); however, this function did not do what I expected.
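For reference, when given dtypes, np.can_cast decides based only on the two dtypes and the chosen casting rule, not on the values stored in any particular array. A minimal illustration:

```python
import numpy as np

# float64 -> int64 is never a "safe" or "same_kind" cast, whatever the values.
print(np.can_cast(np.float64, np.int64))                       # False
print(np.can_cast(np.float64, np.int64, casting="same_kind"))  # False
# Under "unsafe" casting every conversion is allowed, including NaN -> int64.
print(np.can_cast(np.float64, np.int64, casting="unsafe"))     # True
```

So a dtype-level answer cannot distinguish a float array of whole numbers from one containing NaN or fractions.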

A search for other ways to check whether an array of floating point values is representable as integers turned up "Numpy: Check if float array contains whole numbers" on Stack Overflow. That question offers a few solutions, although each has its drawbacks. The most complete appears to be is_integer_ufunc, a ufunc written in C. Unfortunately, it is not installable via pip/conda and is not included in numpy.
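A pure-NumPy check that only looks for non-finite values and fractional parts might look like the following. has_only_whole_numbers is a hypothetical helper name, and this check deliberately ignores the int64 range.

```python
import numpy as np


def has_only_whole_numbers(x) -> bool:
    """Hypothetical helper: True if every element is finite and whole."""
    x = np.asarray(x, dtype=np.float64)
    # np.isfinite rules out NaN and +/-inf; np.floor(x) == x is False for any
    # value with a fractional part. Both stay in floating point, so no
    # float-to-int cast happens here.
    return bool(np.all(np.isfinite(x) & (np.floor(x) == x)))


print(has_only_whole_numbers([0.0, 1.0, 3.0]))     # True
print(has_only_whole_numbers([0.0, 1.5, 3.0]))     # False
print(has_only_whole_numbers([0.0, np.nan, 3.0]))  # False
```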

Environment

In [2]: import xarray as xr
...: xr.show_versions()
/home/hea211/projects/emsarray/.conda/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-73-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: None
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.4.1
distributed: 2023.4.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: 7.3.1
mypy: 1.3.0
IPython: 8.12.0
sphinx: 4.3.2

mx-moth added the bug and needs triage labels Jun 26, 2023
@mx-moth
Contributor Author

mx-moth commented Jun 26, 2023

Most of the solutions in the linked Stack Overflow question only check whether the values have a fractional component. This might not be sufficient to catch all unrepresentable values: a float can hold a whole number that is larger than the maximum value of an int64.
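A check that also rejects values outside the int64 range could look like this sketch. is_safely_castable_to_int64 is a hypothetical name. The upper bound uses a strict < 2.0**63 because the int64 maximum, 2**63 - 1, is not exactly representable as float64 and rounds up to 2.0**63.

```python
import numpy as np


def is_safely_castable_to_int64(x) -> bool:
    """Hypothetical helper: True if every element converts exactly to int64."""
    x = np.asarray(x, dtype=np.float64)
    finite = np.isfinite(x)
    # Replace NaN/inf with 0.0 so the remaining checks only see finite values.
    x = np.where(finite, x, 0.0)
    whole = np.floor(x) == x
    # int64 spans [-2**63, 2**63 - 1]. -2.0**63 is exactly representable, but
    # 2**63 - 1 rounds to 2.0**63 in float64, so the upper bound is strict.
    in_range = (x >= -(2.0**63)) & (x < 2.0**63)
    return bool(np.all(finite & whole & in_range))


print(is_safely_castable_to_int64([0.0, 1.0, 3.0]))  # True
print(is_safely_castable_to_int64([2.0**63]))        # False: whole, but too large
print(is_safely_castable_to_int64([0.0, np.nan]))    # False
```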

A quick fix for the exact situation causing this warning would be to add a numpy.isfinite() check before casting:

if not numpy.all(numpy.isfinite(num)):
    return num
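Applied to the function quoted in the original report, the quick fix might look like the sketch below (an illustration of the suggestion, not merged xarray code). It only addresses NaN and infinity: a finite value outside the int64 range would still reach the cast.

```python
import warnings

import numpy as np


def cast_to_int_if_safe(num) -> np.ndarray:
    # Proposed guard: return arrays containing NaN or +/-inf unchanged so they
    # never reach the int64 cast that triggers the RuntimeWarning.
    if not np.all(np.isfinite(num)):
        return num
    int_num = np.asarray(num, dtype=np.int64)
    if (num == int_num).all():
        num = int_num
    return num


with warnings.catch_warnings():
    # Escalate RuntimeWarnings to errors to show the NaN case stays silent.
    warnings.simplefilter("error", RuntimeWarning)
    with_nan = cast_to_int_if_safe(np.array([0.0, 1.0, np.nan, 3.0]))
    whole = cast_to_int_if_safe(np.array([0.0, 1.0, 3.0]))

print(with_nan.dtype, whole.dtype)  # float64 int64
```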

kmuehlbauer removed the needs triage label Sep 1, 2023
@kmuehlbauer
Contributor

@mx-moth Something must have gone wrong with my comment earlier today.

Your example here with timedelta64 is similar to what #7827 aims to fix for datetime64.

I've checked, and the proposed solution can be adapted for timedeltas too. This would effectively prevent the cast to float in the presence of NaT.
