numpy 1.8 compatibility for NAN functions #150

keflavich · 2014-10-05T18:33:44Z

Fixes the issue noted in #149 thanks to @astrofrog's solution to the SO question.

astrofrog · 2014-10-05T18:52:18Z

So did other functions change too? As in, do we need to include all these here, or can we import some from any version of numpy?

keflavich · 2014-10-05T18:54:48Z

Good question. I had just assumed they all changed. nansum is the only one that affected the tests. I took the lazy route of not checking at all.... I should check.

astrofrog · 2014-10-05T19:06:34Z

Maybe you could just do a diff with the 1.9 version?

keflavich · 2014-10-06T18:46:32Z

As far as I can tell, only nansum, nanmin, nanmax changed:

diff --git a/numpy/lib/nanfunctions.py b/numpy/lib/nanfunctions.py
index c3aeaf6..f5ac35e 100644
--- a/numpy/lib/nanfunctions.py
+++ b/numpy/lib/nanfunctions.py
@@ -18,11 +18,11 @@ from __future__ import division, absolute_import, print_function

 import warnings
 import numpy as np
-
+from numpy.lib.function_base import _ureduce as _ureduce

 __all__ = [
     'nansum', 'nanmax', 'nanmin', 'nanargmax', 'nanargmin', 'nanmean',
-    'nanvar', 'nanstd'
+    'nanmedian', 'nanpercentile', 'nanvar', 'nanstd'
     ]


@@ -228,7 +228,7 @@ def nanmin(a, axis=None, out=None, keepdims=False):
         # Check for all-NaN axis
         mask = np.all(mask, axis=axis, keepdims=keepdims)
         if np.any(mask):
-            res = _copyto(res, mask, np.nan)
+            res = _copyto(res, np.nan, mask)
             warnings.warn("All-NaN axis encountered", RuntimeWarning)
     return res

@@ -327,7 +327,7 @@ def nanmax(a, axis=None, out=None, keepdims=False):
         # Check for all-NaN axis
         mask = np.all(mask, axis=axis, keepdims=keepdims)
         if np.any(mask):
-            res = _copyto(res, mask, np.nan)
+            res = _copyto(res, np.nan, mask)
             warnings.warn("All-NaN axis encountered", RuntimeWarning)
     return res

@@ -426,8 +426,8 @@ def nansum(a, axis=None, dtype=None, out=None, keepdims=0):
     Return the sum of array elements over a given axis treating Not a
     Numbers (NaNs) as zero.

-    FutureWarning: In Numpy versions <= 1.8 Nan is returned for slices that
-    are all-NaN or empty. In later versions zero will be returned.
+    In Numpy versions <= 1.8 Nan is returned for slices that are all-NaN or
+    empty. In later versions zero is returned.

     Parameters
     ----------
@@ -503,16 +503,7 @@ def nansum(a, axis=None, dtype=None, out=None, keepdims=0):

     """
     a, mask = _replace_nan(a, 0)
-
-    if mask is None:
-        return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
-    mask = np.all(mask, axis=axis, keepdims=keepdims)
-    tot = np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
-    if np.any(mask):
-        tot = _copyto(tot, np.nan, mask)
-        warnings.warn("In Numpy 1.9 the sum along empty slices will be zero.",
-                      FutureWarning)
-    return tot
+    return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)


 def nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
@@ -610,6 +601,335 @@ def nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
     return avg


+def _nanmedian1d(arr1d, overwrite_input=False):
+    """
+    Private function for rank 1 arrays. Compute the median ignoring NaNs.
+    See nanmedian for parameter usage
+    """
+    c = np.isnan(arr1d)
+    s = np.where(c)[0]
+    if s.size == arr1d.size:
+        warnings.warn("All-NaN slice encountered", RuntimeWarning)
+        return np.nan
+    elif s.size == 0:
+        return np.median(arr1d, overwrite_input=overwrite_input)
+    else:
+        if overwrite_input:
+            x = arr1d
+        else:
+            x = arr1d.copy()
+        # select non-nans at end of array
+        enonan = arr1d[-s.size:][~c[-s.size:]]
+        # fill nans in beginning of array with non-nans of end
+        x[s[:enonan.size]] = enonan
+        # slice nans away
+        return np.median(x[:-s.size], overwrite_input=True)
+
+
+def _nanmedian(a, axis=None, out=None, overwrite_input=False):
+    """
+    Private function that doesn't support extended axis or keepdims.
+    These methods are extended to this function using _ureduce
+    See nanmedian for parameter usage
+
+    """
+    if axis is None or a.ndim == 1:
+        part = a.ravel()
+        if out is None:
+            return _nanmedian1d(part, overwrite_input)
+        else:
+            out[...] = _nanmedian1d(part, overwrite_input)
+            return out
+    else:
+        # for small medians use sort + indexing which is still faster than
+        # apply_along_axis
+        if a.shape[axis] < 400:
+            return _nanmedian_small(a, axis, out, overwrite_input)
+        result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)
+        if out is not None:
+            out[...] = result
+        return result
+
+def _nanmedian_small(a, axis=None, out=None, overwrite_input=False):
+    """
+    sort + indexing median, faster for small medians along multiple dimensions
+    due to the high overhead of apply_along_axis
+    see nanmedian for parameter usage
+    """
+    a = np.ma.masked_array(a, np.isnan(a))
+    m = np.ma.median(a, axis=axis, overwrite_input=overwrite_input)
+    for i in range(np.count_nonzero(m.mask.ravel())):
+        warnings.warn("All-NaN slice encountered", RuntimeWarning)
+    if out is not None:
+        out[...] = m.filled(np.nan)
+        return out
+    return m.filled(np.nan)
+
+def nanmedian(a, axis=None, out=None, overwrite_input=False, keepdims=False):
+    """
+    Compute the median along the specified axis, while ignoring NaNs.
+
+    Returns the median of the array elements.
+
+    .. versionadded:: 1.9.0
+
+    Parameters
+    ----------
+    a : array_like
+        Input array or object that can be converted to an array.
+    axis : int, optional
+        Axis along which the medians are computed. The default (axis=None)
+        is to compute the median along a flattened version of the array.
+        A sequence of axes is supported since version 1.9.0.
+    out : ndarray, optional
+        Alternative output array in which to place the result. It must have
+        the same shape and buffer length as the expected output, but the
+        type (of the output) will be cast if necessary.
+    overwrite_input : bool, optional
+       If True, then allow use of memory of input array (a) for
+       calculations. The input array will be modified by the call to
+       median. This will save memory when you do not need to preserve
+       the contents of the input array. Treat the input as undefined,
+       but it will probably be fully or partially sorted. Default is
+       False. Note that, if `overwrite_input` is True and the input
+       is not already an ndarray, an error will be raised.
+    keepdims : bool, optional
+        If this is set to True, the axes which are reduced are left
+        in the result as dimensions with size one. With this option,
+        the result will broadcast correctly against the original `arr`.
+
+
+
+    Returns
+    -------
+    median : ndarray
+        A new array holding the result. If the input contains integers, or
+        floats of smaller precision than 64, then the output data-type is
+        float64.  Otherwise, the output data-type is the same as that of the
+        input.
+
+    See Also
+    --------
+    mean, median, percentile
+
+    Notes
+    -----
+    Given a vector V of length N, the median of V is the middle value of
+    a sorted copy of V, ``V_sorted`` - i.e., ``V_sorted[(N-1)/2]``, when N is
+    odd.  When N is even, it is the average of the two middle values of
+    ``V_sorted``.
+
+    Examples
+    --------
+    >>> a = np.array([[10.0, 7, 4], [3, 2, 1]])
+    >>> a[0, 1] = np.nan
+    >>> a
+    array([[ 10.,  nan,   4.],
+       [  3.,   2.,   1.]])
+    >>> np.median(a)
+    nan
+    >>> np.nanmedian(a)
+    3.0
+    >>> np.nanmedian(a, axis=0)
+    array([ 6.5,  2.,  2.5])
+    >>> np.median(a, axis=1)
+    array([ 7.,  2.])
+    >>> b = a.copy()
+    >>> np.nanmedian(b, axis=1, overwrite_input=True)
+    array([ 7.,  2.])
+    >>> assert not np.all(a==b)
+    >>> b = a.copy()
+    >>> np.nanmedian(b, axis=None, overwrite_input=True)
+    3.0
+    >>> assert not np.all(a==b)
+
+    """
+    a = np.asanyarray(a)
+    # apply_along_axis in _nanmedian doesn't handle empty arrays well,
+    # so deal them upfront
+    if a.size == 0:
+        return np.nanmean(a, axis, out=out, keepdims=keepdims)
+
+    r, k = _ureduce(a, func=_nanmedian, axis=axis, out=out,
+                    overwrite_input=overwrite_input)
+    if keepdims:
+        return r.reshape(k)
+    else:
+        return r
+
+
+def nanpercentile(a, q, axis=None, out=None, overwrite_input=False,
+                  interpolation='linear', keepdims=False):
+    """
+    Compute the qth percentile of the data along the specified axis, while
+    ignoring nan values.
+
+    Returns the qth percentile of the array elements.
+
+    Parameters
+    ----------
+    a : array_like
+        Input array or object that can be converted to an array.
+    q : float in range of [0,100] (or sequence of floats)
+        Percentile to compute which must be between 0 and 100 inclusive.
+    axis : int or sequence of int, optional
+        Axis along which the percentiles are computed. The default (None)
+        is to compute the percentiles along a flattened version of the array.
+        A sequence of axes is supported since version 1.9.0.
+    out : ndarray, optional
+        Alternative output array in which to place the result. It must
+        have the same shape and buffer length as the expected output,
+        but the type (of the output) will be cast if necessary.
+    overwrite_input : bool, optional
+        If True, then allow use of memory of input array `a` for
+        calculations. The input array will be modified by the call to
+        percentile. This will save memory when you do not need to preserve
+        the contents of the input array. In this case you should not make
+        any assumptions about the content of the passed in array `a` after
+        this function completes -- treat it as undefined. Default is False.
+        Note that, if the `a` input is not already an array this parameter
+        will have no effect, `a` will be converted to an array internally
+        regardless of the value of this parameter.
+    interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
+        This optional parameter specifies the interpolation method to use,
+        when the desired quantile lies between two data points `i` and `j`:
+            * linear: `i + (j - i) * fraction`, where `fraction` is the
+              fractional part of the index surrounded by `i` and `j`.
+            * lower: `i`.
+            * higher: `j`.
+            * nearest: `i` or `j` whichever is nearest.
+            * midpoint: (`i` + `j`) / 2.
+
+    keepdims : bool, optional
+        If this is set to True, the axes which are reduced are left
+        in the result as dimensions with size one. With this option,
+        the result will broadcast correctly against the original `arr`.
+
+
+    Returns
+    -------
+    nanpercentile : scalar or ndarray
+        If a single percentile `q` is given and axis=None a scalar is
+        returned.  If multiple percentiles `q` are given an array holding
+        the result is returned. The results are listed in the first axis.
+        (If `out` is specified, in which case that array is returned
+        instead).  If the input contains integers, or floats of smaller
+        precision than 64, then the output data-type is float64. Otherwise,
+        the output data-type is the same as that of the input.
+
+    See Also
+    --------
+    nanmean, nanmedian, percentile, median, mean
+
+    Notes
+    -----
+    Given a vector V of length N, the q-th percentile of V is the q-th ranked
+    value in a sorted copy of V.  The values and distances of the two
+    nearest neighbors as well as the `interpolation` parameter will
+    determine the percentile if the normalized ranking does not match q
+    exactly. This function is the same as the median if ``q=50``, the same
+    as the minimum if ``q=0``and the same as the maximum if ``q=100``.
+
+    Examples
+    --------
+    >>> a = np.array([[10., 7., 4.], [3., 2., 1.]])
+    >>> a[0][1] = np.nan
+    >>> a
+    array([[ 10.,  nan,   4.],
+       [  3.,   2.,   1.]])
+    >>> np.percentile(a, 50)
+    nan
+    >>> np.nanpercentile(a, 50)
+    3.5
+    >>> np.nanpercentile(a, 50, axis=0)
+    array([[ 6.5,  4.5,  2.5]])
+    >>> np.nanpercentile(a, 50, axis=1)
+    array([[ 7.],
+           [ 2.]])
+    >>> m = np.nanpercentile(a, 50, axis=0)
+    >>> out = np.zeros_like(m)
+    >>> np.nanpercentile(a, 50, axis=0, out=m)
+    array([[ 6.5,  4.5,  2.5]])
+    >>> m
+    array([[ 6.5,  4.5,  2.5]])
+    >>> b = a.copy()
+    >>> np.nanpercentile(b, 50, axis=1, overwrite_input=True)
+    array([[ 7.],
+           [ 2.]])
+    >>> assert not np.all(a==b)
+    >>> b = a.copy()
+    >>> np.nanpercentile(b, 50, axis=None, overwrite_input=True)
+    array([ 3.5])
+
+    """
+
+    a = np.asanyarray(a)
+    q = np.asanyarray(q)
+    # apply_along_axis in _nanpercentile doesn't handle empty arrays well,
+    # so deal them upfront
+    if a.size == 0:
+        return np.nanmean(a, axis, out=out, keepdims=keepdims)
+
+    r, k = _ureduce(a, func=_nanpercentile, q=q, axis=axis, out=out,
+                    overwrite_input=overwrite_input,
+                    interpolation=interpolation)
+    if keepdims:
+        if q.ndim == 0:
+            return r.reshape(k)
+        else:
+            return r.reshape([len(q)] + k)
+    else:
+        return r
+
+
+def _nanpercentile(a, q, axis=None, out=None, overwrite_input=False,
+                   interpolation='linear', keepdims=False):
+    """
+    Private function that doesn't support extended axis or keepdims.
+    These methods are extended to this function using _ureduce
+    See nanpercentile for parameter usage
+
+    """
+    if axis is None:
+        part = a.ravel()
+        result = _nanpercentile1d(part, q, overwrite_input, interpolation)
+    else:
+        result = np.apply_along_axis(_nanpercentile1d, axis, a, q,
+                                     overwrite_input, interpolation)
+
+    if out is not None:
+        out[...] = result
+    return result
+
+
+def _nanpercentile1d(arr1d, q, overwrite_input=False, interpolation='linear'):
+    """
+    Private function for rank 1 arrays. Compute percentile ignoring NaNs.
+    See nanpercentile for parameter usage
+
+    """
+    c = np.isnan(arr1d)
+    s = np.where(c)[0]
+    if s.size == arr1d.size:
+        warnings.warn("All-NaN slice encountered", RuntimeWarning)
+        return np.nan
+    elif s.size == 0:
+        return np.percentile(arr1d, q, overwrite_input=overwrite_input,
+                             interpolation=interpolation)
+    else:
+        if overwrite_input:
+            x = arr1d
+        else:
+            x = arr1d.copy()
+        # select non-nans at end of array
+        enonan = arr1d[-s.size:][~c[-s.size:]]
+        # fill nans in beginning of array with non-nans of end
+        x[s[:enonan.size]] = enonan
+        # slice nans away
+        return np.percentile(x[:-s.size], q, overwrite_input=True,
+                             interpolation=interpolation)
+
+
 def nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
     """
     Compute the variance along the specified axis, while ignoring NaNs.

keflavich · 2014-10-06T21:10:27Z

I guess numpy 1.7 had nanmean etc. in a different place? I'll have to hunt that down tomorrow.

keflavich · 2014-10-07T09:14:28Z

Argh. This approach is no good.

keflavich · 2014-10-07T18:16:18Z

@astrofrog @ChrisBeaumont I should probably squash this, but what do you think about this solution? It shouldn't impact performance except for numpy >= 1.9, and there only minimally

from 1.8-1.9 add numpy 1.8 nanfunctions to ensure np1.8 = np1.9 behavior udpate tests moments no longer uses np.nansum (np_compat.nansum instead) remove all but nansum, nanmin, nanmax import things that needed to be imported (oops) correct imports and add support functions for numpy 1.7. If this fails on 1.6, 3.2, I suggest we drop support for them completely copying over numpy functions was way too much and would have lead to a huge maintenance burden. Instead, just overload nansum and leave the rest untouched remove np_compat; we're just going to change the tests remove np_compat 1.9 nanish override; everything else gets to stay the same

ChrisBeaumont · 2014-10-07T20:11:04Z

spectral_cube/_moments.py

@@ -161,7 +162,11 @@ def moment_cubewise(cube, order, axis):
    data = cube._get_filled_data() * cube._pix_size()[axis]

    if order == 0:
-        return np.nansum(data, axis=axis)
+        result = np.nansum(data, axis=axis)


I would package this into a custom nansum utility function, for better separation of concerns.

numpy 1.8 compatibility for NAN functions

keflavich mentioned this pull request Oct 5, 2014

don't convert to np.float - can double data size! #149

Merged

keflavich force-pushed the np18nan branch from daa2f97 to 9b881d8 Compare October 7, 2014 18:19

ChrisBeaumont reviewed Oct 7, 2014
View reviewed changes

separate concerns as per @ChrisBeaumont's suggestion

ccaabd4

keflavich added a commit that referenced this pull request Oct 8, 2014

Merge pull request #150 from keflavich/np18nan

24730e1

numpy 1.8 compatibility for NAN functions

keflavich merged commit 24730e1 into radio-astro-tools:master Oct 8, 2014

keflavich deleted the np18nan branch October 8, 2014 07:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

numpy 1.8 compatibility for NAN functions #150

numpy 1.8 compatibility for NAN functions #150

keflavich commented Oct 5, 2014

astrofrog commented Oct 5, 2014

keflavich commented Oct 5, 2014

astrofrog commented Oct 5, 2014

keflavich commented Oct 6, 2014

keflavich commented Oct 6, 2014

keflavich commented Oct 7, 2014

keflavich commented Oct 7, 2014

ChrisBeaumont Oct 7, 2014

numpy 1.8 compatibility for NAN functions #150

numpy 1.8 compatibility for NAN functions #150

Conversation

keflavich commented Oct 5, 2014

astrofrog commented Oct 5, 2014

keflavich commented Oct 5, 2014

astrofrog commented Oct 5, 2014

keflavich commented Oct 6, 2014

keflavich commented Oct 6, 2014

keflavich commented Oct 7, 2014

keflavich commented Oct 7, 2014

ChrisBeaumont Oct 7, 2014

Choose a reason for hiding this comment