Are there any differences in how xarray and netCDF4 treat masked data?

In [1]:
import netCDF4 as nc
import xarray as xr
import numpy as np

In [2]:
filename='/data/hdd/glorys/v4/v4/MONTHLY_1993/rotated/GLORYS2V4_ORCA025_199301_cardinal_velocity.nc'

# netCDF4

In [3]:
f = nc.Dataset(filename)
v = f.variables['east_vel']
v

<class 'netCDF4._netCDF4.Variable'>
float32 east_vel(time_counter, deptht, y, x)
    _FillValue: 9.96921e+36
    axis: 
    coordinates: 
    long_name: ocean current in eastward direction
    missing_value: 9.96921e+36
    online_operation: N/A
    savelog10: 0.0
    standard_name: ocean current eastward
    units: m s-1
    valid_max: 10.0
    valid_min: -10.0
    short_name: east_vel
unlimited dimensions: time_counter
current shape = (1, 75, 1021, 1442)
filling on

In [4]:
print(type(v[:]))
print(np.ma.max(v[:]))
print(np.ma.min(v[:]))

<class 'numpy.ma.core.MaskedArray'>
1.50334
-1.28856


In [5]:
print(type(v[:].data))
print(np.ma.max(v[:].data))
print(np.ma.min(v[:].data))

<class 'numpy.ndarray'>
9.96921e+36
-5.23684e+36


In [6]:
f.close()

* netCDF4 loads data as numpy masked arrays
* values equal to `_FillValue` or `missing_value` are masked
* values outside the range `valid_min` to `valid_max` are masked
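
The masking rules above can be mimicked on a small synthetic array with plain numpy. This is only an illustrative sketch: the sample values are made up, and the two-step masking is an assumption about how to reproduce the effect, not netCDF4's internal code.

```python
import numpy as np

# Attribute values copied from the east_vel variable above
fill_value = np.float32(9.96921e+36)
valid_min, valid_max = -10.0, 10.0

# Made-up sample: two plausible velocities, one fill value,
# and two values that lie outside the valid range
raw = np.array([0.5, -1.2, fill_value, 7.0e+36, -5.2e+36], dtype=np.float32)

# Step 1: mask entries equal to the fill value
masked = np.ma.masked_equal(raw, fill_value)
# Step 2: additionally mask entries outside [valid_min, valid_max]
masked = np.ma.masked_outside(masked, valid_min, valid_max)

print(masked.mask)
print(np.ma.max(masked), np.ma.min(masked))
```

Only the two plausible velocities remain unmasked, so the maximum and minimum are computed from those alone.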

# xarray

In [7]:
ds = xr.open_dataset(filename)
w = ds.east_vel
w

<xarray.DataArray 'east_vel' (time_counter: 1, deptht: 75, y: 1021, x: 1442)>
[110421150 values with dtype=float32]
Coordinates:
  * deptht        (deptht) float32 0.50576 1.55586 2.66768 3.85628 5.14036 ...
  * time_counter  (time_counter) datetime64[ns] 1993-01-16T12:00:00
  * x             (x) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * y             (y) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Attributes:
    axis:              
    long_name:         ocean current in eastward direction
    online_operation:  N/A
    savelog10:         0.0
    standard_name:     ocean current eastward
    units:             m s-1
    valid_max:         10.0
    valid_min:         -10.0
    short_name:        east_vel

Check which information xarray used when decoding the CF conventions. See http://xarray.pydata.org/en/stable/io.html#reading-encoded-data

In [8]:
w.encoding

{'_FillValue': 9.96921e+36,
 'chunksizes': (1, 75, 1021, 1442),
 'complevel': 4,
 'contiguous': False,
 'coordinates': '',
 'dtype': dtype('float32'),
 'fletcher32': False,
 'missing_value': 9.96921e+36,
 'original_shape': (1, 75, 1021, 1442),
 'shuffle': True,
 'source': '/data/hdd/glorys/v4/v4/MONTHLY_1993/rotated/GLORYS2V4_ORCA025_199301_cardinal_velocity.nc',
 'zlib': True}

In [9]:
print(type(w.values))
print(np.ma.max(w.values))
print(np.ma.min(w.values))

<class 'numpy.ndarray'>
nan
nan


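* In xarray, `DataArray.values` is a plain `numpy.ndarray`, not a masked array; fill values have been replaced by NaN.
* As a result, `np.ma.max()` and `np.ma.min()` return `nan` instead of the maximum and minimum of the valid data.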
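
A minimal sketch of this behaviour, using a made-up three-element array rather than the GLORYS data: `np.ma.max()` propagates NaN on a plain `ndarray`, while `np.nanmax()` and `np.nanmin()` skip it.

```python
import numpy as np

# Made-up stand-in for w.values: a plain ndarray containing a NaN
values = np.array([0.5, np.nan, -1.2], dtype=np.float32)

# No mask is present, so the NaN propagates into the result
print(np.ma.max(values))

# The NaN-aware reductions ignore NaN entries
print(np.nanmax(values), np.nanmin(values))
```

The NaN-aware functions only ignore NaN. They would not hide out-of-range values, so they are not a full replacement for masking.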

Idea: convert to a numpy masked array

In [10]:
wm = np.ma.masked_invalid(w.values)
print(type(wm))
print(np.ma.max(wm))
print(np.ma.min(wm))

<class 'numpy.ma.core.MaskedArray'>
7.0493e+36
-5.23684e+36


In [11]:
wm2 = np.ma.masked_outside(wm, w.valid_min, w.valid_max)
print(type(wm2))
print(np.ma.max(wm2))
print(np.ma.min(wm2))

<class 'numpy.ma.core.MaskedArray'>
1.50334
-1.28856


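* values equal to `_FillValue` and/or `missing_value` are converted to NaN by xarray, so `np.ma.masked_invalid()` masks them
* values outside of `valid_min` and `valid_max` are not masked by xarray; they need an explicit `np.ma.masked_outside()`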

**Solution**
1. Regenerate the GLORYS rotated velocities. Set values outside of `valid_min` and `valid_max` to `_FillValue`.
2. We may need to cast data as numpy masked arrays before calling `np.ma.min()` etc. in `stats.py` (Other ideas welcome...)
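
One possible shape for the cast in step 2, as a sketch. `to_masked` is a hypothetical helper name (not an existing function in `stats.py`), and the sample values are invented.

```python
import numpy as np

def to_masked(values, valid_min=None, valid_max=None):
    """Hypothetical helper: return a masked array with NaN/inf and,
    optionally, values outside [valid_min, valid_max] masked."""
    masked = np.ma.masked_invalid(values)
    if valid_min is not None:
        masked = np.ma.masked_less(masked, valid_min)
    if valid_max is not None:
        masked = np.ma.masked_greater(masked, valid_max)
    return masked

# Made-up sample: NaN where xarray replaced a fill value,
# plus two values outside the valid range
sample = np.array([0.5, np.nan, 7.0e+36, -5.0e+36, -1.2], dtype=np.float32)
result = to_masked(sample, valid_min=-10.0, valid_max=10.0)

print(np.ma.max(result), np.ma.min(result))
```

With an xarray variable, the call might look like `to_masked(w.values, w.attrs['valid_min'], w.attrs['valid_max'])`, assuming both attributes are present.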
