Skip to content
This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Getting scalar values of the weighted average when input is 2D and weights is a 2D array #28

Closed
andersy005 opened this issue Jan 14, 2019 · 8 comments

Comments

@andersy005
Copy link
Contributor

#!/usr/bin/env python

import xarray as xr
import esmlab

# Read in a CESM climatology file
in_files = []
for n in range(0,12):
  in_files.append("/glade/work/mclong/woa2013v2/POP_gx1v7/woa13_all_n{:02d}_gx1v7.nc".format(n))
ds = xr.open_mfdataset(in_files, decode_times=False)

TAREA = ds.TAREA.isel(time=0)
field = ds.NO3.isel(z_t=0).mean('time')
print(esmlab.statistics.weighted_mean(field, TAREA))
@mnlevy1981
Copy link
Contributor

A better version of the script:

#!/usr/bin/env python

import xarray as xr
import numpy as np
import esmlab

# Read in WOA observational data (in 12 files)
in_files = []
for n in range(0,12):
  in_files.append("/glade/work/mclong/woa2013v2/POP_gx1v7/woa13_all_n{:02d}_gx1v7.nc".format(n))
ds = xr.open_mfdataset(in_files, decode_times=False)

# Reduce dataset
# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)

# Statistics
print("field min: {:.2f}".format(np.nanmin(field)))
print("field max: {:.2f}".format(np.nanmax(field)))
print("field weighted mean:\n{}".format(esmlab.statistics.weighted_mean(field, TAREA)))

Output:

/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/dask/array/numpy_compat.py:28: RuntimeWarning: invalid value encountered in divide
  x = np.divide(x1, x2, out)
field min: 0.00
field max: 33.95
field weighted mean:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
    z_t      float64 500.0
    time     float64 6.0

(I don't think the numpy_compat.py warning affects anything... numpy doesn't complain about it with nanmin and nanmax, and I can still plot field in other scripts)

@andersy005
Copy link
Contributor Author

@mnlevy1981, to make sure I am getting this, what should be the expected output's dimensions (shape) of esmlab.statistics.weighted_mean(field, TAREA)?

With

# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)

field, and TAREA are as follows:

In [20]: field.shape
Out[20]: (384, 320)

In [21]: TAREA.shape
Out[21]: (384, 320)

@mnlevy1981
Copy link
Contributor

It should be a scalar quantity. I just realized I could come up with an even simpler example using 1D arrays rather than reading data from disk, but I need to run... I'll try to find time tonight to post more. But the gist is that if

TAREA = [1, 1, 2]
field = [1, 3, 8]

Then I expect the weighted mean to be 5: the weighted sum of field is 20, and the weights sum to 4 so the weighted mean is 20/4 = 5.

@andersy005
Copy link
Contributor Author

andersy005 commented Jan 14, 2019

I think I figured it out:

  1. Due to xr.open_mfdataset(), xarray loads data into dask arrays instead of in memory NumPy arrays. The implication of this is that any subsequent operation is likely to return an xarray object with dask arrays.
  2. There are two solutions to this:
    • Force xarray to load the entire data set in memory with
ds = xr.open_mfdataset(in_files, decode_times=False).load()

Or

  • Tell Dask to compute and return and an in-memory result
w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()
In [47]: field
Out[47]:
<xarray.DataArray 'NO3' (nlat: 384, nlon: 320)>
array([[      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [22.465448, 22.383131, 22.292166, ...,       nan,       nan,       nan],
       ...,
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan]],
      dtype=float32)
Coordinates:
    TLAT     (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    TLONG    (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
    z_t      float64 500.0
Dimensions without coordinates: nlat, nlon

In [48]: TAREA
Out[48]:
<xarray.DataArray 'TAREA' (nlat: 384, nlon: 320)>
dask.array<shape=(384, 320), dtype=float64, chunksize=(384, 320)>
Coordinates:
    TLAT     (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    TLONG    (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
    time     float64 6.0
Dimensions without coordinates: nlat, nlon
Attributes:
    long_name:  area of T cells
    units:      cm^2

In [49]: w_mean = esmlab.statistics.weighted_mean(field, TAREA)

In [51]: w_mean
Out[51]:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
    z_t      float64 500.0
    time     float64 6.0

In [52]: w_mean.data
Out[52]: dask.array<truediv, shape=(), dtype=float64, chunksize=()>

In [53]: w_mean.data.compute()
Out[53]: 5.146317884602825

In [54]: np.nanmean(field)
Out[54]: 5.7132335

@andersy005
Copy link
Contributor Author

@mnlevy1981,
Oooh one more thing, if you want your result to be an xarray object,

do:

w_mean = esmlab.statistics.weighted_mean(field, TAREA).load()

instead of:

w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()

@mnlevy1981
Copy link
Contributor

Thanks! The line

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))

Works as expected. Is accessing values directly the preferred method of getting a scalar out of an xarray object? It looks like

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().item()))

is another option.

@andersy005
Copy link
Contributor Author

andersy005 commented Jan 15, 2019

I think this is the appropriate way:

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))

xref: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.values.html#xarray.DataArray.values

@andersy005
Copy link
Contributor Author

andersy005 commented Jan 25, 2019

@mnlevy1981, should I close this? Or free feel to close it if you are satisfied with the conclusion from our last week's meeting.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants