Implement integrate #2653

fujiisoup · 2019-01-05T11:22:10Z

Closes #1288 (Add trapz to DataArray for mathematical integration)
Tests added
Fully documented, including whats-new.rst for all changes and api.rst for new API

I would like to add integrate, which is essentially an xarray version of np.trapz.
I know there was a variety of discussion in #1288, but I think it would be nice to limit ourselves to what numpy provides with np.trapz,
i.e.,

1. only trapz, not rectangle or simps
2. no special handling of np.nan
3. no support for bounds

Most of them (except for 1) can be solved by combining several existing methods.
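
For readers unfamiliar with the rule being wrapped, here is a minimal sketch of trapezoidal integration over an unevenly spaced coordinate. It is written directly in NumPy rather than with this PR's code; the helper name `trapezoid` is only illustrative.

```python
import numpy as np


def trapezoid(y, x):
    """Integrate samples y over coordinate x with the trapezoidal rule."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Each interval contributes its width times the mean of its endpoint values.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# Unevenly spaced coordinate; y = x is linear, so the rule is exact here.
x = np.array([0.0, 0.5, 2.0])
y = x.copy()
area = trapezoid(y, x)  # 0.125 + 1.875 = 2.0

# A NaN sample propagates into the result, matching point 2 above.
with_nan = trapezoid([1.0, np.nan, 1.0], [0.0, 1.0, 2.0])
```

Because nothing is done about NaN, a caller who wants gaps ignored would presumably fill or drop them first and then integrate.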

doc/whats-new.rst

xarray/core/dataarray.py

xarray/core/duck_array_ops.py

xarray/core/dataset.py

shoyer · 2019-01-06T09:17:56Z

xarray/core/dataset.py

+                coord_var, datetime_unit=datetime_unit)
+
+        variables = OrderedDict()
+        coord_names = []


coord_names should always be a set, since you pass it into _replace_vars_and_dims. Actually, this is a great example of a bug mypy would have caught! (see #2655)
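
As a rough illustration of why a type annotation helps here, the sketch below uses a hypothetical function `replace_vars` as a stand-in for a method that expects a set. It is not xarray's actual `_replace_vars_and_dims`, whose signature is not shown in this thread.

```python
from typing import Dict, Set


def replace_vars(variables: Dict[str, object], coord_names: Set[str]) -> Set[str]:
    """Hypothetical stand-in for a method that expects coord_names as a set."""
    # The & operator used here is defined for sets, not for lists.
    return coord_names & set(variables)


variables = {"x": [0, 1, 2], "temperature": [10, 11, 12]}

coord_names_list = ["x"]  # the bug: a list where a set is expected
# replace_vars(variables, coord_names_list)
# A static checker such as mypy would be expected to reject the call above;
# at runtime it fails with a TypeError.

kept = replace_vars(variables, set(coord_names_list))  # fine once converted
```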

shoyer · 2019-01-06T09:18:18Z

xarray/core/dataset.py

+                    variables[k] = v
+                    coord_names.append(k)
+            else:
+                if (k in self.data_vars and dim in v.dims):


nit: no need for extra parentheses here


shoyer · 2019-01-06T21:34:13Z

xarray/core/dataset.py

@@ -3867,6 +3867,79 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
                variables[k] = v
        return self._replace_vars_and_dims(variables)

+    def integrate(self, dim, datetime_unit=None):


Should we make the default dim=None do integration over all dimensions?

shoyer · 2019-01-06T21:35:17Z

xarray/core/dataset.py

+        from .variable import Variable
+
+        if dim not in self.variables and dim not in self.dims:
+            raise ValueError('Coordinate {} does not exist.'.format(dim))


I think splitting these checks into two would be a little clearer:

"cannot integrate over dimension {} because it does not exist" (for dim not in self.dims)

"cannot integrate over dimension {} because there is no corresponding coordinate" (for dim not in self.variables)
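
A minimal sketch of the two separate checks, assuming plain containers in place of `self.dims` and `self.variables`. The function name `check_integration_target` is made up for illustration.

```python
def check_integration_target(dim, dims, variables):
    """Raise a specific error when `dim` cannot be integrated over.

    `dims` and `variables` are plain containers standing in for
    Dataset.dims and Dataset.variables.
    """
    if dim not in dims:
        raise ValueError(
            'cannot integrate over dimension {} because it does not '
            'exist'.format(dim))
    if dim not in variables:
        raise ValueError(
            'cannot integrate over dimension {} because there is no '
            'corresponding coordinate'.format(dim))
```

Note that this is stricter than the combined `and` check in the diff, which raises only when the name is missing from both places.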

shoyer · 2019-01-06T21:38:57Z

xarray/core/dataset.py

+        coord_var = self[dim].variable
+        if coord_var.ndim != 1:
+            raise ValueError('Coordinate {} must be 1 dimensional but is {}'
+                             ' dimensional'.format(dim, coord_var.ndim))


Likewise, maybe:

"cannot integrate over dimension {} because the corresponding coordinate is not a 1d array along that dimension: it has dimensions {}"

shoyer · 2019-01-06T21:43:30Z

xarray/core/dataset.py

+            raise ValueError('Coordinate {} does not exist.'.format(dim))
+
+        coord_var = self[dim].variable
+        if coord_var.ndim != 1:


I think this is currently not possible due to xarray's data model, but it's a good idea to add this anyways given that we want to change this soon (e.g., see #2405).

I would recommend adjusting this to if coord_var.dims != (dim,), which is a little stricter.
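
To make the difference between the two conditions concrete, here is a small sketch using a hypothetical `FakeVariable` stand-in that only carries a `dims` tuple. It reuses the error wording proposed in the previous comment.

```python
from collections import namedtuple

# Hypothetical stand-in for a coordinate variable; only .dims matters here.
FakeVariable = namedtuple("FakeVariable", ["dims"])


def check_coord_is_1d_along(dim, coord_var):
    # Stricter than `ndim != 1`: the single dimension must also be `dim` itself.
    if coord_var.dims != (dim,):
        raise ValueError(
            'cannot integrate over dimension {} because the corresponding '
            'coordinate is not a 1d array along that dimension: it has '
            'dimensions {}'.format(dim, coord_var.dims))


check_coord_is_1d_along("x", FakeVariable(dims=("x",)))  # passes silently
```

A coordinate with dims `("y",)` is one-dimensional and would pass an `ndim` check, but it is rejected here because it does not lie along `"x"`.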

I first thought that it would be nice if we could integrate even along a non-dimensional (1d) coordinate (as interpolate_na and differentiate do), but that also sounds like too much.
What do you think?

Yes, that seems reasonable to support

Then, is coord a better argument name than dim?
Or should we use dim as the argument name but support integration along a non-dimensional coordinate, at a slight cost to correctness, since that is more consistent with other reduction methods?

I don't have a strong opinion here.

Well, differentiate uses coord, so maybe integrate should too?

OK, +1 for consistency with differentiate.

shoyer · 2019-01-06T21:44:46Z

xarray/core/dataset.py

+                else:
+                    variables[k] = v
+        return self._replace_vars_and_dims(variables,
+                                           coord_names=set(coord_names))


Maybe define this as a set instead, and use coord_names.add(k) instead of append?
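
Roughly, the suggestion amounts to the pattern below. The dictionary and names are invented for illustration and the loop is much simpler than the one in the diff.

```python
variables = {"x": [0, 1, 2], "y": [0, 1], "temperature": [[1, 2], [3, 4], [5, 6]]}
original_coords = {"x", "y"}
dim = "x"  # the dimension being integrated away

coord_names = set()  # start from a set rather than a list
for name in variables:
    # Keep every coordinate except the one that disappears with `dim`.
    if name in original_coords and name != dim:
        coord_names.add(name)  # .add instead of .append
```

With this, the final `set(coord_names)` conversion at the call site is no longer needed.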

shoyer · 2019-01-10T17:36:43Z

xarray/core/dataset.py

@@ -3867,7 +3867,7 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
                variables[k] = v
        return self._replace_vars_and_dims(variables)

-    def integrate(self, dim, datetime_unit=None):
+    def integrate(self, coord, datetime_unit=None):


Should coord=None have the default behavior of integrating over all dimensions? Or would that be confusing in some way?

I personally think it would be a little confusing because the result may change depending on which coordinate is used for integrate. For example, if the DataArray has a dimension without a coordinate but with another one-dimensional coordinate, it is not very clear which should be used.

It would be a little convenient for 1d arrays, but as we disallow a default argument for diff, I would like to disallow a default argument here too.
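
The ambiguity can be seen with a small NumPy sketch: the same values give different integrals depending on whether they are integrated against the positional index or against another coordinate on the same dimension. The arrays and the `trapezoid` helper are illustrative only.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule for samples y over coordinate x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


data = np.array([1.0, 1.0, 1.0])    # values along one dimension of length 3
index = np.array([0.0, 1.0, 2.0])   # positional index, as if no coordinate existed
time = np.array([0.0, 10.0, 20.0])  # hypothetical 1-d coordinate on the same dimension

by_index = trapezoid(data, index)   # 2.0
by_time = trapezoid(data, time)     # 20.0
```

Requiring the caller to name the coordinate avoids having to pick one of these silently.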

dcherian · 2019-01-22T20:17:31Z

Looks like this is ready to merge?

shoyer · 2019-01-31T17:31:31Z

Oops, I forgot about this.

@fujiisoup thanks for this awesome contribution! We should issue 0.12.0 soon to get this out in the world

* master:
  remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741)
  Reenable cross engine read write netCDF test (pydata#2739)
  remove bottleneck dev build from travis, this test env was failing to build (pydata#2736)
  CFTimeIndex Resampling (pydata#2593)
  add tests for handling of empty pandas objects in constructors (pydata#2735)
  dropna() for a Series indexed by a CFTimeIndex (pydata#2734)
  deprecate compat & encoding (pydata#2703)
  Implement integrate (pydata#2653)
  ENH: resample methods with tolerance (pydata#2716)
  improve error message for invalid encoding (pydata#2730)
  silence a couple of warnings (pydata#2727)

fujiisoup added 2 commits January 5, 2019 12:07

added integrate.

b0e843f

Docs

968b6d0

max-sixty reviewed Jan 5, 2019

doc/whats-new.rst (resolved)

max-sixty reviewed Jan 5, 2019

xarray/core/dataarray.py (outdated, resolved)

shoyer reviewed Jan 5, 2019

xarray/core/duck_array_ops.py (outdated, resolved)

shoyer reviewed Jan 5, 2019

xarray/core/dataset.py (outdated, resolved)

xarray/core/dataset.py (resolved)

fujiisoup added 2 commits January 6, 2019 09:24

Update via comment

721ff5a

Merge branch 'master' into trapz

3decc22

shoyer reviewed Jan 6, 2019

fujiisoup added 3 commits January 6, 2019 11:51

Update via comments

88a8385

Merge branch 'master' into trapz

9ed470d

# Conflicts: # doc/computation.rst

integrate can accept multiple dimensions.

42bab02

shoyer reviewed Jan 6, 2019

max-sixty changed the title ~~Imprement integrate~~ Implement integrate Jan 8, 2019

fujiisoup added 2 commits January 8, 2019 20:57

using set instead of list

ecf9318

dim -> coord

0561113

shoyer reviewed Jan 10, 2019

shoyer merged commit 4923039 into pydata:master Jan 31, 2019

TomNicholas mentioned this pull request Apr 21, 2020

DataArray.integrate has a 'dim' arg, but Dataset.integrate has a 'coord' arg #3992

Closed
