Handle empty aggregation in datashader operations #1281

Merged

jlstevens merged 5 commits into master from datashader_empty_handling

Apr 12, 2017

Member

philippjfr commented Apr 12, 2017

As the title says, this handles empty aggregates gracefully in the datashader operations and ensures that both the interfaces and the plotting backends handle the resulting aggregates correctly.

philippjfr added 5 commits

April 12, 2017 13:01


          Bokeh plots handle NaNs in color ranges

dbe2b75


          XArray interface declares support for xr.DataArray

fbce9c8


          Graceful handling of empty datashader aggregate

f9cd31a


          Improved XArray DataArray handling

65c9997


          Handle and warn about datetime coordinates in datashader

d6c67eb
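The first commit, "Bokeh plots handle NaNs in color ranges", is only summarized by its title here. As a rough illustration of the idea, and not the actual HoloViews or Bokeh code, a NaN-aware color range could be computed as below. The function name `color_range` and the `(0, 1)` fallback are hypothetical choices made for this sketch.

```python
import numpy as np

def color_range(values, fallback=(0.0, 1.0)):
    """Return (low, high) for a color mapper, ignoring NaN cells.

    Hypothetical helper: if every value is NaN (an empty aggregate),
    return `fallback` instead of letting NaN leak into the color range.
    """
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]  # flattens and drops NaNs
    if valid.size == 0:
        return fallback
    return float(valid.min()), float(valid.max())

print(color_range([1.0, np.nan, 3.0]))  # NaN is skipped
print(color_range([np.nan, np.nan]))    # all NaN, so the fallback is used
```

Without the NaN filtering, `np.min` on an aggregate containing NaN returns NaN, which is not a usable color mapper bound.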

Member Author

philippjfr commented Apr 12, 2017

Ready to review.

Contributor

jlstevens commented Apr 12, 2017

Reviewing now. Planning on adding some unit tests?

Member Author

philippjfr commented Apr 12, 2017

I suppose so; there are currently no datashader unit tests.

jlstevens reviewed

holoviews/operation/datashader.py

                       elif isinstance(obj, Element):
                           glyph = 'line' if isinstance(obj, Curve) else 'points'
                           paths.append(PandasInterface.as_dframe(obj))
+                      if dims is None or len(dims) != 2:
+                          return None, None, None, None

Contributor

jlstevens Apr 12, 2017

Purely a syntactic preference, but I would wrap the return tuple in parentheses, i.e. (None, None, None, None).
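To make the control flow around this early return concrete, here is a small self-contained sketch. `get_agg_data_sketch` is an invented stand-in, not the HoloViews `get_agg_data` method; it only mirrors the "bail out unless there are exactly two dimensions" check from the diff and shows how a caller can detect the sentinel.

```python
def get_agg_data_sketch(dims, data=None, glyph='points'):
    """Hypothetical stand-in for get_agg_data.

    Returns (x, y, data, glyph), or four Nones when the element does
    not provide exactly two dimensions to aggregate over.
    """
    if dims is None or len(dims) != 2:
        # Parenthesized tuple, as suggested in the review.
        return (None, None, None, None)
    x, y = dims
    return (x, y, data, glyph)

# Caller side: the None sentinel signals an empty aggregate.
x, y, data, glyph = get_agg_data_sketch(None)
if x is None or y is None:
    print('empty aggregate: fall back to a blank canvas')
```

The parentheses are purely cosmetic: `return None, None, None, None` builds exactly the same tuple.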

jlstevens reviewed

holoviews/operation/datashader.py

+                      for d in (x, y):
+                          if df[d].dtype.kind == 'M':
+                              param.warning('Casting %s dimension data to integer '
+                                            'datashader cannot process datetime data ')

Contributor

jlstevens Apr 12, 2017

How interpretable would the rest be after such datetime-to-int casting? I suppose it might work out, but maybe it doesn't really make sense?

Member Author

philippjfr Apr 12, 2017

You can add a datetime formatter to your axis and it will work. I think it's fine with the warning.

Contributor

jlstevens Apr 12, 2017

Sure.
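A minimal sketch of the datetime check discussed above, assuming the columns are plain NumPy arrays in a dict. It uses the standard-library `warnings.warn` in place of `param.warning`, and `cast_datetime_columns` is a hypothetical name. Note that the diff excerpt as shown does not appear to interpolate the dimension name into its `%s` placeholder; the sketch applies `% d` so the warning names the offending dimension.

```python
import warnings
import numpy as np

def cast_datetime_columns(columns, dims):
    """Cast datetime64 columns to int64 so they can be aggregated.

    `columns` maps dimension names to NumPy arrays (an assumption for
    this sketch). The input dict is not modified.
    """
    out = dict(columns)
    for d in dims:
        if out[d].dtype.kind == 'M':  # 'M' is NumPy's datetime64 kind
            warnings.warn('Casting %s dimension data to integer; '
                          'datashader cannot process datetime data' % d)
            # Integer counts of the array's unit (e.g. days or ns) since the epoch
            out[d] = out[d].astype('int64')
    return out
```

The cast values are counts since the Unix epoch in the array's time unit, so they only read as dates once the axis is given a datetime formatter, which matches philippjfr's reply.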

jlstevens reviewed

holoviews/operation/datashader.py

@@ -176,6 +194,15 @@ def _process(self, element, key=None):
                       category = agg_fn.column if isinstance(agg_fn, ds.count_cat) else None
                       x, y, data, glyph = self.get_agg_data(element, category)
+                      if x is None or y is None:
+                          x0, x1 = self.p.x_range or (-0.5, 0.5)
+                          y0, y1 = self.p.y_range or (-0.5, 0.5)

Contributor

jlstevens Apr 12, 2017

Seems like a unit range is the default. I guess that is fine as long as it is sensible default behavior when the data is missing.
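The fallback in the diff relies on Python's `or`: a user-supplied range wins, and `None` falls through to a range of width 1 centred on zero. A small sketch of that pattern, with `resolve_ranges` as a hypothetical helper name:

```python
def resolve_ranges(x_range=None, y_range=None, default=(-0.5, 0.5)):
    """Pick plot bounds for an empty aggregate.

    Mirrors `self.p.x_range or (-0.5, 0.5)` from the diff: an explicit
    range is used as-is, otherwise the unit range around zero is used.
    """
    x0, x1 = x_range or default
    y0, y1 = y_range or default
    return (x0, x1), (y0, y1)

print(resolve_ranges())         # both axes fall back to the unit range
print(resolve_ranges((0, 10)))  # explicit x range kept, y falls back
```

Because `or` tests truthiness, an empty tuple would also trigger the fallback, while a non-empty tuple such as `(0, 0)` is kept.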

jlstevens reviewed

holoviews/operation/datashader.py

@@ -307,7 +334,12 @@ def _process(self, element, key=None):
                       with warnings.catch_warnings():
                           warnings.filterwarnings('ignore', r'invalid value encountered in true_divide')
-                          img = tf.shade(array, **shade_opts)
+                          if np.isnan(array.data).all():

Contributor

jlstevens Apr 12, 2017

Seems like it might be a little inefficient to compute this predicate on large arrays, but I think it is okay for now. No need to optimize anything just yet.

Member Author

philippjfr Apr 12, 2017

I worried about that too, but 100 ms for a 10000x10000 array (considerably larger than anything we'll ever use) is okay.
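For anyone who wants to check the cost locally, here is a rough sketch of the all-NaN predicate from the diff wrapped in a helper, plus a `timeit` harness. `is_empty_aggregate` is a hypothetical name, and the 2000x2000 canvas is an assumption chosen to keep the demo light; timings will vary by machine, so no figure is claimed here.

```python
import timeit
import numpy as np

def is_empty_aggregate(array):
    """True when every cell is NaN, i.e. nothing was aggregated.

    np.isnan allocates a boolean array the size of the input, so this
    costs O(N) time and memory on every call.
    """
    return bool(np.isnan(array).all())

if __name__ == '__main__':
    agg = np.full((2000, 2000), np.nan, dtype=np.float32)
    per_call = timeit.timeit(lambda: is_empty_aggregate(agg), number=10) / 10
    print('all-NaN check on 2000x2000: %.1f ms' % (per_call * 1e3))
```

Since the check runs once per shading call, a linear pass over the canvas is small next to the aggregation itself, which is the trade-off the thread settles on.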

jlstevens reviewed

holoviews/operation/datashader.py

+                          xc = np.linspace(x0, x1, self.p.width)
+                          yc = np.linspace(y0, y1, self.p.height)
+                          xarray = xr.DataArray(np.full((self.p.height, self.p.width), np.NaN, dtype=np.float32),
+                                                dims=['y', 'x'], coords={'x': xc, 'y': yc})

Contributor

jlstevens Apr 12, 2017

Isn't this all NaNs? If so, you could set an is_all_nans switch and use it later...

Member Author

philippjfr Apr 12, 2017

Not unless you want to add an is_all_nan switch to Image Elements. They're two distinct operations.
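A NumPy-only sketch of the blank aggregate built in the diff, for readers without xarray at hand. `empty_aggregate` is a hypothetical name; it returns the coordinate arrays and a `(height, width)` float32 grid of NaNs, i.e. the same layout as the `xr.DataArray` with `dims=['y', 'x']`. It uses `np.nan` because the `np.NaN` alias used in the 2017 diff was removed in NumPy 2.0.

```python
import numpy as np

def empty_aggregate(x_range, y_range, width, height):
    """Build coordinates plus an all-NaN grid for an empty aggregate.

    The grid is indexed [y, x], matching dims=['y', 'x'] in the diff.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    xc = np.linspace(x0, x1, width)
    yc = np.linspace(y0, y1, height)
    data = np.full((height, width), np.nan, dtype=np.float32)
    return xc, yc, data

xc, yc, data = empty_aggregate((-0.5, 0.5), (-0.5, 0.5), width=4, height=3)
print(data.shape)  # rows follow y, columns follow x
```

As the reply above points out, creating this grid and later detecting that a grid is all NaN are separate steps: once the data is wrapped in an Image element, nothing records that it started out empty.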

Contributor

jlstevens commented Apr 12, 2017

Ok, maybe file an issue about the missing datashader unit tests then (referencing this PR), and we can address it later.

Contributor

jlstevens commented Apr 12, 2017

Made a few comments but otherwise I'm happy to merge.

Member Author

philippjfr commented Apr 12, 2017

Let's just merge; I'll open an issue about unit tests for datashader operations.

Contributor

jlstevens commented Apr 12, 2017

Looks good. Merging.

jlstevens merged commit 4e8292a into master

jbednar deleted the datashader_empty_handling branch

April 12, 2017 21:48

Member

jbednar commented Apr 12, 2017

Looks good, thanks. If you are working around behavior in datashader that you think should be fixed (e.g. if it should be raising a more sensible exception in some of these cases), then please file an issue there.
