New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty aggregation in datashader operations #1281

Merged
merged 5 commits into from Apr 12, 2017

Conversation

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Apr 12, 2017

As the title says, this handles empty aggregates gracefully in the datashader operations and ensures the both the interfaces and plotting backends display handle the resulting aggregates correctly.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 12, 2017

Ready to review.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Apr 12, 2017

Reviewing now. Planning on adding some unit tests?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 12, 2017

Reviewing now. Planning on adding some unit tests?

I suppose, there are currently no datashader unit tests.

elif isinstance(obj, Element):
glyph = 'line' if isinstance(obj, Curve) else 'points'
paths.append(PandasInterface.as_dframe(obj))
if dims is None or len(dims) != 2:
return None, None, None, None

This comment has been minimized.

@jlstevens

jlstevens Apr 12, 2017

Member

Purely syntactic preference, but I would wrap the return tuple in parentheses i.e (None, None, None, None)

for d in (x, y):
if df[d].dtype.kind == 'M':
param.warning('Casting %s dimension data to integer '
'datashader cannot process datetime data ')

This comment has been minimized.

@jlstevens

jlstevens Apr 12, 2017

Member

How interpretable would the rest be after such datetime to int casting? I suppose it might work out but maybe it doesn't really make sense?

This comment has been minimized.

@philippjfr

philippjfr Apr 12, 2017

Member

You can add a datetime formatter to your axis and it will work. I think it's fine with the warning.

This comment has been minimized.

@jlstevens
@@ -176,6 +194,15 @@ def _process(self, element, key=None):
category = agg_fn.column if isinstance(agg_fn, ds.count_cat) else None
x, y, data, glyph = self.get_agg_data(element, category)
if x is None or y is None:
x0, x1 = self.p.x_range or (-0.5, 0.5)
y0, y1 = self.p.y_range or (-0.5, 0.5)

This comment has been minimized.

@jlstevens

jlstevens Apr 12, 2017

Member

Seems like unit range is the default. Guess that is fine as long as this is sensible default behavior when the data is missing.

@@ -307,7 +334,12 @@ def _process(self, element, key=None):
with warnings.catch_warnings():
warnings.filterwarnings('ignore', r'invalid value encountered in true_divide')
img = tf.shade(array, **shade_opts)
if np.isnan(array.data).all():

This comment has been minimized.

@jlstevens

jlstevens Apr 12, 2017

Member

Seems like it might be a little inefficient to compute this predicate on large arrays but I think it is okay for now. No need to optimize anything just yet.

This comment has been minimized.

@philippjfr

philippjfr Apr 12, 2017

Member

I worried about that too, but 100ms for a 10000x10000 array (which is considerably larger than we'll ever use), is okay.

xc = np.linspace(x0, x1, self.p.width)
yc = np.linspace(y0, y1, self.p.height)
xarray = xr.DataArray(np.full((self.p.height, self.p.width), np.NaN, dtype=np.float32),
dims=['y', 'x'], coords={'x': xc, 'y': yc})

This comment has been minimized.

@jlstevens

jlstevens Apr 12, 2017

Member

Isn't this all np.NaNs? If so you could set a is_all_nans switch and use it later...

This comment has been minimized.

@philippjfr

philippjfr Apr 12, 2017

Member

Not unless you want to add a is_all_nan switches to Image Elements. It's two distinct operations.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Apr 12, 2017

I suppose, there are currently no datashader unit tests.

Ok, maybe file an issue about that then (referencing this PR) and we can address it later.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Apr 12, 2017

Made a few comments but otherwise I'm happy to merge.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 12, 2017

Made a few comments but otherwise I'm happy to merge.

Let's just merge, I'll open an issue about unit tests for datashader operations.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Apr 12, 2017

Looks good. Merging.

@jlstevens jlstevens merged commit 4e8292a into master Apr 12, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.09%) to 78.944%
Details
s3-reference-data-cache Test data is cached.
Details

@jbednar jbednar deleted the datashader_empty_handling branch Apr 12, 2017

@jbednar

This comment has been minimized.

Member

jbednar commented Apr 12, 2017

Looks good, thanks. If you are working around behavior in datashader that you think should be fixed (e.g. if it should be raising a more sensible exception in some of these cases) then please file an issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment