Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty aggregation in datashader operations #1281

Merged
merged 5 commits into from
Apr 12, 2017

Conversation

philippjfr
Copy link
Member

As the title says, this handles empty aggregates gracefully in the datashader operations and ensures the both the interfaces and plotting backends display handle the resulting aggregates correctly.

@philippjfr
Copy link
Member Author

Ready to review.

@jlstevens
Copy link
Contributor

Reviewing now. Planning on adding some unit tests?

@philippjfr
Copy link
Member Author

Reviewing now. Planning on adding some unit tests?

I suppose, there are currently no datashader unit tests.

elif isinstance(obj, Element):
glyph = 'line' if isinstance(obj, Curve) else 'points'
paths.append(PandasInterface.as_dframe(obj))

if dims is None or len(dims) != 2:
return None, None, None, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purely syntactic preference, but I would wrap the return tuple in parentheses i.e (None, None, None, None)

for d in (x, y):
if df[d].dtype.kind == 'M':
param.warning('Casting %s dimension data to integer '
'datashader cannot process datetime data ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How interpretable would the rest be after such datetime to int casting? I suppose it might work out but maybe it doesn't really make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a datetime formatter to your axis and it will work. I think it's fine with the warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -176,6 +194,15 @@ def _process(self, element, key=None):
category = agg_fn.column if isinstance(agg_fn, ds.count_cat) else None
x, y, data, glyph = self.get_agg_data(element, category)

if x is None or y is None:
x0, x1 = self.p.x_range or (-0.5, 0.5)
y0, y1 = self.p.y_range or (-0.5, 0.5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like unit range is the default. Guess that is fine as long as this is sensible default behavior when the data is missing.

@@ -307,7 +334,12 @@ def _process(self, element, key=None):

with warnings.catch_warnings():
warnings.filterwarnings('ignore', r'invalid value encountered in true_divide')
img = tf.shade(array, **shade_opts)
if np.isnan(array.data).all():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it might be a little inefficient to compute this predicate on large arrays but I think it is okay for now. No need to optimize anything just yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worried about that too, but 100ms for a 10000x10000 array (which is considerably larger than we'll ever use), is okay.

xc = np.linspace(x0, x1, self.p.width)
yc = np.linspace(y0, y1, self.p.height)
xarray = xr.DataArray(np.full((self.p.height, self.p.width), np.NaN, dtype=np.float32),
dims=['y', 'x'], coords={'x': xc, 'y': yc})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this all np.NaNs? If so you could set a is_all_nans switch and use it later...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not unless you want to add a is_all_nan switches to Image Elements. It's two distinct operations.

@jlstevens
Copy link
Contributor

I suppose, there are currently no datashader unit tests.

Ok, maybe file an issue about that then (referencing this PR) and we can address it later.

@jlstevens
Copy link
Contributor

Made a few comments but otherwise I'm happy to merge.

@philippjfr
Copy link
Member Author

Made a few comments but otherwise I'm happy to merge.

Let's just merge, I'll open an issue about unit tests for datashader operations.

@jlstevens
Copy link
Contributor

Looks good. Merging.

@jlstevens jlstevens merged commit 4e8292a into master Apr 12, 2017
@jbednar jbednar deleted the datashader_empty_handling branch April 12, 2017 21:48
@jbednar
Copy link
Member

jbednar commented Apr 12, 2017

Looks good, thanks. If you are working around behavior in datashader that you think should be fixed (e.g. if it should be raising a more sensible exception in some of these cases) then please file an issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants