
Spread support for aggregate arrays #771

Merged
merged 37 commits from arr_spreading into master on Aug 19, 2020

Conversation

@jbednar (Member) commented Jul 24, 2019

Previously, spreading has been supported only for Image objects, working directly on RGBA pixels. As a result, it has not been available for use with the HoloViews rasterizing operation, which returns aggregate arrays rather than Images.

As proposed in #326, this PR adds a set of array spreading operators and allows spread to accept either an image or an aggregate array.
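For concreteness, here is a minimal sketch of the kind of usage this enables (the dataframe, column names, and parameter choices below are illustrative placeholders, not taken from the PR):

import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

df = pd.DataFrame({'x': np.random.randn(1000), 'y': np.random.randn(1000)})
cvs = ds.Canvas(plot_width=300, plot_height=300)
agg = cvs.points(df, 'x', 'y')                    # aggregate array (xarray), not an RGB image

spread_agg = tf.spread(agg, px=1)                 # spreading now happens on the aggregate...
img = tf.shade(spread_agg, cmap=['lightblue', 'darkblue'])  # ...so colormapping can come afterwards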

Remaining issues:

  • Special handling will be needed for categorical aggregates, which are stacks of regular aggregates (e.g. aggc in 2_Pipeline.ipynb)
  • dynspread is not yet supported for aggregate arrays; changes will be needed to _density() in transfer_functions.py
  • Spreading for aggregates is currently only supported when Numba is disabled, because composite.arr_operator(f) doesn't yet have the type-specific Numba compilation code that composite.operator(f) has, and it will need to handle whatever datatypes are supported for aggregations (i32, i64, f32, f64? Not sure). Spreading is remarkably slow without Numba! (See the sketch after this list.)
  • How to interpret the various spreading operators for the aggregate case is open for discussion. There are four different operators defined for images, but right now all those operators are defined for aggregates as only one of two different behaviors, with source_arr returning src if truthy and dst otherwise, and the rest all returning src+dst. Is this correct/useful/the only possibility?
  • Need to update examples/getting_started/2_Pipeline.ipynb to explain agg spreading. Explaining it will be tricky, because spreading is currently explained at the image level, which is in a different section of the pipeline from aggregates; spreading discussion either needs to move to the aggregate stage and then be expanded later when discussed for images (which have different allowable operations), or the other way around (pointing forward to the image section from the aggregates section).
  • Presumably need to update the spread() operation in HoloViews to allow it to be used with rasterize() output (hv.Image) instead of just spread() output (hv.RGB).
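As a rough illustration of the Numba point above, composite.arr_operator(f) will presumably need to do something along these lines for each supported dtype (a hypothetical sketch only, not the PR's code):

from numba import njit

def arr_operator(f):
    # Hypothetical sketch: jit the element-wise combine function and wrap it
    # in a compiled 2D kernel, much as composite.operator(f) does for RGBA images.
    combine = njit(f)

    @njit
    def op(src, dst, out):
        for i in range(src.shape[0]):
            for j in range(src.shape[1]):
                out[i, j] = combine(src[i, j], dst[i, j])
        return out

    return op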

@jbednar (Member, Author) commented Jul 24, 2019

See examples at https://anaconda.org/jbednar/2_pipeline-aggspread.

@jbednar (Member, Author) commented Jul 24, 2019

@jlstevens, please also consider more fundamental changes, instead of or in addition to this one; e.g. is there a different approach to making isolated points visible that has better properties than spreading (would a simple convolution with a box filter work, for instance)?
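For instance, something as simple as the following (a sketch assuming SciPy is acceptable here, not a concrete proposal) already gives an isolated point a visible extent:

import numpy as np
from scipy.ndimage import uniform_filter

agg = np.zeros((100, 100))
agg[50, 50] = 9.0
blurred = uniform_filter(agg, size=3)   # normalized 3x3 box filter: the point becomes a 3x3 patch of 1.0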

@jbednar (Member, Author) commented Jul 24, 2019

Also consider:

  • Should the operators be normalized, to make e.g. add into mean? Seems more useful in general than having to hit overflow limits.

@ltalirz commented Jan 16, 2020

This would be really useful; it would potentially also make holoviz/holoviews#2102 less of an issue for people who need a colorbar together with spread.

@jlstevens (Collaborator)

It is hard for me to know what the best tradeoffs are, but what I can do is list the pros and cons of different approaches. I'll consider two proposals: convolution (normalized) and convolution (not normalized).

Convolution (normalized)

Pros

  • Mathematically defined, preserves the total weight across the input.
  • Flexible as you can use different kernel shapes.

Cons

  • By spreading the weight around, you change the absolute values. E.g. hovering over a pixel whose value was 1.0 before spreading will show a lower value at that point after spreading.

    Note: Visually, the increase in spatial extent should make things easier to see, and auto-ranging for e.g. colormaps/colorbars can compensate for the change in absolute values.

Convolution (not normalized)

Pros

  • Allowing non-normalized kernels is a superset of the normalized case, so you have extra flexibility. In other words, you can accept any kernel, which may or may not be normalized.
  • You can use kernels that keep the center pixel the same and simply increase the weight of surrounding pixels accordingly (e.g. to the same absolute value in the case of a box filter). This means that you can hover over a larger area (the area spread out by the kernel) and still see the original value with a hover tool (for instance).

Cons

  • If not normalized, the result is not as well mathematically defined. The overall weight may go up or down.

Thinking about the two cases above, the best choice (in my opinion!) is to offer spread as a convolution with 1) a spatial box kernel by default that can be overridden with any custom kernel, and 2) a boolean to normalize the kernel that is off by default. By a spatial box kernel, I mean a kernel that copies the value of the central pixel to surrounding pixels in a spatially uniform way (e.g. as an approximation to a circle, or as a square).

One operation could then offer the user all the available tradeoffs. The default aims to keep hover information correct in value (but available over a larger area), making things easier to see. Enabling normalization is then optional and allows the user to preserve the overall weight of the input if they wish. Using custom kernels (e.g. Gaussians) with normalization, the user can then use this operation to do mathematically correct image blurring (for instance).

No matter how this operation is used, the user needs to understand how it is distorting the original data for the sake of visual clarity. The suggested defaults should result in a purely spatial distortion (i.e. giving tiny points a spatial extent), which I think is probably what most people want in most cases.
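A sketch of what that interface could look like (my reading of the proposal, using SciPy for the convolution; none of these names come from the PR):

import numpy as np
from scipy.signal import convolve2d

def spread_conv(agg, kernel=None, normalize=False):
    if kernel is None:
        kernel = np.ones((3, 3))         # spatial box kernel: copy each value to a 1px neighbourhood
    if normalize:
        kernel = kernel / kernel.sum()   # preserve the total weight across the input
    return convolve2d(agg, kernel, mode='same')

With the defaults, an isolated pixel keeps its original value (useful for hover) while gaining spatial extent; with normalize=True and, say, a Gaussian kernel, this becomes ordinary weight-preserving blurring.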

@jlstevens (Collaborator)

Numba is now enabled for spreading on aggregates, namely int32, int64, float32 and float64. Support for RGBs is preserved and spreading on the integer types matches the previous results (after colormapping, obviously!). Spreading on floats gives different enough results that some investigation will be needed there.

How to interpret the various spreading operators for the aggregate case is open for discussion. There are four different operators defined for images, but right now all those operators are defined for aggregates as only one of two different behaviors, with source_arr returning src if truthy and dst otherwise, and the rest all returning src+dst. Is this correct/useful/the only possibility?

I've been thinking about this, and I don't think the spreading operators (e.g. over, saturate, etc.) make much sense outside an RGB context. I think your suggestion is the best we will be able to do with this model and that add is the most intuitive default option when working with aggregate arrays.

My next step is to get this working with dynspread.


@arr_operator
def saturate_arr(src, dst):
    return src + dst
@jbednar (Member, Author):

Seems like for arrays we also need max and min operators. Maybe source, over, or saturate could be most appropriately interpreted as one or both of those? If not, maybe need to add separate max and min operators, presumably both for images and for arrays.
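E.g. something like this (hypothetical names, assuming the same arr_operator decorator used for saturate_arr above):

from datashader.composite import arr_operator

@arr_operator
def max_arr(src, dst):
    return max(src, dst)

@arr_operator
def min_arr(src, dst):
    return min(src, dst)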

@jlstevens (Collaborator):

This is leading me to believe that maybe we should do some validation of the operators used. We can default to add, which makes sense for both RGB and aggregate arrays, and then complain if an unsuitable operator is chosen for the supplied data type.

Then we could offer min and max operators for use with aggregate arrays only instead of trying to map these two concepts back to the set of operators that make sense with RGB.

@jbednar (Member, Author):

Is this resolved, then?

@jlstevens (Collaborator):

I think the idea of 'saturate' being src+dst might be a bit iffy, but the 'min' and 'max' operators are now offered and I think those semantics are clear at least. I suppose the issue is whether we want to support the same operator names as for RGB? Having to support some operators for RGB only and a different set for arrays would be a pain, so this kind of behavior isn't totally unreasonable imho as long as it is described in the docs.

@jbednar (Member, Author):

If we leave saturate and over defined for arrays, then we'd presumably not mention them in the docs, but I'm inclined to delete them if they don't make sense for arrays. That would mean changing the default how to None, with code selecting add if it's an array and over if it's an Image, I guess?

@jlstevens (Collaborator) commented Aug 18, 2020:

I don't think how=None is particularly clear as there is a default 'how' which (in my opinion) ought to be explicit. Tricky! Maybe we could do the following?

  1. As we want to encourage array spreading instead of the questionable RGB spreading, I think the default should be driven by that choice. So the default should be how='add' but there is already an image operator named 'add'...
  2. ... which means I could rename the array add operator to 'sum'? I think 'source' is the only operator that really makes sense for both images and array cases.
  3. So the default could be how='sum' in which case any image inputs could warn the user that they are switching to how='over' for backwards compatibility? Alternatively we could let it error which means people would have to specify how='over' to emulate the old default...
  4. Given a default that works for both images and arrays, I think we should then do some validation. Images can then only use source, over, saturate and add, while arrays can only use source, sum, min and max.

Would this make sense?

@jbednar (Member, Author):

I think that will be confusing, as there is no semantic difference between add and sum. Warning that we'll switch to over for backwards compatibility seems like a pain for users; there's nothing wrong with their current code, and they should expect it to keep working with no warnings, unless we wished to warn about using RGB spreading at all (which even if we did wouldn't be for another 6 months at the earliest, given how long RGB spreading has been the only spreading available!). I do think we should do validation, but I find the issues with changing the default to None much smaller than the issues with these other alternatives.
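A sketch of the kind of dispatch being proposed here (the helper and variable names are placeholders, not the PR's code):

_IMAGE_OPS = {'source', 'over', 'add', 'saturate'}
_ARRAY_OPS = {'source', 'add', 'min', 'max'}

def _resolve_how(how, is_image):
    if how is None:                                   # default depends on the input type
        return 'over' if is_image else 'add'
    allowed = _IMAGE_OPS if is_image else _ARRAY_OPS
    if how not in allowed:
        raise ValueError("how=%r is not valid for this input type" % how)
    return how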

@jlstevens (Collaborator):

Ok... we can do that, but we will need to state that the default 'how' is 'over' for images and 'add' for arrays, both in the docstrings and the online docs.

@jbednar (Member, Author):

Sure, but online docs might not even mention any details about RGB spreading, so it might not come up there at all...

@jlstevens (Collaborator)

@jbednar @philippjfr I have pushed an update to how density is computed for dynspread to allow it to support float and int aggregates.

I don't think the results are quite right yet, but I think that is down to fixes needed in spread and not due to this last set of updates to dynspread. I'll investigate shortly, but for now this should be enough to allow Philipp to check that these dynspread updates will work with the new HoloViews release.
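For reference, one possible density measure for dynspread (a rough sketch only, not necessarily what _density() does in this branch) is the fraction of non-empty pixels that have at least one non-empty immediate neighbour:

import numpy as np

def density(arr, empty=0):
    mask = np.asarray(arr) != empty
    if not mask.any():
        return 0.0
    neighbours = np.zeros_like(mask)
    neighbours[1:, :] |= mask[:-1, :]    # non-empty neighbour above
    neighbours[:-1, :] |= mask[1:, :]    # non-empty neighbour below
    neighbours[:, 1:] |= mask[:, :-1]    # non-empty neighbour to the left
    neighbours[:, :-1] |= mask[:, 1:]    # non-empty neighbour to the right
    return (mask & neighbours).sum() / mask.sum()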

@philippjfr (Member)

Could you rebase the PR or do you mind if I push up my rebase? It's not usable with HoloViews in its current state.

@philippjfr (Member)

@jbednar asked me to do it, so make sure to git reset origin/arr_spreading --hard before continuing.

@jlstevens (Collaborator) commented Jun 30, 2020

I just noticed that this PR doesn't have images to show the effect of the new aggregate based spreading. Here is the result of spread (input on the left and spread output on the right) where agg is an integer aggregate array:

int_spread = tf.spread(agg, name="spread 1px")
hv.Image(agg).opts(cmap='Blues', clim=(0, 20)) + hv.Image(int_spread).opts(cmap='Blues', clim=(0, 20))

And here is the result of spread (input on the left and spread output on the right) where aggf is a float aggregate array:

float_spread = tf.spread(aggf, name="spread 1px")
hv.Image(aggf).opts(cmap='Blues', clim=(0, 20)) + hv.Image(float_spread).opts(cmap='Blues', clim=(0, 20))

Being able to apply spread on aggregate arrays will allow colormapping with Bokeh which isn't possible when working in RGB space.
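As an illustrative aside (not from the PR itself): because spreading now happens before colormapping, the spread aggregate above can be handed to Bokeh via HoloViews with a colorbar attached, e.g.:

import holoviews as hv
hv.extension('bokeh')

hv.Image(int_spread).opts(cmap='Blues', colorbar=True, clim=(0, 20))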

@jlstevens (Collaborator)

Although numba stencils looked like an ideal fit for implementing the spread operation, the stencil-based spreading implementation earlier in this PR, while correct, had major performance issues (~10x slower than using manual loops).

I have now gone back to explicit loops, defining a separate float and integer kernel which restores the performance. Now, spreading on uint32 is as fast as the original RGB spreading implementation (very slightly faster in fact) and the operation remains fast for the newly supported dtypes.

This PR will be ready once I've added some tests and I'll file an issue on the numba repo to report the performance issue I've encountered while using stencils.
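For anyone curious what the explicit loops look like, this is the general shape of such a kernel (a simplified sketch with an additive combine step and a square footprint, not the PR's actual code):

import numpy as np
from numba import njit

@njit
def _spread_add(arr, px):
    h, w = arr.shape
    out = np.zeros_like(arr)
    for i in range(h):
        for j in range(w):
            v = arr[i, j]
            if v == 0:
                continue                              # only spread non-empty pixels
            for di in range(-px, px + 1):
                for dj in range(-px, px + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        out[ii, jj] += v
    return out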

@jlstevens (Collaborator)

Looks like I'll have to rebase this PR once #948 is ready to run the tests properly.

@jlstevens (Collaborator)

@jbednar Ready for review/merge.

@jlstevens (Collaborator)

@jbednar Here is an example of categorical spreading from the census example (showing Chinatown):

px=0: [image]

px=1: [image]

px=2: [image]

@jbednar (Member, Author) left a review comment:

Looks great, thanks!


@jbednar jbednar merged commit 6ac8588 into master Aug 19, 2020
@jbednar jbednar changed the title WIP: Spread support for aggregate arrays Spread support for aggregate arrays Aug 19, 2020
@jbednar jbednar deleted the arr_spreading branch August 19, 2020 21:08
@jbednar jbednar restored the arr_spreading branch August 19, 2020 21:08
@jbednar (Member, Author) commented Aug 19, 2020

Thanks, @jlstevens, for hitting this home!
