New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spread support for aggregate arrays #771
Conversation
See examples at https://anaconda.org/jbednar/2_pipeline-aggspread . |
@jlstevens , please also consider more fundamental changes instead or in addition; e.g. can we take a different approach to making isolated points visible that has better properties than spreading (e.g. would a simple convolution with a box filter work?). |
Also consider:
|
This would be really useful, it would potentially also make holoviz/holoviews#2102 less of an issue for people who need a colorbar together with spread |
It is hard for me to know what the best tradeoffs are, but what I can do is list the pros and cons of different approaches. I'll list the following proposals: convolution (normalized) and convolution (not normalized): Convolution (normalized)Pros
Cons
Convolution (not normalized)Pros
Cons
Thinking about the two cases above, the best choice (in my opinion!) is to offer spread as a convolution with 1) a spatial box kernel by default that can be overridden with any custom kernel 2) a boolean to normalize the kernel that is off by default. By a spatial box kernel, I mean a kernel that copies the value of the central pixel to surrounding pixels in a spatially uniform way (e.g as an approximation to a circle or as a square). One operation could then offer the user with the available tradeoffs. The default aims to keep hover information correct in value (but available over a larger space), making things easier to see. Enabling normalization is then optional and allows the user to preserve the overall weight in the input if they wish. Using custom kernels (e.g gaussians) with normalization, the user can then use this operation to do mathematically correct image blurring (for instance). No matter how this operation is used, the user needs to understand how it is distorting the original data for the sake of visual clarity. The suggested defaults should result in a purely spatial distortion (i.e giving tiny points a spatial extent) which I think is probably what most people want in most cases. |
Numba is now enabled for spreading on aggregates, namely
I've been thinking about this and I don't think the spreading operators (e.g My next step is to get this working with |
datashader/composite.py
Outdated
|
||
@arr_operator | ||
def saturate_arr(src, dst): | ||
return src + dst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like for arrays we also need max and min operators. Maybe source, over, or saturate could be most appropriately interpreted as one or both of those? If not, maybe need to add separate max and min operators, presumably both for images and for arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is leading me to believe that maybe we do some validation about the operators used. We can default to add
which makes sense for RGB and aggregate arrays and then complain if an unsuitable operator is chosen for the supplied data type.
Then we could offer min
and max
operators for use with aggregate arrays only instead of trying to map these two concepts back to the set of operators that make sense with RGB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this resolved, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea of 'saturate' being src+dst
might be a bit iffy but the 'min' and 'max' operators are now offered and I think those semantics are clear at least. I suppose the issue is whether we want to support the same operator names as RGB? Having to support some operators for RGB only and a different set for arrays would be a pain so this kind of behavior isn't totally unreasonable imho as long as it is described in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave saturate
and over
defined for arrays, then we'd presumably not mention them in the docs, but I'm inclined to delete them if they don't make sense for arrays. That would mean changing the default how
to None, with code selecting add
if it's an array and over
if it's an Image, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think how=None
is particularly clear as there is a default 'how' which (in my opinion) ought to be explicit. Tricky! Maybe we could do the following?
- As we want to encourage array spreading instead of the questionable RGB spreading, I think the default should be driven by that choice. So the default should be
how='add'
but there is already an image operator named 'add'... - ... which means I could rename the array
add
operator to'sum'
? I think 'source' is the only operator that really makes sense for both images and array cases. - So the default could be
how='sum'
in which case any image inputs could warn the user that they are switching tohow='over'
for backwards compatibility? Alternatively we could let it error which means people would have to specifyhow='over'
to emulate the old default... - Given a default that works for both images and arrays, I think we should then do some validation. Images can then only use
source
,over
,saturate
andadd
which arrays can only usesource
,sum
,min
andmax
Would this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that will be confusing, as there is no semantic difference between add
and sum
. Warning that we'll switch over to over
for backwards compatibility seems like a pain for users; there's nothing wrong with their current code, and they should expect it to keep working with no warnings unless we wished to warn about using RGB spreading at all (which even if we did wouldn't be for another 6 months at the earliest given how long RGB spreading has been the only spreading available!). I do think we should do validation, but I find the issues with changing the default to None much less than the issues with these other alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok...we can do that but we will need to state that the default 'how' is 'over' and 'add' for images and arrays respectively both in the docstrings and the online docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but online docs might not even mention any details about RGB spreading, so it might not come up there at all...
@jbednar @philippjfr I have pushed an update to how density is computed for dynspread to allow it to support float and int aggregates. I don't think the results aren't quite right yet but I think that is down to fixes needed in |
Could you rebase the PR or do you mind if I push up my rebase? It's not usable with HoloViews in its current state. |
aa1f884
to
5e22544
Compare
@jbednar asked me to do it, so make sure to |
I just noticed that this PR doesn't have images to show the effect of the new aggregate based spreading. Here is the result of spread (input on the left and spread output on the right) where int_spread = tf.spread(agg, name="spread 1px")
hv.Image(agg).opts(cmap='Blues', clim=(0, 20)) + hv.Image(int_spread).opts(cmap='Blues', clim=(0, 20)) And here is the result of spread (input on the left and spread output on the right) where float_spread = tf.spread(aggf, name="spread 1px")
hv.Image(agg).opts(cmap='Blues', clim=(0, 20)) + hv.Image(float_spread).opts(cmap='Blues', clim=(0, 20)) Being able to apply spread on aggregate arrays will allow colormapping with Bokeh which isn't possible when working in RGB space. |
Although numba stencils looked like an ideal fit for implementing the spread operation, the spreading implementation earlier in this PR which did work correctly had major performance issues (~10x slower than using manual loops). I have now gone back to explicit loops, defining a separate float and integer kernel which restores the performance. Now, spreading on uint32 is as fast as the original RGB spreading implementation (very slightly faster in fact) and the operation remains fast for the newly supported dtypes. This PR will be ready once I've added some tests and I'll file an issue on the numba repo to report the performance issue I've encountered while using stencils. |
cfd404e
to
92b230c
Compare
Looks like I'll have to rebase this PR once #948 is ready to run the tests properly. |
@jbednar Ready for review/merge. |
@jbednar Here is an example of categorical spreading from the census example (showing Chinatown): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
datashader/composite.py
Outdated
|
||
@arr_operator | ||
def saturate_arr(src, dst): | ||
return src + dst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this resolved, then?
Co-authored-by: James A. Bednar <jbednar@users.noreply.github.com>
Co-authored-by: James A. Bednar <jbednar@users.noreply.github.com>
Co-authored-by: James A. Bednar <jbednar@users.noreply.github.com>
Thanks, @jlstevens, for hitting this home! |
Previously, spreading has been supported only for Image objects, working directly on RGBA pixels. As a result, it has not been available for use with the HoloViews rasterizing operation, which returns aggregate arrays rather than Images.
As proposed in #326, this PR adds a set of array spreading operators and allows spread to accept either an image or an aggregate array.
Remaining issues:
aggc
in 2_Pipeline.ipynb)_density()
in transfer_functions.pycomposite.arr_operator(f)
doesn't currently have the type-specific Numba compilation code fromcomposite.operator(f)
, and it will need to handle whatever datatypes are supported for aggregations (i32, i64, f32, f64? Not sure.). Spreading is remarkably slow without Numba!spread()
operation in HoloViews to allow it to be used withrasterize()
output (hv.Image) instead of justspread()
output (hv.RGB).