
Antialiased lines #1048

Merged
merged 17 commits into holoviz:master on Apr 6, 2022
Conversation

ianthomas23
Member

This is a work in progress to add antialiased lines to datashader. It is making good progress, but I need help/advice on how to proceed with the numba/toolz decorators and the nested functions that create and return other functions.

There are 2 commits. The first adds the low-level antialiasing code but only works in certain situations using pandas dataframes for the input data. This is fast. The second commit restructures the code in a better way and supports dask dataframes, but the performance is appalling.

There is a gist (https://gist.github.com/ianthomas23/898d0d54e35abace6f9be3efcef4a8c5) that can be used to show the performance difference. You will need to use this branch of course, and change lines 216-7 of datashader/core.py from

    def line(self, source, x=None, y=None, agg=None, axis=0, geometry=None,
             linewidth=0, antialias=False):

to

    def line(self, source, x=None, y=None, agg=None, axis=0, geometry=None,
             linewidth=1, antialias=False):

so that you get antialiased lines by default without having to modify holoviz code.

The key difference between the master branch and this one is that the antialiasing has to occur on floating-point aggs, even if you are using an any or count reduction, which are usually bool and uint32. Drawing antialiased lines now uses a two-stage agg process. The first stage is run once for each line or group of lines and uses a floating-point max to deal with the antialiasing correctly; these are then combined in a reduction-specific manner to give the returned floating-point agg.
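
The two-stage process can be sketched as follows. This is a hypothetical illustration, not datashader's actual code; the function names and the NaN-based bookkeeping are assumptions made for clarity.

```python
import numpy as np

# Stage 1 (illustrative): within a single line, fractional pixel
# coverage is accumulated with a floating-point max, so a line that
# touches the same pixel twice is not double-counted.
def rasterize_line_coverage(shape, coverage_updates):
    agg = np.full(shape, np.nan)
    for (y, x), value in coverage_updates:
        if np.isnan(agg[y, x]) or value > agg[y, x]:
            agg[y, x] = value
    return agg

# Stage 2 (illustrative): the per-line aggs are combined in a
# reduction-specific manner; for a count reduction this is a
# NaN-aware sum across lines.
def combine_count(per_line_aggs):
    return np.nansum(np.stack(per_line_aggs), axis=0)

line1 = rasterize_line_coverage((3, 3), [((0, 0), 0.5), ((0, 1), 1.0)])
line2 = rasterize_line_coverage((3, 3), [((0, 0), 0.25)])
combined = combine_count([line1, line2])
```

Note the result is floating-point even though a plain count would be integer, which is exactly why the Reduction dtype has to change when antialiasing is on.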

The key architectural change is therefore that the dtype of a Reduction is no longer fixed but is determined at runtime based on whether antialiasing is turned on or not. In the first commit this is botched to get it working, but it has to be done correctly in the second commit so that, for example, the correct dask combine functions are used. The dtype is now an attribute of each Reduction object rather than being static. The ramification of this is that I have had to turn off some of the cunning/magic decorators on the nested functions-that-return-functions, which loses the numba performance benefits.
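
A minimal sketch of the dtype becoming a per-instance, runtime-determined attribute rather than a static class-level constant (the class name and constructor are illustrative, not datashader's API):

```python
import numpy as np

# Illustrative only: the same reduction yields uint32 normally and a
# floating-point agg when antialiasing is enabled.
class CountReduction:
    def __init__(self, antialias=False):
        self.dtype = np.float64 if antialias else np.uint32

normal = CountReduction()
aa = CountReduction(antialias=True)
```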

I suspect that the first place to look is the one-line change in compiler.py, commenting out the @memoize:

    #@memoize
    def compile_components(agg, schema, glyph, cuda=False):

My next inclination is to add an antialiased flag as an argument here and pass it down through the hierarchy of functions-that-generate-functions. But it would be good to get the opinion of someone who is more familiar with these decorators.
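
The caching problem can be illustrated with functools.lru_cache standing in for toolz.memoize (all names here are illustrative): if the antialias setting is not part of the cache key, the first compiled variant is silently reused; making it an explicit argument gives each variant its own cache entry.

```python
from functools import lru_cache

compilations = []

@lru_cache(maxsize=None)
def compile_components(agg_name, antialias):
    # stands in for an expensive numba compilation
    compilations.append((agg_name, antialias))
    return f"{agg_name}-{'aa' if antialias else 'plain'}"

compile_components("count", False)   # compiles the plain variant
compile_components("count", True)    # new cache key -> new "compile"
compile_components("count", True)    # cache hit, no recompilation
```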

@stuartarchibald
Contributor

@ianthomas23 just following up on the request for help xref: https://numba.discourse.group/t/numba-and-datashader/1221

So as to be able to help with this, please could you explain in more detail what the problem is with regards to the following (as I assume this is the issue?):

I have had to turn off some of the cunning/magic decorators of the nested functions-that-return-functions which loses the numba performance benefit

Some guiding questions:

  • Is the "magic" decorator involved toolz.memoize?
  • What is the issue with the decorator interacting with Numba?
    • Failure to compile?
    • Poor performance?
    • Something strange happening!?
  • Can the above issue be recreated as a minimal working reproducer just with e.g. toolz and numba (so as not to involve the holoviz stack)?

Many thanks!

@ianthomas23
Member Author

Thanks @stuartarchibald.

* Is the "magic" decorator involved `toolz.memoize`?

Yes. I think that this might be the key:

https://github.com/ianthomas23/datashader/blob/def6154fc1321fc78c4f5862746b755aab786167/datashader/compiler.py#L17

If I leave the memoize in, I can draw a normal line (original code) as quickly as before, but if I then try to draw an antialiased line (my new code) I end up using the memoized function, so the line isn't antialiased. Hence I disabled the memoize and it draws fine but is understandably slow. The code I refer to as magic begins with the make_create calls that return functions depending on runtime values. This creation and returning of functions is nested, and I believe it may be set up this way to help numba make the right decisions. I have heard the opinion that it works this way because it was required a few years ago, but that numba is much cleverer now and it may no longer need to be like this.
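
The functions-that-return-functions pattern looks roughly like this (names are illustrative; a no-op stand-in replaces numba.njit so the sketch runs without numba). The factory bakes the choice into the returned function, so the real numba.njit can compile one specialized version per choice instead of branching at runtime in the hot loop.

```python
def njit(func):          # stand-in for numba.njit
    return func

def make_append(use_float_max):
    if use_float_max:
        @njit
        def append(agg, i, value):
            agg[i] = max(agg[i], value)   # float max for antialiasing
    else:
        @njit
        def append(agg, i, value):
            agg[i] += 1                   # plain count
    return append

agg = [0.0, 0.0]
append_aa = make_append(True)
append_aa(agg, 0, 0.5)
append_aa(agg, 0, 0.3)    # max keeps 0.5
make_append(False)(agg, 1, None)
```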

* What is the issue with the decorator interacting with Numba?
  
  * Failure to compile?
  * Poor performance?
  * Something strange happening!?

Poor performance. (I can get failure to compile but only when trying to use cupy/cudf so I have shelved that for the moment).

* Can the above issue be recreated as a minimal working reproducer just with e.g. `toolz` and `numba` (so as not to involve the `holoviz` stack)?

I suspect not. I don't think there are actually any bugs/problems with numba or toolz here, it is just the combination of my usage and datashader's. If I could start from scratch I think all would be fine, but I am constrained to work within datashader.

Essentially I have done the bit I am good at (the antialiasing) and now I am inefficiently struggling with the datashader codebase, and I was hoping that someone with some complementary skills to mine could give me some guidance on what direction to go in. I know that is a big ask, but I have to ask anyway.

@luk-f-a

luk-f-a commented Mar 4, 2022

One thing you should check, which probably would not come up in a stand-alone minimal reproducer, is whether the functions are being re-compiled several times; that would lead to poor performance.

@jbednar
Member

jbednar commented Mar 10, 2022

It looks like the underlying issue is that making the dtype an attribute is interfering with how Numba compiles type-specialized code. We could try to debug that whole process, but to me it seems simpler to make Numba happy by substituting the given reduction with one that has the correct type. E.g. if a user supplied rd.any but also asks for antialiasing, you could update the reduction to rd.any_f32 (or equivalent) and leave all the other low-level code paths the same. That way Numba would always be dealing with a single, fixed type by the time it matters for performance.
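
A hypothetical sketch of this substitution idea (count_f32, resolve_reduction and the substitution table are illustrative names, not datashader's API): swap the user's reduction for a float-typed equivalent up front, so every lower-level code path still sees one fixed dtype.

```python
import numpy as np

class count:
    dtype = np.uint32

class count_f32:
    dtype = np.float32

# mapping applied once, before any numba-compiled code is built
ANTIALIAS_SUBSTITUTES = {count: count_f32}

def resolve_reduction(reduction_cls, antialias):
    if antialias:
        return ANTIALIAS_SUBSTITUTES.get(reduction_cls, reduction_cls)
    return reduction_cls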

@ianthomas23
Member Author

There is significant progress here and the performance problems are resolved. The approach, following that suggested by @jbednar, was to change the code at the lower levels that uses numba/toolz from object-oriented calls outside of those functions to a more C-like approach passing in everything as function arguments. This presumably allows numba/toolz to compile a fast version of each function based on those arguments.

This works for both pandas and dask inputs to Canvas.line(), and the performance is good enough for interactive holoviews-based visualisation.
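
The before/after contrast of that C-like restructuring can be sketched as follows (function names are hypothetical, not the actual datashader functions): attribute lookups on a Python object inside the hot loop are opaque to numba, whereas plain scalar arguments let it specialize and inline.

```python
from types import SimpleNamespace

def extend_line_oo(agg, xs, line):
    for x in xs:
        agg[x] += line.linewidth      # attribute lookup per iteration

def extend_line_c_style(agg, xs, linewidth):
    for x in xs:
        agg[x] += linewidth           # scalar argument, JIT-friendly

agg1, agg2 = [0.0] * 4, [0.0] * 4
extend_line_oo(agg1, [0, 1, 1], SimpleNamespace(linewidth=2.0))
extend_line_c_style(agg2, [0, 1, 1], 2.0)
```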

@ianthomas23
Member Author

I have modified the existing antialiased line tests in line with the new functionality. You can see that the output is much improved by looking at the changes to the PNG files in the datashader/tests/data directory.

I have also added a warning when antialiased lines are attempted using a CUDA-backed data source, e.g. cudf, dask-cudf. This is not yet supported; it could be, but that is beyond the scope of this PR.
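
A hypothetical sketch of such a guard (the function name, message, and fallback behaviour are illustrative, not datashader's actual code):

```python
import warnings

def check_antialias_support(is_cuda_backed, linewidth):
    # Illustrative: warn and fall back to non-antialiased rendering
    # when a CUDA-backed source requests antialiased lines.
    if is_cuda_backed and linewidth > 0:
        warnings.warn("Antialiased lines are not supported for "
                      "CUDA-backed data sources; falling back to "
                      "non-antialiased rendering.")
        return 0
    return linewidth
```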

@ianthomas23 changed the title from "WIP: Antialiased lines" to "Antialiased lines" on Mar 18, 2022
@philippjfr
Member

For illustration here's a video of this in action recorded by @ianthomas23:

[video: Screencast.2022-03-16.10_25_57.mp4]

@jbednar
Member

jbednar commented Apr 1, 2022

Overall, the GH image diffs look great; nice improvements! I'm not sure this one is an improvement, though:
[image: test image diff]
If I swipe back and forth between them it looks like the antialiasing might be slightly biased upwards and to the right, by half a pixel or so. (Or maybe the old version was biased in the other direction?)

@jbednar
Member

jbednar commented Apr 1, 2022

Apart from that, I'm happy to merge!

@ianthomas23
Member Author

ianthomas23 commented Apr 4, 2022

I've fixed the transform when not snapping to pixels; here is the test image change now:
[image: test7 diff]

It turned out not to be half a pixel out, but out by a varying fraction of a pixel: zero on the left and bottom, up to a full pixel on the right and top.

The transform from data to canvas coordinates in the x-direction uses a linear scale sx and translation tx. For the default snapped coordinates it is

    xx = int(x_mapper(x) * sx + tx)

whereas for the antialiased non-snapped coordinates it is

    xx = x_mapper(x)*sx*(xmax - xmin - 1)/(xmax - xmin) + tx

So I am taking the snapped transform and modifying it to use in the non-snapped situation. It feels like it would be better for the non-snapped transform to be the fundamental one and in the snapped situation just round to the nearest pixel, but this is too dangerous a change to consider just before a release.

Evidently I have messed up the rebase, I will fix this.

@ianthomas23
Member Author

ianthomas23 commented Apr 5, 2022

This needs more discussion. The transform from data to canvas coordinates doesn't use xmin and xmax (the data limits) but instead uses the canvas width:

    xx = (x_mapper(x)*sx + tx)*(width - 1)/width

This information is not available at the point in the code where it is needed. We have the following options:

  1. Use the non-snapped coordinate transform throughout, and change the default snapped calculations to round the non-snapped coordinates. This actually simplifies the code, as we can remove the explicit checks for points just off the top and right of the canvas and hence no longer need to pass the data limits (xmin, xmax, ymin, ymax) to the various map_onto_pixel functions. But it gives different rounding, so a number of tests fail and the results will have to be checked carefully. For this reason it is a breaking change for downstream libraries and probably should not be done in a hurry.
  2. Add the canvas size to every use of map_onto_pixel (15 places) plus all the functions like draw_segment that call them, through perform_extend_line etc, build_extend_line etc. There are lots of these calls so this change is quite invasive.
  3. Make the requirement for not snapping and the canvas size available on the Axis objects, so that they can return a different transform when snapping and when not snapping. As the Axis objects are reused by the canvas, we'd have to set and unset the no-snap flag before and after calling the antialiased line functions.
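
Option 1 can be sketched in a few lines (names and values are illustrative): make the non-snapped transform fundamental and derive snapped coordinates by rounding it, so one formula serves both cases.

```python
def map_x(x_scaled, sx, tx, width, snap):
    # non-snapped transform quoted above
    xx = (x_scaled * sx + tx) * (width - 1) / width
    return int(round(xx)) if snap else xx

# a point at the right-hand data limit now lands on the last pixel,
# removing the need for an explicit off-canvas check
xx = map_x(1.0, 10.0, 0.0, width=10, snap=True)
```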

@ianthomas23
Member Author

  4. Rather than using scale and translate variables (sx, tx, sy, ty) for the transform, define a new class for it containing the canvas size, with separate "apply transform" functions for the snapped and non-snapped cases.

@ianthomas23
Member Author

This is now ready to be merged.

I was wrong about the transform from data to pixel coordinates for unsnapped antialiased lines and @jbednar was right. It just needed a half-pixel offset.

My error was in attempting to replicate the results for the simple test case of horizontal and vertical lines at the canvas boundaries. In the (unsnapped) antialiased case the lines sit astride the boundaries, so only half of the linewidth is within the canvas. In the snapped non-antialiased case the lines are snapped to the nearest pixels, which are the perimeter pixels of the canvas. The snapping is asymmetric in that lines at the lower bounds are snapped in the positive x or y direction whereas lines at the upper bounds are snapped in the negative x or y direction. The rectangle formed from the centres of the 4 lines is therefore one pixel smaller in both directions in the non-antialiased case than in the antialiased one. If you tile adjacent canvases then the non-antialiased case renders the boundary lines twice, once per tile, whereas the antialiased case renders more correctly in that half of each boundary line is rendered in each tile.
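
The asymmetric snapping can be sketched numerically using the snapped transform quoted earlier in the thread; the clamp here stands in for datashader's handling of points just off the canvas, and the values are illustrative.

```python
def snap_x(x, sx, tx):
    return int(x * sx + tx)

width, sx, tx = 30, 30.0, 0.0     # data range [0, 1] -> 30-pixel canvas
xx_low = snap_x(0.0, sx, tx)                    # lower bound -> pixel 0
xx_high = min(snap_x(1.0, sx, tx), width - 1)   # 30 clamped -> pixel 29
# lower-bound lines snap in the positive direction, upper-bound lines
# in the negative direction, so the rectangle of snapped line centres
# is one pixel smaller than the unsnapped (antialiased) one
```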

Although this now feels really obvious, it evidently wasn't originally, so I will post some example images here to help with understanding in the future. The test code draws lines around the perimeter of the canvas, both diagonals, and another line half-way up the canvas:

import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf

s = 1.23  # start
e = s + 4.92  # end
m = 0.5*(s + e)  # midway

df = pd.DataFrame(
    dict(x=[s, e, e, s, s, e, np.nan, s, e, np.nan, s, e],
         y=[s, s, e, e, s, e, np.nan, e, s, np.nan, m, m]),
    dtype=np.float64,
)

for w in (30, 29):
    cvs = ds.Canvas(plot_width=w, plot_height=w)
    for i, linewidth in enumerate([0, 1, 2]):
        agg = cvs.line(source=df, x="x", y="y", agg=ds.count(), linewidth=linewidth)
        im = tf.shade(agg, how='linear', span=(0, 1), cmap=["white", "darkblue"])
        ds.utils.export_image(im, f"antialias_simple_{w}_{i}", background="white")

For a canvas size of an even number of pixels (30 here), the magnified results for non-antialiased, linewidth=1 and linewidth=2 are:
[images: antialias_simple_30_0, antialias_simple_30_1, antialias_simple_30_2]
Note that the full linewidth is within the canvas if non-antialiased but only half of it if antialiased, and that the half-way up line is snapped in the non-antialiased case.

For a canvas size of an odd number of pixels (29 here):
[images: antialias_simple_29_0, antialias_simple_29_1, antialias_simple_29_2]

@jbednar
Member
Fabulous! The examples are convincing, and for each of the test cases that differ, I believe the new behavior is significantly better. Ready to merge when the tests pass after applying these minor changes.

Review comments (all resolved) on datashader/reductions.py, datashader/core.py, and datashader/glyphs/line.py.
@jbednar jbednar merged commit 21973e4 into holoviz:master Apr 6, 2022
@ianthomas23 ianthomas23 deleted the line_antialiasing2 branch July 19, 2023 09:54