Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance regression for ScatterPlot #1492

Open
danielhrisca opened this issue Jan 7, 2021 · 58 comments
Open

performance regression for ScatterPlot #1492

danielhrisca opened this issue Jan 7, 2021 · 58 comments
Labels
high-priority performance Problems or questions related to speed, efficiency, etc.

Comments

@danielhrisca
Copy link
Contributor

Short description

As suggested in #1359 I open this issue.
In this custom plot widget each curve is added to a dedicated ViewBox, and all the ViewBoxes are X-linked. I usually check the performance of the ScatterPlot (with dots) with ~1000-2000 curves.

In the profile data shared in #1359 I've used ~1700 curves.

There is a big drop in performance when comparing 0.11.1 with 0.11.0

Code to reproduce

import pyqtgraph as pg
import numpy as np

Tested environment(s)

  • PyQtGraph version: 0.11.0 and 0.11.1
  • Qt Python binding: 5.15.2
  • Python version: 3.7.7 x64
  • NumPy version: 1.19.4
  • Operating system: Windows 10 x64
  • Installation method: pip

Additional context

@j9ac9k
Copy link
Member

j9ac9k commented Jan 9, 2021

Hi @danielhrisca Thanks for reporting. We made significant changes in #1420 how ScatterPlotItem, and @lidstrom83 was able to benchmark some significant performance improvements; but we are definitely curious to know if we slowed down performance.

Can you provide some dummy data or a script that we could use with your custom plot widget that we could replicate the drop in performance?

@danielhrisca
Copy link
Contributor Author

danielhrisca commented Jan 9, 2021

You will need to install the development branch of my projects first (it remove a this monkey patched updateSpots method)

pip install git+https://github.com/danielhrisca/asammdf/@development[gui]

and then you can use this script

from asammdf.gui.plot import plot
from asammdf import Signal
import numpy as np

v = np.arange(10000)
signals = [
    Signal(v, v, name=f'S_{i}')
    for i in range(400)
]

plot(signals)

At first the curves will not use ScatterPlot so you need to enable that first and re-launch the script (the setting is saved using QSettings)

dots

@ixjlyons ixjlyons added the performance Problems or questions related to speed, efficiency, etc. label Feb 6, 2021
@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

Assigned "high-priority" label as this is impacted from a recent change; so I'd like to try and zero-in on where things went wrong exactly, as #1420 was supported to have significant benefits.

@danielhrisca which Qt bindings are you using? @lidstrom83 noticed that there was some very different performance behavior for PyQt5/Pyside2 and attempted to compensate for that by the use of _USE_QRECT

@danielhrisca
Copy link
Contributor Author

I'm using pyqt5.15 and pyqt5.13

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

I'm using pyqt5.15 and pyqt5.13

There is a lengthy blurb that may be relevant in your use-case:

# When pxMode=True for ScatterPlotItem, QPainter.drawPixmap is used for drawing, which
# has multiple type signatures. One takes int coordinates of source and target
# rectangles, and another takes QRectF objects. The latter approach has the overhead of
# updating these objects, which can be almost as much as drawing.
# For PyQt5, drawPixmap is significantly faster with QRectF coordinates for some
# reason, offsetting this overhead. For PySide2 this is not the case, and the QRectF
# maintenance overhead is an unnecessary burden. If this performance issue is solved
# by PyQt5, the QRectF coordinate approach can be removed by simply deleting all of the
# "if _USE_QRECT" code blocks in ScatterPlotItem. Ideally, drawPixmap would accept the
# numpy arrays of coordinates directly, which would improve performance significantly,
# as the separate calls to this method are the current bottleneck.
# See: https://bugreports.qt.io/browse/PYSIDE-163

_USE_QRECT = QT_LIB not in ['PySide2', 'PySide6']

I'm not sure what your availability here is to experiment; but I would consider starting off with _USE_QRECT = False and see if that changes your behavior.

The other thing we noticed while developing this application was that the re-creation of QPen and QBrush objects was remarkably expensive, and every attempt should be made to re-use existing objects instead of recreating new ones.

@danielhrisca
Copy link
Contributor Author

_USE_QRECT = False

image

_USE_QRECT = True

image

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

Ok; so you do get significant speedups to using _USE_QRECT = False so perhaps our assumption to have that set to False when using PyQt5 bindings was not ideal; ...I'll have to re-evaluate it.

How does that compare with pyqtgraph 0.11.0 performance?

@danielhrisca
Copy link
Contributor Author

pyqtgraph==0.11.0

image

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

clearly something else is up, because #1420 was supposed to be an across the board improvement; but if you can get back to almost where you were at with _USE_QRECT = False alone I may take the high priority tag off once I do more profiling on all our supported bindings...

Anyway, next step is to re-evaluate when its appropriate to set that variable...

@danielhrisca
Copy link
Contributor Author

danielhrisca commented Feb 8, 2021

I get ~2x performance improvement if I use this SymbolAtlas._keys implementation

    def _keys(self, styles):
        unique_styles = set(
            (symbol, size, id(pen), id(brush))
            for symbol, size, pen, brush in styles
        )
        if len(unique_styles) == 1:
            symbol, size, pen, brush = styles[0]
            return [
                (
                    symbol if isinstance(symbol, (str, int)) else f"{symbol.boundingRect()} + {symbol.elementCount()} elements",
                    size,
                    (pen.style(), pen.capStyle(), pen.joinStyle()),
                    (brush.color().rgba(), brush.style())
                )
            ] * len(styles)
        else:

            return [
                (
                    symbol if isinstance(symbol, (str, int)) else f"{symbol.boundingRect()} + {symbol.elementCount()} elements",
                    size,
                    (pen.style(), pen.capStyle(), pen.joinStyle()),
                    (brush.color().rgba(), brush.style())
                ) for symbol, size, pen, brush in styles
            ]

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

Wow, 2x is nothing to sneeze at!

If you don't mind me asking, for your use of the ScatterPlotItem symbols, were you providing a custom QPainterPath?

I need to read through that implementation a bit more, I know I just patched it up, but I'm keying on the boundingRect() properties, when I likely should key on just height/width... also I am not sure why this is a list and not a set to begin with...

Thanks for doing the experimentation here on this.

@danielhrisca
Copy link
Contributor Author

This is where the PlotDataItem is created:

https://github.com/danielhrisca/asammdf/blob/master/asammdf/gui/widgets/plot.py#L2417

I think another optimization that could be made is for the case where all the styles are identical. This would totally eliminate the set creation which takes some processing time

image

@danielhrisca
Copy link
Contributor Author

I think another optimization that could be made is for the case where all the styles are identical. This would totally eliminate the set creation which takes some processing time

Actually I think many optimizations can be made for the case where all the scatter dots have the same style

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

I think another optimization that could be made is for the case where all the styles are identical. This would totally eliminate the set creation which takes some processing time

Actually I think many optimizations can be made for the case where all the scatter dots have the same style

Yeah, that's what it's looking like to me as well; but I'm reluctant to make many changes until I can collaborate w/ lidstrom83 (not tagging him as I've already tagging him in a bunch of places and I know he's unavailable right now).

I need to spend some time reading over this code more to better understand the functionality; I do appreciate you testing/profiling these changes, certainly makes it a bit easier for me to evaluate.

also <built-in method color> 3.05 ms ...why so long to get the color of a pen?! ...but that's neither here nor there...

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

I also think it might be helpful to implement QPixmapCache https://doc.qt.io/qt-5/qpixmapcache.html#details but that's definitely for another PR/round of optimization.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 8, 2021

@danielhrisca if you want to hop onto our slack (or communicate off the issue tracker) to try and iterate through something that may work a bit faster let me know.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

So, I'm going to kick the can down the road a bit on the _USE_QRECT bit, as I'll need to test against a bunch of bindings... but going back to the SymbolAtlas bit...type hints would really be useful here...

Actually I think many optimizations can be made for the case where all the scatter dots have the same style

I think you're right ...

We have a SymbolAtlas, it looks like it is used in 5 places within ScatterPlotItem

  • ScatterPlotItem._maybeRebuildAtlas uses SymbolAtlas.__len__ (which returns the length of SymbolAtlas._coords dict)
  • ScatterPlotItem._maybeRebuildAtlas calls SymbolAtlas.rebuild if an if-statement is not met...
  • ScatterPlotItem.updateSpots uses SymbolAtlas.__getitem__
  • ScatterPlotItem._updateMaxSpotSizes uses SymbolAtlas.maxWidth
  • ScatterPlotItem.paint uses SymbolAtlas.pixmap

SymbolAtlas.__getitem__ expects styles to be a List[Tuple[Union[str, int, QPainterPath], int, QPen, QBrush] and it returns List[Tuple[float, float, float, float]] where each element there corresponds to (x, y, width, height) (ok, so type-hinting and named tuples would go a long way here...)

SymbolAtlas._keys is only called by SymbolAltas.__getitem__ and SymbolAtlas.rebuild. in the rebuild method, first thing it does is convert the list ._keys returns to a set(); __getitem__ constucts a dictionary key'd off the elements returned in _keys`... There is an optimization here for not building out an entire list; which many entries will likely be duplicates...

Edit: modifying def _keys(styles) to the following unfortunately did not work:

    @staticmethod
    def _keys(styles):
        return {
            (
                symbol if isinstance(symbol, (str, int)) else f"{id(symbol)}+{symbol.elementCount()}",
                size,
                (pen.style(), pen.capStyle(), pen.joinStyle()),
                (brush.color().rgba(), brush.style())
            ) : (symbol, size, pen, brush) for symbol, size, pen, brush in styles
        }

specifically w/ these lines in updateSpots

                dataSet['sourceRect'][mask] = self.fragmentAtlas[
                    list(zip(*self._style(['symbol', 'size', 'pen', 'brush'], data=dataSet, idx=mask)))
                ]

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

Going back to your suggested optimization, unfortunately I can't do it quite that way:

        unique_styles = set(
            (symbol, size, id(pen), id(brush))
            for symbol, size, pen, brush in styles
        )

since symbol can be a QPainterPath which is not hashable, but I can likely use id(symbol)

>>> path = QtGui.QPainterPath()
>>> test = (path, 5, 123, 345)
>>> set(test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'PySide2.QtGui.QPainterPath'
>>>

@danielhrisca
Copy link
Contributor Author

I think this is the reason that the lists are used in the code since most of the Q objects are not hashable (qpen, qbrush,...)

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

I think this is the reason that the lists are used in the code since most of the Q objects are not hashable (qpen, qbrush,...)

I'm going to try and do some benchmarking on the _USE_QRECT variable and see which bindings those are helpful for; and update that property accordingly; ... I do think there is a speedup to be had with collections.Counter and the (id(symbol), size, id(pen), id(brush)) to be had...

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

@danielhrisca can you try the following method for _keys ?

    @staticmethod
    def _keys(styles):
        hashable = {
            (id(symbol), size, id(pen), id(brush)):(symbol, size, pen, brush) for symbol, size, pen, brush in styles
        }

        persistent = {
            key:(
                symbol if isinstance(symbol, (str, int)) else f"id:{id(symbol)} pts:{symbol.elementCount}",
                size,
                (pen.style(), pen.capStyle(), pen.joinStyle()),
                (brush.color().rgba(), brush.style())
            ) for key, (symbol, size, pen, brush) in hashable.items()
        }
    
        returnedList = [
            persistent[(id(symbol), size, id(pen), id(brush))] for symbol, size, pen, brush in styles
        ]
        return returnedList

@danielhrisca
Copy link
Contributor Author

image

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

from 16.40 seconds to 18.40 seconds; that's not helpful....

@danielhrisca
Copy link
Contributor Author

from 16.40 seconds to 18.40 seconds; that's not helpful....

still better than the default

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

from 16.40 seconds to 18.40 seconds; that's not helpful....

still better than the default

Oh is that with _USE_QRECT at the default option?

@danielhrisca
Copy link
Contributor Author

_USE_QRECT _keys time
True default 42s
False default 27s
False staticmethod 18.4s
0.11.0 15.4s

I've left out my _keys implementation since it is not universally applicable

@danielhrisca
Copy link
Contributor Author

danielhrisca commented Feb 9, 2021

I realize that I was looking at the wrong function timing: the plot is running as long as the window is displayed, so the correct measurement would be to use the timing for the __init__ call

_USE_QRECT _keys time
True default 29.3s
False default 16.9s
False staticmethod 9.9s
0.11.0 4.95s

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

so _USE_QRECT while appears to be a drag on display, should help with zoom/panning (on PyQt5 based bindings), I'm not sure of an easy way to benchmark that though.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

I could try this as well since there are key bindings for zooming in and out and this makes it reproducible

That would be perfect!

I need to come up with some kind of modification to one of the examples (ideally) that collects timing information, that I can test against the various Qt bindings that tests both display/render and zoom/pan performance with pxMode on/off.

@dgoeries
Copy link
Contributor

dgoeries commented Feb 9, 2021

from pyqtgraph.Qt import QtGui, QtCore, QT_LIB
import numpy as np
import pyqtgraph as pg
from pyqtgraph.ptime import time

app = QtGui.QApplication([])

if QT_LIB == 'PySide':
    from ScatterPlotSpeedTestTemplate_pyside import Ui_Form
elif QT_LIB == 'PySide2':
    from ScatterPlotSpeedTestTemplate_pyside2 import Ui_Form
elif QT_LIB == 'PyQt5':
    from ScatterPlotSpeedTestTemplate_pyqt5 import Ui_Form
else:
    from ScatterPlotSpeedTestTemplate_pyqt import Ui_Form

win = QtGui.QWidget()
win.setWindowTitle('pyqtgraph example: ScatterPlotSpeedTest')
ui = Ui_Form()
ui.setupUi(win)
win.show()

points_brush = pg.mkBrush((51, 153, 255, 200))
pen = pg.mkPen(None)

p = ui.plot

x = np.arange(500)
y = np.random.random(size=(500))

ptr = 0
lastTime = time()
fps = None

curve = pg.ScatterPlotItem([], [], size=5, brush=points_brush,
                           pen=pen, pxMode=True)
p.addItem(curve)
p.setRange(xRange=[-500, 500], yRange=[-500, 500])


def update():
    global curve, ptr, p, lastTime, fps
    curve.setData(x, y)
    ptr += 1
    now = time()
    dt = now - lastTime
    lastTime = now
    if fps is None:
        fps = 1.0 / dt
    else:
        s = np.clip(dt * 3., 0, 1)
        fps = fps * (1 - s) + (1.0 / dt) * s
    p.setTitle('%0.2f fps' % fps)
    p.repaint()


timer = QtCore.QTimer()
timer.timeout.connect(update)
timer.start(0)

## Start Qt event loop unless running in interactive mode.
if __name__ == '__main__':
    import sys

    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()


@danielhrisca
Copy link
Contributor Author

@dgoeries
I get ~130FPS with _USE_QRECT=True and ~195FPS with _USE_QRECT=False

@j9ac9k
Copy link
Member

j9ac9k commented Feb 9, 2021

We need a way to benchmark the key-press events to zoom in and pan...

@danielhrisca
Copy link
Contributor Author

We need a way to benchmark the key-press events to zoom in and pan...

It's in the same range but I use a custom culling on the data so I don't know if it is relevant

@lidstrom83
Copy link
Contributor

Hey @danielhrisca, would it be possible and easy enough to create a minimal working example of this performance regression?

@danielhrisca
Copy link
Contributor Author

Just follow the steps from above #1492 (comment)

@dgoeries
Copy link
Contributor

thanks for the benchmark, that is also my observation that _USE_QRECT=False is faster although I use Qt5

@lidstrom83
Copy link
Contributor

thanks for the benchmark, that is also my observation that _USE_QRECT=False is faster although I use Qt5

I find this too using this benchmark. My target benchmark when evaluating whether to keep this option in #1420 was panning and zooming. A quick and dirty way to profile this is to set PYQTGRAPHPROFILE=ScatterPlotItem.paint. I found a ~2x speedup for PyQt5 when _USE_QRECT=True, but that it's slower for PySide2. I assumed interaction happens much more frequently than calling setData (paint is called for every pixel of mouse movement during panning and zooming) which is why I preferred this benchmark.

@lidstrom83
Copy link
Contributor

FYI: I'm currently adding a panning and zooming benchmark to #1566 .

@j9ac9k
Copy link
Member

j9ac9k commented Feb 10, 2021

I merged the PR, @lidstrom83 I know I've said this elsewhere, but this is an amazing tool!

Is there a specific combination of parameters you'd like to see?

@lidstrom83
Copy link
Contributor

I haven't been able to measure the regression yet using the tool. I think _USE_RECT=True is probably a red herring since this was the approach taken before #1420 when the _USE_RECT=False approach didn't exist. I can't really say what parameters to look at because I don't know how the ScatterPlotItem is being used in the asammdf library. I tried digging into it, but I don't understand how to create a minimal working example. I'd ultimately like to add another benchmark to the tool that would demonstrate this regression, since it's clearly missing some important use-case.

I'm working on a hovering benchmark as well, but I don't think it'll be relevant for this issue.

@lidstrom83
Copy link
Contributor

Oh, never mind, I think I am able to measure it. Looking into it...

@lidstrom83
Copy link
Contributor

Mind giving #1569 a try? I accidentally introduced this regression near the end of #1420 without noticing. Sorry for the trouble.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 11, 2021

Here is what the fps results look like, please keep in mind numbers were bouncing around:

Config 0.11.0 master #1569
PySide2 5.12 175 145 150
PySide2 5.12 _USE_QRECT NA 100 120
PySide2 5.15 170 155 160
PySide2 5.15 _USE_QRECT NA 107 130
PyQt5 5.12 107 76 80
PyQt5 5.12 _USE_QRECT NA 65 72
PyQt5 5.15 197 130 130
PyQt5 5.15 _USE_QRECT NA 120 155

To test 0.11.0; I modified the ScatterPlotSpeedTest.py file to effectively remove reference to _USE_QRECT

EDIT: Here is an updated chart with #1569 as merged:

Config 0.11.0 master #1569 #1569-as-merged
PySide2 5.12 175 145 150 230
PySide2 5.12 _USE_QRECT NA 100 120 180
PySide2 5.15 170 155 160 255
PySide2 5.15 _USE_QRECT NA 107 130 195
PyQt5 5.12 107 76 80 210
PyQt5 5.12 _USE_QRECT NA 65 72 155
PyQt5 5.15 197 130 130 220
PyQt5 5.15 _USE_QRECT NA 120 155 230

@j9ac9k
Copy link
Member

j9ac9k commented Feb 11, 2021

Here we've been jumping on @lidstrom83 but looks like #1383 is actually the culprit

@dgoeries
Copy link
Contributor

thanks for the test. Sad that the transparent background gives a drop. However, it does not seem to be the only reason.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 11, 2021

thanks for the test. Sad that the transparent background gives a drop. However, it does not seem to be the only reason.

If you can identify another place we're taking a performance hit on scatter plot items, let me know, and I'll take a look!

@lidstrom83
Copy link
Contributor

However, it does not seem to be the only reason.

That's correct. The first commit in #1569 fixed a regression I introduced in #1420 for the default parameter choice in the benchmark tool. That would have also degraded performance between 0.11.0 and 0.11.1.

I'm not aware of any remaining performance issues, though I'd appreciate if people could post some numbers so I can be sure.

@j9ac9k
Copy link
Member

j9ac9k commented Feb 11, 2021

I'm not aware of any remaining performance issues, though I'd appreciate if people could post some numbers so I can be sure.

I've updated the table to showcase how things are as they stand; this was for "Reuse Item" mode; I wouldn't mind doing more comparisons

@j9ac9k
Copy link
Member

j9ac9k commented Feb 11, 2021

@danielhrisca Can you check how your tool behaves on master now?

@danielhrisca
Copy link
Contributor Author

pyqtgraph asammdf bench sscatterpolot bench
0.11.0 5.19s 245fps
master 8.75s 350fps

@j9ac9k
Copy link
Member

j9ac9k commented Feb 15, 2021

Hmm...So your benchmark is measuring something different from ours. I'm not sure what to do from here, if you can narrow down where you think pyqtgraph is behaving slower, I can certainly look into it more.

Something I did notice in our testing was that Qt 5.15 definitely outperformed Qt 5.12 bindings. You mentioned using 5.13 earlier, if feasible, I would evaluate migrating to 5.15 (I haven't tested benchmark performance on Qt6 yet).

@lidstrom83
Copy link
Contributor

@danielhrisca were you able to see a drop in performance with any of the parameter choices in the scatter plot benchmark tool? If there's anything I missed in that tool that you think might reveal the drop, let me know and I'll add it. Otherwise, are you certain that this is related to ScatterPlotItem?

@lidstrom83
Copy link
Contributor

lidstrom83 commented Feb 16, 2021

To rule out the other issues we've noticed recently, I wonder what happens if you set dynamicRangeLimit=None for the PlotDataItem and turn off auto range for the PlotItem.

@j9ac9k
Copy link
Member

j9ac9k commented May 27, 2021

@danielhrisca you may have thought I forgot about this, but I did not!

through the mail list, @ibrewster requested some input for trying to speed up a scatter plot with (what I consider to be) far too many points. After not having a fix, he noticed far better performance on 0.11.0. With some work on his behalf, we tracked down the commit causing the slow-down to be this guy. This commit aims to fix an upstream Qt issue; but, more importantly, we have a flag to disable it, which may be of reference.

In your usage of PlotDataItem, pass dynamicRangeLimit=None and see if you regain some performance.

@ibrewster
Copy link
Contributor

with (what I consider to be) far too many points

Oh, it's DEFINITELY far too many points. I tell my users that they need to limit the geographical region they are looking at to get good performance, but at the end of the day, the user will do what the user will do - and of course it's on me to deliver the best possible performance regardless! :)

@j9ac9k
Copy link
Member

j9ac9k commented May 27, 2021

Oh, it's DEFINITELY far too many points. I tell my users that they need to limit the geographical region they are looking at to get good performance, but at the end of the day, the user will do what the user will do - and of course it's on me to deliver the best possible performance regardless! :)

If only we could do away with our users, then everything would be just peachy!

@danielhrisca if ScatterPlotItem performance is important to you, I encourage you to chime in on this issue here on the pyside bug-tracker: https://bugreports.qt.io/browse/PYSIDE-1572

Right now, ScatterPlots generation is not vectorized, due to no ability to pass a block of data to Qt in such a way that Qt can iterate through on the C-level. If this feature is adopted, it should provide some significant performance boosts for scatter plots (on PySide6 bindings).

@danielhrisca
Copy link
Contributor Author

Hello,

I've tried setting dynamicRangeLimit=None with no real performance difference. The only thing that gives me a big speed-up is setting _USE_QRECT=False in ScatterPlotItem.py (almost 2x speed improvement); with this it gets close to 0.11.0 speed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-priority performance Problems or questions related to speed, efficiency, etc.
Projects
None yet
Development

No branches or pull requests

6 participants