Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scatter Plot Improvements #1420

Merged
merged 57 commits into from Dec 16, 2020
Merged

Conversation

lidstrom83
Copy link
Contributor

@lidstrom83 lidstrom83 commented Oct 25, 2020

  • Fixes various bugs and performance issues.
  • Adds a hovering API to ScatterPlotItem which lets the user specify a separate style for the hovered points and show a tool tip containing information about them. A signal is also emitted during hovering.

Yields significant performance improvements when updating the scatter plot's options. See e.g. the plot hover example.
Speedups for ScatterPlotSpeedTest.py:
~50% without pxMode
~ 0% pxMode with useCache
~30% pxMode without useCache
@lidstrom83 lidstrom83 changed the title Efficient Scatter Plot Hovering Scatter Plot Optimizations Oct 25, 2020
@j9ac9k
Copy link
Member

j9ac9k commented Oct 25, 2020

Looks like one of the examples is hanging when running on pyside2/conda-forge instances. I recently merged #1302 which did some changes on how the test application is loaded, and how the test suite is ran in context of the examples. Can you replicate locally?

@lidstrom83
Copy link
Contributor Author

I noticed I used a format string in the example I changed and pushed a compatibility fix while the checks were still running. Could this have something to do with it?

@lidstrom83
Copy link
Contributor Author

Just found some more room for improvement. Should get a ~2x boost in pxMode in the next commit (~3x overall). It'll change a fair bit though, so should it be put in a separate PR?

@lidstrom83
Copy link
Contributor Author

I ended up pushing the extra commit as it turned out to be a pretty simple change after all.

@j9ac9k
Copy link
Member

j9ac9k commented Oct 26, 2020

Looks like those pipelines are hanging, I'll try to test out locally and see if I can repliace.

@j9ac9k
Copy link
Member

j9ac9k commented Oct 27, 2020

I can confirm the issue locally, I get a crash when running the Plotting.py test example...

pyqtgraph-pyside2-conda ❯ pytest -k "Plotting.py"
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.7.7, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/ogi/Developer/pyqtgraph, inifile: pytest.ini
plugins: forked-1.2.0, cov-2.10.0, xdist-1.32.0
collected 156 items / 155 deselected / 1 skipped

examples/test_examples.py Timeout (0:00:15)!
Thread 0x000000010d9dddc0 (most recent call first):
  File "/Users/ogi/Developer/pyqtgraph/examples/test_examples.py", line 222 in testExamples
...

passing --full-trace argument, I get the following non-helpful information:

    """ % (import1, graphicsSystem, import2)
        if sys.platform.startswith('win'):
            process = subprocess.Popen([sys.executable],
                                        stdin=subprocess.PIPE,
                                        stderr=subprocess.PIPE,
                                        stdout=subprocess.PIPE)
        else:
            process = subprocess.Popen(['exec %s -i' % (sys.executable)],
                                       shell=True,
                                       stdin=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       stdout=subprocess.PIPE)
        process.stdin.write(code.encode('UTF-8'))
        process.stdin.close()
        output = ''
        fail = False
        while True:
            try:
>               c = process.stdout.read(1).decode()
E               KeyboardInterrupt

Running pytest without test_examples, I get a setfault on test_ScatterPlotItem.py

$ pytest -k "not test_examples.py" -v
...
pyqtgraph/graphicsItems/tests/test_ScatterPlotItem.py::test_scatterplotitem zsh: segmentation fault  pytest -k "not test_examples.py" -v

FWIW, ignoring the examples and test_ScatterPlotItem, everything runs..

$ pytest -k "not test_examples and not test_ScatterPlotItem" -v
...
======================================================= 83 passed, 6 skipped, 68 deselected in 17.22s ========================================================

These are always tricky to debug...

@lidstrom83
Copy link
Contributor Author

These are always tricky to debug...

Oh joy. Installing Miniconda now.

BTW, has a timeline been decided yet for when to drop support for Python2.7 and Qt4?

@j9ac9k
Copy link
Member

j9ac9k commented Oct 27, 2020

Oh joy. Installing Miniconda now.

I suggest looking here to see how we configure our conda environment and just replicate that: https://github.com/pyqtgraph/pyqtgraph/blob/master/azure-test-template.yml#L93-L113

BTW, has a timeline been decided yet for when to drop support for Python2.7 and Qt4?

I want to do it after the number of outstanding PRs gets to be 5-10ish... feel free to join our slack channel if you have some opinions on that regard.

pyqtgraph/functions.py Outdated Show resolved Hide resolved
@lidstrom83
Copy link
Contributor Author

I just ran the scatter plot speed test at various stages throughout this PR and realized the pxMode=True gains from various commits just bring us back to roughly the original performance; serializing objects rather than using their id as the key in SymbolAtlas.getSymbolCoords was initially a big performance hit, and I was accidentally comparing to speeds after doing this. ~Doubling of the speed test for pxMode=False still appears to be correct. Apologies for overselling this.

@j9ac9k
Copy link
Member

j9ac9k commented Nov 25, 2020

I just ran the scatter plot speed test at various stages throughout this PR and realized the pxMode=True gains from various commits just bring us back to roughly the original performance; serializing objects rather than using their id as the key in SymbolAtlas.getSymbolCoords was initially a big performance hit, and I was accidentally comparing to speeds after doing this. ~Doubling of the speed test for pxMode=False still appears to be correct. Apologies for overselling this.

Thanks for following up on this. Would say this issue still resolves #1020 ?

@j9ac9k
Copy link
Member

j9ac9k commented Nov 25, 2020

Also, can you explain the motivation behind serializing the objects and storing the serialized representation vs. just storing the objects themselves in the cache dictionary?

@lidstrom83
Copy link
Contributor Author

Thanks for following up on this. Would say this issue still resolves #1020 ?

Yes, this issue is sill resolved.

Also, can you explain the motivation behind serializing the objects and storing the serialized representation vs. just storing the objects themselves in the cache dictionary?

Initially I noticed that in the hovering example I added, SymbolAtlas.buildAtlas was getting called every time ScatterPlotItem.updateSpots was called, and that SymbolAtlas.symbolMap was growing without bound. The cache was missing when it shouldn't in part due to recreating new default pen and brush objects during each update. I decided to use serializations of pens and brushes in the symbolMap lookup key rather than their id to avoid this problem.

I did consider instead to just avoid repeatedly recreating the default pen and brush, but realized taking this kind of care to avoid cache misses would extend to the user as well. For example, in the hover example I would have had to be careful to reuse the hovering QPen object during each update. Additionally, the SpotItem.setPen method would need to be changed to avoid creating a new QPen object. The approach I took is more forgiving to the user and avoids having to change a bunch of methods and the API. And the initially large performance hit was eventually recovered from by caching the serialized Qt objects while looping over the spots in SymbolAtlas.getSymbolCoords.

For the other side of the trade-off between caching approaches: keeping all these changes but switching back to keying by id appears to boost the speed test by ~10% with pxMode=True.

@j9ac9k
Copy link
Member

j9ac9k commented Nov 26, 2020

I would have never thought to serialize the objects like that!

I'm not suggesting you go undo the work, but if the problem with the cache hits involve multiple calls to mk<whatever>() functions return objects with different IDs for the same input, would it not be easier to get around the cache hit issue by using the @lru_cache decorator on those functions? Not only would the return of those functions be significantly faster, but they should be returning objects with the same ID as well.

@lidstrom83
Copy link
Contributor Author

I'm not suggesting you go undo the work, but if the problem with the cache hits involve multiple calls to function.mk<whatever>() return objects with different IDs for the same input, would it not be easier to get around the cache hit issue by using the @lru_cache decorator on those functions?

During this PR I played around with this library's LRUCache after discovering Python 2.7 doesn't have one. I think I was trying to cache the serialized Qt objects. I didn't think to cache these functions though, and would've been scared to mess with them considering how much code they touch.

Thinking more about the serialized keying: if every spot's pen or brush is different then the caching of serialized pens and brushes doesn't help alleviate the performance hit with this approach. This makes me less comfortable with the trade-off.

I'm going to try out your idea now.

@j9ac9k
Copy link
Member

j9ac9k commented Nov 26, 2020

I can't think of any harm of wrapping mkPen, mkBrush and so on with the @lru_cache decorator (other than slightly increased memory usage depending on the size of the cache we allocate); I never really thought that they were being hit that often so never thought that the library should cache them, but for profiling purposes we should definitely explore it.

Regarding python 2.7 compatibility; ...we'll worry about that later 😆

@lidstrom83
Copy link
Contributor Author

TypeError: unhashable type: 'QPen'

Right... We need hashable arguments when using @lru_cache. Often e.g. mkPen is called on an existing QPen object. We could just return QPen in this case.

@lidstrom83
Copy link
Contributor Author

This also resolves #421 as well now.

@lidstrom83
Copy link
Contributor Author

And #420.

@j9ac9k
Copy link
Member

j9ac9k commented Dec 15, 2020

geez, leave some issues for the rest of us 👍

@j9ac9k
Copy link
Member

j9ac9k commented Dec 15, 2020

Something's not quite right:

python examples/ScatterPlot.py

From current master:

image

From scatter-performance

image

@lidstrom83
Copy link
Contributor Author

Thanks for the screenshots. I can't seem to reproduce this. I've tried on my Linux laptop with Python 3.8 using both PyQt5 and PySide2, with and without an external monitor plugged into it.

@j9ac9k
Copy link
Member

j9ac9k commented Dec 15, 2020

Running git bisect on your branch pointed the first commit with the issue being c6c4043

@j9ac9k
Copy link
Member

j9ac9k commented Dec 15, 2020

I can confirm on macOS the scatter plots behave well, as does the exporter, nice work!! I'll try to break this a bit more a little bit later today.

@lidstrom83
Copy link
Contributor Author

#915 is also related.

I treated the missing invalidate as a bug in order to match the behaviour when pxMode=True, and because the user can specify update=False in the setters.

Regarding not emitting sigPlotChanged, that seemed odd to me too but I also worry about breaking users' code by adding it. Similar to this suggestion, I implemented an alternative API with __getitem__, __setitem__ and setOpts methods where I emit this signal. I've since reverted it, but now that this PR has become monolithic, I could add it back in order to fully resolve the issue.

@lidstrom83
Copy link
Contributor Author

And another: #1118. I haven't implemented a hovering API in PlotDataItem, but that would be fairly easy to do so now. It would just pass the hovering opts hoverable, hoverPen, hoverBrush, hoverSize, hoverSymbol, and tipalong to the underlying PlotCurveItem.

@lidstrom83
Copy link
Contributor Author

Does anyone have any thoughts about deprecating the dataSet and mask arguments in the setters? The former seems to be a foot gun - it silently fails unless dataSet is a view of self.data - and the latter is just doing e.g. dataSet['pen'] = pens[mask], so why wouldn't the user just pass in pens[mask] in the first place? I could replace these with an index argument which could be a slice, mask, or array of indices.

@j9ac9k
Copy link
Member

j9ac9k commented Dec 16, 2020

And another: #1118. I haven't implemented a hovering API in PlotDataItem, but that would be fairly easy to do so now. It would just pass the hovering opts hoverable, hoverPen, hoverBrush, hoverSize, hoverSymbol, and tipalong to the underlying PlotCurveItem.

I think this would be great, but I don't think this has to be part of this PR, as I anticipate merging shortly; ...

Regarding not emitting sigPlotChanged, that seemed odd to me too but I also worry about breaking users' code by adding it. Similar to this suggestion, I implemented an alternative API with getitem, setitem and setOpts methods where I emit this signal. I've since reverted it, but now that this PR has become monolithic, I could add it back in order to fully resolve the issue.

I think we should add this in, but like the previous comment, doesn't have to be part of this specific PR, but another one that addresses this specific issue. If there is any kind of potential API breakage, having it on its own PR will make it easier to track down later should there be an issue made in the future.

@j9ac9k
Copy link
Member

j9ac9k commented Dec 16, 2020

This is a monster PR, thanks for your work on this @lidstrom83 now I have to go through and identify which issues/existing PRs I can close as a result 😆 merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants