Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of axis.py/ticking should have large performance gains #5665

Open
mdboom opened this issue Dec 11, 2015 · 25 comments
Open

Refactor of axis.py/ticking should have large performance gains #5665

mdboom opened this issue Dec 11, 2015 · 25 comments
Labels
Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues keep Items to be ignored by the “Stale” Github Action Performance topic: ticks axis labels

Comments

@mdboom
Copy link
Member

mdboom commented Dec 11, 2015

Right now, each Axis has a two lists of Tick objects (one major, one minor), each of which has 3 Line objects (for left tick, right tick and gridline) and 2 Text objects (for main and secondary label). Constructing Line objects is rather heavy and expensive. Refactoring this code so that each Axis had 3 LineCollection objects instead where only the points of the ticks needs to be updated, sharing all of the line style information, should be much faster.

On the following simple benchmark:

plt.subplots(8, 8)
plt.savefig('test.png')

25% of the time is spent constructing the Line objects. The suggested change above should reduce the number of Line objects from 3*N (where N is the number of ticks, usually a dozen or so) to 3.

@mdboom mdboom added the Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues label Dec 11, 2015
@Tillsten
Copy link
Contributor

I think i have a prototype doing that, if no one works on this before i will have a look after christmas. The whole axis code seems quite convoluted for what it does.

@timhoffm
Copy link
Member

This looks like an interesting topic with some performance gain and not too difficult (once we've figured out the right structure). 😄
I may embark on the topic. Some first thoughts:

Current implementation:

Axis
- major: Ticker
- minor: Ticker
- majorticks: List[Tick]
- minorticks: List[Tick]

Ticker
- locator: Locator
- formatter: Formatter

Tick
- tick1line: Line
- tick2line: Line
- gridline: Line
- label1: Text
- label2: Text

Possible alternative implementation (first try):

Axis
- majorticks: TickList
- minorticks: TickList

TickList
- locator: Locator
- formatter: Formatter
- ticks: LineCollection
- secondary_ticks: LineCollection
- gridlines: LineCollection
- ticklabels: List[Text]
- secondary_ticklabels: List[Text]

This feels significantly better and should be doable with reasonable backward compatibility (full compatibility may be difficult or at least require some extra work, if users really descend to the Artists of individual ticks). I'm not sure if and how much incompatibility we want to allow here.

Structuring by major/minor and then by primary/secondary may not be the best option. One could swap the order so that

Axis
- ticks: AxisTicks
- secondary_ticks: AxisTicks

AxisTicks
- major: LineCollection
- minor: LineCollection
...

This would potentially allow different ticks on the secondary axis. Also it could make it simpler to handle interactions between minor and major ticks (e.g. leave out minor ticks at the position of major ticks).

@tacaswell tacaswell added this to the v3.0 milestone Mar 28, 2018
@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

@timhoffm I had started giving a try to the idea (not pursuing it actively now), per https://github.com/anntzer/matplotlib/tree/_wip/newaxis (taking an approach of "delete everything and reimplement stuff on top of it"). Feel free to scavenge stuff from that.

A few random notes:

  • LineCollections don't support markers, so if you use them for ticks each tick has to become a line itself, rather than being represented by a marker. Conversely, using a single line object to represent all ticks (what I've done right now) makes controlling properties of individual ticks close to impossible.
  • I think that it'll be very difficult to maintain backcompat on a project like that; however, a possibility (noted in this week's call) would be to have a new, separate Axes class that uses these new Axises (while also maintaining the user-facing plotting methods); one could then choose to use the new Axes class by (ab)using the projection system.

@jklymak
Copy link
Member

jklymak commented Mar 28, 2018

This was discussed a bit during the call this week.

In working on #10841, I found it pretty strange that the formatter only operated on one tick at a time, and the formatter ended up looping through all the ticks on the first tick and then caching the labels for all the ticks. So, a reorganization like this makes a lot of sense to me - I think there are more use cases where you want to treat the ticks as a list rather than being able to do something (?) to individual ticks.

@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

#5804 is possibly relevant re: formatter work too.

@efiring
Copy link
Member

efiring commented Mar 28, 2018

I suspect the need to manipulate individual ticks is extremely rare. It's possible that the tick handling pre-dated the introduction of collections.

@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

One case (not that I use it, but doesn't sounds too crazy) is https://matplotlib.org/examples/api/skewt.html, where you can see that for some of the slanted gridlines, the associated tick is not visible.

@timhoffm
Copy link
Member

timhoffm commented Mar 28, 2018

@anntzer Thanks for the example. It helps to have an idea what could all be needed. This needed a special implementation for XTick. The same solution for a future TickCollection (or whatever it's called) will probably need a special subclass of that.

@ all: Generally, can we agree that in the default implementation all ticks of a kind (major/minor) will have the same properties? It's still always possible to tell the locator to leave out certain positions and draw something custom there yourself.

@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

I just realized that that specific example could in fact be made to work by clipping the x-ticks to a bounding box that bounds the horizontal direction but not the vertical one -- perhaps a better idea than manipulating tick visibility...).

@tacaswell
Copy link
Member

I suspect we will need to aggressively shim the back compatibility on anything that is public API (and maybe even some private stuff). Users are extremely clever and find all the corner cases.

separating the primary/secondary ticks is probably a good idea as that makes in easy(ier) to put different locators and tickers on each of them.

It is probably worth going through the axis_grid implementation of ticks and axis objects carefully to understand what it does differently, what of it we want to pull up to mainline, and what lessons we should learn.

@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

... actually playing with the clipbox won't work in the example above as labels will not be cropped out properly (they may only be half-clipped).

@tacaswell If we move stuff to a separate "projection" (as proposed above) the new implementation would be strictly opt-in. It doesn't mean we should gratuitiously break things but given that the current proposals don't even really include the notion of "individual" ticks (everything goes into a Collection object), any shimming is going to be tricky too...

@tacaswell
Copy link
Member

I responded not having read the rest of the discussion (because I did not reload my browser), sorry.

Putting in it a separate projection is a good way to make progress and side-step back compatibility issues, but I worry that we will end up with a third full independent implementation of ticks and friends in the code base.

@dopplershift
Copy link
Contributor

As the author of the SkewT example above (and maintaining a proper implementation in MetPy), I'm all for making ticks more sensible. The implementation of the SkewT in matplotlib as it stands now took a lot of trial and error and discovery of weird things. The fact that it takes implementing the entire stack (XAxis, XTick, XSpine) just to clip out of range ticks is nuts.

So I'm happy to lend my experience with that use case. I also want to make sure I'm aware of any potential breakage, since it turns out this plot is probably the most popular feature of MetPy--which is actually a testament to how much better matplotlib's plots are than meteorology's other tools.

@ImportanceOfBeingErnest
Copy link
Member

In general wouldn't it make sense to first solve the existing problems with the ticks? Like the one that ticks are in general up to two pixels off from their position. Reading much of what is going on here makes me fear that the final matplotlib version that is still usable with python2 will be a version that has scatter markers and ticks at wrong positions.

Concerning the proposal here:

  • I think it makes sense to first define some goal. How much performance gain would be needed? One may then evaluate different approaches according to the goal and discard others. I think a little bit of project management basics can't hurt in such cases.
  • It is useful to have individual ticks customizable. However, I would solve that by not restricting the number of tickers (or whatever you call it) to two (major/minor) but leave that open. If you want a custom tick somewhere, create a new ticker (possibly with only one single tick position), entangle it with the axis, done.
  • I'm totally in favor of allowing asynchronous labeling of the two sides to the axes. Currently to have two different tickers on a single plot you need to create a complete new axes (twinx) just to have some other ticklabels on the right side of the plot. This would also be easily achievable with a third ticker which has some property "right" associated with it.

@anntzer
Copy link
Contributor

anntzer commented Mar 28, 2018

@dopplershift What do you think of the reimplementation of skewT at #10088? (I think that the skewT implementation is actually independent of the rest of the changes to tick1On & friends, i.e. even if the skewT axes doesn't touch tick1On, just by setting the visibility that should be enough to hide the ticks).

@ImportanceOfBeingErnest Without having thought much about it, I like the idea of letting users just add more tickers... As you mention, that also helps with the left/right problem.

I think you're referring to #7341 for tick positioning? Unfortunately that's a quite tricky problem to solve. mplcairo has the ability to draw ticks "exactly" were they should be but as it turns out that looks pretty bad: that's because ticks end up not aligned at pixel boundaries, get antialiased, and thus quite blurry (for some reason, blurry short lines look particularly bad). So pixel-snapping of ticks is more or less necessary. The problem is made worse by the fact that ticks are typically ~1px wide, but many lines are ~2px wide and they get snapped to cover two consecutive pixels -- so they look off-center relative to the ticks. I think (can relatively easily play with that in mplcairo) that the best solution is to snap 1 (and other odd)-px-wide lines (including ticks) to centered at full pixels and let 2 (and other even)-px-wide lines get antialiased on both sides by 0.5px.

@dopplershift
Copy link
Contributor

@anntzer Short answer: 👍 (We can continue the discussion over there.)

@timhoffm
Copy link
Member

I see this topic is much more complex than exchanging a list of Lines for a LineCollection 😃. I'm still willing to work on this, but it's probably mid-term effort.

I agree with @ImportanceOfBeingErnest that some goals should be set and a good planning is essential so that we don't end with yet another axis/ticks implementation. Not exactly knowing what a MEP is, but this feels like it could become one.

As for the goal, my primary goal would be to create a cleaner, easier to use and more flexible implementation. If included in the design and done right, performance will probably be faster (or can be made more easily by tuning the relevant parts). However, I would not see performance as the first goal and I wouldn't commit to a fixed desired percentage gain.

@jklymak
Copy link
Member

jklymak commented Mar 29, 2018

https://matplotlib.org/devel/MEP/index.html But I'm not clear on the process. I added MEP process as an agenda item for the call on Monday https://paper.dropbox.com/doc/Matplotlib-meeting-agenda-aAmENlkgepgsMeDZtlsYu

@timhoffm
Copy link
Member

timhoffm commented Apr 1, 2018

At what time is the call on Monday? See if I can join.

@jklymak
Copy link
Member

jklymak commented Apr 1, 2018

Oops added to the agenda. 15:00 eastern us time.

@jklymak
Copy link
Member

jklymak commented Apr 8, 2018

See the two issues above for other relevant discussion: #10881 #10874

@anntzer
Copy link
Contributor

anntzer commented Apr 16, 2018

Just dropping an idea here (actually may be worth moving this to a design document, or whatnot): perhaps spines could also get reparented to (and "managed by", whatever that means) the axis objects, so that e.g. one would do ax.xaxis.spine1/2 instead of ax.spines["top"]/["bottom"]? (Yes, 1/2 is not optimal, but that's already what we have for ticks and labels, so at least it's consistent.)

@timhoffm
Copy link
Member

timhoffm commented Jul 6, 2021

I actually think there is a way forward migrating to collections while maintaining full backward compatibility including the ability to style individual ticks. Here is how:

  1. Define a facade for ticks, ticklabels and lines. This fascade has methods that aggreate the access to the individual ticks so that you can write
    self._major_ticks.apply_params(**kwtrans)
    
    instead of looping through the individual ticks
    for tick in self.majorTicks:
         tick._apply_params(**kwtrans)
    
  2. Replace all internal loops over ticks by these aggregate functions.
  3. Now, that we have isolated the internal tick representation from the usage in the library, we can switch to using more efficient collections than having lists of single ticks.
  4. To restore backward compatibility, e.g. styling individual ticks: If a user tries to access individual ticks, convert the Collection-based internal representation back to our old single-tick-based representation and from now on work with that.

Advantages:

  • Standard use-cases will benefit from the improved design. Possibly faster, and better API.
  • We can keep full backward compatibility. The figures using fancy tick modifications will just not benefit from any performance improvements of collections.
  • The facade will result in a cleaner more high-level internal working with ticks.

Downsides:

  • We'll have to maintain two data structures for tick representations, the new collection based, and the old tick-based. And we have to be able to convert from the first to the second. But given that Tick objects themselves are not that complicated, that seems manageable.

I have a basic prove of concept for 1+2 at https://github.com/timhoffm/matplotlib/tree/tick-refactor (might need a rebase). I don't have the capacity to move this forward in any foreseeable future. Anybody is welcome to pick this up.

Copy link

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Jan 29, 2024
@timhoffm
Copy link
Member

#5665 (comment) Still holds

@timhoffm timhoffm added keep Items to be ignored by the “Stale” Github Action and removed status: inactive Marked by the “Stale” Github Action labels Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues keep Items to be ignored by the “Stale” Github Action Performance topic: ticks axis labels
Projects
None yet
Development

No branches or pull requests

10 participants