
Axes instantiation is very slow #6664

Closed
Kojoley opened this issue Jun 28, 2016 · 34 comments
@Kojoley (Member) commented Jun 28, 2016

> python -V
Python 2.7.11
> pip freeze | grep matplotlib
matplotlib==1.5.1
> python -m platform -s "print platform()"
Windows-7-6.1.7601-SP2

I have discovered that my application spends a lot of time in Axes creation. Results of class-instantiation profiling are below:

> python -m timeit -s "from matplotlib.figure import Figure; from matplotlib.axes import Axes; fig = Figure()" "Axes(fig, (0, 0, 1 ,1))"
10 loops, best of 3: 33.2 msec per loop

[profiler screenshot]
Inside Axes.__init__, most of the execution time is taken by the self.cla() call, and about half of that is spent in this loop:

        for name, spine in six.iteritems(self.spines):
            spine.cla()

After commenting out these two lines I got:

> python -m timeit -s "from matplotlib.figure import Figure; from matplotlib.axes import Axes; fig = Figure()" "Axes(fig, (0, 0, 1 ,1))"
100 loops, best of 3: 17.9 msec per loop

So we have 33 ms per Axes creation, and half of that time is spent creating spines that may never be used (in my case I have a bunch of sharex axes and only one of them has spines).
I see a possible solution in which creation of spines, scales (you can see creation of a logarithmic scale, which I do not use either), and other objects is delayed until the point where they are actually needed/initialized.

And the fun part: for 20 created Axes we have 60115 calls to Rectangle.stale (3005 calls per Axes instantiation).
[profiler screenshot showing the stale call counts]
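(Not part of the original report.) Per-function call counts like the Rectangle.stale figure above can be reproduced with the stdlib instrumenting profiler. A minimal sketch, using a toy stand-in class rather than matplotlib's actual Artist:

```python
import cProfile
import io
import pstats


class ToyArtist:
    """Toy stand-in for a matplotlib artist with a `stale` flag."""

    def __init__(self):
        self._stale = True

    @property
    def stale(self):
        return self._stale

    @stale.setter
    def stale(self, value):
        # In matplotlib this setter also propagates staleness to the parent,
        # which is why it shows up so often in profiles.
        self._stale = value


def fake_axes_init(n_children=5):
    """Mimic an __init__ that repeatedly toggles children's `stale`."""
    children = [ToyArtist() for _ in range(n_children)]
    for child in children:
        child.stale = True
        child.stale = False
    return children


profiler = cProfile.Profile()
profiler.enable()
for _ in range(20):  # 20 instantiations, as in the measurement above
    fake_axes_init()
profiler.disable()

buffer = io.StringIO()
# Restrict the report to entries whose name matches "stale".
pstats.Stats(profiler, stream=buffer).print_stats("stale")
print(buffer.getvalue())
```

Here the `stale` setter is recorded 20 × 5 × 2 = 200 times; the same approach pinpoints which artist's stale setter dominates in a real run.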

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Jun 28, 2016
@tacaswell (Member):

Any help improving the performance would be greatly appreciated!

It is a bit odd that the stale that gets clobbered is the one on Rectangle. Can you tell which Rectangle artist that is on?

In the original implementation of stale the propagation was short-circuited, but that led to the figure auto-refresh getting stuck.

@Kojoley (Member, author) commented Jun 28, 2016

Here is the test and the profiler diagram:

import timeit

timeit.timeit(
    "Axes(fig, (0, 0, 1, 1))",
    "from matplotlib.figure import Figure;"
    "from matplotlib.axes import Axes;"
    "fig = Figure()",
    number=20,
)

[profiler call-graph diagram]

@efiring (Member) commented Jun 28, 2016

How did you generate that profiler diagram?

@Kojoley (Member, author) commented Jun 28, 2016

It was generated by the PyCharm IDE, which uses yFiles and yappi.

@Kojoley (Member, author) commented Jun 28, 2016

The bottleneck is the reset_ticks() call inside Axis.cla().

After commenting it out I got only 2 ms instead of 33 ms per Axes instantiation:

100 loops, best of 3: 2.41 msec per loop

Call graph becomes much simpler too
[simplified call-graph diagram]

@anntzer (Contributor) commented Jun 28, 2016

I think stale is a red herring, mostly there because of the overhead of using an instrumenting profiler (?). On the other hand, reset_ticks is indeed a bottleneck; see http://vmprof.com/#/207c1fb01e6cedab167a82487d425da1 for the vmprof (statistical profiler) analysis of

from matplotlib.figure import Figure; from matplotlib.axes import Axes

for _ in range(1000):
    fig = Figure()
    Axes(fig, (0, 0, 1 ,1))

(matplotlib 2.0b1)

@WeatherGod (Member):

Wow... that is a very informative graph. There seem to be several degeneracies going on here. First and foremost, I think it is obvious that we are calling cla() a bit indiscriminately. This may partly be due to multiple entry points that may require calls to cla(), but we could probably cut back on it a bit. Second, each call to cla() does a lot of work that may not be necessary. We might want to have a sentinel that indicates whether the axis is still in a cleared state, so that some calls to cla() would short-circuit.
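A sentinel like that could look roughly like this. A pure-Python sketch, not matplotlib code; all names here are illustrative:

```python
class SketchAxis:
    """Sketch of the proposed sentinel: cla() returns early when the
    axis is already in a cleared state."""

    def __init__(self):
        self._is_cleared = False
        self.reset_count = 0  # counts how often the expensive path runs
        self.cla()

    def cla(self):
        if self._is_cleared:
            return  # already cleared; skip reset_ticks() and friends
        self._expensive_reset()
        self._is_cleared = True

    def _expensive_reset(self):
        # Stand-in for reset_ticks(), spine resets, etc.
        self.ticks = []
        self.reset_count += 1

    def add_tick(self):
        self.ticks.append(object())
        self._is_cleared = False  # any mutation invalidates the sentinel
```

Redundant cla() calls from multiple entry points then become cheap no-ops; the cost is keeping the sentinel accurate everywhere the axis mutates, which would be the tricky part in real matplotlib code.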


@efiring (Member) commented Jun 30, 2016

Every possible grid and tick line is being instantiated; in practice, most of them don't get used. Lazy instantiation would solve that, at the cost of a little more complexity.

@Kojoley (Member, author) commented Jun 30, 2016

On a newly created axis we do not even have ticks, yet here we see the creation of a single tick for internal purposes. This should not be a problem, but it is, because for a single Axes we get multiple reset_ticks calls (in Axes.__init__, XAxis.cla, YAxis.cla). It is definitely some kind of recursion.
[profiler screenshot]

I have tried to figure out whether it is possible to simplify the reset_ticks method by removing that tick creation (and entirely removing the _lastNumMajorTicks and _lastNumMinorTicks variables). I have not found anything that would prevent such an optimization; the result of this refactoring would be about an 11x performance boost to Axes instantiation (my application runs 30% faster after it) and cleaner code too.
But even after that optimization the multiple reset_ticks calls will not be gone. Yes, they will be much less resource-heavy than now, but it is still not good.

As this optimization will affect multiple places and involves rewriting the get_ticks_position, get_minor_ticks, get_major_ticks, and set_tick_params methods, I'm going to set up a test environment to develop a PR.

@larsoner (Contributor):

> the result of this refactoring would be about an 11x performance boost to Axes instantiation (my application runs 30% faster after it) and cleaner code too. But even after that optimization the multiple reset_ticks calls will not be gone. Yes, they will be much less resource-heavy than now, but it is still not good.

It seems like using such changes to get a performance boost is still worthwhile, even if it's not the best solution. After all, these changes were proposed about 10 months ago, so that's 10 months during which things could have been faster...

@efiring would you be okay with such a workaround in the meantime until lazy instantiation can be implemented?

@efiring (Member) commented Apr 28, 2017

@Kojoley, any more thoughts on this? Or did you run into a roadblock?

@QuLogic mentioned this issue May 16, 2017
QuLogic added a commit to QuLogic/cartopy that referenced this issue May 23, 2017
We don't need any Spines since GeoAxes uses its own outline_patch to
draw the outline. We also don't need any Axis drawing, so override its
generation of a "proto-Tick". This is expensive [1] and it's not used to
draw any ticks anyway.

[1] matplotlib/matplotlib#6664
@QuLogic (Member) commented May 25, 2017

Somehow, Ticks are inexplicably tied to Spines. If I apply this patch to defer Tick creation until they're used:

diff --git a/lib/matplotlib/axis.py b/lib/matplotlib/axis.py
index 870c5e8..307b425 100644
--- a/lib/matplotlib/axis.py
+++ b/lib/matplotlib/axis.py
@@ -764,10 +764,8 @@ class Axis(artist.Artist):
         del self.majorTicks[:]
         del self.minorTicks[:]
 
-        self.majorTicks.extend([self._get_tick(major=True)])
-        self.minorTicks.extend([self._get_tick(major=False)])
-        self._lastNumMajorTicks = 1
-        self._lastNumMinorTicks = 1
+        self._lastNumMajorTicks = 0
+        self._lastNumMinorTicks = 0
 
     def set_tick_params(self, which='major', reset=False, **kw):
         """

then the first test that fails is clip_path_clipping, which is really odd, as I'd have expected the ticks to be the ones that break.

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
@timhoffm (Member) commented Nov 7, 2017

What is the status here? I need to create plots with some tens of axes, and this issue results in instantiation times of seconds, which is a pain in interactive GUI applications.

If there are problems with the above solutions, one could use a LazyDefaultTickList that replaces itself with the default list on first use. This basically only requires changes in reset_ticks(). In a test, I could speed up Axes creation by a factor of 3 using this lazy tick-list instantiation.

If there is interest, I can provide a pull request.
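The self-replacing list idea can be sketched as a non-data descriptor. This is a hypothetical sketch, not matplotlib's actual code; _LazyTickList, _get_tick, and the class layout are illustrative:

```python
class _LazyTickList:
    """Non-data descriptor: builds the real tick list only on first access,
    then caches it in the instance dict so later lookups are plain attribute
    reads with no overhead."""

    def __init__(self, major):
        self._major = major

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        ticks = [instance._get_tick(major=self._major)]
        instance.__dict__[self._name] = ticks  # shadow the descriptor
        return ticks


class Axis:
    majorTicks = _LazyTickList(major=True)
    minorTicks = _LazyTickList(major=False)

    def __init__(self):
        self.tick_cost = 0  # counts expensive Tick constructions

    def _get_tick(self, major):
        self.tick_cost += 1  # stand-in for building a real Tick artist
        return ("major" if major else "minor") + "-tick"

    def reset_ticks(self):
        # Drop any instantiated ticks; the lazy descriptor takes over again.
        self.__dict__.pop("majorTicks", None)
        self.__dict__.pop("minorTicks", None)
```

Bare Axes creation then never pays for Tick construction; the cost moves to the first real access of majorTicks/minorTicks.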

@efiring (Member) commented Nov 8, 2017

We have two overlapping PRs, #8626 and #8752, that will improve the situation quite a bit by eliminating most of the unnecessary cla calls. Based on timing in the former, that will give a speedup of a factor of 4. I hope we can get this functionality merged soon.

We would certainly be interested to see a PR with your proposal as well, which is orthogonal to the cla-elimination PRs so it should provide additional speedup in cases where ticks are not used at all.

@michalkahle:
I have run into the same problem with 384 subplots. I solved it with this hack (thanks @tacaswell) and got down from 20 minutes to 2 seconds. Not sure if it helps anybody, but I have measured that the time to produce a figure grows exponentially(!) with the number of subplots.

[plot: figure-creation time vs. number of subplots]

@efiring (Member) commented Jan 5, 2018

I'm seeing linear, not exponential, growth in the time to create multiple subplots (with mpl very close to current master):

In [1]: import matplotlib.pyplot as plt

In [2]: %timeit plt.subplots(2, 6); plt.close()
342 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit plt.subplots(2, 12); plt.close()
1.09 s ± 4.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit plt.subplots(2, 24); plt.close()
2.2 s ± 6.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit plt.subplots(4, 24); plt.close()
4.42 s ± 9.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit plt.subplots(8, 24); plt.close()
8.98 s ± 53.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@efiring (Member) commented Jan 5, 2018

@michalkahle, can you provide a minimal and self-contained test case that reproduces something like your curve?

@jklymak (Member) commented Jan 5, 2018

2.0.2:

In [2]:  %timeit plt.subplots(2, 24); plt.close()
1.72 s ± 17.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]:  %timeit plt.subplots(4, 24); plt.close()
3.6 s ± 32.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]:  %timeit plt.subplots(8, 24); plt.close()
10.3 s ± 746 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In 2.1.1

In [4]:  %timeit plt.subplots(2, 24); plt.close()
1.28 s ± 139 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit plt.subplots(4, 24); plt.close()
2.23 s ± 32.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]:  %timeit plt.subplots(8, 24); plt.close()
4.61 s ± 57.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

master is the same as 2.1.1 for this test...

My guess would be that @michalkahle is using 2.0.2?

@michalkahle:
Hi, my mpl is 2.1.0.

import math
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from time import time

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 10))

def test_grid(n, s):
    nrows = int(math.sqrt(n))
    ncols = math.ceil(n / nrows)
    fig, ax = plt.subplots(
        nrows=nrows, ncols=ncols, figsize=(24, 16), sharex=True, sharey=True, squeeze=False)
    for a in range(n):
        row, col = divmod(a, ncols)
        s.plot(ax=ax[row,col])
    plt.close()

results = []
for n in range(1, 150, 10):
    print(n)
    start = time()
    test_grid(n, s)
    results.append((n, time() - start))

pd.DataFrame(results, columns=['n', 'time [s]']).set_index('n').plot()

[plot: figure-creation time vs. number of subplots]

@jklymak (Member) commented Jan 5, 2018

It's the sharex and sharey that's doing it. If I take those out:

[plot: timing without sharex/sharey]

@efiring (Member) commented Jan 5, 2018

It's not so much the sharing, it's the way Pandas is handling the sharing. Removing Pandas and leaving the sharing speeds it up greatly. Removing the sharing speeds it up even more.
[plot: many_axes_plain.png, shared vs. unshared timing]

import matplotlib.pyplot as plt
import numpy as np
from time import time


def test_grid(n, share):
    start = time()
    nrows = int(np.sqrt(n))
    ncols = int(np.ceil(n / nrows))
    kw = dict(sharex=True, sharey=True) if share else {}
    fig, ax = plt.subplots(
        nrows=nrows, ncols=ncols, figsize=(24, 16), squeeze=False, **kw)
    for i in range(nrows):
        for j in range(ncols):
            ax[i, j].plot([1,2,3])
    plt.close()
    return nrows * ncols, time() - start

nums = list(range(1, 151, 10))
times = np.zeros((len(nums), 2), dtype=float)
naxes = np.zeros((len(nums),), dtype=int)
for i, share in enumerate((True, False)):
    for j, n in enumerate(nums):
        na, duration = test_grid(n, share)
        times[j, i] = duration
        naxes[j] = na
        print("%5s   %3d   %7.2f" % (share, na, duration))

fig, ax = plt.subplots()
lines = ax.plot(naxes, times)
ax.set_ylabel('time (s)')
ax.set_xlabel('number of Axes')
ax.legend(lines, ['shared', 'unshared'])
fig.savefig('many_axes_plain.png')

@michalkahle:
Yes, you are right.
[plot: timing results]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from time import time

array = np.random.randn(10).cumsum()
series = pd.Series(array)

def test_subplots(n, data):
    start = time()
    fig, ax = plt.subplots(nrows=1, ncols=n, sharex=True, sharey=True, squeeze=False)
    for i in range(n):
        if isinstance(data, pd.Series):
            data.plot(ax=ax[0, i])
        else:
            ax[0, i].plot(data)
    plt.close()
    return n, time() - start

@QuLogic (Member) commented Feb 5, 2018

So #8626 was applied and then reverted due to some issues with polar stuff. For 1-, 2-, and 10-square subplots, timings are 118.42ms, 374.95ms, and 3.91s, respectively. In the meantime, #10302 was merged, which drops time to 79.00ms, 284.09ms, and 2.90s, respectively. You can see that change in the benchmarks. master also includes #10327, which nets 77.55ms, 279.91ms, and 2.84s.

If we re-apply #8626, we could perhaps save another 2-4% at 77.03ms, 268.03ms, and 2.77s. But unfortunately, it's still breaking polar plots somehow.

@TomNicholas:
What is the status of this? I'm using matplotlib in a Jupyter notebook and it seems that plt.subplots() takes longer than processing GBs of data does! Is there another reason why Jupyter in particular would be very slow?

@timhoffm (Member) commented Jan 16, 2019

We got a speedup of a factor of 4 by lazily instantiating ticks (#9727 / #10302). However, the essential problems still persist. One is the tick handling, which treats each tick as a separate object. The other is the layered architecture and its initialization/update mechanism. Both are hard to change.

@efiring (Member) commented Jan 16, 2019

There is certainly more work to do on this, but it is usually not noticeable when working interactively unless you are making a plot with a large number of axes. Is this your use case, @TomNicholas? If not, I wonder whether you might be running into some other problem.

@rth (Contributor) commented Jan 22, 2019

> There is certainly more work to do on this, but it is usually not noticeable when working interactively unless you are making a plot with a large number of axes.

Another place where this might have a significant impact is animations / interactive plot updates. Blitting is a workaround, but from a user's perspective it's sometimes easier to just redraw everything, in which case axes instantiation time will be non-negligible.

@aasensio:
I'm seeing the same slow behavior when doing a bunch of imshow calls in a figure in Jupyter, and I don't know whether the issue belongs here or to Jupyter. Something as simple as this takes almost 18 s on my computer. The larger the image, the longer it takes:

import numpy as np
import matplotlib.pyplot as plt

f, ax = plt.subplots(nrows=3, ncols=3, figsize=(13, 8))
for i in range(3):
    for j in range(3):
        ax[i, j].imshow(np.random.randn(100, 100))

@Kojoley (Member, author) commented Feb 12, 2019

> Another place where this might have a significant impact is animations / interactive plot updates. Blitting is a workaround, but from a user's perspective it's sometimes easier to just redraw everything, in which case axes instantiation time will be non-negligible.

The original issue is about axes creation; redrawing is a different problem. There is an RGBA<->BGRA conversion which takes a considerable amount of time if you do hundreds of redraws per second. (I have a patch to make it faster, but it is actually three patches, one for each of GCC, Clang, and MSVC, because their optimizers and vectorizers are far from good.)

> The larger the image, the longer it takes

I tried your code and it takes 600 ms for me.
The same comment as for the case above applies here. In other words, the original issue was about the plt.subplots call of your example. Axes instantiation time does not depend on plot size, so you are experiencing some other issue.

@timhoffm (Member):

@aasensio which matplotlib version? Also please time with

%%timeit -n1 -r1 f, ax = plt.subplots(nrows=3, ncols=3, figsize=(13,8))
for i in range(3):
    for j in range(3):
        ax[i,j].imshow(np.random.randn(100,100))

and

%%timeit -n1 -r1
f, ax = plt.subplots(nrows=3, ncols=3, figsize=(13,8))
for i in range(3):
    for j in range(3):
        ax[i,j].imshow(np.random.randn(100,100))

to see which fraction of the time is covered by the subplot creation.

@aasensio:
@timhoffm My version of matplotlib is 3.0.2, running with notebook server 5.7.4. The timing results are 684 ms for the subplots creation only and 12.7 s when the imshows are added. I'm starting to suspect that it is somehow related to the server-client I/O on my system, which might be firewalled. I'll investigate that.

@dstansby (Member) commented Dec 30, 2021

I think the original issue is solved now; at least for me, the performance bottlenecks mentioned at the top are not present in the current main branch. I'm going to close this to avoid it becoming a general posting place for performance issues, and recommend that others open new, more specific issues if they notice any performance problems.

@Kojoley (Member, author) commented Dec 30, 2021

I would not say it is solved, but the situation has indeed improved: from 'very slow' it became just 'slow'. #6664 (comment) implied that there could be a 13-14x improvement, while #10302 gave 3-4x, so there should still be some bottlenecks. Also, there have been performance regressions since 2.2.5, totaling a 1.6x slowdown:

CPython   matplotlib   Axes creation
2.7       1.5.1        22 ms
2.7       1.5.3        22.1 ms
2.7       2.0.2        25.2 ms
2.7       2.1.2        17.9 ms
2.7       2.2.0        4.9 ms
2.7       2.2.2        5.95 ms
2.7       2.2.5        5.98 ms
3.8       2.2.5        4.65 ms
3.8       3.1.3        5.48 ms
3.8       3.2.2        5.78 ms
3.8       3.3.4        8.51 ms
3.8       3.5.1        7.29 ms
3.10      3.5.1        7.37 ms
[plot: Axes creation time across versions]

@jklymak (Member) commented Dec 30, 2021

@Kojoley I think the point here is that it's pretty hard to track this in a really old issue. For sure, if we think there are current bottlenecks that could be fixed, we should fix them.
