A few docs take orders of magnitude longer than others #3448

kastnerkyle · 2014-07-20T10:11:09Z

Various Agglomerative Clustering on a 2D embedding of digits
SNIP
ward : 96.95s
average : 96.07s
complete : 97.23s
 - time elapsed : 3.2e+02 sec

Is there any way we can reduce the time it takes to make this doc? It is way, way slower than the rest. For this plot, it is 320 seconds, versus approximately 90 seconds for the next worst plot (LARS image denoising... which is next on my list to look at). All others besides a few seem to be in the 2 to 10 second range.

kastnerkyle · 2014-07-20T11:35:09Z

Fuel for the fire...

make html 2>&1 | tee log.log
grep 'time elapsed' log.log | cut -d : -f 2 | cut -d ' ' -f 2 | cut -d 's' -f 1 > times.txt
grep 'plot_' log.log | grep -v '\[' | grep -v example | grep -v File | cut -d ' ' -f 2 > names.txt

Then run this script:

import numpy as np
import matplotlib.pyplot as plt

# One elapsed time (in seconds) per line, one example name per line.
times = np.loadtxt('times.txt')
with open('names.txt') as f:
    names = [line.strip() for line in f]
sorted_indices = np.argsort(times)

# Histogram of per-example times, with the slowest example labelled.
n, bins, patches = plt.hist(times, bins=100, color='steelblue')
plt.annotate(names[sorted_indices[-1]],
             xy=(bins[-1], n[-1]),
             xytext=(-25, 25),
             xycoords='data',
             textcoords='offset points',
             arrowprops=dict(arrowstyle="->",
                             connectionstyle="arc3,rad=0.3"))
plt.title("Histogram of document/example times")
plt.xlabel("Time (s)")
plt.ylabel("Count")

# Cumulative time, fastest examples first.
plt.figure()
plt.plot(np.cumsum(np.sort(times)), color='steelblue')
plt.title("Cumulative sum of document/example times")
plt.xlabel("Test count")
plt.ylabel("Time (s), total %i seconds" % times.sum())
plt.show()

arjoly · 2014-07-20T12:58:03Z

Wow, 20 of the 140 examples are taking around 1000s (71%) of the total time.
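A figure like this can be recomputed from the same `times.txt` produced by the commands above. The following is only a minimal sketch: the helper names `load_times` and `top_k_share` are made up for illustration, and it assumes one time in seconds per line.

```python
def load_times(path):
    """Read one float per non-empty line (the assumed times.txt layout)."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def top_k_share(times, k):
    """Return (seconds spent in the k slowest entries, fraction of total)."""
    ordered = sorted((float(t) for t in times), reverse=True)
    total = sum(ordered)
    top = sum(ordered[:k])
    # Guard against an empty or all-zero log.
    share = top / total if total > 0 else 0.0
    return top, share
```

For example, `top_k_share(load_times('times.txt'), 20)` would give the seconds and the fraction of the build spent in the 20 slowest examples.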

GaelVaroquaux · 2014-07-23T15:31:02Z

> time elapsed : 3.2e+02 sec

Interesting, it is much slower than on my box: it takes 90s on my box,
which is more acceptable, but still slow.

> Is there any way we can reduce the time it takes to make this doc?

Reducing it (for instance by reducing the number of points) will no
longer allow us to show the percolation behavior, and thus will reduce
the visible difference between the various clustering algorithms.

That said, 320s is too long. We might have to do something, if other
people can confirm that it takes so long.

kastnerkyle · 2014-07-23T15:35:23Z

Well, my laptop is pretty old, but even 90s seems long to me :) . It would be interesting to see on another box whether the distribution is still the same, or whether that particular example is just a result of poor model optimization on old hardware.

larsmans · 2014-07-30T08:52:31Z

Bumming the actual code is an option too ;)

larsmans · 2014-10-19T15:43:56Z

I just checked and the example is spending ca. 100% of its time in scipy.cluster. Different SciPy versions can explain the timing differences. If we want to speed this stuff up, we have to fork scipy.cluster.
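One way to get this kind of breakdown, not necessarily how the figure above was obtained, is the standard-library `cProfile`. The sketch below sums each function's own time per source file; the helper name `time_by_file` is made up for illustration, and the callable to profile is whatever runs the example.

```python
import cProfile
import pstats


def time_by_file(func, *args, **kwargs):
    """Profile func(*args, **kwargs); return [(filename, own_seconds), ...].

    The list is sorted with the most expensive file first. Built-in
    functions are reported by cProfile under the pseudo-filename '~'.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    per_file = {}
    # Keys are (filename, line, function name); the third value is the
    # time spent in the function itself, excluding its callees.
    for (filename, _line, _name), entry in stats.stats.items():
        per_file[filename] = per_file.get(filename, 0.0) + entry[2]
    return sorted(per_file.items(), key=lambda kv: kv[1], reverse=True)
```

Filtering the result for paths that contain `scipy/cluster` and dividing by the total would give the share of time spent there. Time inside compiled extension code shows up under `'~'` rather than under a file path, so it has to be read with some care.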

amueller · 2016-10-27T21:02:58Z

takes <1s on master.

kastnerkyle added Documentation labels Jul 20, 2014

kastnerkyle changed the title ~~Agglomerative clustering doc takes orders of magnitude longer than others~~ A few docs take orders of magnitude longer than others Jul 20, 2014

amueller added the Need Contributor label Oct 27, 2016

amueller closed this as completed Oct 27, 2016
