A few docs take orders of magnitude longer than others #3448

Closed
opened this Issue Jul 20, 2014 · 7 comments

Member

kastnerkyle commented Jul 20, 2014

```
Various Agglomerative Clustering on a 2D embedding of digits
SNIP
ward : 96.95s
average : 96.07s
complete : 97.23s
 - time elapsed : 3.2e+02 sec
```

Is there any way we can reduce the time it takes to build this doc? It is way, way slower than the rest: 320 seconds for this plot, versus approximately 90 seconds for the next-worst plot (LARS image denoising, which is next on my list to look at). Almost all the others are in the 2 to 10 second range.
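For context, the three timings above come from the ward/average/complete linkages in the agglomerative clustering example. A minimal, self-contained sketch of that kind of benchmark, using `scipy.cluster.hierarchy.linkage` on random 2D points as a stand-in for the actual digits embedding (the function name `bench_linkages` and the data are illustrative, not from the example itself):

```python
import time

import numpy as np
from scipy.cluster.hierarchy import linkage


def bench_linkages(n_samples=500, seed=0):
    """Time ward/average/complete linkage on random 2D points.

    Random data stands in for the digits embedding used by the real
    example; the point is only to see how the time scales with n_samples.
    """
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, 2)
    timings = {}
    for method in ("ward", "average", "complete"):
        start = time.time()
        linkage(X, method=method)
        timings[method] = time.time() - start
    return timings


if __name__ == "__main__":
    for method, elapsed in bench_linkages().items():
        print("%s : %.2fs" % (method, elapsed))
```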

Member

kastnerkyle commented Jul 20, 2014

Fuel for the fire...

```
make html 2>&1 | tee log.log
grep 'time elapsed' log.log | cut -d : -f 2 | cut -d ' ' -f 2 | cut -d 's' -f 1 > times.txt
grep 'plot_' log.log | grep -v '\[' | grep -v example | grep -v File | cut -d ' ' -f 2 > names.txt
```

then run this script

```python
import numpy as np
import matplotlib.pyplot as plt

times = np.loadtxt('times.txt')
with open('names.txt') as f:
    names = [l.strip() for l in f.readlines()]
sorted_indices = np.argsort(times)

n, bins, patches = plt.hist(times, bins=100, color='steelblue')
plt.annotate(names[sorted_indices[-1]], xy=(bins[-1], n[-1]),
             xytext=(-25, 25), xycoords='data', textcoords='offset points',
             arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.3"))
plt.title("Histogram of document/example times")
plt.xlabel("Time (s)")
plt.ylabel("Count")

plt.figure()
plt.plot(np.cumsum(np.sort(times)), color='steelblue')
plt.title("Cumulative sum of document/example times")
plt.xlabel("Test count")
plt.ylabel("Time (s), total %i seconds" % times.sum())
plt.show()
```

Member

arjoly commented Jul 20, 2014

Wow, 20 examples out of 140 are taking around 1000s (71%) of the total time.
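The arithmetic behind that share is just a sort and a cumulative sum over `times.txt`. A sketch with synthetic numbers standing in for the real measurements (the 120/20 split and the values are assumptions, not the actual data):

```python
import numpy as np

# Synthetic stand-in for times.txt: 120 fast examples and 20 slow ones.
# The real file is produced by the grep pipeline in the comment above.
times = np.concatenate([np.full(120, 3.0), np.full(20, 50.0)])

# Share of total build time taken by the 20 slowest examples.
slowest_20 = np.sort(times)[::-1][:20]
share = slowest_20.sum() / times.sum()
print("slowest 20 of %d examples: %.0f%% of total time" % (len(times), 100 * share))
```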
Member

GaelVaroquaux commented Jul 23, 2014

> time elapsed : 3.2e+02 sec

Interesting, it is much slower than on my box, where it takes 90s; that is more acceptable, but still slow.

> Is there any way we can reduce the time it takes to make this doc?

Reducing it (for instance by reducing the number of points) would no longer show the percolation behavior, and would thus reduce the difference between the various clustering algorithms. That said, 320s is too long. We might have to do something, if other people can confirm that it takes that long.
Member

kastnerkyle commented Jul 23, 2014

Well, my laptop is pretty old, but even 90s seems long to me :). It would be interesting to see whether the distribution is the same on another box, or if that particular example is just the result of poor model optimization on old hardware.
Member

larsmans commented Jul 30, 2014

 Bumming the actual code is an option too ;)
Member

larsmans commented Oct 19, 2014

I just checked, and the example is spending ca. 100% of its time in `scipy.cluster`. Different SciPy versions can explain the timing differences. If we want to speed this stuff up, we would have to fork `scipy.cluster`.
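A finding like that is easy to reproduce with `cProfile`. A hedged sketch profiling a single `scipy.cluster.hierarchy.linkage` call on random data (not the actual example script): the cumulative column shows how much of the run lives inside `scipy.cluster`.

```python
import cProfile
import io
import pstats

import numpy as np
from scipy.cluster.hierarchy import linkage

# Random 2D points stand in for the digits embedding.
rng = np.random.RandomState(0)
X = rng.randn(300, 2)

# Profile one ward linkage and print the top entries by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
linkage(X, method="ward")
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Since `linkage` is the outermost profiled call, it tops the cumulative-time ranking; anything below it shows where that time actually goes.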

Member

amueller commented Oct 27, 2016

 takes <1s on master.