Various Agglomerative Clustering on a 2D embedding of digits
ward : 96.95s
average : 96.07s
complete : 97.23s
- time elapsed : 3.2e+02 sec
Is there any way we can reduce the time it takes to make this doc? It is way, way slower than the rest. For this plot, it is 320 seconds, versus approximately 90 seconds for the next worst plot (LARS image denoising... which is next on my list to look at). All others besides a few seem to be in the 2 to 10 second range.
Fuel for the fire...
make html 2>&1 | tee log.log
grep 'time elapsed' log.log | cut -d : -f 2 | cut -d ' ' -f 2 | cut -d 's' -f 1 > times.txt
grep 'plot_' log.log | grep -v '\[' | grep -v example | grep -v File | cut -d ' ' -f 2 > names.txt
then run this script:

import numpy as np
import matplotlib.pyplot as plt

times = np.loadtxt('times.txt')
with open('names.txt') as f:
    names = [l.strip() for l in f.readlines()]
sorted_indices = np.argsort(times)

# Histogram of per-example build times
plt.figure()
plt.hist(times, bins=100, color='steelblue')
plt.title("Histogram of document/example times")
plt.xlabel("Time (s)")

# Cumulative sum of the sorted times shows how few examples dominate
plt.figure()
plt.plot(np.cumsum(times[sorted_indices]))
plt.title("Cumulative sum of document/example times")
plt.ylabel("Time (s), total %i seconds" % times.sum())
plt.show()
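The grep/cut pipeline above can also be sketched directly in Python, which avoids the intermediate files. This is only a sketch: the `parse_times` helper is hypothetical, and it assumes each timing line in the log looks like `- time elapsed : 3.2e+02 sec`, as in the excerpt at the top of this thread.

```python
import re

def parse_times(log_text):
    """Extract elapsed times (in seconds) from build-log lines of the
    form '- time elapsed : 3.2e+02 sec'."""
    pattern = re.compile(r'time elapsed\s*:\s*([0-9.eE+-]+)\s*sec')
    return [float(m.group(1)) for m in pattern.finditer(log_text)]

log = """plot_foo.py
- time elapsed : 3.2e+02 sec
plot_bar.py
- time elapsed : 4.5 sec
"""
print(parse_times(log))  # -> [320.0, 4.5]
```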
Wow, 20 examples out of 140 are taking around 1000s (71%) of the total time.
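That kind of figure is easy to recompute from the times array: sort descending, sum the top 20, and divide by the total. Synthetic numbers stand in for the real data here; with the actual `times.txt` you would just replace the first lines with `np.loadtxt('times.txt')`.

```python
import numpy as np

# Synthetic stand-in for the ~140 example timings: a handful of slow
# outliers plus many fast examples (replace with np.loadtxt('times.txt')).
rng = np.random.default_rng(0)
times = np.concatenate([rng.uniform(40, 320, 20),   # 20 slow examples
                        rng.uniform(2, 10, 120)])   # 120 fast ones

order = np.argsort(times)[::-1]          # slowest first
top20 = times[order[:20]].sum()
share = top20 / times.sum()
print("top 20 examples: %.0fs, %.0f%% of total" % (top20, 100 * share))
```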
Well, my laptop is pretty old, but even 90s seems long to me :). It would be interesting to see on another box whether the distribution is still the same, or whether that particular test is just a result of poor optimization showing up on old hardware.
Bumming the actual code is an option too ;)
I just checked and the example is spending ca. 100% of its time in scipy.cluster. Different SciPy versions can explain the timing differences. If we want to speed this stuff up, we have to fork scipy.cluster.
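For the record, the "ca. 100% in scipy.cluster" kind of claim can be verified with a cProfile run sorted by cumulative time, which names the dominant call directly. Below is a minimal sketch of that workflow; the `slow_clustering_stand_in` function is a made-up stand-in for the expensive call (e.g. the linkage computation), since the point is the profiling pattern, not the workload.

```python
import cProfile
import io
import pstats

def slow_clustering_stand_in(n=100):
    """Hypothetical stand-in for the expensive clustering call:
    a deliberately slow pure-Python nested loop."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total += (i - j) * k
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_clustering_stand_in()
profiler.disable()

# Print the five heaviest entries by cumulative time; the top entry
# names the function where the time is actually spent.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```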