A few docs take orders of magnitude longer than others #3448

Closed
kastnerkyle opened this issue Jul 20, 2014 · 7 comments
Labels: Documentation, Easy (Well-defined and straightforward way to resolve)

Comments

@kastnerkyle (Member)

Various Agglomerative Clustering on a 2D embedding of digits
SNIP
ward : 96.95s
average : 96.07s
complete : 97.23s
 - time elapsed : 3.2e+02 sec

Is there any way we can reduce the time it takes to make this doc? It is way, way slower than the rest. This plot takes 320 seconds, versus approximately 90 seconds for the next worst plot (LARS image denoising... which is next on my list to look at). Almost all of the others are in the 2 to 10 second range.
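One possible lever, sketched below under an assumption: if the example starts from the full sklearn.datasets.load_digits() set (~1800 points), subsampling before clustering should cut the runtime a lot, since the linkage computations scale superlinearly with the number of points.

import numpy as np
from sklearn.datasets import load_digits

# Assumption: the example clusters all of load_digits(); keeping a
# random subset instead would shrink the dominant linkage cost.
digits = load_digits()
rng = np.random.RandomState(0)
subset = rng.permutation(len(digits.data))[:500]
X, y = digits.data[subset], digits.target[subset]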

@kastnerkyle (Member Author)

Fuel for the fire...

[attached figures: example_time_hist, cumulative_sum_times]

# Build the docs, capture the log, then extract per-example times and names:
make html 2>&1 | tee log.log
grep 'time elapsed' log.log | cut -d : -f 2 | cut -d ' ' -f 2 | cut -d 's' -f 1 > times.txt
grep 'plot_' log.log | grep -v '\[' | grep -v example | grep -v File | cut -d ' ' -f 2 > names.txt
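
For reference, a rough Python equivalent of those cut chains (a sketch, assuming the same log format the greps expect):

import re

times, names = [], []
with open('log.log') as f:
    for line in f:
        if 'time elapsed' in line:
            # e.g. " - time elapsed : 3.2e+02 sec"
            times.append(float(re.search(r':\s*([\d.eE+-]+)\s*sec', line).group(1)))
        elif ('plot_' in line and '[' not in line
              and 'example' not in line and 'File' not in line):
            # second space-separated field, as in the cut above
            names.append(line.split(' ')[1])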

then run this script

import numpy as np
import matplotlib.pyplot as plt

# Load the per-example times and names extracted from the build log.
times = np.loadtxt('times.txt')
with open('names.txt') as f:
    names = [l.strip() for l in f]
sorted_indices = np.argsort(times)

# Histogram of example times, with the slowest example labeled.
n, bins, patches = plt.hist(times, bins=100, color='steelblue')
plt.annotate(names[sorted_indices[-1]],
             xy=(bins[-1], n[-1]),
             xytext=(-25, 25),
             xycoords='data',
             textcoords='offset points',
             arrowprops=dict(arrowstyle="->",
                             connectionstyle="arc3,rad=0.3"))
plt.title("Histogram of document/example times")
plt.xlabel("Time (s)")
plt.ylabel("Count")

# Cumulative sum of the sorted times: a steep tail means a few
# examples dominate the total build time.
plt.figure()
plt.plot(np.cumsum(np.sort(times)), color='steelblue')
plt.title("Cumulative sum of document/example times")
plt.xlabel("Example count")
plt.ylabel("Time (s), total %i seconds" % times.sum())
plt.show()
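
A small extension (not in the original script) that prints the worst offenders directly, reusing the arrays above:

# Ten slowest examples, slowest first.
for idx in sorted_indices[::-1][:10]:
    print("%8.2fs  %s" % (times[idx], names[idx]))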

@kastnerkyle kastnerkyle changed the title Agglomerative clustering doc takes orders of magnitude longer than others A few docs take orders of magnitude longer than others Jul 20, 2014
@arjoly (Member) commented Jul 20, 2014

Wow, 20 of the 140 examples take around 1000s, i.e. 71% of the total time.
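
That figure is easy to re-check from the times.txt produced above; a sketch:

import numpy as np

# Share of total build time taken by the 20 slowest examples.
times = np.sort(np.loadtxt('times.txt'))
top20 = times[-20:].sum()
print("top 20: %.0fs of %.0fs (%.0f%%)"
      % (top20, times.sum(), 100 * top20 / times.sum()))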

@GaelVaroquaux (Member)

> time elapsed : 3.2e+02 sec

Interesting, it is much slower than on my box: there it takes 90s,
which is more acceptable, but still slow.

> Is there any way we can reduce the time it takes to make this doc?

Reducing it (for instance by reducing the number of points) will no
longer show the percolation behavior, and thus will reduce the
difference between the various clustering algorithms.

That said, 320s is too long. We might have to do something if other
people can confirm that it takes that long.

@kastnerkyle (Member Author)

Well, my laptop is pretty old, but even 90s seems long to me :). It would be interesting to see on another box if the distribution is still the same, or if that particular test is just a result of poor model optimization on old hardware.

@larsmans (Member)

Bumming the actual code is an option too ;)

@larsmans (Member)

I just checked and the example is spending ca. 100% of its time in scipy.cluster. Different SciPy versions can explain the timing differences. If we want to speed this stuff up, we have to fork scipy.cluster.
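
For anyone wanting to reproduce that check, a sketch (the example filename is a placeholder, and module paths will vary with the SciPy version):

import cProfile
import pstats

# Profile the example script and list cumulative time spent in scipy.
cProfile.run('exec(open("plot_digits_linkage.py").read())', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats('scipy')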

@amueller (Member)

takes <1s on master.
