
Performance regression in 0.2 #38

Open · ahirner opened this issue Jan 28, 2018 · 3 comments

Comments

ahirner commented Jan 28, 2018

I pip-updated from 0.1.3 to 0.2. Two of our sample workloads took a significant performance hit: reducing 480x13500 to 80x13500 ran in 2:24 instead of 1:14, and reducing 480x6700 to 80x6700 took 1:49 instead of 0:28.

Alongside updating umap-learn, other libraries got a bump (llvmlite 0.2 to 0.21, numba 0.35.0 to 0.36.2). Neither of those affected running times. After downgrading to 0.1.3, I got the former numbers.

I saw that this commit disabled jitting for fuzzy_simplicial_set. Could this or anything else cause this regression?

@lmcinnes (Owner) commented
I suspect small dataset sizes are the issue here. The changes made for 0.2 were largely targeted at large dataset sizes, and at correcting some issues in the resulting embedding. These changes are fundamentally necessary, but they may result in less performant (though more accurate!) results for small numbers of points, particularly when reducing to larger embedding dimensions as you are doing here.

Long story short: I think this may simply be a necessary performance regression for the kinds of data you have here. Sorry.

ahirner commented Jan 28, 2018

That's interesting. It turns out that processing many small chunks of data works pretty well for our domain. I'll have a look at the qualitative difference. So should this issue be closed?

@lmcinnes (Owner) commented
Leave it open for now -- I would like to be able to resolve such issues if I can, and perhaps with more time I might come up with an approach that could make this better. In the meantime you can use the new n_epochs parameter to speed up training time (at some loss of accuracy). For the dataset sizes you have I believe the effective default is 500; you could try dropping it to 200 and see if that helps.
