Performance regression in 0.2 #38
I suspect small dataset sizes are the issue here. The changes made for 0.2 were largely targeted at large dataset sizes and at correcting some issues in the resulting embedding. These changes are fundamentally necessary, but they may result in less performant (but more accurate!) results for small numbers of points, particularly when reducing to larger embedding dimensions as you are doing here. Long story short: I think this may simply be a necessary performance regression for the kinds of data you have here. Sorry.
That's interesting. It turned out that organizing small chunks of data many times works pretty well for our domain. I will have a look at the qualitative difference. So should this issue be closed?
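The chunked workflow described in this comment can be sketched as follows. This is an illustration only: `reduce_chunk` uses a simple SVD-based projection via NumPy as a hypothetical stand-in for `umap.UMAP(...).fit_transform`, since the point here is the chunking pattern, not the reducer.

```python
import numpy as np

def reduce_chunk(chunk, n_components=80):
    # Hypothetical stand-in for umap.UMAP(n_components=...).fit_transform(chunk):
    # project the chunk onto its top principal directions via SVD.
    centered = chunk - chunk.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Split a 480-feature dataset into small chunks and reduce each independently.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 480))
chunks = np.array_split(data, 10)   # ten chunks of 100 rows each
reduced = [reduce_chunk(c, n_components=80) for c in chunks]
```

Each chunk is embedded on its own, which keeps the per-call dataset small, the regime this thread suggests got slower in 0.2.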
Leave it open for now -- I would like to be able to resolve such issues if I can, and perhaps with more time I might come up with an approach that could make this better. In the meantime you can use the new …
I was pip-updating from 0.1.3 to 0.2. Two of our sample workloads took a significant performance hit: reducing `480x13500` to `80x13500` ran 2:24 instead of 1:14, and reducing `480x6700` to `80x6700` took 1:49 instead of 0:28. Alongside updating umap-learn, other libraries got a bump (llvmlite 0.2 to 0.21, numba 0.35.0 to 0.36.2). Neither of those affected running times. After downgrading to 0.1.3, I got the former numbers.
I saw that this commit disabled jitting for `fuzzy_simplical_set`. Could this or anything else cause this regression?
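Disabling jitting on a hot function can plausibly account for a slowdown of this magnitude. The sketch below illustrates the general effect with a pairwise squared-distance kernel, comparing an interpreted Python loop against a vectorized NumPy version (used here as a rough proxy for jit-compiled speed); the exact ratio varies by machine, and this kernel is an illustration, not the code from the commit.

```python
import time
import numpy as np

def pairwise_sq_dists_python(x):
    # Interpreted nested loops: the kind of cost a jit-compiled kernel avoids.
    n, d = x.shape
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(d):
                diff = x[i][k] - x[j][k]
                s += diff * diff
            out[i][j] = s
    return np.asarray(out)

def pairwise_sq_dists_vectorized(x):
    # Broadcasting-based version, closer to compiled speed.
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10))

t0 = time.perf_counter()
a = pairwise_sq_dists_python(x)
t1 = time.perf_counter()
b = pairwise_sq_dists_vectorized(x)
t2 = time.perf_counter()
print(f"python loop: {t1 - t0:.3f}s, vectorized: {t2 - t1:.3f}s")
```

Both versions compute the same matrix, so any timing gap comes purely from interpretation overhead, which is what falls back into play when a jit decorator is removed.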