[Question] What's the scaling complexity? #8
The complexity is essentially bounded by the nearest neighbor search phase. By making use of algorithms for approximate nearest neighbor graph construction I can get the scaling with input dimension down to essentially linear (effectively scaling with the complexity of the distance computation). With regard to the number of samples the scaling is a little harder to pin down (the algorithm I'm using, NN-descent, doesn't have a published theoretical complexity) and is, of course, dataset dependent (i.e. it will depend on properties of the distribution of the samples). Empirically I believe it should be something like O(d*n^1.14) or so. A better theoretical analysis of the complexity is definitely on my list of things to do, but it's currently a long list.
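For anyone who wants to sanity-check that estimate on their own machine, a rough sketch along these lines could fit the empirical exponent. The sample sizes, dimension, Gaussian test data and n_neighbors setting here are arbitrary choices of mine, and it times the whole UMAP pipeline rather than isolating the nearest neighbor search, so treat the result as indicative only:

```python
# Sketch: time UMAP at several sample sizes and fit the slope of
# log(time) against log(n) to estimate the scaling exponent.
import time
import numpy as np
import umap

rng = np.random.default_rng(42)
sizes = [2_000, 4_000, 8_000, 16_000, 32_000]
d = 50
times = []
for n in sizes:
    X = rng.normal(size=(n, d))
    start = time.perf_counter()
    umap.UMAP(n_neighbors=15, random_state=42).fit_transform(X)
    times.append(time.perf_counter() - start)

# Slope of the log-log fit approximates alpha in time ~ n^alpha.
alpha, _ = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"empirical scaling exponent ~ n^{alpha:.2f}")
```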
Thanks a lot!
O(d*n^1.14) sounds almost too good to be true, doesn't it?
It certainly does, but it is an empirical estimate of an average case complexity. I have not seen any complete estimates of NNDescent complexity. Some work by Richard Darling and Jacob Baron demonstrates both some cases where NNDescent will fail (with complexity greater than O(N^2)), and classes of problem for which a complexity of O(N log N) is provable. The catch is that both are relatively theoretical cases: the first involves metrics that fail friend-of-a-friend principles, which is not the case for most of the metrics UMAP would be used with; the second is a fairly constrained class of problem that one is unlikely to see in practice (but for which complexity is at least provable).
So, given that worst cases do exist, I guess the real answer (though it is a highly deceptive one, and would equally apply to *any* algorithm that requires nearest neighbor computations) is O(N^2). For practical purposes it certainly empirically scales at O(N log N), and there are certainly problem classes for which that is the provable complexity, so perhaps that is a reasonable estimate?
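For intuition about why the friend-of-a-friend principle matters, here is a heavily simplified sketch of the NN-descent idea (my own illustrative code, not the implementation UMAP uses; it omits reverse neighbors, candidate sampling and all the real optimizations). Each round proposes neighbors-of-neighbors as candidates, which only pays off if a neighbor of a neighbor is in fact likely to be a neighbor:

```python
import numpy as np

def nn_descent_sketch(X, k=10, n_iters=10, seed=None):
    """Crude NN-descent sketch: refine a random k-NN guess via neighbors of neighbors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from a random guess at each point's k nearest neighbors (excluding itself).
    neighbors = np.stack([
        rng.choice(np.delete(np.arange(n), i), size=k, replace=False)
        for i in range(n)
    ])
    dists = np.array([[np.linalg.norm(X[i] - X[j]) for j in neighbors[i]]
                      for i in range(n)])
    for _ in range(n_iters):
        updated = False
        for i in range(n):
            # Friend-of-a-friend step: candidates are neighbors of i's current neighbors.
            candidates = np.unique(neighbors[neighbors[i]].ravel())
            for c in candidates:
                if c == i or c in neighbors[i]:
                    continue
                d = np.linalg.norm(X[i] - X[c])
                worst = dists[i].argmax()
                if d < dists[i, worst]:
                    neighbors[i, worst] = c
                    dists[i, worst] = d
                    updated = True
        if not updated:  # no neighbor list improved this round; converged
            break
    return neighbors, dists

# Example usage on toy data.
X = np.random.default_rng(0).normal(size=(500, 10))
approx_nbrs, approx_dists = nn_descent_sketch(X, k=10, seed=0)
```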
Great response Leland. I think this highlights a fundamental problem with O() notation. It's a worst case complexity which is potentially the tail end of a distribution of complexities and may not represent real world scaling in practice (even ignoring those pesky constants). As such it can be quite deceptive for many classes of algorithms. I've noticed this in particular with graph algorithms. The worst case graph distribution for many algorithms is a single long connected chain, which *may* be incredibly unlikely for any particular family of graphs that users care about. While this is definitely an awful corner case for many graph algorithms, it doesn't tell a user much about how the algorithm will perform in practice on their data. For that reason I'm a big fan of the less popular average case complexity. Of course, to do anything useful with average case complexity you need to come up with a distribution that your data is being sampled from, which can be hard. The advantage in the case of nearest neighbour algorithms is that it forces you to explicitly think about things like the metric space your algorithm will be running over. While such notions certainly raise the complexity of the complexity analysis, they do provide significantly more insight into the question of how well this algorithm will perform on *my* data which, I think, is the question we all want to answer.
Now that sounds quite down on O-notation, which I really don't mean to be. It's an upper limit on the average case complexity, and if an algorithm has great O complexity then you know it will also have great average case complexity on your data. Instead I mean it as a warning that not all O(n^2) algorithms are born equal. I too often see folks making statements like "all those algorithms are O(n^2), so use any of them." The max is a useful summary statistic of any distribution, but it isn't necessarily the most useful descriptive statistic for that distribution.
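As a toy illustration of the chain-graph point (a made-up mini-experiment of mine, nothing from UMAP itself): synchronous min-label propagation for connected components takes roughly diameter-many rounds, which is about n on a path graph but only a handful on a typical sparse random graph of the same size, even though it is the same algorithm in both cases:

```python
import math
import random

def propagation_rounds(adjacency):
    """Synchronous min-label propagation; count rounds until labels stop changing."""
    labels = list(range(len(adjacency)))
    rounds = 0
    while True:
        new = [min([labels[v]] + [labels[u] for u in adjacency[v]])
               for v in range(len(adjacency))]
        if new == labels:
            return rounds
        labels, rounds = new, rounds + 1

n = 500

# "Long chain" worst case: a path graph with diameter n - 1.
chain = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

# A typical sparse random graph: small diameter, so very few rounds.
rng = random.Random(0)
p = 2 * math.log(n) / n
random_graph = [[] for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            random_graph[i].append(j)
            random_graph[j].append(i)

print("path graph rounds:  ", propagation_rounds(chain))         # ~ n - 1
print("random graph rounds:", propagation_rounds(random_graph))  # a handful
```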
The max statistic is useful because it matches the human tendency to seek assurance and build trust by committing to a bound with what feels like certainty; it's helpful for planning. Of course, the finer view of considering the distribution of the complexity can be very useful in augmenting that approach. If we had a way to frame the factors affecting this algorithm's runtime complexity beyond just the size of the input, or even a way to test/sample the data to predict the runtime, that might be pragmatically helpful.
Looks like a great alternative to t-SNE! The readme mentions how fast it is, but I wonder what the complexity is in big-O terms, as a function of the number of samples and dimensions? Impatiently waiting for the paper to read!