Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Performance of diameter #1516

Open
bkamins opened this issue Jan 23, 2021 · 8 comments
Open

Performance of diameter #1516

bkamins opened this issue Jan 23, 2021 · 8 comments
Labels
efficiency / performance related to speed/memory performance good first issue ideal for new users/contributors - easy but important

Comments

@bkamins
Copy link
Contributor

bkamins commented Jan 23, 2021

I have a gh graph on 37700 vertices and 289003 edges.

Calculation of its diameter like this:

maximum(maximum(gdistances(gh, i)) for i in vertices(gh))

takes around 250 seconds (probably this is not an optimal algorithm for the task but at least it works).

However, when I run diameter(gh) the process takes so long that I did not wait till it finished.

@bkamins bkamins added the bug confirmed bug producing incorrect results label Jan 23, 2021
@bkamins
Copy link
Contributor Author

bkamins commented Jan 23, 2021

This is not a bug but performance issue, but a bug label was automatically added.

@simonschoelly
Copy link
Contributor

I think this might be, because diameter works for the general case where an additional distance matrix is used. Internally it then uses dijkstra_shortest_paths instead of gdistances. Maybe we should have a special case for when no distance matrix is used.

@bkamins
Copy link
Contributor Author

bkamins commented Jan 23, 2021

In general using gdistances is also not optimal I think. igraph is much faster on this graph in calculation of diameter.
I think that in general either e.g. Floyd-Warshall or Johnson's algorithm should be used (depending on the sparsity of the graph), but unfortunately currently I have too much work to get DataFrames.jl to 1.0 release to dig into LightGraphs.jl sources - sorry for this. (or maybe there are some special algorithms that are custom made for diameter - I have not checked)

@simonschoelly
Copy link
Contributor

I am not sure if Floyd-Warshall is much more performant unless one finds a way to avoid allocating the temporary |V| x |V| distance matrix. Best would probably to just do some benchmark with different algorithms & graphs, if someone got time for that.

There might also be the possiblity for a parallel diameter algorithm

@sbromberger
Copy link
Owner

sbromberger commented Jan 24, 2021

All-source bfs should be faster than FW for unweighted graphs. For weighted I think dijkstra will still beat FW but tests should confirm.

And we should definitely specialize distance metrics for unweighted.

@bkamins
Copy link
Contributor Author

bkamins commented Jan 24, 2021

igraph does BFS. F-W allocates but if graph is dense this should not be a problem as then |V| cannot be large anyway.

@sbromberger sbromberger added efficiency / performance related to speed/memory performance good first issue ideal for new users/contributors - easy but important and removed bug confirmed bug producing incorrect results labels Feb 17, 2021
@gdalle
Copy link
Contributor

gdalle commented Feb 20, 2021

Would something like https://who.rocq.inria.fr/Laurent.Viennot/road/papers/ifub.pdf be overkill for our purposes?

Also, when you talk about distance metrics to specialize, what do you have in mind besides the diameter?

@sbromberger
Copy link
Owner

Also, when you talk about distance metrics to specialize, what do you have in mind besides the diameter?

Sorry for the delayed response. In general, if you have unweighted graphs, where we're currently using Dijkstra, we can simply substitute BFS. There's no need to track weights in an unweighted graph, and BFS is more performant specifically because we don't need to do these comparisons at each traversal.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
efficiency / performance related to speed/memory performance good first issue ideal for new users/contributors - easy but important
Projects
None yet
Development

No branches or pull requests

4 participants