SCC benchmarks and use of G._adj in Tarjan algorithm #8064

dschult merged 5 commits into networkx:main
amcandio
left a comment
Looks good to me! I wonder if we should make _adj "public" at this point
One of the reasons to keep it private: the `adj` view blocks item assignment, while writing through `_adj` directly can silently corrupt the graph:

```
In [1]: import networkx as nx

In [2]: G = nx.path_graph(4)

In [3]: G.adj[1] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 G.adj[1] = 2

TypeError: 'AdjacencyView' object does not support item assignment

In [4]: G._adj[1] = 2

In [5]: print(G)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 print(G)
.....
.....
TypeError: object of type 'int' has no len()
```

If required you can always use it. What we could probably do is document this speedup, if we haven't already!
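For read-only access, though, the raw dict and the view agree; a small sketch (variable names are mine, not from the PR) contrasting the two and showing that mutation should still go through the `Graph` API:

```python
import networkx as nx

G = nx.path_graph(4)

# Read-only access: G._adj (the raw dict-of-dicts) and G.adj (the
# read-only AdjacencyView wrapping it) expose the same neighbors;
# the raw dict just skips the view's wrapper overhead.
neighbors_view = {v: set(G.adj[v]) for v in G}
neighbors_raw = {v: set(G._adj[v]) for v in G}
assert neighbors_view == neighbors_raw

# Mutation must still go through the Graph API so that both
# directions of each undirected edge stay consistent.
G.add_edge(0, 3)
assert 0 in G._adj[3] and 3 in G._adj[0]
```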
```python
_graphs = [
    nx.erdos_renyi_graph(100, 0.1, directed=True),
    nx.erdos_renyi_graph(100, 0.5, directed=True),
    nx.erdos_renyi_graph(100, 0.9, directed=True),
    nx.erdos_renyi_graph(1000, 0.01, directed=True),
    nx.erdos_renyi_graph(1000, 0.05, directed=True),
    nx.erdos_renyi_graph(1000, 0.09, directed=True),
```
These values aren't actually sufficient for measuring SCC performance at all - the graphs are too small and the edge probability is (way) too high. I'll play around with finding better values. In terms of execution time (per benchmark), what's the upper limit of what we'd want to add? A few seconds? Up to a minute?
My personal preference would be to keep runtimes of benchmarks manageable, at least within one collection of benchmarks.
Long-running benchmarks are okay in principle, but at the very least I'd advocate for them to be split off into their own class, which makes them easier to de-select with the asv CLI.
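A minimal sketch of that split, following asv's convention of one class per benchmark group (class names, graph sizes, and probabilities here are illustrative, not from this PR; as far as I can tell, asv's `timeout` applies per benchmark, not per suite):

```python
import networkx as nx


class SCCBenchmarksQuick:
    """Fast cases; cheap enough to run on every `asv continuous`."""

    def setup(self):
        self.G = nx.erdos_renyi_graph(100, 0.1, directed=True)

    def time_strongly_connected_components(self):
        # Consume the generator so the whole algorithm actually runs.
        list(nx.strongly_connected_components(self.G))


class SCCBenchmarksSlow:
    """Long-running cases, split into their own class so they are
    easy to de-select, e.g. `asv run --bench SCCBenchmarksQuick`
    skips this class entirely."""

    timeout = 120  # seconds, applied by asv to each benchmark

    def setup(self):
        self.G = nx.erdos_renyi_graph(2000, 0.005, directed=True)

    def time_strongly_connected_components(self):
        list(nx.strongly_connected_components(self.G))
```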
That makes sense. I'll try to keep the benchmarks so they don't go beyond the 120 second timeout (which I assume refers to the whole suite, not each individual test?).
In the spirit of your second paragraph: how come the benchmarks/ folder doesn't roughly mirror the structure of the main folder?
> These values aren't actually sufficient for measuring SCC performance at all - the graphs are too small and the edge probability is (way) too high. I'll play around with finding better values.
I recently ran into a similar issue when benchmarking Dijkstra's algorithm. You might need some engineering to get a challenging graph. One approach I can think of here is:
- Generate a random DAG of size S
- Randomly assign each node of the graph to a node in the DAG
- For each candidate edge (u, v), add it to the graph only if random() < p and either DAG[u] == DAG[v] or (DAG[u], DAG[v]) is an edge of the DAG

This way you guarantee that your graph has at least S strongly connected components.
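The construction above could be sketched like this (the function name and parameters are mine, not part of networkx; inter-block edges can only follow the DAG's order, so no SCC spans two blocks):

```python
import random

import networkx as nx


def scc_challenge_graph(s, n, p, seed=None):
    """Sketch: guarantee many SCCs by projecting a random n-node
    digraph onto a random s-node DAG, as described above."""
    rng = random.Random(seed)

    # 1. A random DAG on s nodes: orient every edge from the lower
    #    label to the higher one, which rules out cycles.
    dag = nx.DiGraph()
    dag.add_nodes_from(range(s))
    for u in range(s):
        for v in range(u + 1, s):
            if rng.random() < 0.5:
                dag.add_edge(u, v)

    # 2. Randomly assign each of the n graph nodes to a DAG node.
    block = {v: rng.randrange(s) for v in range(n)}

    # 3. Keep a candidate edge (u, v) only when it respects the DAG:
    #    both endpoints share a block, or their blocks form a DAG edge.
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            if rng.random() < p and (
                block[u] == block[v] or dag.has_edge(block[u], block[v])
            ):
                G.add_edge(u, v)
    return G
```

Since every cross-block edge points "down" the DAG, each strongly connected component lives inside a single block, so a graph built this way has at least one SCC per nonempty block.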
rossbar
left a comment
Okay, the benchmarks are now probing the behavior as expected. I see minor speedups for some benchmark cases (up to 20%) on my machine and, more importantly, no performance degradations! This is expected, since the only code change is switching from `G[v]` to `G._adj[v]`, which should only improve performance.
As a quick reference for anyone who wants to run the benchmarks locally, try:

```
asv continuous --bench DirectedAlgorithmBenchmarks main enh/benchmark-scc
```

Once that's completed, you can run `asv compare main enh/benchmark-scc` to get a quick summary of the performance comparisons!
Thanks for putting this together @Peiffap !
Thanks for taking a look, @rossbar. It'd be good to get this in to unblock #8056! Thanks for the note on running benchmarks locally, too. This kind of information around benchmarking should be part of the contributor's guide, IMO! Performance is a big consideration in most recent PRs and it's a shame that the information on how to measure it properly is so hard to find. |
dschult
left a comment
This looks good -- and nice to have the timing tools set up for directed case here.
* Use `G._adj` in inner loops of SCC algorithm
* Add benchmarks for SCC algorithms
* Use neighbor iterator to avoid re-traversing neighbors
* Make generators iterate fully

Co-authored-by: Ross Barnowski <rossbar@caltech.edu>
This PR is a minified version of #8056 that removes the changes to Kosaraju's algorithm to simplify review of both, i.e. it:

- uses `G._adj` in `strongly_connected_components`.
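The pattern the PR applies can be illustrated on a plain iterative DFS (this is an illustrative sketch, not networkx's actual Tarjan implementation): bind `G._adj` to a local once, then index the raw dict in the hot loop instead of going through `G[v]`:

```python
import networkx as nx


def reachable_from(G, source):
    """Nodes reachable from `source` via an iterative DFS.

    Illustrative only: shows the micro-optimization of hoisting
    G._adj out of the inner loop rather than calling G[v] each time.
    """
    adj = G._adj  # one attribute lookup, hoisted out of the loop
    seen = {source}
    stack = [source]
    while stack:
        v = stack.pop()
        for w in adj[v]:  # cheaper than `for w in G[v]` in a hot loop
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen
```

Usage: on `nx.path_graph(4, create_using=nx.DiGraph)` (edges 0→1→2→3), `reachable_from(G, 0)` yields all four nodes, while `reachable_from(G, 2)` yields only `{2, 3}`.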