SCC benchmarks and use of G._adj in Tarjan algorithm #8064

Merged
dschult merged 5 commits into networkx:main from Peiffap:enh/benchmark-scc
Jul 9, 2025
Conversation

@Peiffap (Member) commented May 23, 2025

This PR is a pared-down version of #8056 that removes the changes to Kosaraju, to simplify review of both.

@Peiffap changed the title from "Enh/benchmark scc" to "SCC benchmarks and use of G._adj in Tarjan algorithm" on May 23, 2025
@amcandio (Contributor) left a comment

Looks good to me! I wonder if we should make _adj "public" at this point

@MridulS (Member) commented May 23, 2025

> I wonder if we should make _adj "public" at this point

One of the reasons _adj isn't really public is that it's too easy to break your graph object!

In [1]: import networkx as nx

In [2]: G = nx.path_graph(4)

In [3]: G.adj[1] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 G.adj[1] = 2

TypeError: 'AdjacencyView' object does not support item assignment

In [4]: G._adj[1] = 2

In [5]: print(G)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 print(G)
.....
.....

TypeError: object of type 'int' has no len()

You can always use it if needed, but the leading underscore is the convention that tells users: modify this dictionary at your own risk!

What we could do is probably document this speed up if we haven't already!
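As a small sketch of the read-only pattern in question (illustrative only; `G._adj` is a private attribute, so this relies on the backing dict rather than any documented API):

```python
import networkx as nx

G = nx.path_graph(4)

# Read-only access through the private backing dict: same data as the
# public G.adj view, but without the AdjacencyView wrapper on each lookup.
neighbors_of_1 = list(G._adj[1])

# Both access paths expose identical neighbors.
assert neighbors_of_1 == list(G.adj[1]) == [0, 2]
```

Reads like this are what the Tarjan change exploits; the danger shown in the session above only arises when you assign into `_adj`.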

Comment on lines +55 to +61
_graphs = [
nx.erdos_renyi_graph(100, 0.1, directed=True),
nx.erdos_renyi_graph(100, 0.5, directed=True),
nx.erdos_renyi_graph(100, 0.9, directed=True),
nx.erdos_renyi_graph(1000, 0.01, directed=True),
nx.erdos_renyi_graph(1000, 0.05, directed=True),
nx.erdos_renyi_graph(1000, 0.09, directed=True),
@Peiffap (Member, Author) replied:

These values aren't actually sufficient for measuring SCC performance at all: the graphs are too small and the edge probability is (way) too high. I'll play around with finding better values. In terms of execution time (per benchmark), what's the upper limit of what we'd want to add? A few seconds? Up to a minute?

A contributor replied:

My personal preference would be to keep runtimes of benchmarks manageable, at least within one collection of benchmarks.

Long-running benchmarks are okay in principle, but at the very least I'd advocate for them to be split off into their own class, which makes them easier to de-select with the asv CLI.

@Peiffap (Member, Author) replied:

That makes sense. I'll try to keep the benchmarks so they don't go beyond the 120 second timeout (which I assume refers to the whole suite, not each individual test?).

In the spirit of your second paragraph: how come the benchmarks/ folder doesn't roughly mirror the structure of the main folder?

@amcandio (Contributor) commented May 27, 2025

> These values aren't actually sufficient for measuring SCC performance at all: the graphs are too small and the edge probability is (way) too high. I'll play around with finding better values.

I recently ran into a similar issue when benchmarking Dijkstra's. You might need some engineering to get a challenging graph. One thing I can think of here is:

  • Generate a random DAG of size S
  • Randomly assign each node to a node in the DAG
  • For each candidate edge (u, v), only add it to the graph if random() < p and (DAG[u] == DAG[v] or (DAG[u], DAG[v]) is an edge of the DAG)

This way you guarantee that your graph has at least S strongly connected components.
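A possible sketch of that recipe (the name `make_scc_rich_digraph` and the 0.5 DAG edge density are illustrative choices, not anything from NetworkX). Orienting all block-to-block edges from lower to higher index keeps the block graph acyclic, so every cycle, and hence every SCC, stays inside a single block:

```python
import random

import networkx as nx


def make_scc_rich_digraph(n, s, p, seed=0):
    """Digraph whose cycles are confined to s blocks arranged along a
    random DAG, so each populated block yields at least one SCC."""
    rng = random.Random(seed)
    # Random DAG on the s block labels: orient every edge low -> high.
    dag_edges = {
        (i, j) for i in range(s) for j in range(i + 1, s) if rng.random() < 0.5
    }
    # Randomly assign each of the n graph nodes to a block.
    block = {v: rng.randrange(s) for v in range(n)}
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    # Keep a candidate edge only if it is consistent with the DAG:
    # same block, or the pair of blocks is itself a DAG edge.
    for u in range(n):
        for v in range(n):
            if u != v and rng.random() < p:
                if block[u] == block[v] or (block[u], block[v]) in dag_edges:
                    G.add_edge(u, v)
    return G, block


G, block = make_scc_rich_digraph(200, 10, 0.3)
```

Raising s increases the guaranteed SCC count, while raising p makes each block's component denser and more expensive to traverse.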

Peiffap and others added 2 commits June 12, 2025 14:00
Co-authored-by: Ross Barnowski <rossbar@caltech.edu>
@rossbar (Contributor) left a comment

Okay, the benchmarks are now probing the behavior as expected. I see minor speedups for some benchmark cases (up to 20%) on my machine and, more importantly, no performance degradations! This is expected, since the only change to the code is switching from G[v] to G._adj[v], which should only improve performance.
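One rough way to isolate that G[v] vs G._adj[v] access cost (an illustrative micro-benchmark, not the PR's asv suite; the graph size, density, and repeat count are arbitrary):

```python
import timeit

import networkx as nx

G = nx.gnp_random_graph(500, 0.05, directed=True, seed=42)
nodes = list(G)

# Walk every adjacency through the public view vs. the private dict.
t_view = timeit.timeit(lambda: [list(G[v]) for v in nodes], number=50)
t_adj = timeit.timeit(lambda: [list(G._adj[v]) for v in nodes], number=50)
# _adj skips the AdjacencyView wrapper, so t_adj is typically <= t_view,
# though the gap depends on the machine and the graph.
```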

As a quick reference for anyone who wants to run the benchmarks locally, try:

asv continuous --bench DirectedAlgorithmBenchmarks main enh/benchmark-scc

Once that's completed, you can run asv compare main enh/benchmark-scc to get a quick summary of the performance comparisons!

Thanks for putting this together @Peiffap !

@Peiffap (Member, Author) commented Jul 8, 2025

Thanks for taking a look, @rossbar. It'd be good to get this in to unblock #8056!

Thanks for the note on running benchmarks locally, too. This kind of information around benchmarking should be part of the contributor's guide, IMO! Performance is a big consideration in most recent PRs and it's a shame that the information on how to measure it properly is so hard to find.

@dschult (Member) left a comment

This looks good -- and it's nice to have the timing tools set up for the directed case here.

@dschult dschult merged commit 860948d into networkx:main Jul 9, 2025
45 checks passed
@Peiffap Peiffap deleted the enh/benchmark-scc branch July 9, 2025 15:11
amcandio pushed a commit to amcandio/networkx that referenced this pull request Jul 20, 2025
* Use `G._adj` in inner loops of SCC algorithm

* Add benchmarks for SCC algorithms

* Use neighbor iterator to avoid re-traversing neighbors

* Make generators iterate fully

Co-authored-by: Ross Barnowski <rossbar@caltech.edu>



5 participants