Update calculation of triangles #6258

SultanOrazbayev · 2022-12-06T17:56:21Z

This PR follows up on issue #5246. The key function is mostly as suggested by @dschult and @zephyr111 (with minor variable renaming, hopefully for better clarity).

Note that the existing docs are missing the `int` return when a single node is provided (even though it is one of the examples), so the docs are modified to account for that as well.


networkx/algorithms/cluster.py

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>

SultanOrazbayev

the code seems to be fine

dschult · 2022-12-08T03:41:06Z

Do we have any evidence yet that this speeds up the calculation? It looks like a fair amount of extra code and that usually ends up slowing things down. My comment in the original issue about looking up one neighbor not speeding things up made me think of that again here. How is this version doing?

SultanOrazbayev · 2022-12-08T04:12:05Z

Yes, there's definitely a speed-up (my original SO question was prompted by very slow triangles calculation).

I didn't do comprehensive benchmarking, but here's a small Python script to examine the timings: https://gist.github.com/SultanOrazbayev/c4d5be83b5a8dd35ad13f0eff847135d. For an Erdős-Rényi graph with 1500 nodes and an edge probability of 0.1, I get a speed-up of about 10x-20x (note that I gave up on trying larger graphs, since the current nx version takes rather long).

SultanOrazbayev · 2022-12-08T04:12:45Z

Also, in the above script I'm not controlling for the seed etc., but the speed-ups are large and consistent enough.

SultanOrazbayev · 2022-12-08T04:27:56Z

For the single-node case, though, yes, the current version appears to be much, much faster. This is likely due to the sub-graph creation part. Let me investigate a bit more.

SultanOrazbayev · 2022-12-08T06:00:00Z

After further inspection, it seems that the speed-up is observed only when computing triangles for the whole graph (possibly there is also a speed-up when the nodes argument contains a lot of nodes). For a single node or a small subset of nodes, the old approach is much, much faster.

To combine the strengths of both approaches, the old approach is preserved when nodes is specified; when the nodes kwarg is None, the new approach is used.
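To illustrate why the preserved per-node path is cheap for a handful of nodes, here is a minimal sketch on a plain dict-of-sets adjacency. The adjacency format and the names `triangles_for_nodes` and `adj` are assumptions made to keep the example dependency-free; this is not the networkx code itself. The work touches only the neighbourhoods of the requested nodes.

```python
def triangles_for_nodes(adj, nodes):
    """Count triangles through each requested node.

    `adj` maps every node to the set of its neighbours (undirected graph).
    This is a hypothetical helper for illustration, not networkx API.
    """
    counts = {}
    for v in nodes:
        v_nbrs = adj[v] - {v}  # ignore self-loops
        # For each neighbour w, count the common neighbours of v and w.
        # Every triangle through v is found twice (once from each of its
        # other two corners), hence the integer division by 2.
        counts[v] = sum(len(v_nbrs & (adj[w] - {w})) for w in v_nbrs) // 2
    return counts


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_for_nodes(adj, [2, 3]))  # {2: 1, 3: 0}
```

For a whole graph this path rebuilds a neighbour set for every (node, neighbour) pair, which is the repeated work the whole-graph approach tries to avoid.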

dschult · 2022-12-08T15:14:09Z

To speed up the subgraph part, you might try making a copy of the subgraph. Subgraph views are not very performant: when traversing the graph, you still have to look at all edges and reject the ones that are not in the subgraph. If memory is not a bottleneck, and if you expect to look at an edge more than once on average, it is likely to be faster to add .copy() to the end of your subgraph call. This does take time of course -- because you have to create that new graph to hold the subgraph. So, no guarantees...
That also would mean you should have some logic to handle the case where nodes is None so that you don't make a copy/subgraph of the whole graph. :)

SultanOrazbayev · 2022-12-08T16:34:04Z

Thank you for these details; I hadn't thought about the overhead of using views vs. copies. Right now, the iterative approach of the original code is preserved for the case when nodes are provided. Do you think this will work?

dschult · 2022-12-08T18:17:20Z

Yes -- that is a good way to handle the case when nodes are provided.

dschult

This looks good!
The later_neighbors approach creates an extra data structure (more RAM) but avoids the repeated lookup of neighbors in the previous set(G[w]) - {w}. I think the extra RAM is at most a dict of sets with N pointers to sets with at most N elements (N being the number of nodes). That's less than a copy of the graph so I view this as a good enhancement.
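A minimal sketch of the later_neighbors idea described above, assuming the graph is given as a plain dict mapping each node to a set of neighbours (the function name and input format are illustrative assumptions, not the PR's exact code). Each node keeps only the neighbours that come after it in iteration order, so every triangle is discovered exactly once.

```python
def triangles_later_neighbors(adj):
    """Count triangles per node for an undirected graph {node: set(neighbours)}.

    Illustrative sketch only; names and input format are assumptions.
    """
    # later_neighbors[u] holds the neighbours of u that appear after u in
    # iteration order. This is the extra dict of sets mentioned above.
    later_neighbors = {}
    for node, neighbors in adj.items():
        later_neighbors[node] = {
            n for n in neighbors if n not in later_neighbors and n != node
        }

    counts = dict.fromkeys(adj, 0)
    for node1, neighbors in later_neighbors.items():
        for node2 in neighbors:
            # Third corners come after both node1 and node2, so the
            # triangle (node1, node2, node3) is visited only once.
            for node3 in neighbors & later_neighbors[node2]:
                counts[node1] += 1
                counts[node2] += 1
                counts[node3] += 1
    return counts


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_later_neighbors(adj))  # {0: 1, 1: 1, 2: 1, 3: 0}
```

The innermost Python-level loop over the intersection is the part that becomes costly when the "third node" sets are large.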

For performance, I checked a complete_graph on 1000 nodes and the old/current code is 28s while the PR code is 34s. So, the new version is slower for denser graphs. This makes sense because the bulk of the time savings with the new approach is pre-computation of set(G[w]) - {w} and the cost of the new method is looping over nodes in neighbors & later_neighbors[node2] instead of using its length. When those "third node" sets are large the extra inner loop is costly. And when they are small, using lookups makes up for the extra inner loop cost. Ordering the nodes to avoid counting twice helps too, but the main savings is replacing set creation time with lookup time.

I then checked random graphs with 1500 nodes and p=0.3 and 0.5. When p=0.5, the two methods both took about 20.7s. So, I think that is the density where the methods take essentially the same time.

I was able to speed up the PR method by a factor of 2-4 by moving the update of triangle_counts outside the inner loop and increasing it by m instead of 1 where m is the number of nodes to be looped over in the inner loop. Using a Counter to handle the inner loop also helps speed things up.
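The inner-loop change can be sketched roughly as follows, again on a plain dict-of-sets adjacency (an assumption for a dependency-free example; `triangles_counter` is a hypothetical name). The two outer corners are incremented by m in one step, and the per-third-node increments are handed to `Counter.update`, which adds one per element of the iterable it receives.

```python
from collections import Counter


def triangles_counter(adj):
    """Count triangles per node for an undirected graph {node: set(neighbours)}.

    Illustrative sketch only; names and input format are assumptions.
    """
    # Keep only neighbours that come later in iteration order (no self-loops).
    later_neighbors = {}
    for node, neighbors in adj.items():
        later_neighbors[node] = {
            n for n in neighbors if n not in later_neighbors and n != node
        }

    # Start every node at zero so isolated nodes appear in the result.
    counts = Counter(dict.fromkeys(adj, 0))
    for node1, neighbors in later_neighbors.items():
        for node2 in neighbors:
            third_nodes = neighbors & later_neighbors[node2]
            m = len(third_nodes)
            # One addition of m instead of m additions of 1.
            counts[node1] += m
            counts[node2] += m
            # Adds 1 for each third node without a Python-level loop here.
            counts.update(third_nodes)
    return dict(counts)


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_counter(adj))  # {0: 1, 1: 1, 2: 1, 3: 0}
```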

I tried adding pre-computation to the current networkx version and it helped a lot -- about 3-4 times faster. But that is still slower than the current PR for sparse graphs and slower than the inner-loop-speedup version for complete graphs. So, overall, we should go with the PR plus the inner-loop speedup, where the inner loop is pushed into the compiled code inside Counter.update().

We're looking at a speedup of 10+ times for sparse graphs and about 7 for density p=0.5 and about 4 for the complete_graph... The complete graph on 1500 nodes takes 1min 56sec for the existing code, and 28sec for the PR with inner loop speedup.
See the comments below...

networkx/algorithms/cluster.py

- return to the original doc for default

- make corrections in the doc string

Co-authored-by: Dan Schult <dschult@colgate.edu>

- this is to make sure that the Counter contains a count for every node in the graph

- update doctest to pass a list rather that tuple

networkx/algorithms/cluster.py

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

SultanOrazbayev · 2023-01-10T10:53:08Z

TL;DR: I simplified the code and made the behaviour consistent with the original version. This is ready for review now.

The ambiguity of the nodes kwarg makes it difficult to have a nice implementation that would return 0 if a node is not in the graph. As a result, this test that I wanted to add will not be added:

    def test_node_not_in_graph(self):
        G = nx.Graph()
        G.add_node((0, 0))
        G.add_node((2, 3))
        assert nx.triangles(G, (0, 1)) == 0

The tricky part is that if one passes an iterable containing nodes that are not in the graph, one would expect a dictionary whose keys are the nodes in the iterable and whose values are zero (and similarly for a list that mixes nodes in the graph with nodes that are not). Since a node could itself be an iterable, this leads to ambiguity: should we return zero because the iterable is not a node in the graph, or iterate over its contents and return a dict with zeros for the nodes that are not in the graph and triangle counts for the nodes that are? Implementing this would not lead to readable code, so for now the most efficient path is to revert to the original behaviour.
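The ambiguity can be made concrete with a small sketch. It assumes a plain dict-of-sets graph and a hypothetical "missing nodes count as zero" rule, which is the behaviour that was considered and not adopted. The helper names are illustrative only. The same tuple query gets two defensible answers depending on how it is read.

```python
def count_for_node(adj, v):
    """Triangles through v, or 0 if v is absent (hypothetical rule)."""
    if v not in adj:
        return 0
    nbrs = adj[v] - {v}
    return sum(len(nbrs & (adj[w] - {w})) for w in nbrs) // 2


def read_as_single_node(adj, query):
    # Interpretation 1: the tuple is itself one node label.
    return count_for_node(adj, query)


def read_as_iterable(adj, query):
    # Interpretation 2: the tuple is a container of node labels.
    return {v: count_for_node(adj, v) for v in query}


# Same graph as in the test above: two nodes that are themselves tuples.
adj = {(0, 0): set(), (2, 3): set()}
query = (0, 1)

single = read_as_single_node(adj, query)
many = read_as_iterable(adj, query)
print(single)  # 0
print(many)    # {0: 0, 1: 0}
```

Nothing in the query itself says which reading the caller intended, which is why the thread falls back to the original behaviour.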

networkx/algorithms/cluster.py

dschult

Thanks for this -- and sorry for the delay. I tried to include the new changes in the helper function used by other functions for triangles, degree, generalized degree. But the loops are different enough that it didn't end up helping there. Let's get this merged. :)
Thanks!

SultanOrazbayev · 2023-06-22T17:10:41Z

Thank you very much, @dschult !

@dschult

* Update calculation of triangles

  This PR follows up on issue networkx#5246, the key function is mostly as suggested by @dschult and @zephyr111 (minor variable renaming, hopefully for better clarity). Note that the existing docs are missing the `int` return when a single node is provided (even though it is one of the examples), so the docs are modified to account for that also.
* fix a typo
* fix formatting
* ignore self-loops
* remove print statements
* Update networkx/algorithms/cluster.py

  Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
* Update cluster.py
* Update cluster.py
* Update cluster.py
* Update cluster.py

  - return to the original doc for default
* Update cluster.py

  - make corrections in the doc string
* Update cluster.py
* Update networkx/algorithms/cluster.py

  Co-authored-by: Dan Schult <dschult@colgate.edu>
* Update cluster.py
* Update cluster.py

  - this is to make sure that the Counter contains a count for every node in the graph
* Update cluster.py

  - update doctest to pass a list rather that tuple
* Update cluster.py
* Update test_cluster.py
* Update networkx/algorithms/cluster.py

  Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
* simplify the code
* trailing white space
* add blank line (restart tests)
* remove test
* minor tweak to docs and rerun CI

---------

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>


SultanOrazbayev added 5 commits December 6, 2022 23:55

fix a typo

3f8fe6e

fix formatting

1921f84

ignore self-loops

2c0e132

remove print statements

f3ddb38

eriknw reviewed Dec 7, 2022

Update networkx/algorithms/cluster.py

4314629

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>

SultanOrazbayev commented Dec 8, 2022

Update cluster.py

2e6c104

Update cluster.py

b2d7ad5

Update cluster.py

bdd810a

dschult reviewed Dec 10, 2022

SultanOrazbayev and others added 7 commits December 11, 2022 09:03

Update cluster.py

4878634

- return to the original doc for default

Update cluster.py

a64d1cf

- make corrections in the doc string

Update cluster.py

df3c1be

Update networkx/algorithms/cluster.py

fb251f5

Co-authored-by: Dan Schult <dschult@colgate.edu>

Update cluster.py

4f5a4bc

Update cluster.py

41bb782

- this is to make sure that the Counter contains a count for every node in the graph

Update cluster.py

1122143

- update doctest to pass a list rather that tuple

This was referenced Dec 11, 2022

doc: improve doc of possible values of nodes and expected behaviour #6274

Closed

Ambiguity of nbunch #6275

Open

rossbar reviewed Dec 11, 2022

SultanOrazbayev and others added 8 commits December 13, 2022 10:14

Update cluster.py

5ffd087

Update test_cluster.py

694d4ad

Merge branch 'networkx:main' into triangles

3f25da1

Update networkx/algorithms/cluster.py

aced3de

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

simplify the code

462245c

trailing white space

9703435

add blank line (restart tests)

32b8b70

remove test

5b75492

aaronzo reviewed Mar 31, 2023

minor tweak to docs and rerun CI

3ad3761

dschult approved these changes Jun 22, 2023

jarrodmillman added the type: Enhancements label Jul 26, 2023

jarrodmillman approved these changes Jul 26, 2023

jarrodmillman merged commit adf6cff into networkx:main Jul 26, 2023
38 checks passed

jarrodmillman added this to the 3.2 milestone Jul 26, 2023

dschult mentioned this pull request Oct 22, 2023

could_be_isomorphic returns False for an isomorphic graph on 3.2 #7038

Closed
