Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added girth computation function #6633

Merged
merged 26 commits into from May 28, 2023
Merged

Added girth computation function #6633

merged 26 commits into from May 28, 2023

Conversation

alabarre
Copy link
Contributor

I implemented the algorithm mentioned on Wikipedia to compute the girth of a graph. This request seems to have been dropped a while ago (#1030), hopefully it is still of interest to some of us.

@boothby
Copy link
Contributor

boothby commented Apr 14, 2023

This looks good to me, @alabarre. Would you mind adding tests in /networkx/algorithms/tests/test_cycles.py?

@alabarre
Copy link
Contributor Author

alabarre commented Apr 14, 2023

This looks good to me, @alabarre. Would you mind adding tests in /networkx/algorithms/tests/test_cycles.py?

I added a few. Mine pass, but others fail:

================================================================================== short test summary info ===================================================================================
FAILED networkx/algorithms/components/tests/test_connected.py::TestConnected::test_connected_components[convert] - KeyError: 'networkx.plugins'
FAILED networkx/algorithms/tests/test_structuralholes.py::TestStructuralHoles::test_constraint_directed[convert] - KeyError: 'networkx.plugins'
======================================================================== 2 failed, 4305 passed, 233 skipped in 48.77s ========================================================================

I'm leaving them alone, just wanted to let you know.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests! I just have a couple comments/questions...

networkx/algorithms/tests/test_cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/tests/test_cycles.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of fixing up the References formatting for the docstring. Generally LGTM, just a few minor suggestions for code/tests, most of which are nits!

networkx/algorithms/cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/tests/test_cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/tests/test_cycles.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of making a few minor changes to help get this over the line:

  • I merged with main to pick up the latest CI fixes
  • I added girth to the autosummary so it shows up in the reference docs
  • A few minor formatting fixups.

This LGTM, thanks @alabarre !

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the "resolved" comments above were not actually implemented even though agreement was indicated. But they are minor items.

I have 2 concerns:

  1. Shouldn't this have a weight argument? Or is girth only used for unweighted graphs? I guess this could be a first PR with the default weight of 1. But I suspect people will request a weighted version eventually.
  2. I don't think we should be mapping all the edges to frozensets. We should be able to test for an edge in an undirected graph without using the frozensets.

I will push the minor changes to remove frozensets and include the agreed upon comments above that got missed. Hopefully it will become easier to read as well, but we'll see.

I think the weighted version should perhaps be a separate PR.

@boothby
Copy link
Contributor

boothby commented May 19, 2023

I agree that the weighted version should be a separate PR. Negative weights, for example, would confound this algorithm.

@dschult
Copy link
Member

dschult commented May 19, 2023

Hmmm... I'm not sure this makes sense for negative weights... Can't you just go back and forth across one of the negative edges and get as negative a cycle as you want? Or are we limiting this to simple cycles?

Anyway, I was assuming we would use Dijkstra -- which would require positive weights anyway.

@boothby do you think this PR is ready then? I could approve it, but would feel better with someone else looking at what I did rather than approving it myself. :}

@boothby
Copy link
Contributor

boothby commented May 19, 2023

I presume that simple cycles are assumed. I hadn't considered the weighted girth problem today, but there is interest in negative weights in the literature.

https://www.sciencedirect.com/science/article/pii/S1570866715001021

I'll re-read this PR today.

@dschult
Copy link
Member

dschult commented May 19, 2023

Oh my! That article is looking for the negative "cost" cycle with the fewest edges -- not with the least cost...
We don't even look for that with our shortest_path suite of functions. I guess if the count of edges represent the objective and the edge weights represent the constraint -- we are getting the least length of those cycles with negative cost.

I would restrict any PR I do to positive weights and use dijkstra to find the shortest paths.

@boothby
Copy link
Contributor

boothby commented May 20, 2023

Back to you, Dan.

I tightened up the length bound, and reverted a change you made that iterated over all of the edges of G; all in the name of performance. I also added the trick of deleting nodes as we go, and in so doing caught a bug in handling isolated nodes. Sorry; your version was so short and sweet.

@boothby
Copy link
Contributor

boothby commented May 20, 2023

And for what it's worth, the calls to bfs and dijkstra are throwing away a lot of information and re-computing it. I've implemented this algorithm "my way" in alabarre/networkx@girth...boothby:networkx:girth-faster but I have not been so presumptuous to push a wholesale rewrite.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the bfs_edges approach is not ideal because it causes us to go back and find the lengths. We do have at least one function that returns both distance and predecessors. Maybe that could help us avoid including another copy of BFS by using:

    pred, dist = nx.predecessor(G, n, return_seen=True)
    tree = nx.Graph((p[0], u) for u, p in pred.items() if u != n)
    for u, unbrs in tree.adj.items():
        ...

But maybe we should just write another version of BFS here as in your branch.

networkx/algorithms/cycles.py Outdated Show resolved Hide resolved
networkx/algorithms/tests/test_cycles.py Outdated Show resolved Hide resolved
@boothby
Copy link
Contributor

boothby commented May 22, 2023

Okay, I gave this all a good hard stare. What this algorithm really calls for is a function analogous to dfs_labeled_edges. Here's what I came up with: in our breadth first search, one sees four kinds of edges (tree, forward, back, and level). It's easy to efficiently traverse the graph and label each edge with the following:

BACK_EDGE = "back"
TREE_EDGE = "tree"
FORWARD_EDGE = "forward"
LEVEL_EDGE = "level"
def bfs_labeled_edges(G, sources):
    if sources in G:
        sources = [sources]

    depth = {s: 0 for s in sources}
    queue = deque((0, s) for s in sources)
    label = [LEVEL_EDGE, FORWARD_EDGE, BACK_EDGE]
    while queue:
        du, u = queue.popleft()
        for v in G[u]:
            dv = depth.get(v)
            if dv is None:
                depth[v] = dv = du + 1
                queue.append((dv, v))
                yield u, v, du, dv, TREE_EDGE
            else:
                yield u, v, du, dv, label[dv-du]

this greatly simplifies our girth computation:

def girth(G):
    girth = depth_limit = inf
    skip_labels = (
        nx.algorithms.traversal.breadth_first_search.BACK_EDGE,
        nx.algorithms.traversal.breadth_first_search.TREE_EDGE,
    )
    forward_edge = nx.algorithms.traversal.breadth_first_search.FORWARD_EDGE
    level_edge = nx.algorithms.traversal.breadth_first_search.LEVEL_EDGE
    for n in G:
        for u, v, du, dv, label in nx.bfs_labeled_edges(G, n):
            if du > depth_limit:
                break
            if label in skip_labels:
                pass
            else:
                length = du + dv + 1
                if length < girth:
                    depth_limit = du - (label is level_edge)
                    girth = length

    return girth

This gets my K200 benchmark down to ~25ms. Should I commit it? I'm mostly seeking commentary on bfs_labeled_edges here. The name is probably weird because I'm yielding depth information along with the edges. A more "pure" version might omit that info from the generator and recompute it (quick enough). A version seen as impure could live in cycles.py.

... 
    for n in G
        depth = {n:0}
        for u, v, label in nx.bfs_labeled_edges(G, n):
            if label is TREE_EDGE:
                depth[v] = depth[u] + 1
...

@boothby
Copy link
Contributor

boothby commented May 22, 2023

It occurs to me that we could do girth(G, parity = None/'even'/'odd') by optionally stuffing an extra label into skip_labels and thinking carefully about the stopping criterion.

@dschult
Copy link
Member

dschult commented May 22, 2023

I agree that we should create a bfs_labeled_edges function to match dfs_labeled_edges. It allows nice flexibility for the traversals. I have thought that before, but this seems like a good time to do it. I had always thought it should live in traversal/breadth_first_search.py, but could maybe be convinced otherwise. I also think it should work with directed and undirected graphs -- and probably multigraphs too if that doesn't prove to be too nasty.

As I think about edge types I'm confused by the BACK_EDGEs. At first I couldn't get them to happen. But I think now that I've convinced myself that they will not occur for undirected graphs. And they definitely can occur for directed graphs. For the undirected graphs, you would always traverse an edge outward before it would be reached going inward.

So, while I very much enjoyed(!) the triplet of edge labels indexed by dv-du, I think it isn't needed for the undirected case and for directed case we'll need to check if dv-du is negative rather than -1. Does this make sense?

I'm still thinking about whether to have the labels yield the distances -- it is certainly convenient, but also (as you mention) likely to have a way to compute it based on the previous edges. I'm leaning toward a pure version in traversal, and then the girth code (and maybe a doc_string example in the bfs_labeled_edges doc_string?) can show how to compute the distances. But I'd like to think/play about it some more. :}

@boothby
Copy link
Contributor

boothby commented May 22, 2023

As I drifted off to sleep last night, the directed case occurred to me. There, back-jumps can be arbitrarily long so my triplet is certainly invalid.

I think you're assuming that each edge should occur exactly once, and concluding that back-edges won't happen in the undirected case. And that's certainly achievable, but currently my code emits each undirected edge exactly twice.

Perhaps this?

def _undirected_bfs_labeled_edges(G, sources):
    if sources in G:
        sources = [sources]

    depth = {s: 0 for s in sources}
    visited = set()
    queue = deque((0, s) for s in sources)
    while queue:
        du, u = queue.popleft()
        visited.add(u)
        for v in G[u]:
            dv = depth.get(v)
            if dv is None:
                depth[v] = dv = du + 1
                queue.append((dv, v))
                yield u, v, TREE_EDGE
            else:
                if du == dv:
                    # could use either "in" or "not in" -- this emits self-loops
                    if v in visited:
                        yield u, v, LEVEL_EDGE
                elif du < dv:
                    yield u, v, FORWARD_EDGE

def _directed_bfs_labeled_edges(G, sources):
    if sources in G:
        sources = [sources]

    depth = {s: 0 for s in sources}
    queue = deque((0, s) for s in sources)
    while queue:
        du, u = queue.popleft()
        for v in G[u]:
            dv = depth.get(v)
            if dv is None:
                depth[v] = dv = du + 1
                queue.append((dv, v))
                yield u, v, TREE_EDGE
            elif du < dv:
                yield u, v, FORWARD_EDGE
            elif du > dv:
                yield u, v, BACK_EDGE
            else:
                yield u, v, LEVEL_EDGE

* added new helper function bfs_labeled_edges
* added support for even girth and odd girth
@boothby
Copy link
Contributor

boothby commented May 22, 2023

Oh, drat. I was hasty in adding the option for odd&even girth. Thanks, Petersen Graph, for yet again showing us the way.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!! :)

Whatever the choice for self-loops (see below), it looks like we don't need to separate directed and undirected. The use of visited should apply in the same way to both types of graphs -- so that self.loops are handled in both cases. And the case du > dv will not arise in the undirected case.

Thanks!

@boothby
Copy link
Contributor

boothby commented May 23, 2023

The use of visited should apply in the same way to both types of graphs -- so that self.loops are handled in both cases

Self-loops are reported in both directed and undirected right now. The directed case does not need to track visited because edges are only traversed in a single direction. I'm reluctant to use the extra memory when not needed.

In the undirected case, every edge is traversed twice, so we need to use visited or another construct to filter edges from occurring twice (or, produce "reverse" edges -- which is also fine, but then self-loops would be reported only once). By using the visited filter, there are two choices

  1. mark u as visited before or after iterating over G[u]
  2. check if v in visited or v not in visited

Toggling either choice changes whether or not loops are reported in the undirected case.

@boothby
Copy link
Contributor

boothby commented May 24, 2023

new benchmark

head    K200   C1000  CC1019
5f516c6 1.7005 3.5432 0.4079 
a9f9343 1.6351 3.5444 0.4078 
bb815f6 1.7013 3.5563 0.4143 
e85f26c 1.6755 3.4549 0.3939 
344f5ea 1.6772 3.4542 0.4120 
25a4ae7 1.6477 3.4241 0.3932 
c2ac2ff 1.6864 3.5351 0.4074 
282312e 1.7071 3.5349 0.4011 
722039a 1.6862 3.4973 0.3965 
b0a63af 3.8599 2.8787 0.7263 
07e673a 0.2681 1.1041 0.0331 
54f2921 0.2803 1.1071 0.0333 
7b21cc9 0.2475 0.5321 0.0141 
40dde43 0.0340 0.5402 0.0107 
1af2d48 0.0132 1.3616 0.0120 
db297d4 0.0132 0.7919 0.0073 
2a60dd8 0.0135 0.8059 0.0076 
2b2516c 0.0134 0.7952 0.0077 
c789b37 0.0131 0.7806 0.0072 
0f44580 0.0136 0.8134 0.0078 
0d99c88 0.0138 0.8153 0.0077 
c101cf2 0.0109 0.7663 0.0069 
bcae004 0.0104 0.7665 0.0080
0e426c8 0.0093 0.4212 0.0063 
from networkx import girth, complete_graph, cycle_graph, chordal_cycle_graph, Graph
from time import perf_counter

test_graphs = complete_graph(200), cycle_graph(1000), Graph([(u, v) for u, v, _ in chordal_cycle_graph(1019).edges if u != v])
g = girth
for G in test_graphs:
    t0, *girths, t1 = (
        perf_counter(),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G), g(G),
        perf_counter()
    )
    print(f"{(t1-t0)/len(girths):.4f}", end=' ')
print()

@boothby
Copy link
Contributor

boothby commented May 25, 2023

I just noticed that this ticket is (now) related to #6340 and #6359 -- my approach is slightly different; it's worth looking at them holistically.

@boothby
Copy link
Contributor

boothby commented May 28, 2023

@dschult or @rossbar please take a (hopefully) final look. I should really be done picking at this.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @alabarre I bet you didn't expect this explosion of activity, but that shows that it is an interesting issue -- and it got us thinking about more than just girth. :)

Thanks @boothby for making this shine! :}

@dschult dschult merged commit af0a6e2 into networkx:main May 28, 2023
34 checks passed
@alabarre
Copy link
Contributor Author

Indeed I didn’t. I had no time to review or chime in on the latest developments, but it’s exciting to see what this humble snippet has become. Thanks to everyone involved, I’m looking forward to our next collaborations.

@jarrodmillman jarrodmillman added this to the 3.2 milestone Jun 4, 2023
Alex-Markham pushed a commit to Alex-Markham/networkx that referenced this pull request Oct 13, 2023
* added girth computation function

* added tests for girth

* removed useless method and added seed to random call as recommended

* Add numpydoc reference formatting to docstring.

* Apply suggestions from code review

* Update networkx/algorithms/tests/test_cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* Update networkx/algorithms/tests/test_cycles.py

* Blacken.

* Add girth fn to reference.

* avoid frozenset and include other comments

* fix import order

* performance improvement to girth; added test for isolated nodes

* Update networkx/algorithms/cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* took Dan's suggestion to use nx.predecessor to obviate nx.single_source...

* reordered computation to only find one min-length (even/odd) cycle per root

* copying is no longer worthwhile for runtime

* rewrote girth for speed and hopefully clarity
* added new helper function bfs_labeled_edges
* added support for even girth and odd girth

* doctest typos

* doctest fix

* doctests were a mistake

* removed faulty parity argument

* patched up un-tested unit test

* unified directed & undirected bfs_labeled_edges

* touchup on bfs_labeled_edges; added _dispatch

* more performance improvements

---------

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Mridul Seth <mail@mriduls.com>
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Kelly Boothby <boothby@dwavesys.com>
dschult added a commit to BrunoBaldissera/networkx that referenced this pull request Oct 23, 2023
* added girth computation function

* added tests for girth

* removed useless method and added seed to random call as recommended

* Add numpydoc reference formatting to docstring.

* Apply suggestions from code review

* Update networkx/algorithms/tests/test_cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* Update networkx/algorithms/tests/test_cycles.py

* Blacken.

* Add girth fn to reference.

* avoid frozenset and include other comments

* fix import order

* performance improvement to girth; added test for isolated nodes

* Update networkx/algorithms/cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* took Dan's suggestion to use nx.predecessor to obviate nx.single_source...

* reordered computation to only find one min-length (even/odd) cycle per root

* copying is no longer worthwhile for runtime

* rewrote girth for speed and hopefully clarity
* added new helper function bfs_labeled_edges
* added support for even girth and odd girth

* doctest typos

* doctest fix

* doctests were a mistake

* removed faulty parity argument

* patched up un-tested unit test

* unified directed & undirected bfs_labeled_edges

* touchup on bfs_labeled_edges; added _dispatch

* more performance improvements

---------

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Mridul Seth <mail@mriduls.com>
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Kelly Boothby <boothby@dwavesys.com>
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
* added girth computation function

* added tests for girth

* removed useless method and added seed to random call as recommended

* Add numpydoc reference formatting to docstring.

* Apply suggestions from code review

* Update networkx/algorithms/tests/test_cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* Update networkx/algorithms/tests/test_cycles.py

* Blacken.

* Add girth fn to reference.

* avoid frozenset and include other comments

* fix import order

* performance improvement to girth; added test for isolated nodes

* Update networkx/algorithms/cycles.py

Co-authored-by: Dan Schult <dschult@colgate.edu>

* took Dan's suggestion to use nx.predecessor to obviate nx.single_source...

* reordered computation to only find one min-length (even/odd) cycle per root

* copying is no longer worthwhile for runtime

* rewrote girth for speed and hopefully clarity
* added new helper function bfs_labeled_edges
* added support for even girth and odd girth

* doctest typos

* doctest fix

* doctests were a mistake

* removed faulty parity argument

* patched up un-tested unit test

* unified directed & undirected bfs_labeled_edges

* touchup on bfs_labeled_edges; added _dispatch

* more performance improvements

---------

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Mridul Seth <mail@mriduls.com>
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Kelly Boothby <boothby@dwavesys.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants