equality of graphs is 60 times slower than equality of their list of edges #30641

videlec · 2020-09-23T08:58:23Z

On sage 9.1

sage: G = Graph(loops=False, multiedges=False)
sage: G.add_edges([(i, (i+85)%100) for i in range(100)])
sage: G.add_edges([(i, (i+37)%100) for i in range(100)])
sage: G.add_edges([(i, (i+85)%100) for i in range(100)])
sage: H = G.copy()
sage: %timeit G == H
196 µs ± 708 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: E = list(G.edges())
sage: F = list(H.edges())
sage: %timeit E == F
3.3 µs ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using immutable graphs is better by a factor 2 (and hence "only" 30x slower than list comparisons)

sage: E = [(i, (i+85)%100) for i in range(100)] + \
....:     [(i, (i+37)%100) for i in range(100)] + \
....:     [(i, (i+85)%100) for i in range(100)]
sage: G = Graph(E, loops=False, multiedges=False, immutable=True)
sage: H = G.copy()
sage: %timeit G == H
111 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Depends on #30645
Depends on #30665
Depends on #30776

CC: @kliem @dimpase @dcoudert

Component: graph theory

Keywords: thursdaysbdx

Issue created by migration from https://trac.sagemath.org/ticket/30641

The text was updated successfully, but these errors were encountered:

videlec · 2020-09-23T09:01:57Z

Changed keywords from none to thursdaysbdx

dcoudert · 2020-09-23T09:15:51Z

comment:2

So far, G.edges() returns a sorted list of edges. So the test E == F is simply a lexicographic comparison of lists, omitting the time to extract and sort the list of edges. The reporting time comparison is therefore not completely fair.

Furthermore, since the switch to Python3, we can no longer rely on sorted list of edges (vertices and edge labels may be of different types, leading to an error when trying to sort the list). One objective is to deprecate this behavior.

I think that the only way to speed up the test G == H, is to speed up the test other.has_edge(...).

videlec · 2020-09-23T09:25:49Z

comment:3

The reason why I created this ticket is that I have code which creates a big list of graphs and where I want to remove duplicates. Currently, the program spends more than 50% of its time in graph equality which is ridiculous.

dcoudert · 2020-09-23T09:43:10Z

comment:4

I agree. I recently made a #30510 to speed up method subgraph which was ridiculously slow. It's better now.

Here, I don't know how to speed up the method without going deep into the backends...

videlec · 2020-09-23T09:53:47Z

comment:5

I am fine with "going deep into the backends". My graphs have vertices {0, 1, ..., n-1} and there is no multiple edge. The weights are integers (but I don't think this is taken into account in the backend). The comparison of such graphs should be faster than the generic Python list comparison.

kliem · 2020-09-23T10:13:08Z

comment:6

Already the edge iterator is pretty slow. It takes 76 µs out 175 for me.

dcoudert · 2020-09-23T11:08:40Z

comment:7

for your code, it could be better to work with immutable graphs.

sage: sage: E = [(i, (i+85)%100) for i in range(100)] + [(i, (i+37)%100) for i in range(100)] + [(i, (i+85)%100) for i in range(100)]     
sage: G = Graph([range(100), E], format='vertices_and_edges', loops=False, multiedges=False, immutable=True)                                  
sage: H = G.copy()                                                                                                                  
sage: %timeit G == H                                                                                                                
131 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sage: sage: G = Graph(loops=False, multiedges=False) 
....: sage: G.add_edges([(i, (i+85)%100) for i in range(100)]) 
....: sage: G.add_edges([(i, (i+37)%100) for i in range(100)]) 
....: sage: G.add_edges([(i, (i+85)%100) for i in range(100)]) 
....: sage: H = G.copy() 
....: sage: %timeit G == H 
....:                                                                                                                               
241 µs ± 8.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But of course it would be better to improve the backends to get faster edge iterator, edge membership tests, equality test, etc.

kliem · 2020-09-23T12:55:36Z

comment:9

I wrote a quick dirty solution that counts compares two vertices at a time and counts how much the out arcs differ. This takes about 34 µs instead of 175 using out_arc_unsafe.

kliem · 2020-09-23T13:47:38Z

comment:10

I will start with a tiny ticket that does some trivial optimizations for has_edge.

kliem · 2020-09-26T07:56:21Z

Dependencies: #30645, #30665

kliem · 2020-09-26T15:13:53Z

comment:12

#30645 and #30665 are a good start I would say.

Things are set up in a way, that we can implement an optimized containment check in CGraph.

videlec · 2020-09-26T15:22:17Z

comment:13

Nice. Thanks Jonathan for your efforts!

I also wonder if we could implement a comparison that would bypass the translation between "vertex labels" and "integers in {0, ..., n-1}". Namely have a comparison of the internal representations (assuming the backend is the same).

dcoudert · 2020-09-26T15:58:00Z

comment:14

Does it make sense ? Currently, we check that the graphs have same settings and sets of vertices and edges. However, the internal representation might differ a lot. Suppose one graph G is the result of many additions/deletions of vertices of edges. Typically, at some point it has 100000 vertices, but at the end of the sequence of operations, it remains only 2. Internal bitsets/data structures might be very large. Now, if H is a copy of G, it has a very compact internal representation. The 2 graphs are equal, but have different internal representations.

videlec · 2020-09-26T16:06:15Z

comment:15

That is one use case. Here is another. I am considering all trivalent graphs up to isomorphism on n vertices. I want to construct the flip graph where I put an edge between the trivalent graph G1 and trivalent graph G2 if they differ by a Whitehad move

 A          C                      A          C
  \        /                        \        /
   \      /                          \      /
    u -- v          ----------->      u -- v
   /      \                          /      \
  /        \                        /        \
 B          D                      D          B

In order to generate this flip graph, one has to be able to identify the graph after a move. Since canonical labels are computed we need to compare graphs on {0, ..., n-1} with no surprise... and these are a lot of comparisons!

dcoudert · 2020-09-26T17:09:18Z

comment:16

I see, not easy to do it efficiently.

kliem · 2020-09-26T20:10:00Z

comment:17

To check whether one CGraph is contained in another one can store a translation array at the beginning. If you can guarantee that the vertices match, you can of course skip this step.

After this initial step is pretty much the same as #30665. Just that you check has_arc_unsafe of other instead of yield. Things are a bit more complicated for multiple edges.

If you can guarantee the vertices to match, things should be really fast for dense graphs, but of course only, if you are somewhat dense (e.g. if n <= 64). Here you can just immediately compare to vertices.

kliem · 2020-10-16T08:59:45Z

Changed dependencies from #30645, #30665 to #30645, #30665, #30776

kliem · 2020-10-16T08:59:45Z

comment:18

Things are improving, I think.

IMO what would still improve things here:

better heuristics for vertex translation to int,
factor out the backends to standard backends and backends with vertices in range(n) (we could still have active vertices or not, but all vertices need to be in range(n), if you add a new vertex that is too large, you will get a memory problem, but that is your problem),
implement specialized versions of is_subgraph and subgraph_given_vertices for dense graphs over range(n) (we need a good name for this graph backend yet), this should be much faster than the current methods

So instead of just CGraphBackend we would have CGraphBackend, CGraphVertices and CGraphRangeVertices (not good names yet). DenseGraphBackend as we know it would have to inherit from CGraphBackend and CGraphVertices and DenseGraphBackendRangeVertices would have to inherit from CGraphBackend and CGraphRangeVertices. Likewise for sparse and static sparse.

Does this sound plausible?

Does anyone have good suggestions for a naming scheme?

kliem · 2020-10-16T09:03:11Z

comment:19

Btw, dense graphs should really use bitsets. I don't understand why it isn't. Eventually this might be sped up with intrinsics. E.g. an iterator over a bitset should be much faster with an improved leading zero count (_lzcnt_u64).

dcoudert · 2020-10-16T09:25:01Z

comment:20

I don't know if it's better to use bitsets for dense graphs. In case, we can use src/sage/data_structures/binary_matrix.<pxi|pxd>.

seblabbe · 2020-10-16T18:29:27Z

comment:21

Replying to @kliem:

So instead of just CGraphBackend we would have CGraphBackend, CGraphVertices and CGraphRangeVertices (not good names yet). DenseGraphBackend as we know it would have to inherit from CGraphBackend and CGraphVertices and DenseGraphBackendRangeVertices would have to inherit from CGraphBackend and CGraphRangeVertices. Likewise for sparse and static sparse.

Does this sound plausible?

Does anyone have good suggestions for a naming scheme?

This is how DisjointSet (union-find data structure) is implemented in Sage.

def DisjointSet(arg):
cdef class DisjointSet_class(SageObject):
cdef class DisjointSet_of_integers(DisjointSet_class):
cdef class DisjointSet_of_hashables(DisjointSet_class):

in the sense that DisjointSet_of_integers uses directly the internal representation and DisjointSet_of_hashables uses maps from integers in range(n) to the hashable objects and vice versa. See https://github.com/sagemath/sage-prod/blob/develop/src/sage/sets/disjoint_set.pyx

I don't know if the naming scheme _of_integers and _of_hashables I chosen at the time is good or not.

dcoudert · 2020-12-29T17:28:42Z

comment:23

Timing with 9.3.beta5 on a macbook air. It seems we are much better now !

sage: G = Graph(loops=False, multiedges=False) 
sage: G.add_edges([(i, (i+85)%100) for i in range(100)]) 
sage: G.add_edges([(i, (i+37)%100) for i in range(100)]) 
sage: G.add_edges([(i, (i+85)%100) for i in range(100)]) 
sage: H = G.copy() 
sage: %timeit G == H 
43.1 µs ± 821 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sage:
sage: E = list(G.edges()) 
sage: F = list(H.edges()) 
sage: %timeit E == F 
12.4 µs ± 357 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage:
sage: E = [(i, (i+85)%100) for i in range(100)] + \ 
....:     [(i, (i+37)%100) for i in range(100)] + \ 
....:     [(i, (i+85)%100) for i in range(100)] 
sage: G = Graph(E, loops=False, multiedges=False, immutable=True) 
sage: H = G.copy() 
sage: %timeit G == H                                                                                                                                               
31 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

mkoeppe · 2021-02-13T20:51:01Z

comment:24

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

videlec added this to the sage-9.2 milestone Sep 23, 2020

videlec added c: graph theory labels Sep 23, 2020

This comment has been minimized.

Sign in to view

mkoeppe modified the milestones: sage-9.2, sage-9.3 Oct 24, 2020

mkoeppe modified the milestones: sage-9.3, sage-9.4 Feb 13, 2021

mkoeppe modified the milestones: sage-9.4, sage-9.5 Jul 19, 2021

mkoeppe modified the milestones: sage-9.5, sage-9.6 Dec 18, 2021

mkoeppe modified the milestones: sage-9.6, sage-9.7 Apr 2, 2022

mkoeppe modified the milestones: sage-9.7, sage-9.8 Aug 31, 2022

mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023

This was referenced Jan 29, 2023

Meta ticket: Simplify and speed up graph backends #28895

Open

Optimize has_edge of graph backend #30645

Closed

Implement an improved subgraph test for graph backends #30776

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

equality of graphs is 60 times slower than equality of their list of edges #30641

equality of graphs is 60 times slower than equality of their list of edges #30641

videlec commented Sep 23, 2020

videlec commented Sep 23, 2020

dcoudert commented Sep 23, 2020

videlec commented Sep 23, 2020

dcoudert commented Sep 23, 2020

videlec commented Sep 23, 2020

kliem commented Sep 23, 2020

dcoudert commented Sep 23, 2020

This comment has been minimized.

kliem commented Sep 23, 2020

kliem commented Sep 23, 2020

kliem commented Sep 26, 2020

kliem commented Sep 26, 2020

videlec commented Sep 26, 2020

dcoudert commented Sep 26, 2020

videlec commented Sep 26, 2020

dcoudert commented Sep 26, 2020

kliem commented Sep 26, 2020

kliem commented Oct 16, 2020

kliem commented Oct 16, 2020

kliem commented Oct 16, 2020

dcoudert commented Oct 16, 2020

seblabbe commented Oct 16, 2020

dcoudert commented Dec 29, 2020

mkoeppe commented Feb 13, 2021

equality of graphs is 60 times slower than equality of their list of edges #30641

equality of graphs is 60 times slower than equality of their list of edges #30641

Comments

videlec commented Sep 23, 2020

videlec commented Sep 23, 2020

dcoudert commented Sep 23, 2020

videlec commented Sep 23, 2020

dcoudert commented Sep 23, 2020

videlec commented Sep 23, 2020

kliem commented Sep 23, 2020

dcoudert commented Sep 23, 2020

This comment has been minimized.

kliem commented Sep 23, 2020

kliem commented Sep 23, 2020

kliem commented Sep 26, 2020

kliem commented Sep 26, 2020

videlec commented Sep 26, 2020

dcoudert commented Sep 26, 2020

videlec commented Sep 26, 2020

dcoudert commented Sep 26, 2020

kliem commented Sep 26, 2020

kliem commented Oct 16, 2020

kliem commented Oct 16, 2020

kliem commented Oct 16, 2020

dcoudert commented Oct 16, 2020

seblabbe commented Oct 16, 2020

dcoudert commented Dec 29, 2020

mkoeppe commented Feb 13, 2021