Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph add_edges/delete_edges are dramatically slow #28259

Open
videlec opened this issue Jul 25, 2019 · 17 comments
Open

graph add_edges/delete_edges are dramatically slow #28259

videlec opened this issue Jul 25, 2019 · 17 comments

Comments

@videlec
Copy link
Contributor

videlec commented Jul 25, 2019

When dealing with dynamical graph (where vertices and edges have to change many times) then I suffer a lot from the slowness of add_edges and delete_edges. Compared to a pair of dictionary of lists (to save adjacencies) one get a x3 factor

sage: edges = [(randint(0,1000), randint(0,1000)) for _ in range(100000)]

plain defaultdict: 46.5ms

sage: dout = defaultdict(list); din = defaultdict(list)
sage: %time for i,j in edges: dout[i].append(j); din[j].append(i)
CPU times: user 46.9 ms, sys: 0 ns, total: 46.9 ms
Wall time: 46.5 ms

Sage DiGraph: 129ms

sage: D = DiGraph(multiedges=True, loops=True)
sage: %time D.add_edges(edges)
CPU times: user 129 ms, sys: 0 ns, total: 129 ms
Wall time: 129 ms

igraph: 18.7ms

sage: import igraph
sage: g = igraph.Graph()
sage: g.add_vertices(1001)
sage: %time g.add_edges(edges)
CPU times: user 18.5 ms, sys: 129 µs, total: 18.7 ms
Wall time: 18.7 ms

CC: @dcoudert

Component: graph theory

Keywords: days100

Issue created by migration from https://trac.sagemath.org/ticket/28259

@videlec videlec added this to the sage-8.9 milestone Jul 25, 2019
@videlec
Copy link
Contributor Author

videlec commented Jul 25, 2019

Changed keywords from none to days100

@videlec

This comment has been minimized.

@dcoudert
Copy link
Contributor

comment:3

It's true that we should work on optimizing the backend, and also simplifying it. But we have to be careful as it might have a strong impact (slowdown) on other kinds of operations.

@videlec
Copy link
Contributor Author

videlec commented Jul 29, 2019

Attachment: cf_prof_K2_Q2000_R5.pdf.gz

@videlec
Copy link
Contributor Author

videlec commented Jul 29, 2019

comment:4

For me, it is the other way around. My code is slow because of the slowness of vertex/edge manipulation... have a look at attachment: cf_prof_K2_Q2000_R5.pdf.

@videlec
Copy link
Contributor Author

videlec commented Jul 29, 2019

comment:5

The profiling is impressive: 95% of the computation is spent by the vertex/edge management

  • add_edge 19% (for ~250K calls)
  • delete_vertex 50% (for ~5K calls)
  • has_vertex 2.7% (for ~180K calls)
  • iterator_in_edges 1.8% (for ~250K calls)
  • iterator_out_edges 11% (for ~800K calls)

Of course, my computation use the graphs extensively. But still it does not only use that... The only thing I can do to make it faster is to either make Sage graphs better or get rid of them in my program. At this stage, I am not sure which direction to take. The sparse graph backend is a huge mess of code. Rewriting any part of it will take a considerable amount of time. I will probably write a custom graph class from scratch that fits my need. In a second time, I will see whether it can be plugged in as a backend for Sage.

@videlec

This comment has been minimized.

@dcoudert
Copy link
Contributor

comment:7

I agree that we must rewrite and simplify the graph backend and that it's not an easy task. As far as I know, it has been designed like that to allow some optimization of some specific use when vertices are only integers in 0..n-1 and are created in this order, but it has a cost for other usage.

Another important task is to better distinguish the label and the id of a vertex. Roughly, for most internal use, we should only manipulate integers and not the labels of the vertices.

@kliem
Copy link
Contributor

kliem commented Dec 20, 2019

comment:8

I created a meta ticket #28895 for all those tickets simplifying or improving graph backends.

Currently I'm working on improving the deleting of vertices. The solution should be, to move the reversed graph to the CGraph structure. The backend keeps a reversed copy of the graph, but the CGraph doesn't know it and can't use it. Therefore deleting vertices takes forever (the CGraph doesn't know the incoming edges, but the reversed one would have known it easily).

@embray
Copy link
Contributor

embray commented Dec 30, 2019

comment:9

Ticket retargeted after milestone closed

@embray embray modified the milestones: sage-8.9, sage-9.1 Dec 30, 2019
@mkoeppe
Copy link
Member

mkoeppe commented Apr 14, 2020

comment:10

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.1, sage-9.2 Apr 14, 2020
@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Aug 29, 2020
@kliem
Copy link
Contributor

kliem commented Oct 14, 2020

comment:12

#30769 might be a good step towards this goal.

With this ticket, I get the following timings:

  • igraph: 10 ms
  • dense DiGraph: 15 ms
  • sparse DiGraph: 32 ms

Note that sparse graphs also carry a copy of the reversed graph, so the actual arc is added twice.

Profiling suggests that this business of get_vertex_checked should be improved, especially in case of integers. For dense graphs this seems to take up almost all the time. (I'm not sure how to proceed with this.)

This also implies that improving the data structure for sparse graphs could give us a large speedup here. (But I'm even less sure how to proceed with this.)

@videlec
Copy link
Contributor Author

videlec commented Oct 14, 2020

comment:13

Replying to @kliem:

#30769 might be a good step towards this goal.

With this ticket, I get the following timings:

  • igraph: 10 ms
  • dense DiGraph: 15 ms
  • sparse DiGraph: 32 ms

Nice!

Note that sparse graphs also carry a copy of the reversed graph, so the actual arc is added twice.

Profiling suggests that this business of get_vertex_checked should be improved, especially in case of integers. For dense graphs this seems to take up almost all the time. (I'm not sure how to proceed with this.)

This is a pity. Maybe a flag .has_integral_vertices which implies that the graph has vertex set {0, 1, ..., n-1}? When the flag is set, one could bypass translations.

This also implies that improving the data structure for sparse graphs could give us a large speedup here. (But I'm even less sure how to proceed with this.)

I did try in the past to rewrite sparse graph (do not remember which ticket), but it ended up being painful. Ideally we should just be wrapping some pure C implementation but vertex labels and edge labels are delicate to deal with.

Also, we could introduce a new (simpler) datastructure similar to static sparse graphs but for which one can reallocate individually the arrays used for neighbor lists. This can be amortized in terms of graph creation and would benefit from all algorithms already implemented for static sparse graph. However, such data structure will be a bit costly for merging vertices.

@kliem
Copy link
Contributor

kliem commented Oct 14, 2020

comment:14

Apparently, our heuristics are better for strings than for integers. So something must be wrong:

This is on top of #30769:

sage: set_random_seed(0)                                                                                                                                                                                                                                                                                                                                                   
sage: edges = [(randint(0,1000), randint(0,1000)) for _ in range(100000)]                                                                                                                                                                                                                                                                                                  
sage: D = DiGraph(multiedges=False, loops=True, sparse=False)
sage: D.add_vertices([Integer(i) for i in range(1001)])                                                                                                                                                                                                                                                           
sage: %time D.add_edges(edges)                                                                                                                                                                                                                                                                                                                                             
CPU times: user 14.9 ms, sys: 0 ns, total: 14.9 ms
Wall time: 14.9 ms
                                                                                                                                                                                                                                                                                    
sage: set_random_seed(0)       
sage: strings = ["{}".format(i) for i in range(1001)]                                                                                                                                                                                                                                                                                       
sage: edges = [(strings[randint(0,1000)], strings[randint(0,1000)]) for _ in range(100000)]                                                                                                                                                                                                                                                                                
sage: D = DiGraph(multiedges=False, loops=True, sparse=False)
sage: D.add_vertices(strings)                                                                                                                                                                                                                                                                                     
sage: %time D.add_edges(edges)                                                                                                                                                                                                                                                                                                                                             
CPU times: user 11.2 ms, sys: 0 ns, total: 11.2 ms
Wall time: 11.3 ms

sage: set_random_seed(0)                                                                                                                                                                                                                                                                                                                                                   
sage: edges = [(int(randint(0,1000)), int(randint(0,1000))) for _ in range(100000)]                                                                                                                                                                                                                                                                                        
sage: D = DiGraph(multiedges=False, loops=True, sparse=False)                                                                                                                                                                                                                                                                                                              
sage: D.add_vertices(range(1001))                                                                                                                                                                                                                                                                                                                                          
sage: %time D.add_edges(edges)                                                                                                                                                                                                                                                                                                                                             
CPU times: user 14.6 ms, sys: 0 ns, total: 14.6 ms
Wall time: 14.6 ms

@dcoudert
Copy link
Contributor

comment:15

in your timing, strings is not defined.

@kliem
Copy link
Contributor

kliem commented Oct 14, 2020

comment:16

Replying to @dcoudert:

in your timing, strings is not defined.

Fixed.

@mkoeppe
Copy link
Member

mkoeppe commented Feb 13, 2021

comment:17

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Feb 13, 2021
@mkoeppe mkoeppe removed this from the sage-9.4 milestone Jul 19, 2021
@mkoeppe mkoeppe added this to the sage-9.5 milestone Jul 19, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.5, sage-9.6 Dec 18, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.6, sage-9.7 Apr 1, 2022
@mkoeppe mkoeppe modified the milestones: sage-9.7, sage-9.8 Aug 31, 2022
@mkoeppe mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants