Speed up some graph iterations #13730
Comments
comment:1
Hello, dcoudert!! The first thing I am wondering is why Sage constructs a set from the iterator and then returns an iterator over that set. It does not seem to affect computational speed, but I do not see why it is necessary. Does anyone happen to know why set() is required in this case?
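The pattern being questioned can be illustrated in plain Python (a sketch with hypothetical function names, not Sage's actual code). One plausible reason for the `set()` is deduplication: if the raw iteration can report a neighbor more than once (e.g. parallel edges), wrapping it in a set hides the duplicates.

```python
def neighbor_iterator_via_set(adjacency, v):
    """Return an iterator over the neighbors of v, deduplicated.

    The set() is only needed if the raw iteration can yield a
    neighbor more than once (e.g. parallel edges in a multigraph).
    """
    return iter(set(adjacency[v]))

def neighbor_iterator_direct(adjacency, v):
    """Skip the set() when the raw data is already duplicate-free."""
    return iter(adjacency[v])

# Toy adjacency with a parallel edge between 1 and 2:
adjacency = {1: [2, 3, 2], 2: [1], 3: [1]}
print(sorted(neighbor_iterator_via_set(adjacency, 1)))   # deduplicated
print(list(neighbor_iterator_direct(adjacency, 1)))      # duplicates kept
```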
comment:2
I'm not expert enough in iterators to answer. Furthermore, the backend for graphs also constructs sets:
I don't know the reasons behind this, but it can certainly be improved/simplified. David.
comment:3
Agree! It seems like something suspiciously complex is being done in these iterators (lists, to iterators, to sets, to iterators, to sets...). I've played around, removed the (intuitively) unneeded parts, and observed a minor performance gain. But if we want to be as fast as networkx, we'd need to keep a hash table of neighbors for each vertex. Before trying to implement this: does Sage allow graphs with unhashable vertices?
comment:4
NOnonono, vertices are hashable or everything would break. I mean, for instance, the code you just printed would not work at all!
Nathann
comment:5
Hello guys! I have implemented a naive "fix" by adding a hash table to the graph object storing the neighbours of every vertex. I then modified the add_edge/delete_edge methods to update the hash table accordingly. The speedup was drastic (on par with networkX graphs) and it also sped up other graph theory functions (since many things rely on obtaining the neighbours of some vertices). HOWEVER, many of the graph theory tests made Sage crash, which makes me uncomfortable posting the patch here. I am not aware enough of the graph internals to write a sane patch that is more than a naive fix. Hence I call on someone more experienced with the c_graph implementation to add this feature. It is really, really worth doing!
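A minimal sketch of the idea described above (hypothetical class and method names; Sage's real backend lives in Cython): keep a neighbor dictionary alongside the graph and update it on every edge mutation, so neighbor queries become a single hash lookup.

```python
class CachedGraph:
    """Toy undirected graph keeping a neighbor cache (sketch only)."""

    def __init__(self):
        self._nbrs = {}  # vertex -> set of neighbors

    def add_edge(self, u, v):
        self._nbrs.setdefault(u, set()).add(v)
        self._nbrs.setdefault(v, set()).add(u)

    def delete_edge(self, u, v):
        # discard() ignores missing entries, so deleting an absent
        # edge is a no-op rather than an error.
        self._nbrs.get(u, set()).discard(v)
        self._nbrs.get(v, set()).discard(u)

    def neighbors(self, v):
        return sorted(self._nbrs.get(v, set()))

G = CachedGraph()
G.add_edge(1, 2)
G.add_edge(1, 3)
G.delete_edge(1, 2)
print(G.neighbors(1))  # [3]
```

Note that vertices must be hashable for this to work, which matches the point made in comment:4.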
comment:6
Hello, you should post your patch so that others can try it and help identify and fix the problem. The status of this patch could be "needs work" or "needs info". Best,
comment:8
Attachment: trac_17710_needs_info.patch.gz Hello. I tried to implement what azi did. I added the hash table and updated some methods. I didn't run other tests. Best regards, slani
comment:11
.... This patch does not even contain an instruction to remove an element from your cached list when an edge is deleted.... This problem has to be fixed, but it has to be fixed properly... If the backend is slow, it has to be fixed there, or the backend should be changed. Nathann
comment:12
Hello, I've added the function to handle vertex removals as well. I am aware that this solution is not good, but I just wanted to write something to get us going. To make it work properly I would have to change the respective backends, but I am clueless about their internal structure. Is there anyone here who can shed some light on it or point to the relevant documentation? I was not able to find any. Where should this improvement be implemented?
Attachment: trac_17711_needs_info.patch.gz
comment:13
Your updated patch would fail if there are two edges between two vertices and the user removes one of them. You unnecessarily check whether the dictionaries contain a list associated with a vertex of the graph; it would probably be better to associate an empty list with each vertex from the beginning. Anyway, you have to run TESTS to decide what to do. The functions you add are not doctested, and you do not document the features you add either. I think that it would be better to add what you need in the code directly, without creating 2 Please, do not write this patch.
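The multigraph pitfall mentioned here is easy to reproduce with a toy cache (illustrative code, not the actual patch): if neighbors are stored as a plain set, deleting one of two parallel edges wrongly removes the neighbor entirely, so the cache has to count edge multiplicities.

```python
from collections import Counter

class MultiGraphCache:
    """Neighbor cache that survives parallel edges by counting them."""

    def __init__(self):
        self._nbrs = {}  # vertex -> Counter mapping neighbor -> multiplicity

    def add_edge(self, u, v):
        self._nbrs.setdefault(u, Counter())[v] += 1
        self._nbrs.setdefault(v, Counter())[u] += 1

    def delete_edge(self, u, v):
        # Decrement the multiplicity; only drop the neighbor when
        # the last parallel edge is gone.
        for a, b in ((u, v), (v, u)):
            c = self._nbrs[a]
            c[b] -= 1
            if c[b] == 0:
                del c[b]

    def neighbors(self, v):
        return sorted(self._nbrs.get(v, Counter()))

G = MultiGraphCache()
G.add_edge(1, 2)
G.add_edge(1, 2)   # second parallel edge
G.delete_edge(1, 2)
print(G.neighbors(1))  # [2] -- the other parallel edge remains
```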
There isn't; that's one of the problems. I write some comments in the code about parts of it when I have to understand how it works to fix a bug.
I do not know. What I would do if I were to write this patch (or what I will do if you force me to write it) is try to understand why the current implementation is so slow compared to dictionaries. I would try to locate the place that makes the difference in the timings. It has to come from the way edges are stored, and from what I remember that part of the code is not that clear. Nathann
Attachment: trac_17710_iter_set.patch.gz
comment:16
Hello, in the picture you can see a diagram of what happens when we call neighbors() on a graph. You can see there is a lot of conversion from set/iter and back again. Where this conversion is not necessary, I removed it. But most of the time reduction (for our specific test) came from making the function neighbor_iterator (F2) call iterator_out_nbrs (F4.2) for undirected graphs instead of iterator_nbrs (F3.1). If the graph is undirected, we only need the out-neighbours (or only the in-neighbours?).
I also removed
in iterator_in_nbrs (F4.1). I don't know what relation this has with sparse/dense. But the most time-consuming problem is the conversion from vertex int to vertex label in iterator_out_nbrs (F4.2) and iterator_in_nbrs (F4.1). For our specific test this conversion takes almost 500 µs. I tried to implement an array of Python objects in Cython, where we could store the vertex labels, but I was unsuccessful because I don't know Cython well (yet). Maybe someone can help me? If this is a good solution, then we could return an iterator of vertex labels instead of a list of vertex ints from cpdef list out_neighbors (F5). So, what do you think? Best,
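The int-to-label conversion being discussed can be sketched in Python (hypothetical structures and names; Sage's real mapping lives in the C graph backend). The "array of Python objects" idea amounts to storing labels in a list indexed by the internal int id, so each conversion is a single O(1) list lookup:

```python
# Hypothetical label <-> int mapping, mimicking the backend's scheme.
labels = ["a", "b", "c", "d"]                       # index = internal int id
label_to_int = {lab: i for i, lab in enumerate(labels)}

def out_neighbors_int(v_int):
    """Stand-in for the backend's `cpdef list out_neighbors`,
    which returns internal int ids (toy ring graph here)."""
    return [(v_int + 1) % len(labels)]

def iterator_out_nbrs(v_label):
    """Yield neighbor labels, converting through the int mapping."""
    v_int = label_to_int[v_label]
    for u_int in out_neighbors_int(v_int):
        yield labels[u_int]   # one list lookup per neighbor

print(list(iterator_out_nbrs("a")))  # ['b']
```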
Attachment: trac_13730-v2-vd.patch.gz
comment:21
Hi, I apologize, I was a bit old school in my previous message. A long time ago, Cython did not support the yield statement, and it was necessary either to write a class dedicated to iteration or to build the set over which we would like to iterate and return it. I uploaded a patch where the iterators do not build the underlying set. The gain is not extraordinary (only a x2) and the output order is completely messed up (this needs to be fixed).
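The two styles can be contrasted in plain Python (a sketch; the actual patch is in Cython). The generator version avoids materializing a set up front, but its output follows storage order, which is why the ordering changes relative to the set-based version:

```python
def neighbors_as_set(adjacency, v):
    """Old style: materialize the whole set, then iterate over it."""
    return iter(set(adjacency[v]))

def neighbors_lazily(adjacency, v):
    """Generator style: yield each neighbor as soon as it is seen.

    A small `seen` set still guards against duplicates, but no
    full collection is built before iteration begins.
    """
    seen = set()
    for u in adjacency[v]:
        if u not in seen:
            seen.add(u)
            yield u

adjacency = {0: [2, 1, 2]}
print(list(neighbors_lazily(adjacency, 0)))  # [2, 1] -- storage order
```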
comment:22
Hello, I tried using yield in iterator_out_nbrs (l:1745, c_graph.pyx), but it does not improve anything.
I did a lot of modification and testing (I also tried a new Cython function which uses yield to return the neighbours), but nothing improved. I also made a function (in generic_graph.py) which directly calls cpdef list out_neighbors (base/sparse_graph.pyx; l:798):
It is still slower than NetworkX! I don't know, but I came to the conclusion that we have two problems:
Vincent, what do you think? This is also interesting for me:
Best, |
comment:23
Replying to @lobiCode:
This should not slow things down too much (not x4). It is a lookup in an array. Did you make some timings to see whether this is slow? Perhaps we have to dig into the way it is implemented.
It might be a good idea to check whether iterating over int vertices is faster compared to labeled ones. Other possibilities:
comment:24
Replying to @videlec:
This is the timing when I call cpdef list out_neighbors (base/sparse_graph.pyx; l:798) directly (no conversion from vertex ints to vertex labels):
comment:25
Replying to @videlec:
comment:26
That may be interesting
comment:27
What about #14806 ? Nathann |
comment:28
Okay, this is even more interesting, and it indicates the problem is not really in the backend per se.
BUT:
What do you guys think is going on here? As for your static graphs thing: if nobody wakes up, then I promise to review it by the weekend! :-)
comment:29
And also the relevant profilings, if they are of any help.
comment:30
Welll... Is it just forwarding the values returned by out_neighbors through neighbor_iterator?
That'd be cool. Nathann
comment:31
Replying to @nathanncohen:
YES. But why does it take so long? This changes the runtime by a huge factor.
comment:32
HMmmmm... I'd say "just Python being Python" Nathann |
comment:33
Yes! The only thing confusing me is: why, then, is NetworkX (called without going through the frontend) efficient? (Confusing because NetworkX is pure Python.)
So there is some "Python being Python" thing going on in the Graph frontend class.
comment:35
Here is a wrap-up of what's going on with this problem: https://groups.google.com/forum/#!topic/sage-devel/CwKSNWXVJtU
comment:39
In view of recent progress (see #28895), maybe we can move this ticket to wontfix and close it.
Several graph methods could be significantly sped up by improving the data structure, or at least the way some basic operations are performed.
For instance, many functions are faster using networkx graphs (conversion time included) instead of sage graphs.
In fact, NetworkX uses dictionaries to store edges. If G is a NetworkX graph, it is itself a dictionary indexed by the vertices, and G[u] is a dictionary indexed by the neighbors of u and containing edge data. This way, iterations are fast.
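The dict-of-dicts layout described above can be illustrated with plain Python dictionaries (no networkx import needed; the structure mirrors what NetworkX stores internally):

```python
# G maps each vertex to a dict of {neighbor: edge-data dict}.
G = {
    "a": {"b": {"weight": 3}, "c": {}},
    "b": {"a": {"weight": 3}},
    "c": {"a": {}},
}

# Neighbor iteration is one dict lookup plus key iteration:
print(sorted(G["a"]))         # ['b', 'c']
# Edge-existence testing is an O(1) membership check:
print("b" in G["a"])          # True
# Edge data is reached with one more lookup:
print(G["a"]["b"]["weight"])  # 3
```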
We should at least improve iteration over the neighbors.
CC: @sagetrac-azi @nathanncohen @lobiCode @sagetrac-brunellus @videlec
Component: graph theory
Issue created by migration from https://trac.sagemath.org/ticket/13730