New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement of is_triangle_free' addition of
triangles_count' and a minor change in `spanning_trees_count'
#13503
Comments
Author: Jernej Azarija |
comment:4
The patch needs a fix of the documentation before its ready for review. |
comment:5
Hello, I added Nathann in Cc because we had a discussion about counting triangles last week ;) The proposed method for testing if the graph is triangle free is not a good one. Indeed, the time complexity for testing if the graph is triangle free is at most N^w, where 2 <= w < 3 is the best possible exponent for computing the square of a matrix of side N. You compute the square of the adjacency matrix to get all paths of length 2, and then you test for every edge (u,v) if there is a path of length 2 from u to v. Furthermore, the basic algorithm of testing if some neighbors of a vertex are incident is fast. Let us compare the following method with your proposition for testing if the graph is triangle free.
We get the following running times:
For very small graphs it is apparently faster to do For counting triangles, the situation is rather different and the density of the graph matters. Furthermore, it is interesting to use networkx graphs:
We get the following running time:
As you can see, when the graph is dense, the matrix multiplication method is the fastest, but for spare graphs it is better to use iterations and to convert the graph into a networkx graph. However, I don't know how to choose the threshold and I don't have better proposals in mind yet. |
Reviewer: David Coudert |
comment:6
Hello, I agree with what you said. As you mentioned it is hard to deduce a threshold that is going to be good for all platforms (and times) so I suppose we could simply add an optional argument for the user to specify which algorithm to use. We could then also add the even faster triangle detection algorithms based on probability. |
comment:7
Yes, adding an argument to specify the algorithm is the best solution. |
comment:8
Also not to forget that the code for triangle detection would be much much faster if we compute AAA in the real field. Could you also check that? To compare with your other results? |
comment:9
It is true that AAA is a bit faster, but not in the real field where it seems very slow. |
comment:10
Replying to @dcoudert:
|
comment:11
Since the adjacency matrix is a matrix of integers (in fact booleans or bits), I expect matrix multiplication to be significantly faster than in any other rings. |
comment:12
HMmmm... I just read your tests, and did not get this one :
Could it be that there would be another difference in runtimes if one requires the adjacency matrix to be dense ? Nathann |
comment:13
If the graph is dense, the first visited vertex will have a triangle with high probability. So after a very small number of iterations and so operations, the algorithm returns False. But with matrix multiplications, you have to pay the full cost whatever the result. Of course, if the graph is dense and triangle free (not so many such graphs), my algorithm will take long time, and possibly more than matrix multiplications, but in average it is faster. Do your own experiments to convince yourself. So for testing if G is triangle free, it is good to have a parameter to choose the algorithm, and a default behavior like: if G has less than 15 nodes, then do matrix multiplications, else use other algo. For counting triangles, the situation is different and we clearly have to take into account the density to choose the best algorithm. |
comment:14
Two interesting (worst case) examples:
|
comment:15
Replying to @dcoudert:
I'd guess that managing (potentially) large integers has more overhead than managing doubles:
I can't recall seeing a Sage class for integers of a bounded size. |
comment:16
I'm certainly not used to matrix multiplications and rings in sage, and I'm surprised that computing with doubles is faster than with integer. For instance, if G has Following your example, and after some search, I found a much faster solution: use GF(2)
However, this is not always the best solution and it is extremely slow for sparse matrix O_o
I don't know how to force a matrix to be dense. for "small" graphs, it is apparently always dense. Clearly, the algorithm should use tradeoffs between size and density. |
comment:17
|
comment:18
While matrix multiplication over GF(2) is quite fast, it is wrong for the adjacency matrix. The trace of the cube of an adjacency matrix is always zero over GF(2). |
comment:19
That's a pity :( Any interesting alternatives? |
comment:20
"dense" in Sage is an implementation detail. Any matrix can be made dense or sparse, even if it is not a good idea for the algorithm at hand. In the discussion here, the actual "density" of the adjacency matrix is relevant for which approach (matrix algebra, vertex neighborhoods) might be faster. |
Changed reviewer from David Coudert to none |
Changed author from Jernej Azarija to Jernej Azarija, David Coudert |
comment:25
Hellooooooooo David !! First, you do not know how glad I am to hear things like "the difficulty with graph algorithms is that the efficiency depends on the data structure" from the mouth of researchers. "Adjacency test ? O(1) !" is their usual answer. I have been thinking of reimplementing another version using FastGraph, but that would be more trouble than necessary for the moment. And I could actually use the same technique to reimplement SubgraphSearch a bit better anyway, so I guess I will go there directly. About the patch :
Thanks for that patch !! nathann |
comment:26
It was in the original patch from Jernej Azarija ([trac_17334_triangle_free_speedup.patch]). I don't know if its true or not, and since the gain is very limited, we could put it back to be on the safe side.
Jernej should answer that remark.
OK. I have implemented some of the modifications.
I was also thinking of some nice and fast methods from Alon et al., or from Latapy (see the code in c and the survey at http://www-rp.lip6.fr/~latapy/Triangles/). We could add them (in particular the compact-forward method) at the cost of adding an interface and a method for converting the graph into the tricky data structure used by Latapy. |
comment:27
Replying to @dcoudert:
abs is not needed. Perhaps we can still remove it from this patch and I add it later in a generic patch fixing various small details in graph.py. The reason it is not needed is that Kirchhoff matrix tree theorem states that the number of spanning trees equals det(L')*(-1)^(i+j}+) where L' is the matrix that is obtained from the Laplacian after removing row i and column j.
I am not really sure what to do here.
Haven't looked at the site yet, but is it hard to implement the algorithm directly into sage? |
comment:28
Replying to @sagetrac-azi:
yes, it is better to remove the abs in a patch dedicated to small details ;-)
It's difficult in python. The method is fast because it uses optimized data structure. It can be done in cython, but then it is certainly better to only implement the interfaces with the original c code of Latapy. We can easily obtained agreement from him for integrating is code in sage. |
comment:30
Can someone review this patch? |
comment:31
Well. Why wouldn't you ? It looks like David updated the patch last, so you can review his changes.. I mean, for as long as two persons look at the file together and agree that it is all good, we usually accept this as a review and put both names as Authors and Reviewers Nathann |
comment:32
Oh, didn't know I can do that too. Here we go... I have tested the code briefly and it looks good to me as far as correction goes. I have two additional remarks though.
Am I missing something or this should be equivalent (to the slightly more optimal):
|
comment:34
Attachment: trac_13503_triangles-v2.patch.gz
|
comment:35
Hello!
What do you guys think? |
comment:36
I agree. |
comment:37
Okay. Could you (dcoudert) pleaase make another ticket that basically explains what you said under 1? We can then discuss this there. I'll mark this ticket with a positive review now. |
Reviewer: Jernej Azarija |
Merged: sage-5.6.beta1 |
The main change of this patch is to improve the `is_triangle_free' method. The introduced algorithm uses the fact that given that A is the adjacency matrix of a simple graph G, then the number of triangles in G is tr(A^3)/6.
The change has been tested resulting in the following benchmarks:
I can provide the test programs if needed.
Since there are many applications in which one needs to count the number of triangles in a graph the named routine was also added.
Finally a minor change has been made to the generic_graph function spanning_trees_count() which calls abs on a determinant that is always positive.
Apply:
CC: @rbeezer @nathanncohen
Component: graph theory
Keywords: triangles, number of triangles
Author: Jernej Azarija, David Coudert
Reviewer: Jernej Azarija
Merged: sage-5.6.beta1
Issue created by migration from https://trac.sagemath.org/ticket/13503
The text was updated successfully, but these errors were encountered: