[Feature Request] Count triangles for edges #967

Open
alexperrone opened this Issue Oct 28, 2016 · 8 comments

Projects

None yet

4 participants

@alexperrone

The function count_triangles() gives the number of triangles each
vertex is in.

Feature request: a function to count the number of triangles each edge occurs in.

Justification: the number of triangles an edge occurs in is important for computing certain clustering algorithms, for example the k-truss algorithm which requires the "support" (number of triangles) for each edge.

For reference of k-truss, see: https://arxiv.org/pdf/1205.6693.pdf

@szhorvat
Contributor

This is very easy to compute without igraph.

Suppose i-j is an edge.

Take rows i and j of the adjacency matrix and compute their scalar product.
That's your answer.

On 28 October 2016 at 20:04, alexperrone notifications@github.com wrote:

The function count_triangles() gives the number of triangles each
vertex is in.

Feature request: a function to count the number of triangles each edge
occurs in.

Justification: the number of triangles an edge occurs in is important for
computing certain clustering algorithms, for example the k-truss algorithm
which requires the "support" (number of triangles) for each edge.

For reference of k-truss, see: https://arxiv.org/pdf/1205.6693.pdf


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#967, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABKBx03H9GeYRqOCni4cDf4PEF62EXsHks5q4jlCgaJpZM4Kjs_P
.

@alexperrone

That seems to work, but how to do it in igraph? I still need the edge IDs, for example. It took over 1 second just to run get.adjacency(g) on a small (8 nodes, 14 edges) undirected igraph object g, before even doing the i-j scalar product. Whereas count_triangles(g) is basically instant to run (~.001 seconds).

@gaborcsardi
Member

@alexperrone I am not sure what you are doing, but get.adjacency does not take 1 second for me:

g <- sample_gnm(8, 14)
❯ system.time(A <- get.adjacency(g))
   user  system elapsed
  0.001   0.000   0.001
``
@alexperrone

Wow, I was doing the equivalent of:

> g <- sample_gnm(8, 14)
> plot(g)
> system.time(A <- get.adjacency(g))
   user  system elapsed 
  0.050   0.006   0.054 
> system.time(count_triangles(g))
   user  system elapsed 
 0.002   0.000   0.002 

Apparently, the system.time somehow is sensitive to commands before it, like plotting. I don't know how I got 1 second on my data (I think more intensive plotting preceding), but anyway, you should be able to reproduce above to see how one could get perception that .05 >> .002

@gaborcsardi
Member

Because it probably contains garbage collection.

@alexperrone

Thanks to @szhorvat I was able to accomplish this. However, it still may be good to have in igraph.

Here's what I did.

library(igraph)
set.seed(1)
g <- sample_gnm(6, 9)
plot(g)
A <- get.adjacency(g)
triangle_matrix <- as.matrix(Matrix::tcrossprod(A))  # scalar product of rows
E(g)$num_triangles <- triangle_matrix[ends(g, E(g))]  # look up each edge
E(g)[[]]  # show results
@alexperrone

Any ideas on how to do this for each edge that would be more efficient than matrix multiplication shown in previous post (Oct 29)? Or do you think it is not worth it to pursue? The functions in igraph/src/triangles.c, e.g. igraph_transitivity_local_undirected, compute for each vertex and seem difficult to transform to do it for each edge.

@ntamas
Member
ntamas commented Dec 13, 2016

If you are using a sparse matrix for A, then this is probably as good as it gets - the alternative would be to query the neighbor sets for each vertex in advance, and do repeated set intersections for pairs of vertices i and j if they are connected by an edge. This is exactly what the matrix multiplication does anyway. There might be some leeway in optimizing the set intersection operation, but it would have a measurable effect only if the entire looping over the edges and the set intersection operation are both implemented in C -- otherwise a sparse matrix multiplication (which I assume is also implemented in C) probably beats it hands down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment