# Avg. clustering coefficient calculated incorrectly on graph #625

Closed
opened this Issue May 16, 2012 · 4 comments

Projects
None yet
4 participants
Contributor

### brycethomas commented May 16, 2012

 Hi, Consider the following graph: http://i.imgur.com/7hMZq.png Gephi (0.8.1) calculates the following local clustering coefficients: http://i.imgur.com/ilT1f.png Gephi averages these: `(1.0 + 0.33333 + 1.0 + 0.0) / 4 = 0.583`, and reports this number (`0.583`) as the Avg. Clustering Coefficient. This result is inconsistent with the algorithm from Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs that Gephi claims to implement. The author of the aforementioned paper (Latapy) has his own C implementation of the algorithm, which produces `0.7777`. I surmise that the mistake Gephi is making is that it follows Latapy's advice of not calculating a local clustering coefficient for nodes with degree `<= 1`, but then includes this node in the count of total nodes anyway. In the example I gave above, I am referring to node Four, whose clustering coefficient is calculated as `0.0`. Note that if we were to do `(1.0 + 0.33333 + 1.0 + 0.0) / 3` instead of `(1.0 + 0.33333 + 1.0 + 0.0)/4` we would get `0.7777`, which is consistent with the result of Latapy's implementation. To throw another spanner in the works, there is a second potential problem, but I am waiting for this to be confirmed by Latapy. This second problem is that it appears to me that Latapy himself implements the clustering coefficient algorithm differently to how it was first described in Collective dynamics of small world networks by Watts & Strogatz, which he references. I didn't see any mention in Watts & Strogatz paper to suggest nodes with degree `<= 1` should be excluded from the calculations at all. To summarise, I believe clustering coefficient is implemented incorrectly in Gephi. I suggest as a first step at least ensuring it is consistent with Latapy's implementation, and then later on figure out whether Latapy's implementation is itself inconsistent with the original definition of Avg. clustering coefficient. I haven't worked on the Gephi code base before, but if someone can point me to the right files, I'm willing to have a go at fixing it.
Contributor

### brycethomas commented May 16, 2012

 Just to clarify, I believe the current Gephi implementation to be incorrect regardless of whether you consider Latapy's or Watts & Strogatz to be the "proper" way of implementating Avg. clustering coefficient.

### phreakocious commented May 16, 2012

 I can't speak to your claim, but you can certainly see the source code here: https://github.com/gephi/gephi/blob/master/StatisticsPlugin/src/org/gephi/statistics/plugin/ClusteringCoefficient.java

Merged

Contributor

### brycethomas commented May 17, 2012

 I have heard back from Latapy, here is an excerpt from his email: ``````Clustering coefficient for degree 1 nodes actually is poorly defined: what is the probability for any two neighbors of a degree 1 node to be linked togather? ... As a consequence, some authors (including me) ignore them, some consider their cc to be 0 and others to be 1. As in practice many nodes may have degree 1, this has a huge impact on the average cc in the graph. Consider for instance a binary tree: no triangle at all, but half nodes have degree 1. If one considers them to have cc 1 then the average is 0.5 still with no triangle. In general, authors do not explicitly state how then handle degree 1 nodes, as you noticed, and I even suspect that some authors choose the definition best suited to their expected outcome. To make the story short, there is no satisfactory way to handle degree 1 nodes in cc computations. One should first give the number of such nodes and then the average cc for all other nodes (or, better, the degree-cc correlations: then degree 1 nodes are naturally separated from others and the obtained information is much richer) `````` What I garner from this is that depending on how you define local clustering coefficient for degree 1 nodes, either the old implementation or the implementation in my pull request could both be considered correct. Given that Gephi references Latapy's paper though, it is at least in my opinion best that the implementation also be consistent with Latapy's.

### sheymann added a commit that referenced this issue May 20, 2012

``` Merge pull request #626 from brycethomas/master ```
`Fixes calculation of clustering coefficient on graph (Issue #625)`
``` 00d4318 ```
Member

### sheymann commented May 20, 2012

 Thanks for the report, we should indeed be consistent with Latapy's implementation. I merged your code.