New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avg. clustering coefficient calculated incorrectly on graph #625
Comments
Just to clarify, I believe the current Gephi implementation to be incorrect regardless of whether you consider Latapy's or Watts & Strogatz to be the "proper" way of implementating Avg. clustering coefficient. |
I can't speak to your claim, but you can certainly see the source code here: https://github.com/gephi/gephi/blob/master/StatisticsPlugin/src/org/gephi/statistics/plugin/ClusteringCoefficient.java |
I have heard back from Latapy, here is an excerpt from his email:
What I garner from this is that depending on how you define local clustering coefficient for degree 1 nodes, either the old implementation or the implementation in my pull request could both be considered correct. Given that Gephi references Latapy's paper though, it is at least in my opinion best that the implementation also be consistent with Latapy's. |
Fixes calculation of clustering coefficient on graph (Issue #625)
Thanks for the report, we should indeed be consistent with Latapy's implementation. I merged your code. |
Hi,
Consider the following graph:
http://i.imgur.com/7hMZq.png
Gephi (0.8.1) calculates the following local clustering coefficients:
http://i.imgur.com/ilT1f.png
Gephi averages these:
(1.0 + 0.33333 + 1.0 + 0.0) / 4 = 0.583
, and reports this number (0.583
) as the Avg. Clustering Coefficient. This result is inconsistent with the algorithm from Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs that Gephi claims to implement. The author of the aforementioned paper (Latapy) has his own C implementation of the algorithm, which produces0.7777
.I surmise that the mistake Gephi is making is that it follows Latapy's advice of not calculating a local clustering coefficient for nodes with degree
<= 1
, but then includes this node in the count of total nodes anyway. In the example I gave above, I am referring to node Four, whose clustering coefficient is calculated as0.0
. Note that if we were to do(1.0 + 0.33333 + 1.0 + 0.0) / 3
instead of(1.0 + 0.33333 + 1.0 + 0.0)/4
we would get0.7777
, which is consistent with the result of Latapy's implementation.To throw another spanner in the works, there is a second potential problem, but I am waiting for this to be confirmed by Latapy. This second problem is that it appears to me that Latapy himself implements the clustering coefficient algorithm differently to how it was first described in Collective dynamics of small world networks by Watts & Strogatz, which he references. I didn't see any mention in Watts & Strogatz paper to suggest nodes with degree
<= 1
should be excluded from the calculations at all.To summarise, I believe clustering coefficient is implemented incorrectly in Gephi. I suggest as a first step at least ensuring it is consistent with Latapy's implementation, and then later on figure out whether Latapy's implementation is itself inconsistent with the original definition of Avg. clustering coefficient. I haven't worked on the Gephi code base before, but if someone can point me to the right files, I'm willing to have a go at fixing it.
The text was updated successfully, but these errors were encountered: