New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avg. clustering coefficient calculated incorrectly on graph #625

brycethomas opened this Issue May 16, 2012 · 4 comments


None yet
4 participants
Copy link

brycethomas commented May 16, 2012


Consider the following graph:

Gephi (0.8.1) calculates the following local clustering coefficients:

Gephi averages these: (1.0 + 0.33333 + 1.0 + 0.0) / 4 = 0.583, and reports this number (0.583) as the Avg. Clustering Coefficient. This result is inconsistent with the algorithm from Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs that Gephi claims to implement. The author of the aforementioned paper (Latapy) has his own C implementation of the algorithm, which produces 0.7777.

I surmise that the mistake Gephi is making is that it follows Latapy's advice of not calculating a local clustering coefficient for nodes with degree <= 1, but then includes this node in the count of total nodes anyway. In the example I gave above, I am referring to node Four, whose clustering coefficient is calculated as 0.0. Note that if we were to do (1.0 + 0.33333 + 1.0 + 0.0) / 3 instead of (1.0 + 0.33333 + 1.0 + 0.0)/4 we would get 0.7777, which is consistent with the result of Latapy's implementation.

To throw another spanner in the works, there is a second potential problem, but I am waiting for this to be confirmed by Latapy. This second problem is that it appears to me that Latapy himself implements the clustering coefficient algorithm differently to how it was first described in Collective dynamics of small world networks by Watts & Strogatz, which he references. I didn't see any mention in Watts & Strogatz paper to suggest nodes with degree <= 1 should be excluded from the calculations at all.

To summarise, I believe clustering coefficient is implemented incorrectly in Gephi. I suggest as a first step at least ensuring it is consistent with Latapy's implementation, and then later on figure out whether Latapy's implementation is itself inconsistent with the original definition of Avg. clustering coefficient. I haven't worked on the Gephi code base before, but if someone can point me to the right files, I'm willing to have a go at fixing it.


This comment has been minimized.

Copy link

brycethomas commented May 16, 2012

Just to clarify, I believe the current Gephi implementation to be incorrect regardless of whether you consider Latapy's or Watts & Strogatz to be the "proper" way of implementating Avg. clustering coefficient.


This comment has been minimized.

Copy link

phreakocious commented May 16, 2012

I can't speak to your claim, but you can certainly see the source code here:


This comment has been minimized.

Copy link

brycethomas commented May 17, 2012

I have heard back from Latapy, here is an excerpt from his email:

Clustering coefficient
for degree 1 nodes actually is poorly defined: what is the
probability for any two neighbors of a degree 1 node to be
linked togather? ...

As a consequence, some authors (including me) ignore them,
some consider their cc to be 0 and others to be 1.

As in practice many nodes may have degree 1, this has a huge
impact on the average cc in the graph. Consider for instance
a binary tree: no triangle at all, but half nodes have degree
1. If one considers them to have cc 1 then the average is 0.5
still with no triangle.

In general, authors do not explicitly state how then handle
degree 1 nodes, as you noticed, and I even suspect that
some authors choose the definition best suited to their
expected outcome.

To make the story short, there is no satisfactory way to
handle degree 1 nodes in cc computations. One should first
give the number of such nodes and then the average cc for
all other nodes (or, better, the degree-cc correlations:
then degree 1 nodes are naturally separated from others and
the obtained information is much richer)

What I garner from this is that depending on how you define local clustering coefficient for degree 1 nodes, either the old implementation or the implementation in my pull request could both be considered correct. Given that Gephi references Latapy's paper though, it is at least in my opinion best that the implementation also be consistent with Latapy's.

sheymann added a commit that referenced this issue May 20, 2012

Merge pull request #626 from brycethomas/master
Fixes calculation of clustering coefficient on graph (Issue #625)

This comment has been minimized.

Copy link

sheymann commented May 20, 2012

Thanks for the report, we should indeed be consistent with Latapy's implementation. I merged your code.

@sheymann sheymann closed this May 20, 2012

@mbastian mbastian added Fix Released and removed Fix Committed labels Nov 21, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment