### Community detection: Triangle Counting and Clustering Coefficient

Triangle counting is a community detection graph algorithm that determines the number of triangles passing through each vertex in a graph data set. A triangle is a three-node subgraph in which every pair of nodes is connected, so a vertex is part of a triangle when it has two adjacent vertices that also share an edge. The algorithm returns a Graph object, and we extract the vertices from this triangle-counting graph.
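
To make the definition concrete, here is a minimal pure-Python sketch that counts triangles per vertex on a tiny undirected graph. The function name `triangles_per_vertex` and the toy edge list are illustrative assumptions, not part of any library used in this notebook, and the brute-force approach is only suitable for small graphs.

```python
from itertools import combinations

def triangles_per_vertex(edges):
    """Count, for each vertex, the triangles it takes part in (undirected graph)."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops, they cannot form triangles
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    counts = {}
    for node, nbrs in neighbors.items():
        # A triangle through `node` is a pair of its neighbors that are adjacent to each other.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return counts

# Hypothetical toy graph: one triangle (a, b, c) plus a pendant vertex d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_counts = triangles_per_vertex(toy_edges)
print(toy_counts)
```

Each of `a`, `b` and `c` lies on the single triangle, while `d` has only one neighbor and therefore lies on none.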

Triangle counting is used heavily in social network analysis. It provides a measure of clustering in the graph data, which is useful for finding communities and measuring the cohesiveness of local communities in social networks such as LinkedIn or Facebook.

Triangle counting is a message-heavy and computationally expensive algorithm compared to other graph algorithms, so run the Spark program on a sufficiently powerful machine when you test the Triangle Count algorithm. Note that PageRank is a measure of relevancy, whereas Triangle Count is a measure of clustering.

The clustering coefficient, an important metric in social networks, shows how tightly connected the community around a node is. It is the standard method of summarizing triangle counts. It is well known that some networks, especially social networks, have much higher clustering coefficients than random networks.

#### Network Average Clustering Coefficient

Objective:
The clustering coefficient, along with the mean shortest path, can indicate a "small-world" effect. For the clustering coefficient to be meaningful, it should be significantly higher than in a version of the network where all of the edges have been shuffled.
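
One possible way to build such a shuffled reference network is a degree-preserving double edge swap. The sketch below assumes that interpretation of "shuffled"; the function name `shuffle_edges`, its parameters and the toy edge list are hypothetical. It also assumes an undirected edge list without duplicates or self-loops.

```python
import random
from collections import Counter

def shuffle_edges(edges, n_swaps, seed=0):
    """Randomize an undirected edge list while keeping every node's degree unchanged."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed swap: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or touch a shared node
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    return Counter(node for edge in edges for node in edge)

# Hypothetical toy graph used only to illustrate the call.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
shuffled = shuffle_edges(toy_edges, n_swaps=50)
print(degrees(shuffled) == degrees(toy_edges))
```

Comparing the average clustering coefficient of the real network against several such shuffled copies gives a rough baseline for what "significantly higher" means.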

Description:
The neighborhood of a node, u, is the set of nodes that are connected to u. If every node in the neighborhood of u is connected to every other node in the neighborhood of u, then the neighborhood of u is complete and will have a clustering coefficient of 1. If no nodes in the neighborhood of u are connected, then the clustering coefficient will be 0.
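
The description above corresponds to the usual local clustering coefficient for undirected graphs, C(u) = 2·T(u) / (k·(k−1)), where T(u) is the number of connected neighbor pairs (triangles through u) and k is the number of neighbors of u. The following sketch computes it on a hand-written adjacency dictionary. The function names and the toy data are assumptions for illustration, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, which the Neo4j procedure used below may or may not share.

```python
from itertools import combinations

def local_clustering(neighbors, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = neighbors[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(neighbors):
    """Network average: mean of the local coefficients over all nodes."""
    return sum(local_clustering(neighbors, u) for u in neighbors) / len(neighbors)

# Hypothetical toy graph: triangle (a, b, c) plus a pendant vertex d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print({u: local_clustering(toy, u) for u in toy})
print(average_clustering(toy))
```

Here the neighborhoods of `a` and `b` are complete (coefficient 1), only one of the three neighbor pairs of `c` is connected (coefficient 1/3), and `d` has a single neighbor (coefficient 0 under the assumed convention).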

In [1]:
#Loading library
import py2neo

In [14]:
#Accessing local Neo4j Server
py2neo.authenticate("localhost:7474", "neo4j", "neo4j")

In [15]:
graph = py2neo.Graph("http://localhost:7474/db/data/")

In [25]:
#Querying triangle number for each node
query = """
CALL algo.triangleCount.stream('Person', 'HAS_CONTACT', {concurrency:100})
YIELD nodeId, triangles;
"""

In [26]:
results = graph.data(query)

In [6]:
#Sorting the list of result dictionaries by triangle count, descending
from operator import itemgetter
newlist = sorted(results, key=itemgetter('triangles'), reverse=True)

In [7]:
#Top 10 triangle nodes
newlist[:10]

[{'nodeId': 1454, 'triangles': 165},
 {'nodeId': 1435, 'triangles': 139},
 {'nodeId': 1412, 'triangles': 138},
 {'nodeId': 1427, 'triangles': 137},
 {'nodeId': 1486, 'triangles': 128},
 {'nodeId': 1498, 'triangles': 125},
 {'nodeId': 1431, 'triangles': 123},
 {'nodeId': 1484, 'triangles': 114},
 {'nodeId': 1453, 'triangles': 108},
 {'nodeId': 1468, 'triangles': 107}]

In [8]:
x = newlist[:10]

In [28]:
#Creating nodeId list (look up the key by name instead of relying on dict ordering)
top_10 = [row['nodeId'] for row in x]

In [30]:
#TODO: build the ID list with string concatenation instead of hard-coding it
query = """
MATCH (person:Person)
WHERE ID(person) in [1454, 1435, 1412, 1427, 1486, 1498, 1431, 1484, 1453, 1468]
RETURN person.name
"""
graph.data(query)

[{'person.name': 'Charlize Theron'},
 {'person.name': 'Rob Reiner'},
 {'person.name': 'Val Kilmer'},
 {'person.name': 'Tony Scott'},
 {'person.name': 'Helen Hunt'},
 {'person.name': 'Greg Kinnear'},
 {'person.name': 'Parker Posey'},
 {'person.name': 'Bruno Kirby'},
 {'person.name': 'Liv Tyler'},
 {'person.name': 'Richard Harris'}]
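
The hard-coded ID list in the query above could instead be generated from the `top_10` list. The helper below is a minimal sketch of that idea. The name `build_name_query` is hypothetical, and the function only builds the Cypher text, so it can be checked without a running Neo4j server.

```python
def build_name_query(node_ids):
    """Build the name-lookup Cypher query for a list of internal node IDs."""
    # int() rejects anything that is not a plain integer ID before it reaches the query text.
    id_list = ", ".join(str(int(i)) for i in node_ids)
    return (
        "MATCH (person:Person)\n"
        "WHERE ID(person) IN [" + id_list + "]\n"
        "RETURN person.name"
    )

# Example with the first three IDs from the output above.
example_query = build_name_query([1454, 1435, 1412])
print(example_query)
```

The resulting string can then be passed to `graph.data(...)` in the same way as the hand-written query.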

In [11]:
#Computing clustering coefficient
query = """
CALL algo.triangleCount('Person', 'HAS_CONTACT',
{concurrency:4, write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;
"""

In [12]:
results = graph.data(query)

In [13]:
results

[{'averageClusteringCoefficient': 0.19542368158538823,
  'computeMillis': 8,
  'loadMillis': 1,
  'nodeCount': 133,
  'triangleCount': 2442,
  'writeMillis': 1}]