
# 1.4 Network clustering

Alongside the characteristic path length, perhaps the most important network metric is clustering, which measures the extent to which a node's neighbours are connected to one another. A feature of highly clustered networks is the presence of a large number of triangles. For instance, in a friendship network, if Bob and Phil are both friends of Joe then it is highly likely that they will eventually meet through their mutual friendship with Joe, thus forming a triangle. Mathematically, a network in which $i\sim j$ and $i\sim k$ together imply a high probability that $j\sim k$ is said to be *highly transitive*.

### Watts-Strogatz clustering
The first proposal for a *clustering coefficient* was made by Watts and Strogatz in their seminal 1998 paper on 'small-world' networks. Given a node $i$, the local clustering coefficient measures the fraction of the possible triangles through $i$ that are actually present, that is, the proportion of pairs of neighbours of $i$ that are themselves connected:

$$
 C(i) = \frac{\text{number of triangles centred on node $i$}}{\text{total number of possible triangles centred on node $i$}}.
$$

Mathematically, we can write the above equation as

$$
 C(i) = \frac{t_i}{k_i(k_i-1)/2} = \frac{2t_i}{k_i(k_i-1)},
$$ (WSClusteringCoeficient)

where $k_i$ is the degree of node $i$ and $t_i$ is the total number of triangles centred on node $i$. For nodes with $k_i<2$ the ratio is undefined, and we set $C(i)=0$. Given Equation {eq}`WSClusteringCoeficient`, it is straightforward to construct a global clustering coefficient by averaging the local clustering coefficient over all $n$ network nodes:

$$
 \langle C\rangle = \frac{1}{n}\sum_i C(i).
$$
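As an illustration, here is a minimal plain-Python sketch of Equation {eq}`WSClusteringCoeficient` and of $\langle C\rangle$. It assumes the network is stored as a dictionary mapping each node to the set of its neighbours (a representation chosen here for convenience, not taken from the notes), counts $t_i$ as the number of connected pairs among the neighbours of $i$, assigns $C(i)=0$ to nodes of degree less than 2, and uses `Fraction` to keep the results exact. The example graph, a triangle on nodes 1, 2, 4 with a pendant node 3 attached to node 2, is our assumed reading of the right-hand panel of {numref}`ClusteringExample1p4`.

```python
from fractions import Fraction
from itertools import combinations


def local_clustering(neighbours):
    """Watts-Strogatz local clustering C(i) for every node.

    `neighbours` maps each node to the set of nodes adjacent to it
    (a simple, undirected graph is assumed).
    """
    C = {}
    for i, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            # No pair of neighbours exists, so set C(i) = 0 by convention.
            C[i] = Fraction(0)
            continue
        # t_i = number of connected pairs of neighbours = triangles through i.
        t = sum(1 for j, l in combinations(nbrs, 2) if l in neighbours[j])
        C[i] = Fraction(2 * t, k * (k - 1))
    return C


def average_clustering(neighbours):
    """Global coefficient <C>: the mean of C(i) over all n nodes."""
    C = local_clustering(neighbours)
    return sum(C.values(), Fraction(0)) / len(C)


# Hypothetical example: triangle 1-2-4 plus a pendant node 3 attached to node 2.
toy = {1: {2, 4}, 2: {1, 3, 4}, 3: {2}, 4: {1, 2}}
print({i: str(c) for i, c in local_clustering(toy).items()})
# → {1: '1', 2: '1/3', 3: '0', 4: '1'}
print(average_clustering(toy))
# → 7/12
```

Because every pair of neighbours of each node is enumerated explicitly, the cost grows roughly like $\sum_i k_i^2$, which is perfectly adequate for small examples such as these.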

{numref}`ClusteringExample1p4` illustrates these ideas for three toy networks.

```{figure} ../../images/Clustering.png
---
height: 150px
name: ClusteringExample1p4
---
 Simple illustration of the Watts-Strogatz clustering coefficient: (LHS) $C(i) = \{0,0,0,0\}$ and $\langle C\rangle = 0$; (Middle) $C(i) = \{1,1,1,1\}$ and $\langle C\rangle = 1$; and (RHS) $C(i) = \{1,\frac{1}{3},0,1\}$ and $\langle C\rangle = \frac{7}{12}$.
```

**Exercise**: Express $t_i$ in Equation {eq}`WSClusteringCoeficient` in terms of the network adjacency matrix $A$.

### Transitivity index
Another way of quantifying global clustering within a network is the so-called *transitivity index*. Let $|C_3|$ denote the total number of triangles in the network, and $|P_2|$ the total number of paths of length 2; note that each path of length 2 represents a potential triangle. We then define the transitivity index as

$$
 T = \frac{3|C_3|}{|P_2|}.
$$ (GlobalClusteringCoefficient)

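The factor of 3 arises because each triangle contains three paths of length 2, one centred on each of its vertices. Consequently $0\le T\le 1$, with $T=1$ exactly when every path of length 2 is closed into a triangle.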
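The transitivity index can be sketched in the same spirit by counting triangles and 2-paths directly. This plain-Python version assumes the network is stored as a dictionary of neighbour sets (a representation chosen for illustration). It enumerates every triple of nodes to find $|C_3|$, which is only sensible for small networks, and every pair of neighbours of each node to find $|P_2|$. Returning $T=0$ for a graph with no 2-paths is a convention adopted here, not part of the definition above.

```python
from fractions import Fraction
from itertools import combinations


def transitivity(neighbours):
    """Transitivity index T = 3|C_3| / |P_2| for a simple undirected graph.

    `neighbours` maps each node to the set of its neighbours.
    """
    nodes = list(neighbours)
    # |C_3|: each fully connected unordered triple {i, j, l} is one triangle.
    triangles = sum(
        1
        for i, j, l in combinations(nodes, 3)
        if j in neighbours[i] and l in neighbours[i] and l in neighbours[j]
    )
    # |P_2|: each unordered pair of neighbours of a centre node i is a 2-path j-i-l.
    two_paths = sum(1 for i in nodes for _ in combinations(neighbours[i], 2))
    if two_paths == 0:
        return Fraction(0)  # assumed convention: no 2-paths means T = 0
    return Fraction(3 * triangles, two_paths)


# Hypothetical example: triangle 1-2-4 plus a pendant node 3 attached to node 2.
toy = {1: {2, 4}, 2: {1, 3, 4}, 3: {2}, 4: {1, 2}}
print(transitivity(toy))
# → 3/5
```

For this graph there is one triangle and five 2-paths, giving $T=\frac{3}{5}$, whereas averaging its local coefficients $C(i)=\{1,\frac{1}{3},0,1\}$ gives $\langle C\rangle=\frac{7}{12}$: the two global measures are related but generally not equal.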

```{figure} ../../images/NetworkExample1p4.png
---
height: 350px
name: SimpNetEx
---
Simple network on 8 nodes.
```

Consider the simple network in {numref}`SimpNetEx`, whose adjacency matrix is shown below.

$$
A = \begin{bmatrix}
      0 & 1 & 1 &1 & 1 & 0& 0 & 0\\
      1 & 0 & 1 &0 & 0 & 1& 0 & 0\\
      1 & 1 & 0 &1 & 0 & 0& 1 & 0\\
      1 & 0 & 1 &0 & 0 & 0& 0 & 1\\
      1 & 0 & 0 &0 & 0 & 0& 0 & 0\\
      0 & 1 & 0 &0 & 0 & 0& 0 & 0\\
      0 & 0 & 1 &0 & 0 & 0& 0 & 0\\
      0 & 0 & 0 &1 & 0 & 0& 0 & 0
      \end{bmatrix}      
$$

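We can count the triangles in this network directly from $A$. Recalling {prf:ref}`Walks`, the diagonal entry $(A^3)_{ii}$ counts the closed walks of length 3 that start and end at node $i$. Each triangle gives rise to six such walks (three choices of starting node and two directions of travel), so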

$$
|C_3| = \frac{1}{6}\mathrm{Tr}\left(A^3\right) = 2.
$$
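This count is easy to check numerically. The following sketch assumes NumPy is available and builds $A$ from an edge list for the network in {numref}`SimpNetEx` (nodes labelled 1 to 8 as in the matrix), checks that $A$ is symmetric, and then evaluates $\mathrm{Tr}(A^3)/6$ using `numpy.linalg.matrix_power`.

```python
import numpy as np

# Edge list for the 8-node example network (nodes labelled 1..8, as in A).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 6), (3, 4), (3, 7), (4, 8)]

n = 8
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    # Undirected graph: set both A_ij and A_ji (shifting to 0-based indices).
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

assert (A == A.T).all()  # sanity check: an undirected adjacency matrix is symmetric

# Tr(A^3) counts closed walks of length 3; each triangle contributes 6 of them.
closed_walks = np.trace(np.linalg.matrix_power(A, 3))
num_triangles = closed_walks // 6
print(closed_walks, num_triangles)
# → 12 2
```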

The number of paths of length two in the network can be obtained using the following formula (which you are asked to justify in the problem sheets)

$$
|P_2| = \sum_{i=1}^n{k_i\choose 2} = \sum_{i=1}^n\frac{k_i(k_i-1)}{2} = 18.
$$

Thus,

$$
T = \frac{3\times 2}{18} = \frac{1}{3}.
$$
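The remaining quantities can be checked in the same way. This sketch assumes NumPy, rebuilds $A$ from an edge list for the network in {numref}`SimpNetEx`, reads the degrees off as row sums of $A$, and applies the formulas for $|P_2|$ and $|C_3|$ quoted above; `Fraction` keeps $T$ exact. It checks the arithmetic only and does not replace the justification of the $|P_2|$ formula asked for in the problem sheets.

```python
from fractions import Fraction

import numpy as np

# Edge list for the 8-node example network (nodes labelled 1..8, as in A).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 6), (3, 4), (3, 7), (4, 8)]
A = np.zeros((8, 8), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

k = A.sum(axis=1)                                        # degrees k_i = row sums of A
P2 = int((k * (k - 1) // 2).sum())                       # |P_2| = sum_i C(k_i, 2)
C3 = int(np.trace(np.linalg.matrix_power(A, 3))) // 6    # |C_3| = Tr(A^3) / 6
T = Fraction(3 * C3, P2)                                 # transitivity index

print("degrees:", k.tolist())
# → degrees: [4, 3, 4, 3, 1, 1, 1, 1]
print("|P_2| =", P2, " |C_3| =", C3, " T =", T)
# → |P_2| = 18  |C_3| = 2  T = 1/3
```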

**Exercise**: Use {numref}`SimpNetEx` to confirm the above calculations. (Hint: the amount of work is greatly reduced by noting that several of the nodes in this network are 'equivalent'.)