
# 1.2 Basic Statistics (Ctd)

## 1.2.3 Network clustering

Alongside the characteristic path length, perhaps the most important network metric is that of clustering, which measures the extent to which a node's neighbours are connected to one another. A feature of highly clustered networks is the presence of a large number of triangles. For instance, in a friendship network it is highly likely that if Bob and Phil are both friends of Joe then they will eventually meet each other via their relation with Joe, thus forming a triangle. Mathematically, a highly clustered network, i.e. one in which $i\sim j$ and $i\sim k$ implies a high probability that $j\sim k$, is said to be \emph{highly transitive}.

\begin{figure}[t]
\centering
 \includegraphics[scale=0.6]{Figures/Clustering}
 \caption{Simple illustration of the Watts-Strogatz clustering coefficient: (LHS) $C(i) = \{0,0,0,0\}$ and $\langle C\rangle = 0$; (Middle) $C(i) = \{1,1,1,1\}$ and $\langle C\rangle = 1$; and (RHS) $C(i) = \{1,\frac{1}{3},0,1\}$ and $\langle C\rangle = \frac{7}{12}$.}\label{fig4:ClusteringExample}
\end{figure}

\bigskip

\noindent\textbf{Watts-Strogatz clustering}\\
The first proposal for a \emph{clustering coefficient} was given by Watts and Strogatz in their seminal paper on `small-world' networks in 1998. Given a node, $i$ say, the local clustering coefficient measures the proportion of pairs of neighbours of $i$ that are themselves connected, i.e. the proportion of possible triangles centred on $i$ that are actually present:
\[
 C(i) = \frac{\text{number of triangles centred on node $i$}}{\text{total number of possible triangles centred on node $i$}}.
\]
Mathematically, we can write the above equation as
\begin{equation}\label{eqn:WSClusteringCoeficient}
 C(i) = \frac{t_i}{k_i(k_i-1)/2} = \frac{2t_i}{k_i(k_i-1)}, 
\end{equation}
where $k_i$ gives the degree of the $i$th node and we define $t_i$ to be the total number of triangles centred on node $i$. Given Equation (\ref{eqn:WSClusteringCoeficient}) it is straightforward to construct a global clustering coefficient by averaging the local clustering coefficient over all network nodes:
\begin{equation}
 \langle C\rangle = \frac{1}{n}\sum_i C(i).
\end{equation}
Figure \ref{fig4:ClusteringExample} illustrates the above ideas for three toy network models.
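
\noindent The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `local_clustering` and `average_clustering` and the toy graph `paw` (a triangle with one extra node attached) are hypothetical, and we assume the common convention $C(i)=0$ for nodes with $k_i<2$, for which the denominator of the local coefficient vanishes. The triangle count $t_i$ is obtained by checking pairs of neighbours directly.

```python
from fractions import Fraction
from itertools import combinations


def local_clustering(adj):
    """Return {node: C(node)} for an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    coeffs = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Assumed convention: C(i) = 0 when no triangle is possible.
            coeffs[i] = Fraction(0)
            continue
        # t_i = number of pairs of neighbours of i that are themselves adjacent.
        t = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        coeffs[i] = Fraction(2 * t, k * (k - 1))
    return coeffs


def average_clustering(adj):
    """Average the local coefficients over all nodes."""
    coeffs = local_clustering(adj)
    return sum(coeffs.values(), Fraction(0)) / len(coeffs)


# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to node 1.
paw = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

local = local_clustering(paw)
mean = average_clustering(paw)
print(local, mean)
```

Exact fractions are used so that the results can be compared directly with hand calculations; for this toy graph the local values are $1, \frac{1}{3}, 1, 0$ and the average is $\frac{7}{12}$.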

\bigskip
\noindent\textbf{Exercise:} Compute the WS clustering coefficient (both local and global) for the three toy models in Figure \ref{fig4:ClusteringExample}.

\bigskip
\noindent\textbf{Exercise:} Compute $t_i$ from Equation (\ref{eqn:WSClusteringCoeficient}) in terms of the network adjacency matrix.

\bigskip

\noindent\textbf{Transitivity index}\\
Another way of quantifying global clustering within a network is by means of the so-called \emph{transitivity index}. Let $|C_3|$ denote the total number of triangles present within the network, and $|P_2|$ the total number of paths of length 2 in the network; note that a path of length 2 represents a potential triangle. Then we define the transitivity index as
\begin{equation}\label{eqn:GlobalClusteringCoefficient}
 T = \frac{3|C_3|}{|P_2|}.
\end{equation}
The factor of 3 arises because each triangle contains three 2-paths, i.e. paths of length 2, one centred on each of its nodes.
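
As a small illustration (a hypothetical example, not necessarily one of the networks in the figures), consider a triangle on nodes $1,2,3$ with a fourth node attached to node $2$ only. The degrees are $(k_1,k_2,k_3,k_4)=(2,3,2,1)$, and counting one 2-path for each pair of neighbours of a node gives
\[
|C_3| = 1,\qquad |P_2| = 1+3+1+0 = 5,\qquad T = \frac{3\times 1}{5} = \frac{3}{5}.
\]
For comparison, the local clustering coefficients of this network are $1,\frac{1}{3},1,0$ (assuming the convention $C(i)=0$ for a node of degree one), so that $\langle C\rangle = \frac{7}{12}$. The two global measures therefore need not agree.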

\begin{figure}[t]
\centering
 \includegraphics[scale=0.4]{Figures/NetworkExample}
 \caption{Simple network on 8 nodes.}\label{fig:SimpleNetworkExample}
\end{figure}

Let us consider the simple network given in Figure \ref{fig:SimpleNetworkExample} whose adjacency matrix is shown below.
\begin{align*}
A &= \left(
      \begin{array}{cccccccc}
      0 & 1 & 1 & 1 & 1 & 0 & 0 & 0\\
      1 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\
      1 & 1 & 0 & 1 & 0 & 0 & 1 & 0\\
      1 & 0 & 1 & 0 & 0 & 0 & 0 & 1\\
      1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
      0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
      0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
      0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
      \end{array}
      \right)
\end{align*}
We can compute the number of triangles in this network by considering the above matrix. In fact, recalling Theorem \ref{Theorem:Walks} we see that
\[
|C_3| = \frac{1}{6}\operatorname{Tr}\left(A^3\right) = 2.
\]
The number of paths of length two in the network can be obtained using the following formula (which you are asked to justify in the problem sheets)
\[
|P_2| = \sum_{i=1}^n{k_i\choose 2} = \sum_{i=1}^n\frac{k_i(k_i-1)}{2} = 18.
\]
Thus, 
\[
T = \frac{3\times 2}{18} = \frac{1}{3}.
\]
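
\noindent The calculation above can be checked numerically. The following is a rough sketch in plain Python, assuming the network consists of the nine edges listed in `edges` (nodes labelled 1 to 8), which is our reading of the example and gives a symmetric adjacency matrix. The helper `matmul` and the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Assumed edge list for the 8-node example (nodes labelled 1..8).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 6), (3, 4), (3, 7), (4, 8)]
n = 8

# Build the symmetric adjacency matrix of the undirected network.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1


def matmul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# |C_3| = Tr(A^3) / 6: each triangle gives six closed walks of length 3.
A3 = matmul(matmul(A, A), A)
num_triangles = sum(A3[i][i] for i in range(n)) // 6

# |P_2| = sum over nodes of (k_i choose 2), with k_i the row sums of A.
degrees = [sum(row) for row in A]
num_two_paths = sum(comb(k, 2) for k in degrees)

# Transitivity index T = 3 |C_3| / |P_2|, kept as an exact fraction.
T = Fraction(3 * num_triangles, num_two_paths)
print(num_triangles, num_two_paths, T)  # 2 18 1/3
```

For larger networks one would normally use a numerical library for the matrix products; the pure-Python version is used here only to keep the sketch self-contained.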

\bigskip
\noindent\textbf{Exercise:} Use Figure \ref{fig:SimpleNetworkExample} to confirm the above calculations. (Hint: the amount of work is greatly reduced by noting that several of the nodes in this network are `equivalent'.)