### CS4423 - Networks
Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

#### 3. Random Networks

# Lecture 11: Properties of Random Graphs.

What are the typical features of a random network with $n$ nodes and $m$ edges?

* When does it contain a **tree** that's not a straight line?
* When does it contain a **triangle**, or indeed a **cycle** of any length? If so, how many?
* When does it contain a small **complete** graph of a given size?
* When does it contain a **large component**, larger than all other components?
* When does the network form a single **connected component**?
* How do these properties depend on $n$ and $m$, or $p$?

In [None]:
import numpy as np
import pandas as pd
import networkx as nx
opts = { "with_labels": True, "node_color": 'y'}

## Probability Distributions

* Denote by $G_n$ the set of *all* graphs on the $n$ points $X = \{0, \dots, n{-}1\}$.

* Regard the ER models $A$ and $B$ as **probability distributions** $P \colon G_n \to \mathbb{R}$.

**Notation:**

* $N = \binom{n}{2}$, the maximal number of edges of a graph $G \in G_n$.

* $m(G)$: the number of edges of a graph $G$.

<div class="alert alert-success">
    
* $G(n,m)$:
$$
P(G) = \begin{cases}
\binom{N}{m}^{-1}, & \text{if } m(G) = m, \\
0, & \text{else.}
\end{cases}
$$

* $G(n, p)$:
$$
P(G) = p^m (1-p)^{N-m},
$$
where $m = m(G)$.
    
</div>
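
As a quick sanity check, the following sketch confirms numerically that both formulas define probability distributions on $G_n$, by grouping the graphs by their number of edges. The function names are made up for this illustration, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def total_probability_gnp(n, p):
    """Sum of P(G) = p^m (1-p)^(N-m) over all graphs on n nodes."""
    N = comb(n, 2)
    # there are comb(N, m) graphs with exactly m edges
    return sum(comb(N, m) * p**m * (1 - p)**(N - m) for m in range(N + 1))

def total_probability_gnm(n, m):
    """Sum of P(G) = 1/comb(N, m) over the comb(N, m) graphs with m edges."""
    N = comb(n, 2)
    return comb(N, m) * (1 / comb(N, m))

print(total_probability_gnp(5, 0.3))   # should be 1 up to rounding
print(total_probability_gnm(5, 4))     # should be 1 up to rounding
```

In both cases the total is $1$ (up to floating-point rounding); for $G(n, p)$ this is just the binomial theorem applied to $(p + (1-p))^N$.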

## Expected Size and Average Degree

<div class="alert alert-success">

In $G(n, m)$:

* the expected **size** is
$$
\bar{m} = m,
$$
as every graph $G$ in $G(n, m)$ has exactly $m$ edges.

* the expected **average degree** is 
$$
\langle k \rangle = \frac{2m}{n},
$$
as every graph has average degree $2m/n$.
</div>

* Other properties of $G(n, m)$ are less straightforward, and it is easier to work with
the model $G(n, p)$.

<div class="alert alert-success">
    
In $G(n, p)$, with $N = \binom{n}{2}$:

* the **expected size** is
$$
\bar{m} = pN
$$

* with **variance**
$$
\sigma_m^2 = N p (1-p);
$$

* the expected **average degree** is
$$
\langle k \rangle = p (n-1).
$$

* with **standard deviation**
$$
\sigma_k = \sqrt{p(1-p)(n-1)}
$$

</div>
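
The formulas for the expected size and its variance can be checked by a small simulation. The sketch below does not use `networkx`; it simply flips a biased coin for each of the $N$ possible edges. The parameter values and the seed are arbitrary choices for illustration.

```python
import random
from math import comb
from statistics import mean, pvariance

def gnp_size(n, p, rng):
    """Size of one G(n, p) sample: each of the N pairs is an edge with probability p."""
    N = comb(n, 2)
    return sum(rng.random() < p for _ in range(N))

rng = random.Random(4423)          # arbitrary fixed seed, for reproducibility
n, p, trials = 30, 0.2, 2000       # illustrative values
sizes = [gnp_size(n, p, rng) for _ in range(trials)]

N = comb(n, 2)
print(f"mean size   {mean(sizes):8.2f}  (theory {p * N:.2f})")
print(f"variance    {pvariance(sizes):8.2f}  (theory {N * p * (1 - p):.2f})")
print(f"avg degree  {2 * mean(sizes) / n:8.2f}  (theory {p * (n - 1):.2f})")
```

The observed values should lie close to the theoretical ones, with the agreement improving as the number of trials grows.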

* In particular, the **relative standard deviation** (or the **coefficient of variation**) of the size of
a random model $B$ graph is
$$
\frac{\sigma_m}{\bar{m}} = \sqrt{\frac{1-p}{pN}} 
= \sqrt{\frac{2(1-p)}{p n (n-1)}}
= \sqrt{\frac{2}{n \langle k \rangle} - \frac{2}{n (n-1)}}
,
$$
a quantity that converges to $0$ as $n \to \infty$ if $p (n-1) = \langle k \rangle$, the average node degree, is kept constant.

* In this sense, for large graphs, the fluctuations in the size of random graphs in model $B$ can be neglected.

* Model $B$ random graphs are used for theoretical purposes.

* Model $A$ random graphs are preferred for sampling random graphs on a computer.

* A model $A$ graph with $n$ vertices and $m$ edges corresponds to a model $B$ graph with $n$ vertices and
$p = m/N$, where $N = \binom{n}{2}$.
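
To see how quickly the relative fluctuations in size die out, one can tabulate $\sigma_m/\bar{m}$ for growing $n$ at a fixed average degree. This is a minimal sketch; the function name and the choice $\langle k \rangle = 4$ are assumptions made for the example.

```python
from math import comb, sqrt

def relative_std_of_size(n, avg_degree):
    """sigma_m / m-bar in G(n, p), with p chosen so that p (n - 1) = avg_degree."""
    p = avg_degree / (n - 1)
    N = comb(n, 2)
    return sqrt((1 - p) / (p * N))

for n in (10, 100, 1000, 10000):
    print(f"n = {n:6d}   sigma_m / m = {relative_std_of_size(n, 4):.4f}")
```

The values decrease roughly like $1/\sqrt{n}$, in line with the leading term $\sqrt{2/(n \langle k \rangle)}$ of the formula.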

## Degree distribution

<div class="alert alert-warning">

**Definition.**
The **degree distribution** $p\colon \mathbb{N}_0 \to \mathbb{R},\, k \mapsto p_k$ of a graph $G$
is defined as
$$
p_k = \frac{n_k}{n},
$$
where, for $k \geq 0$, $n_k$ is the number of nodes of degree $k$ in $G$.
</div>

This definition can be extended to ensembles of graphs with $n$ nodes (like the random graphs $G(n, m)$ and
$G(n, p)$), by setting
$$
p_k = \bar{n}_k/n,
$$
where $\bar{n}_k$ denotes the expected value of the random variable $n_k$ over the ensemble of graphs.
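
For a single graph, the definition translates directly into code. The sketch below works on a plain edge list rather than a `networkx` graph; the function name and the example graph are chosen for illustration only.

```python
from collections import Counter

def degree_distribution(n, edges):
    """Return {k: p_k} with p_k = n_k / n for a simple graph on nodes 0..n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_k = Counter(degree)          # n_k[k] = number of nodes of degree k
    return {k: n_k[k] / n for k in sorted(n_k)}

# illustrative example: the path 0 - 1 - 2 - 3
print(degree_distribution(4, [(0, 1), (1, 2), (2, 3)]))
```

For the path on $4$ nodes, the two end nodes have degree $1$ and the two inner nodes have degree $2$, so $p_1 = p_2 = \frac12$.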

<div class="alert alert-danger">

* The degree distribution in a random graph $G(n, p)$ is a **binomial distribution**:
$$
p_k = \binom{n-1}{k}p^k (1-p)^{n-1-k} = \mathrm{Bin}(n-1, p, k)
$$
    
</div>

* Thus the **average degree** of a randomly chosen node is
$$
\langle k \rangle = \sum_{k=0}^{n-1} k p_k = p (n-1)
$$
with standard deviation
$$
\sigma_k = \sqrt{p(1-p)(n-1)}
$$

* When $n$ is large, the typical $p_k$ is a product of a binomial coefficient $\binom{n-1}{k}$,
which can be very large, and the probability $p^k (1-p)^{n-1-k}$, which can be very small.
Evaluated naively in floating-point arithmetic, this can cause overflow, underflow or loss of precision.

* In the limit $n \to \infty$, with $\langle k \rangle = p (n-1)$ kept constant,
the binomial distribution $\mathrm{Bin}(n-1, p, k)$ is well approximated by the **Poisson distribution**:
$$
p_k = e^{-\lambda} \frac{\lambda^k}{k!} = \mathrm{Pois}(\lambda, k),
$$
where $\lambda = p (n-1)$.
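
The quality of the Poisson approximation can be explored numerically. In the sketch below, both distributions are evaluated via logarithms (using `math.lgamma`), which is one common way to sidestep the very large and very small factors mentioned above. The function names and the choice $\lambda = 4$ are assumptions for this example; it requires $0 < p < 1$.

```python
from math import exp, lgamma, log

def binom_pmf(n, p, k):
    """Bin(n, p, k), computed via logarithms to avoid huge or tiny intermediates."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def poisson_pmf(lam, k):
    """Pois(lam, k) = exp(-lam) lam^k / k!, also via logarithms."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def max_gap(n, lam):
    """Largest difference |Bin(n-1, p, k) - Pois(lam, k)| over k, where p = lam/(n-1)."""
    p = lam / (n - 1)
    return max(abs(binom_pmf(n - 1, p, k) - poisson_pmf(lam, k)) for k in range(n))

for n in (20, 200, 2000):
    print(f"n = {n:5d}   max |Bin - Pois| = {max_gap(n, 4.0):.5f}")
```

The maximal difference between the two distributions shrinks as $n$ grows with the average degree held fixed.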

In [None]:
from math import factorial
factorial(16)

In [None]:
# Python 3.8 and later provide binomial coefficients directly:
# from math import comb

$$\binom{n}{k} = \frac{n \cdot (n-1) \dotsm (n-k+1)}{1 \cdot 2 \dotsm k}$$

In [None]:
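def binomial(n, k):
    """Binomial coefficient n choose k, in exact integer arithmetic."""
    prod, top, bot = 1, n, 1
    for i in range(k):
        # after this step, prod == binomial(n, i + 1), so // is exact
        prod = (prod * top) // bot
        top, bot = top - 1, bot + 1
    return prod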

In [None]:
l = [binomial(16, k) for k in range(17)]
print(l)

In [None]:
df = pd.DataFrame(l)
df.plot.bar()

For $n$ much larger than $k$, [Stirling's formula](https://en.wikipedia.org/wiki/Stirling%27s_approximation)
$$
n! \sim \sqrt{2 \pi n} \left(\tfrac{n}{e}\right)^n
$$
can be used to approximate a binomial coefficient as follows:

$$
\binom{n}{k}  = \frac{n \cdot (n-1) \dots (n-k+1)}{k!}
\approx \frac{(n-k/2)^k}{k^k e^{-k} \sqrt{2 \pi k}}
= \frac{(n/k - 0.5)^k e^k}{\sqrt{2 \pi k}}
$$

In [None]:
from math import exp, sqrt, pi, log
def binom_approx(n, k):
    return (n/k - 0.5)**k * exp(k) / sqrt(2 * pi * k)

In [None]:
n = binomial(100, 2)
k = 50
print(binomial(n, k))

In [None]:
print(binom_approx(n, k))
print(binomial(n, k) / 10**120)

## Phase Transitions

* Point of view: for the random graph $G(n, p)$, suppose that $p = p(n)$ is a function of $n$, the number of nodes,
and study the ensemble of graphs $G(n, p(n))$, as $n \to \infty$.
* Typically, $p(n) = c n^e$, for real numbers $c, e$.

* Then, to say that **almost every graph has a property $Q$** means that the
probability of a graph in the ensemble to have property $Q$ tends to $1$,
as $n \to \infty$.

* Moreover, for many properties $Q$ of interest, there is a **critical probability function** $p_Q(n)$
such that a.e. graph in $G(n, p(n))$ has property $Q$ if $p(n)$ **decays slower** than $p_Q(n)$,
and a.e. graph in $G(n, p(n))$ fails to have property $Q$ if $p(n)$ **decays faster** than $p_Q(n)$.

* If $Q$ is concerned with specific subgraphs, and if $p(n) \propto p_Q(n)$, we can determine the number of
such subgraphs contained in the network.

* This is made precise by the following theorem.  (For a proof see Bollobás, *Random Graphs*, CUP 2001.)

<div class="alert alert-danger">
    
**Theorem (Appearance of Subgraphs).**
Let $F$ be a connected graph with $a$ nodes and $b$ edges.
* If $p(n)/n^{-a/b} \to 0$ then almost every graph in the ensemble $G(n, p(n))$ does not contain a copy of $F$.
* If $p(n)/n^{-a/b} \to \infty$ then almost every graph in the ensemble $G(n, p(n))$ does contain a copy of $F$.
* If $p(n) = c n^{-a/b}$ then, as $n \to \infty$, the number $n_F$ of $F$-subgraphs in $G$ has a Poisson distribution,
$P(n_F = r) = \mathrm{Pois}(\lambda, r)$, where $\lambda = c^b/|\mathrm{Aut}(F)|$,
with $|\mathrm{Aut}(F)|$ being the number of *automorphisms* of $F$.
</div>

* So the critical probability function $p_Q(n)$ for $Q$ being the appearance of a subgraph $F$ with $a$ nodes and $b$ edges is $p_Q(n) = c n^{-a/b}$.

<div class="alert alert-success">

For example:

* **Trees**  with $a$ nodes ($b = a - 1$) appear when $p(n) = c n^{-a/(a-1)}$.
* **Cycles** of order $a$ (where $b = a$) appear when $p(n) = c n^{-1}$.
* **Complete** subgraphs of order $a$ (where $b = \binom{a}{2} = \frac12 a(a-1)$) appear when $p(n) = c n^{-2/(a-1)}$.
    
</div>
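
The exponents in these examples all come from the same expression $-a/b$. The following sketch tabulates them as exact fractions for a few small values of $a$; the function name is invented for this illustration.

```python
from fractions import Fraction
from math import comb

def threshold_exponent(a, b):
    """Exponent e in p(n) = c n^e for a subgraph with a nodes and b edges."""
    return -Fraction(a, b)

for a in (3, 4, 5):
    print(f"a = {a}:",
          "tree", threshold_exponent(a, a - 1),
          "| cycle", threshold_exponent(a, a),
          "| complete", threshold_exponent(a, comb(a, 2)))
```

Note that the exponent for trees increases towards $-1$ as $a$ grows, the exponent for cycles is always $-1$, and the exponent for complete graphs increases towards $0$: denser subgraphs need a larger edge probability in order to appear.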

* To determine $n_F$:
* there are $\binom{n}{a}$ ways to pick $a$ nodes from $G$'s $n$ nodes;
* between them, $b$ links are present with probability $p^b$;
* permuting the $a$ nodes of $F$ yields $a!/|\mathrm{Aut}(F)|$ copies;
* in total, there are 
$$
n_F = \frac{n! p^b}{(n-a)! |\mathrm{Aut}(F)|} \approx \frac{n^a p^b}{|\mathrm{Aut}(F)|}
$$
copies of $F$ in $G$.
* Note how, apart from the factor $|\mathrm{Aut}(F)|$, this number depends on $F$ only through $a$ and $b$.

<div class="alert alert-success">
    
Example: Numbers of 
* triads with $a = 3$, $b = 2$ and $|\mathrm{Aut}(F)| = 2$:  $3 \binom{n}{3} p^2 = \tfrac12 n(n-1)(n-2)p^2 \approx \frac12 n^3 p^2$,
* star graphs with $a = 4$, $b = 3$ and $|\mathrm{Aut}(F)| = 6$: $4 \binom{n}{4} p^3 = \tfrac16 n(n-1)(n-2)(n-3) p^3 \approx \tfrac16 n^4 p^3$,
* triangles with $a = b = 3$ and $|\mathrm{Aut}(F)| = 6$: $\binom{n}{3} p^3 = \tfrac16 n(n-1)(n-2)p^3 \approx \frac16 n^3 p^3$.
    
</div>
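
The general formula for $n_F$ and the three examples can be cross-checked in a few lines. This is a sketch under the assumption that `math.perm` (Python 3.8 or later) is available for the falling factorial $n!/(n-a)!$; the function name and the values of $n$ and $p$ are chosen purely for illustration.

```python
from math import comb, perm

def expected_copies(n, p, a, b, aut):
    """n! p^b / ((n - a)! |Aut(F)|), for F with a nodes, b edges, aut automorphisms."""
    return perm(n, a) * p**b / aut

n, p = 100, 0.02                   # illustrative values
print("triads   ", expected_copies(n, p, 3, 2, 2), 3 * comb(n, 3) * p**2)
print("4-stars  ", expected_copies(n, p, 4, 3, 6), 4 * comb(n, 4) * p**3)
print("triangles", expected_copies(n, p, 3, 3, 6), comb(n, 3) * p**3)
```

Each line prints the value of the general formula next to the value of the corresponding closed form from the example; the two should agree up to rounding.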

* The actual **number of triads** in a graph $G$ can be determined from its **degree distribution**, as
each node of degree $k$ is the central node of exactly $\binom{k}{2}$ triads (why?)
* `networkx` does not seem to have a function for that specific purpose, but `degree` is good enough.

In [None]:
n, m = 100, 100
p = 2*m/n/(n-1)
print(f"Expect {n*(n-1)*(n-2) * p**2/2} triads")

In [None]:
G = nx.gnm_random_graph(n, m)
sum(k*(k-1)//2 for k in dict(G.degree()).values())

* The actual **number of triangles** in a graph $G$ with adjacency matrix $A$ is $\frac16$ of the trace of $A^3$ (why?)
* Also, the `networkx` function `triangles` returns a dictionary, reporting for each node the number of triangles it is involved in.

In [None]:
n, m = 100, 300
p = 2*m/n/(n-1)
print(f"Expect {n*(n-1)*(n-2) * p**3/6} triangles")

In [None]:
G = nx.gnm_random_graph(n, m)
sum(nx.triangles(G).values())//3

In [None]:
# use the matrix product @ here: on SciPy sparse arrays, ** acts elementwise
A = nx.adjacency_matrix(G)
np.trace((A @ A @ A).toarray())//6

<div class="alert alert-danger">

* Moreover, $p(n) = \frac1n \ln n$ is the threshold probability for $G$ to be connected.
(This corresponds to $m = \frac12 n \ln n$ in model $A$.)
    
</div>
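
The connectivity threshold can be observed experimentally. The sketch below samples $G(n, p)$ with $p$ a multiple of $\frac1n \ln n$ and tracks the connected components with a simple union-find structure. It uses only the standard library; the function names, the seed, the graph order and the number of trials are all arbitrary choices for this illustration.

```python
import random
from math import log

def is_connected_gnp(n, p, rng):
    """Sample G(n, p) and report whether it is connected (via union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:            # edge {u, v} is present
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return components == 1

def connected_fraction(n, factor, trials, rng):
    """Fraction of connected samples of G(n, p) with p = factor * ln(n) / n."""
    p = factor * log(n) / n
    return sum(is_connected_gnp(n, p, rng) for _ in range(trials)) / trials

rng = random.Random(11)                     # arbitrary fixed seed
for factor in (0.5, 1.0, 2.0):
    print(f"p = {factor} ln(n)/n:  connected fraction {connected_fraction(200, factor, 20, rng)}")
```

Well below the threshold almost no sample should be connected, and well above it almost every sample should be; near the threshold itself the outcome is mixed.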

##  The Giant Connected Component

<div class="alert alert-warning">
    
**Definition (Giant Component).**
A connected component of a graph $G$ is called a **giant component**
if its number of nodes increases with the order $n$ of $G$ as
some positive power of $n$.
</div>

Suppose $p(n) = cn^{-1}$ for some positive constant $c$.  (Then the average degree $\langle k \rangle = p(n-1) \approx c$ remains essentially fixed as $n \to \infty$.)

<div class="alert alert-danger">

**Theorem (Erdős-Rényi)**.
* If $c < 1$ the graph contains many small components, with orders bounded by $O(\ln n)$.
* If $c = 1$ the graph has large components of order $O(n^{2/3})$.
* If $c > 1$ there is a unique **giant component** of order $O(n)$.
</div>

* In practice, in a given graph $G$ of order $n$, a giant component appears when the average degree is $1$,
i.e., when $m = \frac12 n$, and it then has order about $n^{2/3}$ (e.g., $100^{2/3} \approx 21.5$).
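
The emergence of the giant component can be watched in a small experiment. The sketch below samples model $A$ graphs with $m = \frac12 c n$ edges, so that the average degree is $c$, and measures the largest component with a union-find structure. It uses only the standard library; the function name, the seed and the order $n = 2000$ are assumptions made for the example.

```python
import random
from collections import Counter

def largest_component_gnm(n, m, rng):
    """Order of the largest component of a random graph with n nodes, m distinct edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = set()
    while len(edges) < m:                   # rejection sampling of m distinct edges
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    for u, v in edges:
        parent[find(u)] = find(v)           # merge the two components
    return max(Counter(find(x) for x in range(n)).values())

rng = random.Random(2024)                   # arbitrary fixed seed
n = 2000
for c in (0.5, 1.0, 2.0):
    m = round(c * n / 2)
    size = largest_component_gnm(n, m, rng)
    print(f"c = {c}:  largest component has {size} of {n} nodes")
```

For $c = 0.5$ the largest component should stay tiny compared to $n$, while for $c = 2$ it should contain a large fraction of all nodes; for $c = 1$ its order can be compared with $n^{2/3} \approx 159$.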

## Code Corner

### `networkx`

* `triangles`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.triangles.html)

##  Exercises

1. Design an experiment with random graphs of suitable degree and size to verify the predicted numbers of triads above.

1. Show that a node of degree $k$ is the central node of exactly $\binom{k}{2}$ triads.

1. Design an experiment with random graphs of suitable degree and size to verify the predicted numbers of triangles above.

1.  Show that the number $n_{\Delta}$ of triangles in a graph $G$ with adjacency matrix $A$ is
$n_{\Delta} = \frac16 \mathrm{tr}(A^3)$, where $\mathrm{tr}(B) = \sum_i b_{ii}$ denotes the **trace**
of a square matrix $B$, i.e., the sum of its diagonal entries.

1. Design an experiment with random graphs of suitable degree and size to verify the predicted numbers of star graphs on $4$ nodes above.