# Erd&ouml;s-R&eacute;nyi (ER) Random Networks

We will start our description with the simplest random network model. Consider a social network of $50$ students. Our network will have $50$ nodes, where each node represents a single student. An edge in the social network indicates that a pair of students are friends. What is the simplest way we could describe whether two people are friends?

Here we have a yes-or-no question: is a pair of people friends, or not? The simplest possible thing to do is to say that, for any two students in our network, there is some probability (which we will call $p$) that describes how likely they are to be friends. In the example below, for the sake of argument, we will let $p=0.3$. What does a realization from this network look like? Below is an example of a realization of a network with $50$ nodes and $p = 0.3$:

In [None]:
from graphbook_code import draw_multiplot
from graspologic.simulations import er_np

n = 50  # network with 50 nodes
p = 0.3  # probability of an edge existing is .3

# sample a single simple adjacency matrix from ER(50, .3)
A = er_np(n=n, p=p, directed=False, loops=False)

draw_multiplot(A, title="ER(0.3) Simulation");

With this example in hand, let's get down to business. This simple random network model is called the Erd&ouml;s-R&eacute;nyi (ER) model<sup>1</sup>. In an ER random network, the edges depend *only* on a single probability, $p$, and each edge is independent of every other edge. We can think of this as a coin flip, where the coin lands on heads with probability $p$ and on tails with probability $1-p$. For each possible edge in the network, we conceptually flip the coin: if it lands on heads, the edge exists, and if it lands on tails, the edge does not exist. If $\mathbf A$ is a random network with $n$ nodes in which each edge exists independently with probability $p$, we say that $\mathbf A$ is an $ER_n(p)$ random network.

This way of describing random networks is called a *generative model*, which means that we describe an observable network realization $A$ of the random network $\mathbf A$ in terms of the parameters of $\mathbf A$. In the case of $ER_n(p)$ random networks, we have described $\mathbf A$ in terms of the probability parameter, $p$. Generative models are convenient because they tell us exactly how to simulate realizations of the underlying random network. The procedure below produces a network $A$, with nodes and edges, whose underlying random network $\mathbf A$ is an $ER_n(p)$ random network:

```{admonition} Simulating a realization from an $ER_n(p)$ network
1. Determine a probability, $p$, of an edge existing.
2. Obtain a weighted coin which has a probability $p$ of landing on heads, and a probability $1 - p$ of landing on tails. Note that this probability $p$ might differ from the "traditional" coin with a probability of landing on heads of approximately $0.5$.
3. Flip the coin once for each *possible* edge $(i, j)$ between nodes $i$ and $j$ in the network. For a simple network with $n$ nodes, this means flipping the coin $\binom n 2$ times.
4. For each coin flip which landed on heads, define that the corresponding edge exists, and define that the corresponding entry $a_{ij}$ in the adjacency matrix is $1$. For each coin flip which lands on tails, define that the corresponding edge does not exist, and define that $a_{ij} = 0$.
5. The adjacency matrix we produce, $A$, is a realization of an $ER_n(p)$ random network.
```
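
The steps above can be sketched in plain Python. This is a minimal illustration and not how `er_np` is implemented: the function name `sample_er` and its `seed` argument are our own assumptions, and the standard library's `random` module plays the role of the weighted coin.

```python
import random


def sample_er(n, p, seed=None):
    """Sample a simple, undirected, loopless adjacency matrix from ER_n(p).

    Hypothetical helper for illustration; returns a list of lists of 0s and 1s.
    """
    rng = random.Random(seed)
    # start with no edges; the diagonal stays 0 because loops are not allowed
    A = [[0] * n for _ in range(n)]
    # visit each possible edge (i, j) with i < j exactly once: n-choose-2 flips
    for i in range(n):
        for j in range(i + 1, n):
            # flip the weighted coin: "heads" happens with probability p
            if rng.random() < p:
                # heads: the edge exists; set both entries so A stays symmetric
                A[i][j] = 1
                A[j][i] = 1
    return A


A = sample_er(n=50, p=0.3, seed=0)
num_edges = sum(A[i][j] for i in range(50) for j in range(i + 1, 50))
print(f"{num_edges} of {50 * 49 // 2} possible edges exist")
```

Because the loop only visits pairs with $i < j$, each possible edge gets exactly one coin flip, and copying the result to both $a_{ij}$ and $a_{ji}$ keeps the adjacency matrix symmetric.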

## When do we use an $ER_n(p)$ Network?

In practice, the $ER_n(p)$ model seems like it might be a little too simple to be useful. Why would it ever be useful to think that the best we can do to describe our network is to say that connections exist with some probability? Doesn't this miss a *lot* of useful questions we might want to answer? Fortunately, there are a number of ways in which the simplicity of the $ER_n(p)$ model is useful. Given a probability and a number of nodes, we can easily describe the properties we would expect to see in a network if that network were ER. For instance, we know how many edges, on average, each node of an $ER_n(p)$ random network should have. We can reverse this idea, too: given a network we think might *not* be ER, we could check whether it differs in some way from an $ER_n(p)$ random network. For instance, if we see that half the nodes have a ton of edges (meaning, they have a high degree), and half don't, we might be able to determine that the network is poorly described by an $ER_n(p)$ random network. If this is the case, we might look for other, more complex models that could describe our network.
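
To make the degree example concrete: each node in a simple network has $n - 1$ possible edges, and each exists with probability $p$, so the average degree we would expect is $(n - 1)p$. The sketch below checks this against one simulated realization. It uses only the Python standard library, and the variable names and the seed are our own choices for illustration.

```python
import random
from statistics import mean

n, p = 50, 0.3
rng = random.Random(1)  # fixed seed (arbitrary) so the sketch is repeatable

# sample a simple undirected network: one weighted coin flip per pair i < j
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = A[j][i] = int(rng.random() < p)

# the degree of a node is the number of edges incident to it (its row sum)
degrees = [sum(row) for row in A]

# each node has n - 1 possible edges, each present with probability p
expected_degree = (n - 1) * p
print(f"expected degree: {expected_degree:.2f}")
print(f"observed mean degree: {mean(degrees):.2f}")
print(f"observed degree range: {min(degrees)} to {max(degrees)}")
```

If the observed degrees of a real network sat far from this expected value, for example with half the nodes well above it and half well below, that would be a hint that an $ER_n(p)$ description fits poorly.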

<!-- Another utility of the $ER_n(p)$ model is that we might often want to benchmark network algorithms on simulated networks with a given *sparsity*. **Network sparsity** is a feature of a network which describes the degree to which the network possesses fewer edges than the maximum number of possible edges. As an example, when we know ahead of time that the network is going to be sparse (the network has a *small* number of edges which exist relative the number of possible edges), we can use network machine learning techniques which anticipate this sparsity to make the algorithm faster. In a simple network, for instance, the maximum number of possible edges is $\binom n 2$. In an $ER_n(p)$ network with probability $p$, we would expect the network to have on average about $p \binom n 2$ edges; that is, $p$ describes the fraction of total possible edges that we would expect to exist. $ER_n(p)$ networks are extremely cheap to simulate computationally, because "flipping weighted coins" (if you are curious, this is called a *Bernoulli sample* with probability $p$) is usually able to be performed with extremely optimized code in most standard programming languages such as python. Being able to generate networks very easily with a given number of nodes $n$ and a given sparsity allows us to test just how efficient our network machine learning technique is.
-->

In the next code block, we are going to sample a single $ER_n(p)$ network with $50$ nodes and an edge probability $p$ of $0.3$:

In [None]:
from graphbook_code import draw_multiplot
from graspologic.simulations import er_np

n = 50  # network with 50 nodes
p = 0.3  # probability of an edge existing is .3

# sample a single simple adjacency matrix from ER(50, .3)
A = er_np(n=n, p=p, directed=False, loops=False)

# and plot it
draw_multiplot(A, title="$ER_{50}(0.3)$ Simulation", xticklabels=10, yticklabels=10);

Above, we visualize the network using a heatmap. The dark squares indicate that an edge exists between a pair of nodes, and white squares indicate that an edge does not exist between a pair of nodes.

Next, let's see what happens when we use a higher edge probability, like $p=0.7$:

In [None]:
p = 0.7  # network has an edge probability of 0.7

# sample a single adjacency matrix from ER(50, 0.7)
A = er_np(n=n, p=p, directed=False, loops=False)

# and plot it
draw_multiplot(A, title="$ER_{50}(0.7)$ Simulation", xticklabels=10, yticklabels=10);

As the edge probability increases, the sampled adjacency matrix tends to contain more edges, because each possible edge has a higher chance of existing when $p$ is larger.
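
We can put rough numbers on this comparison. Since every possible edge is an independent coin flip, the number of edges in a realization can be counted by flipping $\binom n 2$ coins, without building the adjacency matrix at all. The sketch below does this for both edge probabilities; it is a standard-library illustration with names and a seed of our own choosing, not part of `graspologic`.

```python
import random
from math import comb

n = 50
possible_edges = comb(n, 2)  # number of possible edges in a simple network
rng = random.Random(2)  # fixed seed (arbitrary) so the sketch is repeatable

densities = {}
for p in (0.3, 0.7):
    # count the heads across one weighted coin flip per possible edge
    num_edges = sum(rng.random() < p for _ in range(possible_edges))
    # fraction of possible edges that exist in this realization
    densities[p] = num_edges / possible_edges
    print(f"p = {p}: {num_edges} edges, density {densities[p]:.3f}")
```

The fraction of possible edges that exist should land close to $p$ in each case, which is why the heatmap for $p = 0.7$ looks so much darker than the one for $p = 0.3$.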

## References

[1] Erd&ouml;s P, R&eacute;nyi A. 1959. "On random graphs, I." Publ. Math. Debrecen 6:290–297.
