

# Configuration models

**Configuration models** are models of random graphs with a given degree sequence.

From a mathematical perspective, the idea of a configuration model is elegant and flexible. However, sampling from these models can be challenging and subtle. There are two primary strategies for thinking about configuration models.

## Types of configuration models

Similar to $G(n,p)$ models, we can understand a configuration model by understanding what is fixed (and what is random). 

One way to set up a configuration model is to fix a degree sequence $\{k_1, \dots, k_n\}$. Note that this also fixes the number of edges:
$$
    m = \frac{1}{2} \sum_j k_j \,.
$$
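
As a quick worked check, take the degree sequence $\{3, 2, 2, 1\}$ on $n = 4$ nodes (values chosen here purely for illustration):
$$
    m = \frac{1}{2} \left( 3 + 2 + 2 + 1 \right) = 4 \,.
$$
By contrast, a sequence such as $\{3, 2, 2, 2\}$ sums to $9$, so no graph can realize it: every edge contributes exactly two to the sum of the degrees.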

::: {.callout-note icon=false appearance="minimal"}
::: {#def-config-model-sequence}

Consider a graph with $n$ nodes. Let ${\bf k} \in \mathbb{N}^n$ be the degree sequence, where $k_i$ is the degree of node $i$.

A **configuration model** is a uniform distribution over graphs with degree sequence ${\bf k}.$

:::
:::

The classic algorithmic strategy for sampling from such a configuration model is *stub-matching*. Recall that each edge has two ends, or *stubs*. This means that node $i$ has $k_i$ stubs.

1. Choose two stubs uniformly at random and connect these stubs to form an edge.
2. Choose another pair from the remaining unmatched stubs; repeat the process until all stubs are matched.
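
The two steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `stub_matching` and its `seed` argument are hypothetical, and the sketch assumes that shuffling the list of stubs and pairing consecutive entries is an acceptable stand-in for repeatedly drawing pairs of unmatched stubs (both give a uniformly random matching of the stubs).

```python
import random


def stub_matching(degrees, seed=None):
    """Return a list of edges (i, j) produced by uniformly random stub matching.

    degrees[i] is the degree k_i of node i. The result may contain
    self-loops (i, i) and repeated pairs (multiedges).
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # Node i appears k_i times in this list: one entry per stub.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    # Shuffle the stubs, then pair consecutive entries to form edges.
    rng.shuffle(stubs)
    return [(stubs[t], stubs[t + 1]) for t in range(0, len(stubs), 2)]


edges = stub_matching([3, 2, 2, 1], seed=0)
print(edges)  # four edges; self-loops and multiedges are possible
```

Every node appears as an endpoint exactly $k_i$ times in the returned edge list, where a self-loop counts twice toward the degree of its node.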

The stub-matching algorithm can produce multiedges and self-loops, so the resulting graph need not be simple. However, @bollobas1980probabilistic proved that, when the graph is sparse, the expected number of multiedges and self-loops does not grow with network size. As a result, one can show that these structures are rare in large networks, and they can often be ignored in arguments. Another challenge is that the algorithm requires an even number of stubs to produce a graph that exactly matches the degree sequence, although this should not be a problem when the degree sequence comes from an empirical network.

@fosdick2018configuring discuss the subtleties that arise due to whether we allow or disallow multiedges and self-loops and whether we choose to label stubs distinctly (as opposed to only labeling vertices). These choices result in different spaces of graphs from which we are sampling. 

### Fixing a degree distribution

When making mathematical arguments, you may want to sample from a particular degree distribution $p_k$ instead of sampling graphs with a particular degree sequence ${\bf k}$.

We can do this using a slight modification to the strategy described above. 

1. Draw a degree sequence $\{k_i\}$ from the given distribution $p_k.$ 
    - In practice, this is usually done with $n$ independent draws from $p_k$. A particular degree sequence then appears with probability $\prod_i p_{k_i}.$
2. Construct a graph with this degree sequence by proceeding via stub matching as described above.

Once again, we can run into some challenges here. With the algorithm described above, it is quite possible to generate a degree sequence with an odd number of stubs, and such degree sequences would need to be discarded. The concerns about self-loops, multiedges, and labeling still apply.
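
A minimal sketch of the first step, including the discard rule for odd stub counts, might look like the following. The function name `sample_degree_sequence`, its arguments, and the `max_tries` cap are hypothetical choices made for illustration, and the example distribution is made up. Note that discarding odd-sum sequences means the returned sequence is drawn from $p_k$ conditioned on an even sum, not from fully independent draws.

```python
import random


def sample_degree_sequence(n, degree_values, probabilities, seed=None, max_tries=1000):
    """Draw n degrees independently from p_k, retrying until the sum is even.

    degree_values[r] is a possible degree and probabilities[r] is its
    probability under p_k.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        # n independent draws from the degree distribution.
        degrees = rng.choices(degree_values, weights=probabilities, k=n)
        # An odd total would leave one stub unmatched, so discard and redraw.
        if sum(degrees) % 2 == 0:
            return degrees
    raise RuntimeError("no even-sum degree sequence found within max_tries")


# Illustrative distribution: p_1 = 0.5, p_2 = 0.3, p_3 = 0.2.
degrees = sample_degree_sequence(50, [1, 2, 3], [0.5, 0.3, 0.2], seed=1)
print(sum(degrees) % 2)  # 0
```

The resulting sequence can then be passed to any stub-matching routine to complete the second step.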

While the two models we describe here are different, we expect them to behave similarly in the large-$n$ limit, where a sequence drawn from a degree distribution more accurately captures the underlying distribution.

There are some important special cases that we've already encountered:

- Using a Poisson degree distribution nearly recovers the $G(n,p)$ model, except that this configuration model variant can generate self-loops and multiedges.
- Using a power-law degree distribution helps us mathematically study the properties of scale-free networks, like those generated by preferential attachment.

### Fixing a degree sequence in expectation (Chung--Lu model)

Some of the challenges with stub-matching arise from the requirement that we generate a graph with a specified degree sequence ${\bf k}.$ @chung2002connected relax this constraint and generate networks whose degree sequences are ${\bf k}$ in expectation, which avoids some of these issues.

Consider two nodes $i$ and $j$ with degrees $k_i$ and $k_j$, respectively. From the perspective of a stub of node $i$, the total number of stubs it could connect to is $2m-k_i$ (if we exclude the possibility of self-loops). Thus,
$$
    \mathbb{P}(\text{node $i$ stub connects to node $j$}) = \frac{k_j}{2m-k_i}.
$$

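Assuming that our network is large and sparse, we expect the argument above to hold independently for each stub of $i$. Defining $\mathbb{P}_{ij}$ as the probability that node $i$ is connected to node $j$ (strictly, the expected number of edges between them, which is close to the connection probability when it is small), we have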

$$
    \mathbb{P}_{ij} = k_i \left( \frac{k_j}{2m-k_i} \right) \approx \frac{k_ik_j}{2m}\,.
$$

We might notice that this feels a little similar to the $G(n,p)$ model, and in fact, this is by construction. Under the assumptions given above, the degree of node $i$ will be approximately Poisson distributed with mean $k_i$. [Another assumption required here is that the maximum degree is not too large.]{.aside} This means that, instead of stub-matching, we can use an algorithm like the one for $G(n,p)$: place an edge between each pair of nodes $i$ and $j$ independently with probability $\mathbb{P}_{ij}$.
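
The edge-placement procedure described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive Chung--Lu implementation: the function name `chung_lu` is hypothetical, the sketch uses the approximation $\mathbb{P}_{ij} \approx k_i k_j / 2m$, caps the probability at 1 (which only matters when some degrees are large), and considers only pairs with $i < j$, so it never produces self-loops or multiedges.

```python
import random
from itertools import combinations


def chung_lu(expected_degrees, seed=None):
    """Return edges (i, j), i < j, each placed independently with
    probability min(1, k_i * k_j / (2m)), where 2m is the sum of the
    expected degrees."""
    rng = random.Random(seed)
    two_m = sum(expected_degrees)
    if two_m == 0:
        return []  # no stubs, so no edges
    edges = []
    for i, j in combinations(range(len(expected_degrees)), 2):
        # Assumption: cap at 1 so the value is a valid probability.
        p = min(1.0, expected_degrees[i] * expected_degrees[j] / two_m)
        if rng.random() < p:
            edges.append((i, j))
    return edges


edges = chung_lu([3, 2, 2, 1, 1, 1], seed=0)
print(edges)
```

Because this sketch skips the pair $(i, i)$, the expected degree of node $i$ is $k_i (2m - k_i) / 2m$ when no probability is capped, which is slightly below $k_i$ and approaches it when $k_i \ll 2m$. Looping over all pairs takes $O(n^2)$ time, so this form is suited only to small networks.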

## References
