

# Probability Generating Functions

In this lecture we'll introduce a powerful tool for calculating network properties: probability generating functions.

## Properties of probability generating functions

::: {.callout-note icon=false appearance="minimal"}
::: {#def-pgf}

Let $K$ be the random variable denoting node degree, with degree distribution $p_k$.  The **probability generating function** $g(z)$ is given by

$$
    g(z) = p_0 + p_1z + p_2z^2 + \dots = \sum_{k=0}^\infty p_kz^k \,.
$$

:::
:::

We note that the degree distribution and the probability generating function give two different mathematical representations of the same information. The probability generating function is a **power series** representation of the degree distribution of $K$. This is quite useful because there is a lot of rich theory for power series that we can leverage. Since the coefficients $p_k$ are nonnegative and sum to 1, the series converges for all $|z| \leq 1$. When the maximum degree is finite, as in any finite network, $g(z)$ is a polynomial.
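To make the definition concrete, here is a minimal sketch that builds $g(z)$ from a small degree distribution. The distribution `p` and the helper name `pgf` are made up for illustration. The last line uses the fact, applied later in this lecture, that $g'(1)$ is the mean degree.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical degree distribution (illustration only): p[k] = P(K = k).
p = np.array([0.1, 0.3, 0.4, 0.2])

def pgf(p, z):
    """Evaluate g(z) = sum_k p[k] * z**k directly from the definition."""
    k = np.arange(len(p))
    return float(np.sum(p * z**k))

# Polynomial stores coefficients in increasing order of power,
# so Polynomial(p) represents g(z) itself.
g = Polynomial(p)

g_at_0 = pgf(p, 0.0)           # equals p_0
g_at_1 = pgf(p, 1.0)           # the probabilities sum to 1
mean_degree = g.deriv()(1.0)   # g'(1) = sum_k k * p[k]
```

Because the maximum degree here is finite, $g(z)$ is a polynomial and its coefficients are exactly the entries of `p`.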

::: {.callout-caution}
**Exercise**: Suppose we have a network where we observe the following data.

| Node Degree | Percentage of Nodes with Given Degree|
|  |  also called second neighbors --- of a node. 

What is the probability that a node has exactly $k$ second neighbors in our configuration model? It depends on how many neighbors the node has! The probability that a node has $m$ neighbors and exactly $k$ second neighbors is
$$
    \mathbb{P}(m \text{ neighbors})\mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors}) \,.
$$

Summing over all possible values of $m$ gives us our expression:
$$
     \mathbb{P}(k \text{ second neighbors}) = \sum_{m=0}^\infty \mathbb{P}(m \text{ neighbors})\mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors}) \,.
$$

This calculation could be very hard! Let's see how generating functions can help us perform it. We define $g_2(z)$ to be the generating function for $\mathbb{P}(k \text{ second neighbors})$. Writing $\mathbb{P}(m \text{ neighbors}) = p_m$, substituting the sum above, and then swapping the order of summation, we obtain

\begin{align}
    g_2(z) &= \sum_{k=0}^\infty \mathbb{P}(k \text{ second neighbors}) z^k \\
    &= \sum_{k=0}^\infty \sum_{m=0}^\infty p_m \mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors}) z^k \\
    & = \sum_{m=0}^\infty p_m \sum_{k=0}^\infty \mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors}) z^k \,.
\end{align}

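Notice that the quantity $\mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors})$ is related to the excess degree: given $m$ neighbors, the number of second neighbors is the sum of the excess degrees of those $m$ neighbors. Each excess degree has distribution $q_k$ with generating function $g_J(z)$, and in the configuration model we treat the $m$ excess degrees as independent. Using the multiplicative property we proved above, we know that this sum has the generating function $(g_J(z))^m$: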

$$
     \sum_{k=0}^\infty \mathbb{P}(k \text{ second neighbors} \vert m \text{ neighbors}) z^k = \prod_{i=1}^m\sum_{k=0}^\infty q_kz^k = (g_J(z))^m\,.
$$
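As a quick numerical check of this step, the sketch below compares two computations for a made-up excess degree distribution `q` and a fixed `m`: the distribution of a sum of `m` independent excess degrees (a repeated convolution) and the coefficients of $(g_J(z))^m$ (a polynomial power). The values of `q` and `m` are illustrative assumptions, not data from the lecture.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical excess degree distribution: q[k] = P(excess degree = k).
q = np.array([0.2, 0.5, 0.3])
m = 3  # number of neighbors we condition on

# Distribution of a sum of m independent excess degrees:
# start from "sum of zero terms is 0 with probability 1"
# and convolve with q once per neighbor.
dist = np.array([1.0])
for _ in range(m):
    dist = np.convolve(dist, q)

# Coefficients of (g_J(z))^m, in increasing order of power.
coeffs = (Polynomial(q) ** m).coef

same = np.allclose(dist, coeffs)
```

The two arrays agree because multiplying polynomials convolves their coefficients, which is the multiplicative property in computational form.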

Substituting this expression into our calculation for $g_2(z)$ yields
$$
g_2(z) = \sum_{m=0}^\infty p_m (g_J(z))^m = g_K(g_J(z)) \,.
$$

That is, the generating function for the distribution of second neighbors can be calculated from the generating functions of the degree and excess degree distribution!
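The composition can be checked numerically. The sketch below uses a made-up degree distribution `p` and assumes the standard excess degree distribution $q_k = (k+1)p_{k+1}/\langle k \rangle$, which this section relies on but does not restate. It builds $g_2(z) = \sum_m p_m (g_J(z))^m$ as a polynomial and compares its coefficients with the second-neighbor distribution obtained from the sum over $m$ directly.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical degree distribution (illustration only): p[k] = P(K = k).
p = np.array([0.1, 0.3, 0.4, 0.2])
k = np.arange(len(p))
mean_k = np.sum(k * p)

# Assumed excess degree distribution: q[j] = (j + 1) * p[j + 1] / <k>.
q = k[1:] * p[1:] / mean_k

# g_2(z) = sum_m p_m * (g_J(z))^m, built term by term.
g_J = Polynomial(q)
g_2 = Polynomial([0.0])
for m, p_m in enumerate(p):
    g_2 = g_2 + float(p_m) * g_J**m

# Direct route: sum over m of p_m times the m-fold convolution of q.
max_second = (len(p) - 1) * (len(q) - 1)
direct = np.zeros(max_second + 1)
conv = np.array([1.0])  # distribution of an empty sum
for p_m in p:
    direct[: len(conv)] += p_m * conv
    conv = np.convolve(conv, q)

second_neighbor_dist = g_2.coef
mean_second = g_2.deriv()(1.0)
```

Reading off `g_2.coef` gives the full distribution of the number of second neighbors, not just its mean.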

We can now find the expected number of second neighbors by calculating $g_2'(1)$. By the chain rule, $g_2'(z) = g_K'(g_J(z))\,g_J'(z)$, so evaluating at $z = 1$ gives

$$
    g_2'(1) = g_K'(g_J(1))\,g_J'(1) \,.
$$

However, we know that $g_J(1) = \sum_{k=0}^\infty q_k = 1$, because the excess degree probabilities sum to 1 (this is the zeroth moment of the distribution). Thus,
$$
    \mathbb{E}(\text{second neighbors}) = g_K'(1)g_J'(1) = (\text{mean degree})(\text{mean excess degree})\,.
$$

Referring to our previous calculations for these quantities, we have
$$
    \mathbb{E}(\text{second neighbors}) = \langle k \rangle \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} = \langle k^2 \rangle - \langle k \rangle \,.
$$
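A rough way to sanity-check this formula is by sampling. The sketch below is not a simulation of a full configuration model network. It only mimics the derivation above: draw a degree from a made-up distribution `p`, then draw that many independent excess degrees from the assumed distribution $q_k = (k+1)p_{k+1}/\langle k \rangle$ and add them up. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical degree distribution (illustration only): p[k] = P(K = k).
p = np.array([0.1, 0.3, 0.4, 0.2])
k = np.arange(len(p))
mean_k = np.sum(k * p)
mean_k2 = np.sum(k**2 * p)

# Assumed excess degree distribution: q[j] = (j + 1) * p[j + 1] / <k>.
q = k[1:] * p[1:] / mean_k

# Prediction from the formula: <k^2> - <k>.
predicted = mean_k2 - mean_k

# Sample a degree for each node, then one excess degree per neighbor.
n_samples = 200_000
degrees = rng.choice(len(p), size=n_samples, p=p)
excess = rng.choice(len(q), size=degrees.sum(), p=q)

# owner[i] is the node that the i-th excess-degree draw belongs to;
# bincount adds up the draws for each node (zero for degree-0 nodes).
owner = np.repeat(np.arange(n_samples), degrees)
second = np.bincount(owner, weights=excess, minlength=n_samples)

estimate = second.mean()
```

With this many samples, `estimate` should land close to `predicted`, up to sampling noise.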

## References
