We would like to determine for a given $M$, $N$, $q$, and $L$, how many (on average) sets of $L$ non-overlapping associations can be stored and recalled in a network. We will normalize by the number $N^*_L$ of possible sets of $L$ non-overlapping associations.

This normalized "capacity" is given by

$$E[C(M, N, q, L)] = E_W\left[\cfrac{1}{N^*_L}\sum\limits_{s_L} \mathbb{1}\left[s_L \textrm{ is recallable}|M, N, q\right]\right] = \cfrac{1}{N^*_L}\sum\limits_{s_L} E_W\left[\mathbb{1}\left[s_L \textrm{ is recallable}\right]\right]$$

$$= \cfrac{N^*_L}{N^*_L}p(s_L \textrm{ is recallable}|M, N, q) = p(s_L \textrm{ is recallable}|M, N, q)$$

where the last step holds because the probability that a set $s_L$ of associations is recallable does not depend on which items it contains: the randomness is homogeneous throughout the network, so all $N^*_L$ terms in the sum are equal.

Without loss of generality, we can therefore assume that $s_L = \{(1, 2), (3, 4), ..., (2L-1, 2L)\}$. Then, letting $V_j$ denote the set of association units connecting to item unit $j$, we can expand

$$p(s_L \textrm{ is recallable}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}}p(V_1, ..., V_{2L})p(s_L \textrm{ is recallable}|M, N, q, V_1, ..., V_{2L}).$$

Next, let $A$ be the set of association/memory units moved to their hyperexcitable state after storage of the associations. This is given by 

$$A = \bigcup\limits_{i = 1}^L \left(V_{2i - 1}\bigcap V_{2i}\right).$$
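As a minimal sketch of this construction, the snippet below samples the sets $V_j$ and builds $A$ from them. It assumes that each association unit connects to each item unit independently with probability $q$, and that there are `n_assoc` association units in total; the function names and the set-based representation are illustrative choices, not part of the original derivation.

```python
import random

def sample_connections(n_items, n_assoc, q, rng):
    """Return a list of sets; entry j holds the association units wired to item j.

    Assumption: each (item, association unit) connection exists
    independently with probability q.
    """
    return [{k for k in range(n_assoc) if rng.random() < q}
            for _ in range(n_items)]

def hyperexcitable_units(V, L):
    """Union over the L stored pairs of the units shared by both items of a pair.

    Pairs (1, 2), (3, 4), ... in the text are stored 0-based as
    (V[0], V[1]), (V[2], V[3]), ...
    """
    A = set()
    for i in range(L):
        A |= V[2 * i] & V[2 * i + 1]
    return A

rng = random.Random(0)
V = sample_connections(n_items=6, n_assoc=50, q=0.3, rng=rng)
A = hyperexcitable_units(V, L=3)
```

Every unit in `A` lies in the overlap of at least one stored pair, so `A` can only grow as more associations are stored.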

Then, we can state that the $i$-th pair in $s_L$ is recallable if:

$$\left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i - 1} \bigcap A \bigcap V_j \right\vert \textrm{ and } \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i} \bigcap A \bigcap V_j \right\vert \textrm{ for all } j \neq 2i-1, 2i.$$

Since, given $V_1, ..., V_{2L}$, this is deterministic for $j \leq 2L$ and probabilistic for $j > 2L$, we can write

$$p(s_L \textrm{ is recallable}|M, N, q, V_1, ..., V_{2L}) = f(V_1, ..., V_{2L})g(M, N, q, V_1, ..., V_{2L})$$

where

$$f(V_1, ..., V_{2L}) = \prod\limits_{i = 1}^L \mathbb{1}\left[ \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i - 1} \bigcap A \bigcap V_j \right\vert \textrm{ and } \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i} \bigcap A \bigcap V_j \right\vert \forall j \leq 2L, \neq 2i-1, 2i\right]$$

and is deterministic, and

$$g(M, N, q, V_1, ..., V_{2L}) = p\left( \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i - 1} \bigcap A \bigcap V_j \right\vert \textrm{ and } \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i} \bigcap A \bigcap V_j \right\vert \forall j > 2L, i \leq L \right).$$

However, since all the $V_j$ are sampled i.i.d. for $j > 2L$, we can simplify to

$$g(M, N, q, V_1, ..., V_{2L}) = h(N, q, V_1, ..., V_{2L})^{M - 2L}$$

where, for a single $j > 2L$,

$$h(N, q, V_1, ..., V_{2L}) = p\left( \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i - 1} \bigcap A \bigcap V_j \right\vert \textrm{ and } \left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i} \bigcap A \bigcap V_j \right\vert \forall i \leq L \right).$$

Therefore, we have that

$$p(s_L \textrm{ is recallable}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}}p(V_1, ..., V_{2L})f(V_1, ..., V_{2L})h(N, q, V_1, ..., V_{2L})^{M - 2L}.$$

If $f(V_1, ..., V_{2L})h(N, q, V_1, ..., V_{2L})^{M - 2L}$ can be computed exactly for any given $V_1, ..., V_{2L}$, then we can approximate $p(s_L \textrm{ is recallable}|M, N, q)$ via standard Monte Carlo estimation, since it is easy to sample from $p(V_1, ..., V_{2L})$. However, while $f(V_1, ..., V_{2L})$ is computable via a small matrix multiplication, $h(N, q, V_1, ..., V_{2L})$ is not so simple, since it is equal to the following joint probability:

$$\begin{aligned}
h(N, q, V_1, ..., V_{2L}) = p\Big( & \left\vert V_1 \bigcap V_2\right\vert > \left\vert V_1 \bigcap A \bigcap V_j \right\vert, \left\vert V_1 \bigcap V_2 \right\vert > \left\vert V_2 \bigcap A \bigcap V_j \right\vert, \\
& \left\vert V_3 \bigcap V_4\right\vert > \left\vert V_3 \bigcap A \bigcap V_j \right\vert, \left\vert V_3 \bigcap V_4 \right\vert > \left\vert V_4 \bigcap A \bigcap V_j \right\vert, \\
& \vdots \\
& \left\vert V_{2L-1} \bigcap V_{2L}\right\vert > \left\vert V_{2L-1} \bigcap A \bigcap V_j \right\vert, \left\vert V_{2L-1} \bigcap V_{2L} \right\vert > \left\vert V_{2L} \bigcap A \bigcap V_j \right\vert\Big)
\end{aligned}$$

and any errors in estimating $h$ will be amplified by the power of $M - 2L$.
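To illustrate the amplification, the short sketch below computes the relative error in $h^{M - 2L}$ that results from a given relative error in $h$. The function name and the example values of $M$ and $L$ are hypothetical and chosen only for illustration.

```python
def amplified_relative_error(rel_err, M, L):
    """Relative error in h**(M - 2L) when h itself is off by a factor (1 + rel_err)."""
    return (1.0 + rel_err) ** (M - 2 * L) - 1.0

# Illustrative values only: a 1% overestimate of h with M = 1000 and L = 5.
blow_up = amplified_relative_error(0.01, M=1000, L=5)
```

For small errors the result is roughly $(M - 2L)$ times the relative error in $h$, and it grows exponentially in $M - 2L$ beyond that regime.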

Fortunately, this joint probability is always greater than or equal to the product of the corresponding marginals:

$$h(N, q, V_1, ..., V_{2L}) \geq \prod\limits_{i = 1}^L p\left(
\left\vert V_{2i - 1} \bigcap V_{2i}\right\vert > \left\vert V_{2i-1} \bigcap A \bigcap V_j \right\vert\right) p\left(\left\vert V_{2i - 1} \bigcap V_{2i} \right\vert > \left\vert V_{2i} \bigcap A \bigcap V_j \right\vert\right) = h^*(N, q, V_1, ..., V_{2L}).$$

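The intuition behind this is that, for a fixed $j$, knowing that $\left\vert V_j \bigcap X_1 \right\vert < y_1$ won't ever decrease the probability that $\left\vert V_j \bigcap X_2 \right\vert < y_2$. Both events can only become more likely as $V_j$ shrinks, and the elements of $V_j$ are sampled independently, so the events are positively correlated (an instance of the Harris inequality).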
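The inequality can be checked exactly on a small example by enumerating every possible $V_j$. The sketch below does this for two overlapping sets $X_1$ and $X_2$; the function name, the chosen sets, and the thresholds are illustrative assumptions, and each unit is assumed to belong to $V_j$ independently with probability $q$.

```python
from itertools import product

def joint_and_product(X1, y1, X2, y2, n_units, q):
    """Exact P(|V ∩ X1| < y1 and |V ∩ X2| < y2) and the product of the two marginals.

    Enumerates all 2**n_units membership patterns of V, where each unit
    is in V independently with probability q.
    """
    joint = p1 = p2 = 0.0
    for bits in product([0, 1], repeat=n_units):
        k = sum(bits)
        prob = q ** k * (1 - q) ** (n_units - k)
        e1 = sum(bits[u] for u in X1) < y1
        e2 = sum(bits[u] for u in X2) < y2
        p1 += prob * e1
        p2 += prob * e2
        joint += prob * (e1 and e2)
    return joint, p1 * p2

joint, prod_marginals = joint_and_product(
    X1={0, 1, 2}, y1=2, X2={1, 2, 3}, y2=2, n_units=5, q=0.4
)
```

Here the joint probability exceeds the product of the marginals because $X_1$ and $X_2$ share units; when the two sets are disjoint the events are independent and the two quantities coincide.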

Each marginal, for example $p\left(\left\vert V_1 \bigcap V_2\right\vert > \left\vert V_1 \bigcap A \bigcap V_j \right\vert\right)$, is straightforward to calculate, since each element of $V_j$ is sampled independently. Specifically, if we define $y_{12} = \left\vert V_1 \bigcap V_2\right\vert$ and $Z_1 = V_1 \bigcap A$, then

$$p\left(
\left\vert V_1 \bigcap V_2\right\vert > \left\vert V_1 \bigcap A \bigcap V_j \right\vert\right) = p\left(y_{12} > \left\vert V_j \bigcap Z_1 \right\vert\right) = p\left(\left\vert V_j \bigcap Z_1 \right\vert < y_{12}\right),$$

and, since each element of $Z_1$ belongs to $V_j$ independently with probability $q$, this is given by

$$p\left(\left\vert V_j \bigcap Z_1 \right\vert < y_{12}\right) = \sum\limits_{k = 0}^{y_{12} - 1} \textrm{Binomial}\left(k | \left\vert Z_1 \right\vert, q\right) = 
\textrm{CDF}_{\textrm{Bi}}\left(y_{12} - 1 | n=\left\vert Z_1 \right\vert, p=q\right)$$

which is easy to compute exactly.
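Putting the pieces together, the following is a minimal sketch of a Monte Carlo estimator, assuming $h^*$ is substituted for $h$, so that the result estimates a lower bound on $p(s_L \textrm{ is recallable}|M, N, q)$ rather than the probability itself. All function names are hypothetical, sets are used in place of the matrix formulation of $f$, and each connection is assumed to exist independently with probability $q$.

```python
import random
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(min(k, n) + 1))

def f_indicator(V, A, L):
    """1 if every stored pair beats every other stored item unit (j <= 2L), else 0."""
    for i in range(L):
        a, b = V[2 * i], V[2 * i + 1]
        y = len(a & b)
        for j in range(2 * L):
            if j in (2 * i, 2 * i + 1):
                continue
            if y <= len(a & A & V[j]) or y <= len(b & A & V[j]):
                return 0
    return 1

def h_star(V, A, L, q):
    """Product of the 2L binomial-CDF marginals for a single item unit j > 2L."""
    out = 1.0
    for i in range(L):
        a, b = V[2 * i], V[2 * i + 1]
        y = len(a & b)
        out *= binom_cdf(y - 1, len(a & A), q) * binom_cdf(y - 1, len(b & A), q)
    return out

def capacity_lower_bound(M, N, q, L, n_samples, seed=0):
    """Monte Carlo average of f * h_star**(M - 2L) over sampled V_1, ..., V_2L."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        V = [{k for k in range(N) if rng.random() < q} for _ in range(2 * L)]
        A = set().union(*(V[2 * i] & V[2 * i + 1] for i in range(L)))
        total += f_indicator(V, A, L) * h_star(V, A, L, q) ** (M - 2 * L)
    return total / n_samples

# Illustrative parameter values only.
estimate = capacity_lower_bound(M=20, N=30, q=0.3, L=2, n_samples=200)
```

Because $h^* \leq h$ and the exponent $M - 2L$ is non-negative, each sampled term is at most the corresponding exact term, which is why the average targets a lower bound.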