## Symmetric connectivity

Our goal is to calculate a measure of recall capacity error for a network as a function of the network's parameters. First, some definitions:

|variable|definition|
|-|-|
|$M$|number of item units|
|$N$|number of association units|
|$q$|connection probability|
|$L$|number of non-overlapping conjunctions to remember|

For random connections between the item and association units, we would like to determine how many sets of $L$ conjunctions will be recalled incorrectly. We thus define the normalized error rate $z$ for a connection matrix $W$ as

$$z(L; W) = \cfrac{1}{N_L^*} \sum\limits_{s_L} \mathbb{1}[s_L \textrm{ recalled incorrectly}; W]$$

where $s_L$ is a set of $L$ non-overlapping pairwise feature conjunctions, e.g., $\{(1, 5), (7, 9), (10, 3)\}$ and $N_L^*$ is the number of possible size-$L$ sets of said conjunctions. $z(L; W)$ is thus the probability that a randomly chosen set of $L$ non-overlapping item conjunctions is recalled incorrectly given a network structure $W$. If $z(L; W) = 0$ then all sets of conjunctions are recalled correctly and the network has maximal capacity; if $z(L; W) = 1$ then all sets of conjunctions are recalled incorrectly and the network has minimal capacity. 

This problem can be approached more easily in expectation, since

$$E_W[z(L; W)|M, N, q] = E_W\left[\cfrac{1}{N_L^*} \sum \limits_{s_L} \mathbb{1}[s_L \textrm{ recalled incorrectly}; W]| M, N, q\right]$$

$$= \cfrac{1}{N_L^*} \sum \limits_{s_L} E_W\left[\mathbb{1}[s_L \textrm{ recalled incorrectly}; W]|M, N, q\right]$$

$$= \cfrac{1}{N_L^*} \sum \limits_{s_L} p(s_L \textrm{ recalled incorrectly}|M, N, q)$$

Because connections are sampled i.i.d., relabeling the items does not change the distribution of $W$, so the value in the sum is the same for all $s_L$ and the expression evaluates to

$$\cfrac{N_L^*}{N_L^*} p(s_L \textrm{ recalled incorrectly}|M, N, q) = p(s_L \textrm{ recalled incorrectly}|M, N, q).$$

Thus, without loss of generality we can assume that $s_L = \{(1, 2), (3, 4), ..., (2L - 1, 2L)\}$.

This probability depends on the probability of different connection patterns arising in the matrix. To formalize this we first introduce some new definitions.

|variable|definition|
|-|-|
|$V_i$|item $i$'s neighbor set: the set of association units connected to item unit $i$|
|$A$|maintained set: the set of association units that remain hyperexcitable after the presentation of the initial sequence of item conjunctions; this is given by $A = \bigcup\limits_{i = 1}^L \left(V_{2i - 1} \bigcap V_{2i}\right)$|
|$X_i$|item $i$'s recall set: the set of association units activated by item $i$'s activation during the recall phase; this is given by the intersection $X_i = V_i \bigcap A$|
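The three definitions above translate directly into set operations. The following is a minimal sketch, assuming items are 0-indexed so that the stored pairs are $(0, 1), (2, 3), \ldots$; the function names are hypothetical and only the standard library is used.

```python
import random

def sample_neighbor_sets(n_items, n_assoc, q, rng):
    """Draw V_i for each item: every association unit joins V_i independently with probability q."""
    return [{a for a in range(n_assoc) if rng.random() < q} for _ in range(n_items)]

def maintained_set(V, L):
    """A = union over the stored pairs of V_{2i-1} & V_{2i} (0-indexed: pair i is (2i, 2i+1))."""
    A = set()
    for i in range(L):
        A |= V[2 * i] & V[2 * i + 1]
    return A

def recall_sets(V, A):
    """X_i = V_i & A for every item i."""
    return [V_i & A for V_i in V]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
L = 3
V = sample_neighbor_sets(n_items=2 * L, n_assoc=50, q=0.3, rng=rng)
A = maintained_set(V, L)
X = recall_sets(V, A)
```

By construction every $X_i$ is a subset of both $V_i$ and $A$, and it always contains the units shared with the item's own partner.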

Next, instead of calculating $$p(s_L \textrm{ recalled incorrectly}|M, N, q),$$ we will calculate $$p(s_L \textrm{ recalled correctly}|M, N, q) = 1 - p(s_L \textrm{ recalled incorrectly}|M, N, q).$$ 

Our strategy is to marginalize over $V_1, ..., V_{2L}$ by writing

$$p(s_L \textrm{ recalled correctly}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L})p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q).$$

Since we can easily sample $V_1, ..., V_{2L}$, we can use a Monte Carlo approach to accurately estimate $p(s_L \textrm{ recalled correctly}|M, N, q)$, so long as we can calculate $p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)$ analytically.

The guiding intuition for calculating the latter, given $V_1, ..., V_{2L}$ and consequently $A, X_1, X_2, ..., X_{2L}$, is that when item $1$ activates $X_1$ during the recall phase, item $2$ must receive more inputs from the reactivated association units $X_1$ than any other item in order for it to be recalled correctly. If another item $j \neq 1, 2$ receives more input from $X_1$ than item $2$ receives, then it *interferes* with recall, since it will be recalled instead of the correct item $2$. Thus, for a given connection matrix, $s_L$ will be recalled correctly if none of the conjunctions $\{(1, 2), (3, 4), ..., (2L-1, 2L)\}$ suffer from interference. 
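The interference criterion can be checked directly for a concrete set of neighbor sets. The sketch below is illustrative only: it assumes 0-indexed items with stored pairs $(0, 1), (2, 3), \ldots$, takes the neighbor sets of all $M$ items as a list, and treats ties as failures because the text requires strictly more input. The function names are hypothetical.

```python
def partner(k):
    """Index of the item stored together with item k: 0<->1, 2<->3, ..."""
    return k ^ 1

def recalled_correctly(V, L):
    """True if no stored conjunction suffers interference from any other item.

    V is a list of neighbor sets for all M items; the first 2L items are the stored ones.
    """
    A = set()
    for i in range(L):
        A |= V[2 * i] & V[2 * i + 1]          # maintained set
    for k in range(2 * L):
        X_k = V[k] & A                         # recall set of the cue item k
        r = len(X_k & V[partner(k)])           # input received by the correct partner
        for j in range(len(V)):
            if j in (k, partner(k)):
                continue
            if len(X_k & V[j]) >= r:           # item j ties or beats the partner
                return False
    return True

stored = [{0, 1, 2}, {0, 1, 3}, {4, 5, 6}, {4, 5, 7}]
ok = recalled_correctly(stored, L=2)                        # pairs do not overlap
bad = recalled_correctly(stored + [{0, 1, 4, 5}], L=2)      # fifth item matches the partner's input
```

In the second call the extra item receives as much input from $X_1$ as the correct partner does, so the set is counted as recalled incorrectly.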

Writing things a bit more explicitly, we have

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j \notin \{1, 2\}, ~~ |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j \notin \{1, 2\}, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j \notin \{3, 4\}, ~~ |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j \notin \{3, 4\}, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j \notin \{2L-1, 2L\}, ~~ |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j \notin \{2L-1, 2L\}, \\
|V_1, ..., V_{2L}, M, N, q).$$

Rearranging terms, we arrive at

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j \in \{3, ..., 2L\}, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j \in \{3, ..., 2L\}, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j \in \{1, 2, 5, 6, ..., 2L\}, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j \in \{1, 2, 5, 6, ..., 2L\}, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j \in \{1, 2, ..., 2L-3, 2L-2\}, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j \in \{1, 2, ..., 2L-3, 2L-2\}, \\
\textrm{ * * * * }\\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j > 2L, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j > 2L, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j > 2L, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j > 2L, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j > 2L, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j > 2L, \\
|V_1, ..., V_{2L}, M, N, q).$$

Note now that the portion above the $\textrm {* * * *}$ is just

$$p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q),$$

so $p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)$ becomes equal to

$$p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q)\times p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j > 2L, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j > 2L, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j > 2L, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j > 2L, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j > 2L, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j > 2L, \\
|V_1, ..., V_{2L}, M, N, q, \textrm{no interference among first } 2L \textrm{ items}).$$

Motivated by the fact that connections are sampled i.i.d., we can rearrange the second term to

$$p(\\
|X_1 \cap V_2| > |X_1 \cap V_{2L+1}|, |X_2 \cap V_1| > |X_2 \cap V_{2L+1}|, |X_3 \cap V_4| > |X_3 \cap V_{2L+1}|, ...,\\
|X_1 \cap V_2| > |X_1 \cap V_{2L+2}|, |X_2 \cap V_1| > |X_2 \cap V_{2L+2}|, |X_3 \cap V_4| > |X_3 \cap V_{2L+2}|, ...,\\
\vdots\\
|X_1 \cap V_2| > |X_1 \cap V_M|, |X_2 \cap V_1| > |X_2 \cap V_M|, |X_3 \cap V_4| > |X_3 \cap V_M|, ...,\\
|V_1, ..., V_{2L}, M, N, q),$$

where we have noted also that the above quantity does not depend on whether the first $2L$ items interfere with each other or not.

Because connections are i.i.d., each item's neighbor set is independent of all the other items' neighbor sets. And since each line of the above expression involves only one new item's neighbor set $V_j$ (with $V_1, ..., V_{2L}$ fixed by the conditioning), the events in different lines are independent, so the whole expression equals

$$\prod\limits_{j = 2L + 1}^M p(|X_1 \cap V_2| > |X_1 \cap V_j|, |X_2 \cap V_1| > |X_2 \cap V_j|, |X_3 \cap V_4| > |X_3 \cap V_j|, ...|V_1, ..., V_{2L}, N, q).$$

Further, since all the connections are identically distributed, the probability inside the product is independent of $j$. Therefore the expression reduces to

$$p(|X_1 \cap V_2| > |X_1 \cap V_{j > 2L}|, |X_2 \cap V_1| > |X_2 \cap V_{j > 2L}|, |X_3 \cap V_4| > |X_3 \cap V_{j > 2L}|, ...|V_1, ..., V_{2L}, N, q)^{M - 2L}.$$

Combining what we know so far, we arrive at

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = \\
p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q) \times \\
p(|X_1 \cap V_2| > |X_1 \cap V_j|, |X_2 \cap V_1| > |X_2 \cap V_j|, |X_3 \cap V_4| > |X_3 \cap V_j|, ...|V_1, ..., V_{2L}, N, q)^{M - 2L}.$$

Next, we note that $p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q)$ is the completely deterministic function $f(V_1, ..., V_{2L})$, equal to $1$ if none of the first $2L$ items interfere with each other and $0$ otherwise. To handle the second term on the right side of the equation, we introduce a new definition:

|variable|definition|
|-|-|
|$r_{kl}$|$|X_k \cap V_l|$, i.e., the size of the intersection of $X_k$ and $V_l$|

Then the second term (sans the exponent) becomes:

$$p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|r_{12}, r_{21}, r_{34}, ..., X_1, X_2, ..., N, q) = \\
p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q) \times \\
p(|X_2 \cap V_j| < r_{21} | |X_1 \cap V_j| < r_{12}, r_{21}, X_2, ..., N, q) \times \\
\vdots \\
\times p(|X_{2L} \cap V_j| < r_{2L, 2L-1} | |X_{2L-1} \cap V_j| < r_{2L-1, 2L}, ..., |X_1 \cap V_j| < r_{12}, r_{2L, 2L-1}, X_{2L}, N, q),$$

where we've broken up the joint distribution into a chain of conditionals.

The final useful insight is that each conditional term in the product is at least as large as the corresponding marginal, e.g.,

$$p(|X_2 \cap V_j| < r_{21} | |X_1 \cap V_j| < r_{12}, r_{21}, X_2, ..., N, q) \geq p(|X_2 \cap V_j| < r_{21} | r_{21}, X_2, ..., N, q).$$

This is because knowing that the intersection of $V_j$ with one set of association units is bounded from above can never increase the probability that its intersection with another (potentially overlapping) set of association units reaches or exceeds a previously determined size ($r_{kl}$). Thus, we have

$$p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|r_{12}, r_{21}, r_{34}, ..., X_1, X_2, ..., N, q) \geq \\
p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)$$

and therefore

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) \geq \\
f(V_1, ..., V_{2L}) \times \\
\left[p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)\right]^{M-2L},$$

so

$$p(s_L \textrm{ recalled correctly}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L})p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) \geq \\
\sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L})f(V_1, ..., V_{2L}) \times \\
\left[p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)\right]^{M-2L},$$

giving us a lower bound on the probability of correct recall, and hence on the capacity of the network. To clean things up we can write the last term as

$$h(V_1, ..., V_{2L}, N, q)^{M-2L} = \left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}$$

where 

$$c_1 = p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q), \\ c_2 = p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q), \\ \vdots \\ c_{2L} = p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q).$$

Then:

$$p(s_L \textrm{ recalled correctly}|M, N, q) \geq
\sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L})f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$
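The step that replaces each conditional by its marginal can be sanity-checked exhaustively on a tiny network. The sketch below enumerates every possible neighbor set $V_j$ of a new item and compares the exact joint probability with the product of the two marginals. The sets, thresholds, and function name are made-up illustrative values, not quantities from the derivation.

```python
from itertools import product

def joint_and_product(X1, X2, r1, r2, n_assoc, q):
    """Exact P(|X1 & V| < r1, |X2 & V| < r2) and the product of the two marginals.

    V ranges over all 2**n_assoc subsets, each unit included independently with probability q.
    """
    joint = p1 = p2 = 0.0
    for bits in product((0, 1), repeat=n_assoc):
        V = {a for a, b in enumerate(bits) if b}
        prob = q ** len(V) * (1 - q) ** (n_assoc - len(V))
        e1 = len(X1 & V) < r1
        e2 = len(X2 & V) < r2
        p1 += prob * e1
        p2 += prob * e2
        joint += prob * (e1 and e2)
    return joint, p1 * p2

# Overlapping recall sets: the joint probability should be at least the product.
joint_overlap, prod_overlap = joint_and_product({0, 1, 2}, {1, 2, 3, 4}, 2, 2, n_assoc=6, q=0.4)

# Disjoint recall sets: the two events are independent, so the two values coincide.
joint_disjoint, prod_disjoint = joint_and_product({0, 1}, {2, 3}, 1, 2, n_assoc=6, q=0.4)
```

Enumeration is only feasible for a handful of association units, but it gives a quick check that the bound goes in the claimed direction.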

The terms inside the sum now become easily computable, since $f(V_1, ..., V_{2L})$ can be determined via two small nested `for` loops, and each $c_i$ is just the CDF of the binomial distribution with $n=|X_i|, p=q$ evaluated at $r_{kl} - 1$ (with $k = i$ and $l$ the item paired with $i$). Our strategy for evaluating the whole sum will be to simply sample $V_1, ..., V_{2L}$ a large number of times, compute the term inside the sum for each sample, and then take the average. This average is an unbiased estimate of the lower bound and converges to it as the number of samples increases. That is, we now let the sum run over $N_{MC}$ random samples of $\{V_1, ..., V_{2L}\}$ and approximate

$$p(s_L \textrm{ recalled correctly}|M, N, q) \geq
\cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}} f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$
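As a concrete illustration of the term $h$, the sketch below evaluates each $c_i$ as a binomial CDF and multiplies them together for one hand-built sample of $V_1, ..., V_{2L}$. It assumes 0-indexed items and sums the binomial probabilities directly with the standard library; the function names are hypothetical, and a statistics library's binomial CDF could be substituted.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Binomial(n, p) <= k), summed term by term; 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))

def h_value(V, L, q):
    """h = prod_i c_i, where c_i = P(|X_i & V_j| < r_i) for a new item j."""
    A = set()
    for i in range(L):
        A |= V[2 * i] & V[2 * i + 1]
    h = 1.0
    for k in range(2 * L):
        X_k = V[k] & A
        r = len(X_k & V[k ^ 1])              # overlap with the paired item
        h *= binom_cdf(r - 1, len(X_k), q)   # CDF evaluated at r - 1
    return h

V = [{0, 1, 2}, {0, 1, 3}, {4, 5, 6}, {4, 5, 7}]
h = h_value(V, L=2, q=0.5)
M = 10
term = h ** (M - 2 * 2)   # contribution of this sample when f = 1
```

Here every recall set has two units and every $r$ equals $2$, so each $c_i$ is $P(\mathrm{Bin}(2, 0.5) \leq 1) = 0.75$ and $h = 0.75^4$.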

### Pseudocode preliminaries

The main challenge in computing this quantity will be retaining numerical precision, since our motivating hypothesis is that $p(s_L \textrm{ recalled correctly}|M, N, q)$ might often be very close to unity. In fact, it might even be exponentially close to unity, so it could be useful to think instead in terms of the error, and specifically its logarithm. We can adjust our calculation accordingly by noting:

$$p(s_L \textrm{ recalled incorrectly}|M, N, q) = 1 - p(s_L \textrm{ recalled correctly}|M, N, q) \\ \approx \cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}}(1 - p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)) \leq \\
\cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}}\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right].$$

Taking the logarithm we then have:

$$\log p(s_L \textrm{ recalled incorrectly}|M, N, q) \leq \\
-\log N_{MC} + \log\sum\limits_{V_1, ..., V_{2L}}\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right].$$

However, if the terms in the sum are rounded to zero from numerical imprecision, the logarithm of the sum will evaluate to $-\infty$, preventing us from exploring the dependence of the error on $N, q, M$ and $L$ when it is already very low. To deal with this, we will instead calculate the logarithm of each term inside the sum. Even though there is no obviously useful mathematical identity expressing the logarithm of a sum as a function of the logarithms of its individual terms, computationally this can be done very practically while still avoiding significant numerical errors (see appendix).
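One common way to combine log-domain terms is the log-sum-exp trick: factor out the largest term before exponentiating. The sketch below is an assumption about how this step could be done and may differ from the method in the appendix; the function name is hypothetical.

```python
import math

def logsumexp(log_terms):
    """log(sum_j exp(t_j)) computed by factoring out the largest term."""
    m = max(log_terms)
    if m == -math.inf:
        return -math.inf                      # every term is exactly zero
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

log_terms = [-1000.0, -1000.0, -1001.0]       # exp(-1000) underflows to 0 in double precision
log_mean = logsumexp(log_terms) - math.log(len(log_terms))   # log of the Monte Carlo average
```

Exponentiating these terms directly would give a sum of zero and a logarithm of $-\infty$, whereas the shifted version stays finite.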

To determine

$$\log\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right]$$

we first recall that $f$ is always either $0$ or $1$, so when it is $0$ this quantity will equal $0$. When $f = 1$ (which we hypothesize to often be true) we must consider two numerical cases: (1) $h^{M-2L}$ is sufficiently less than unity that numerical errors are insignificant, or (2) $h^{M-2L}$ is close enough to unity that numerical errors might cause a problem.

In case (1) we can simply calculate $\log\left[1 - h^{M-2L}\right]$. In case (2) we must remember that we are fundamentally trying to get an accurate description of how close $h^{M - 2L}$ is to $1$, and since it is presumably very close, we would like to use the logarithm to maintain accuracy.

To proceed, we use the following Taylor expansion, valid when $1 - x < \epsilon$, i.e., $x$ is very close to unity:

$$1 - x \approx -\log x \implies \log(1 - x) \approx \log(-\log x) $$

or in our specific situation, since in case (2) we have that $h^{M-2L}$ is very close to unity,

$$\log(1 - h^{M-2L}) \approx \log(-\log h^{M-2L}) = \log\left(-\log\left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}\right) = \\
\log\left(-(M - 2L)\left(\sum\limits_{i=1}^{2L}\log c_i\right)\right) =
\log(M-2L) + \log\left(-\left(\sum\limits_{i=1}^{2L}\log c_i\right)\right).$$
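The quality of this approximation can be checked numerically. The sketch below compares it against a reference value computed with `math.expm1`, which evaluates $e^x - 1$ accurately for small $x$; the input values and function names are made up for illustration.

```python
import math

def log_one_minus_pow_approx(log_c, n_other):
    """log(1 - h^n) ~ log(n) + log(-sum_i log c_i), for h^n very close to 1."""
    return math.log(n_other) + math.log(-sum(log_c))

def log_one_minus_pow_reference(log_c, n_other):
    """log(1 - h^n) = log(-expm1(n * sum_i log c_i)), without forming 1 - h^n explicitly."""
    return math.log(-math.expm1(n_other * sum(log_c)))

log_c = [-1e-15, -2e-15, -1e-15, -3e-15]   # hypothetical log c_i values, all very close to 0
n_other = 96                                # plays the role of M - 2L
approx = log_one_minus_pow_approx(log_c, n_other)
reference = log_one_minus_pow_reference(log_c, n_other)
```

Both functions require at least one $\log c_i$ to be strictly negative; if every $c_i$ equals $1$ in floating point, the error term is zero and its logarithm is $-\infty$.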

The final issue we must address is that $c_i$ may be so close to unity that numerical errors in calculating its log will cause us to lose desired precision. When this is the case, i.e., when $c_i > 1 - \epsilon$, it can be more practical to use the *survival function* $s_i = 1 - c_i$ (or its logarithm $\log s_i$), which can be calculated without any significant loss of precision even when $c_i > 1 - \epsilon$. To use this to our advantage, we note that when $c_i > 1 - \epsilon$ we have

$$\log c_i = \log(1 - s_i) \approx -s_i.$$

Thus, when calculating the inner sum we simply need to replace $\log c_i$ by $-s_i$ whenever $c_i > 1 - \epsilon$. This final adjustment allows us to calculate $\log(1 - h^{M-2L})$ without any significant loss of accuracy. And when all is said and done the calculation of our upper bound on the log error

$$\log p(s_L \textrm{ recalled incorrectly}|M, N, q)$$

can be completed in $O(N_{MC}L)$.
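A minimal sketch of this substitution follows. It assumes the survival function is obtained by summing the upper binomial tail directly, so that no subtraction from $1$ takes place, and it uses a hypothetical threshold `eps`. Outside the near-unity regime it falls back on `math.log1p`.

```python
from math import comb, inf, log1p

def binom_sf(r, n, p):
    """s = P(Binomial(n, p) >= r), summed over the upper tail."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max(r, 0), n + 1))

def log_c(r, n, p, eps=1e-8):
    """log c = log P(Binomial(n, p) < r), using -s in place of log(1 - s) when s < eps."""
    if r <= 0:
        return -inf            # c = 0: an overlap below zero is impossible
    s = binom_sf(r, n, p)
    if s >= 1.0:
        return -inf            # c rounds to 0
    if s < eps:
        return -s              # first-order replacement log(1 - s) ~ -s
    return log1p(-s)           # log(1 - s) away from the near-unity regime

tiny = log_c(50, 50, 0.1)      # s = 0.1**50, far below double-precision resolution of 1 - s
moderate = log_c(1, 2, 0.5)    # s = 0.75, so c = 0.25
```

In the first call, forming $c = 1 - s$ explicitly would round to exactly $1$ and give $\log c = 0$, whereas the survival-function route keeps the value $-10^{-50}$.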

### Pseudocode

We can summarize the final algorithm for calculating the upper bound on $\log p(s_L \textrm{ recalled incorrectly}|M, N, q)$ as follows:

```
given N, L, and q: draw N_MC samples of {V_1, ..., V_2L}

for each {V_1, ..., V_2L}_j:

    f_j = f({V_1, ..., V_2L})
    
    if f_j == 0: log_sum_term_j = 0
    
    else:
        
        for each i in 1, ..., 2L:
            
            calculate r_i
            calculate |X_i|
            
            ... to be completed
        
```
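For reference, one possible way to fill in the remaining steps is sketched below in Python. This is an assumption about how the pieces fit together, not the author's completed algorithm: it uses 0-indexed items, sums the binomial tail with the standard library, and swaps in `math.log1p` and `math.expm1` for the two first-order approximations described above. All function names are hypothetical.

```python
import math
import random

def binom_sf(r, n, p):
    """P(Binomial(n, p) >= r), summed over the upper tail."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

def log_sum_term(V, L, M, q):
    """log[1 - f * h^(M - 2L)] for one sample V = [V_1, ..., V_2L]."""
    A = set()
    for i in range(L):
        A |= V[2 * i] & V[2 * i + 1]
    total_log_c = 0.0
    for k in range(2 * L):
        p_idx = k ^ 1                                   # paired item
        X_k = V[k] & A
        r = len(X_k & V[p_idx])
        others = (j for j in range(2 * L) if j not in (k, p_idx))
        if r == 0 or any(len(X_k & V[j]) >= r for j in others):
            return 0.0                                  # f = 0 or c = 0, so log(1 - 0) = 0
        s = binom_sf(r, len(X_k), q)
        if s >= 1.0:
            return 0.0                                  # c rounds to 0
        total_log_c += math.log1p(-s)                   # log c_k
    x = (M - 2 * L) * total_log_c                       # log h^(M - 2L)
    if x == 0.0:
        return -math.inf                                # error term is exactly zero
    return math.log(-math.expm1(x))                     # log(1 - h^(M - 2L))

def log_error_bound(M, N, L, q, n_mc, seed=0):
    """Monte Carlo estimate of the upper bound on log p(s_L recalled incorrectly)."""
    rng = random.Random(seed)
    log_terms = []
    for _ in range(n_mc):
        V = [{a for a in range(N) if rng.random() < q} for _ in range(2 * L)]
        log_terms.append(log_sum_term(V, L, M, q))
    m = max(log_terms)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(n_mc)

estimate = log_error_bound(M=20, N=40, L=2, q=0.3, n_mc=50)
```

Each per-sample term is at most $0$, so the estimate is never positive; values near $0$ mean the bound is uninformative for those parameters.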

## Asymmetric connectivity

Given the structure of solving for the capacity/recall error in the symmetric network, generalizing to asymmetric connectivity involves only minor changes. We would like to generate connections that are as random as possible but under the constraint that the reciprocity (the probability of getting a bidirectional connection given a unidirectional connection) is controlled by a parameter $R$. Given the following definitions

|variable|definition|
|-|-|
|$q$|marginal probability of a connection from an item to an association unit or vice versa|
|$R$|factor by which probability of laying a connection from item to association unit is scaled given that there exists a connection from the association to the item unit (and vice versa)|

we can define the most natural random process that creates the connection matrix:

1. Without loss of generality, sample the connections from the association to the item units first, i.i.d., with probability $q$.
2. Loop through possible item to association connections, adding them in the following way:
    * if connection exists from association to item unit: add a connection from item to association unit with probability $Rq$
    * else: add connection from item to association unit with probability $Dq$

Given $q$ and $R$ we can solve for $D$ by enforcing that the marginal item to association connection probability should be the same as the marginal association to item connection probability:

$$q = p(\textrm{cxn from assoc})Rq + p(\textrm{no cxn from assoc})Dq = q^2R + (1-q)qD.$$

This gives us 

$$D = \cfrac{1 - qR}{1 - q}.$$

Remark: since $D$ must be nonnegative, our choice of $R$ is limited by:

$$\cfrac{1 - qR}{1 - q} \geq 0 \implies R \leq \cfrac{1}{q}.$$

When $R = 1$, this reduces to independent Erdős-Rényi (ER) connections, and when $R = 1/q$ it creates a perfectly symmetric network. This provides a natural way to create random asymmetric connections between the item and association units with a parameterized reciprocity.
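The two-step sampling procedure and the formula for $D$ can be sketched as follows. The function names are hypothetical, and the extra check that $Dq \leq 1$ is an added assumption (it is needed for $Dq$ to be a valid probability when $q$ is large), not a constraint stated above.

```python
import random

def reciprocity_params(q, R):
    """Return (R*q, D*q): item-to-association probabilities with and without a reverse connection."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    D = (1.0 - q * R) / (1.0 - q)
    if R < 0.0 or D < 0.0 or D * q > 1.0:
        raise ValueError("R is outside the range allowed for this q")
    return R * q, D * q

def sample_asymmetric(M, N, q, R, rng):
    """Sample association-to-item connections first, then item-to-association ones."""
    p_rec, p_non = reciprocity_params(q, R)
    # Step 1: association -> item connections, i.i.d. with probability q.
    assoc_to_item = [[rng.random() < q for _ in range(N)] for _ in range(M)]
    # Step 2: item -> association connections, conditioned on the reverse connection.
    item_to_assoc = [
        [rng.random() < (p_rec if assoc_to_item[i][a] else p_non) for a in range(N)]
        for i in range(M)
    ]
    return assoc_to_item, item_to_assoc

# R = 1/q forces every connection to be reciprocated, giving a symmetric network.
a2i, i2a = sample_asymmetric(M=20, N=30, q=0.25, R=4.0, rng=random.Random(1))
```

Row `i` of `assoc_to_item` lists the association units that project to item `i`, and row `i` of `item_to_assoc` lists the units that item `i` projects to.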

The first chunk of the derivation is the same as the symmetric case. The first thing we need to modify is to change the computation of $p(s_L \textrm{ recalled correctly}|M, N, q)$ to $p(s_L \textrm{ recalled correctly}|M, N, q, R)$. To do so we need to modify our definitions to deal with the asymmetry:

|variable|definition|
|-|-|
|$V_i$|item $i$'s *upstream* neighbor set: the set of association units that project to item $i$|
|$U_i$|item $i$'s *downstream* neighbor set: the set of association units to which item $i$ projects|
|$A$|maintained set: the set of association units that remain hyperexcitable after the presentation of the initial sequence of item conjunctions; this is given by $A = \bigcup\limits_{i = 1}^L \left(U_{2i - 1} \bigcap U_{2i}\right)$|
|$X_i$|item $i$'s recall set: the set of association units activated by item $i$'s activation during the recall phase; this is given by the intersection $X_i = U_i \bigcap A$|

Similar to before, this is given by:

$$p(s_L \textrm{ recalled correctly}|M, N, q, R) = \sum\limits_{\substack{U_1, ..., U_{2L}, \\ V_1, ..., V_{2L}}} p(U_1, ..., U_{2L}, V_1, ..., V_{2L})p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R).$$

### Pseudocode