### Preliminaries

The main challenge in computing this quantity will be retaining numerical precision, since our motivating hypothesis is that $p(s_L \textrm{ recalled correctly}|M, N, q)$ might often be very close to unity. In fact, it might even be exponentially close to unity, so it could be useful to think instead in terms of the error, and specifically its logarithm. We can adjust our calculation accordingly by noting:

$$p(s_L \textrm{ recalled incorrectly}|M, N, q) = 1 - p(s_L \textrm{ recalled correctly}|M, N, q) \\ \approx \cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}}(1 - p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)) \leq \\
\cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}}\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right].$$

Taking the logarithm we then have:

$$\log p(s_L \textrm{ recalled incorrectly}|M, N, q) \leq \\
-\log N_{MC} + \log\sum\limits_{V_1, ..., V_{2L}}\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right].$$

However, if every term in the sum is rounded to zero from numerical imprecision, the logarithm of the sum will evaluate to $-\infty$ (and any individual term that rounds to zero is lost from the sum), preventing us from exploring the dependence of the error on $N, q, M$ and $L$ when it is already very low. To deal with this, we will instead calculate the logarithm of each term inside the sum. Even though there is no obviously useful mathematical identity expressing the logarithm of a sum as a function of the logarithms of its individual terms, computationally this can be done very practically while still avoiding significant numerical errors (see appendix).
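One common way to compute the logarithm of a sum from the logarithms of its terms is the log-sum-exp trick, sketched below. This is an illustrative assumption about how `log_sum` could be implemented, not necessarily the method described in the appendix; the example values are hypothetical.

```python
import numpy as np

def log_sum(log_xs):
    """Return log(sum(exp(log_xs))) without leaving log space."""
    log_xs = np.asarray(log_xs, dtype=float)
    log_max = np.max(log_xs)
    # if every term is zero (log = -inf), the sum is zero as well
    if np.isneginf(log_max):
        return -np.inf
    # factor out the largest term so the remaining exponentials lie in (0, 1]
    return log_max + np.log(np.sum(np.exp(log_xs - log_max)))

# hypothetical log-terms that are far too small to exponentiate directly
log_terms = np.array([-800.0, -801.0, -805.0])

with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_terms)))  # every exp underflows to 0

stable = log_sum(log_terms)  # stays finite, slightly above -800
```

Factoring out the largest term means at least one exponential equals exactly 1, so the inner sum can never underflow to zero.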

To determine

$$\log\left[1 - f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}\right]$$

we first recall that $f$ is always either $0$ or $1$, so when it is $0$ the argument of the logarithm is $1$ and this quantity equals $0$. When $f = 1$ (which we hypothesize to often be true) we must consider two numerical cases: (1) $h^{M-2L}$ is sufficiently less than unity that numerical errors are insignificant, or (2) $h^{M-2L}$ is close enough to unity that numerical errors might cause a problem.

In case (1) we can simply calculate $\log\left[1 - h^{M-2L}\right]$. In case (2) we must remember that we are fundamentally trying to get an accurate description of how close $h^{M - 2L}$ is to $1$, and since it is presumably very close, we would like to use the logarithm to maintain accuracy.

To proceed, we use the following first-order Taylor expansion of $\log x$ about $x = 1$, valid when $1 - x < \epsilon$, i.e., $x$ is very close to unity:

$$1 - x \approx -\log x \implies \log(1 - x) \approx \log(-\log x) $$

or in our specific situation, since in case (2) we have that $h^{M-2L}$ is very close to unity,

$$\log(1 - h^{M-2L}) \approx \log(-\log h^{M-2L}) = \log\left(-\log\left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}\right) = \\
\log\left(-(M - 2L)\left(\sum\limits_{i=1}^{2L}\log c_i\right)\right) =
\log(M-2L) + \log\left(-\left(\sum\limits_{i=1}^{2L}\log c_i\right)\right).$$
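As a small numerical illustration of case (2), the sketch below evaluates the right-hand side of the expression above for values of $\log c_i$ that are too small for the direct calculation to survive rounding. The function name `log_one_minus_h_pow` and the numbers are hypothetical.

```python
import math

def log_one_minus_h_pow(log_cs, m_minus_2l):
    """Approximate log(1 - h**(M - 2L)), h = prod(c_i), when h**(M - 2L) is near 1."""
    neg_log_h = -sum(log_cs)  # -log h = -sum_i log c_i, a small positive number
    return math.log(m_minus_2l) + math.log(neg_log_h)

log_cs = [-1e-20, -3e-20]  # hypothetical values of log c_i
m_minus_2l = 1000

approx = log_one_minus_h_pow(log_cs, m_minus_2l)

# the direct route fails: h**(M - 2L) rounds to exactly 1.0,
# so 1 - h**(M - 2L) is 0.0 and its logarithm is undefined
h_pow = math.exp(m_minus_2l * sum(log_cs))
direct_difference = 1.0 - h_pow
```

Here the approximation returns roughly $\log(4 \times 10^{-17})$, whereas the direct difference is exactly zero in double precision.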

The final issue we must address is that $c_i$ may be so close to unity that numerical errors in calculating its log will cause us to lose the desired precision. When this is the case, i.e., when $c_i > 1 - \epsilon$, it is more practical to work with the *survival function* $s_i = 1 - c_i$, whose logarithm $\log s_i$ can be calculated without any significant loss of precision even when $c_i > 1 - \epsilon$. To use this to our advantage, we note that when $c_i > 1 - \epsilon$ we have

$$\log c_i = \log(1 - s_i) \approx -s_i.$$

Thus, when calculating the inner sum we simply need to replace $\log c_i$ by $-s_i$ whenever $c_i > 1 - \epsilon$. This final adjustment allows us to calculate $\log(1 - h^{M-2L})$ without any significant loss of accuracy. And when all is said and done the calculation of our upper bound on the log error

$$\log p(s_L \textrm{ recalled incorrectly}|M, N, q)$$

can be completed in $O(N_{MC}L)$ time.
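The replacement rule for $\log c_i$ can be sketched as follows. The helper name `neg_log_c` is hypothetical, and the threshold $\epsilon = 10^{-9}$ is chosen only to match `LOG_EPSILON = -9*log(10)` in the pseudocode.

```python
import math

EPSILON = 1e-9  # assumed threshold, matching LOG_EPSILON = -9*log(10)

def neg_log_c(s):
    """Return -log(c) for c = 1 - s, given the survival probability s."""
    if s < EPSILON:
        # c > 1 - epsilon: use the Taylor approximation log(1 - s) ~ -s
        return s
    # c is far enough from 1 that its logarithm can be taken directly
    return -math.log(1.0 - s)

tiny_s = 1e-18                         # hypothetical survival probability
naive = -math.log(1.0 - tiny_s)        # 1 - s rounds to 1.0, so this is 0.0
safe = neg_log_c(tiny_s)               # retains the value 1e-18
```

The naive route loses the quantity entirely, while the substitution keeps it at full relative precision.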

Remark: we can get a rough estimate for the log error $\log(1 - h^{M-2L})$ by noticing that the survival function, which is a stand-in for $-\log c_i$, will decrease roughly exponentially with $N$, i.e., roughly as $\alpha^N$ for some $\alpha < 1$, such that the log error is roughly:

$$\log(M - 2L) + \log(\alpha^N) = \log(M - 2L) - \alpha'N$$

where $\alpha' = -\log\alpha > 0$ since $\alpha < 1$.
This means that the error can be kept under a fixed amount even as $M$ grows exponentially with $N$.
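To make this scaling concrete, the rough estimate can be solved for the largest $M$ that keeps the log error below a target. The sketch below does this under the rough exponential model only; the function name `max_items` and the values of `alpha_prime` and the target error are hypothetical.

```python
import math

def max_items(n, l, alpha_prime, log_error_target):
    """Largest M with log(M - 2L) - alpha' * N <= log_error_target."""
    return 2 * l + math.floor(math.exp(log_error_target + alpha_prime * n))

log_target = math.log(1e-3)  # hypothetical error budget of 1e-3

m_at_100 = max_items(n=100, l=2, alpha_prime=0.1, log_error_target=log_target)
m_at_200 = max_items(n=200, l=2, alpha_prime=0.1, log_error_target=log_target)
```

Doubling $N$ in this toy setting multiplies the admissible $M - 2L$ by roughly $e^{\alpha' N}$, which is the exponential growth noted in the remark.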

### Pseudocode

We can summarize the final algorithm for calculating the upper bound on $\log p(s_L \textrm{ recalled incorrectly}|M, N, q)$ as follows:

```python
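import numpy as np
from numpy import exp, log

LOG_EPSILON = -9*log(10)  # log of the threshold epsilon = 1e-9
N_MC = 1000  # number of Monte Carlo samples (placeholder value)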

def log_upper_error_bound(M, N, L, q):
# def log_upper_error_bound(M, N, L, q, R): # for asymmetric case

    vs = sample_vs(N, L, q, N_MC)
    # us, vs = sample_us_and_vs(N, L, q, R, N_MC) # for asymmetric case
    
    fs = np.nan * np.zeros(len(vs))
    log_neg_log_hs = np.nan * np.zeros(len(vs))

    # calculate some reusable quantities for each sample v
    
    for v_ctr, v in enumerate(vs):
    # for uv_ctr, (u, v) in enumerate(zip(us, vs)): # for asymmetric case

        xs, rs = calc_xs_and_rs(v)
        # xs, rs = calc_xs_and_rs_asymmetric(u, v) # for asymmetric case
        
        fs[v_ctr] = calc_f(v, xs, rs)

        x_sizes = np.sum(xs, axis=1)
        log_neg_log_hs[v_ctr] = calc_log_neg_log_h(x_sizes, rs, q)
        
    # calculate the log of each term in the sum,
    # using approximations where necessary
    
    log_sum_terms = np.nan * np.zeros(len(vs))
    
    for ctr, (f, log_neg_log_h) in enumerate(zip(fs, log_neg_log_hs)):
    
        if f == 0: log_sum_terms[ctr] = 0; continue
        
        # check if h^{M - 2L} is close to 1
        
        if log(M - 2*L) + log_neg_log_h < LOG_EPSILON:
            log_sum_terms[ctr] = log(M - 2*L) + log_neg_log_h

        else:

            # calculate 1 - h^{M - 2L} directly
            # NOTE: do we need to check if inner exponent is too big?

            h_to_m_minus_2l = exp(-exp(log_neg_log_h + log(M - 2*L)))
            log_sum_terms[ctr] = log(1 - h_to_m_minus_2l)
            
    # calculate the log of the upper error bound:
    # log of the sum of the terms minus log of the number of samples
    
    return log_sum(log_sum_terms) - log(len(vs))
```

```python
def sample_vs(n, l, q, n_mc):

    # draw n_mc samples, each a (2l, n) binary array whose rows are V_1, ..., V_{2l}
    return (np.random.rand(n_mc, 2*l, n) < q).astype(int)
    
def calc_xs_and_rs(v):

    # calculate pairwise intersections
    isctns = [v[ctr] * v[ctr + 1] for ctr in range(0, len(v), 2)]
    isctns = np.array([val for pair in zip(isctns, isctns) for val in pair])
    
    # calculate maintained set (union of the pairwise intersections) and xs
    maintained = (np.sum(isctns, axis=0) > 0).astype(int)
    
    # keep each x as an indicator vector so its size and overlaps can be computed later
    xs = np.array([v[ctr] * maintained for ctr in range(len(v))])
    rs = isctns.sum(axis=1)
    
    return xs, rs
    
def calc_f(v, xs, rs):

    if len(v) <= 2: return 1
    
    for ctr_0, (x, r) in enumerate(zip(xs, rs)):
        
        # pair to which item belongs (items are paired as (0, 1), (2, 3), ...)
        pair = (ctr_0 - 1, ctr_0) if ctr_0 % 2 else (ctr_0, ctr_0 + 1)
        
        for ctr_1 in [j for j in range(len(v)) if j not in pair]:
        
            if (v[ctr_1] * x).sum() >= r: return 0
            
    return 1
    
def calc_log_neg_log_h(x_sizes, rs, q):

    # calculate the log survival function for each x, r
    log_sfs = binom_log_sf(rs, x_sizes, q)
    
    # replace log_sfs by log(-log_cdf) when log_sf is too big for taylor approx
    mask = log_sfs > LOG_EPSILON
    log_sfs[mask] = log(-binom_log_cdf(rs[mask], x_sizes[mask], q))
    
    # calculate log of sum of sfs in terms of log_sfs
    return log_sum(log_sfs)

def log_sum(log_xs):

    ... calculate the logarithm of a sum given the logarithms of its terms 
```

## Asymmetric connectivity

Given the structure of solving for the capacity/recall error in the symmetric network, generalizing to asymmetric connectivity involves only minor changes. We would like to generate connections that are as random as possible but under the constraint that the reciprocity (the probability of getting a bidirectional connection given a unidirectional connection) is controlled by a parameter $R$. Given the following definitions

|variable|definition|
|--|--|
|$q$|marginal probability of a connection from an item to an association unit or vice versa|
|$R$|factor by which probability of laying a connection from item to association unit is scaled given that there exists a connection from the association to the item unit (and vice versa)|

we can define the most natural random process that creates the connection matrix:

1. Without loss of generality, sample the connections from the association to the item units first, i.i.d., with probability $q$.
2. Loop through possible item to association connections, adding them in the following way:
    * if connection exists from association to item unit: add a connection from item to association unit with probability $Rq$
    * else: add connection from item to association unit with probability $Dq$

Given $q$ and $R$ we can solve for $D$ by enforcing that the marginal item->association connection probability should be the same as the marginal association->item connection probability:

$$q = p(\textrm{cxn from assoc})Rq + p(\textrm{no cxn from assoc})Dq = q^2R + (1-q)qD.$$

This gives us 

$$D = \cfrac{1 - qR}{1 - q}.$$

Remark: since $D$ must be nonnegative, our choice of $R$ is limited by:

$$\cfrac{1 - qR}{1 - q} \geq 0 \implies R \leq \cfrac{1}{q}.$$

When $R = 1$, this reduces to independent (Erdős–Rényi, ER) connections, since then $D = 1$, and when $R = 1/q$ this creates a perfectly symmetric network, since then $D = 0$. This provides a very natural way to create random asymmetric connections between the item and association units with a parameterized reciprocity.
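The two-step sampling process, together with the expression for $D$, can be sketched as below. The function name `sample_connections`, the array layout (items along rows, association units along columns), and the example sizes are assumptions made for illustration.

```python
import numpy as np

def sample_connections(n_items, n_assoc, q, r, rng):
    """Sample (assoc->item, item->assoc) boolean matrices with reciprocity factor r."""
    if not 0 <= r <= 1 / q:
        raise ValueError("R must lie in [0, 1/q] so that D is nonnegative")
    d = (1 - q * r) / (1 - q)
    # step 1: association -> item connections, i.i.d. with probability q
    assoc_to_item = rng.random((n_items, n_assoc)) < q
    # step 2: item -> association connections, with probability R*q where the
    # reverse connection exists and D*q where it does not
    p_forward = np.where(assoc_to_item, r * q, d * q)
    item_to_assoc = rng.random((n_items, n_assoc)) < p_forward
    return assoc_to_item, item_to_assoc

rng = np.random.default_rng(0)

# R = 1/q: every connection is reciprocated, so the two matrices coincide
a2i_sym, i2a_sym = sample_connections(50, 200, q=0.25, r=4.0, rng=rng)

# R = 1: the two directions are sampled independently with probability q
a2i_er, i2a_er = sample_connections(50, 200, q=0.25, r=1.0, rng=rng)
```

With $q = 0.25$ and $R = 4$, the forward probabilities are exactly $1$ and $0$, so the symmetric case can be checked by comparing the two matrices directly.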

The first chunk of the derivation is the same as the symmetric case. The first thing we need to modify is to change the computation of $p(s_L \textrm{ recalled correctly}|M, N, q)$ to $p(s_L \textrm{ recalled correctly}|M, N, q, R)$. To do so we need to modify our definitions to deal with the asymmetry:

|variable|definition|
|-|-|
|$V_i$|item $i$'s *upstream* neighbor set: the set of association units that project to item $i$|
|$U_i$|item $i$'s *downstream* neighbor set: the set of association units to which item $i$ projects|
|$A$|maintained set: the set of association units that remain hyperexcitable after the presentation of the initial sequence of item conjunctions; this is given by $A = \bigcup\limits_{i = 1}^L \left(U_{2i - 1} \bigcap U_{2i}\right)$|
|$X_i$|item $i$'s recall set: the set of association units activated by item $i$'s activation during the recall phase; this is given by the intersection $X_i = U_i \bigcap A$|
|$r_{kl}$|$\lvert X_k \cap V_l \rvert$, i.e., the size of the intersection of $X_k$ and $V_l$|
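These definitions translate fairly directly into array operations. The sketch below is one guess at what a helper like `calc_xs_and_rs_asymmetric` might look like, assuming `u` and `v` are $(2L, N)$ binary arrays whose rows mark each item's downstream and upstream neighbor sets, and that items are paired as (0, 1), (2, 3), and so on.

```python
import numpy as np

def calc_xs_and_rs_asymmetric(u, v):
    """Return recall sets X_i (as indicator rows) and r_kl for each item k and its partner l."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    # maintained set A: union over pairs of U_{2i-1} & U_{2i}
    maintained = np.any(u[0::2] & u[1::2], axis=0)
    # recall sets X_i = U_i & A
    xs = u & maintained
    # partner of each item within its pair: 0 <-> 1, 2 <-> 3, ...
    partners = np.arange(len(u)) ^ 1
    # r_kl = |X_k & V_l| with l the partner of k
    rs = np.sum(xs & v[partners], axis=1)
    return xs.astype(int), rs

# toy example with L = 1 pair and N = 4 association units
u = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0]])
v = np.array([[1, 1, 0, 0],
              [0, 1, 0, 1]])
xs, rs = calc_xs_and_rs_asymmetric(u, v)
```

In this toy example $A = \{1, 2\}$, both recall sets equal $A$, and the intersection sizes are $r_{12} = 1$ and $r_{21} = 2$.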

Similar to before, the quantity of interest is given by:

$$p(s_L \textrm{ recalled correctly}|M, N, q, R) = \\\sum\limits_{\substack{U_1, ..., U_{2L}, \\ V_1, ..., V_{2L}}}p(U_1, ..., U_{2L}, V_1, ..., V_{2L})p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R),$$

and as before, we will take a Monte Carlo approach to approximating this quantity, such that our main goal becomes expressing the final term in the sum analytically.

This is given by 

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
p(\textrm{no interference among first } 2L \textrm{ items} | U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) \times \\p(\textrm{no interference from items } 2L+1 \textrm{ to } M|\textrm{ no interference among first } 2L \textrm{ items}, U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R).$$

And as before, the first term is a deterministic function $f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L})$, and the second term is independent of the first, such that

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L}) \times \\
p(\textrm{no interference from items } 2L+1 \textrm{ to } M|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R).$$

And similar to what we had before, the probability of item $j > 2L$ interfering is independent of the probability of item $k \neq j$, $k > 2L$ interfering, given $U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q,$ and $R$. Further, the quantity is equal for all $j > 2L$. So:

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L}) \times \\
p(\textrm{no interference from item } j|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R)^{M - 2L}.$$

Further, the last term's dependence on $U_1, ..., U_{2L}, V_1, ..., V_{2L}$ can be condensed into a dependence on the recall sets $X_1, ..., X_{2L}$ and intersection sizes $r_{kl}$ of each recall set and the relevant upstream neighbor sets, with the latter determining the number of inputs received by the correct item units during recall. Thus:

$$p(\textrm{no interference from item } j|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
p(\textrm{no interference from item } j|X_1, ..., X_{2L}, r_{12}, r_{21}, ..., r_{2L, 2L-1}, M, N, q, R).$$

And since the interference from item $j$ depends only on its upstream neighbors $V_j$, we recover a form for this expression identical to that for the symmetrical case:

$$p(\textrm{no interference from item } j|X_1, ..., X_{2L}, r_{12}, r_{21}, ..., r_{2L, 2L-1}, M, N, q, R) = \\
p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|X_1, ..., X_{2L}, r_{12}, r_{21}, r_{34}, ..., N, q).$$

Note that we lose the explicit dependence on $R$ because $V_j$ depends only on $q$. Its effect here has instead been absorbed through $r_{12}, r_{21}, ...$, since these quantities will decrease as $R$ decreases towards 1 (it also affects $f_*$ in the same way).

Thus, we can use the same tools we used for the rest of the derivation for the symmetrical case, specifically that the joint probability is at least as large as the product of the marginals, so that

$$p(s_L \textrm{ recalled correctly}|M, N, q, R) \geq \\
\sum\limits_{\substack{U_1, ..., U_{2L}, \\ V_1, ..., V_{2L}}} p(U_1, ..., U_{2L}, V_1, ..., V_{2L}|N, q, R)f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$

where 

$$h(V_1, ..., V_{2L}, N, q)^{M-2L} = \left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}$$

and 

$$c_1 = p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q), \\ c_2 = p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q), \\ \vdots \\ c_{2L} = p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q).$$
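Because each association unit projects to a new item $j$ independently with probability $q$, the overlap $|X_i \cap V_j|$ is binomially distributed with $|X_i|$ trials, so each $c_i$ is a binomial probability of falling below the corresponding intersection size. The sketch below evaluates $h$ directly from the recall-set sizes; the function names are hypothetical, and this direct form loses precision when $c_i$ is very close to unity, which is where the log survival function becomes preferable.

```python
import math

def binom_cdf_below(r, n, q):
    """P(B < r) for B ~ Binomial(n, q)."""
    return sum(math.comb(n, k) * q**k * (1 - q)**(n - k) for k in range(r))

def calc_h(x_sizes, rs, q):
    """h = prod_i c_i, with c_i = P(|X_i & V_j| < r_i) and |X_i| = x_sizes[i]."""
    return math.prod(binom_cdf_below(r, n, q) for n, r in zip(x_sizes, rs))

# toy example: two recall sets of size 2, thresholds 1 and 2, q = 0.5
c_first = binom_cdf_below(1, 2, 0.5)   # P(B = 0)
c_second = binom_cdf_below(2, 2, 0.5)  # P(B = 0) + P(B = 1)
h = calc_h([2, 2], [1, 2], 0.5)
```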

### Pseudocode

## Appendices

### Appendix A

### Appendix B