## Encoding of pairwise associations through activation-triggered hyperexcitability

Consider encoding a set of $L$ pairwise associations (e.g., "pink hat", "blue sock", etc.) using activation-triggered hyperexcitability (ATH) in the following way.

### single node dynamics

First, a reminder about the dynamics of the single nodes. The total input $v_i(t)$ to node $i$ and its activation $r_i(t)$ at time $t$ are given by

$v_i(t) = s_i(t) + (W\mathbf{r}(t-1))_i + g_x x_i(t)$

$r_i(t) = 1 ~~ \textrm{if} ~~ v_i(t) \geq v_{th} ~~ \textrm{and 0 otherwise}$

where $s_i(t)$ is any stimulus to node $i$ at time $t$, $W$ is the connectivity matrix, $x_i(t)$ is a binary variable indicating whether node $i$ is hyperexcitable at time $t$, and $g_x$ is a "gain" scaling the influence of the hyperexcitability term.

A node becomes hyperexcitable (i.e., $x_i$ goes to 1) after it activates and remains hyperexcitable for the next $T_x$ time steps (with the counter renewing upon each activation). Here we consider $T_x$ to be large enough so that hyperexcitable nodes remain hyperexcitable for the full course of an "experiment".
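
The update rule and the hyperexcitability counter can be sketched as follows. This is a minimal illustration rather than the notebook's actual implementation: the function name `step`, the `timers` list, and the convention that an activation at time $t$ makes $x_i = 1$ from time $t + 1$ onward are all assumptions.

```python
def step(W, r_prev, s, timers, g_x, v_th, T_x):
    """One update of the binary network (hypothetical helper).

    W[i][j] is the weight from node j to node i, r_prev holds r(t-1),
    s holds the stimulus s(t), and timers[i] counts how many more steps
    node i stays hyperexcitable.
    """
    n = len(r_prev)
    x = [1 if t > 0 else 0 for t in timers]                      # x_i(t)
    v = [s[i] + sum(W[i][j] * r_prev[j] for j in range(n)) + g_x * x[i]
         for i in range(n)]                                      # v_i(t)
    r = [1 if v[i] >= v_th else 0 for i in range(n)]             # r_i(t)
    # An activation (re)starts the counter; otherwise it counts down.
    new_timers = [T_x if r[i] else max(timers[i] - 1, 0) for i in range(n)]
    return r, new_timers


# Two unconnected nodes: a strong stimulus activates node 1, after which
# a weak stimulus is enough for node 1 but still not for node 0.
W = [[0.0, 0.0], [0.0, 0.0]]
r0, timers0 = [0, 0], [0, 0]
r1, timers1 = step(W, r0, [0.5, 1.0], timers0, g_x=0.6, v_th=1.0, T_x=100)
r2, timers2 = step(W, r1, [0.5, 0.5], timers1, g_x=0.6, v_th=1.0, T_x=100)
print(r1, r2)
```

Choosing `T_x` larger than the number of steps in an "experiment" reproduces the regime considered here, where hyperexcitability never expires.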

### network architecture

The network is separated into an "item" layer (outer ring) and an "association" layer (inner square). Each unit in the item layer corresponds to a single item (for example, "blue", "pink", "sock", "hat", etc.), so pairing two items together corresponds to pairing two item units together.

<img src="files/diagrams/demo1.png" width="500" />

There are no item-to-item or association-to-association connections. Item-to-association and association-to-item connections are random: each exists with probability $q$, and the two directions are coupled through a reciprocity coefficient $R$. When $R = 1/q$, the connections are completely symmetric, and when $R = 1$ the two directions are independent (Erdős-Rényi). More on controlling reciprocity in the "asymmetric connectivity" section below.

The item-to-association connections all have strength $w_{ai}$, and the association-to-item connections all have strength $w_{ia}$. We assume that $w_{ai}$ and $g_x$ are such that an association unit activates if and only if either (1) it receives at least two simultaneous item inputs or (2) it is hyperexcitable and receives at least one item input. I haven't thought through $w_{ia}$ in quite as much detail, but the idea is to have it be less than $w_{ai}$ and such that an item unit will activate if and only if it (1) is hyperexcitable and (2) receives input from a "large" number of association units.
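
The condition on the association units can be checked by brute force for a candidate parameter choice. The sketch below assumes that association units receive no direct stimulus, so their input is just the number of active item inputs times $w_{ai}$ plus $g_x$ when hyperexcitable; the function name and the example values are hypothetical.

```python
def association_rule_holds(w_ai, g_x, v_th, max_inputs=5):
    """Check the stated activation rule for every input count up to max_inputs."""
    for k in range(max_inputs + 1):        # number of active item inputs
        for x in (0, 1):                   # not hyperexcitable / hyperexcitable
            activates = k * w_ai + g_x * x >= v_th
            wanted = k >= 2 or (x == 1 and k >= 1)
            if activates != wanted:
                return False
    return True


print(association_rule_holds(w_ai=0.6, g_x=0.5, v_th=1.0))  # satisfies the rule
print(association_rule_holds(w_ai=0.4, g_x=0.5, v_th=1.0))  # two inputs stay below threshold
print(association_rule_holds(w_ai=0.6, g_x=1.2, v_th=1.0))  # hyperexcitability alone activates
```

Under that assumption the rule amounts to $v_{th}/2 \leq w_{ai} < v_{th}$ together with $v_{th} - w_{ai} \leq g_x < v_{th}$.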

We now demonstrate the basic idea behind how such a network would be used to store a set of pairwise associations. For the demo we assume completely symmetric connections between item units and association units (except that they have different magnitudes $w_{ai}$ and $w_{ia}$).

### demo

First assume we start with all units inactive and non-hyperexcitable:
<img src="files/diagrams/demo2.png" width="500" />
Next, strongly stimulate the "pink" and "hat" item units simultaneously:
<img src="files/diagrams/demo3.png" width="500" />
This makes the "pink" and "hat" units activate simultaneously. Since each of these projects randomly to the association layer, some association units will receive inputs from just the "pink" unit or just the "hat" unit (red outlines) and a handful will receive inputs from both (yellow outlines):
<img src="files/diagrams/demo4.png" width="500" />
At this point, only the association units receiving inputs from both item units will activate:
<img src="files/diagrams/demo5.png" width="500" />
We now strongly inhibit the whole network so that all the units turn off. The ones that were active, however, (the "pink" and "hat" item units and the association units that received projections from both of them) remain hyperexcitable. This static set of hyperexcitable units encodes the "pink hat" association. 
<img src="files/diagrams/demo6.png" width="500" />

We can recall the association in a content-addressable way by activating just one of the items in it. Here we stimulate the "hat" unit:
<img src="files/diagrams/demo7.png" width="500" />
This makes the "hat" unit activate, sending inputs to all of the association units it projects to (red outlines):
<img src="files/diagrams/demo8.png" width="500" />
The only association units that consequently activate, however, will be those that were hyperexcitable. Importantly, this happens to be the set that connects to both the "pink" and the "hat" units. Consequently, the "pink" and the "hat" units will receive a "large" number of inputs from the association units.
<img src="files/diagrams/demo9.png" width="500" />
This large number of association inputs to the "pink" and "hat" units will combine with the hyperexcitability of the "pink" and "hat" units to make them both cross over their thresholds and activate, whereas inputs to any other item units will be too weak to activate them. Consequently, the "pink" and the "hat" units activate.
<img src="files/diagrams/demo10.png" width="500" />
Thus, stimulating the "hat" unit has recalled the "pink hat" association.

#### superimposing a second pairwise association without interfering with the first
We would now like to add "blue sock" to the network's memory, so that we can later recall both "blue sock" and "pink hat", without accidentally recalling "blue hat" or "pink sock".

We first strongly inhibit the network to inactivate all the units, so that only their "hyperexcitability" trace remains:
<img src="files/diagrams/demo11.png" width="500" />
Next, we simultaneously stimulate the "blue" unit and the "sock" unit:
<img src="files/diagrams/demo12.png" width="500" />
This makes the "blue" and "sock" units activate. As with the "pink hat" example, some association units will receive an input from just the "blue" unit (red outline), some will receive an input from just the "sock" unit (red outline), and some will receive an input from both the "blue" and "sock" units (yellow outline):
<img src="files/diagrams/demo13.png" width="500" />
The association units that activate will be those that receive inputs from both the "blue" and "sock" units or those that receive input from just one, but which were already hyperexcitable from the "pink hat" association. Luckily, the latter set is rather small compared to the former:
<img src="files/diagrams/demo14.png" width="500" />
We now apply a blanket inhibition again to silence all the units. The two associations ("pink hat", "blue sock") are now encoded in the set of item and association units that are hyperexcitable.
<img src="files/diagrams/demo15.png" width="500" />

We now try to read out the "blue sock" association by first stimulating the "blue" item unit:
<img src="files/diagrams/demo16.png" width="500" />
This makes the "blue" item unit activate, and all of the association units it connects to receive an input from it (red outlines):
<img src="files/diagrams/demo17.png" width="500" />
As before, the only association units that activate will be those that are both hyperexcitable and receive an input from the "blue" item unit. Importantly, this set of active association units will mostly comprise association units that connect to both the "blue" and the "sock" units. Consequently, the "blue" and "sock" units will receive a large number of inputs from the association units:
<img src="files/diagrams/demo18.png" width="500" />
Because only the "blue" and "sock" units are both hyperexcitable and receive a large number of inputs from the association layer, whereas this is not true for any other item units, only the "blue" and "sock" units will activate:
<img src="files/diagrams/demo19.png" width="500" />
Thus, activating the "blue" unit recalls the "sock" unit, indicating that the association has been correctly stored and recalled, and not confused with, say, "blue hat" or "blue pink".

Further, while I haven't shown it in these diagrams, activating the "sock" unit will recall "blue sock", and activating either the "pink" or the "hat" units will recall "pink hat". Thus, the two associations are held simultaneously in the pattern of hyperexcitable units and can be independently recalled without interfering with one another.
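
The storage and recall steps walked through above can be reproduced at the level of sets. This is only a sketch under stated assumptions: the connectivity `V` is a small hand-picked symmetric example rather than a random draw, the item threshold is abstracted into a hypothetical `min_inputs` count, and the function names are made up for illustration.

```python
def store(pair, V, hyper_assoc, hyper_items):
    """Present a pair of items; return the updated hyperexcitable sets."""
    i, j = pair
    two_inputs = V[i] & V[j]                          # input from both items
    one_input_and_hyper = (V[i] | V[j]) & hyper_assoc # one input, already hyperexcitable
    return hyper_assoc | two_inputs | one_input_and_hyper, hyper_items | {i, j}


def recall(cue, V, hyper_assoc, hyper_items, min_inputs):
    """Stimulate one item; return the item units that end up active."""
    active_assoc = V[cue] & hyper_assoc
    return {item for item in hyper_items
            if len(V[item] & active_assoc) >= min_inputs}


# Association units each item connects to (symmetric connectivity).
V = {
    "pink": {0, 1, 2, 3},
    "hat":  {2, 3, 4, 5},
    "blue": {3, 4, 6, 7},
    "sock": {6, 7, 8, 9},
}

hyper_assoc, hyper_items = set(), set()
hyper_assoc, hyper_items = store(("pink", "hat"), V, hyper_assoc, hyper_items)
hyper_assoc, hyper_items = store(("blue", "sock"), V, hyper_assoc, hyper_items)

for cue in V:
    print(cue, "->", sorted(recall(cue, V, hyper_assoc, hyper_items, min_inputs=2)))
```

In this toy example association unit 3 is shared between "pink hat" and "blue", so cueing "blue" sends one stray input to "pink" and "hat", which stays below `min_inputs`.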

### Intuition

An association between two item units is fundamentally stored by making hyperexcitable the set of association units that connects to both of them. A very large number of unique associations can be specified because there are a very large number of unique sets of association units. E.g., if the average size of an item-pair-defined set of association units is $N^*$, and there are $N$ association units, then there are about $\binom{N}{N^*}$ unique sets, which is a very large number if $N^*$ is sufficiently far from 0 and $N$.

Multiple associations can be stored because the average overlap between the set of association units defined by one arbitrarily chosen item pair and the set of association units defined by another arbitrarily chosen item pair is very small. Consequently, multiple item-pair-defined sets of association units can be made hyperexcitable without much interference. The readout is enabled by the symmetrical connections.
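
To put rough numbers on both points, here is a small sketch for independent connections with probability $q$. The values $N = 1000$ and $q = 0.1$ are illustrative assumptions, not parameters used elsewhere in this notebook. With independent connections, a pair-defined set has expected size $Nq^2$, and the sets defined by two disjoint pairs overlap in $Nq^4$ units on average.

```python
import math
import random

N, q = 1000, 0.1                       # illustrative values only

expected_set_size = N * q**2           # E|V_i & V_j| for two distinct items
expected_overlap = N * q**4            # E|(V_i & V_j) & (V_k & V_l)|, four distinct items
n_star = round(expected_set_size)
log10_num_sets = math.log10(math.comb(N, n_star))

rng = random.Random(0)

def neighbor_set():
    """Sample one item's neighbor set with independent connections."""
    return {a for a in range(N) if rng.random() < q}

sizes, overlaps = [], []
for _ in range(200):
    first = neighbor_set() & neighbor_set()    # set defined by one item pair
    second = neighbor_set() & neighbor_set()   # set defined by a disjoint pair
    sizes.append(len(first))
    overlaps.append(len(first & second))

print("expected set size:", expected_set_size, "simulated:", sum(sizes) / len(sizes))
print("expected overlap:", expected_overlap, "simulated:", sum(overlaps) / len(overlaps))
print("log10 of number of size-N* sets:", log10_num_sets)
```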

## Capacity analysis

There are two ways I've been thinking about the "capacity" of such a network. The first is in terms of how many pairwise associations can be simultaneously stored, i.e., what the maximum $L$ is for a given number of item and association units that still allows for a high probability of accurate recall. This is going to be very limited, since with every additional pairwise association that gets stored, a new set of association units gets made hyperexcitable. Thus, when $L$ gets large enough, you basically end up making all the association units hyperexcitable, so the association layer stops containing any information. This aligns with the intuition that in human short-term memory we can't hold that many associations in mind at once.

The other way you can think about capacity is to ask: given $L$ and $N$ (the number of association units), what is the maximum size of the item layer (i.e., how large of an alphabet can the items/symbols be taken from) such that $L$ pairwise associations can be stored and recalled with a fixed maximum error rate. The idea is that our intuitions about human short-term memory suggest that the set of items/symbols we can draw from when storing our small number of simultaneous associations can be very large. E.g., you can store and recall arbitrary pairs of words in your native language, even though your language might contain tens of thousands of words. This is the analysis undertaken in the rest of this notebook.

### Important note:

I did this capacity analysis for a slightly different version of the network dynamics, which I was using previously before switching to the dynamics described in the sequence of diagrams above. The main differences are:

1. In the version that follows the item units do not have the activation-triggered hyperexcitability property.
2. Instead, during the recall phase, we assume the two item units that get activated are those that receive the most inputs from the association layer. For example, if the "pink" and "hat" item units receive more inputs than all the rest of the item units, then they will activate, regardless of their total inputs relative to their thresholds. That is, during the recall phase, we basically assume a 2-WTA dynamics among the item nodes.

Updating the analysis to use the newer version of the network dynamics depicted in the diagrams will require some small changes to the mathematics, but I don't think the main arguments or conclusions will be that different.

In any case, the main thing to keep in mind for the following analysis is the recall criterion: we say that an association between item $i$ and item $j$ is recalled correctly if, when the item units are sorted by the amount of input they receive from the association layer during the recall time step, $i$ and $j$ occupy the top two slots.
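
As a concrete reading of this criterion, the sketch below takes the number of association inputs each item unit receives and checks whether the stored pair beats every other item. Treating ties as failures is an assumption, chosen to match the strict inequalities used later in the derivation; the function name is hypothetical.

```python
def recalled_correctly(inputs, i, j):
    """True if items i and j both receive strictly more input than every other item.

    inputs maps each item to the number of association inputs it receives
    during the recall time step.
    """
    others = [n for item, n in inputs.items() if item not in (i, j)]
    return min(inputs[i], inputs[j]) > max(others, default=-1)


print(recalled_correctly({"pink": 5, "hat": 4, "blue": 1, "sock": 0}, "pink", "hat"))
print(recalled_correctly({"pink": 5, "hat": 2, "blue": 2, "sock": 0}, "pink", "hat"))
```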

## Symmetric connectivity

Our goal is to calculate a measure of recall capacity for the network as a function of the network's parameters. First, some definitions:

|variable|definition|
|-|-|
|$M$|number of item units|
|$N$|number of association units|
|$q$|connection probability|
|$L$|number of non-overlapping conjunctions to remember|

Further, we label item units as $1, 2, 3, ..., M$. 

First, for random connections between the item and association units, we would like to determine how many sets of $L$ conjunctions will be recalled incorrectly. We thus define the normalized error rate $z$ for a connection matrix $W$ as

$$z(L; W) = \cfrac{1}{N_L^*} \sum\limits_{s_L} \mathbb{1}[s_L \textrm{ recalled incorrectly}; W]$$

where $s_L$ is a set of $L$ non-overlapping pairwise feature conjunctions, e.g., $\{(1, 5), (7, 9), (10, 3)\}$ and $N_L^*$ is the number of possible size-$L$ sets of said conjunctions. $z(L; W)$ is thus the probability that a randomly chosen set of $L$ non-overlapping item conjunctions is recalled incorrectly given a network structure $W$. If $z(L; W) = 0$ then all sets of conjunctions are recalled correctly and the network has maximal capacity; if $z(L; W) = 1$ then all sets of conjunctions are recalled incorrectly and the network has zero capacity. 
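
The text only names $N_L^*$. A standard counting argument (order $2L$ of the $M$ items, then divide out the order within each pair and the order of the pairs) gives $N_L^* = \frac{M!}{(M - 2L)!\, L!\, 2^L}$. The sketch below checks that closed form against brute-force enumeration for small $M$; the function names are hypothetical.

```python
import math
from itertools import combinations


def num_conjunction_sets(M, L):
    """Closed-form count of size-L sets of non-overlapping item pairs."""
    return math.factorial(M) // (
        math.factorial(M - 2 * L) * math.factorial(L) * 2**L
    )


def enumerate_conjunction_sets(M, L):
    """Brute-force list of the same sets, feasible only for small M."""
    pairs = list(combinations(range(1, M + 1), 2))
    return [s for s in combinations(pairs, L)
            if len({item for pair in s for item in pair}) == 2 * L]


for M, L in [(4, 2), (6, 2), (6, 3)]:
    print(M, L, num_conjunction_sets(M, L), len(enumerate_conjunction_sets(M, L)))
```

The count grows quickly with $M$, which is one reason to work with the expectation over $W$ rather than summing over all $s_L$ explicitly.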

This problem can be approached more easily in expectation, since

$$E_W[z(L; W)|M, N, q] = E_W\left[\cfrac{1}{N_L^*} \sum \limits_{s_L} \mathbb{1}[s_L \textrm{ recalled incorrectly}; W]| M, N, q\right]$$

$$= \cfrac{1}{N_L^*} \sum \limits_{s_L} E_W\left[\mathbb{1}[s_L \textrm{ recalled incorrectly}; W]|M, N, q\right]$$

$$= \cfrac{1}{N_L^*} \sum \limits_{s_L} p(s_L \textrm{ recalled incorrectly}|M, N, q)$$

However, the value in the sum is the same for all $s_L$, so the expression evaluates to

$$\cfrac{N_L^*}{N_L^*} p(s_L \textrm{ recalled incorrectly}|M, N, q) = p(s_L \textrm{ recalled incorrectly}|M, N, q).$$

Without loss of generality we will thus assume that $s_L = \{(1, 2), (3, 4), ..., (2L - 1, 2L)\}$.

This probability depends on the probability of different connection patterns arising in the matrix. To formalize this we first introduce some new definitions.

|variable|definition|
|-|-|
|$V_i$|item $i$'s neighbor set: the set of association units connected to item unit $i$|
|$A$|the "maintained" set: the set of association units that remain hyperexcitable after the presentation of the initial sequence of item conjunctions; this is given by $A = \bigcup\limits_{i = 1}^L \left(V_{2i - 1} \bigcap V_{2i}\right)$|
|$X_i$|item $i$'s recall set: the set of association units activated by item $i$'s activation during the recall phase; this is given by the intersection $X_i = V_i \bigcap A$, i.e., the set of maintained hyperexcitable units that connect to item $i$|
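
These definitions translate directly into set operations. In the sketch below, neighbor sets are stored in a dict keyed by item label $1, ..., 2L$; the function names and the small example connectivity are assumptions for illustration.

```python
def maintained_set(V, L):
    """A: union over stored pairs (2i-1, 2i) of the units shared by both items."""
    A = set()
    for i in range(1, L + 1):
        A |= V[2 * i - 1] & V[2 * i]
    return A


def recall_sets(V, L):
    """X_i = V_i & A for each stored item i = 1, ..., 2L."""
    A = maintained_set(V, L)
    return {i: V[i] & A for i in range(1, 2 * L + 1)}


# Example with L = 2 stored pairs, (1, 2) and (3, 4).
V = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 4, 6, 7}, 4: {6, 7, 8, 9}}
A = maintained_set(V, 2)
X = recall_sets(V, 2)
print(sorted(A))
print({i: sorted(x) for i, x in X.items()})
```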

Next, instead of calculating $$p(s_L \textrm{ recalled incorrectly}|M, N, q),$$ we will calculate $$p(s_L \textrm{ recalled correctly}|M, N, q) = 1 - p(s_L \textrm{ recalled incorrectly}|M, N, q).$$ 

Our strategy is to marginalize over $V_1, ..., V_{2L}$ by writing

$$p(s_L \textrm{ recalled correctly}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L}|N, q)p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q).$$

Note that $p(V_1, ..., V_{2L}|N, q)$ doesn't depend on $M$ since all connections are independent and random, and we're only concerned with the connections between item units $1, ..., 2L$ and the association layer of $N$ units.

Since we can easily sample $V_1, ..., V_{2L}$, we can use a Monte Carlo approach to accurately estimate $p(s_L \textrm{ recalled correctly}|M, N, q)$, so long as we can calculate $p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)$ analytically.

The guiding intuition for calculating the latter, given $V_1, ..., V_{2L}$ and consequently $A, X_1, X_2, ..., X_{2L}$, is that when item $1$ activates $X_1$ during the recall phase, item $2$ must receive more inputs from the reactivated association units $X_1$ than any other item (except item $1$) in order for it to be recalled correctly. That is, the intersection $X_1 \cap V_2$ of item 1's recall set with item 2's neighbor set must be larger than the intersection of item 1's recall set $X_1$ with any other item's neighbor set (except item 1's). If another item $j \neq 1, 2$ receives more input from $X_1$ than item $2$ receives, then it *interferes* with recall, since it will be recalled instead of the correct item $2$. Thus, for a given connection matrix, $s_L$ will be recalled correctly if none of the conjunctions $\{(1, 2), (3, 4), ..., (2L-1, 2L)\}$ suffer from interference. 
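
Restricted to the $2L$ stored items, the interference check described above is a deterministic function of the neighbor sets. A minimal sketch, assuming neighbor sets are given as a dict keyed by item label and that ties count as interference (matching the strict inequalities); the function name is hypothetical.

```python
def no_interference_among_stored(V, L):
    """Return 1 if no stored item interferes with any stored pair, else 0."""
    A = set().union(*(V[2 * i - 1] & V[2 * i] for i in range(1, L + 1)))
    for k in range(1, 2 * L + 1):
        partner = k + 1 if k % 2 == 1 else k - 1   # pairs are (1, 2), (3, 4), ...
        X_k = V[k] & A                             # recall set of item k
        r_partner = len(X_k & V[partner])          # inputs reaching the correct item
        for j in range(1, 2 * L + 1):
            if j in (k, partner):
                continue
            if len(X_k & V[j]) >= r_partner:       # item j ties or beats the partner
                return 0
    return 1


V_ok = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 4, 6, 7}, 4: {6, 7, 8, 9}}
V_bad = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {2, 3, 6, 7}, 4: {6, 7, 8, 9}}
print(no_interference_among_stored(V_ok, 2))
print(no_interference_among_stored(V_bad, 2))
```

In `V_bad`, item 3 connects to both units of $X_1$, so it receives as much input as item 2 when item 1 is cued.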

Writing things a bit more explicitly, we have

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j \notin \{1, 2\}, ~~ |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j \notin \{1, 2\}, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j \notin \{3, 4\}, ~~ |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j \notin \{3, 4\}, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j \notin \{2L-1, 2L\}, ~~ |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j \notin \{2L-1, 2L\}, \\
|V_1, ..., V_{2L}, M, N, q).$$

Rearranging terms, we arrive at

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j \in \{3, ..., 2L\}, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j \in \{3, ..., 2L\}, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j \in \{1, 2, 5, 6, ..., 2L\}, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j \in \{1, 2, 5, 6, ..., 2L\}, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j \in \{1, 2, ..., 2L-3, 2L-2\}, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j \in \{1, 2, ..., 2L-3, 2L-2\}, \\
\textrm{ * * * * }\\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j > 2L, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j > 2L, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j > 2L, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j > 2L, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j > 2L, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j > 2L, \\
|V_1, ..., V_{2L}, M, N, q).$$

Note now that the portion above the $\textrm {* * * *}$ is just

$$p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q),$$

(which is actually deterministic given $V_1, ..., V_{2L}$, as we discuss in a moment).

So $p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q)$ becomes equal to 

$$p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q)\times p( \\
|X_1 \cap V_2| > |X_1 \cap V_j| ~~ \forall j > 2L, |X_2 \cap V_1| > |X_2 \cap V_j| ~~ \forall j > 2L, \\
|X_3 \cap V_4| > |X_3 \cap V_j| ~~ \forall j > 2L, |X_4 \cap V_3| > |X_4 \cap V_j| ~~ \forall j > 2L, \\
\vdots \\
|X_{2L-1} \cap V_{2L}| > |X_{2L-1} \cap V_j| ~~ \forall j > 2L, |X_{2L} \cap V_{2L-1}| > |X_{2L} \cap V_j| ~~ \forall j > 2L, \\
|V_1, ..., V_{2L}, M, N, q, \textrm{no interference among first } 2L \textrm{ items}).$$

Motivated by the fact that connections are sampled i.i.d., we can rearrange the second term to

$$p(\\
|X_1 \cap V_2| > |X_1 \cap V_{2L+1}|, |X_2 \cap V_1| > |X_2 \cap V_{2L+1}|, |X_3 \cap V_4| > |X_3 \cap V_{2L+1}|, ...,\\
|X_1 \cap V_2| > |X_1 \cap V_{2L+2}|, |X_2 \cap V_1| > |X_2 \cap V_{2L+2}|, |X_3 \cap V_4| > |X_3 \cap V_{2L+2}|, ...,\\
\vdots\\
|X_1 \cap V_2| > |X_1 \cap V_M|, |X_2 \cap V_1| > |X_2 \cap V_M|, |X_3 \cap V_4| > |X_3 \cap V_M|, ...,\\
|V_1, ..., V_{2L}, M, N, q),$$

where we have noted also that the above quantity does not depend on whether the first $2L$ items interfere with each other or not.

But because connections are i.i.d., each item's neighbor set is independent of all the other items' neighbor sets. And since each line of the above expression depends only on one item's neighbor set $V_j$ (given $V_1, ..., V_{2L}$), the events in different lines are independent, so the whole expression equals

$$\prod\limits_{j = 2L + 1}^M p(|X_1 \cap V_2| > |X_1 \cap V_j|, |X_2 \cap V_1| > |X_2 \cap V_j|, |X_3 \cap V_4| > |X_3 \cap V_j|, ...|V_1, ..., V_{2L}, N, q).$$

Further, since all the connections are identically distributed, we also have that the probability inside the product is independent of $j$. Therefore the expression reduces to

$$p(|X_1 \cap V_2| > |X_1 \cap V_{j > 2L}|, |X_2 \cap V_1| > |X_2 \cap V_{j > 2L}|, |X_3 \cap V_4| > |X_3 \cap V_{j > 2L}|, ...|V_1, ..., V_{2L}, N, q)^{M - 2L}.$$

Combining what we know so far, we arrive at

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) = \\
p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q) \times \\
p(|X_1 \cap V_2| > |X_1 \cap V_{j > 2L}|, |X_2 \cap V_1| > |X_2 \cap V_{j > 2L}|, |X_3 \cap V_4| > |X_3 \cap V_{j > 2L}|, ...|V_1, ..., V_{2L}, N, q)^{M - 2L}.$$

Next, we note that $p(\textrm{no interference among first } 2L \textrm{ items}|V_1, ..., V_{2L}, M, N, q)$ is the completely deterministic function $f(V_1, ..., V_{2L})$. To handle the second term on the right side of the equation, we introduce a new definition:

|variable|definition|
|-|-|
|$r_{kl}$|$\lvert X_k \cap V_l \rvert$, i.e., the size of the intersection of $X_k$ and $V_l$|

Further, we reduce the notation a bit by assuming that $j$ refers to an item not included in $1, 2, ..., 2L$. Then the second term (sans the exponent) becomes:

$$p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|r_{12}, r_{21}, r_{34}, ..., X_1, X_2, ..., N, q) = \\
p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q) \times \\
p(|X_2 \cap V_j| < r_{21} | |X_1 \cap V_j| < r_{12}, r_{21}, X_2, ..., N, q) \times \\
\vdots \\
\times p(|X_{2L} \cap V_j| < r_{2L, 2L-1} | |X_{2L-1} \cap V_j| < r_{2L-1, 2L}, ..., |X_1 \cap V_j| < r_{12}, r_{2L, 2L-1}, X_{2L}, N, q),$$

where we've broken up the joint distribution into a chain of conditionals.

The final useful insight is that each conditional term in the product is at least as large as the corresponding marginal, e.g., 

$$p(|X_2 \cap V_j| < r_{21} | |X_1 \cap V_j| < r_{12}, r_{21}, X_2, ..., N, q) \geq p(|X_2 \cap V_j| < r_{21} | r_{21}, X_2, ..., N, q).$$

This is because knowing that the intersection of $V_j$ with one set of association units is bounded from above can never increase the probability that its intersection with another (potentially overlapping) set of association units reaches a previously determined size ($r_{kl}$), since the connections are all drawn independently (**NOTE: I've reduced this statement to an equivalent statement that should be much easier to prove, but I have actually gotten kind of stuck proving it, even though it seems like it almost has to be true**). Thus, we have

$$p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|r_{12}, r_{21}, r_{34}, ..., X_1, X_2, ..., N, q) \geq \\
p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)$$
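
The inequality above can at least be checked exactly on small cases by enumerating every possible $V_j$. The sketch below does this for two overlapping recall sets; the set sizes, $q$, and the function name are illustrative assumptions. As a pointer for a proof: each event $|X_k \cap V_j| < r$ can only become false when connections are added to $V_j$, so both are decreasing events of independent Bernoulli variables, which is the setting of the Harris (FKG) inequality; that looks like the natural tool here, though the argument is not worked out in this notebook.

```python
from itertools import product


def event_probs(n, q, X1, r1, X2, r2):
    """Exact P(A1), P(A2), P(A1 and A2) with A_k = {|X_k & V_j| < r_k}.

    V_j is a random subset of range(n) with independent inclusion probability q.
    """
    p1 = p2 = p12 = 0.0
    for bits in product([0, 1], repeat=n):        # every possible V_j
        p = 1.0
        for b in bits:
            p *= q if b else 1 - q
        a1 = sum(bits[u] for u in X1) < r1
        a2 = sum(bits[u] for u in X2) < r2
        p1 += p * a1
        p2 += p * a2
        p12 += p * (a1 and a2)
    return p1, p2, p12


X1, X2 = {0, 1, 2, 3}, {2, 3, 4, 5}               # overlapping recall sets
for r1 in range(1, 5):
    for r2 in range(1, 5):
        p1, p2, p12 = event_probs(6, 0.3, X1, r1, X2, r2)
        print(r1, r2, round(p12, 6), ">=", round(p1 * p2, 6))
```

Every printed row should have the joint probability on the left at least as large as the product on the right, consistent with the inequality being true; a passing check on small cases is of course not a proof.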

and therefore

$$p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) \geq \\
f(V_1, ..., V_{2L}) \times \\
\left[p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)\right]^{M-2L},$$

so

$$p(s_L \textrm{ recalled correctly}|M, N, q) = \sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L}|N, q)p(s_L \textrm{ recalled correctly}|V_1, ..., V_{2L}, M, N, q) \geq \\
\sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L}|N, q)f(V_1, ..., V_{2L}) \times \\
\left[p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q)p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q) ... p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q)\right]^{M-2L},$$

giving us a lower bound on the capacity of the network. To clean things up we can write the last term as 

$$h(V_1, ..., V_{2L}, N, q)^{M-2L} = \left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}$$

where 

$$c_1 = p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q), \\ c_2 = p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q), \\ \vdots \\ c_{2L} = p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q).$$

Then:

$$p(s_L \textrm{ recalled correctly}|M, N, q) \geq
\sum\limits_{V_1, ..., V_{2L}} p(V_1, ..., V_{2L}|N, q)f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$

The terms inside the sum are now easy to compute, since $f(V_1, ..., V_{2L})$ can be determined via two small nested `for` loops, and each $c_i$ is just the CDF of the binomial distribution with $n=|X_i|, p=q$ evaluated at $r_{kl} - 1$ (with $k = i$ and $l$ the stored partner of item $i$). Our strategy for evaluating the whole sum will be to simply sample $V_1, ..., V_{2L}$ a large number of times, compute the term inside the sum for each of them, and then take the average. As the number of samples of $V_1, ..., V_{2L}$ increases, our approximation will approach the true lower bound in an unbiased way. That is, we can now let the sum run over the $N_{MC}$ random samples of $\{V_1, ..., V_{2L}\}$ such that we can approximate

$$p(s_L \textrm{ recalled correctly}|M, N, q) \geq
\cfrac{1}{N_{MC}}\sum\limits_{V_1, ..., V_{2L}} f(V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$
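
A minimal sketch of this Monte Carlo estimate for the symmetric case is given below. It is not the code used for the results in this notebook: it works with plain floating-point probabilities, so it is only meant for small parameter values where no log-probability tricks are needed, and all function names and the example parameters are assumptions.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(Binomial(n, p) <= k); zero when k < 0."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(min(k, n) + 1))


def sample_term(M, N, q, L, rng):
    """One sample of f(V) * h(V)^(M - 2L)."""
    V = {i: {a for a in range(N) if rng.random() < q}
         for i in range(1, 2 * L + 1)}
    A = set().union(*(V[2 * i - 1] & V[2 * i] for i in range(1, L + 1)))
    h = 1.0
    for k in range(1, 2 * L + 1):
        partner = k + 1 if k % 2 == 1 else k - 1
        X_k = V[k] & A
        r = len(X_k & V[partner])
        for j in range(1, 2 * L + 1):
            if j not in (k, partner) and len(X_k & V[j]) >= r:
                return 0.0                        # f = 0: interference among stored items
        h *= binom_cdf(r - 1, len(X_k), q)        # c_k = P(|X_k & V_j| < r)
    return h ** (M - 2 * L)


def recall_lower_bound(M, N, q, L, n_mc, seed=0):
    """Average of the sampled terms over n_mc draws of V_1, ..., V_2L."""
    rng = random.Random(seed)
    return sum(sample_term(M, N, q, L, rng) for _ in range(n_mc)) / n_mc


print(recall_lower_bound(M=50, N=200, q=0.2, L=2, n_mc=100))
```

Because every $c_i \leq 1$, the estimate for a fixed set of samples can only decrease as $M$ grows.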

## Asymmetric connectivity

Given the structure of solving for the capacity/recall error in the symmetric network, generalizing to asymmetric connectivity involves only minor changes. We would like to generate connections that are as random as possible but under the constraint that the reciprocity (the probability of getting a bidirectional connection given a unidirectional connection) is controlled by a parameter $R$. Given the following definitions

|variable|definition|
|-|-|
|$q$|marginal probability of a connection from an item to an association unit or vice versa|
|$R$|factor by which probability of laying a connection from item to association unit is scaled given that there exists a connection from the association to the item unit (and vice versa)|

we can define the most natural random process that creates the connection matrix:

1. Without loss of generality, sample the connections from the association to the item units first, i.i.d., with probability $q$.
2. Loop through possible item to association connections, adding them in the following way:
    * if connection exists from association to item unit: add a connection from item to association unit with probability $Rq$
    * else: add connection from item to association unit with probability $Dq$

Given $q$ and $R$ we can solve for $D$ by enforcing that the marginal item->association connection probability should be the same as the marginal association->item connection probability:

$$q = p(\textrm{cxn from assoc})Rq + p(\textrm{no cxn from assoc})Dq = q^2R + (1-q)qD.$$

This gives us 

$$D = \cfrac{1 - qR}{1 - q}.$$

Remark: both $R$ and $D$ are bounded above by $1/q$, since the sampling probabilities $Rq$ and $Dq$ cannot be greater than 1. Applying $D \leq 1/q$ to the expression for $D$ shows that $R$ is also bounded from below: $R \geq 2/q - 1/q^2$.

When $R = 1$, this relaxes to ER connections, and when $R = 1/q$ this creates a perfectly symmetric network. Anyhow, this provides a very natural way to create random asymmetric connections between the item and association units with a parameterized reciprocity.
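
The two-step sampling process can be sketched for a single item/association pair as follows. The function names are hypothetical, and the example values of $q$ and $R$ are chosen only for illustration.

```python
import random


def downstream_scale(q, R):
    """D = (1 - qR) / (1 - q), which keeps both marginal probabilities equal to q."""
    return (1 - q * R) / (1 - q)


def sample_connection_pair(q, R, rng):
    """Sample (association -> item, item -> association) for one pair of units."""
    D = downstream_scale(q, R)
    assoc_to_item = rng.random() < q
    p_forward = R * q if assoc_to_item else D * q
    item_to_assoc = rng.random() < p_forward
    return assoc_to_item, item_to_assoc


rng = random.Random(0)
q, R = 0.2, 2.0
pairs = [sample_connection_pair(q, R, rng) for _ in range(5000)]
backward = sum(a for a, _ in pairs) / len(pairs)
forward = sum(b for _, b in pairs) / len(pairs)
both = sum(a and b for a, b in pairs) / len(pairs)
print("P(assoc->item) ~", backward)
print("P(item->assoc) ~", forward)
print("P(both) ~", both, "target R*q^2 =", R * q * q)
```

Both marginal frequencies should come out close to $q$, while the frequency of bidirectional connections should be close to $Rq^2$.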

The first chunk of the derivation is the same as the symmetric case. The first thing we need to modify is to change the computation of $p(s_L \textrm{ recalled correctly}|M, N, q)$ to $p(s_L \textrm{ recalled correctly}|M, N, q, R)$. To do so we need to modify our definitions to deal with the asymmetry:

|variable|definition|
|-|-|
|$V_i$|item $i$'s *upstream* neighbor set: the set of association units that project to item $i$|
|$U_i$|item $i$'s *downstream* neighbor set: the set of association units to which item $i$ projects|
|$A$|maintained set: the set of association units that remain hyperexcitable after the presentation of the initial sequence of item conjunctions; this is given by $A = \bigcup\limits_{i = 1}^L \left(U_{2i - 1} \bigcap U_{2i}\right)$|
|$X_i$|item $i$'s recall set: the set of association units activated by item $i$'s activation during the recall phase; this is given by the intersection $X_i = U_i \bigcap A$|
|$r_{kl}$|$\lvert X_k \cap V_l \rvert$, i.e., the size of the intersection of $X_k$ and $V_l$|

Similar to before, the quantity of interest is given by:

$$p(s_L \textrm{ recalled correctly}|M, N, q, R) = \\\sum\limits_{\substack{U_1, ..., U_{2L}, \\ V_1, ..., V_{2L}}}p(U_1, ..., U_{2L}, V_1, ..., V_{2L})p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R),$$

and as before, we will take a Monte Carlo approach to approximating this quantity, such that our main goal becomes expressing the final term in the sum analytically.

This is given by 

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
p(\textrm{no interference among first } 2L \textrm{ items} | U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) \times \\p(\textrm{no interference from items } 2L+1 \textrm{ to } M|\textrm{ no interference among first } 2L \textrm{ items}, U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R).$$

And as before, the first term is a deterministic function $f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L})$, and the second term is independent of the first, such that

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L}) \times \\
p(\textrm{no interference from items } 2L+1 \textrm{ to } M|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R).$$

And similar to what we had before, the event that item $j > 2L$ interferes is independent of the event that any other item $k > 2L$, $k \neq j$, interferes, given $U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q$, and $R$. Further, the probability is the same for all $j > 2L$. So:

$$p(s_L \textrm{ recalled correctly}|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L}) \times \\
p(\textrm{no interference from item } j|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R)^{M - 2L}.$$

Further, the last term's dependence on $U_1, ..., U_{2L}, V_1, ..., V_{2L}$ can be condensed into a dependence on the recall sets $X_1, ..., X_{2L}$ and intersection sizes $r_{kl}$ of each recall set and the relevant upstream neighbor sets, with the latter determining the number of inputs received by the correct item units during recall. Thus:

$$p(\textrm{no interference from item } j|U_1, ..., U_{2L}, V_1, ..., V_{2L}, M, N, q, R) = \\
p(\textrm{no interference from item } j|X_1, ..., X_{2L}, r_{12}, r_{21}, ..., r_{2L, 2L-1}, M, N, q, R).$$

And since the interference from item $j$ depends only on its upstream neighbors $V_j$, we recover a form for this expression identical to that for the symmetrical case:

$$p(\textrm{no interference from item } j|X_1, ..., X_{2L}, r_{12}, r_{21}, ..., r_{2L, 2L-1}, M, N, q, R) = \\
p(|X_1 \cap V_j| < r_{12}, |X_2 \cap V_j| < r_{21}, |X_3 \cap V_j| < r_{34}, ...|X_1, ..., X_{2L}, r_{12}, r_{21}, r_{34}, ..., N, q).$$

Note that we lose the explicit dependence on $R$ because $V_j$ depends only on $q$. Its effect here has instead been absorbed through $r_{12}, r_{21}, ...$, since these quantities will decrease as $R$ decreases towards 1 (it also affects $f_*$ in the same way).

Thus, we can use the same tools we used for the rest of the derivation for the symmetrical case, specifically that the joint probability is at least as large as the product of the marginals, so that

$$p(s_L \textrm{ recalled correctly}|M, N, q, R) \geq \\
\sum\limits_{\substack{U_1, ..., U_{2L}, \\ V_1, ..., V_{2L}}} p(U_1, ..., U_{2L}, V_1, ..., V_{2L}|N, q, R)f_*(U_1, ..., U_{2L}, V_1, ..., V_{2L})h(V_1, ..., V_{2L}, N, q)^{M-2L}$$

where 

$$h(V_1, ..., V_{2L}, N, q)^{M-2L} = \left(\prod\limits_{i=1}^{2L}c_i\right)^{M-2L}$$

and 

$$c_1 = p(|X_1 \cap V_j| < r_{12}|r_{12}, X_1, N, q), \\ c_2 = p(|X_2 \cap V_j| < r_{21}|r_{21}, X_2, N, q), \\ \vdots \\ c_{2L} = p(|X_{2L} \cap V_j| < r_{2L, 2L-1}|r_{2L, 2L-1}, X_{2L}, N, q).$$

## Results of Monte Carlo calculation for symmetric connectivity

Coding this up was actually rather challenging since everything has to be converted to log-probabilities, and sometimes the log-probabilities had to be estimated by Taylor expansions to avoid numerical errors (pseudocode is [here](capacity_analysis_pseudocode.ipynb) for those interested).
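
To illustrate the kind of numerical issue involved, the sketch below evaluates $\log h^{M-2L}$ when each $c_i$ is extremely close to 1. It is an assumption about one workable approach, not a reproduction of the linked pseudocode: the upper tail $1 - c_i$ is summed directly and passed through `math.log1p`, which stays accurate for tiny arguments where $\log(1 - \epsilon)$ computed naively rounds to zero. All function names and example values are hypothetical.

```python
import math


def binom_upper_tail(r, n, p):
    """P(Binomial(n, p) >= r), summed directly so tiny tails keep their precision."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(max(r, 0), n + 1))


def log_h_power(recall_sizes, r_values, q, M, L):
    """log of h^(M - 2L), where h = prod_i P(Binomial(|X_i|, q) < r_i)."""
    log_h = 0.0
    for n, r in zip(recall_sizes, r_values):
        tail = binom_upper_tail(r, n, q)       # 1 - c_i
        if tail >= 1.0:
            return -math.inf                   # c_i = 0, so the product vanishes
        log_h += math.log1p(-tail)             # log c_i without cancellation
    return (M - 2 * L) * log_h


# A single recall set of 40 units, all of which project to the correct item.
stable = log_h_power([40], [40], q=0.1, M=10**6 + 2, L=1)
naive = (10**6) * math.log(1.0 - 0.1**40)      # 1 - 1e-40 rounds to exactly 1.0
print(stable, naive)
```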

Here I used the above results for estimating $p(s_L~\textrm{recalled correctly}|M, N, q, R)$ to find the max $M$ given a fixed $N$, $q$, $L$, and $R$. For each MC calculation I used 1000 samples of $V_1, ..., V_{2L}$ in the estimation. I ran the whole procedure, solving for the max $M$ that kept the recall error rate below a fixed maximum, 20 times for each $N$ and $L$, hence the small scattering of dots of the same color.
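
One way such a search for the max $M$ could be organized is a bisection over $M$, sketched below. This is an assumption about the procedure rather than a description of the code actually used. It relies on the error bound being non-decreasing in $M$, which holds for a fixed set of Monte Carlo samples because every factor raised to the power $M - 2L$ is at most 1. The callable `error_bound` and the toy bound in the example are hypothetical stand-ins for the Monte Carlo estimate.

```python
def max_alphabet_size(error_bound, max_error, M_min, M_max):
    """Largest M in [M_min, M_max] with error_bound(M) <= max_error.

    Assumes error_bound is non-decreasing in M. Returns None if even
    M_min exceeds the allowed error.
    """
    if error_bound(M_min) > max_error:
        return None
    lo, hi = M_min, M_max
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if error_bound(mid) <= max_error:
            lo = mid          # mid is still acceptable, search above it
        else:
            hi = mid - 1      # mid is too large
    return lo


# Toy stand-in: a single fixed h close to 1 and L = 2.
def toy_error(M, h=0.999999, L=2):
    return 1.0 - h ** (M - 2 * L)


print(max_alphabet_size(toy_error, max_error=1e-4, M_min=4, M_max=10**6))
```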

I fixed the max error rate at $10^{-4}$ to make the plot below. Also note: "binding unit" = "association unit".

<img src="files/diagrams/cap_analysis.png" />

First, this shows that as $L$ increases, the size of the alphabet must decrease to maintain a fixed maximum error rate.

Second, assuming that the one piece of the analysis relating the conditional to the marginal probabilities is correct, this would seem to suggest that the size of the item alphabet can grow exponentially in the size of the association layer, given $L$. This conforms with our intuition that the association layer can exist in a combinatorially large number of hyperexcitability states.