# Analysis of the predominant phase

$P_a$ = the expected proportion of A-units activated by a stimulus of a given size<br>
$P_c$ = the conditional probability that an A-unit which responds to a given stimulus ($S_1$), will allso respond to another given stimulus ($S_2$)<br><br>
It can be shown that as the size of the retina is increased,<br>
the number of S-points($N_s$) quickly ceases to be a significant parameter,<br>
and the values of $P_a$ and $P_c$ approach the value that they would have for a retina with infinitely many points.<br><br>
For a large retina, therefore, the equations are as follows:<br><br>($P_a$ approach)<br>
$$
P_a = \sum^{x}_{e=\theta} \sum^{min(y, e-\theta)}_{i=\theta} P(e,i)
$$

$$
where
$$

$$
P(e,i) = 
\begin{equation}
    \begin{pmatrix}
    x \\
    e\\
    \end{pmatrix}
R^{e}(1-R)^{x-e} \times 
    \begin{pmatrix}
    y \\
    i\\
    \end{pmatrix}
\end{equation}
R^{i}(1-R)^{y-i}
$$
and<br>
$R$ = proportion of S-points activated by the stimulus<br>
$x$ = number of excitatory connections to each A-unit<br>
$y$ = number of inhibitory connections to each A-unit<br>
$\theta$ = threshold of A-units<br>
(The quantities $e$ and $i$ are the excitatory and inhibitory components of the excitation received by the A-unit from the stimulus.<br>
If the algebraic sum $\alpha = e+i$ is equal to or greater than $\theta$, the A-unit is assumed to respond)

In [1]:
import numpy as np
from scipy.special import comb

def calculate_Pa(R,x,y,theta):
    def P(e, i):
        term1 = comb(x, e) * (R**e) * ((1-R)**(x-e))
        term2 = comb(y, i) * (R**i) * ((1-R)**(y-i))
        return term1 * term2
    Pa = sum(sum(P(e,i) for i in range(max(theta, e - theta), min(y, e - theta)+1)) for e in range(theta, x+1))
    return Pa

($P_{c}$ approach)
$$
P_{c} = \frac{1}{P_{a}} \sum^{x}_{e= \theta } \sum^{y}_{i={e-\theta}} \sum^{e}_{l_{e}=0} \sum^{i}_{l_{i}=0} \sum^{x-e}_{g_{e}=0} \sum^{y-i}_{g_{i}=0} P(e,i,l_{e}, l_{i}, g_{e}, g_{i})
$$
$$
(e - i - l_{e} + l_{i} + g_{e} - g_i \ge \theta)
$$
$$
where
$$
$$
\begin{equation}
\begin{aligned}
    &P(e,i,l_{e}, l_{i}, g_{e}, g_{i})\\
    &=\binom{x}{e} R^{e}(1 - R)^{x - e} \\
    &\times\binom{y}{i} R^{i}(1 - R)^{y - i} \\
    &\times \binom{e}{l_{e}} L^{l_{e}}(1-L)^{e - l_{e}} \\
    &\times \binom{i}{l_{i}} L^{l_{i}}(1-L)^{i - l_{i}} \\
    &\times \binom{x - e}{g_{e}} G^{g_{e}}(1-G)^{x - e - g_{e}} \\
    &\times \binom{y - i}{g_{i}} G^{g_{i}}(1-G)^{y - i - g_{i}} \\
\end{aligned}
\end{equation}
$$
and<br>
$L$ = proportion of the S-points illuminated by the first stimulus, $S_1$, which are not illumintated by $S_2$<br>
$G$ = proportion of the residual S-set (left over from the first stimulus) which is included in the second stimulus($S_2$)
$$$$

In [9]:
from tqdm import tqdm

def calculate_Pa(R,x,y,theta):
    def P(e, i):
        term1 = comb(x, e) * (R**e) * ((1-R)**(x-e))
        term2 = comb(y, i) * (R**i) * ((1-R)**(y-i))
        return term1 * term2
    Pa = sum(sum(P(e,i) for i in range(max(theta, e - theta), min(y, e - theta)+1)) for e in range(theta, x+1))
    return Pa
    
def calculate_Pc(x, y, theta, R, L, G, Pa):
    def P(e,i,le, li, ge, gi):
        term1 = comb(x, e) * (R**e) * ((1 - R)**(x - e))
        term2 = comb(y, i) * (R**i) * ((1 - R)**(y - i))
        term3 = comb(e, le) * (L**le) * ((1 - L)**(e - le))
        term4 = comb(i, li) * (L**li) * ((1 - L)**(i - li))
        term5 = comb(x - e, ge) * (G**ge) * ((1 - G)**(x - e - ge))
        term6 = comb(y - i, gi) * (G**gi) * ((1 - G)**(y - i - gi))
        return term1 * term2 * term3 * term4 * term5 * term6
    
    Pa = calculate_Pa(R, x, y, theta)
    Pc = 0
    for e in tqdm(range(theta, x + 1)):
        for i in range(e - theta, y + 1):
            for le in range(0, e + 1):
                for li in range(0, i + 1):
                    for ge in range(0, x - e + 1):
                        for gi in range(0, y - i + 1):
                            if (e - i - le + li + ge - gi) >= theta:
                                Pc+= P(e,i,le, li, ge, gi)
                                
    return Pc/ Pa if Pa != 0 else 0

In [18]:
# Example numbers

x = 10   # number of excitatory connections
y = 10   # number of inhibitory connections
theta = 3  # threshold
R = 0.5  # proportion of S-points activated by the stimulus
L = 0.3  # proportion for L
G = 0.4  # proportion for G

Pa = 0.6  # Example value for Pa, to be computed separately as per your model

Pc = calculate_Pc(x, y, theta, R, L, G, Pa)
Pc

100%|█████████████████████████████████████████████| 8/8 [00:00<00:00, 40.10it/s]


1.8039399712510817

The minimum value of $P_c$ is equal to 
$$
P_{c_{min}} = (1 - L)^{x}(1 - G)^{y}
$$

In [11]:
def Pc_min(L, x, G, y):
    term1 = (1 - L)**x
    term2 = (1 - G)**y
    return term1 * term2

# Mathematical Analysis of Learning in the Perception

Probability that the perceptron will show a bias towards the "correct" response in preference is called $P_r$.

Probability that the perceptron will give the correct response for the class of stimuli which is represented.<Br>This probability is called $P_g$, the probability of correct generalization.<br>
$$P = P(N_{a_{r}} > 0) \cdot \phi(Z)$$
$$where$$
$$P(N_{a_{r}} > 0) = 1 - (1 - P_{a})^{N_{c}}$$
<div align = "center">
    $\phi (Z)$ = normal curve integral from $-\infty$ to $Z$
</div>
$$and$$
$$
Z=\frac{c_1n_{s_{r}} + c_{2}}{\sqrt{{c_3n_{s_{r}}}^2 + c_4n_{s_{r}}}}
$$

In [19]:
from scipy.stats import norm

def calculate_P(Nc, Pa, c1, c2, c3, c4, nsr):
    P_Nar_greater_than_0 = 1 - (1 - Pa) ** Nc
    Z = (c1 * nsr + c2) / np.sqrt((c3 * nsr) ** 2 + c4 * nsr)
    phi_Z = norm.cdf(Z)

    P = P_Nar_greater_than_0 * phi_Z
    return P

In ideal environment, consisting of randomly placed points of illumination, where there is no attempt to classify stimuli according to intrinsic similarity.<br>
Thus, in a typical learning experiment, we might show the perceptron 1,000 stimuli made up of random collections of illuminated retinal points,<br>and we might arbitrarily reinforce $R_{1}$ as the "correct" response for the first 500 of these, and $R_{2}$ for the first 500.<br><br>
This environment is "ideal" only in the sense that we speak of an ideal gas in physics; it is a convenient artifact for purposes of analysis, and does not lead to the best performance from the perceptron.<br>
In the ideal environment situation, the constant $c_{1}$ is always equal to zero, so that, in the case of $P_{g}$ (where $c_{2}$ is also zero), the value of $Z$ will be zero, and $P_{g}$ can never be any better than the random expectation of 0.5.<br>
The evaluation of $P_{r}$ for these conditions, however, throws some interesting light on the differences between the alpha, beta, and gamma systems.<br><br>
First consider the alpha system, which has the simplest dynamics of the three.<br>
In this system, whenever an A-unit is active for one unit of time, it gains one unit of value.<br>
We will assume an experiment, initially, in which $N_{a_{r}}$ (the number of stimuli associated to each response) is constant for all responses.<br>
In this case, for the sum system,<br>

\begin{cases} 
    c_1 = 0 \\
    c_2 = (1-P_a)N_e \\
    c_3 = 2P_aw \\
    c_4 \approx 0 \\
\end{cases}

where $w = $ the fraction of responses connected to each A-unit.<br>If the source-sets are disjunct, $w = 1 / N_{R},$ where $N_{R}$ is the number of responses in the system.<br>
For the $\mu$-system,<br>

\begin{cases} 
    c_1 = 0 \\
    c_2 = (1-P_a)N_e \\
    c_3 = 0 \\
    c_4 = 2w \\
\end{cases}

The reduction of $c_2$ to zero gives the $\mu$-system a definite advantage over the $\Sigma$-system.<br>
<br> if $n_{a_r}$ instead of being fixed, is treated as a random variable, so that the number of stimuli associated to each response is drawn separately from some distribution, then the performance of the $\alpha$-system is considerably poorer than the above eqquations indicate.<br>
Under these conditions, the constants for the $\mu$-system are

\begin{cases} 
    c_1 = 0 \\
    c_2 = 1-P_a \\
    c_3 = 2{P_{a}}^{2}{q}^2 [\frac{(wN_{R}-1)^2}{N_{R}-2}] \\
    c_4 = \frac{2(1-P_a)N_{R}}{(1-w_c)N_{A}} \\
\end{cases}

$where$
<div align="center">
    <p align="left">
        $q =$ ratio of $\sigma_{n_{S_r{r}}}$ to $\overleftrightarrow{n}_{S_r}$<br>
        $N_R = $ number of responses in the system<br>
        $N_A = $ number of A-units in the system<br>
        $w_c = $ proportion of A-units common to $R_1$ and $R_2$<br>
    </p>
</div>