# Weight-Space Symmetry

According to Bishop,
> Multiple distinct choices for the weight vector $\bf w$ can all give rise to the same mapping function from inputs to outputs.

This property has to be taken into account in Bayesian model comparison.

In [19]:
import numpy as np
from numpy.random import randn, seed

Consider, for example, a two-layer neural network with $\tanh$ activation functions, an input vector $x$, and a set of weights ${\bf w} = ({\bf w}^{(1)}, {\bf w}^{(2)})$.

$$
    x = \begin{bmatrix}
    x_1 \\
    x_2
    \end{bmatrix}
$$

$$
    {\bf w}^{(1)} = \begin{bmatrix}
    w_{11}^{(1)} & w_{12}^{(1)}\\
    w_{21}^{(1)} & w_{22}^{(1)}\\
    w_{31}^{(1)} & w_{32}^{(1)}
    \end{bmatrix}
$$

$$
    {\bf w}^{(2)} = \begin{bmatrix}
    w_{11}^{(2)} & w_{12}^{(2)} & w_{13}^{(2)}
    \end{bmatrix}
$$
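
The forward pass evaluated in the cells below can be written compactly. This reflects what the code does rather than anything stated in the text: there are no bias terms, and $\tanh$ is applied at the output as well as at the hidden layer.

$$
    {\bf z} = \tanh\left({\bf w}^{(1)} x\right), \qquad y = \tanh\left({\bf w}^{(2)} {\bf z}\right)
$$

Here ${\bf z}$ holds the three hidden-unit activations, so ${\bf w}^{(1)}$ is $3 \times 2$ and ${\bf w}^{(2)}$ is $1 \times 3$.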

## Tanh symmetry

In [38]:
seed(314)
W1 = randn(3, 2)
W2 = randn(1, 3)
x = np.array([[1, 2]]).T

Propagating $x$ forward through the network, we obtain the hidden-unit activations as follows:

In [39]:
a1 = W1 @ x
z1 = np.tanh(a1)

Next, we obtain the value of the output layer as follows:

In [40]:
a2 = W2 @ z1
y = np.tanh(a2)
print(y)

[[-0.73209407]]


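The weight-space symmetry works as follows: we can obtain the same output $y$ by _changing_ the signs of a particular group of weights.

Consider our two-layer neural network. Because $\tanh$ is an odd function, $\tanh(-a) = -\tanh(a)$, negating the weights into the hidden layer negates every hidden activation, and negating the weights out of it cancels that change. Letting ${\bf \hat w}^{(1)} = -{\bf w}^{(1)}$ and ${\bf \hat w}^{(2)} = -{\bf w}^{(2)}$, we therefore obtain the same output value from a different set of weights.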

In [41]:
W1p = -W1
W2p = -W2
x = np.array([[1, 2]]).T

In [42]:
a1 = W1p @ x
z1 = np.tanh(a1)

In [43]:
a2 = W2p @ z1
y = np.tanh(a2)
print(y)

[[-0.73209407]]
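
The cells above flip the signs of all three hidden units at once, but the same argument applies to each hidden unit separately: negating one row of ${\bf w}^{(1)}$ together with the matching column of ${\bf w}^{(2)}$ leaves the output unchanged. For $M$ hidden units this gives $2^M$ equivalent weight vectors. The following is a minimal sketch that checks all $2^3 = 8$ sign patterns. The `forward` helper, the `rng` seed, and the variable names are assumptions made for illustration, and the weights are not the ones drawn with `seed(314)` above.

```python
import itertools
import numpy as np

def forward(W1, W2, x):
    """Two-layer tanh network without biases, as in the cells above."""
    return np.tanh(W2 @ np.tanh(W1 @ x))

rng = np.random.default_rng(0)  # arbitrary seed, not the notebook's seed(314)
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
x = np.array([[1.0, 2.0]]).T
y_ref = forward(W1, W2, x)

# One sign (+1 or -1) per hidden unit: 2**3 = 8 patterns in total.
sign_patterns = list(itertools.product([1.0, -1.0], repeat=3))

all_equal = True
for signs in sign_patterns:
    s = np.array(signs)
    W1_flipped = s[:, None] * W1  # negate incoming weights (rows) of the chosen units
    W2_flipped = W2 * s[None, :]  # negate the matching outgoing weights (columns)
    same = np.allclose(forward(W1_flipped, W2_flipped, x), y_ref)
    all_equal = all_equal and bool(same)

print(len(sign_patterns), all_equal)
```

The pattern with every sign equal to $-1$ corresponds to the `W1p`, `W2p` case shown above.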


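The second symmetry comes from permuting the hidden units. Quoting Bishop again, it arises from

> Interchanging the values of all the weights [...] leading both into and out of a particular hidden unit with the corresponding weights [...] associated with a different hidden unit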

In [113]:
a1 = W1[::-1] @ x
z1 = np.tanh(a1)

In [114]:
a2 = W2[:, ::-1] @ z1
y = np.tanh(a2)
print(y)

[[-0.73209407]]
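
Reversing the hidden units, as above, is only one of the $M!$ possible orderings of $M$ hidden units. Combined with the $2^M$ sign flips, this gives $M!\,2^M$ weight-space symmetries in total, which is $3! \cdot 2^3 = 48$ for this network. The sketch below enumerates them and counts how many reproduce the reference output. The `forward` helper, the seed, and the variable names are illustrative assumptions and not part of the original notebook.

```python
import itertools
import math
import numpy as np

def forward(W1, W2, x):
    """Two-layer tanh network without biases, as in the cells above."""
    return np.tanh(W2 @ np.tanh(W1 @ x))

rng = np.random.default_rng(1)  # arbitrary seed, not the notebook's seed(314)
M = 3                           # number of hidden units
W1 = rng.standard_normal((M, 2))
W2 = rng.standard_normal((1, M))
x = np.array([[1.0, 2.0]]).T
y_ref = forward(W1, W2, x)

n_equivalent = 0
for perm in itertools.permutations(range(M)):
    idx = list(perm)
    for signs in itertools.product([1.0, -1.0], repeat=M):
        s = np.array(signs)
        # Flip the chosen units, then reorder rows of W1 and columns of W2 alike.
        W1_new = (s[:, None] * W1)[idx]
        W2_new = (W2 * s[None, :])[:, idx]
        if np.allclose(forward(W1_new, W2_new, x), y_ref):
            n_equivalent += 1

print(n_equivalent, math.factorial(M) * 2**M)
```

Every transformation has to be applied to both layers consistently: the same row order and signs used for ${\bf w}^{(1)}$ must be used for the columns of ${\bf w}^{(2)}$, otherwise the output changes.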
