# Demystifying Neural Networks 

---

# Pulsars

In [2]:
import numpy as np
import pandas as pd
columns = [
    'ip_mean',
    'ip_std',
    'ip_kurtosis',
    'ip_skewness',
    'dmsnr_mean',
    'dmsnr_std',
    'dmsnr_kurtosis',
    'dmsnr_skewness',
    'label',
]
df = pd.read_csv('./pulsars_raw.csv', names=columns)
print(sum(df.label == 1), sum(df.label == 0), len(df), sum(df.label == 1)/len(df))

1639 16259 17898 0.09157447759526204


DM-SNR: Dispersion Measure, Signal to Noise Ratio.
Dispersion is a delay in the lower frequencies, caused by interaction with free electrons.
In other words, higher frequencies arrive earlier than lower frequencies;
the dispersion is the spread in arrival time across frequencies.
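
To put a number on that delay, here is a minimal sketch using the standard cold-plasma dispersion relation. The function name, the example dispersion measure and the two observing frequencies are assumptions chosen for illustration; none of them come from the dataset.

```python
# Dispersion constant in ms * GHz^2 per (pc cm^-3); standard value, not from the dataset.
K_DM_MS = 4.148808


def dispersion_delay_ms(dm, f_low_ghz, f_high_ghz):
    """Delay (ms) of the lower frequency relative to the higher one.

    dm is the dispersion measure in pc cm^-3 (hypothetical input).
    """
    return K_DM_MS * dm * (f_low_ghz ** -2 - f_high_ghz ** -2)


# Illustrative values only: DM = 100 pc cm^-3, observing band 1.2 to 1.5 GHz.
delay = dispersion_delay_ms(dm=100.0, f_low_ghz=1.2, f_high_ghz=1.5)
print(round(delay, 2))  # about 103.72 ms
```

The delay is positive, meaning the 1.2 GHz signal arrives after the 1.5 GHz signal.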

IP: Integrated Profile.
The profile (amount of light) of a pulse often differs considerably from one pulse to the next.
Integrated simply means we baseline the pulse so that we have an intensity of zero
anywhere else but the pulses.
It is expected that pulsars have high variance (and standard deviation)
between pulses (between integrated profiles of pulses).

In [234]:
X, y = df.values[:, :-1], df['label'].values
y_1 = np.flatnonzero(y)
y_0 = np.flatnonzero(y - 1)
y_0 = np.random.choice(y_0, len(y_1), replace=False)
y_idx = np.sort(np.concatenate((y_0, y_1)))
y = y[y_idx]
X = X[y_idx]
print(sum(y == 1), sum(y == 0), len(y), sum(y == 1)/len(y))
print(X.shape, y.shape)

1639 1639 3278 0.5
(3278, 8) (3278,)
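
The cell above balances the classes by undersampling the majority class. A minimal sketch of the same idea on synthetic labels follows; the seeded generator and the toy array `y_toy` are assumptions added here so that the result is reproducible, and they are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Synthetic, imbalanced labels: 90 negatives and 10 positives.
y_toy = np.array([0] * 90 + [1] * 10)

pos = np.flatnonzero(y_toy == 1)
neg = np.flatnonzero(y_toy == 0)

# Draw as many negatives as there are positives, without replacement.
neg_sample = rng.choice(neg, size=len(pos), replace=False)

# Sorting keeps the original row order of the retained samples.
keep = np.sort(np.concatenate((neg_sample, pos)))
y_balanced = y_toy[keep]
print(np.bincount(y_balanced))  # [10 10]
```

Selecting on `y_toy == 0` instead of `y - 1` states the intent directly and does not depend on the labels being exactly 0 and 1.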


![autograd.svg](attachment:autograd.svg)

<div style="text-align:right;font-size:0.7em;">autograd.svg</div>

![perceptron.svg](attachment:perceptron.svg)

<div style="text-align:right;font-size:0.7em;">perceptron.svg</div>

In [235]:
y = np.c_[y == 0, y == 1].astype(float)  # np.float was removed in NumPy 1.24
print(y.shape)

(3278, 2)
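
The cell above turns each label into a two-column one-hot row, one column per output perceptron. A small sketch on made-up labels shows the encoding, an equivalent construction with `np.eye`, and how `argmax` undoes it; the `labels` array is hypothetical.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])  # hypothetical labels

# Same construction as in the notebook: column 0 marks class 0, column 1 marks class 1.
one_hot = np.c_[labels == 0, labels == 1].astype(float)

# Equivalent construction: pick rows of the identity matrix by label.
one_hot_eye = np.eye(2)[labels]

# argmax over the columns recovers the original labels.
recovered = one_hot.argmax(axis=1)
print(one_hot)
print(recovered)
```

This is why the prediction cell later uses `argmax(dim=1)` on the network output.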


![ann.svg](attachment:ann.svg)

<div style="text-align:right;font-size:0.7em;">ann.svg</div>

In [243]:
import torch
import torch.nn as nn


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(8, 25)
        self.fc2 = nn.Linear(25, 10)
        self.fc3 = nn.Linear(10, 2)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        return x


net = Net()
criterion = nn.MSELoss()
learning_rate = 0.01
batch = 100
for i in range(1000):
    idx = np.random.randint(0, len(y), batch)
    X_sample, y_sample = torch.Tensor(X[idx]), torch.Tensor(y[idx])

    net.zero_grad()
    y_hat = net(X_sample)
    loss = criterion(y_hat, y_sample)
    loss.backward()
    if 0 == (i+1)%50:
        print(loss)
    for f in net.parameters():
        f.data.sub_(f.grad.data * learning_rate)

tensor(0.1477, grad_fn=<MseLossBackward>)
tensor(0.0903, grad_fn=<MseLossBackward>)
tensor(0.1428, grad_fn=<MseLossBackward>)
tensor(0.1152, grad_fn=<MseLossBackward>)
tensor(0.0909, grad_fn=<MseLossBackward>)
tensor(0.0575, grad_fn=<MseLossBackward>)
tensor(0.0951, grad_fn=<MseLossBackward>)
tensor(0.0916, grad_fn=<MseLossBackward>)
tensor(0.0996, grad_fn=<MseLossBackward>)
tensor(0.0817, grad_fn=<MseLossBackward>)
tensor(0.0888, grad_fn=<MseLossBackward>)
tensor(0.0841, grad_fn=<MseLossBackward>)
tensor(0.0783, grad_fn=<MseLossBackward>)
tensor(0.0684, grad_fn=<MseLossBackward>)
tensor(0.0553, grad_fn=<MseLossBackward>)
tensor(0.1061, grad_fn=<MseLossBackward>)
tensor(0.0544, grad_fn=<MseLossBackward>)
tensor(0.0695, grad_fn=<MseLossBackward>)
tensor(0.0511, grad_fn=<MseLossBackward>)
tensor(0.0533, grad_fn=<MseLossBackward>)
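
The update `f.data.sub_(f.grad.data * learning_rate)` in the loop above is plain gradient descent: each parameter moves against its gradient. As a rough illustration, the sketch below does the same by hand for a single `tanh` layer with a mean squared error loss. The toy data, the single-layer shape and the full-batch update are simplifying assumptions; the network above has three layers and uses random mini-batches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 8 features, one-hot targets for 2 classes.
X_toy = rng.normal(size=(100, 8))
Y_toy = np.eye(2)[rng.integers(0, 2, size=100)]

W = rng.normal(scale=0.1, size=(2, 8))  # same (out, in) layout as nn.Linear
b = np.zeros(2)
learning_rate = 0.01

losses = []
for _ in range(200):
    out = np.tanh(X_toy @ W.T + b)
    losses.append(np.mean((out - Y_toy) ** 2))  # mean over all elements

    # Chain rule: d(loss)/d(out) times d(tanh(z))/dz = 1 - tanh(z)^2.
    delta = 2 * (out - Y_toy) / out.size * (1 - out ** 2)
    grad_W = delta.T @ X_toy
    grad_b = delta.sum(axis=0)

    # Gradient descent step, the NumPy counterpart of f.data.sub_(...).
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(losses[0], losses[-1])
```

In `pytorch` the gradients come from `loss.backward()` instead of being derived by hand.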


In [267]:
out = net(torch.Tensor(df.values[:, :-1]))
y_hat = out.argmax(dim=1).numpy()
y_true = df['label'].values
print(sum(y_hat[y_true == 1] == y_true[y_true == 1])/sum(y_true == 1))
print(sum(y_hat[y_true == 0] == y_true[y_true == 0])/sum(y_true == 0))
print(sum(y_hat == y_true)/len(y_true))

0.8828553996339231
0.9479672796604958
0.942004693261817
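
The three numbers printed above are the fraction of true pulsars found (recall), the fraction of non-pulsars correctly rejected (specificity) and the overall accuracy. A minimal sketch with made-up predictions shows the same three quantities computed from counts; the arrays `y_true_toy` and `y_pred_toy` are hypothetical.

```python
import numpy as np

y_true_toy = np.array([1, 1, 1, 0, 0, 0, 0, 0])  # hypothetical ground truth
y_pred_toy = np.array([1, 1, 0, 0, 0, 0, 0, 1])  # hypothetical predictions

true_pos = np.sum((y_pred_toy == 1) & (y_true_toy == 1))
true_neg = np.sum((y_pred_toy == 0) & (y_true_toy == 0))

recall = true_pos / np.sum(y_true_toy == 1)       # 2 of 3 positives found
specificity = true_neg / np.sum(y_true_toy == 0)  # 4 of 5 negatives rejected
accuracy = np.mean(y_pred_toy == y_true_toy)      # 6 of 8 correct overall

print(recall, specificity, accuracy)
```

With imbalanced classes, accuracy is dominated by the majority class, so it helps to read it alongside the other two.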


# Equations

All the network above is doing is matrix multiplication.
We used a network with three layers: one with 25 perceptrons,
one with 10 perceptrons and one with 2 perceptrons (pulsar or not pulsar).
We also have 8 features from our dataset.
This means that we have the following matrices:

$$
W_{25 \times 8}, W_{B\: 25 \times 1},
W'_{10 \times 25}, W'_{B\: 10 \times 1},
W''_{2 \times 10}, W''_{B\: 2 \times 1}
$$

We can see these matrices in the network parameters.

In [246]:
ws = []
for f in net.parameters():
    ws.append(f.data.numpy())
print(list(map(lambda x: x.shape, ws)))

[(25, 8), (25,), (10, 25), (10,), (2, 10), (2,)]
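
These shapes follow directly from the layer sizes. As a quick sanity check, the sketch below derives them from the list `[8, 25, 10, 2]` and counts the trainable parameters; the variable names are chosen for illustration.

```python
import numpy as np

layer_sizes = [8, 25, 10, 2]  # features, then the three layers

shapes = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    shapes.append((n_out, n_in))  # weight matrix, stored as (out, in)
    shapes.append((n_out,))       # bias vector

n_params = sum(int(np.prod(shape)) for shape in shapes)
print(shapes)
print(n_params)  # 507
```

Each weight matrix is stored as (outputs, inputs), which is why the first one is 25 by 8 and not 8 by 25.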


We also used an activation function above (*tanh()*).
If we multiply an input vector through the matrices
and apply the activation function after each multiplication
we should get the same output as `pytorch` gives us.

In [247]:
print(net(torch.Tensor(X[0, :])))

tensor([0.8372, 0.1289], grad_fn=<TanhBackward>)


In [264]:
W = ws[::2]
Wb = ws[1::2]
vector = X[0, :]
for w, b in zip(W, Wb):
    vector = np.tanh(w @ vector + b)
print(vector)

[0.83724581 0.12892361]


There is a little more to it though.

We normally think of data as rows meaning samples and columns meaning features.
That is all good, but in our equation above and in the code we used the
vector as a column vector.
In other words, we had one column and eight rows.
`numpy` did not need an explicit transpose here: the `@` operator treats a
one-dimensional array on its right-hand side as a column vector.
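
A minimal sketch of how `@` handles a one-dimensional right-hand operand, using random stand-ins for the first weight matrix and one sample (both assumed here, not taken from the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(25, 8))  # stand-in for the first weight matrix
v = rng.normal(size=8)        # stand-in for one sample, a 1-D array

out_1d = w @ v                   # 1-D in, 1-D out
out_col = w @ v[:, np.newaxis]   # explicit 8 x 1 column in, 25 x 1 out

print(out_1d.shape, out_col.shape)  # (25,) (25, 1)
print(np.allclose(out_1d, out_col[:, 0]))  # True
```

The values are identical; only the shape of the result differs.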

Now, we normally do not feed a single sample into a neural network;
instead we feed batches (above we used a batch of 100 samples at a time).
The `pytorch` network performs all the needed transformations internally:
we can feed it several rows and it predicts all of them.

In [254]:
print(net(torch.Tensor(X[:3, :])))

tensor([[0.8372, 0.1289],
        [0.8787, 0.0157],
        [0.8456, 0.1699]], grad_fn=<TanhBackward>)


Now, if we are going to feed several samples into our code above,
we will need to perform the transposes ourselves.
Let's say we feed 3 samples and note that:

$$
\hat{Y}_{2 \times 3} = \tanh(W''_{2 \times 10} \times
    \tanh(W'_{10 \times 25} \times
        \tanh(W_{25 \times 8} \times X_{8 \times 3} + W_{B\: 25 \times 1})
    + W'_{B\: 10 \times 1})
+ W''_{B\: 2 \times 1})
$$


In [265]:
W = ws[::2]  # weight matrices sit at the even positions of ws
Wb = ws[1::2]
vector = X[:3, :].T
for w, b in zip(W, Wb):
    vector = np.tanh(w @ vector + b[:, np.newaxis])
print(vector.T)

[[0.83724581 0.12892361]
 [0.87872144 0.01567084]
 [0.84555723 0.16988756]]
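
The transposes can also be avoided by keeping samples as rows and transposing the weights instead, which is the form the `nn.Linear` documentation gives for the layer ($y = xA^T + b$). The sketch below checks that both conventions agree; the weights and the three samples are random stand-ins, not the trained network or the pulsar data.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 25, 10, 2]

# Random stand-ins with the same shapes as the trained parameters.
W_toy = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
Wb_toy = [rng.normal(size=n_out) for n_out in sizes[1:]]
X_toy = rng.normal(size=(3, 8))  # 3 samples as rows

# Column convention: samples as columns, bias broadcast across columns.
cols = X_toy.T
for w, b in zip(W_toy, Wb_toy):
    cols = np.tanh(w @ cols + b[:, np.newaxis])

# Row convention: samples as rows, weights transposed, bias broadcast across rows.
rows = X_toy
for w, b in zip(W_toy, Wb_toy):
    rows = np.tanh(rows @ w.T + b)

print(rows.shape)                 # (3, 2)
print(np.allclose(cols.T, rows))  # True
```

In the row convention the bias needs no `np.newaxis`, because broadcasting aligns it with the last axis.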
