###**PageRank**

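**$p(t)$** is a probability distribution over pages: entry $i$ is the probability that a random walker is on page $i$ at time $t$.

One step of the random walk: $p(t+1) = Mp(t)$

Writing the importance scores as $r$, the same update reads $r_{t+1} = Mr_t$

At convergence $r = Mr$, so $r$ is the eigenvector of the transition matrix $M$ with eigenvalue 1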


Solving for the importance score $r$:
- compute the eigenvector of the transition matrix $M$ with eigenvalue 1
- use power iteration to compute the eigenvector efficiently

Power iteration:
- assign each node an initial PageRank
- update all ranks: $r^{t+1} = Mr^{t}$
- repeat until convergence: $\sum_i |r_i^{t+1} - r_i^t| < \epsilon$
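
The power iteration steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name `power_iteration`, the tolerance `eps`, the iteration cap `max_iter`, the uniform starting vector, and the 3-page example graph are all assumptions made for the example, and `M` is assumed to be column-stochastic (each column sums to 1).

```python
def power_iteration(M, eps=1e-10, max_iter=1000):
    """Repeat r <- M r until the L1 change between iterates drops below eps.

    M is a column-stochastic matrix stored as a list of rows:
    M[i][j] is the probability of moving from page j to page i.
    """
    n = len(M)
    r = [1.0 / n] * n  # assumed initialization: uniform rank for every page
    for _ in range(max_iter):
        # One random-walk step: r_next = M r
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        # Convergence check: sum_i |r_i^{t+1} - r_i^t|
        diff = sum(abs(a - b) for a, b in zip(r_next, r))
        r = r_next
        if diff < eps:
            break
    return r


# Hypothetical 3-page web:
# page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]

r = power_iteration(M)
print(r)
```

For this small graph the iterates settle near $r = (0.4, 0.2, 0.4)$, which can be checked by hand by solving $r = Mr$ with $\sum_i r_i = 1$.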



---



###**Neural Networks**

- Fully-connected neural network: multi-layer perceptron (MLP)
- Convolutional neural network - most widely used for image classification
- Recurrent neural network - most widely used with temporal data (e.g., weather)

Linear layers only (no activation yet):

Layer 1: $z_i^{(1)} = W^{(0)}x_i$

Layer 2: $z_i^{(2)} = W^{(1)}z_i^{(1)}$

Layer 3: $z_i^{(3)} = W^{(2)}z_i^{(2)}$

...

Layer L: $z_i^{(L)} = W^{(L-1)}z_i^{(L-1)}$


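**Activation function**: introduces non-linearity. Without it, a stack of linear layers collapses into a single linear map $W^{(L-1)} \cdots W^{(0)} x_i$.

e.g., sigmoid, tanh, ReLU, leaky ReLU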
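
For reference, the four activation functions named above can be written as scalar functions in plain Python. This is only a sketch of the standard definitions; the function names and the leaky ReLU slope of `0.01` are assumptions chosen for illustration (the slope is a tunable hyperparameter).

```python
import math


def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    # Squashes any real number into (-1, 1)
    return math.tanh(z)


def relu(z):
    # Passes positive values through, zeroes out negatives
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    # Like ReLU, but negatives keep a small slope (assumed 0.01 here)
    return z if z > 0 else slope * z


for name, fn in [("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, fn(-2.0), fn(0.0), fn(2.0))
```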


With an activation $σ$ applied between layers:

Layer 1: $z_i^{(1)} = W^{(0)}x_i$

Layer 2: $z_i^{(2)} = W^{(1)}h_i^{(1)}$; $h_i^{(1)} = σ(z_i^{(1)})$

Layer 3: $z_i^{(3)} = W^{(2)}h_i^{(2)}$; $h_i^{(2)} = σ(z_i^{(2)})$

...

Layer L: $z_i^{(L)} = W^{(L-1)}h_i^{(L-1)}$; $h_i^{(L-1)} = σ(z_i^{(L-1)})$
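
To make the layer equations concrete, here is a small forward pass in plain Python, run once without an activation and once with ReLU between the layers. The helper names (`matvec`, `matmul`, `relu_vec`, `forward`) and the tiny weight matrices are hypothetical, picked only so the arithmetic is easy to follow. The sketch also illustrates why the activation matters: without it, two linear layers give the same output as the single matrix $W^{(1)}W^{(0)}$.

```python
def matvec(W, x):
    # Matrix-vector product; W is a list of rows
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    # Matrix-matrix product; both are lists of rows
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu_vec(z):
    # Element-wise ReLU
    return [max(0.0, v) for v in z]


def forward(weights, x, activation=None):
    """Return z^(L). The activation, if given, is applied between layers only."""
    h = x
    last = len(weights) - 1
    for k, W in enumerate(weights):
        z = matvec(W, h)
        # No activation after the final layer, matching the equations above
        h = activation(z) if (activation is not None and k < last) else z
    return h


# Hypothetical 2-layer network: 2 inputs -> 2 hidden units -> 1 output
W0 = [[1.0, -1.0], [-1.0, 1.0]]
W1 = [[1.0, 1.0]]
x = [2.0, 1.0]

linear_out = forward([W0, W1], x)                # no activation
collapsed = matvec(matmul(W1, W0), x)            # single matrix W1 W0
relu_out = forward([W0, W1], x, activation=relu_vec)

print(linear_out, collapsed, relu_out)
```

Here the purely linear network and the collapsed single matrix agree, while inserting ReLU changes the output, so the activation is what lets depth add expressive power.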

Use stochastic gradient descent to learn model parameters

for a linear model: $W_{t+1} = W_t - η\left.\frac{\partial L}{\partial W}\right|_{W = W_t}$, where $η$ is the learning rate and $L$ is the loss
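
The update rule can be tried on a one-parameter linear model $\hat{y} = wx$ with squared loss. This is a hedged sketch: the data set (noise-free points on $y = 3x$), the learning rate `eta = 0.05`, the step count, and the random seed are all assumptions made so the example is small and reproducible.

```python
import random

random.seed(0)  # fixed seed so the sampled order is reproducible

# Hypothetical training data generated from y = 3x (no noise)
data = [(x, 3.0 * x) for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]

w = 0.0     # initial parameter
eta = 0.05  # learning rate (assumed value)

for step in range(200):
    # "Stochastic": each update uses a single randomly chosen example
    x, y = random.choice(data)
    # Gradient of L = (w*x - y)^2 with respect to w, evaluated at the current w
    grad = 2.0 * (w * x - y) * x
    # SGD update: w_{t+1} = w_t - eta * dL/dw
    w = w - eta * grad

print(w)
```

Because the data are noise-free, `w` should approach the true slope 3 as the updates accumulate.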

Optimization: use backpropagation, which applies the chain rule layer by layer to compute the gradients.

For a composition $h(x) = f(g(x))$, the chain rule gives:

$\frac{\partial h(x)}{\partial x} = \frac{\partial f(g)}{\partial g} \frac{\partial g(x)}{\partial x}$
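
A quick numerical check of the chain rule, using a concrete pair of functions. The choices $g(x) = x^2$ and $f(g) = \sin(g)$, the evaluation point `x0`, and the finite-difference step are assumptions for illustration; any differentiable pair would do.

```python
import math


def g(x):
    return x ** 2


def f(u):
    return math.sin(u)


def h(x):
    # Composition h(x) = f(g(x))
    return f(g(x))


def dh_dx(x):
    # Chain rule: dh/dx = (df/dg evaluated at g(x)) * (dg/dx evaluated at x)
    df_dg = math.cos(g(x))
    dg_dx = 2 * x
    return df_dg * dg_dx


x0 = 1.5
step = 1e-6

# Central finite difference as an independent estimate of the derivative
numeric = (h(x0 + step) - h(x0 - step)) / (2 * step)
analytic = dh_dx(x0)

print(numeric, analytic)
```

The two values should agree to several decimal places. Backpropagation applies the same product of local derivatives once per layer, working from the loss back to the inputs.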



In [None]:
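import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
  def __init__(self):
    super().__init__()

    self.fc1 = nn.Linear(28*28, 256)  # linear layer (784 -> 256)
    self.fc2 = nn.Linear(256, 128)    # linear layer (256 -> 128)
    self.fc3 = nn.Linear(128, 10)     # linear layer (128 -> 10)

  def forward(self, x):
    h0 = x.view(-1, 28*28)     # input layer: flatten each 28x28 image to 784 values
    h1 = F.relu(self.fc1(h0))  # hidden layer 1
    h2 = F.relu(self.fc2(h1))  # hidden layer 2
    h3 = self.fc3(h2)          # output layer: raw logits (CrossEntropyLoss applies log-softmax internally)

    return h3

# hyperparameters (assumed placeholder values, not given in these notes; skip if already defined)
args = {'lr': 0.01, 'log interval': 100}

# instantiate the model on the GPU (the loops below move each batch with .cuda())
model = Net().cuda()

# loss function
criterion = nn.CrossEntropyLoss()

# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=args['lr'])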


In [None]:
# train the model
# assumes train_loader is a DataLoader yielding (image, label) batches, defined elsewhere
num_epochs = 1  # assumed value; these notes do not specify it

model.train()
for epoch in range(1, num_epochs + 1):
  for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.cuda(), target.cuda()

    # forward pass: predictions and loss
    output = model(data)
    loss = criterion(output, target)

    # backward pass: clear old gradients, then backpropagate
    optimizer.zero_grad()
    loss.backward()

    # update parameters
    optimizer.step()

    # print loss periodically
    if batch_idx % args['log interval'] == 0:
      print("Train epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
          epoch, batch_idx * len(data), len(train_loader.dataset),
          100. * batch_idx / len(train_loader), loss.item()))

In [None]:
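# test the model
# assumes test_loader is a DataLoader yielding (image, label) batches, defined elsewhere
test_loss = 0.0
correct = 0

model.eval()
with torch.no_grad():  # no gradients needed for evaluation
  for data, target in test_loader:
    data, target = data.cuda(), target.cuda()

    output = model(data)
    # criterion returns the batch mean, so weight it by batch size before summing
    test_loss += criterion(output, target).item() * data.size(0)
    pred = output.argmax(dim=1)  # index of the largest logit = predicted class
    correct += (pred == target).sum().item()

test_loss /= len(test_loader.dataset)

print("\nTest set | Average Loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
    test_loss, correct, len(test_loader.dataset), 100. * correct / len(test_loader.dataset)))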
