
### DS4440: Implementing an arbitrary `layer`


Imagine that we want to introduce a new *wacky layer* into a feedforward network (say, an MLP) in such a way that it supports backprop. This layer will take an input $h^{l-1} \in \mathbb{R}^{d}$ from the preceding layer, and then:

1. Run it through a fixed binary mask $b \in \{0, 1\}^d$ (that is, $b_j \in \{0, 1\}$ for all $j$): $h' = h^{l-1} \odot b$, where $\odot$ is the element-wise (Hadamard) product; [`numpy.multiply`](https://numpy.org/doc/stable/reference/generated/numpy.multiply.html) is possibly of interest.

2. Then let $h^l = 5 \cdot h' + \frac{1}{2} \cdot (h')^2$, where the square is taken element-wise.
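
Before writing any code, it may help to work out the local gradient by hand. The derivation below is a sketch that follows from the two steps above; the symbols $\delta^{l}$ (the error signal arriving from the layer above) and $\delta^{l-1}$ (the signal passed further back) are notation assumed here, not defined elsewhere in this notebook.

$$
\begin{aligned}
h'_j &= b_j \, h^{l-1}_j, \qquad h^l_j = 5\, h'_j + \tfrac{1}{2} (h'_j)^2 \\
\frac{\partial h^l_j}{\partial h^{l-1}_j} &= \left(5 + h'_j\right) b_j = 5\, b_j + b_j^2 \, h^{l-1}_j = b_j \left(5 + h^{l-1}_j\right) \quad \text{since } b_j^2 = b_j \text{ for } b_j \in \{0, 1\} \\
\delta^{l-1}_j &= \delta^{l}_j \, b_j \left(5 + h^{l-1}_j\right)
\end{aligned}
$$

Because every operation is element-wise, the Jacobian is diagonal, so the backward pass reduces to an element-wise product with the incoming error.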


In [None]:
import numpy as np # all you need here

# An abstract class representing a layer.
class Layer:
  def forward(self, x):
    pass 

  def backward(self, error_backward):
    pass 


class WackyLayer(Layer):

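  def __init__(self, b, input_dims):
    # b is the fixed binary mask, of shape (input_dims,)
    self.b = b
    self.input_dims = input_dims
    # cached by forward() so that backward() can use it
    self.h_prev = None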

  def forward(self, h_prev):
    self.h_prev = h_prev
    h_prime = self.b * h_prev
    return 5 * h_prime + np.square(h_prime) / 2

  def backward(self, error_backward):
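    # error_backward is going to have dims equal to the
    # size of $h^l$.

    # How do changes to $h_prev$ affect $h^l$ in this
    # wacky layer? Element-wise, the derivative is
    # b * (5 + h_prime) = 5 * b + b * h_prev, because
    # b * b == b for a binary mask.

    grad = 5 * self.b + self.b * self.h_prev
    return np.multiply(grad, error_backward)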
    

In [None]:
d = 5
# to try other random masks, use `np.random.randint(2, size=d)`
b = np.array([1, 0, 0, 1, 1])
wl = WackyLayer(b, input_dims=d)

h_prev = np.array([5.0, 10.0, 2.0, -1.0, 4.0])
out = wl.forward(h_prev)

# if silent then all ok.
np.testing.assert_array_equal(out, np.array([37.5,  0. ,  0. , -4.5, 28. ]))

In [None]:
# These values are made up, but they stand in for the local
# error signal that the next layer would pass back to this one.
error_backward = np.array([10.0, 4.0, 3.0, 5.0, 7.0])
back = wl.backward(error_backward)
print(back)

[100.   0.   0.  20.  63.]
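
One way to gain more confidence in a hand-derived backward pass is to compare it against finite differences. The sketch below is self-contained and re-implements the forward computation as a plain function; the helper names `wacky_forward` and `numerical_vjp` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def wacky_forward(h_prev, b):
    # Same computation as WackyLayer.forward, written as a plain function.
    h_prime = b * h_prev
    return 5 * h_prime + np.square(h_prime) / 2


def numerical_vjp(f, x, upstream, eps=1e-5):
    # Central-difference estimate of d(upstream . f(x)) / dx, one coordinate at a time.
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grad[j] = upstream @ (f(x + step) - f(x - step)) / (2 * eps)
    return grad


b = np.array([1, 0, 0, 1, 1])
h_prev = np.array([5.0, 10.0, 2.0, -1.0, 4.0])
error_backward = np.array([10.0, 4.0, 3.0, 5.0, 7.0])

numeric = numerical_vjp(lambda h: wacky_forward(h, b), h_prev, error_backward)
analytic = error_backward * (5 * b + b * h_prev)

# If silent, the analytic gradient agrees with the numerical estimate.
np.testing.assert_allclose(numeric, analytic, rtol=1e-6, atol=1e-6)
```

Since the layer is quadratic in its input, the central difference has no truncation error here, so any disagreement beyond floating-point noise would point to a mistake in the analytic expression.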


### Now, let's implement in `torch` and verify

In [None]:
import torch
import torch.nn as nn

class WackyLayerTorch(nn.Module):

  def __init__(self, b):
    super().__init__()
    # fixed (non-trainable) binary mask
    self.b = b

  def forward(self, x):
    h_prime = self.b * x
    # autograd derives the backward pass from these operations
    return 5 * h_prime + (h_prime * h_prime) / 2

In [None]:
b_tensor = torch.from_numpy(b)

print(type(b_tensor))
wlt = WackyLayerTorch(b_tensor)

h_prev_tensor = torch.tensor(h_prev, requires_grad=True) 

out = wlt(h_prev_tensor) 
print(out)

<class 'torch.Tensor'>
tensor([37.5000,  0.0000,  0.0000, -4.5000, 28.0000], dtype=torch.float64,
       grad_fn=<AddBackward0>)


In [None]:
e_t = torch.tensor(error_backward)
out.backward(e_t)

In [None]:
h_prev_tensor.grad

tensor([100.,   0.,   0.,  20.,  63.], dtype=torch.float64)
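
The comparison above is done by eye; it can also be made programmatic. The following is a minimal, self-contained sketch that assumes the same mask, input, and error signal as earlier. It compares the autograd result with the hand-derived formula and then runs `torch.autograd.gradcheck`, which performs its own finite-difference test. The function name `wacky_forward_torch` is hypothetical.

```python
import numpy as np
import torch


def wacky_forward_torch(x, b):
    # Same computation as WackyLayerTorch.forward, written as a plain function.
    h_prime = b * x
    return 5 * h_prime + (h_prime * h_prime) / 2


b = torch.tensor([1, 0, 0, 1, 1])
x = torch.tensor([5.0, 10.0, 2.0, -1.0, 4.0],
                 dtype=torch.float64, requires_grad=True)
upstream = torch.tensor([10.0, 4.0, 3.0, 5.0, 7.0], dtype=torch.float64)

# Gradient computed by autograd, weighted by the upstream error signal.
wacky_forward_torch(x, b).backward(upstream)
autograd_grad = x.grad.numpy()

# Gradient from the hand-derived formula used in the NumPy layer.
b_np = b.numpy()
x_np = x.detach().numpy()
manual_grad = upstream.numpy() * (5 * b_np + b_np * x_np)

# If silent, the two implementations agree.
np.testing.assert_allclose(autograd_grad, manual_grad)

# gradcheck compares autograd against finite differences; it expects
# double-precision inputs that require gradients.
gradcheck_ok = torch.autograd.gradcheck(lambda t: wacky_forward_torch(t, b), (x,))
print(gradcheck_ok)
```

Double precision matters here: `gradcheck` is generally unreliable with `float32` inputs, which is why the input tensor is created with `dtype=torch.float64`.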