<center>
    <img src="https://www.ucalgary.ca/themes/ucalgary/ucws_theme/images/UCalgary.svg" width='30%'>
</center>


[comment]: <> (The following line is for the LECTURE title)
<p style="text-align:left;"><font size='6'><b> Deep Learning - Lab </b></font></p>

[comment]: <> (The following line is for the TOPIC of the week)
<p style="text-align:left;"><font size='4'><b> PyTorch Exercises Part 1 </b></font></p>

---



Let's take a sneak peek at some digital logic gates for this exercise.

\


Given what we know about an MLP's ability to approximate functions that are not linearly separable, we should quite easily be able to solve XOR without backpropagation by simply dialing in the correct weights and biases.

\

Before we head down that line of investigation, let's go ahead and build what we might need from scratch. This exercise will have you working in two scenarios:


- Scenario one will see us solve the XOR problem with MSE loss, a Rectified Linear Unit (ReLU) in our hidden layer to add non-linearity, and an output dimension of 1. \begin{align}
\mathrm{ReLU}(x) & = \max(0,x)
\end{align}  

- Scenario two demonstrates that we can also solve the problem with binary cross-entropy loss and simply a sigmoid as our output activation, without the need for a Rectified Linear Unit.

\begin{align}
y=\dfrac{1}{1+e^{-x}}
\end{align}
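
The two activation formulas above can be written out directly. The following is a minimal sketch, assuming PyTorch is installed; the helper names `relu` and `sigmoid` are illustrative and simply mirror the definitions, then compare against PyTorch's built-in versions.

```python
import torch

def relu(x: torch.Tensor) -> torch.Tensor:
    # Element-wise max(0, x), mirroring the ReLU definition above.
    return torch.clamp(x, min=0)

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    # Element-wise 1 / (1 + exp(-x)), mirroring the sigmoid definition above.
    return 1.0 / (1.0 + torch.exp(-x))

z = torch.tensor([-2.0, 0.0, 3.0])
print(relu(z))
print(sigmoid(z))

# Both hand-written versions should agree with the built-ins.
assert torch.allclose(relu(z), torch.relu(z))
assert torch.allclose(sigmoid(z), torch.sigmoid(z))
```

In practice `torch.relu` and `torch.sigmoid` are the usual choice; the hand-written versions are only there to make the formulas concrete.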



Both scenarios will then be carried forward to our upcoming lab where we'll begin to add some more bells and whistles.



In [7]:
#Digital logic gates

import numpy as np

X = np.array([[0, 0], [0, 1], [1,0], [1, 1]])

gates = {'OR': np.array([0,1,1,1]),
         'AND': np.array([0,0,0,1]),
         'XOR': np.array([0,1,1,0])}
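

To make the point about linear separability concrete, here is a small sketch (not part of the original lab) showing that a single linear unit with a hard threshold reproduces OR and AND, while a coarse grid search finds no such unit for XOR. The helper `threshold_unit`, the hand-picked weights, and the grid range are all assumptions chosen for illustration; the grid search is evidence, not a proof.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.array([0, 1, 1, 0])

def threshold_unit(X, w, b):
    # One linear unit followed by a hard threshold at zero.
    return (X @ w + b > 0).astype(int)

w = np.array([1.0, 1.0])                 # hand-picked weights (assumption)
or_pred = threshold_unit(X, w, -0.5)     # fires when at least one input is 1
and_pred = threshold_unit(X, w, -1.5)    # fires only when both inputs are 1
print(or_pred, and_pred)

# Search a coarse grid of weights and biases for a unit that matches XOR.
grid = np.linspace(-2, 2, 21)
found = any(
    np.array_equal(threshold_unit(X, np.array([w1, w2]), b), xor)
    for w1 in grid for w2 in grid for b in grid
)
print(found)
```

No single unit on the grid matches XOR, which is why the exercises below add a hidden layer.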

## Part 1

- Convert the input data above from NumPy matrices and vectors to tensors.
- Convert the XOR ground truths to a tensor.
- Cast the resultant tensors to float32.

In [8]:
####Your code here###
import torch
X = torch.from_numpy(X).float()
XOR_labels = torch.tensor(gates['XOR'], dtype=torch.float32)


In [10]:
print(X, XOR_labels)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]]) tensor([0., 1., 1., 0.])
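

One detail worth knowing about the two conversion routes used above: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies the data, and casting to a different dtype with `.float()` also allocates a new tensor. A short sketch, using a throwaway array `arr` (an illustrative name) so the lab's `X` is left alone:

```python
import numpy as np
import torch

arr = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

shared = torch.from_numpy(arr)                    # shares memory with arr, keeps its integer dtype
copied = torch.tensor(arr, dtype=torch.float32)   # independent float32 copy
as_float = shared.float()                         # dtype cast allocates a new tensor

arr[0, 0] = 7  # mutate the NumPy array in place

print(shared[0, 0].item())    # follows the change made to arr
print(copied[0, 0].item())    # unaffected
print(as_float[0, 0].item())  # unaffected, it was created before the mutation
print(shared.dtype, copied.dtype, as_float.dtype)
```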


## Part 2

Let's put aside backpropagation for now, create randomly initialised weights and biases, and construct a feedforward network for each of the following scenarios (see below for further conditions):

- A single hidden layer with two hidden neurons for the whole batch, and an output dimension of 1.

\

----

Conditions:

- Don't worry about `torch.autograd.Variable` for this challenge, as we don't need to keep track of gradients here. However, all operations should be constructed from the ground up with `torch.tensor` etc.

- It's also worth bearing in mind that while PyTorch ordinarily takes care of transposing the weight matrix of each layer, we'll need to do this manually below to conform with that convention. Given this, your hidden dense layer should look like:

\begin{align}
h= g(xW^T + c)
\end{align}




\

---

Note that your output should be a vector consisting of four predictions.
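
The manual transpose in the formula above can be checked against PyTorch's own affine map. The sketch below assumes a hypothetical layer with three output neurons (the sizes and the names `W`, `c` are illustrative) and compares the hand-written `xW^T + c` with `torch.nn.functional.linear`, which expects the weight in the same `(out_features, in_features)` layout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # batch of 4, 2 features
W = torch.randn(3, 2)   # (out_features, in_features), illustrative size
c = torch.randn(3)      # one bias per output neuron

manual = torch.matmul(x, W.T) + c   # xW^T + c, bias broadcast across the batch
builtin = F.linear(x, W, c)         # the same affine map computed by PyTorch

print(manual.shape)
assert torch.allclose(manual, builtin, atol=1e-6)
```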



In [None]:
from IPython.display import IFrame
IFrame(src='https://drive.google.com/file/d/12r-yy2lYIpn4v0ULf8jFMDygPyKaNGSj/preview', width=500, height=300)

In [23]:
####your code here###

#functions
def feed_forward(X, weights, bias):
    
    return torch.matmul(X, weights.T) + bias


#weights
weights_hidden = torch.randn(2, 2)
bias_hidden = torch.randn(2,)

weights_out = torch.randn(1, 2)
bias_out = torch.randn(1,)

#number of out nodes, one bias per neuron
# print(feed_forward(X, weights, bias))

#hidden layer
h = feed_forward(X, weights_hidden, bias_hidden)
print(h.shape)

# #output
out = feed_forward(h, weights_out, bias_out)
print(out.shape)


torch.Size([4, 2])
torch.Size([4, 1])
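

The cell above computes only the affine part `xW^T + c` and leaves out the activation `g` from the hidden-layer formula. As a hedged sketch of one way to include it, the hypothetical helper `dense` below takes an optional activation, applies ReLU in the hidden layer, and squeezes the result into a vector of four predictions. The weights are random, so the values themselves carry no meaning.

```python
import torch

torch.manual_seed(0)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

def dense(x, weights, bias, activation=None):
    # Affine map xW^T + c, optionally followed by an activation g.
    z = torch.matmul(x, weights.T) + bias
    return z if activation is None else activation(z)

W, c = torch.randn(2, 2), torch.randn(2)   # hidden layer: 2 neurons
w, b = torch.randn(1, 2), torch.randn(1)   # output layer: 1 neuron

h = dense(X, W, c, activation=torch.relu)  # h = g(xW^T + c) with g = ReLU
y_hat = dense(h, w, b).squeeze(-1)         # (4, 1) -> (4,), one prediction per input row

print(h.shape, y_hat.shape)
```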


- A single hidden layer with > 2 hidden neurons for the whole batch, and an output dimension of 1.

In [None]:
from IPython.display import IFrame
IFrame(src='https://drive.google.com/file/d/10ZR_eagfu9peGsuMvIywpRh4KGrQV6ed/preview', width=500, height=300)

In [26]:
#your code here 
hidden_channels = 15
#weights
weights_hidden_1 = torch.randn(hidden_channels, 2)
bias_hidden_1 = torch.randn(hidden_channels,)

#number of out nodes, one bias per neuron
weight_out_1 = torch.randn(1, hidden_channels)
bias_out_1 = torch.randn(1,)

#hidden
h = feed_forward(X, weights_hidden_1, bias_hidden_1)
print(h)

#output
out = feed_forward(h, weight_out_1, bias_out_1)
print(out)

tensor([[-0.1639,  2.1796,  0.3649, -1.1730,  0.1114, -1.1045,  0.3718,  0.9008,
          0.0292, -0.6376, -0.3279, -0.4540,  1.4255, -0.7676,  1.4256],
        [-0.6668,  0.9973,  0.2383, -1.4899, -0.6074, -1.6783, -0.0960,  1.4370,
         -0.8546, -0.2560,  0.8045, -0.3894,  1.6665, -1.7804,  2.0974],
        [-2.0571,  1.2638,  3.1616, -0.4684,  0.5263, -1.2635,  0.1659,  0.2038,
         -0.5672, -2.4325, -0.5962, -0.3910,  1.3754, -0.4313,  2.0015],
        [-2.5599,  0.0816,  3.0350, -0.7852, -0.1924, -1.8373, -0.3019,  0.7400,
         -1.4510, -2.0509,  0.5362, -0.3264,  1.6164, -1.4441,  2.6733]])
tensor([[-0.8635],
        [-0.1362],
        [-3.7297],
        [-3.0025]])


- Two hidden layers with 6 neurons each.

In [27]:
#your code here 

#weights
w1 = torch.randn(6, 2) 
w2 = torch.randn(6, 6)
w3 = torch.randn(1, 6)

b1 = torch.randn(6, )
b2 = torch.randn(6, )
b3 = torch.randn(1, )


#number of out nodes, one bias per neuron


#hidden, output
h1 = feed_forward(X, w1, b1)
h2 = feed_forward(h1, w2, b2)
out = feed_forward(h2, w3, b3)



print(h1)
print(w3)
print(out)



tensor([[-1.6857,  1.3479,  1.0660,  0.7283,  0.9960,  1.6666],
        [-1.9162,  2.9479,  1.1244, -0.3388,  1.0253,  2.8085],
        [-1.0342,  1.7759,  1.6040,  0.0473,  0.7141,  1.0931],
        [-1.2647,  3.3759,  1.6624, -1.0199,  0.7434,  2.2350]])
tensor([[-0.9004, -0.3883, -0.1305,  0.3576, -0.0177, -1.9505]])
tensor([[ 4.5972],
        [12.7332],
        [ 2.4620],
        [10.5981]])


In [28]:
w1.t().shape

torch.Size([2, 6])
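

A side note on why the hidden activation matters: when no nonlinearity is applied between layers, stacking them is equivalent to a single affine map, however many neurons are used. The sketch below illustrates this with freshly sampled weights of the same shapes as above; `affine`, `w_eff` and `b_eff` are illustrative names, not part of the lab.

```python
import torch

torch.manual_seed(0)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

w1, b1 = torch.randn(6, 2), torch.randn(6)
w2, b2 = torch.randn(6, 6), torch.randn(6)
w3, b3 = torch.randn(1, 6), torch.randn(1)

def affine(x, w, b):
    # xW^T + b with no activation.
    return x @ w.T + b

# Three stacked layers without any nonlinearity in between.
stacked = affine(affine(affine(X, w1, b1), w2, b2), w3, b3)

# The same computation folded into one weight matrix and one bias.
w_eff = w3 @ w2 @ w1                  # shape (1, 2)
b_eff = w3 @ (w2 @ b1 + b2) + b3      # shape (1,)
single = affine(X, w_eff, b_eff)

print(w_eff.shape, b_eff.shape)
assert torch.allclose(stacked, single, atol=1e-4)
```

Because the folded network is a single linear unit, it cannot represent XOR, which is the motivation for the ReLU in scenario one.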



## Part 3

Construct the above feedforward networks with `nn.Linear`, but this time let's wrap our output with a sigmoid output activation for use with `nn.BCELoss` in our upcoming lab.

This time around the problem can be solved with an output dimension of one too.

In [29]:
#2 hidden
import torch.nn as nn
lin1= nn.Linear(2, 2)
lin2= nn.Linear(2, 1)

hidden = lin1(X)
out = lin2(hidden)
print(torch.sigmoid(out))


#6 hidden
lin1= nn.Linear(2, 6)
lin2= nn.Linear(6, 1)

hidden = lin1(X)
out = lin2(hidden)

print(torch.sigmoid(out))


#multiple hidden
lin1= nn.Linear(2, 6)
lin2= nn.Linear(6, 6)
lin3= nn.Linear(6, 1)

hidden = lin1(X)
hidden2 = lin2(hidden)

out = lin3(hidden2)

print(torch.sigmoid(out))
print(np.where(torch.sigmoid(out) > 0.5, 1, 0))

tensor([[0.5273],
        [0.6285],
        [0.6293],
        [0.7203]], grad_fn=<SigmoidBackward0>)
tensor([[0.4941],
        [0.4382],
        [0.3731],
        [0.3222]], grad_fn=<SigmoidBackward0>)
tensor([[0.4932],
        [0.4826],
        [0.4562],
        [0.4457]], grad_fn=<SigmoidBackward0>)
[[0]
 [0]
 [0]
 [0]]
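

As a preview of how the sigmoid output pairs with `nn.BCELoss`, here is a minimal sketch that bundles the two-neuron network into `nn.Sequential` and evaluates the loss once. It mirrors the layer structure used above, is untrained, and uses a fixed seed only for repeatability, so the predictions are not expected to match the XOR labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0., 1., 1., 0.])   # XOR labels

model = nn.Sequential(
    nn.Linear(2, 2),   # hidden layer, 2 neurons
    nn.Linear(2, 1),   # output layer
    nn.Sigmoid(),      # squashes the output into (0, 1) as BCELoss requires
)

with torch.no_grad():                     # no gradients needed for a single evaluation
    probs = model(X).squeeze(-1)          # (4, 1) -> (4,) to match the label shape
    loss = nn.BCELoss()(probs, y)

preds = (probs > 0.5).float()             # hard 0/1 decisions at a 0.5 threshold
print(probs, loss.item(), preds)
```

Thresholding with `(probs > 0.5)` keeps everything in PyTorch rather than passing a tensor to `np.where`.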


In [30]:
print(lin1.weight.shape)
print(lin1.bias)


torch.Size([6, 2])
Parameter containing:
tensor([ 0.2834, -0.5570,  0.2926, -0.0404, -0.0508,  0.4108],
       requires_grad=True)


In [31]:
lin2.weight.t().shape

torch.Size([6, 6])

## Part 4

Last of all, go ahead and check out Ian Goodfellow, Yoshua Bengio, and Aaron Courville's [Deep Learning](https://www.deeplearningbook.org/), a fantastic overview of the mathematical foundations of deep learning.

An HTML copy is available via the link above (purchasing the textbook is also highly recommended). Navigate to **Part II: Modern Practical Deep Networks - Deep Feedforward Networks**.

----

\
Read through this chapter and complete the following:

- Discover the correct weights and bias vector for a 2 neuron single hidden layer network.
- Utilize the information you've learnt in a custom layer to output the correct predictions.
- Calculate the loss to demonstrate this.

Note that, much like the first scenario, this can be solved with a nonlinearity in the hidden layer rather than employing a sigmoid on the output.


In [40]:
###your code here###

w1 = torch.ones((2, 2), dtype=torch.float32)      # hidden weights W
w2 = torch.tensor([1, -2], dtype=torch.float32)   # output weights w

c = torch.tensor([0, -1], dtype=torch.float32)    # hidden bias
b = torch.tensor([0], dtype=torch.float32)        # output bias

In [38]:
input = X

out = XOR_labels

print(input)
print(out)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])
tensor([0., 1., 1., 0.])


In [42]:
# ReLU on the hidden layer supplies the nonlinearity needed for XOR
h1 = torch.relu(feed_forward(input, w1, c))
output = feed_forward(h1, w2, b)

print(output)

tensor([0., 1., 1., 0.])
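

The task list also asks for a custom layer and a loss value. The following is a minimal sketch of one way to do both, assuming the XOR solution given in the Deep Feedforward Networks chapter (hidden weights all ones, hidden bias `[0, -1]`, output weights `[1, -2]`, output bias `0`, with ReLU on the hidden layer). The class name `XORNet` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0., 1., 1., 0.])   # XOR labels

class XORNet(nn.Module):  # hypothetical name for a hand-dialed network
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)
        self.out = nn.Linear(2, 1)
        # Overwrite the random initialisation with the hand-picked values.
        with torch.no_grad():
            self.hidden.weight.copy_(torch.tensor([[1., 1.], [1., 1.]]))
            self.hidden.bias.copy_(torch.tensor([0., -1.]))
            self.out.weight.copy_(torch.tensor([[1., -2.]]))
            self.out.bias.copy_(torch.tensor([0.]))

    def forward(self, x):
        h = torch.relu(self.hidden(x))     # h = ReLU(xW^T + c)
        return self.out(h).squeeze(-1)     # one prediction per input row

model = XORNet()
with torch.no_grad():
    y_hat = model(X)
    loss = F.mse_loss(y_hat, y)            # MSE loss, as in scenario one

print(y_hat)
print(loss.item())
```

With these weights the predictions match the labels, so the MSE loss comes out as zero without any training.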
