Below we will try to fit a logistic regression model, step by step, to the XOR problem.
For this model, $x_1$ and $x_2$ are each either 0 or 1, and $y = x_1 + x_2 - 2x_1x_2$. Notice that $y$ is True (1) if $x_1 = 1$ and $x_2 = 0$ OR $x_1 = 0$ and $x_2 = 1$; $y$ is 0 otherwise.
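As a quick sanity check, the formula can be evaluated on all four input pairs and compared with PyTorch's built-in `torch.logical_xor`. This is a small sketch; the variable names are illustrative only.

```python
import torch

# All four binary input pairs, one per row
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]])
x1, x2 = x[:, 0], x[:, 1]

# y = x1 + x2 - 2*x1*x2, as defined in the text
y = x1 + x2 - 2 * x1 * x2

# XOR computed directly, converted back to integers for comparison
xor = torch.logical_xor(x1.bool(), x2.bool()).long()

print(y.tolist())           # [0, 1, 1, 0]
print(torch.equal(y, xor))  # True
```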

In [1]:
import torch.nn as nn
import torch


In [None]:
x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [[0], [1], [1], [0]]
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)

In [None]:
# Define each tensor to be 1x1 and have them require a gradient for tracking; these are parameters
dim = 1,1
beta_1 = torch.rand(dim,requires_grad=True)
beta_2 = torch.rand(dim,requires_grad=True)
alpha  = torch.rand(dim,requires_grad=True)

In [None]:
lr = 0.01

for epoch in range(10):
  for x, y in zip(x_data, y_data):

    # Linear score: z = beta_2*x[0] + beta_1*x[1] + alpha
    z = beta_2*x[0] + beta_1*x[1] + alpha

    # Push z through a sigmoid to get p(y=1)
    a = torch.sigmoid(z)

    # Write the binary cross-entropy loss between y and a manually
    loss = -1*((y*torch.log(a)) + ((1-y)*torch.log(1-a)))

    # Backpropagate to get the gradients with respect to alpha, beta_1 and beta_2;
    # they are stored in alpha.grad, beta_1.grad and beta_2.grad
    loss.backward()

    # Manually update the parameters. This is wrapped in torch.no_grad() because the
    # parameters have requires_grad=True, but the update itself should not be tracked by autograd
    with torch.no_grad():
      # In-place gradient-descent step; each parameter stays a leaf tensor that requires grad
      beta_2 -= lr*beta_2.grad
      beta_1 -= lr*beta_1.grad
      alpha  -= lr*alpha.grad

      # Manually zero the gradients after updating; backward() would otherwise accumulate them
      beta_2.grad.zero_()
      beta_1.grad.zero_()
      alpha.grad.zero_()

  # Manually compute the loss and accuracy of the model after each epoch
  with torch.no_grad():
    print(f'Epoch: {epoch}')
    y_pred = []
    loss = 0.0

    for x, y in zip(x_data, y_data):
      # Get z
      z = beta_2*x[0] + beta_1*x[1] + alpha

      # Get a = p(y=1)
      a = torch.sigmoid(z)

      # Accumulate the loss
      loss += -1*((y*torch.log(a)) + ((1-y)*torch.log(1-a)))

      # Predict class 1 when p(y=1) >= 0.5
      y_pred.append(1 if a >= 0.5 else 0)

    # Get the current accuracy over the 4 points; make the predictions a tensor
    y_pred = torch.tensor(y_pred, dtype=torch.float32)

    accuracy = torch.sum(torch.eq(y_pred, y_data.T)) / 4
    loss = loss / 4

    # Print the loss and the accuracy
    # .item() extracts the Python number from a one-element tensor
    print('Loss: {} Accuracy: {}'.format(loss.item(), accuracy.item()))

Epoch: 0
Loss: 0.7420799136161804 Accuracy: 0.5
Epoch: 1
Loss: 0.7406654953956604 Accuracy: 0.5
Epoch: 2
Loss: 0.7392904162406921 Accuracy: 0.5
Epoch: 3
Loss: 0.7379539012908936 Accuracy: 0.5
Epoch: 4
Loss: 0.7366548776626587 Accuracy: 0.5
Epoch: 5
Loss: 0.7353924512863159 Accuracy: 0.5
Epoch: 6
Loss: 0.734165608882904 Accuracy: 0.5
Epoch: 7
Loss: 0.7329734563827515 Accuracy: 0.5
Epoch: 8
Loss: 0.7318153381347656 Accuracy: 0.5
Epoch: 9
Loss: 0.730690062046051 Accuracy: 0.5
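An accuracy stuck at 0.5 is expected rather than a bug. The model's decision boundary $\beta_2 x_1 + \beta_1 x_2 + \alpha = 0$ is a straight line, and no line separates the XOR points $(0,1), (1,0)$ from $(0,0), (1,1)$, so at most three of the four points can ever be classified correctly. Training longer does not fix this. The sketch below is a vectorized equivalent of the loop above using `nn.Linear`, `nn.BCEWithLogitsLoss` and `torch.optim.SGD` (swapped in for the manual updates; it is not the original code), with an arbitrarily chosen learning rate and step count. Because the XOR data are symmetric and the logistic loss is strictly convex here, the unique optimum has zero weights and zero bias: every prediction is $p = 0.5$ and the mean loss approaches $\ln 2 \approx 0.693$.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

x_data = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_data = torch.tensor([[0.], [1.], [1.], [0.]])

# One linear unit: z = w1*x1 + w2*x2 + b; the sigmoid is folded into the loss
model = nn.Linear(2, 1)
loss_fn = nn.BCEWithLogitsLoss()  # numerically stable sigmoid + binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_data), y_data)  # full-batch mean loss over the 4 points
    loss.backward()
    optimizer.step()

with torch.no_grad():
    final_loss = loss_fn(model(x_data), y_data).item()
    probs = torch.sigmoid(model(x_data)).squeeze(1)

print(f"final loss {final_loss:.4f} vs ln 2 = {math.log(2):.4f}")
print(probs)  # every probability ends up close to 0.5
```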


Exercise 1: Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.



In [None]:
dim = 2,2
x = torch.rand(dim).unsqueeze(0)
x

tensor([[[0.0598, 0.1709],
         [0.5572, 0.6326]]])
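Indexing with `None` is an equivalent way to insert a size-1 axis, and `unsqueeze` also accepts negative positions counted from the end. A small sketch:

```python
import torch

x = torch.rand(2, 2)

a = x.unsqueeze(0)   # new axis at position 0 -> shape (1, 2, 2)
b = x[None]          # indexing with None inserts the same leading axis
c = x.unsqueeze(-1)  # a negative position counts from the end -> shape (2, 2, 1)

print(a.shape, b.shape, c.shape)
print(torch.equal(a, b))  # True
```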

Exercise 2: Remove the extra dimension you just added to the previous tensor.



In [None]:
x = x.squeeze(0)
x

tensor([[0.0598, 0.1709],
        [0.5572, 0.6326]])
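`squeeze(0)` removes only axis 0, and only if that axis has size 1. Calling `squeeze()` with no argument removes every size-1 axis, which can drop more dimensions than intended. A small sketch with a deliberately awkward shape:

```python
import torch

x = torch.rand(1, 2, 1, 2)

print(x.squeeze(0).shape)  # torch.Size([2, 1, 2]): only axis 0 is removed
print(x.squeeze().shape)   # torch.Size([2, 2]): every size-1 axis is removed
print(x.squeeze(1).shape)  # torch.Size([1, 2, 1, 2]): axis 1 has size 2, so nothing changes
```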

Exercise 3: Create a random tensor of shape 5x3 in the interval [3, 7)



In [None]:
dim = 5,3
start, end = 3,7
diff = end - start
x = torch.rand(dim)
x = start + diff*x
x

tensor([[4.8692, 4.2555, 3.6053],
        [3.3966, 5.2864, 5.9531],
        [6.2919, 5.4445, 5.0535],
        [3.3911, 6.2098, 6.6665],
        [6.5670, 5.6480, 4.0700]])
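The rescaling above maps $U \sim \mathcal{U}[0, 1)$ to $3 + (7 - 3)U$, which lies in $[3, 7)$. PyTorch can also fill a tensor in place with `Tensor.uniform_(from, to)`. A brief sketch of both options:

```python
import torch

start, end = 3, 7

# Option 1: rescale U[0, 1) samples, as in the cell above
x = start + (end - start) * torch.rand(5, 3)

# Option 2: allocate an uninitialized tensor and fill it in place from U[start, end)
y = torch.empty(5, 3).uniform_(start, end)

print(x.min().item() >= start, x.max().item() < end)
print(y.shape)
```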

Exercise 4: Create a tensor with values from a normal distribution (mean=0, std=1).



In [None]:
dim = 5,3
mu,sigma = 0,1
x = torch.normal(mu,sigma, size=dim)
x

tensor([[ 0.8529,  0.0502,  0.5259],
        [-0.7046,  0.3235, -0.2141],
        [ 1.4503,  0.9467, -0.0790],
        [-0.3112,  1.3725,  0.4459],
        [ 0.6614,  0.1794,  0.4289]])
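For the standard normal specifically, `torch.randn` is the usual shorthand, and a general $\mathcal{N}(\mu, \sigma^2)$ sample can be built as $\mu + \sigma Z$ with $Z \sim \mathcal{N}(0, 1)$. A sketch; the values $\mu = 2$, $\sigma = 3$ and the sample size are arbitrary choices:

```python
import torch

torch.manual_seed(0)

z = torch.randn(5, 3)  # same distribution as torch.normal(0, 1, size=(5, 3))

# Shift and scale standard-normal samples to get N(mu, sigma^2)
mu, sigma = 2.0, 3.0
samples = mu + sigma * torch.randn(100_000)

print(z.shape)
print(samples.mean().item(), samples.std().item())  # roughly 2 and 3
```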

Exercise 5: Retrieve the indices of all the non-zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).



In [None]:
x = torch.Tensor([1, 1, 1, 0, 1]).nonzero()
x

tensor([[0],
        [1],
        [2],
        [4]])
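`nonzero()` returns one row of coordinates per non-zero element, hence the shape $(4, 1)$ above. For a flat list of indices, `nonzero(as_tuple=True)` or the single-argument form of `torch.where` is often more convenient. A short sketch:

```python
import torch

t = torch.tensor([1, 1, 1, 0, 1])

idx_2d = t.nonzero()                  # shape (4, 1): one row per non-zero element
idx_1d = t.nonzero(as_tuple=True)[0]  # 1-D tensor of indices along dim 0
idx_where = torch.where(t != 0)[0]    # single-argument torch.where, same result

print(idx_1d.tolist())  # [0, 1, 2, 4]
```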

Exercise 6: Create a random tensor of size (3,1) and then horizontally stack 4 copies together.



In [None]:
dim = 3,1
desired_dim = 4
# expand returns a view that repeats the single column without copying data
x = torch.rand(dim).expand(dim[0], desired_dim)
x

tensor([[0.2509, 0.2509, 0.2509, 0.2509],
        [0.1755, 0.1755, 0.1755, 0.1755],
        [0.7913, 0.7913, 0.7913, 0.7913]])
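Note that `expand` does not copy any data: it returns a view whose new dimension has stride 0, so all four "columns" share the same memory. If real copies are needed (for example, before writing into the result in place), `repeat` or `torch.hstack` allocates a new tensor. A sketch comparing the three:

```python
import torch

x = torch.rand(3, 1)

view = x.expand(3, 4)            # no copy: stride 0 along the new columns
copies = x.repeat(1, 4)          # allocates a new (3, 4) tensor
stacked = torch.hstack([x] * 4)  # concatenates 4 copies along dim 1

print(view.stride(), copies.stride())  # (1, 0) (4, 1)
print(torch.equal(view, copies), torch.equal(copies, stacked))  # True True
print(view.data_ptr() == x.data_ptr())  # True: the view shares x's storage
```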

Exercise 7: Return the batch matrix-matrix product of two 3-D tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).



In [None]:
dim_a = 3,4,5
dim_b = 3,5,4
a = torch.rand(dim_a)
b = torch.rand(dim_b)
x = torch.bmm(a, b)
x

tensor([[[1.4494, 1.0602, 1.0511, 1.0479],
         [1.3060, 1.1231, 1.5504, 1.6478],
         [2.6341, 1.7840, 2.8105, 2.8984],
         [1.3131, 0.7990, 1.5453, 1.7199]],

        [[0.9078, 0.8409, 1.5928, 1.2497],
         [1.7301, 1.3984, 2.2090, 1.5277],
         [0.4239, 0.4361, 0.7011, 0.3213],
         [1.5538, 1.1826, 1.9572, 0.9656]],

        [[0.7172, 0.8004, 1.0903, 0.8760],
         [0.7382, 1.1985, 1.2859, 1.3656],
         [0.6102, 1.1650, 0.7028, 1.0886],
         [0.3383, 1.0531, 0.7312, 1.1462]]])
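`torch.bmm` multiplies matching batch entries: for each $i$, `x[i] = a[i] @ b[i]`, so a $(3, 4, 5)$ batch times a $(3, 5, 4)$ batch gives a $(3, 4, 4)$ result. The sketch below checks this against an explicit loop and an equivalent `torch.einsum` call:

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(3, 5, 4)

x = torch.bmm(a, b)

# Explicit loop: one ordinary matrix product per batch entry
x_loop = torch.stack([a[i] @ b[i] for i in range(a.shape[0])])

# einsum: batch index n is shared, inner index k is summed over
x_einsum = torch.einsum("nik,nkj->nij", a, b)

print(x.shape)  # torch.Size([3, 4, 4])
print(torch.allclose(x, x_loop), torch.allclose(x, x_einsum))  # True True
```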

Exercise 8: Return the batch matrix-matrix product of a 3-D tensor and a 2-D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).



In [None]:
dim_a = 3,4,5
dim_b = 5,4
a = torch.rand(dim_a)
b = torch.rand(dim_b)
# bmm requires two 3-D tensors with the same batch size, so broadcast b to shape (3, 5, 4)
b_exp = b.unsqueeze(0).expand(dim_a[0], -1, -1)
x = torch.bmm(a, b_exp)
x

tensor([[[1.5072, 0.9352, 1.9807, 1.7094],
         [1.1816, 0.9497, 2.0066, 1.6808],
         [1.1476, 0.6570, 1.8228, 1.2792],
         [1.1831, 1.0358, 2.1361, 2.0779]],

        [[1.2094, 0.8068, 2.0547, 1.4848],
         [1.4514, 0.8914, 2.2731, 1.6378],
         [0.7146, 0.6993, 1.4329, 1.3858],
         [1.3151, 0.8394, 1.9992, 1.6389]],

        [[1.2578, 0.7881, 1.6333, 1.6176],
         [1.2229, 0.9364, 1.7324, 1.5681],
         [1.3912, 0.6937, 1.9344, 1.3331],
         [1.5092, 1.2223, 2.6781, 2.1639]]])
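`torch.bmm` does not broadcast, which is why `b` had to be expanded to shape $(3, 5, 4)$ first. `torch.matmul` (the `@` operator) broadcasts a 2-D right operand across the batch dimension automatically. A sketch checking that both routes agree:

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# bmm route: make b an explicit (3, 5, 4) batch (an expanded view, no copy)
x_bmm = torch.bmm(a, b.unsqueeze(0).expand(a.shape[0], -1, -1))

# matmul route: b is broadcast over the batch dimension
x_matmul = a @ b

print(x_matmul.shape)  # torch.Size([3, 4, 4])
print(torch.allclose(x_bmm, x_matmul))  # True
```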

Exercise 9: Create a 1x1 random tensor and get the value inside it as a Python scalar, not a tensor.

In [None]:
dim = 1,1
x = torch.rand(dim).item()
x

0.8007276058197021
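`.item()` works on any one-element tensor regardless of its shape, and `float()` does the same conversion. For tensors with more than one element, `.tolist()` returns nested Python lists instead. A brief sketch:

```python
import torch

t = torch.rand(1, 1)

value = t.item()  # Python float from a one-element tensor of any shape
same = float(t)   # float() also works for one-element tensors

# For multi-element tensors, .tolist() gives nested Python lists
values = torch.rand(2, 2).tolist()

print(type(value).__name__, value == same)  # float True
print(len(values), len(values[0]))          # 2 2
```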

Exercise 10: Create a 2x1 tensor $x$ that requires a gradient and holds $[-2, 1]$. Set $y = x_1^2 + x_2^2$ and get the gradient of $y$ with respect to $x_1$ and then $x_2$.

In [None]:
x = torch.tensor([[-2.0],[1.0]], requires_grad=True)
y = x[0]**2 + x[1]**2
y.backward()
print('d/dx1:',x.grad[0].item(),'|  d/dx2:',x.grad[1].item())

d/dx1: -4.0 |  d/dx2: 2.0
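Since $\partial y / \partial x_i = 2x_i$, the expected gradient is $2 \cdot [-2, 1] = [-4, 2]$, which matches the output. One thing to keep in mind: `backward()` accumulates into `x.grad` rather than overwriting it, so a second call without zeroing doubles the values. A sketch illustrating both points with `torch.autograd.grad`, which returns the gradient without storing it in `x.grad`:

```python
import torch

x = torch.tensor([[-2.0], [1.0]], requires_grad=True)

# torch.autograd.grad returns the gradient instead of storing it in x.grad
y = (x ** 2).sum()  # y = x1^2 + x2^2 as a scalar
(g,) = torch.autograd.grad(y, x)
print(g.flatten().tolist())            # [-4.0, 2.0], i.e. 2*x
print(torch.equal(g, 2 * x.detach()))  # True

# backward() accumulates: two passes without zeroing give twice the gradient
for _ in range(2):
    y = (x ** 2).sum()
    y.backward()
print(x.grad.flatten().tolist())  # [-8.0, 4.0]
```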


Exercise 11: Check whether CUDA is available (it should be if you chose a GPU in Colab's Runtime settings). If it is, move $x$ from above to a CUDA device. Create a new tensor of the same shape as $x$ and put it on the CPU. Try to add these tensors. What happens? How do you fix this?

In [2]:
print(torch.cuda.is_available())

True


In [None]:
gpu_device = torch.device("cuda")
x1 = x.to(gpu_device)

cpu_device = torch.device("cpu")
x2 = torch.tensor([[4.0],[-1.0]], requires_grad=True)
x2 = x2.to(cpu_device)

x1+x2

RuntimeError: ignored

## Answer:
The problem is that the tensors are on different devices: every tensor involved in a single operation must be on the same device, so PyTorch raises a RuntimeError instead of performing the addition. We can fix this by moving x1 and x2 to the same device, either both to the GPU or both to the CPU.
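A minimal sketch of the fix: move one operand to the other's device with `.to(...)`. To keep the snippet runnable on a machine without a GPU, the device is chosen at run time; this fallback is an addition beyond the exercise, which assumes CUDA is present.

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x1 = torch.tensor([[-2.0], [1.0]]).to(device)
x2 = torch.tensor([[4.0], [-1.0]])  # created on the CPU

# Move x2 onto x1's device before adding (alternatively, bring x1 back with x1.cpu())
total = x1 + x2.to(x1.device)

print(total.device)
print(total.cpu().flatten().tolist())  # [2.0, 0.0]
```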