# Exercise 1
#### http://neuralnetworksanddeeplearning.com/chap1.html#exercise_263792
##### done by: github.com/famu8 | Fernando Augusto Marina Urriola


### Question
An extreme version of gradient descent is to use a mini-batch size of just 1. That is, given a training input, $x$, we update our weights and biases according to the rules $w_k \rightarrow w_k' = w_k - \eta \, \partial C_x / \partial w_k$ and $b_l \rightarrow b_l' = b_l - \eta \, \partial C_x / \partial b_l$. Then we choose another training input, and update the weights and biases again. And so on, repeatedly. This procedure is known as online, on-line, or incremental learning. In online learning, a neural network learns from just one training input at a time (just as human beings do). Name one advantage and one disadvantage of online learning, compared to stochastic gradient descent with a mini-batch size of, say, $20$.

#### Solution
Advantage of online learning compared to stochastic gradient descent with a mini-batch size of 20:

**Fast Adaptation to Changing Data**: Online learning allows a neural network to quickly adapt to changing data patterns. Since it updates the weights and biases after processing each individual training input, it can respond rapidly to new information or evolving trends in the data. This makes it well-suited for scenarios where the data distribution is non-stationary or where new data is continuously arriving.

Disadvantage of online learning compared to stochastic gradient descent with a mini-batch size of 20:

**High Variance in Updates**: Each online update uses the gradient of a single example, $\partial C_x / \partial w_k$, which is a noisy estimate of the gradient of the full cost. The weights and biases therefore take erratic steps and tend to bounce around a minimum instead of settling into it, which makes convergence slower and less stable. Stochastic gradient descent with a mini-batch size of 20 averages the gradient over 20 examples, which reduces this noise and gives smoother updates that can reach a more stable, better-performing solution.
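To make the trade-off concrete, here is a minimal pure-Python sketch (not from the book) that applies the update rules above to a hypothetical 1-D linear model $y = wx + b$, once with a mini-batch size of 1 (online learning) and once with a size of 20. The toy dataset, the learning rate $\eta = 0.1$, the epoch count and the helper names (`make_data`, `grads`, `sgd`, `grad_spread`) are all illustrative assumptions.

```python
import random

def make_data(n=200, seed=0):
    # Hypothetical toy dataset: y = 3x + 1 plus Gaussian noise (sd 0.1)
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data

def grads(w, b, batch):
    # Gradient of the quadratic cost C = (1/2m) * sum((w*x + b - y)**2) over the batch
    m = len(batch)
    gw = sum((w * x + b - y) * x for x, y in batch) / m
    gb = sum((w * x + b - y) for x, y in batch) / m
    return gw, gb

def sgd(data, batch_size, eta=0.1, epochs=20, seed=0):
    # batch_size=1 is online learning; batch_size=20 is mini-batch SGD
    rng = random.Random(seed)
    data = list(data)          # shuffle a copy, not the caller's list
    w, b = 0.0, 0.0
    updates = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for k in range(0, len(data), batch_size):
            gw, gb = grads(w, b, data[k:k + batch_size])
            w -= eta * gw      # w -> w' = w - eta * dC/dw
            b -= eta * gb      # b -> b' = b - eta * dC/db
            updates += 1
    return w, b, updates

def grad_spread(data, batch_size, trials=500, seed=1):
    # Standard deviation of the w-gradient estimate at w = b = 0 across random batches
    rng = random.Random(seed)
    samples = [grads(0.0, 0.0, rng.sample(data, batch_size))[0] for _ in range(trials)]
    mean = sum(samples) / trials
    return (sum((g - mean) ** 2 for g in samples) / trials) ** 0.5

data = make_data()
for size in (1, 20):
    w, b, updates = sgd(data, batch_size=size)
    print(f"batch size {size:2d}: w={w:.3f}, b={b:.3f}, updates={updates}, "
          f"gradient spread={grad_spread(data, size):.3f}")
```

With the same number of epochs, online learning makes 20 times as many updates (4000 versus 200 here), while the gradient estimate from a mini-batch of 20 has a much smaller spread (roughly a factor of $\sqrt{20}$ for independent examples).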

# Credits

This is heavily based on https://github.com/pytorch/tutorials

# What is PyTorch?

> **NOTE:** The last part of this lab uses CUDA. If you have a CUDA-enabled machine, read the README.md in the root of this repo to learn how to use nvidia-docker.


It's a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy to use the power of GPUs
-  A deep learning research platform that provides maximum flexibility
   and speed


# Getting Started

In this lab you will get a quick start on what PyTorch is and how to use it.

## 1. Tensors

Tensors are similar to NumPy's ndarrays, except that
Tensors can also be used on a GPU to accelerate computing.

In [1]:
import torch

Construct a 5x3 matrix, uninitialized

In [30]:
x = torch.Tensor(5, 3)
print(x)
# uninitialized: the values are whatever was already in memory, not random numbers

tensor([[1.4443e-30, 7.9454e-43, 1.4443e-30],
        [7.9454e-43, 1.4443e-30, 7.9454e-43],
        [1.4443e-30, 7.9454e-43, 1.4443e-30],
        [7.9454e-43, 1.4443e-30, 7.9454e-43],
        [1.4443e-30, 7.9454e-43, 1.4443e-30]])
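Because `torch.Tensor(5, 3)` only allocates memory, its contents should never be relied on. A minimal sketch of the explicit constructors available in recent PyTorch versions, contrasting an uninitialized tensor with initialized ones (the variable names are illustrative):

```python
import torch

# Uninitialized: only the shape is guaranteed, the contents are leftover memory
u = torch.empty(5, 3)

# Initialized alternatives
z = torch.zeros(5, 3)    # every entry is 0.0
o = torch.ones(5, 3)     # every entry is 1.0
r = torch.rand(5, 3)     # samples drawn uniformly from [0, 1)

print(u.shape, z.sum().item(), o.sum().item())
```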


In [31]:
x = torch.rand(5, 3)
print(x)
# random numbers drawn uniformly from [0, 1)

tensor([[0.3660, 0.0452, 0.7618],
        [0.4303, 0.0403, 0.5963],
        [0.3653, 0.3640, 0.3383],
        [0.4470, 0.6776, 0.2404],
        [0.8408, 0.0532, 0.4989]])


Get its size

In [32]:
print(x.size())

torch.Size([5, 3])


**NOTE**: `torch.Size` is in fact a tuple, so it supports the same operations that a tuple supports.
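A short sketch of what "supports the same operations as a tuple" means in practice, using a fresh random tensor of the same shape:

```python
import torch

x = torch.rand(5, 3)
size = x.size()

rows, cols = size                  # tuple unpacking
print(rows, cols)                  # 5 3
print(size[0], size[-1])           # indexing like a tuple: 5 3
print(size == (5, 3))              # compares equal to a plain tuple: True
print(isinstance(size, tuple))     # torch.Size subclasses tuple: True
print(len(size), x.shape == size)  # x.shape is an alias for x.size(): 2 True
```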

In [33]:
x[1:3] = 2
print(x)

tensor([[0.3660, 0.0452, 0.7618],
        [2.0000, 2.0000, 2.0000],
        [2.0000, 2.0000, 2.0000],
        [0.4470, 0.6776, 0.2404],
        [0.8408, 0.0532, 0.4989]])


# Assignment

Make use of the PyTorch docs <http://pytorch.org/docs/torch>:

1. Make a tensor of size (2, 17)
2. Make a torch.FloatTensor of size (3, 1)
3. Make a torch.LongTensor of size (5, 2, 1)
   - fill the entire tensor with 7s
4. Make a torch.ByteTensor of size (5,)
   - fill the middle 3 indices with ones such that it records [0, 1, 1, 1, 0]

#### Solution

In [35]:
a = torch.Tensor(2,17)
print(a)

tensor([[1.0561e-38, 1.0653e-38, 4.1327e-39, 8.9082e-39, 9.8265e-39, 9.4592e-39,
         1.0561e-38, 1.0653e-38, 1.0469e-38, 9.5510e-39, 1.0102e-38, 8.4490e-39,
         1.0102e-38, 1.0469e-38, 1.0102e-38, 9.6429e-39, 8.4490e-39],
        [1.0745e-38, 1.0102e-38, 9.6429e-39, 9.2756e-39, 8.4490e-39, 1.1112e-38,
         8.9082e-39, 9.6429e-39, 8.4490e-39, 1.0194e-38, 1.0745e-38, 9.2755e-39,
         8.4490e-39, 9.6429e-39, 9.2755e-39, 1.0469e-38, 1.0469e-38]])


In [57]:
b = torch.FloatTensor(3, 1)
print(b)

tensor([[0.],
        [0.],
        [0.]])


In [58]:
c = torch.LongTensor(5,2,1)
print(c.shape)

torch.Size([5, 2, 1])


In [59]:
c = torch.full((5,2,1),7,dtype=torch.int64)
print(c)

tensor([[[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]],

        [[7],
         [7]]])


In [60]:
d = torch.ByteTensor(5).zero_()  # new memory is not guaranteed to be zero, so clear it explicitly
print(d)

tensor([0, 0, 0, 0, 0], dtype=torch.uint8)


In [63]:
d[1:4]=1
print(d)

tensor([0, 1, 1, 1, 0], dtype=torch.uint8)
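The same assignment can also be sketched with factory functions and an explicit `dtype`, the style the current PyTorch documentation generally uses instead of the legacy `torch.FloatTensor`/`torch.LongTensor`/`torch.ByteTensor` constructors. This assumes the default floating-point dtype has not been changed from `float32`:

```python
import torch

a = torch.empty(2, 17)                           # 1. size (2, 17), default dtype float32
b = torch.empty(3, 1, dtype=torch.float32)       # 2. float32 is the dtype of torch.FloatTensor
c = torch.full((5, 2, 1), 7, dtype=torch.int64)  # 3. int64 is the dtype of torch.LongTensor, filled with 7s
d = torch.zeros(5, dtype=torch.uint8)            # 4. uint8 is the dtype of torch.ByteTensor, start from zeros
d[1:4] = 1                                       #    set the middle three entries
print(d)  # tensor([0, 1, 1, 1, 0], dtype=torch.uint8)
```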


## 2. Operations
There are multiple syntaxes for operations. Let's see addition as an example:

### 2.1 Addition: syntax 1

In [65]:
y = torch.rand(5, 3)
print(x + y)

tensor([[0.6506, 0.2786, 1.4628],
        [2.9307, 2.9672, 2.1787],
        [2.2699, 2.7400, 2.5253],
        [0.7027, 1.0780, 0.8871],
        [1.1721, 0.3413, 0.5439]])


### 2.2 Addition: syntax 2

In [66]:
print(torch.add(x, y))

tensor([[0.6506, 0.2786, 1.4628],
        [2.9307, 2.9672, 2.1787],
        [2.2699, 2.7400, 2.5253],
        [0.7027, 1.0780, 0.8871],
        [1.1721, 0.3413, 0.5439]])


### 2.3 Addition: giving an output tensor

In [67]:
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[0.6506, 0.2786, 1.4628],
        [2.9307, 2.9672, 2.1787],
        [2.2699, 2.7400, 2.5253],
        [0.7027, 1.0780, 0.8871],
        [1.1721, 0.3413, 0.5439]])


### 2.4 Addition: in-place

Adds `x` to `y` in place:

In [68]:
y.add_(x)
print(y)

tensor([[0.6506, 0.2786, 1.4628],
        [2.9307, 2.9672, 2.1787],
        [2.2699, 2.7400, 2.5253],
        [0.7027, 1.0780, 0.8871],
        [1.1721, 0.3413, 0.5439]])


**NOTE**: Any operation that mutates a tensor in-place is post-fixed with an `_`. For example: `x.copy_(y)`, `x.t_()`, will change `x`.
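A small sketch of the difference, using fixed values so the effect of each call is easy to check (`tolist()` is used only to print plain Python lists):

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[10., 20.], [30., 40.]])

s = x.add(y)         # out-of-place: returns a new tensor, x is unchanged
print(x.tolist())    # [[1.0, 2.0], [3.0, 4.0]]

x.add_(y)            # in-place: x itself now holds x + y
print(x.tolist())    # [[11.0, 22.0], [33.0, 44.0]]

x.copy_(y)           # in-place: overwrite x with the values of y
x.t_()               # in-place transpose of x
print(x.tolist())    # [[10.0, 30.0], [20.0, 40.0]]
```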

You can use standard NumPy-like indexing with all the bells and whistles!

In [74]:
print(x[2:, 1])

tensor([2.0000, 0.6776, 0.0532])


In [70]:
print(x)

tensor([[0.3660, 0.0452, 0.7618],
        [2.0000, 2.0000, 2.0000],
        [2.0000, 2.0000, 2.0000],
        [0.4470, 0.6776, 0.2404],
        [0.8408, 0.0532, 0.4989]])
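A few more NumPy-style indexing forms, sketched on a small integer tensor built with `torch.arange` so the results are predictable (the tensor is illustrative, not from the lab):

```python
import torch

x = torch.arange(12).reshape(3, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

print(x[1, 2].item())               # single element: 6
print(x[:, 1].tolist())             # a column: [1, 5, 9]
print(x[-1].tolist())               # negative index selects the last row: [8, 9, 10, 11]
print(x[::2, ::2].tolist())         # strided slice: [[0, 2], [8, 10]]
print(x[x > 8].tolist())            # boolean mask: [9, 10, 11]
print(x[[0, 2], [1, 3]].tolist())   # integer-array indexing picks (0, 1) and (2, 3): [1, 11]
```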


**Read later:** 100+ Tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random numbers, etc., are described here: <http://pytorch.org/docs/torch>

# Assignment

1. multiplication of two tensors (see [torch.Tensor.mul](http://pytorch.org/docs/master/tensors.html#torch.Tensor.mul))
2. do the same, but in place
3. division of two tensors (see [torch.Tensor.div](http://pytorch.org/docs/master/tensors.html#torch.Tensor.div))
4. perform a matrix multiplication of two tensors of size (2, 4) and (4, 2)

#### Solution

In [84]:
e = torch.rand(5,1)
f = torch.rand(5,1)
# element-wise (not matrix) product; equivalent to e * f or torch.mul(e, f)
g = torch.Tensor.mul(e, f)
print(g)

tensor([[0.3807],
        [0.0601],
        [0.0625],
        [0.0516],
        [0.1597]])


In [85]:
e.mul_(f)  # in place: e now holds e * f
print(e)

tensor([[0.3807],
        [0.0601],
        [0.0625],
        [0.0516],
        [0.1597]])


In [86]:
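h = torch.Tensor.div(e, f)  # element-wise division; e was overwritten with e * f above, so this gives back the original e (up to rounding)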
print(h)

tensor([[0.4351],
        [0.2301],
        [0.0819],
        [0.3761],
        [0.4125]])
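Since the tensors above are random, the results are hard to check by eye. A sketch of the same three operations on fixed, illustrative values, which also shows that dividing by `b` undoes an in-place multiplication by `b`:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[2., 2.], [4., 8.]])

print(torch.mul(a, b).tolist())  # element-wise product, same as a * b: [[2.0, 4.0], [12.0, 32.0]]
print(torch.div(a, b).tolist())  # element-wise quotient, same as a / b: [[0.5, 1.0], [0.75, 0.5]]

a.mul_(b)                        # in place: a now holds a * b
print(a.div(b).tolist())         # dividing by b recovers the original: [[1.0, 2.0], [3.0, 4.0]]
```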


In [94]:
x1 = torch.rand(4,2)
print(x1.shape)
x2 = torch.rand(2,4)
print(x2.shape)
print(torch.mm(x1,x2))
# torch.mm = matrix multiplication; here (4, 2) x (2, 4) gives a (4, 4) result

torch.Size([4, 2])
torch.Size([2, 4])
tensor([[0.4325, 0.0910, 0.4308, 0.3143],
        [0.9888, 0.2242, 0.9787, 0.7407],
        [0.1463, 0.0080, 0.1544, 0.0752],
        [0.1599, 0.0255, 0.1624, 0.1051]])
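The assignment asks for a (2, 4) tensor times a (4, 2) tensor, which yields a (2, 2) result. A sketch of that case with random inputs, also checking that the `@` operator and `torch.matmul` agree with `torch.mm` for 2-D tensors:

```python
import torch

m1 = torch.rand(2, 4)
m2 = torch.rand(4, 2)

p = torch.mm(m1, m2)  # (2, 4) x (4, 2) -> (2, 2)
print(p.shape)        # torch.Size([2, 2])

# For 2-D tensors, @ and torch.matmul compute the same matrix product
print(torch.allclose(p, m1 @ m2), torch.allclose(p, torch.matmul(m1, m2)))  # True True
```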


## 3. Numpy Bridge

Converting a torch Tensor to a numpy array and vice versa is a breeze.

A CPU torch Tensor and the NumPy array converted from it share their underlying memory, so changing one changes the other.

### 3.1 Converting torch Tensor to numpy Array

In [95]:
a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])


In [96]:
b = a.numpy()
print(b)

[1. 1. 1. 1. 1.]


See how the NumPy array changes in value: `numpy()` returns an array that shares memory with the original tensor, not a copy.

In [97]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


### 3.2 Converting numpy Array to torch Tensor

See how changing the NumPy array changes the torch Tensor automatically:

In [99]:
import numpy as np
a = np.ones(5)
# creates a tensor that shares memory with the NumPy array
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
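If the tensor should not follow later changes to the array, copy the data instead. A sketch contrasting `torch.from_numpy`, which shares memory, with `torch.tensor`, which always copies its input (the array here is illustrative):

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)  # shares memory with a, no copy
copied = torch.tensor(a)      # copies the data into new tensor memory

a[0] = 5.0                    # modify the NumPy array in place
print(shared.tolist())        # sees the change: [5.0, 1.0, 1.0]
print(copied.tolist())        # unaffected: [1.0, 1.0, 1.0]
print(shared.dtype)           # inherited from the NumPy array: torch.float64
```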


# Assignment

1. create a tensor of size (5, 2) containing ones
2. now convert it to a numpy array
3. now convert it back to a torch tensor

In [101]:
x3 = torch.ones(5,2)
print(x3)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])


In [109]:
x4 = x3.numpy()
x4

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]], dtype=float32)

In [110]:
x5 = torch.from_numpy(x4)
print(x5)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])


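Tensors on the CPU can be converted to NumPy and back; in current PyTorch releases this includes CharTensor (`int8`). A tensor on the GPU must first be copied to the CPU with `.cpu()`.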

## 4 CUDA Tensors

Tensors can be moved onto the GPU using the `.cuda()` method.
A GPU is not required for this lab; check the `README.md` for details on how to use a GPU with Docker.

In [111]:
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    z = x + y
    # The tensors now live on the GPU: they print with device='cuda:0'
    # and their type is torch.cuda.FloatTensor.
    # .cuda() returns a copy of the tensor in GPU memory, so we reassign x and y.
    print(x)
    print(y)
    print(z)
else:
    print("CUDA not available on your machine.")

tensor([[0.3660, 0.0452, 0.7618],
        [2.0000, 2.0000, 2.0000],
        [2.0000, 2.0000, 2.0000],
        [0.4470, 0.6776, 0.2404],
        [0.8408, 0.0532, 0.4989]], device='cuda:0')
tensor([[0.6506, 0.2786, 1.4628],
        [2.9307, 2.9672, 2.1787],
        [2.2699, 2.7400, 2.5253],
        [0.7027, 1.0780, 0.8871],
        [1.1721, 0.3413, 0.5439]], device='cuda:0')
tensor([[1.0167, 0.3238, 2.2246],
        [4.9307, 4.9672, 4.1787],
        [4.2699, 4.7400, 4.5253],
        [1.1497, 1.7556, 1.1274],
        [2.0129, 0.3945, 1.0427]], device='cuda:0')
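A common alternative to the `if`/`else` above is to choose a device once and move tensors with `.to(device)`, so the same code runs with or without a GPU. A minimal sketch, assuming nothing beyond a standard PyTorch install:

```python
import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5, 3, device=device)  # create directly on the chosen device
y = torch.rand(5, 3).to(device)      # or move an existing tensor there
z = x + y                            # both operands must be on the same device

# NumPy only works with CPU memory, so copy back before converting
z_np = z.cpu().numpy()
print(z.device, z_np.shape)
```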
