# Demystifying Neural Network 

## Pytorch tutorial
- What is pytorch?
- Why does everyone talk about GPUs in neural network programming?
- All about tensors
    - What is a tensor?
    - Tensor's rank, axes, and shape
        - Shape of a tensor 
    - 4 ways to create Pytorch tensor objects
    - Memory Sharing - different pytorch object constructors
    - Converting numpy arrays into pytorch tensor object
    - different tensor data types
- Tensor Manipulation: Flatten, Reshape, and Squeeze
- Broadcasting operation

### Intro
I started taking a course on *Natural Language Processing with Pytorch* last week at FastCampus, so in this notebook I am going to practice tensor manipulation with Pytorch. Although Pytorch is very similar to Numpy, I am still not 100% familiar with all of the tensor manipulation concepts and code (reshape, concatenate, broadcasting, etc.). This notebook covers only the basics, but I am going to add more Pytorch practice notebooks as the course goes on. Most of the information in this notebook comes from Deeplizard and the course material mentioned above. Check out the [Deeplizard](http://deeplizard.com/) website, since they have an awesome Pytorch tutorial for beginners!

### What is pytorch?
Pytorch is an open-source deep learning package for Python. Pytorch tensor objects (the primary data structure used by neural networks) behave much like Numpy `ndarray` objects and can be created from them, so Pytorch and Numpy share many basic operations. We can build a neural network with Numpy alone, but there are two problems: the computations cannot run on GPUs (Graphics Processing Units), and we have to derive and code the gradients ourselves. This might be OK if we have only a few layers, but it quickly becomes impractical for deep neural networks. On the other hand, a deep learning framework (in our case, Pytorch!) lets us build neural networks easily and computes the gradients for us. It can also run computations efficiently on GPUs. Therefore, Pytorch is targeted at people who (1) want to use the power of GPUs to build and train neural networks and (2) need a deep learning platform that offers flexibility and speed.
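To make the point about gradients concrete, here is a minimal sketch. The function `y = x**2 + 3*x` and the value `x = 2` are made up for illustration; the idea is only to contrast coding a derivative by hand with letting Pytorch's autograd compute it.

```python
import numpy as np
import torch

# Numpy: we have to derive and code the gradient ourselves.
x_np = np.array(2.0)
y_np = x_np ** 2 + 3 * x_np
manual_grad = 2 * x_np + 3          # dy/dx worked out by hand

# Pytorch: requires_grad=True asks autograd to track operations on x.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()                        # fills x.grad with dy/dx

print(manual_grad, x.grad)
```

Both values should agree. With one variable the manual route is easy, but a deep network has many parameters, which is where automatic gradients pay off.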

### Why does everyone talk about GPUs in neural network programming?
Frankly, I did not know what a GPU was until I took a course on deep learning. So, I want to briefly explain the GPU that everyone talks about in neural network programming.

We all have CPUs (Central Processing Units), since a CPU is the processor that powers most of the typical computations on our electronic devices. A CPU has relatively few cores, but each core is fast and good at general-purpose, sequential tasks. A GPU, on the other hand, has many more cores (sometimes almost 1000 times more than a CPU!), but each core is much slower. Its advantage is that it is really good at one thing: parallel computing.

Then, what is parallel computing? Parallel computing is a type of computation in which a big computation is broken into smaller independent computations that can be carried out at the same time. Matrix multiplication, the main operation within a neural network, is a great example. When we multiply matrices A and B, each entry of the result is the dot product of one row of A with one column of B, and no entry depends on any other, so every entry can be computed independently.

<img src="img/blog3_figure1.png" width="400" height="200">
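The independence of each entry can be sketched in a few lines. The matrices `A` and `B` below are arbitrary example values; the explicit loop is only there to show that every `C[i, j]` uses one row of `A` and one column of `B` and nothing else.

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])

# Each entry C[i, j] depends only on row i of A and column j of B,
# so all four entries could be computed at the same time.
C = torch.empty(2, 2)
for i in range(2):
    for j in range(2):
        C[i, j] = torch.dot(A[i, :], B[:, j])

print(C)
print(torch.equal(C, A @ B))  # the loop matches the built-in matrix product
```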

If we break a really big computation into smaller independent computations and feed them to a GPU, it can handle them very quickly by using its numerous cores simultaneously. That's great, right? Then why don't we just do every computation on a GPU? Well, a GPU can be much faster at computing than a CPU, but we have to be careful, since it is costly to move data from CPU memory to the GPU. If we send every computation to the GPU (even a simple one that does not need its power), overall performance might be slower.
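In Pytorch, moving data between devices is explicit. The snippet below is a rough sketch with arbitrary matrix sizes; it falls back to the CPU when no GPU is available, so it should run on any machine.

```python
import torch

# Use the GPU only if one is actually available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(1000, 1000)
b = torch.rand(1000, 1000)

# .to(device) moves the data to the chosen device (a no-op if it is already there).
# On a GPU machine this transfer is the costly step mentioned above.
a_dev = a.to(device)
b_dev = b.to(device)

c_dev = a_dev @ b_dev        # runs on the GPU if device is "cuda"
c = c_dev.cpu()              # bring the result back to CPU memory

print(device, c.shape)
```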

## All about tensors

### What is a tensor?
As I mentioned above, a tensor is the primary data structure used in neural networks. I first thought that a tensor was some kind of new data type that I would have to learn all over again, BUT it was not! The concept of a tensor is a "mathematical generalization of other more specific concepts." Basically, it generalizes the concepts of a number, a vector, and a matrix to any number of dimensions.

Computer science and mathematics use different names for the same concepts:  
(Computer Science, Mathematics) --> (number, scalar), (array, vector), (2d-array, matrix)  
The 2 terms inside each tuple mean the same thing; only the name differs between the two fields. The term tensor resolves this confusion: instead of saying nd-array (CS) or nd-matrix (mathematics), we can just say nd-tensor and cover every case.
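As a small illustration (the values are arbitrary), all of these objects are the same `torch.Tensor` type in Pytorch and differ only in their number of dimensions:

```python
import torch

scalar = torch.tensor(7)                 # number / scalar   -> 0 dimensions
vector = torch.tensor([1, 2, 3])         # array / vector    -> 1 dimension
matrix = torch.tensor([[1, 2], [3, 4]])  # 2d-array / matrix -> 2 dimensions
cube = torch.zeros(2, 3, 4)              # nd-tensor, here n = 3

for t in (scalar, vector, matrix, cube):
    print(type(t).__name__, t.dim(), tuple(t.shape))
```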

### Tensor's rank, axes, and shape
For me, one of the most confusing concepts was the shape of a tensor. By reading the materials on Deeplizard, I realized that a tensor's rank, axes, and shape are all fundamentally connected to the tensor's indices. Understanding how they are connected makes the shape much easier to reason about:
- rank of a tensor: the number of dimensions present within a tensor, which is also the number of indices needed to reach a single element
- axis of a tensor: a specific dimension of a tensor. A rank-4 tensor has 4 dimensions (4 axes)
- shape of a tensor: the length of each axis, listed in order

Below is an example:

In [1]:
import torch
import numpy as np

# we have a rank-1 tensor since it requires 1 index to get a component of d_1_tensor 
d_1_tensor = np.array([1,1,1,1,1])
print(d_1_tensor[0])

# here, we have a rank-2 tensor since it requires 2 indices to get a component of d_2_tensor
d_2_tensor = np.array([[1,1,1],
                       [2,2,2],
                       [3,3,3]])

print(d_2_tensor[1][0]) 


print(d_2_tensor.shape) # d_2_tensor has a shape of (3,3)
print(len(d_2_tensor.shape)) # the length of the shape is 2 showing its rank

1
2
(3, 3)
2


#### shape of a tensor

<img src="img/blog3_figure2.png" width="400" height="200">
The figure above shows typical tensor shapes for two scenarios: computer vision and natural language processing.
For example, in computer vision, B represents the batch size: how many samples are in a batch. To arrive at a specific pixel value, we pick a sample from the batch, then a color channel, then a height, and finally a width.
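The computer vision case can be sketched as follows. The sizes (8 images, 3 channels, 32 x 32 pixels) and the chosen indices are hypothetical; each indexing step removes one axis until a single pixel value is left.

```python
import torch

# Hypothetical batch: 8 images, 3 color channels, 32 x 32 pixels.
B, C, H, W = 8, 3, 32, 32
images = torch.zeros(B, C, H, W)

image = images[0]            # pick one sample      -> shape (C, H, W)
channel = image[1]           # pick a color channel -> shape (H, W)
row = channel[10]            # pick a height index  -> shape (W,)
pixel = row[5]               # pick a width index   -> a single value (rank 0)

print(image.shape, channel.shape, row.shape, pixel.shape)

# The same pixel can be reached in one indexing step.
print(images[0, 1, 10, 5] == pixel)
```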

In [49]:
x = torch.tensor([[[1,2,3],
                 [4,5,6]],
                 [[5,7,9],
                 [10,11,12]]])
x

tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[ 5,  7,  9],
         [10, 11, 12]]])

In [52]:
print(x.size(), x.shape) # size and shape give us the same value
print(len(x.shape)) # length of a tensor's shape is equal to the rank of a tensor

torch.Size([2, 2, 3]) torch.Size([2, 2, 3])
3


### 4 ways to create Pytorch tensor objects

There are 4 ways to get tensor objects via Pytorch:  
1) `torch.Tensor(data)`  
2) `torch.tensor(data)`  
3) `torch.as_tensor(data)`  
4) `torch.from_numpy(data)`  

I made a 1d-array with Numpy below and converted it to a tensor object with each of the 4 methods above. First, the `torch.Tensor` constructor always gives us the default tensor data type. Therefore, even though the data is made of integers, a tensor built with the `torch.Tensor` constructor has the data type `torch.float32`. The other three, `torch.tensor`, `torch.as_tensor`, and `torch.from_numpy`, keep the data type of the input (here `torch.int64`). It turns out that methods 2, 3, and 4 are factory functions. A factory function is a function that accepts parameter inputs and returns a particular type of object, which allows more dynamic object creation. Thus, methods 2, 3, and 4 infer the data type from the input. We can also set the data type explicitly with the `dtype` argument (accepted by `torch.tensor` and `torch.as_tensor`) when creating a tensor.

In [5]:
data = np.array([2, 3, 4])
print(torch.get_default_dtype()) # pytorch default data type is float32
t1 = torch.Tensor(data)
t2 = torch.tensor(data)
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

print(t1, t1.dtype)
print(t2, t2.dtype)
print(t3, t3.dtype)
print(t4, t4.dtype)

# if we pass input data as a numpy array, tensor constructor infers its data type from the input
print(torch.tensor(np.array([0.2, 0.45, -0.0014]))) 

torch.tensor([2, 3, 4], dtype = torch.float64) # we can also assign a data type

torch.float32
tensor([2., 3., 4.]) torch.float32
tensor([2, 3, 4]) torch.int64
tensor([2, 3, 4]) torch.int64
tensor([2, 3, 4]) torch.int64
tensor([ 0.2000,  0.4500, -0.0014], dtype=torch.float64)


tensor([2., 3., 4.], dtype=torch.float64)

### Memory Sharing - different pytorch object constructors
It gets interesting when we change values of the data without redefining above tensor objects. I'm going to change the values of the array as below and print out each created tensor object. 

In [6]:
data[0] = 0
data[1] = 1
data[2] = 2

Can you see the difference below? `t1` and `t2` were constructed with `torch.Tensor` and `torch.tensor`, respectively, and `t3` and `t4` with `torch.as_tensor` and `torch.from_numpy`, respectively. Changing the data did not affect `t1` and `t2`, but it did affect `t3` and `t4`. This difference comes from how memory is allocated by each creation method.

In [7]:
print(t1)
print(t2)
print(t3)
print(t4)

tensor([2., 3., 4.])
tensor([2, 3, 4])
tensor([0, 1, 2])
tensor([0, 1, 2])


While `torch.Tensor` and `torch.tensor` **copy** the input Numpy array, `torch.as_tensor` and `torch.from_numpy` **share** memory with it (`torch.as_tensor` shares memory only when it does not have to convert the data type or device). This means that we can move between Numpy and Pytorch seamlessly. Creating a tensor with `torch.as_tensor` or `torch.from_numpy` can be very fast because no data is copied. The flip side is that any change made to the Numpy data is reflected in the tensors created with `torch.as_tensor` and `torch.from_numpy`.
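Two details are worth checking in code: memory sharing works in both directions, and `torch.as_tensor` falls back to a copy when it has to convert the data type. The sketch below uses a fresh array with made-up values, and the variable names `copied`, `shared`, and `converted` are just for illustration.

```python
import numpy as np
import torch

data = np.array([2, 3, 4])

copied = torch.tensor(data)        # always copies
shared = torch.from_numpy(data)    # shares memory with data
# as_tensor shares memory only when no conversion is needed;
# asking for a different dtype forces a copy.
converted = torch.as_tensor(data, dtype=torch.float32)

# Sharing works in both directions: editing the tensor edits the array.
shared[0] = 100

print(data)        # first element changed through the shared tensor
print(copied)      # unaffected
print(converted)   # unaffected
print(np.shares_memory(data, shared.numpy()))
```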

### Converting numpy arrays into pytorch tensor object

In [34]:
d_1_tensor = torch.from_numpy(d_1_tensor)
print(d_1_tensor, type(d_1_tensor))

# pytorch tensor object can also be converted to numpy array
d_1_tensor = d_1_tensor.numpy()
print(d_1_tensor, type(d_1_tensor))

tensor([1, 1, 1, 1, 1]) <class 'torch.Tensor'>
[1 1 1 1 1] <class 'numpy.ndarray'>


### different tensor data types

In [35]:
torch.FloatTensor(3,3) # an uninitialized (3,3) float tensor: the values are whatever was in memory, not random samples

tensor([[ 0.0000e+00, -2.5244e-29,  6.0774e+10],
        [-8.5920e+09,  4.2039e-45,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00]])

In [37]:
ft = torch.FloatTensor([[1, 2],
                        [3, 4]])
ft # the input elements are integers, but FloatTensor converts them into float values

tensor([[1., 2.],
        [3., 4.]])

In [42]:
lt = torch.LongTensor([[12, 23],
                      [31, 46]])
print(lt) # LongTensor holds 64-bit integer values

bt = torch.ByteTensor([[4, 0, 0],
                       [0, 1, 3]])
print(bt) # ByteTensor holds unsigned 8-bit integer values

tensor([[12, 23],
        [31, 46]])
tensor([[4, 0, 0],
        [0, 1, 3]], dtype=torch.uint8)


- changing a tensor's data type with `x.long()`, `x.float()`, `x.byte()`

In [46]:
print(ft.long())
print(lt.float())
print(ft.byte())

tensor([[1, 2],
        [3, 4]])
tensor([[12., 23.],
        [31., 46.]])
tensor([[1, 2],
        [3, 4]], dtype=torch.uint8)
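Two things about these conversions are easy to miss: they return a new tensor instead of changing the original, and converting floats to integers drops the fractional part. A small sketch with made-up values, also showing the more general `x.to(dtype)` form:

```python
import torch

x = torch.tensor([1.7, -1.7, 3.0])

as_long = x.long()                 # float -> 64-bit integer, the fraction is dropped
as_double = x.to(torch.float64)    # .to(dtype) is the general form of .long(), .float(), ...

print(as_long, as_long.dtype)
print(as_double.dtype)
print(x.dtype)                     # the original tensor is unchanged
```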


### Tensor Manipulation: Flatten, Reshape, and Squeeze
#### reshaping operation  
- With Pytorch, we can reshape a tensor with `x.view(n,m)` and `x.reshape(n,m)`
- We can also change the rank with `squeeze`, `unsqueeze`, and `flatten`
    - `squeeze` removes all of the dimensions or axes that have a length of 1
    - `unsqueeze` adds a dimension with a length of 1
    - `flatten` removes all of the axes except for one, producing a tensor with a single axis that contains all of the elements. In other words, flattening a tensor gives a 1d-tensor of the same elements
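The effect of each operation is easiest to see on the shape. Here is a quick sketch on an arbitrary tensor of shape `(1, 3, 1, 4)`:

```python
import torch

t = torch.zeros(1, 3, 1, 4)

print(t.squeeze().shape)        # torch.Size([3, 4])           all length-1 axes removed
print(t.squeeze(0).shape)       # torch.Size([3, 1, 4])        only axis 0 removed
print(t.unsqueeze(0).shape)     # torch.Size([1, 1, 3, 1, 4])  new length-1 axis in front
print(t.flatten().shape)        # torch.Size([12])             a single axis with every element
```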

In [58]:
# because the tensor x contains 12 elements, view(12) and view(-1) give the same 1d-tensor
x.view(12), x.view(-1), x.numel() 

(tensor([ 1,  2,  3,  4,  5,  6,  5,  7,  9, 10, 11, 12]),
 tensor([ 1,  2,  3,  4,  5,  6,  5,  7,  9, 10, 11, 12]),
 12)
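As a side note, `view` and `reshape` are not fully interchangeable. `view` never copies data, so it only works when the memory layout allows it, while `reshape` copies when it has to. A small sketch on a transposed tensor (an arbitrary example, not taken from the course material):

```python
import torch

x = torch.arange(6).reshape(2, 3)
xt = x.t()                      # transpose: same memory, but no longer contiguous

print(xt.is_contiguous())       # False
print(xt.reshape(-1))           # works; copies the data when a view is not possible

try:
    xt.view(-1)                 # view never copies, so it refuses here
except RuntimeError as err:
    print("view failed:", type(err).__name__)
```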

In [64]:
y = torch.tensor([[[1,1,1],
                  [2,2,2],
                  [3,3,3]]])
y.shape

torch.Size([1, 3, 3])

In [67]:
# Let's make a flatten function using reshape and squeeze
def flatten(t):
    t = t.reshape(1,-1)
    t = t.squeeze()
    return t

In [71]:
flatten(y), y.flatten() # both give the same 1d tensor

(tensor([1, 1, 1, 2, 2, 2, 3, 3, 3]), tensor([1, 1, 1, 2, 2, 2, 3, 3, 3]))

In [72]:
y.reshape(1,-1) # reshape(1,-1) only reshapes, while flatten both reshapes and squeezes

tensor([[1, 1, 1, 2, 2, 2, 3, 3, 3]])

In [73]:
y.reshape(-1) # however, if we pass only -1, it gives us the same result as flatten

tensor([1, 1, 1, 2, 2, 2, 3, 3, 3])

- Let's say there are 4 images represented by the tensors t1, t2, t3, and t4, and that these 4 images make up one batch of input to a CNN. We know that we have to flatten this batch in order to feed it to the network. How can we do that?

In [116]:
t1 = torch.tensor([[1,1,1,1],
                   [1,1,1,1],
                   [1,1,1,1],
                   [1,1,1,1]])

t2 = torch.tensor([[2,2,2,2],
                  [2,2,2,2],
                  [2,2,2,2],
                  [2,2,2,2]])
t3 = torch.tensor([[3,3,3,3],
                  [3,3,3,3],
                  [3,3,3,3],
                  [3,3,3,3]])

t4 = torch.tensor([[4,4,4,4],
                  [4,4,4,4],
                  [4,4,4,4],
                  [4,4,4,4]])

In [112]:
t1.shape, t2.shape, t3.shape, t4.shape # each image is a 4 x 4 (H x W) 2d-tensor

(torch.Size([4, 4]),
 torch.Size([4, 4]),
 torch.Size([4, 4]),
 torch.Size([4, 4]))

In [117]:
t = torch.stack((t1,t2,t3,t4)) # Let's stack all 4 images so that we have 1 batch made of 4 images
t

tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2]],

        [[3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3]],

        [[4, 4, 4, 4],
         [4, 4, 4, 4],
         [4, 4, 4, 4],
         [4, 4, 4, 4]]])

In [118]:
t.shape

torch.Size([4, 4, 4])

In [119]:
# Note that the tensor has only rank 3.
# As we saw, an image tensor typically has rank 4 - (# of images, # of color channels, height, width)
# Thus, we insert a color channel axis at index 1 (for simplicity, let's say that we have only 1 color channel)
t = t.unsqueeze(1) # we 'unsqueeze' at axis 1 so that a color channel is included in our tensor
t.shape

torch.Size([4, 1, 4, 4])

In [120]:
t[0] # we can access the first image

tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]])

In [121]:
t[0][0] # first color channel in the first image

tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])

In [122]:
t[0][0][0] # the first row of pixels in the first color channel of the first image

tensor([1, 1, 1, 1])

- However, if we just do `t.reshape(-1)`, we cannot get an individual prediction for each image, since `t.reshape(-1)` ignores the boundaries between images and makes a single 1d-tensor. To prevent that, we can use the `start_dim` parameter of the `flatten` function. With `start_dim = 1`, the batch axis is preserved and only the color channel, height, and width axes are flattened.

In [124]:
# start_dim = 1 keeps axis 0 (the batch) and flattens everything after it
t.flatten(start_dim = 1) 

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]])

In [125]:
t.flatten(start_dim = 1).shape # batch size is preserved! 

torch.Size([4, 16])
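The same result can also be written with `reshape` by keeping the size of axis 0 and letting `-1` infer the rest. The sketch below uses a stand-in tensor named `batch` with the same shape as the batch above, filled with arbitrary values:

```python
import torch

# Stand-in batch with the same shape as above: 4 images, 1 channel, 4 x 4 pixels.
batch = torch.arange(4 * 1 * 4 * 4).reshape(4, 1, 4, 4)

flat_all = batch.reshape(-1)                   # shape (64,): image boundaries are lost
flat_per_image = batch.flatten(start_dim=1)    # shape (4, 16): one row per image
same_thing = batch.reshape(batch.size(0), -1)  # keep axis 0, infer the rest

print(flat_all.shape, flat_per_image.shape)
print(torch.equal(flat_per_image, same_thing))
```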

### broadcasting operation
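- You have probably seen the concept of broadcasting when you learned Numpy. Broadcasting describes how tensors with different shapes are treated during element-wise operations: the smaller tensor is (conceptually) expanded to match the shape of the larger one, so the two shapes do not have to be identical, only compatible.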

In [126]:
t1

tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])

In [127]:
t1 + 2 # you see, when we add 2 to t1, 2 is added to each element of t1

tensor([[3, 3, 3, 3],
        [3, 3, 3, 3],
        [3, 3, 3, 3],
        [3, 3, 3, 3]])

We can see this with the `np.broadcast_to` function, which expands the value 2 to match the shape of t1. Conceptually, this is what goes on under the hood when we do `t1 + 2`.

In [130]:
np.broadcast_to(2, t1.shape) 

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])
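The scalar case generalizes to tensors whose shapes differ but are compatible. A minimal sketch with made-up values (the names `m`, `row`, and `col` are only for illustration): axes are compared from the right, and an axis of length 1, or a missing axis, is stretched to match.

```python
import torch

m = torch.ones(4, 4, dtype=torch.int64)
row = torch.tensor([10, 20, 30, 40])        # shape (4,)
col = torch.tensor([[1], [2], [3], [4]])    # shape (4, 1)

print(m + row)            # row is stretched across all 4 rows
print(m + col)            # col is stretched across all 4 columns
print((row + col).shape)  # (4,) with (4, 1) broadcasts to (4, 4)

# Shapes that cannot be aligned raise an error instead of broadcasting.
try:
    m + torch.tensor([1, 2, 3])
except RuntimeError as err:
    print("cannot broadcast:", type(err).__name__)
```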