# PyTorch Tutorial with Google Colab


In this tutorial, we will first review the mathematics behind vectors, matrices and tensors. Then we learn the basics of PyTorch and show you how to construct a simple deep neural network (DNN), specifically a small convolutional neural network (CNN).

After finishing this tutorial, you will be able to create, transpose, and squeeze tensors, and change the order of their dimensions. You will also understand the basic structure of a neural network.

Important: You need to run all cells in this notebook in order. Otherwise, you may not import the right libraries, and the code may not run.


# Background: the mathematics of scalars, vectors, matrices and tensors 



In mathematics, **tensors** are a multi-dimensional generalization of **scalars**, **vectors** and **matrices**.
(NB: In PyTorch, the `Tensor` data structure is used to implement scalars, vectors, matrices and tensors alike. We will get to that below.)


We assume that you have come across scalars, vectors, and matrices before: 

* **Scalar** is just a fancy term for a single (natural/integer/real/complex) number. You can think of a scalar as a **zero-dimensional array**. We typically assume we are dealing with real-valued scalars $x\in \mathbb{R}$, e.g. $x=3.4$ or $x=2.0$.

* An **$n$-dimensional vector** $\mathbf{x}$ is a **one-dimensional array** of $n$ scalars, e.g. $$\mathbf{x} = [2, 5, 10]$$ with elements $x_1 = 2$, $x_2 = 5$, $x_3 = 10$ and $n=3$.

* Mathematically, a vector represents a point in $n$-dimensional space, although we will sometimes just think of it as a list of $n$ numbers. We typically assume that all components of a vector $\mathbf{x}$ are scalars of the same type (e.g. real numbers), which allows us to write $\mathbf{x} \in \mathbb{R}^n$ (for a real vector).

* Note that the scalar $x$ is not the same as the one-dimensional vector $$\mathbf{x} = [x]$$ (you can multiply any $n$-dimensional vector $\mathbf{y}$ by a scalar $x$, but the dot product of two vectors $\mathbf{x}$ and $\mathbf{y}$ is only defined when they have the same dimensionality).

* An **$n\times m$-dimensional matrix** $X$ is a **two-dimensional array** of scalars with $n$ rows and $m$ columns, e.g.:

$$A = \begin{bmatrix}
   1 & 2 \\
   3 & 4 \\ 
   5 & 6 \\
\end{bmatrix}$$


* Here, the matrix $A$ has three **rows** ($[1,2]$, $[3,4]$, and $[5,6]$) and two **columns** ($[1,3,5]$ and $[2,4,6]$). Note that each row and each column can be seen as a vector, so you can think of a matrix as an array of vectors.

* However, rows and columns are not interchangeable. The **size** of matrix $A$ is $3 \times 2$ (number of rows $\times$ number of columns), while the size of matrix $B$, whose rows are the columns of $A$, is $2 \times 3$:

$$B = \begin{bmatrix}
   1 & 3 & 5 \\
   2 & 4 & 6\\ 
\end{bmatrix}$$


* Mathematically, an $n \times m$ matrix $X$ maps points in an $m$-dimensional space to points in an $n$-dimensional space via matrix multiplication, $\mathbf{y} = X\mathbf{x}$ (more on that below). We again assume that all $n\times m$ elements of a matrix are scalars of the same type (typically reals) and write $X \in \mathbb{R}^{n \times m}$.
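As a worked example, multiplying the $3 \times 2$ matrix $A$ from above by a 2-dimensional (column) vector yields a 3-dimensional vector; each entry of the result is the dot product of one row of $A$ with the vector:

$$A\mathbf{x} = \begin{bmatrix}
   1 & 2 \\
   3 & 4 \\
   5 & 6
\end{bmatrix}
\begin{bmatrix}
   1 \\
   1
\end{bmatrix} = \begin{bmatrix}
   1 \cdot 1 + 2 \cdot 1 \\
   3 \cdot 1 + 4 \cdot 1 \\
   5 \cdot 1 + 6 \cdot 1
\end{bmatrix} = \begin{bmatrix}
   3 \\
   7 \\
   11
\end{bmatrix}$$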




* **Tensors** generalize vectors and matrices to **multi-dimensional arrays**. 
You can think of a three-dimensional tensor as a vector of matrices (or a matrix of vectors, depending on which way you look at it). 






# Tensors as data structures in PyTorch



In PyTorch,  a Tensor (`torch.Tensor`) is a multi-dimensional matrix containing elements of a **single data type** (typically floating point or integer).  

PyTorch Tensors are similar to NumPy arrays, but Tensors are better suited for deep learning: they can run on GPUs (and other hardware accelerators), and they support automatic differentiation, which lets us compute gradients and update their values in a very straightforward manner.

(A Tensor on the CPU and a NumPy array can share the same underlying memory, in which case changing one changes the other. You can ignore this if you only work with PyTorch tensors, or if your tensors live on a GPU; we recommend both.)

When we implement neural nets in PyTorch, we use tensors to encode the **inputs** and **outputs** of the model, as well as the model’s **parameters** (sets of weights).


## Useful attributes of Tensors:
- The `type(x)` function returns the Python type of `x` (`torch.Tensor`), and the `x.dtype` attribute returns the data type of its elements (e.g. `torch.int64`). Both can be very useful for **debugging purposes**.

- The `x.shape` attribute indicates the size of each dimension of Tensor `x`.

- The `x.device` attribute tells you which device Tensor `x` is stored on (CPU or GPU).
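To make the difference between these attributes concrete, here is a minimal sketch: `type(x)` reports the Python class, while `x.dtype` reports the element type. It also uses `.to(...)` to move a tensor to a GPU, falling back to the CPU when no CUDA device is available (the variable names are only for illustration).

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(type(x))     # the Python class: torch.Tensor
print(x.dtype)     # the element type: torch.float32 (default for Python floats)
print(x.shape)     # torch.Size([2, 2])
print(x.device)    # cpu (tensors are created on the CPU by default)

# .to() returns a tensor on the target device (x itself if it is already there)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_dev = x.to(device)
print(x_dev.device)
```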

# Setup: Import Libraries and Select Device (CPU/GPU)

To use PyTorch, you need to import the necessary libraries and decide whether to use a CPU or a GPU (CUDA).

In [1]:
from collections import defaultdict
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
import torch.optim as optim
from torchtext import data, datasets
import torchvision.transforms as transforms
import torchvision

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

print('Using device:', device) 
# to use CUDA in Colab, go to Runtime -> Change runtime type in the menu bar and select a GPU

Using device: cuda:0


# Creating vectors, matrices and tensors as `torch.Tensor`s

We will now walk through a few simple examples showing how to use `torch.Tensor` to implement one-dimensional vectors, two-dimensional matrices, and finally three-dimensional tensors.


A **vector** (e.g. [1, 25, 30, 6]) can be represented as a one-dimensional Tensor, which you can create by passing a list of numbers to the `torch.tensor(list)` constructor. 


The `shape` attribute of any Tensor is a tuple (more precisely, a `torch.Size`, which is a subclass of `tuple`). For a one-dimensional Tensor, the `shape` tuple has a single entry, indicating how many elements the Tensor (vector) has.

A Tensor's `dtype` attribute tells you the data type (e.g. 64-bit integers) of its elements, and the `device` attribute tells you whether the Tensor is stored on a CPU or GPU. 

In [2]:
# create a tensor from a list and print out its properties
# start from a vector (one-dimensional)
data = [1,25,30,6] # create a list
x_data = torch.tensor(data) # use torch.tensor to create a tensor from a list
print(f"Tensor x_data:\n {x_data}")  # print out the tensor
print(f"Shape of tensor x_data: {x_data.shape}") # the size of each dimension
print(f"Datatype of tensor x_data: {x_data.dtype}") # the data type of the elements
print(f"Device tensor x_data is stored on: {x_data.device}") # cpu or cuda

Tensor x_data:
 tensor([ 1, 25, 30,  6])
Shape of tensor x_data: torch.Size([4])
Datatype of tensor x_data: torch.int64
Device tensor x_data is stored on: cpu



A **matrix** can be implemented as a two-dimensional Tensor. To read in the elements of the matrix, we pass a list of row vectors.


In [3]:
# Let us create a 3x2 matrix, consisting of 3 rows and 2 columns
data = [[1, 2],[3, 4], [5,6]] # To create a matrix with specific elements, we pass a list of row vectors.
x = torch.tensor(data) # use torch.tensor to create a tensor from a list
print(f"Tensor x:\n {x}")  # print out the tensor
print(f"Shape of Tensor x: {x.shape}") # the size of each dimension
print(f"Data type of Tensor x: {x.dtype}") # the data type of the elements
print(f"Device that Tensor x is stored on: {x.device}") # cpu or cuda

Tensor x:
 tensor([[1, 2],
        [3, 4],
        [5, 6]])
Shape of Tensor x: torch.Size([3, 2])
Data type of Tensor x: torch.int64
Device that Tensor x is stored on: cpu


Finally, we also show the case of a **tensor** (a three-dimensional Tensor). The third dimension is just one more dimension in addition to rows and columns. In other words, if you consider a two-dimensional matrix as a list of lists (two levels of nesting), then a three-dimensional tensor is a list of lists of lists (three levels of nesting).

In [4]:
# a three-dimensional tensor
data = [[[1, 2],[3, 4]], [[5,6], [7,8]]] # a list of lists of lists
x_data = torch.tensor(data) # use torch.tensor to create a tensor from a list
print(f"The Tensor x_data:\n {x_data}") # print the content of this tensor
print(f"Shape of tensor x_data: {x_data.shape}") # the shape now has three entries, one per dimension
print(f"Datatype of tensor x_data: {x_data.dtype}") # the data type of the elements
print(f"Device that tensor x_data is stored on: {x_data.device}") # cpu or cuda


The Tensor x_data:
 tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])
Shape of tensor x_data: torch.Size([2, 2, 2])
Datatype of tensor x_data: torch.int64
Device that tensor x_data is stored on: cpu


#### Conversion between NumPy Arrays and Tensors

You can also convert a NumPy array to a Tensor with `torch.from_numpy()`, and convert a CPU Tensor back to a NumPy array with `.numpy()`.

In [5]:
# create a tensor from a numpy array
np_array = np.array(data)
print(f"The content of this numpy array: \n{np_array}\n")
x_np = torch.from_numpy(np_array)
print(f"Shape of this tensor: {x_np.shape}\n")
# convert a tensor to a numpy array
np_array_convert = x_np.numpy()
print(f"The content of this converted numpy array: \n{np_array_convert}\n")

The content of this numpy array: 
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

Shape of this tensor: torch.Size([2, 2, 2])

The content of this converted numpy array: 
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
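The memory sharing between CPU Tensors and NumPy arrays mentioned earlier can be checked directly. In this minimal sketch, `torch.from_numpy()` and `.numpy()` share memory with their source, whereas `torch.tensor()` makes an independent copy:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
shared = torch.from_numpy(arr)   # shares memory with arr (no copy)
copied = torch.tensor(arr)       # torch.tensor always copies the data

arr[0] = 100                     # modify the NumPy array in place
print(shared)                    # first element is now 100: the tensor sees the change
print(copied)                    # still starts with 1: the copy is independent

back = shared.numpy()            # .numpy() on a CPU tensor also shares memory
shared[1] = -5                   # modify the tensor in place
print(back)                      # the NumPy array now contains -5 as well
```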



#### Create Tensors with Certain Properties

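Below we create tensors filled with ones and with random numbers, first by copying the properties of an existing tensor (the `*_like` functions) and then by specifying a shape directly.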

In [6]:
# create tensors with certain properties (ones, random)
x_ones = torch.ones_like(x_data) # retains the properties of x_data (shape, dtype and device)
print(f"Ones Tensor: \n {x_ones} \n") # a tensor of the same shape filled with 1's
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the integer dtype of x_data (random numbers need a floating-point dtype), but retains the shape
print(f"Random Tensor: \n {x_rand} \n") # a tensor of the same shape filled with random numbers from [0, 1)

Ones Tensor: 
 tensor([[[1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1]]]) 

Random Tensor: 
 tensor([[[0.5751, 0.2491],
         [0.9667, 0.5305]],

        [[0.9325, 0.1260],
         [0.7124, 0.3086]]]) 



In [7]:
# create tensors with a specific "shape" (given as a tuple)
shape = (2,5,) # the trailing comma is optional: (2,5,) == (2,5), so the tensors are two-dimensional
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} with shape {rand_tensor.shape}\n")
print(f"Ones Tensor: \n {ones_tensor} with shape {ones_tensor.shape}\n")
print(f"Zeros Tensor: \n {zeros_tensor} with shape {zeros_tensor.shape}")

Random Tensor: 
 tensor([[0.1076, 0.6719, 0.4005, 0.0096, 0.8671],
        [0.3910, 0.9862, 0.6260, 0.0324, 0.1765]]) with shape torch.Size([2, 5])

Ones Tensor: 
 tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]) with shape torch.Size([2, 5])

Zeros Tensor: 
 tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]) with shape torch.Size([2, 5])


# Operations on Tensors

#### Indexing and Slicing
How can we access and change parts of a given Tensor `tensor`? (For now, we assume the tensor is two-dimensional.)

**Accessing the content of a tensor:**
*   Note that indices start at 0, not at 1!
*  `tensor[i,j]` accesses the element in row $i$ and column $j$.
* `tensor[i]` selects row $i$ of the tensor. Note that we can omit the column index, because it comes after the row index.
* `tensor[:, i]` or `tensor[..., i]` selects column $i$. Here we must explicitly select all rows (with `:` or `...`), because the row index comes before the column index. (`...` stands for "all remaining dimensions", so it is equivalent to `:` only for two-dimensional tensors.)
* For tensors with more dimensions, we specify one index (or slice) per dimension to get the part we need.


**Changing the content of a tensor:**

First, we identify the part of the tensor that we want to change. Then we assign a new value to it using `tensor[i] = new_tensor` or `tensor[i] = new_scalar`. A new tensor must have the **same shape** as the selected part (or a shape that can be broadcast to it). A new scalar changes **all** elements in the selected part to that scalar.
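The same rules extend to tensors with more dimensions: one index or slice per dimension, with `...` filling in the remaining ones. Here is a minimal sketch on a 3-D tensor (built with `torch.arange` and `reshape` purely for illustration, so that every element is easy to identify):

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)   # values 0..23, shape (2, 3, 4)

print(t[1, 2, 3])          # one index per dimension -> a single element: tensor(23)
print(t[0].shape)          # the first 3x4 "matrix": torch.Size([3, 4])
print(t[:, :, 0].shape)    # index 0 of the last dimension: torch.Size([2, 3])
print(t[..., 0].shape)     # same selection; ... stands for all leading dimensions

t[0, 0] = -1               # a scalar fills the whole selection (row 0 of matrix 0)
t[1] = torch.zeros(3, 4, dtype=t.dtype)   # a tensor must fit the selected shape (3, 4)
print(t)
```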

In [8]:
tensor = torch.rand(2, 4)
print('THE INPUT TENSOR WITH SHAPE', tensor.shape, ':\n\n', tensor)
print('\nACCESSING ELEMENTS OF THIS TENSOR\n')
print('-- The element in the 1st row and 2nd column (note that indices start at 0): ', tensor[0,1])

print('-- The first row:\n\t',  tensor[0], 'or\n\t', tensor[0, :], 'or\n\t', tensor[0,...]) # three different ways to access the first row

print('-- The last column:\n\t', tensor[:,-1], 'or\n\t', tensor[..., -1]) # ... is the same as : for selecting all
tensor[1, :] = 0

print('\nCHANGING ELEMENTS OF THIS TENSOR\n\n')
print('-- Setting the last row to a zero vector:\n\t', tensor)
tensor[1, :] = torch.tensor([1,2,3,4]) # the new values must fit the shape of the selected row
print('-- Setting the last row to the vector (1, 2, 3, 4):\n\t', tensor)

THE INPUT TENSOR WITH SHAPE torch.Size([2, 4]) :

 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [0.1577, 0.1536, 0.7996, 0.8438]])

ACCESSING ELEMENTS OF THIS TENSOR

-- The element in the 1st row and 2nd column (note that indices start at 0):  tensor(0.4600)
-- The first row:
	 tensor([0.0874, 0.4600, 0.5120, 0.9194]) or
	 tensor([0.0874, 0.4600, 0.5120, 0.9194]) or
	 tensor([0.0874, 0.4600, 0.5120, 0.9194])
-- The last column:
	 tensor([0.9194, 0.8438]) or
	 tensor([0.9194, 0.8438])

CHANGING ELEMENTS OF THIS TENSOR


-- Setting the last row to a zero vector:
	 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [0.0000, 0.0000, 0.0000, 0.0000]])
-- Setting the last row to the vector (1, 2, 3, 4):
	 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000]])


#### Concatenate the Tensors: `torch.cat` & `torch.stack`

- `torch.cat`: 

  - Concatenates a given sequence of tensors in the **given** dimension. **All** tensors must either have the **same shape** (except in the concatenating dimension) or be **empty**.
  - *When to use this operation?* You have several tensors with the same shape (except in the concatenating dim) and you want to **extend** the concatenating dim.


- `torch.stack`: 
  - Concatenates a sequence of tensors along a **new** dimension (inserted at a particular location). **All** tensors need to be of the **same** size.
  - *When to use this operation?* You have several tensors with exactly the same size and you want to **add a new axis** along which to combine them.
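The difference in shape requirements shows up as soon as the inputs are not identical. In this minimal sketch, `torch.cat` happily joins a `(2, 4)` and a `(3, 4)` tensor along dim 0, while `torch.stack` rejects them because their shapes differ:

```python
import torch

a = torch.zeros(2, 4)
b = torch.ones(3, 4)              # differs from a only in dimension 0

c = torch.cat([a, b], dim=0)      # sizes along dim 0 add up: 2 + 3 = 5
print(c.shape)                    # torch.Size([5, 4])

stack_ok = True
try:
    torch.stack([a, b], dim=0)    # stack requires identical shapes
except RuntimeError as err:
    stack_ok = False
    print("torch.stack failed:", err)

s = torch.stack([a, a], dim=0)    # identical shapes: a new leading axis of size 2
print(s.shape)                    # torch.Size([2, 2, 4])
```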


In [9]:
# cat: dim specifies the (existing) dimension to concatenate along; it must be smaller than the number of dims of the given tensors
t1 = torch.cat([tensor, tensor, tensor], dim=0)  # we cat three tensors along existing dim=0, so dim=0 becomes 2*3=6
print('THE INPUT TENSOR WITH SHAPE', tensor.shape, ':\n\n', tensor, '\n')

print('AFTER WE CONCATENATE THREE COPIES OF THIS TENSOR AT DIM=0 (rows): new shape =', t1.shape, '\n\n', t1, '\n')

t2 = torch.cat([tensor, tensor], dim=1)  # we cat two tensors along existing dim=1, so dim=1 becomes 4*2=8
print('AFTER WE CONCATENATE TWO COPIES OF THIS TENSOR AT DIM=1 (columns): new shape =', t2.shape, '\n\n', t2)

THE INPUT TENSOR WITH SHAPE torch.Size([2, 4]) :

 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000]]) 

AFTER WE CONCATENATE THREE COPIES OF THIS TENSOR AT DIM=0 (rows): new shape = torch.Size([6, 4]) 

 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000],
        [0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000],
        [0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000]]) 

AFTER WE CONCATENATE TWO COPIES OF THIS TENSOR AT DIM=1 (columns): new shape = torch.Size([2, 8]) 

 tensor([[0.0874, 0.4600, 0.5120, 0.9194, 0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000, 1.0000, 2.0000, 3.0000, 4.0000]])


In [10]:
# stack: dim is the position of the new dimension; it must be between 0 and the number of dims of the stacked tensors (inclusive)
t0 = torch.stack([tensor, tensor, tensor], dim=0) # we stack three tensors along a new dim=0, so dim=0 has size 3
t1 = torch.stack([tensor, tensor, tensor], dim=1) # we stack three tensors along a new dim=1, so dim=1 has size 3
t2 = torch.stack([tensor, tensor, tensor], dim=2) # we stack three tensors along a new dim=2, so dim=2 has size 3
print('THE INPUT TENSOR WITH SHAPE', tensor.shape, ':\n\n', tensor, '\n')
print('AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=0: new shape=', t0.shape, '\n\n', t0, '\n\n')
print('AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=1: new shape=', t1.shape, '\n\n', t1, '\n\n')
print('AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=2: new shape=', t2.shape, '\n\n', t2)

THE INPUT TENSOR WITH SHAPE torch.Size([2, 4]) :

 tensor([[0.0874, 0.4600, 0.5120, 0.9194],
        [1.0000, 2.0000, 3.0000, 4.0000]]) 

AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=0: new shape= torch.Size([3, 2, 4]) 

 tensor([[[0.0874, 0.4600, 0.5120, 0.9194],
         [1.0000, 2.0000, 3.0000, 4.0000]],

        [[0.0874, 0.4600, 0.5120, 0.9194],
         [1.0000, 2.0000, 3.0000, 4.0000]],

        [[0.0874, 0.4600, 0.5120, 0.9194],
         [1.0000, 2.0000, 3.0000, 4.0000]]]) 


AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=1: new shape= torch.Size([2, 3, 4]) 

 tensor([[[0.0874, 0.4600, 0.5120, 0.9194],
         [0.0874, 0.4600, 0.5120, 0.9194],
         [0.0874, 0.4600, 0.5120, 0.9194]],

        [[1.0000, 2.0000, 3.0000, 4.0000],
         [1.0000, 2.0000, 3.0000, 4.0000],
         [1.0000, 2.0000, 3.0000, 4.0000]]]) 


AFTER WE STACK THREE COPIES OF THIS TENSOR AT DIM=2: new shape= torch.Size([2, 4, 3]) 

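 tensor([[[0.0874, 0.0874, 0.0874],
         [0.4600, 0.4600, 0.4600],
         [0.5120, 0.5120, 0.5120],
         [0.9194, 0.9194, 0.9194]],

        [[1.0000, 1.0000, 1.0000],
         [2.0000, 2.0000, 2.0000],
         [3.0000, 3.0000, 3.0000],
         [4.0000, 4.0000, 4.0000]]])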

#### Change the Shape of a Tensor: Squeeze/Unsqueeze, Permute, View, Transpose

**`torch.squeeze`**: 
  - Returns a tensor with **all** the dimensions of input of **size 1 removed**. For example, if the input has a shape of $(A × 1 × B × 1 × C)$, then the output after squeezing has a shape of $(A × B × C)$. We can also squeeze a specific dimension: if we squeeze `dim=1`, then the output will have a shape of $(A × B × 1 × C)$.
  - *When to use this operation?* You want to remove all dims with size 1 and make the tensor more succinct.
  - *CAREFUL*: the new tensor will share memory with the old tensor, so if you change an element of the new tensor, you will also change it in the old tensor.
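The memory sharing can be checked directly. This minimal sketch also shows that squeezing a dimension whose size is not 1 is simply a no-op, and that `.clone()` gives an independent copy when you need one:

```python
import torch

x = torch.zeros(2, 1, 3)
y = torch.squeeze(x)               # shape (2, 3), shares memory with x
y[0, 0] = 7.0                      # write through the squeezed tensor...
print(x[0, 0, 0])                  # ...and the original tensor sees the change: tensor(7.)

# squeezing a dimension whose size is not 1 leaves the shape unchanged
print(torch.squeeze(x, 0).shape)   # torch.Size([2, 1, 3])

# use .clone() when you need an independent copy
z = torch.squeeze(x).clone()
z[0, 0] = -1.0
print(x[0, 0, 0])                  # still tensor(7.)
```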


In [11]:

# Squeeze: if the input Tensor has the shape (A×1×B×C×1×D),  the output Tensor will have the shape (A×B×C×D)
x = torch.rand(1) # initialize a tensor x
y = torch.squeeze(x) # remove all dimensions that only have a single element
print('INPUT VECTOR WITH SHAPE', x.shape, '\n' , x)
print('SQUEEZED VECTOR HAS SHAPE', y.shape, '\n', y, '\n\n')


x = torch.rand(1,1) # initialize a tensor x
y = torch.squeeze(x) # remove all dimensions that only have a single element
print('INPUT VECTOR WITH SHAPE', x.shape, '\n' , x)
print('SQUEEZED VECTOR HAS SHAPE', y.shape, '\n', y, '\n\n')

x = torch.rand(2, 1) # initialize a tensor x
y = torch.squeeze(x) # remove all dimensions that only have a single element
print('INPUT VECTOR WITH SHAPE', x.shape, '\n' , x)
print('SQUEEZED VECTOR HAS SHAPE', y.shape, '\n', y, '\n\n')

x = torch.rand(1, 2) # initialize a tensor x
y = torch.squeeze(x)
print('INPUT VECTOR WITH SHAPE', x.shape, '\n' , x)
print('SQUEEZED VECTOR HAS SHAPE', y.shape, '\n', y, '\n\n')


x = torch.rand(2, 1, 2, 1) # initialize a tensor x
print(f"INPUT VECTOR WITH SHAPE {x.shape} and vector: {x}")
y = torch.squeeze(x) # let's remove all dimensions that only have a single element
print(f"SQUEEZED VECTOR HAS SHAPE {y.shape} vector: {y}" )
y = torch.squeeze(x, 1) #  now let's remove only dim=1
print(f"SQUEEZED VECTOR (dim=1) HAS SHAPE {y.shape} vector: {y}" )
y = torch.squeeze(x, 3) #  now let's remove only dim=3
print(f"SQUEEZED VECTOR (dim=3) HAS SHAPE {y.shape} vector: {y}" )


INPUT VECTOR WITH SHAPE torch.Size([1]) 
 tensor([0.6662])
SQUEEZED VECTOR HAS SHAPE torch.Size([]) 
 tensor(0.6662) 


INPUT VECTOR WITH SHAPE torch.Size([1, 1]) 
 tensor([[0.6160]])
SQUEEZED VECTOR HAS SHAPE torch.Size([]) 
 tensor(0.6160) 


INPUT VECTOR WITH SHAPE torch.Size([2, 1]) 
 tensor([[0.8158],
        [0.6131]])
SQUEEZED VECTOR HAS SHAPE torch.Size([2]) 
 tensor([0.8158, 0.6131]) 


INPUT VECTOR WITH SHAPE torch.Size([1, 2]) 
 tensor([[0.7164, 0.2549]])
SQUEEZED VECTOR HAS SHAPE torch.Size([2]) 
 tensor([0.7164, 0.2549]) 


INPUT VECTOR WITH SHAPE torch.Size([2, 1, 2, 1]) and vector: tensor([[[[0.4035],
          [0.4437]]],


        [[[0.8094],
          [0.1751]]]])
SQUEEZED VECTOR HAS SHAPE torch.Size([2, 2]) vector: tensor([[0.4035, 0.4437],
        [0.8094, 0.1751]])
SQUEEZED VECTOR (dim=1) HAS SHAPE torch.Size([2, 2, 1]) vector: tensor([[[0.4035],
         [0.4437]],

        [[0.8094],
         [0.1751]]])
SQUEEZED VECTOR (dim=3) HAS SHAPE torch.Size([2, 1, 2]) vector: tensor([[[0.4035, 0.4437]],

        [[0.8094, 0.1751]]])

**`torch.unsqueeze`:**
  - Returns a new tensor with **a** dimension of size **one** inserted at the **specified** position. This operation reverses `torch.squeeze` for that dimension. For example, if we unsqueeze an input tensor of shape $(A × B × C)$ at `dim=1`, the output has shape $(A × 1 × B × C)$.
  - *When to use this operation?* When you want to add a dimension to the current tensor. This is really useful when you want to set up a batch from individual data points.

In [12]:
# Unsqueeze: create a new tensor with a dim of size "1" inserted at the specified position
x = torch.tensor([[1, 2, 3], [1,2,3]])
print('INPUT VECTOR WITH SHAPE=', x.shape, '\n' , x)
y = torch.unsqueeze(x,0) 
print('UNSQUEEZED VECTOR (DIM=0) HAS SHAPE', y.shape, '\n', y, '\n\n')
y = torch.unsqueeze(x, 1)
print('UNSQUEEZED VECTOR (DIM=1) HAS SHAPE', y.shape, '\n', y, '\n\n')
y = torch.unsqueeze(x, 2)
print('UNSQUEEZED VECTOR (DIM=2) HAS SHAPE', y.shape, '\n', y, '\n\n')
#print(torch.unsqueeze(x, 3)) # error: out of range, dim must lie in [-x.dim()-1, x.dim()], i.e. [-3, 2] for this 2-D tensor

INPUT VECTOR WITH SHAPE= torch.Size([2, 3]) 
 tensor([[1, 2, 3],
        [1, 2, 3]])
UNSQUEEZED VECTOR (DIM=0) HAS SHAPE torch.Size([1, 2, 3]) 
 tensor([[[1, 2, 3],
         [1, 2, 3]]]) 


UNSQUEEZED VECTOR (DIM=1) HAS SHAPE torch.Size([2, 1, 3]) 
 tensor([[[1, 2, 3]],

        [[1, 2, 3]]]) 


UNSQUEEZED VECTOR (DIM=2) HAS SHAPE torch.Size([2, 3, 1]) 
 tensor([[[1],
         [2],
         [3]],

        [[1],
         [2],
         [3]]]) 
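A typical use of `unsqueeze` is turning individual data points into a batch. This minimal sketch assumes three hypothetical 28×28 grayscale images stored as `(1, 28, 28)` tensors (channel, height, width). It shows that unsqueezing each one at `dim=0` and concatenating gives the same batch as `torch.stack`:

```python
import torch

images = [torch.rand(1, 28, 28) for _ in range(3)]   # hypothetical data points

# Option 1: add a batch dimension to each image, then concatenate along it
batch_cat = torch.cat([torch.unsqueeze(img, 0) for img in images], dim=0)

# Option 2: torch.stack inserts the new dimension for us
batch_stack = torch.stack(images, dim=0)

print(batch_cat.shape)                       # torch.Size([3, 1, 28, 28])
print(torch.equal(batch_cat, batch_stack))   # True

# A single image becomes a batch of size 1
single = torch.unsqueeze(images[0], 0)
print(single.shape)                          # torch.Size([1, 1, 28, 28])
```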




**`permute`:**
  - This returns a view of the original tensor in which its dimensions (0, 1, 2, ...) are rearranged according to the **desired ordering** (no data is copied). For example, if we have a tensor with shape $(A × B × C)$ and we apply the permutation $(1, 2, 0)$, the result has shape $(B × C × A)$.
  - You can use this operation to align tensors with shapes $(A × B × C)$ and $(C × A × B)$ so that you can concatenate them: use `Tensor.permute` on the second tensor to give it the same shape as the first before concatenation.
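Here is a minimal sketch of that alignment, with small made-up sizes `A, B, C = 2, 3, 4`. The argument to `permute` lists, for each new position, which old dimension goes there:

```python
import torch

A, B, C = 2, 3, 4
t1 = torch.rand(A, B, C)
t2 = torch.rand(C, A, B)

# In t2, dim 0 has size C, dim 1 has size A, dim 2 has size B.
# To obtain (A, B, C) we list the old dims in the new order: (1, 2, 0)
t2_aligned = t2.permute(1, 2, 0)
print(t2_aligned.shape)            # torch.Size([2, 3, 4])

# element [a, b, c] of the aligned tensor is element [c, a, b] of t2
print(torch.equal(t2_aligned[1, 2, 3], t2[3, 1, 2]))   # True

combined = torch.cat([t1, t2_aligned], dim=0)
print(combined.shape)              # torch.Size([4, 3, 4])
```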

In [13]:
# Permute: change the order of the dimensions of the original tensor, according to the given order
x = torch.randn(1, 2, 3, 4, 5, 6) # a random tensor where dimension i has size i+1, so we can track where each dimension goes
print(f"INPUT TENSOR WITH SHAPE: {x.shape}")
# We create a permuted view of tensor x that swaps dims 2 and 3, and dims 4 and 5
y1 = x.permute(0, 1, 3, 2, 5, 4) # new dim k is old dim order[k]
print(f"AFTER SWAPPING DIMENSIONS 2 and 3, and 4 and 5: {y1.shape}")
# Applying the same permutation again reverses it (the order can also be given as a tuple)
y2 = y1.permute((0, 1, 3, 2, 5, 4))
print(f"AFTER SWAPPING DIMENSIONS 2 and 3, and 4 and 5 again: {y2.shape}")
print(f"Is y2 identical to x? {torch.equal(x, y2)}")

INPUT TENSOR WITH SHAPE: torch.Size([1, 2, 3, 4, 5, 6])
AFTER SWAPPING DIMENSIONS 2 and 3, and 4 and 5: torch.Size([1, 2, 4, 3, 6, 5])
AFTER SWAPPING DIMENSIONS 2 and 3, and 4 and 5 again: torch.Size([1, 2, 3, 4, 5, 6])
Is y2 identical to x? True


**`view`**: 
  - Returns a new tensor with the **same data** (and the same number of elements) as the **original tensor**, but of a **different shape**. For example, we can use this function to change a tensor with shape $(2, 8)$ into the shape $(4, 4)$.
  - This operation can only be used if the resulting tensor has the same number of elements as the input tensor ($4 \cdot 4 = 2 \cdot 8 = 16$).
  - *When to use this operation?* You want to change the shape of a tensor but keep the same data. This is very similar to NumPy's `reshape` operation, which we also show below.
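One practical difference between `view` and PyTorch's own `reshape` is worth knowing: `view` only works when the new shape is compatible with how the data is laid out in memory. After a transpose, a tensor is usually no longer contiguous, so `view` fails, while `reshape` (which copies the data when it has to) still works. A minimal sketch:

```python
import torch

x = torch.arange(8).view(2, 4)   # contiguous (2, 4) tensor: [[0, 1, 2, 3], [4, 5, 6, 7]]
xt = x.t()                       # transpose -> shape (4, 2), not contiguous
print(xt.is_contiguous())        # False

view_ok = True
try:
    xt.view(8)                   # view cannot flatten this non-contiguous tensor
except RuntimeError as err:
    view_ok = False
    print("view failed:", err)

flat = xt.reshape(8)             # reshape copies the data if necessary
print(flat)                      # tensor([0, 4, 1, 5, 2, 6, 3, 7])
print(torch.equal(flat, xt.contiguous().view(8)))   # True
```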

In [14]:
# View example in PyTorch
print('-------------Example of view() in PyTorch-------------')
x = torch.randn(4, 4) # a random tensor with shape (4,4)
print('Shape of x:', x.shape, '\n')
y = x.view(16) # flattens x into y, which has shape (16,) (one-dimensional)
print('Shape of tensor y with only one dimension:', y.shape)
# -1 tells PyTorch to infer this dimension from the others: 16 elements / 8 = 2
# This is helpful since we do not need to calculate the size manually
z = x.view(-1, 8) 
print('Shape of tensor z after view operation:', z.shape, '\n')

# NumPy reshape example
print('-------------Example of reshape() in NumPy-------------')
# first we initialize a numpy array with shape (4,4) of random integers in the interval [0, 100)
x = np.random.randint(0, 100, (4, 4)) 
print('Content of numpy array x:', x)
print('Shape of numpy array x:', x.shape)
x = x.reshape(2,8) # do the reshape operation
print('Shape of x after reshape operation:', x.shape, 'x: ', x)

-------------Example of view() in PyTorch-------------
Shape of x: torch.Size([4, 4]) 

Shape of tensor y with only one dimension: torch.Size([16])
Shape of tensor z after view operation: torch.Size([2, 8]) 

-------------Example of reshape() in NumPy-------------
Content of numpy array x: [[80 35 36 79]
 [30 60 15 45]
 [60 28 48 23]
 [42 67 81 99]]
Shape of numpy array x: (4, 4)
Shape of x after reshape operation: (2, 8) x:  [[80 35 36 79 30 60 15 45]
 [60 28 48 23 42 67 81 99]]


- `torch.transpose`: 
  - Returns a tensor that is a **transposed** version of input. The given dimensions `dim0` and `dim1` are **swapped**. For example, $\begin{bmatrix}
   0 & -1 & 1 \\
   1 & -1 & 0 \\
   -1 & 2 & 1 
\end{bmatrix}^{\top} = \begin{bmatrix}
   0 & 1 & -1 \\
   -1 & -1 & 2 \\
   1 & 0 & 1 
\end{bmatrix}$
  - *When to use this operation?* You may want to transpose a matrix to prepare for matrix multiplication.
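For example, suppose a batch of inputs `x` has shape `(batch, features)` and a weight matrix `w` has shape `(outputs, features)`; this is also the layout `nn.Linear` uses for its `weight`. The inner dimensions of `x @ w` do not match, but transposing `w` fixes that. A minimal sketch with made-up sizes:

```python
import torch

x = torch.rand(5, 3)      # 5 samples, 3 features each
w = torch.rand(2, 3)      # 2 outputs, each with 3 weights

# x @ w would fail: the inner dimensions 3 and 2 do not match
out = x @ torch.transpose(w, 0, 1)   # (5, 3) @ (3, 2) -> (5, 2)
print(out.shape)                     # torch.Size([5, 2])

# For 2-D tensors, w.T is a shorthand for torch.transpose(w, 0, 1)
print(torch.equal(w.T, torch.transpose(w, 0, 1)))   # True
```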

In [15]:
# torch.transpose: transposed version of input
tensor = torch.tensor([[0,-1,1], [1,-1,0], [-1,2,1]]) # initialize a tensor identical to the example above
print('Content of tensor:', tensor)
print('Shape of tensor:', tensor.shape, '\n')
print('Content of the output after the transpose operation:\n', torch.transpose(tensor,0,1), '\n')
x = torch.randn(2, 3, 4)
print('The shape of x:', x.shape, 'x:', x, '\n')
# We swap dim0 and dim2 of x to get y.
# Only two dimensions can be swapped at a time, no matter how many dimensions x has.
y = torch.transpose(x, 0, 2) 
print('The shape of y:', y.shape, '\n', 'y:', y)


Content of tensor: tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
Shape of tensor: torch.Size([3, 3]) 

Content of the output after the transpose operation:
 tensor([[ 0,  1, -1],
        [-1, -1,  2],
        [ 1,  0,  1]]) 

The shape of x: torch.Size([2, 3, 4]) x: tensor([[[-0.0030, -1.0617, -0.3225, -0.7373],
         [-0.6795,  0.0222,  1.0622,  0.4597],
         [-1.1377, -1.2084,  0.4789, -1.1309]],

        [[-0.3663, -0.7196,  0.0569, -1.7182],
         [ 0.4483, -0.1055,  1.0199, -0.8596],
         [ 1.2760,  0.5459,  0.1473,  0.2885]]]) 

The shape of y: torch.Size([4, 3, 2]) 
 y: tensor([[[-0.0030, -0.3663],
         [-0.6795,  0.4483],
         [-1.1377,  1.2760]],

        [[-1.0617, -0.7196],
         [ 0.0222, -0.1055],
         [-1.2084,  0.5459]],

        [[-0.3225,  0.0569],
         [ 1.0622,  1.0199],
         [ 0.4789,  0.1473]],

        [[-0.7373, -1.7182],
         [ 0.4597, -0.8596],
         [-1.1309,  0.2885]]])


#### Arithmetic Operations
- *Matrix addition*: You can use either `+` or `torch.add(t1, t2)` between two tensors. Either operation gives you the same output.
- *Matrix multiplication*: You can use either `@` or `matmul` between two tensors.  Either operation gives you the same output.
- *Element-wise product*: You can use either `*` or `mul` between two tensors.  Either operation gives you the same output.
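The two kinds of product have different shape requirements, which is easy to miss when the example matrix is square. In this minimal sketch with non-square matrices, matrix multiplication needs matching inner dimensions, while the element-wise product needs equal (or broadcastable) shapes:

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 4)

# Matrix multiplication: inner dimensions must agree, (2, 3) @ (3, 4) -> (2, 4)
print((a @ b).shape)          # torch.Size([2, 4])
print((a @ b)[0, 0])          # tensor(3.): a sum of three products 1 * 1

# Element-wise product: shapes must match (or be broadcastable)
c = torch.full((2, 3), 2.0)
print(a * c)                  # every entry is 1 * 2 = 2

mul_ok = True
try:
    a * b                     # (2, 3) and (3, 4) cannot be broadcast together
except RuntimeError as err:
    mul_ok = False
    print("element-wise product failed:", err)
```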

In [16]:
# matrix addition (+ and torch.add())
print('Input tensor:\n', tensor)

print('-------------Matrix addition-------------')
print(f"Adding the following 2D tensor to itself\n {tensor}\n + {tensor}")
y1 = tensor + tensor
print("tensor + tensor:\n", y1, '\n')
y2 = torch.add(tensor, tensor)
print("Test if + gives the same result as torch.add():", torch.equal(y1, y2), '\n')

# matrix multiplication (@ and tensor.matmul())
print('-------------Matrix multiplication-------------')
print('Multiplying a 2D tensor with its transpose tensor.T')
print('tensor\n', tensor)
print('tensor transpose\n', tensor.T)
y1 = tensor @ tensor.T  # matrix multiplication
print("tensor @ tensor.T:\n", y1, '\n')
y2 = tensor.matmul(tensor.T) # matrix multiplication
print("Test if @ gives the same results as tensor.matmul():", torch.equal(y1, y2), '\n')


# element-wise product (* and tensor.mul())
print('-------------Matrix element-wise product-------------')
print('Elementwise multiplication of a tensor with itself:')
print('tensor\n', tensor)
z1 = tensor * tensor # element-wise product
print("tensor * tensor\n", z1, '\n')
z2 = tensor.mul(tensor) # element-wise product
print("Test if * gives the same result as tensor.mul():", torch.equal(z1, z2))

Input tensor:
 tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
-------------Matrix addition-------------
Adding the following 2D tensor to itself
 tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
 + tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
tensor + tensor:
 tensor([[ 0, -2,  2],
        [ 2, -2,  0],
        [-2,  4,  2]]) 

Test if + gives the same result as torch.add(): True 

-------------Matrix multiplication-------------
Multiplying a 2D tensor with its transpose tensor.T
tensor
 tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
tensor transpose
 tensor([[ 0,  1, -1],
        [-1, -1,  2],
        [ 1,  0,  1]])
tensor @ tensor.T:
 tensor([[ 2,  1, -1],
        [ 1,  2, -3],
        [-1, -3,  6]]) 

Test if @ gives the same results as tensor.matmul(): True 

-------------Matrix element-wise product-------------
Elementwise multiplication of a tensor with itself:
tensor
 tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]])
tensor * tensor
 tensor([[0, 1, 1],
        [1, 1, 0],
        [1, 4, 1]]) 

Test if * gives the same result as tensor.mul(): True

In [17]:
# sum/item/type
print('Summing up all elements of tensor:\n', tensor, '\n')
agg = tensor.sum() # get the sum of all elements of the tensor
print('The sum of all elements of this tensor is a zero-dimensional tensor: ', agg, '\n') # a single-element tensor with shape torch.Size([])
agg_item = agg.item() # item() converts a single-element tensor to a Python number
print('tensor.item() changes a tensor with a single element to a scalar:', agg_item, '\nCheck the type of the scalar:', type(agg_item))

Summing up all elements of tensor:
 tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]]) 

The sum of all elements of this tensor is a zero-dimensional tensor:  tensor(2) 

tensor.item() changes a tensor with a single element to a scalar: 2 
Check the type of the scalar: <class 'int'>


In [18]:
# in-place operations
print(tensor, "\n")
tensor.add_(5) # it will change all the elements in tensor itself by adding 5 to each.
print(tensor)

tensor([[ 0, -1,  1],
        [ 1, -1,  0],
        [-1,  2,  1]]) 

tensor([[5, 4, 6],
        [6, 4, 5],
        [4, 7, 6]])
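In PyTorch, a method whose name ends in an underscore (such as `add_`) modifies the tensor in place, while the version without the underscore (`add`) returns a new tensor and leaves the original unchanged. A minimal comparison on a fresh copy of the same values:

```python
import torch

t = torch.tensor([[0, -1, 1],
                  [1, -1, 0],
                  [-1, 2, 1]])

shifted = t.add(5)   # out-of-place: returns a new tensor, t is unchanged
print(t[0, 0].item(), shifted[0, 0].item())   # 0 5

t.add_(5)            # in-place: t itself is modified
print(t[0, 0].item())                          # 5

print(torch.equal(t, shifted))                 # True
```

In-place operations save memory, but the PyTorch documentation generally discourages them on tensors that take part in gradient computation, since they can overwrite values autograd still needs.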


# Create a Neural Network

Now we build a simple convolutional neural network (CNN) that has only one convolutional layer. 

### Network Structure
- We create a new class named `SimpleCNN`, which inherits from `nn.Module` (PyTorch's base class for neural network modules).
- The `__init__()` method is where you **define the layers and components of the neural network**. Our model has one convolutional layer, `nn.Conv2d(...)`. Because we are dealing with a grayscale image dataset, there is only one input channel, hence `in_channels=1`. We set `out_channels=32`, so the layer learns 32 filters and produces 32 feature maps. The kernel size is 3, and for the remaining parameters we use the default values, which you can find [here](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d). 
- After the convolutional layer, a flatten operation collapses all dimensions from `start_dim` to `end_dim` into a single one. With `start_dim=1` the batch dimension is kept, so for a single image the tensor of shape `(1, 32, 26, 26)` becomes `(1, 26*26*32)`. For details on `start_dim` and `end_dim`, refer to [this](https://pytorch.org/docs/stable/generated/torch.flatten.html?highlight=flatten#torch.flatten). 
- We then apply two back-to-back **dense (fully connected) layers**, implemented with `nn.Linear(in_features, out_features)`. For `d1`, `26*26*32` is the dimension of the incoming data and `128` is the output size we want; in short, a dense layer maps its input to a vector of a chosen size. The same applies to the second linear layer (`d2`): its input size, `in_features=128`, is the output size of `d1`, and its output size, `10`, corresponds to **the number of classes**.
- After the convolutional layer and after the first dense layer, we apply the **activation function** `ReLU`, a common choice that sets negative values to zero and leaves the shape of its input unchanged. For prediction, we apply a `softmax` to the output of the last layer, which turns the 10 scores into probabilities that sum to 1, and return the result. Note that `nn.CrossEntropyLoss`, which we use for training below, expects raw scores (logits) and applies log-softmax internally. If you want to know the math behind these operations, see the [PyTorch Doc](https://pytorch.org/docs/stable/index.html).
- The *forward pass* is the **computation of the network's output from the input data**, traversing the layers from the first to the last.
- The *backward pass* **computes the gradients of the loss with respect to the weights** (backpropagation), proceeding from the last layer back to the first. An optimizer such as **gradient descent** (or a variant of it) then uses these gradients to update the weights.
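The number 26 comes from the usual output-size formula for a convolution with dilation 1: $H_{out} = \lfloor (H_{in} + 2p - k)/s \rfloor + 1$, where $k$ is the kernel size, $s$ the stride and $p$ the padding. The sketch below encodes that formula in a helper (`conv_output_size` is a hypothetical name, not a PyTorch function) and cross-checks it against an actual `nn.Conv2d` layer with the settings used in `SimpleCNN`.

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

side = conv_output_size(28, kernel_size=3)   # 26
flat_features = 32 * side * side             # 21632, the in_features of d1
print(side, flat_features)

# cross-check against a real Conv2d layer with the same settings
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
out = conv(torch.rand(1, 1, 28, 28))
print(out.shape)                        # torch.Size([1, 32, 26, 26])
print(out.flatten(start_dim=1).shape)   # torch.Size([1, 21632])
```

With `padding=1` the same helper gives 28, i.e. a 3x3 kernel then preserves the spatial size.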


In [19]:
class SimpleCNN(nn.Module):
    def __init__(self, in_channel=1): # default: 1 input channel (grayscale); use 3 for color (RGB) images
        super(SimpleCNN, self).__init__()

        # 28x28xC => 26x26x32: why 26? 28 - (3 - 1) = 26, since stride=1 and padding=0 (the defaults)
        self.conv1 = nn.Conv2d(in_channel, out_channels=32, kernel_size=3)
        # first dense layer: flattened feature maps (32 channels * 26 * 26) => 128 features
        self.d1 = nn.Linear(26 * 26 * 32, 128)
        # second dense layer: 128 features => 10 class scores
        self.d2 = nn.Linear(128, 10)

    def forward(self, x, is_debug=True):
        # 1x1(3)x28x28 => 1x32x26x26
        if is_debug:
            print("The input of the convolutional layer has a shape of:", x.shape, '\n')
        x = self.conv1(x)
        if is_debug:
            print("The output of the convolutional layer has a shape of:", x.shape, '\n')
        x = F.relu(x)
        if is_debug:
            print("The output of the ReLU activation has a shape of:", x.shape, '\n')
        # flatten all dimensions except the batch dimension => 1 x (32*26*26)
        x = x.flatten(start_dim=1)
        if is_debug:
            print("The output of the flatten operation has a shape of:", x.shape, '\n')
        # 1 x (32*26*26) => 1 x 128
        x = self.d1(x)
        if is_debug:
            print("The output of the first dense layer has a shape of:", x.shape, '\n')
        x = F.relu(x)
        if is_debug:
            print("The output of the ReLU activation has a shape of:", x.shape, '\n')
        # logits => 1 x 10
        logits = self.d2(x)
        if is_debug:
            print("The output of the second dense layer has a shape of:", logits.shape, '\n')
        # softmax turns the 10 scores into class probabilities (each row sums to 1)
        out = F.softmax(logits, dim=1)
        if is_debug:
            print("The output of the softmax operation has a shape of:", out.shape, '\n')
        return out
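`nn.CrossEntropyLoss` combines `log_softmax` with the negative log-likelihood loss, so it expects raw, unnormalized scores (logits). The sketch below, on random scores for 4 samples and 10 classes, checks that equivalence and shows what happens when softmax probabilities are passed in instead, as happens when the output of `SimpleCNN.forward` is fed to the loss. A common variant (an assumption here, not what this notebook does) is to return `logits` from `forward` and apply `softmax` only when probabilities are needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # 4 samples, 10 classes (raw scores)
targets = torch.tensor([3, 0, 9, 1])   # true class indices

criterion = nn.CrossEntropyLoss()
loss_from_logits = criterion(logits, targets)

# CrossEntropyLoss equals NLLLoss applied to the log-softmax of the logits
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_from_logits, manual))   # True

# passing probabilities means softmax is effectively applied twice
probs = F.softmax(logits, dim=1)
loss_from_probs = criterion(probs, targets)
print(loss_from_logits.item(), loss_from_probs.item())

# the predicted class (argmax) is the same for logits and probabilities
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))   # True
```

Because probabilities lie in [0, 1], the double-softmax loss for 10 classes can never fall below log(1 + 9/e) ≈ 1.46 even for a perfect prediction, which can slow training; predictions are unaffected, which is why the model can still reach good accuracy.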

### A Simple Test Case
MNIST contains grayscale images that have a height/width of 28x28 pixels.
Grayscale images have only a single color channel (RGB color images would have three color channels). 
Each individual image can therefore be represented as a `(1,28,28)` tensor.
Since we typically want to process a set of images at once, we use a tensor with an extra first dimension that indexes each image, i.e. the shape `(batch_size, channels, height, width)`.
So, even though we first want to feed in just a single image, we use a tensor of shape `(1, 1, 28, 28)`.
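Two common ways to get from individual `(1, 28, 28)` images to the batched layout are `unsqueeze(0)`, which adds a batch dimension of size 1, and `torch.stack`, which stacks several images along a new first dimension. A short sketch with random stand-in images:

```python
import torch

# a single grayscale image: (channels, height, width)
image = torch.rand(1, 28, 28)

# add a leading batch dimension => (1, 1, 28, 28)
batch_of_one = image.unsqueeze(0)
print(batch_of_one.shape)   # torch.Size([1, 1, 28, 28])

# several images stacked along a new first dimension => (N, C, H, W)
images = [torch.rand(1, 28, 28) for _ in range(4)]
batch = torch.stack(images)  # stack inserts the new dimension at position 0 by default
print(batch.shape)           # torch.Size([4, 1, 28, 28])
print(batch[2].shape)        # indexing the first dimension gives one image back: torch.Size([1, 28, 28])
```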



In [20]:
# a simple test case with a grayscale image
input_image = torch.rand((1, 1, 28, 28)) # (num_img, num_channel=1, height, width)
CNN = SimpleCNN(in_channel=1)
print('Final output shape:', CNN(input_image).shape) # the output shape should be (1,10)

The input of the convolutional layer has a shape of: torch.Size([1, 1, 28, 28]) 

The output of the convolutional layer has a shape of: torch.Size([1, 32, 26, 26]) 

The output of the ReLU activation has a shape of: torch.Size([1, 32, 26, 26]) 

The output of the flatten operation has a shape of: torch.Size([1, 21632]) 

The output of the first dense layer has a shape of: torch.Size([1, 128]) 

The output of the ReLU activation has a shape of: torch.Size([1, 128]) 

The output of the second dense layer has a shape of: torch.Size([1, 10]) 

The output of the softmax operation has a shape of: torch.Size([1, 10]) 

Final output shape: torch.Size([1, 10])


In [21]:
# a simple test case with a color image
input_image = torch.rand((1, 3, 28, 28)) # (num_img, num_channel=3, height, width)
CNN = SimpleCNN(in_channel=3)
print('Final output shape:', CNN(input_image).shape) # the output shape should be (1,10)

The input of the convolutional layer has a shape of: torch.Size([1, 3, 28, 28]) 

The output of the convolutional layer has a shape of: torch.Size([1, 32, 26, 26]) 

The output of the ReLU activation has a shape of: torch.Size([1, 32, 26, 26]) 

The output of the flatten operation has a shape of: torch.Size([1, 21632]) 

The output of the first dense layer has a shape of: torch.Size([1, 128]) 

The output of the ReLU activation has a shape of: torch.Size([1, 128]) 

The output of the second dense layer has a shape of: torch.Size([1, 10]) 

The output of the softmax operation has a shape of: torch.Size([1, 10]) 

Final output shape: torch.Size([1, 10])
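Only the first convolutional layer depends on `in_channel`: its weight tensor has shape `(out_channels, in_channels, kernel_height, kernel_width)`, while its output always has 32 channels. That is why the rest of `SimpleCNN` is identical in both test cases above. A standalone check with bare `nn.Conv2d` layers:

```python
import torch
import torch.nn as nn

results = {}
for in_channels in (1, 3):
    conv = nn.Conv2d(in_channels, out_channels=32, kernel_size=3)
    out = conv(torch.rand(1, in_channels, 28, 28))
    n_params = sum(p.numel() for p in conv.parameters())  # weights + biases
    results[in_channels] = (tuple(conv.weight.shape), tuple(out.shape), n_params)
    print(in_channels, *results[in_channels])
# 1 (32, 1, 3, 3) (1, 32, 26, 26) 320
# 3 (32, 3, 3, 3) (1, 32, 26, 26) 896
```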


### Train/Test a Neural Network in PyTorch
We use the training and testing data of the MNIST dataset to familiarize you with the process of training a network.

#### Prepare MNIST Data

The first step before training the model is to import the data. We will use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) which is frequently used in deep learning.

Apart from importing the data, we will also do a few more things:
- We will **transform** the data into tensors using the `transforms` module.
- We will use `DataLoader` to build **convenient data loaders** in PyTorch, which make it easy to efficiently feed data in batches to the neural network. We create **batches** of the data by setting the `batch_size` parameter of the data loader. We use batches of `32` in this tutorial, but we encourage you to experiment with other batch sizes.

In [22]:
BATCH_SIZE = 32 # here we set the batch size to 32

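# transformations: ToTensor converts each PIL image into a float tensor of shape (1, 28, 28) with values scaled to [0, 1]
transform = transforms.Compose([transforms.ToTensor()])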

# download and load training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)

# download and load testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=2)
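To see what such a loader yields without downloading anything, here is a sketch that uses `TensorDataset` with 100 random stand-in "images" and labels (an assumption for illustration, not MNIST). Note the difference between `len(loader)`, the number of batches, and `len(loader.dataset)`, the number of samples; with 100 samples and a batch size of 32, the last batch is smaller.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 32
# 100 fake grayscale 28x28 images with random labels 0-9 (stand-ins for MNIST)
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(fake_images, fake_labels),
                    batch_size=BATCH_SIZE, shuffle=True)

print(len(loader), len(loader.dataset))   # 4 100  (batches vs. samples)
images, labels = next(iter(loader))
print(images.shape, labels.shape)         # torch.Size([32, 1, 28, 28]) torch.Size([32])

batch_sizes = [x.shape[0] for x, _ in loader]
print(batch_sizes)                        # [32, 32, 32, 4]
```

Passing `drop_last=True` to `DataLoader` would discard the final partial batch instead.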

#### Before Training the Model
We need to first set up a **loss function**, an **optimizer** and a function to **compute the accuracy** of the model. 
- Loss Function: cross entropy loss
- Optimizer: Adam
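Each training iteration follows the same three-call pattern: `optimizer.zero_grad()`, `loss.backward()`, `optimizer.step()`. As a minimal sketch on a toy problem (a single parameter `w` and the loss `w**2`, both assumptions for illustration), one Adam step with `lr=0.1` looks like this. On its very first step Adam moves each parameter by roughly `lr` in the direction that lowers the loss.

```python
import torch

w = torch.nn.Parameter(torch.tensor(1.0))   # toy parameter
optimizer = torch.optim.Adam([w], lr=0.1)

optimizer.zero_grad()   # clear any gradients left over from a previous step
loss = w ** 2           # toy loss, minimized at w = 0
loss.backward()         # computes dloss/dw = 2 * w = 2.0 and stores it in w.grad
print(w.grad.item())    # 2.0
optimizer.step()        # Adam update using w.grad
print(w.item())         # approximately 0.9
```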

In [23]:
learning_rate = 0.001 # learning rate for the optimizer
num_epochs = 5 # number of training epochs

# device determines where the model will be trained: the first GPU if available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using device:', device)
print('torch.cuda.is_available():', torch.cuda.is_available())

model = SimpleCNN() # instantiate the model (in_channel=1 by default, for grayscale MNIST)
model = model.to(device) # move the model to the GPU (if we have one)
criterion = nn.CrossEntropyLoss() # cross entropy loss, the standard choice for classification tasks
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) # Adam optimizer, a widely used default

Using device: cuda:0
torch.cuda.is_available(): True


In [30]:
# Accuracy helper: counts the number of correct predictions in a batch
def get_correct_num(logit, target):
    ''' Return the number of correct predictions in a batch '''
    preds = torch.max(logit, 1)[1].view(target.size()) # predicted class = index of the largest score
    corrects = (preds == target).sum() # number of predictions that match the labels
    return corrects.item() # .item() converts the 0-dimensional tensor into a Python number
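A quick check of the counting logic on hand-made scores. The helper is restated here with `argmax(dim=1)`, which returns the same indices as `torch.max(logit, 1)[1]`, so that the snippet runs on its own; dividing by the number of samples gives the accuracy.

```python
import torch

def get_correct_num(logit, target):
    '''Return the number of correct predictions in a batch.'''
    preds = logit.argmax(dim=1)   # predicted class = index of the largest score
    return (preds == target).sum().item()

scores = torch.tensor([[0.1, 0.9],    # predicts class 1
                       [0.8, 0.2],    # predicts class 0
                       [0.3, 0.7]])   # predicts class 1
targets = torch.tensor([1, 0, 0])
correct = get_correct_num(scores, targets)
print(correct, correct / len(targets))   # 2 0.6666666666666666
```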

#### Training and Testing
- Training: We train the model on the training data for five epochs. After each epoch, we print the average loss and the accuracy of the model on the training data.

In [32]:
for epoch in range(num_epochs):
    # initialize the running loss & correct count to record the training performance
    train_running_loss = 0.0
    train_correct = 0

    # set the model to training mode
    model.train()

    # training step
    for images, labels in trainloader: # loop through batches of the training data
        # copy the images and labels to the device used for training (GPU if available)
        images = images.to(device)
        labels = labels.to(device)

        # forward pass; passing False for is_debug suppresses the shape printouts
        outputs = model(images, False)
        # compare the model outputs with the true labels to compute the loss
        loss = criterion(outputs, labels)
        # reset the gradients of all optimized tensors to zero
        optimizer.zero_grad()
        # backward pass: compute the gradients of the loss w.r.t. the network parameters
        loss.backward()

        # update the model parameters using the computed gradients
        optimizer.step()

        # update the training loss & correct count
        train_running_loss += loss.item()
        train_correct += get_correct_num(outputs, labels)

    model.eval() # switch to evaluation mode (affects layers such as dropout or batch norm; SimpleCNN has neither)
    # print the average loss per batch and the accuracy on the training data (correct count / number of training samples)
    print('Epoch: %d | Loss: %.4f | Train Accuracy: %.4f'
          % (epoch, train_running_loss / len(trainloader), train_correct / len(trainloader.dataset)))


- Testing: We also compute the accuracy on the test set to see how well the model performs on unseen data. As shown below, our simple CNN reaches a test accuracy of about 97% on the MNIST classification task.

In [26]:
model.eval() # evaluation mode
test_correct = 0 # set the correct count to zero
with torch.no_grad(): # no gradients are needed for testing, which saves memory and time
    for images, labels in testloader: # loop through batches of the testing data
        # same as in training: copy the data to the specified device
        images = images.to(device)
        labels = labels.to(device)
        # forward pass to get the classification results
        outputs = model(images, False)
        # we do not update the network params here; we only count the correct classifications
        test_correct += get_correct_num(outputs, labels)

# print test accuracy: number of correct classifications / total number of test samples
print('Test Accuracy: %2f'%( test_correct/len(testloader.dataset)))

Test Accuracy: 0.973000
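Evaluation only needs forward passes, so it is common to call `model.eval()` and wrap the loop in `torch.no_grad()`, which stops autograd from recording operations and saves memory. A small sketch with a stand-in `nn.Linear` layer (an assumption, not `SimpleCNN`) shows that the values are unchanged while no computation graph is built:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)   # stand-in model
x = torch.rand(3, 4)

y_train = layer(x)
print(y_train.requires_grad)   # True: autograd tracked this computation

layer.eval()
with torch.no_grad():
    y_eval = layer(x)
print(y_eval.requires_grad)    # False: no graph was built

print(torch.allclose(y_train, y_eval))   # True: same values either way
```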


## The End

Congrats! You have reached the end of this notebook. We hope you now understand the basic operations of PyTorch, the process of constructing a neural network (including how the tensor shape changes at each step), and the process of training a CNN/DNN model. If you have further questions about PyTorch, feel free to consult the following references.

## References
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [PyTorch Doc](https://pytorch.org/docs/stable/index.html)
- [Forward and Backward Pass](https://stackoverflow.com/questions/36740533/what-are-forward-and-backward-passes-in-neural-networks)
- [Another Useful Colab Notebook](https://colab.research.google.com/github/omarsar/pytorch_notebooks/blob/master/pytorch_quick_start.ipynb)