In [8]:
# -*- coding: utf-8 -*-
import logging
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
import warnings

%matplotlib inline
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_columns', None)
# pd.set_option('display.width', 1000)
warnings.filterwarnings('ignore')
# to avoid lots of INFO
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

# Introduction to PyTorch - Topics Covered:

- data structure

- forward pass/propagation (every `nn.Module` subclass must define a `forward` method)

- customized loss function definition

# [data structure](https://www.google.com/search?q=scalar+vector+matrix+tensor&tbm=isch&ved=2ahUKEwjN3-WthO31AhWBat4KHb_QDJgQ2-cCegQIABAA&oq=scalar+vector+matrix+tensor&gs_lcp=CgNpbWcQAzIFCAAQgAQyBggAEAgQHlC7BVi7BWDwB2gAcAB4AIABOIgBZ5IBATKYAQCgAQGqAQtnd3Mtd2l6LWltZ8ABAQ&sclient=img&ei=gcUAYo2bN4HV-Qa_obPACQ&bih=914&biw=1920)

- [TENSORS](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#:~:text=Tensors%20are%20a%20specialized%20data,GPUs%20or%20other%20hardware%20accelerators.)
  > Tensors are a specialized data structure that are very similar to arrays and matrices. <font color='red'>In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.</font>
  > <font color='red'>Tensors are similar to NumPy’s ndarrays</font>, except that tensors can run on GPUs or other hardware accelerators. 

- scalar: 0-way tensor (0d array)

- vector: 1-way tensor (1d array)

- matrix: 2-way tensor (2d array)

- tensor: 3-way tensor (3d array)

- 4-way tensor (4d array)

- 5-way tensor (5d array)

- 6-way tensor (6d array)

- N-way tensor (nd array -> numpy: ndarray)
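The list above can be checked directly: the number of "ways" is what PyTorch reports as `ndim`. This is a minimal sketch; the variable names and values are arbitrary illustrations, not part of the original notebook.

```python
import torch

# One example per rank; the values themselves are arbitrary.
scalar = torch.tensor(3.0)          # 0-way tensor: no axes
vector = torch.tensor([1.0, 2.0])   # 1-way tensor: one axis
matrix = torch.ones(2, 3)           # 2-way tensor: two axes
cube = torch.zeros(2, 3, 4)         # 3-way tensor: three axes

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

The same pattern continues for any N: each extra entry in the shape adds one more way.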

<table>
    <tr>
        <td>
            <img src='https://149695847.v2.pressablecdn.com/wp-content/uploads/2022/02/tensor.png' width='300'/>
            <p style="text-align: center;"></p>
        </td>
        <td>
            <img src='https://www.researchgate.net/profile/Dmytro-Shulga/publication/332263806/figure/fig3/AS:745257088086019@1554694544935/Tensors-as-generalizations-of-scalars-vectors-and-matrices.png' width='600'/>
            <p style="text-align: center;"></p>
        </td>
        <td>
            <img src='https://i.ytimg.com/vi/ir-Eg684MR4/hqdefault.jpg' width='400'/>
            <p style="text-align: center;"></p>
        </td>
        <td>
            <img src='https://programmerall.com/images/540/3b/3bc254c1db9fd4e12862be62b227f55c.JPEG' width='400'/>
            <p style="text-align: center;"></p>
        </td>
        <td>
            <img src='https://d3i71xaburhd42.cloudfront.net/82750a1533ca30a705d3325290ee8de471073773/3-Figure3-1.png' width='500'/>
            <p style="text-align: center;"></p>
        </td>
    </tr>
</table>




<table>
        <td>
            <img src='https://i.imgur.com/5kDiQyA.png' width='300'/>
            <p style="text-align: center;"></p>
        </td>
        <td>
            <img src='https://www.researchgate.net/profile/Andrews-Sobral/publication/316967304/figure/fig18/AS:613918162563083@1523380904228/Figure-C4-Matricization-of-a-third-order-tensor-into-its-n-mode-matrices.png' width='300'/>
            <p style="text-align: center;">slicing the tensor along different axes</p>
        </td>
        <td>
            <img src='https://i1.momoshop.com.tw/1619092112/goodsimg/0006/210/835/6210835_R.webp' width='300'/>
            <p style="text-align: center;">slicing the tensor along different axes</p>
        </td>
</table>


## [introduction to vectors and tensors](https://biomechanics.stanford.edu/me338_12/me338_s03.pdf)

# [TENSORS](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#:~:text=Tensors%20are%20a%20specialized%20data,GPUs%20or%20other%20hardware%20accelerators.)

## Initializing a Tensor

- ### method 1: Directly from data

In [9]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

print('''
type: {t}
length: {l}
data: {d}
'''.format(t=type(x_data), 
           l=len(x_data),
           d=x_data,
          )
)


type: <class 'torch.Tensor'>
length: 2
data: tensor([[1, 2],
        [3, 4]])



- ### method 2: From a NumPy array

In [10]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)


print('''
type: {t}
length: {l}
data: {d}
'''.format(t=type(x_np), 
           l=len(x_np),
           d=x_np,
          )
)


type: <class 'torch.Tensor'>
length: 2
data: tensor([[1, 2],
        [3, 4]])



- ### method 3: From another tensor

In [11]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.8741, 0.5798],
        [0.5316, 0.8774]]) 



- ### method 4: With random or constant values

> <font color='red'>shape</font> is a tuple of tensor dimensions; it determines the <font color='red'>dimensionality of the output tensor</font>.

In [22]:
# shape is a tuple of tensor dimensions, it determines the dimensionality of the output tensor.
shape = (2,3,)  # matrix: 2-way tensor (2d array)

rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.1789, 0.7120, 0.2848],
        [0.8818, 0.8797, 0.1874]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


<table>
        <td>
            <img src='https://d3i71xaburhd42.cloudfront.net/82750a1533ca30a705d3325290ee8de471073773/3-Figure3-1.png' width='400'/>
            <p style="text-align: center;">[the code above shows] shape = (2,3,)  # matrix: 2-way tensor (2d array) -> thus it is a matrix</p>
            <p style="text-align: center;">[the code below shows] shape = (2,3,4)  # tensor: 3-way tensor (3d array) -> thus it is a tensor</p>
        </td>
</table>

In [23]:
# shape is a tuple of tensor dimensions, it determines the dimensionality of the output tensor.
shape = (2,3,4)  # tensor: 3-way tensor (3d array)

rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[[0.8240, 0.1243, 0.8565, 0.7867],
         [0.8658, 0.5476, 0.3869, 0.1415],
         [0.6293, 0.2940, 0.4930, 0.5771]],

        [[0.9283, 0.7408, 0.0589, 0.7307],
         [0.6613, 0.6725, 0.1387, 0.3936],
         [0.8548, 0.9496, 0.8170, 0.0944]]]) 

Ones Tensor: 
 tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]]) 

Zeros Tensor: 
 tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])


### Attributes of a Tensor
Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [26]:
tensor = torch.rand(3,4)  # shape = (3,4,)  # matrix: 2-way tensor (2d array)

print('tensor:', tensor)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

tensor: tensor([[0.8053, 0.0289, 0.9167, 0.0460],
        [0.4463, 0.2402, 0.9492, 0.7410],
        [0.0938, 0.8324, 0.2899, 0.6986]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### [Operations on Tensors](https://pytorch.org/docs/stable/torch.html)

- Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more are comprehensively described here.
> <font color='red'>By default, tensors are created on the CPU.</font> We need to explicitly move tensors to the GPU using the `.to` method (after checking for GPU availability). <font color='red'>Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!</font>

In [27]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
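Since cross-device copies are costly, one way to avoid them is to create the tensor on the target device in the first place by passing `device=` to the factory function. This is a small sketch; the names `on_device` and `back_on_cpu` are illustrative only.

```python
import torch

# Pick the device once and reuse it everywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Allocate directly on the chosen device instead of creating on CPU and copying.
on_device = torch.ones(2, 3, device=device)

# .cpu() brings the data back to host memory (needed before calling .numpy()).
back_on_cpu = on_device.cpu()

print(on_device.device, back_on_cpu.device)
```

On a machine without a GPU both tensors live on the CPU, so the sketch runs unchanged.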

- ### Standard numpy-like indexing and slicing

In [31]:
tensor = torch.rand(3, 4)
print(f"tensor: {tensor}")
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)

tensor: tensor([[0.9726, 0.3014, 0.8923, 0.7406],
        [0.0715, 0.9165, 0.9960, 0.5859],
        [0.6690, 0.8470, 0.8853, 0.3305]])
First row: tensor([0.9726, 0.3014, 0.8923, 0.7406])
First column: tensor([0.9726, 0.0715, 0.6690])
Last column: tensor([0.7406, 0.5859, 0.3305])
tensor([[0.9726, 0.0000, 0.8923, 0.7406],
        [0.0715, 0.0000, 0.9960, 0.5859],
        [0.6690, 0.0000, 0.8853, 0.3305]])


In [53]:
tensor = torch.randint(low=0, high=10, size=(3, 4))
print('''

matrix: 2-way tensor (2d array)

tensor: {tensor}

# ===========================
#
# row representation
#
# ===========================
row_1: {row_1}
row_2: {row_2}
row_3: {row_3}

# ===========================
#
# column representation
#
# ===========================
col_1: {col_1}
col_2: {col_2}
col_3: {col_3}
col_4: {col_4}


# ===========================
#
# column representation 2
#
# ===========================
first column: {first_col}
last column: {last_col}
'''.format(tensor=tensor,
           row_1=tensor[0],
           row_2=tensor[1],
           row_3=tensor[2],
           
           col_1=tensor[:, 0],
           col_2=tensor[:, 1],
           col_3=tensor[:, 2],
           col_4=tensor[:, 3],
           
           first_col=tensor[..., 0],
           last_col=tensor[..., -1],
          )
)



matrix: 2-way tensor (2d array)

tensor: tensor([[1, 4, 3, 2],
        [2, 6, 0, 9],
        [7, 1, 8, 0]])

#
# row representation
#
row_1: tensor([1, 4, 3, 2])
row_2: tensor([2, 6, 0, 9])
row_3: tensor([7, 1, 8, 0])

#
# column representation
#
col_1: tensor([1, 2, 7])
col_2: tensor([4, 6, 1])
col_3: tensor([3, 0, 8])
col_4: tensor([2, 9, 0])


#
# column representation 2
#
first column: tensor([1, 2, 7])
last column: tensor([2, 9, 0])
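The `...` (Ellipsis) used above stands for "as many full slices `:` as needed", which is why `tensor[..., -1]` picks the last index of the final axis regardless of rank. A short sketch on a 3-way tensor, with illustrative variable names, makes this explicit:

```python
import torch

# 24 consecutive integers arranged as a (2, 3, 4) tensor.
t = torch.arange(24).reshape(2, 3, 4)

last_along_final_axis = t[..., -1]    # same as t[:, :, -1]
first_along_first_axis = t[0, ...]    # same as t[0, :, :]

print(last_along_final_axis.shape)    # one axis fewer than t
print(first_along_first_axis.shape)
print(torch.equal(last_along_final_axis, t[:, :, -1]))
```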



<table>
        <td>
            <img src='https://www.researchgate.net/profile/Andrews-Sobral/publication/316967304/figure/fig18/AS:613918162563083@1523380904228/Figure-C4-Matricization-of-a-third-order-tensor-into-its-n-mode-matrices.png' width='600'/>
            <p style="text-align: center;">slicing the tensor along different axes</p>
        </td>
        <td>
            <img src='https://i1.momoshop.com.tw/1619092112/goodsimg/0006/210/835/6210835_R.webp' width='400'/>
            <p style="text-align: center;">slicing the tensor along different axes</p>
        </td>
</table>

In [67]:
tensor = torch.randint(low=0, high=10, size=(2, 3, 4))  # size=(2, 3, 4) -> (z, x, y)
print('''

tensor: 3-way tensor (3d array)

tensor: {tensor}

# ===========================
#
# plane representation: z-axis
#
# ===========================

mode_1_z_matrixplane_1: {mode_1_z_matrixplane_1}
mode_1_z_matrixplane_2: {mode_1_z_matrixplane_2}


# ===========================
#
# plane representation: x-axis
#
# ===========================

mode_2_x_matrixplane_1: {mode_2_x_matrixplane_1}
mode_2_x_matrixplane_2: {mode_2_x_matrixplane_2}
mode_2_x_matrixplane_3: {mode_2_x_matrixplane_3}

# ===========================
#
# plane representation: y-axis
#
# ===========================

mode_3_y_matrixplane_1: {mode_3_y_matrixplane_1}
mode_3_y_matrixplane_2: {mode_3_y_matrixplane_2}
mode_3_y_matrixplane_3: {mode_3_y_matrixplane_3}
mode_3_y_matrixplane_4: {mode_3_y_matrixplane_4}


# ===========================
#
# column representation 2
#
# ===========================

first z column: {first_z_col}  # tensor[0, ...] --> because shape=(z, x, y), it means we slice along the z-axis, the first plane of the z-axis
last z column: {last_z_col}  # tensor[-1, ...] --> because shape=(z, x, y), it means we slice along the z-axis, the last plane of the z-axis

first y column: {first_y_col}  # tensor[..., 0] --> because shape=(z, x, y), it means we slice along the y-axis, the first plane of the y-axis
last y column: {last_y_col}  # tensor[..., -1] --> because shape=(z, x, y), it means we slice along the y-axis, the last plane of the y-axis
'''.format(tensor=tensor,
           
           mode_1_z_matrixplane_1=tensor[0,:,:],
           mode_1_z_matrixplane_2=tensor[1,:,:],
           
           mode_2_x_matrixplane_1=tensor[:,0,:],
           mode_2_x_matrixplane_2=tensor[:,1,:],
           mode_2_x_matrixplane_3=tensor[:,2,:],
           
           mode_3_y_matrixplane_1=tensor[:,:,0],
           mode_3_y_matrixplane_2=tensor[:,:,1],
           mode_3_y_matrixplane_3=tensor[:,:,2],
           mode_3_y_matrixplane_4=tensor[:,:,3],
           

           first_z_col=tensor[0, ...],
           last_z_col=tensor[-1, ...],
           
           first_y_col=tensor[..., 0],
           last_y_col=tensor[..., -1],
          )
)



tensor: 3-way tensor (3d array)

tensor: tensor([[[1, 3, 5, 1],
         [6, 4, 6, 5],
         [4, 0, 8, 9]],

        [[6, 2, 7, 7],
         [2, 0, 7, 6],
         [9, 7, 5, 2]]])

#
# plane representation: z-axis
#

mode_1_z_matrixplane_1: tensor([[1, 3, 5, 1],
        [6, 4, 6, 5],
        [4, 0, 8, 9]])
mode_1_z_matrixplane_2: tensor([[6, 2, 7, 7],
        [2, 0, 7, 6],
        [9, 7, 5, 2]])


#
# plane representation: x-axis
#

mode_2_x_matrixplane_1: tensor([[1, 3, 5, 1],
        [6, 2, 7, 7]])
mode_2_x_matrixplane_2: tensor([[6, 4, 6, 5],
        [2, 0, 7, 6]])
mode_2_x_matrixplane_3: tensor([[4, 0, 8, 9],
        [9, 7, 5, 2]])

#
# plane representation: y-axis
#

mode_3_y_matrixplane_1: tensor([[1, 6, 4],
        [6, 2, 9]])
mode_3_y_matrixplane_2: tensor([[3, 4, 0],
        [2, 0, 7]])
mode_3_y_matrixplane_3: tensor([[5, 6, 8],
        [7, 7, 5]])
mode_3_y_matrixplane_4: tensor([[1, 5, 9],
        [7, 6, 2]])


#
# column representation 2
#

first z column: tensor([[1, 3

### [Joining tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#:~:text=Tensors%20are%20a%20specialized%20data,GPUs%20or%20other%20hardware%20accelerators.)

- [torch.cat](https://pytorch.org/docs/stable/generated/torch.cat.html)

- [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html)
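The two functions differ in one respect: `torch.cat` joins tensors along an existing dimension, while `torch.stack` joins them along a new one. The sketch below compares the resulting shapes; the inputs `a` and `b` are made up for illustration.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat_rows = torch.cat([a, b], dim=0)    # existing dim 0 grows: (4, 3)
cat_cols = torch.cat([a, b], dim=1)    # existing dim 1 grows: (2, 6)
stacked = torch.stack([a, b], dim=0)   # new leading dim is added: (2, 2, 3)

print(cat_rows.shape, cat_cols.shape, stacked.shape)
```

With `torch.stack`, all inputs must have exactly the same shape; with `torch.cat`, they only need to agree on every dimension except the one being joined.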

In [80]:
print('tensor:', tensor)
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print('t1.data.cpu().numpy():', t1.data.cpu().numpy())

tensor: tensor([[[1, 3, 5, 1],
         [6, 4, 6, 5],
         [4, 0, 8, 9]],

        [[6, 2, 7, 7],
         [2, 0, 7, 6],
         [9, 7, 5, 2]]])
t1.data.cpu().numpy(): [[[1 3 5 1]
  [6 4 6 5]
  [4 0 8 9]
  [1 3 5 1]
  [6 4 6 5]
  [4 0 8 9]
  [1 3 5 1]
  [6 4 6 5]
  [4 0 8 9]]

 [[6 2 7 7]
  [2 0 7 6]
  [9 7 5 2]
  [6 2 7 7]
  [2 0 7 6]
  [9 7 5 2]
  [6 2 7 7]
  [2 0 7 6]
  [9 7 5 2]]]


In [82]:
tensor_2 = torch.randint(low=0, high=10, size=(2, 5))  # size=(2, 5) -> (x, y)
print('tensor_2:', tensor_2)
t2 = torch.cat([tensor_2, tensor_2, tensor_2], dim=1)
print('t2:', t2)
print('t2.data.cpu().numpy():', t2.data.cpu().numpy())

tensor_2: tensor([[5, 5, 9, 0, 7],
        [4, 8, 2, 6, 3]])
t2: tensor([[5, 5, 9, 0, 7, 5, 5, 9, 0, 7, 5, 5, 9, 0, 7],
        [4, 8, 2, 6, 3, 4, 8, 2, 6, 3, 4, 8, 2, 6, 3]])
t2.data.cpu().numpy(): [[5 5 9 0 7 5 5 9 0 7 5 5 9 0 7]
 [4 8 2 6 3 4 8 2 6 3 4 8 2 6 3]]


## Arithmetic operations

In [94]:
tensor = torch.randint(low=0, high=10, size=(2, 2))
print('tensor:', tensor)
y1 = tensor @ tensor.T  # @: Matrix product of two tensors
y2 = tensor.matmul(tensor.T)

print('''
y1: {y1}
y2: {y2}
y1 == y2: {y1_y2}
'''.format(y1=y1,
           y2=y2,
           y1_y2=(y1 == y2)
          )
)


tensor: tensor([[4, 1],
        [6, 2]])

y1: tensor([[17, 26],
        [26, 40]])
y2: tensor([[17, 26],
        [26, 40]])
y1 == y2: tensor([[True, True],
        [True, True]])



### Matrix product of two tensors
- matmul
- @

In [156]:
# Note: torch.rand_like needs a floating-point tensor. If `tensor` is created with
#   tensor = torch.randint(low=0, high=10, size=(4, 4))
# it has an integer (Long) dtype, and torch.rand_like(tensor) fails with:
#   RuntimeError: "check_uniform_bounds" not implemented for 'Long'
# tensor = torch.ones(4, 4)
tensor = torch.rand(size=(4, 4))
print('tensor:', tensor)


# This computes the matrix multiplication between two tensors. y1 and y2 have the same value;
# y3 starts out random and only matches them after the torch.matmul(..., out=y3) call below.
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)  # matmul: Matrix product of two tensors

y3 = torch.rand_like(tensor)
print('''
y1: {y1}
y2: {y2}
y3: {y3}
'''.format(y1=y1,
           y2=y2,
           y3=y3
          )
)
torch.matmul(tensor, tensor.T, out=y3)  # matmul: Matrix product of two tensors
# torch.matmul(tensor, tensor.T).size()

tensor: tensor([[0.2201, 0.2851, 0.4248, 0.2401],
        [0.0553, 0.6028, 0.3875, 0.3088],
        [0.3880, 0.2262, 0.2308, 0.1486],
        [0.1612, 0.9137, 0.1006, 0.9780]])

y1: tensor([[0.3679, 0.4228, 0.2836, 0.5735],
        [0.4228, 0.6119, 0.2931, 0.9007],
        [0.2836, 0.2931, 0.2771, 0.4377],
        [0.5735, 0.9007, 0.4377, 1.8275]])
y2: tensor([[0.3679, 0.4228, 0.2836, 0.5735],
        [0.4228, 0.6119, 0.2931, 0.9007],
        [0.2836, 0.2931, 0.2771, 0.4377],
        [0.5735, 0.9007, 0.4377, 1.8275]])
y3: tensor([[0.2243, 0.1665, 0.6568, 0.2155],
        [0.8517, 0.4761, 0.9782, 0.3259],
        [0.1360, 0.8396, 0.4618, 0.9151],
        [0.1034, 0.0260, 0.5967, 0.1920]])



tensor([[0.3679, 0.4228, 0.2836, 0.5735],
        [0.4228, 0.6119, 0.2931, 0.9007],
        [0.2836, 0.2931, 0.2771, 0.4377],
        [0.5735, 0.9007, 0.4377, 1.8275]])

### element-wise product
- mul
- *

In [191]:
# tensor = torch.ones(4, 4)
tensor = torch.rand(size=(4, 4))
# tensor = torch.randint(low=0, high=10, size=(4, 4))
print('tensor:', tensor)

# This computes the element-wise product. z1 and z2 have the same value;
# z3 starts out random and only matches them after the torch.mul(..., out=z3) call below.
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
print('''
z1: {z1}
z2: {z2}
z3: {z3}
'''.format(z1=z1,
           z2=z2,
           z3=z3
          )
)

torch.mul(tensor, tensor, out=z3)

tensor: tensor([[0.7769, 0.8565, 0.7920, 0.0745],
        [0.0243, 0.4274, 0.6382, 0.8787],
        [0.5015, 0.0647, 0.7550, 0.5581],
        [0.1056, 0.2139, 0.4019, 0.1910]])

z1: tensor([[6.0358e-01, 7.3365e-01, 6.2726e-01, 5.5575e-03],
        [5.8874e-04, 1.8270e-01, 4.0736e-01, 7.7210e-01],
        [2.5147e-01, 4.1887e-03, 5.7007e-01, 3.1153e-01],
        [1.1153e-02, 4.5757e-02, 1.6153e-01, 3.6469e-02]])
z2: tensor([[6.0358e-01, 7.3365e-01, 6.2726e-01, 5.5575e-03],
        [5.8874e-04, 1.8270e-01, 4.0736e-01, 7.7210e-01],
        [2.5147e-01, 4.1887e-03, 5.7007e-01, 3.1153e-01],
        [1.1153e-02, 4.5757e-02, 1.6153e-01, 3.6469e-02]])
z3: tensor([[0.8595, 0.2970, 0.8870, 0.2612],
        [0.1352, 0.8120, 0.4587, 0.4407],
        [0.1438, 0.5920, 0.5600, 0.1763],
        [0.1427, 0.0432, 0.3534, 0.2297]])



tensor([[6.0358e-01, 7.3365e-01, 6.2726e-01, 5.5575e-03],
        [5.8874e-04, 1.8270e-01, 4.0736e-01, 7.7210e-01],
        [2.5147e-01, 4.1887e-03, 5.7007e-01, 3.1153e-01],
        [1.1153e-02, 4.5757e-02, 1.6153e-01, 3.6469e-02]])
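With random inputs it is hard to see how the two products differ, so here is a small worked sketch with hand-checkable values (the matrices `a` and `b` are chosen only for illustration):

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]])
b = torch.tensor([[5., 6.],
                  [7., 8.]])

# Matrix product: rows of a dotted with columns of b.
matrix_product = a @ b    # [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]] = [[19, 22], [43, 50]]

# Element-wise product: entries at the same position multiplied.
elementwise = a * b       # [[1*5, 2*6], [3*7, 4*8]] = [[5, 12], [21, 32]]

print(matrix_product)
print(elementwise)
```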

### Single-element tensors 

- .item(): convert a one-element tensor to a Python numerical value

In [192]:
agg = tensor.sum()
agg_item = agg.item()  # convert it to a Python numerical value
print('agg: {d}, type: {t}'.format(d=agg, t=type(agg)))
print('agg_item: {d}, type: {t}'.format(d=agg_item, t=type(agg_item)))

agg: 7.260379791259766, type: <class 'torch.Tensor'>
agg_item: 7.260379791259766, type: <class 'float'>


### <font color='red'>In-place operations</font>

Operations that store the result into the operand are called in-place. They are denoted by a <font color='red'>_ suffix</font>. For example: x.<font color='red'>copy_</font>(y), x.<font color='red'>t_</font>(), <font color='red'>will change x</font>.

In [193]:
print(f"tensor before being added: \n{tensor} \n")
tensor.add_(5)
print(f"tensor after being added: \n{tensor} \n")

tensor before being added: 
tensor([[0.7769, 0.8565, 0.7920, 0.0745],
        [0.0243, 0.4274, 0.6382, 0.8787],
        [0.5015, 0.0647, 0.7550, 0.5581],
        [0.1056, 0.2139, 0.4019, 0.1910]]) 

tensor after being added: 
tensor([[5.7769, 5.8565, 5.7920, 5.0745],
        [5.0243, 5.4274, 5.6382, 5.8787],
        [5.5015, 5.0647, 5.7550, 5.5581],
        [5.1056, 5.2139, 5.4019, 5.1910]]) 
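To make the role of the `_` suffix concrete, the sketch below contrasts `add` (returns a new tensor) with `add_` (overwrites the operand). Variable names are illustrative.

```python
import torch

x = torch.zeros(3)

# Out-of-place: x is left untouched and a new tensor is returned.
y = x.add(5)
x_after_add = x.clone()

# In-place: x itself is overwritten; the returned tensor uses the same storage.
z = x.add_(5)

print(x_after_add)   # still zeros
print(x)             # now fives
print(z.data_ptr() == x.data_ptr())
```

In-place operations save memory, but the PyTorch tutorial cautions that they can be problematic when computing derivatives because history is lost immediately, so they are best used sparingly.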



## Bridge with NumPy

<font color='red'>Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.</font>

### Tensor to NumPy array

```
- my_tensor.numpy()

- A change in the tensor reflects in the NumPy array
  - my_tensor.add_(1)
```

In [195]:
t = torch.ones(5)
print(f"t: {t}, type: {type(t)} \n")
n = t.numpy()
print(f"n: {n}, type: {type(n)} \n")

t: tensor([1., 1., 1., 1., 1.]), type: <class 'torch.Tensor'> 

n: [1. 1. 1. 1. 1.], type: <class 'numpy.ndarray'> 



- A change in the tensor reflects in the NumPy array.

In [196]:
t.add_(1)  # A change in the tensor reflects in the NumPy array.

print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


### NumPy array to Tensor

```
- torch.from_numpy(my_ndarray)

- Changes in the NumPy array are reflected in the tensor
  - np.add(my_ndarray, 1, out=my_ndarray)
```

In [197]:
n = np.ones(5)
t = torch.from_numpy(n)

# Changes in the NumPy array are reflected in the tensor.
np.add(n, 1, out=n)

print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
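Memory sharing applies to `torch.from_numpy`; `torch.tensor(...)`, used earlier under "method 1", copies the data instead. The following sketch (illustrative names) shows the difference:

```python
import numpy as np
import torch

n = np.ones(3)

shared = torch.from_numpy(n)   # shares memory with n
copied = torch.tensor(n)       # makes an independent copy of n's data

n[0] = 10.0                    # modify the NumPy array in place

print(shared)   # first element follows the change
print(copied)   # unchanged
```

Use the copying form when the array and the tensor must evolve independently.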


# BUILD THE NEURAL NETWORK

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

Using cpu device


## Define the Class

We define our neural network by subclassing `nn.Module`, and initialize the neural network layers in `__init__`. <font color='red'>Every nn.Module subclass implements the operations on input data in the forward method.</font>

In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, move it to the device, and print its structure.

In [5]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. <font color='red'>Do not call model.forward() directly!</font>

Calling the model on the input returns a tensor of shape (1, 10): one row per input sample, holding the 10 raw predicted values (logits), one per class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

In [6]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([5])


## Model Layers

Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample <font color='red'>minibatch of 3 images of size 28x28</font> and see what happens to it as we pass it through the network.

In [7]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


## nn.Flatten
We initialize the nn.Flatten layer to <font color='red'>convert each 2D 28x28 image into a contiguous array of 784 pixel values</font> (the minibatch dimension (at dim=0) is maintained).

In [8]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


## nn.Linear
The linear layer is a module that <font color='red'>applies a linear transformation on the input using its stored weights and biases.</font>

In [9]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
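The "linear transformation" is `y = x @ W.T + b`, where `nn.Linear` stores `W` with shape `(out_features, in_features)`. The sketch below checks this by hand on a deliberately small layer; the sizes 4 and 3 are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the run repeatable

layer = nn.Linear(in_features=4, out_features=3)
x = torch.rand(2, 4)  # minibatch of 2 samples with 4 features each

# Recompute the layer output from its stored parameters.
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-6))
```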


## nn.ReLU
Non-linear activations are what <font color='red'>create the complex mappings</font> between the model’s inputs and outputs. They are applied after linear transformations <font color='red'>to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.</font>

In this model, we use nn.ReLU between our linear layers, but there are other <font color='red'>activations</font> to introduce non-linearity in your model.
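ReLU itself is simply `max(0, x)` applied to every element. As a quick sketch with hand-picked values (not from the original notebook), it matches clamping at zero:

```python
import torch
from torch import nn

x = torch.tensor([-1.5, -0.2, 0.0, 0.3, 2.0])

relu_out = nn.ReLU()(x)
manual = torch.clamp(x, min=0)  # negatives become 0, the rest pass through

print(relu_out)
print(torch.equal(relu_out, manual))
```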

In [10]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.3708,  0.2019,  0.1015,  0.8094,  0.0389,  0.4796,  0.2315, -0.3376,
         -0.1430,  0.4016,  0.4047, -0.3876, -0.2334,  0.2890,  0.1611, -0.2293,
         -0.2058, -0.1435, -0.3682, -0.3675],
        [ 0.8131,  0.2989,  0.0363,  0.8490,  0.1033,  0.2025,  0.1335, -0.0600,
          0.2642,  0.1572,  0.2741, -0.3090, -0.1181,  0.0534,  0.4741,  0.1215,
         -0.1490, -0.3422, -0.2155, -0.6833],
        [ 0.5557,  0.2995,  0.0977,  0.6628,  0.1940,  0.1078, -0.2695, -0.1404,
          0.1489,  0.3265,  0.2171, -0.3534, -0.3651, -0.0026,  0.1436, -0.4192,
          0.0250, -0.1984, -0.1768, -0.0337]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.3708, 0.2019, 0.1015, 0.8094, 0.0389, 0.4796, 0.2315, 0.0000, 0.0000,
         0.4016, 0.4047, 0.0000, 0.0000, 0.2890, 0.1611, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.8131, 0.2989, 0.0363, 0.8490, 0.1033, 0.2025, 0.1335, 0.0000, 0.2642,
         0.1572, 0.2741, 0.0000, 0.0000, 0.0534, 0.47

## nn.Sequential

nn.Sequential is <font color='red'>an ordered container of modules.</font> The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

In [11]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

## nn.Softmax

<font color='red'>The last linear layer of the neural network returns logits</font> - raw values in $(-\infty, \infty)$ - which <font color='red'>are passed to the nn.Softmax module. The logits are scaled to values in [0, 1]</font> representing the model’s predicted probabilities for each class. The `dim` parameter indicates the dimension along which the values must sum to 1.

In [12]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
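To see what `dim=1` means in practice, the sketch below applies softmax to two hand-written rows of logits (illustrative values) and checks that each row sums to 1:

```python
import torch
from torch import nn

# Two samples, three classes; the second row has identical logits.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])

probs = nn.Softmax(dim=1)(logits)

print(probs)
print(probs.sum(dim=1))   # one sum per row (per sample)
```

Identical logits yield a uniform distribution, and the largest logit in a row always receives the largest probability, so `argmax` over the logits and over the probabilities agree.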

## Model Parameters

Many layers inside a neural network are parameterized, i.e. have associated <font color='red'>weights</font> and <font color='red'>biases</font> that <font color='red'>are optimized during training</font>. <font color='red'>Subclassing nn.Module automatically tracks all fields defined inside your model object</font>, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

In [13]:
print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0019,  0.0347, -0.0247,  ...,  0.0323, -0.0294, -0.0063],
        [-0.0310,  0.0093,  0.0222,  ...,  0.0009,  0.0174,  0.0146]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0053,  0.0277], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0222, -0.0249,  0.0052,  ..., -0.0216, -0.0014, -0.0166],
        [ 0.0180,  0.0270, -0.0217,  ..., -0.0031,  0.0154,  0.0344]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias 
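A common use of `parameters()` is counting how many values the optimizer will update. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` (an assumption made to keep the example self-contained) and sums `numel()` over all parameters:

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork above, written as a flat Sequential.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

# (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669706
print(total, trainable)
```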

# AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD

## Tensors, Functions and Computational graph


In this network, w and b are parameters, which we need to optimize. Thus, we need to be able to compute the gradients of the loss function with respect to those variables. In order to do that, we set the requires_grad property of those tensors.

<table>
        <td>
            <img src='https://pytorch.org/tutorials/_images/comp-graph.png' width='800'/>
            <p style="text-align: center;">computational graph</p>
        </td>
</table>

In [200]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
# ===============================================
#
# In this network, w and b are parameters, which we need to optimize. Thus, we need to be able to compute the gradients of loss function with respect to those variables.
#
w = torch.randn(5, 3, requires_grad=True)  # requires_grad=True
b = torch.randn(3, requires_grad=True)     # requires_grad=True
#
#
# ===============================================
z = torch.matmul(x, w)+b

loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

print(f"x: {x}, type: {type(x)} \n")
print(f"y: {y}, type: {type(y)} \n")
print(f"w: {w}, type: {type(w)} \n")
print(f"b: {b}, type: {type(b)} \n")
print(f"z: {z}, type: {type(z)} \n")
print(f"loss: {loss}, type: {type(loss)} \n")

x: tensor([1., 1., 1., 1., 1.]), type: <class 'torch.Tensor'> 

y: tensor([0., 0., 0.]), type: <class 'torch.Tensor'> 

w: tensor([[ 8.6636e-01,  1.4533e-02,  6.2713e-02],
        [-5.0381e-01,  5.1424e-04,  4.5244e-01],
        [-5.4447e-01,  1.1528e+00, -1.7528e-01],
        [ 8.6750e-01, -1.0791e+00, -1.1617e+00],
        [-9.2799e-03, -2.8656e+00, -3.0130e-01]], requires_grad=True), type: <class 'torch.Tensor'> 

b: tensor([1.7421, 1.2407, 0.2888], requires_grad=True), type: <class 'torch.Tensor'> 

z: tensor([ 2.4184, -1.5362, -0.8344], grad_fn=<AddBackward0>), type: <class 'torch.Tensor'> 

loss: 1.0197442770004272, type: <class 'torch.Tensor'> 



In [201]:
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x136b664c0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x136c6a250>



## [CLASS torch.autograd.Function(*args, **kwargs)](https://pytorch.org/docs/stable/autograd.html#function) | [SOURCE](https://pytorch.org/docs/stable/_modules/torch/autograd/function.html#Function)

Base class to create custom autograd.Function


To create a custom autograd.Function, subclass this class and implement the `forward()` and `backward()` static methods. Then, to use your custom op in the forward pass, call the class method `apply`. Do not call `forward()` directly.

To ensure correctness and best performance, make sure you are calling the correct methods on ctx and validating your backward function using torch.autograd.gradcheck().

See Extending torch.autograd for more details on how to use this class.

Examples:

In [None]:
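import torch
from torch.autograd import Function


class Exp(Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        # Save the output: d/dx exp(x) = exp(x), so backward can reuse it.
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        # Chain rule: incoming gradient times the local derivative exp(x).
        return grad_output * result

# Use it by calling the apply method:
input = torch.randn(3, requires_grad=True)  # example input, added so the cell runs
output = Exp.apply(input)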
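The text above recommends validating `backward` with `torch.autograd.gradcheck()`. A minimal sketch of such a check follows, assuming double-precision inputs (which gradcheck expects for its finite-difference comparison); the tolerance values are illustrative choices.

```python
import torch
from torch.autograd import Function, gradcheck


class Exp(Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result


# gradcheck compares the analytic backward against numerical differences,
# so the input should be float64 and require gradients.
x = torch.randn(4, dtype=torch.double, requires_grad=True)
ok = gradcheck(Exp.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)
```

If `backward` were wrong (for example, returning `grad_output` alone), `gradcheck` would raise an error rather than return `True`.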

## Computing Gradients

To optimize the parameters of the neural network, we need to compute the derivatives of our loss function with respect to them, namely, we need $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$ under some fixed values of `x` and `y`. To compute those derivatives, we call `loss.backward()`, and then retrieve the values from `w.grad` and `b.grad`:

In [202]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3061, 0.0590, 0.1009],
        [0.3061, 0.0590, 0.1009],
        [0.3061, 0.0590, 0.1009],
        [0.3061, 0.0590, 0.1009],
        [0.3061, 0.0590, 0.1009]])
tensor([0.3061, 0.0590, 0.1009])
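These numbers can be reproduced by hand. With the default mean reduction over the 3 outputs, binary cross-entropy with logits gives $\frac{\partial loss}{\partial z_j} = \frac{\sigma(z_j) - y_j}{3}$; since $z = xw + b$, it follows that $\frac{\partial loss}{\partial b_j} = \frac{\partial loss}{\partial z_j}$ and $\frac{\partial loss}{\partial w_{ij}} = x_i \frac{\partial loss}{\partial z_j}$. Because `x` is all ones, every row of `w.grad` equals `b.grad`, as seen above. The sketch below checks this against autograd, using freshly drawn random parameters and a fixed seed (both assumptions for reproducibility):

```python
import torch

torch.manual_seed(0)

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Analytic gradients; no_grad keeps this bookkeeping out of the graph.
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()   # d loss / d z
    expected_w_grad = torch.outer(x, dz)      # x_i * dz_j

print(torch.allclose(b.grad, dz, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```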


## Disabling Gradient Tracking

<font color='red'>By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation.</font> However, there are some cases when we do not need to do that, for example, <font color='red'>when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network.</font> 
    
- method 1: We can <font color='red'>stop tracking computations</font> by surrounding our computation code with `torch.no_grad()` block:

- method 2: Another way to achieve the same result is to use the `detach()` method on the tensor:


--

There are reasons you might want to disable gradient tracking:

- To <font color='red'>mark some parameters</font> in your neural network <font color='red'>as frozen parameters</font>. This is a very common scenario <font color='red'>for [finetuning a pretrained network](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html).</font>

- To <font color='red'>speed up computations when you are only doing forward pass</font>, because computations on tensors that do not track gradients would be more efficient.


In [203]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


In [204]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
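For the frozen-parameter use case mentioned above, one possible approach is to switch off `requires_grad` on the parameters that should not be updated. The sketch below uses a tiny made-up model (layer sizes are arbitrary) and freezes its first layer:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first linear layer: autograd stops tracking its parameters.
for param in model[0].parameters():
    param.requires_grad_(False)

loss = model(torch.rand(3, 4)).sum()
loss.backward()

print(model[0].weight.grad)              # None: no gradient was computed
print(model[2].weight.grad is not None)  # the unfrozen layer still gets one
```

When fine-tuning, it is also common to pass only the parameters with `requires_grad=True` to the optimizer.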
