# CSED490E: PyTorch, Neural Network Tutorial

Author: Jihyoung Jang (jihyoung@postech.ac.kr)

In this tutorial, we will cover the basic concepts of PyTorch and learn how to implement a simple neural network.


## Part 0: Check GPU status
Why do we put "!" before the command below?

In [1]:
!nvidia-smi

Tue Nov 18 16:28:22 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 5060 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   41C    P8              6W /   42W |       0MiB /   8151MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+----------------------------------------------
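
In IPython and Jupyter, a leading `!` passes the rest of the line to the system shell instead of the Python interpreter, which is why `!nvidia-smi` works inside a notebook cell. Outside a notebook, roughly the same thing can be done from plain Python. The sketch below is one way to do it; the helper name `run_nvidia_smi` is made up for this example, and it assumes `nvidia-smi` may or may not be installed.

```python
import shutil
import subprocess

def run_nvidia_smi():
    """Return nvidia-smi's text output, or a short message if the tool is missing."""
    exe = shutil.which("nvidia-smi")  # None when the command is not on PATH
    if exe is None:
        return "nvidia-smi not found on this machine"
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout or result.stderr

print(run_nvidia_smi())
```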

## Part 1: Load PyTorch
PyTorch is a deep learning framework and is already installed in Google Colab.

In [None]:
#!pip install torch

In [2]:
import torch

In [3]:
print(torch.__version__)

2.10.0.dev20251114+cu128


## Part 2: Basic Tensor Manipulation

A tensor is PyTorch's basic data structure (i.e., its basic building block). Each tensor is a multi-dimensional array whose elements all share one data type.

In [4]:
t0 = torch.zeros(1, 3)
print(t0)

tensor([[0., 0., 0.]])


In [5]:
t1 = torch.tensor([0, 1, 2, 3, 4])
t2 = torch.tensor([5 ,6 ,7, 8, 9])
print(t1)
print(t2)

tensor([0, 1, 2, 3, 4])
tensor([5, 6, 7, 8, 9])


In [7]:
print(t1.dim())
print(t1.shape)
print(t1.size())

1
torch.Size([5])
torch.Size([5])


In [8]:
t_cat = torch.cat([t1, t2], dim=0)
print(t_cat)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [9]:
print(t_cat.dim())
print(t_cat.shape)
print(t_cat.size())

1
torch.Size([10])
torch.Size([10])


In [10]:
t_reshape1 = t_cat.view(5, 2)
print(t_reshape1)
print(t_reshape1.dim())
print(t_reshape1.shape)

tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
2
torch.Size([5, 2])
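
Two details of `view` are easy to miss: one dimension can be given as `-1` so that PyTorch infers it, and the result shares memory with the original tensor instead of copying it. A minimal sketch (the names `base` and `v` are only for illustration):

```python
import torch

base = torch.arange(10)   # tensor([0, 1, ..., 9])
v = base.view(-1, 2)      # -1 lets PyTorch infer the first dimension (5)
print(v.shape)            # torch.Size([5, 2])

v[0, 0] = 100             # writing through the view...
print(base[0])            # ...also changes the original: tensor(100)
```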


### Exercise
Write code that creates the following tensor:
```
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
```

In [16]:
##### TODO #####
t1 = torch.tensor([0,1,2,3,4])
t2 = torch.tensor([5,6,7,8,9])

t_cat = torch.cat([t1, t2], dim = 0)
t_cat_reshape = t_cat.view(2,5)
print(t_cat_reshape)


tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])


In [14]:
t3 = torch.tensor([1, 2, 3])
t3_repeat = t3.repeat(3)

print(t3)
print(t3_repeat)

tensor([1, 2, 3])
tensor([1, 2, 3, 1, 2, 3, 1, 2, 3])


### Exercise
Write code that creates the following tensor:
```
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
```

In [35]:
##### TODO #####
t3 = torch.tensor([1, 2, 3])
t3_repeat = t3.repeat(3, 1)

print(t3)
print(t3_repeat)

tensor([1, 2, 3])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])


In [36]:
t3_stack = torch.stack([t3, t3, t3])

print(t3_stack)

tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
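
`torch.stack` creates a new dimension, and its `dim` argument decides where that dimension goes. `torch.cat` joins tensors along a dimension that already exists. A small sketch of the difference:

```python
import torch

t3 = torch.tensor([1, 2, 3])

rows = torch.stack([t3, t3, t3], dim=0)  # each copy becomes a row
cols = torch.stack([t3, t3, t3], dim=1)  # each copy becomes a column
flat = torch.cat([t3, t3, t3], dim=0)    # no new axis, just a longer vector

print(rows)
print(cols)
print(flat.shape)
```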


In [28]:
t4 = torch.tensor([
                  [1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]
                  ], dtype=torch.float32)
print(t4)
print(t4.dtype)

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
torch.float32


In [29]:
t4[0]

tensor([1., 2., 3.])

In [30]:
t4[0, :]

tensor([1., 2., 3.])

In [31]:
t4[:, 0]

tensor([1., 4., 7.])

In [38]:
t4[0:3]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [33]:
t4[:]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [34]:
t4[:, 2]

tensor([3., 6., 9.])

### Exercise
Write code that creates the following tensor:
```
tensor([[5., 6.],
        [8., 9.]])

```

In [42]:
##### TODO #####
ten1 = torch.tensor([5,6],dtype=torch.float32)
ten2 = torch.tensor([8,9],dtype=torch.float32)
ten_stk = torch.stack([ten1, ten2])
print(ten_stk)


tensor([[5., 6.],
        [8., 9.]])
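
Since this exercise follows the indexing examples, the same result can also be obtained by slicing `t4` directly rather than building new tensors. A sketch of that alternative:

```python
import torch

t4 = torch.tensor([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=torch.float32)

# Rows from index 1 onward, columns from index 1 onward.
sub = t4[1:, 1:]
print(sub)
```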


## Part 3: Tensor on GPU

PyTorch supports GPU computation. Every tensor and model that takes part in an operation must be located on the same device.

In [43]:
print(torch.cuda.is_available())

True
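
Because `torch.cuda.is_available()` can return `False` on a machine without a GPU, a common idiom is to pick the device once and reuse it. The sketch below assumes nothing about the hardware; the names `ones` and `noise` are illustrative.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

ones = torch.ones(3, device=device)         # created directly on the device
noise = torch.randn(100, 50, 3).to(device)  # created on the CPU, then moved
total = ones + noise                        # both operands are on the same device

print(total.device, total.shape)
```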


In [44]:
a = torch.ones(3)
b = torch.randn(100, 50, 3)

In [45]:
a

tensor([1., 1., 1.])

In [46]:
b

tensor([[[ 1.0004,  1.3208, -1.2359],
         [-0.5151,  1.0715, -0.6017],
         [-0.7519, -0.4552,  0.6591],
         ...,
         [-1.6310,  1.6737, -1.6257],
         [-0.2607,  0.0191, -0.5513],
         [-0.9569, -0.3819,  0.2696]],

        [[-1.8964, -0.5942,  0.1271],
         [-2.0775,  1.6809, -0.8235],
         [-0.3757, -0.6300, -0.2146],
         ...,
         [-0.8041,  0.7808, -0.5693],
         [-0.6878, -0.1652,  0.0907],
         [ 1.9686, -0.2308,  1.6541]],

        [[ 0.4004, -0.3538,  2.6770],
         [ 2.4209, -1.6492, -0.5618],
         [ 1.5618, -0.1026,  0.8020],
         ...,
         [-0.8695,  1.3980,  0.5167],
         [-0.8372,  1.3514, -0.3146],
         [-0.7265, -1.2179,  0.2519]],

        ...,

        [[ 0.2222,  0.8327, -0.2556],
         [-0.8597, -0.5266,  0.7687],
         [ 0.3089,  0.6050,  0.8360],
         ...,
         [ 0.2669, -0.6977,  0.2062],
         [-1.5184,  0.3220, -0.1703],
         [ 0.1287, -0.0298,  0.2551]],

        [[

In [47]:
print(b.size())

torch.Size([100, 50, 3])


In [48]:
print(a.device)
print(b.device)

cpu
cpu


In [49]:
c = a + b

In [50]:
print(c.device)

cpu


In [51]:
# move a and b to the GPU
a = a.to('cuda')
b = b.to('cuda')

In [52]:
print(a.device)
print(b.device)

cuda:0
cuda:0


In [53]:
c = c.to('cpu')

In [54]:
print(c.device)

cpu


In [55]:
e = torch.ones(3).to('cuda')
d = torch.ones(3)

In [56]:
print(e.device)
print(d.device)

cuda:0
cpu


### Short Quiz
Why does the below code return an error message?

A: Tensors (and models) must be on the same device to be used together in one operation. Here `d` is on the CPU and `e` is on the GPU, so the addition cannot be computed.

In [57]:
f = e + d

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
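
One way to avoid this error is to move one operand to the other's device before the operation. The sketch below reuses the names `d`, `e`, and `f`, and falls back to the CPU when no GPU is present so that it runs anywhere.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

e = torch.ones(3, device=device)
d = torch.ones(3)          # created on the CPU by default

f = e + d.to(e.device)     # move d to wherever e lives before adding
print(f, f.device)
```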

## Part 4: Autograd
PyTorch can compute gradients automatically. Calling `backward()` on a scalar result tells PyTorch to compute the gradients of that result with respect to the tensors created with `requires_grad=True`, and it stores them in each tensor's `grad` attribute for later use.






In [64]:
x = torch.tensor([2.], requires_grad=True)
print(x)

tensor([2.], requires_grad=True)


Why None? What do you think about this result?

A: `backward()`, which performs backpropagation, is what computes and stores the gradient. It has not been executed yet, so `x.grad` is still `None`.

In [59]:
print(x.grad)

None


$y = 3x^2$

$x.grad = \frac{dy}{dx} = \frac{d(3x^2)}{dx} = 6x = 12$ at $x = 2$

In [65]:
y = x * x * 3
y.backward()

print(x.grad)

tensor([12.])


In [66]:
z = x * x * 3
z.backward()

print(x.grad)

tensor([24.])
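
The second result is 24 rather than 12 because `backward()` adds new gradients to whatever is already stored in `x.grad`. This is why the training loop later in this tutorial calls `optimizer.zero_grad()` before each `loss.backward()`. A minimal sketch of resetting the gradient by hand:

```python
import torch

x = torch.tensor([2.], requires_grad=True)

(x * x * 3).backward()
first = x.grad.clone()    # tensor([12.])

(x * x * 3).backward()
second = x.grad.clone()   # tensor([24.]): the new 12 was added to the old 12

x.grad.zero_()            # reset the stored gradient in place
(x * x * 3).backward()
third = x.grad.clone()    # tensor([12.]) again

print(first, second, third)
```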


### Exercise

In [67]:
x = torch.ones(2, 2, requires_grad=True)

print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [68]:
y = x + 2

print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [69]:
z = y * y * 3
print(z)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)


In [70]:
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward0>)


In [71]:
y.retain_grad()
z.retain_grad()
out.backward()

In [72]:
print(z.grad)

tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])


In [73]:
print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


In [None]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
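
The three gradients above can be checked by hand with the chain rule, using the definitions from the cells: $y_i = x_i + 2$, $z_i = 3y_i^2$, and $out$ is the mean of the four $z_i$.

$out = \frac{1}{4}\sum_{i=1}^{4} z_i \;\Rightarrow\; \frac{\partial\, out}{\partial z_i} = \frac{1}{4} = 0.25$

$\frac{\partial\, out}{\partial y_i} = \frac{\partial\, out}{\partial z_i} \cdot \frac{\partial z_i}{\partial y_i} = \frac{1}{4} \cdot 6y_i = \frac{3}{2}y_i = 4.5$ at $y_i = 3$

$\frac{\partial\, out}{\partial x_i} = \frac{\partial\, out}{\partial y_i} \cdot \frac{\partial y_i}{\partial x_i} = 4.5 \cdot 1 = 4.5$

`x.grad` equals `y.grad` because adding the constant 2 has derivative 1.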


In [74]:
with torch.no_grad():
  x = torch.ones(2, 2, requires_grad=True)
  y = x + 2
  z = y * y * 3
  out = z.mean()

In [75]:
out

tensor(27.)

### Short Quiz
Why does the below code return an error message?

A: `torch.no_grad()` disables gradient tracking. `x` was created with `requires_grad=True`, but `y`, `z`, and `out` are results of operations performed on `x` inside the `no_grad` block, so they have no `grad_fn` and `backward()` cannot be used on them.

In [76]:
out.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
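
The effect of `torch.no_grad()` can be seen directly by comparing the same operation inside and outside the block. A small sketch (the names `y_off` and `y_on` are illustrative):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():
    y_off = x + 2   # computed without recording the operation

y_on = x + 2        # computed with gradient tracking

print(y_off.requires_grad, y_off.grad_fn)  # no graph was built
print(y_on.requires_grad, y_on.grad_fn)    # the addition was recorded
```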

## Part 5: nn.Module
`torch.nn.Module` is the base class for all neural networks in PyTorch. A module can contain other modules, so a complex architecture can be built and managed as a tree of smaller, reusable parts.






In [77]:
import torch.nn as nn

X = torch.tensor([[1., 2., 3., 4.], [5., 6., 7., 8.]])

print(X)
print(X.shape)

tensor([[1., 2., 3., 4.],
        [5., 6., 7., 8.]])
torch.Size([2, 4])


In [86]:
linear_fn = nn.Linear(4, 1)

In [79]:
linear_fn

Linear(in_features=4, out_features=1, bias=True)

In [87]:
Y = linear_fn(X)

print(Y)
print(Y.shape)

tensor([[0.4012],
        [1.4559]], grad_fn=<AddmmBackward0>)
torch.Size([2, 1])
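
`nn.Linear(4, 1)` stores a weight of shape `(1, 4)` and a bias of shape `(1,)`, and computes `X @ W.T + b`. The values differ from run to run because the parameters are randomly initialized. The sketch below checks the formula against the layer's own output.

```python
import torch
import torch.nn as nn

X = torch.tensor([[1., 2., 3., 4.], [5., 6., 7., 8.]])
linear_fn = nn.Linear(4, 1)

# Recompute the layer's output by hand from its parameters.
manual = X @ linear_fn.weight.T + linear_fn.bias

print(linear_fn.weight.shape, linear_fn.bias.shape)
print(torch.allclose(linear_fn(X), manual))
```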


### How to design our own model?

In [88]:
class Model(nn.Module):
  def __init__(self, input_dim, output_dim, hidden_dim):
    super(Model, self).__init__()
    self.linear_1 = nn.Linear(input_dim, hidden_dim)
    self.linear_2 = nn.Linear(hidden_dim, output_dim)
    self.relu = nn.ReLU()
  def forward(self, x):
    x = self.linear_1(x)
    x = self.relu(x)
    x = self.linear_2(x)
    return x
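
As a usage sketch, the class can be instantiated and called like a function; calling `model(X)` runs `forward`. The class is repeated so the block runs on its own, and the sizes (`hidden_dim=8`) are arbitrary choices for this example.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim):
        super().__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim)
        self.linear_2 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.linear_1(x))
        return self.linear_2(x)

model = Model(input_dim=4, output_dim=1, hidden_dim=8)
X = torch.tensor([[1., 2., 3., 4.], [5., 6., 7., 8.]])
Y = model(X)

# Parameters: (4*8 + 8) for linear_1 plus (8*1 + 1) for linear_2.
n_params = sum(p.numel() for p in model.parameters())
print(Y.shape, n_params)
```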

## Part 6: MLP

In [89]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

In [90]:
train_dataset = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=128, shuffle=False)

100.0%
100.0%
100.0%
100.0%
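
A `DataLoader` yields `(images, labels)` batches. With MNIST's 60,000 training images and `batch_size=128`, that gives 469 batches per epoch, the last one smaller than the rest. The sketch below shows the same mechanics on random stand-in data, so it runs without downloading anything; `fake_images` and `fake_labels` are placeholders, not MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST's per-image shape (1 channel, 28x28 pixels).
fake_images = torch.rand(1000, 1, 28, 28)
fake_labels = torch.randint(0, 10, (1000,))

loader = DataLoader(TensorDataset(fake_images, fake_labels),
                    batch_size=128, shuffle=True)

images, labels = next(iter(loader))
flat = images.reshape(-1, 28 * 28)   # flatten each image to a 784-vector

print(len(loader))                   # number of batches: ceil(1000 / 128)
print(images.shape, labels.shape, flat.shape)
```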


### Exercise
Write code that builds a simple neural network model with the following architecture:
```
NeuralNet(
  (fc1): Linear(in_features=784, out_features=20, bias=True)
  (fc2): Linear(in_features=20, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=10, bias=True)
  (sigmoid): Sigmoid()
)
```

In [91]:
class NeuralNet(nn.Module):
  def __init__(self, input_size, hidden_size, output_size):
    super(NeuralNet, self).__init__()
    ##### TODO #####
    self.fc1 = nn.Linear(input_size, hidden_size)
    self.fc2 = nn.Linear(hidden_size, hidden_size)
    self.fc3 = nn.Linear(hidden_size, output_size)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    ##### TODO #####
    x = self.fc1(x)
    x = self.fc2(x)
    x = self.fc3(x)
    out = self.sigmoid(x)

    return out
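
Two properties of the forward pass above are worth noting. Three linear layers applied back to back with no activation in between compose into a single linear map. Also, `nn.CrossEntropyLoss` (used below) applies log-softmax internally and expects raw scores (logits). With 10 outputs squashed into (0, 1) by the sigmoid, the loss cannot drop below $\log(1 + 9/e) \approx 1.46$. The following is a sketch of one possible variant, not the tutorial's reference solution: it keeps the same four submodules but uses the sigmoid between layers and returns logits. The class name `NeuralNetLogits` is made up for this example, and it was not trained here.

```python
import torch
import torch.nn as nn

class NeuralNetLogits(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.fc1(x))  # non-linearity after each hidden layer
        x = self.sigmoid(self.fc2(x))
        return self.fc3(x)             # raw logits for nn.CrossEntropyLoss

model = NeuralNetLogits(784, 20, 10)
logits = model(torch.rand(4, 784))     # 4 random stand-in "images"
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
print(logits.shape, loss.item())
```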

In [None]:
# input dim - 784
# hidden dim - 20
# output dim - 10 (why?)
# A: Each image is classified into one of 10 classes (the digits 0 to 9)

model = NeuralNet(784, 20, 10)
print(model)
model = model.to('cuda')
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

NeuralNet(
  (fc1): Linear(in_features=784, out_features=20, bias=True)
  (fc2): Linear(in_features=20, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=10, bias=True)
  (sigmoid): Sigmoid()
)


In [94]:
total_step = len(train_loader)

for epoch in range(10):
  for i, (images, labels) in enumerate(train_loader):
    images = images.reshape(-1, 28*28).to('cuda')
    labels = labels.to('cuda')

    outputs = model(images)
    loss = criterion(outputs, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (i+1) % 100 == 0:
      print(f'Epoch [{epoch+1}/10], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')

Epoch [1/10], Step [100/469], Loss: 1.6267
Epoch [1/10], Step [200/469], Loss: 1.6479
Epoch [1/10], Step [300/469], Loss: 1.6076
Epoch [1/10], Step [400/469], Loss: 1.5867
Epoch [2/10], Step [100/469], Loss: 1.6314
Epoch [2/10], Step [200/469], Loss: 1.5779
Epoch [2/10], Step [300/469], Loss: 1.6175
Epoch [2/10], Step [400/469], Loss: 1.5978
Epoch [3/10], Step [100/469], Loss: 1.5816
Epoch [3/10], Step [200/469], Loss: 1.5949
Epoch [3/10], Step [300/469], Loss: 1.6032
Epoch [3/10], Step [400/469], Loss: 1.5971
Epoch [4/10], Step [100/469], Loss: 1.6048
Epoch [4/10], Step [200/469], Loss: 1.6233
Epoch [4/10], Step [300/469], Loss: 1.5754
Epoch [4/10], Step [400/469], Loss: 1.6084
Epoch [5/10], Step [100/469], Loss: 1.5526
Epoch [5/10], Step [200/469], Loss: 1.6038
Epoch [5/10], Step [300/469], Loss: 1.5973
Epoch [5/10], Step [400/469], Loss: 1.6010
Epoch [6/10], Step [100/469], Loss: 1.5711
Epoch [6/10], Step [200/469], Loss: 1.6305
Epoch [6/10], Step [300/469], Loss: 1.6143
Epoch [6/10

### Exercise
How do we test model performance? Try it yourself!

In [99]:
model.eval()
correct = 0
total = 0
total_loss = 0

with torch.no_grad():
  for images, labels in test_loader:
    images = images.reshape(-1, 28*28).to('cuda')
    labels = labels.to('cuda')
    
    output = model(images)
    loss = criterion(output, labels)
    total_loss += loss.item()
    
    # index of the largest output per row = predicted class
    _, pred = torch.max(output, 1)
    total += labels.size(0)
    correct += (pred == labels).sum().item()
avg_loss = total_loss / len(test_loader)
accuracy = 100 * correct / total
print(f"Test Loss : {avg_loss:.4f}, Accuracy : {accuracy:.2f}%")

Test Loss : 1.5747, Accuracy : 85.07%
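
The prediction step can be seen in isolation on a toy example. `torch.max(output, 1)` returns both the maximum values and their indices, and `argmax(dim=1)` returns only the indices, which is all that accuracy needs. The logits and labels below are invented for illustration.

```python
import torch

logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 3.0]])
labels = torch.tensor([1, 0, 1])

_, pred_max = torch.max(logits, 1)   # (values, indices); keep the indices
pred_arg = logits.argmax(dim=1)      # same indices, more direct

correct = (pred_arg == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(pred_arg, f"{accuracy:.2f}%")
```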
