<a href="https://colab.research.google.com/github/sambar1729/pytorch_tutorial/blob/main/pytorch_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### PyTorch tutorial

Following Sebastian Raschka's tutorial here: https://sebastianraschka.com/teaching/pytorch-1h/

In [1]:
!pip install torch==2.4.1

Collecting torch==2.4.1
  Downloading torch-2.4.1-cp311-cp311-manylinux1_x86_64.whl.metadata (26 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.4.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.4.1)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.4.1)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch==2.4.1)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.4.1)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.4.1)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-many

In [49]:
%pip install memory-profiler


Collecting memory-profiler
  Downloading memory_profiler-0.61.0-py3-none-any.whl.metadata (20 kB)
Downloading memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Installing collected packages: memory-profiler
Successfully installed memory-profiler-0.61.0


In [2]:
import torch

torch.__version__

'2.4.1+cu121'

In [5]:
tensor0d = torch.tensor(1)

In [6]:
tensor1d = torch.tensor([1, 2, 3])

In [7]:
tensor2d = torch.tensor([[1, 2], [3, 4]])

In [8]:
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

In [9]:
tensor1d = torch.tensor([1, 2, 3])

print(tensor1d.dtype)

torch.int64


In [10]:
floatvec = torch.tensor([1.0, 2.0, 3.0])
print(floatvec.dtype)

torch.float32


In [11]:
floatvec = tensor1d.to(torch.float32)
print(floatvec.dtype)

torch.float32


In [12]:
tensor2d.shape

torch.Size([2, 2])

In [13]:
tensor2d = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])

In [14]:
tensor2d.shape

torch.Size([2, 3])

In [15]:
tensor2d.reshape([3, 2])

tensor([[1, 2],
        [3, 4],
        [5, 6]])

In [16]:
tensor2d.view(3,2)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
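`reshape` and `view` return the same result above, but they are not interchangeable. The following is a minimal sketch of the difference, assuming a transposed (non-contiguous) tensor as the test case; the names `t`, `t_T`, `flat`, and `view_failed` are only for illustration.

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# view never copies: the result shares storage with the original tensor
shares_storage = t.view(3, 2).data_ptr() == t.data_ptr()
print(shares_storage)

# .T returns a non-contiguous view with swapped strides
t_T = t.T
print(t_T.is_contiguous())

# reshape falls back to a copy when a view is not possible
flat = t_T.reshape(6)
print(flat)

# view refuses to work on this memory layout
view_failed = False
try:
    t_T.view(6)
except RuntimeError as err:
    view_failed = True
    print("view failed:", type(err).__name__)
```

In short, `view` is the stricter call and `reshape` is the more forgiving one.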

In [17]:
tensor2d

tensor([[1, 2, 3],
        [4, 5, 6]])

In [18]:
tensor2d.T

tensor([[1, 4],
        [2, 5],
        [3, 6]])

In [19]:
tensor2d.matmul(tensor2d.T)

tensor([[14, 32],
        [32, 77]])

In [20]:
tensor2d @ tensor2d.T

tensor([[14, 32],
        [32, 77]])

In [21]:
import torch.nn.functional as F

y = torch.tensor([1.0])  # true label
x1 = torch.tensor([1.1]) # input feature
w1 = torch.tensor([2.2]) # weight parameter
b = torch.tensor([0.0])  # bias unit

z = x1 * w1 + b          # net input
a = torch.sigmoid(z)     # activation & output

loss = F.binary_cross_entropy(a, y)
print(loss)

tensor(0.0852)
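The cell above only computes the forward pass. As a sketch of how autograd would produce gradients for the same computation, the version below sets `requires_grad=True` on `w1` and `b` (an assumption, the cell above does not track gradients) and compares the result with the analytic gradient of sigmoid plus binary cross-entropy, `(a - y) * x1`.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])                        # true label
x1 = torch.tensor([1.1])                       # input feature
w1 = torch.tensor([2.2], requires_grad=True)   # weight, tracked by autograd
b = torch.tensor([0.0], requires_grad=True)    # bias, tracked by autograd

z = x1 * w1 + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

loss.backward()  # fills w1.grad and b.grad

# For sigmoid + BCE, dL/dz simplifies to (a - y)
expected_w_grad = ((a - y) * x1).detach()
expected_b_grad = (a - y).detach()

print(w1.grad, expected_w_grad)
print(b.grad, expected_b_grad)
```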


In [22]:
import torch, torch.nn as nn, torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
data = torch.randn(4, 10)
target = torch.randn(4, 1)

for epoch in range(100):
    output = model(data)
    loss = nn.functional.mse_loss(output, target)
    loss.backward()
    # BUG (intentional): optimizer.zero_grad() is never called, so p.grad accumulates across epochs
    optimizer.step()
    print(f"Epoch {epoch} loss grad sum:", sum(p.grad.sum().item() for p in model.parameters()))


Epoch 0 loss grad sum: 5.941392242908478
Epoch 1 loss grad sum: 11.378333985805511
Epoch 2 loss grad sum: 15.853800415992737
Epoch 3 loss grad sum: 19.000988602638245
Epoch 4 loss grad sum: 20.57731395959854
Epoch 5 loss grad sum: 20.48631116747856
Epoch 6 loss grad sum: 18.785321831703186
Epoch 7 loss grad sum: 15.678302884101868
Epoch 8 loss grad sum: 11.494457006454468
Epoch 9 loss grad sum: 6.654939651489258
Epoch 10 loss grad sum: 1.630887508392334
Epoch 11 loss grad sum: -3.1029887199401855
Epoch 12 loss grad sum: -7.114686965942383
Epoch 13 loss grad sum: -10.056479930877686
Epoch 14 loss grad sum: -11.697687149047852
Epoch 15 loss grad sum: -11.945482730865479
Epoch 16 loss grad sum: -10.851707696914673
Epoch 17 loss grad sum: -8.605173110961914
Epoch 18 loss grad sum: -5.510236501693726
Epoch 19 loss grad sum: -1.9539647102355957
Epoch 20 loss grad sum: 1.6348085403442383
Epoch 21 loss grad sum: 4.830343008041382
Epoch 22 loss grad sum: 7.254231214523315
Epoch 23 loss grad sum
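The printed sums above do not shrink steadily because `backward()` adds to `.grad` instead of overwriting it. A minimal deterministic sketch of that behavior, using a hypothetical two-element parameter `w` rather than the model above:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# d/dw sum(w^2) = 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# second backward pass without zeroing: the new gradient is added on top
(w ** 2).sum().backward()
second = w.grad.clone()

# resetting the gradient restores the single-pass value
w.grad.zero_()
(w ** 2).sum().backward()
third = w.grad.clone()

print(first)   # tensor([2., 4.])
print(second)  # tensor([4., 8.])
print(third)   # tensor([2., 4.])
```

`optimizer.zero_grad()` performs the reset step for every parameter the optimizer manages.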

In [23]:
import torch, torch.nn as nn, torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
data = torch.randn(4, 10)
target = torch.randn(4, 1)

for epoch in range(100):
    optimizer.zero_grad()  # fix: clear gradients left over from the previous epoch
    output = model(data)
    loss = nn.functional.mse_loss(output, target)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch} loss grad sum:", sum(p.grad.sum().item() for p in model.parameters()))


Epoch 0 loss grad sum: -15.222372055053711
Epoch 1 loss grad sum: -14.585453748703003
Epoch 2 loss grad sum: -13.971067905426025
Epoch 3 loss grad sum: -13.378954410552979
Epoch 4 loss grad sum: -12.80876612663269
Epoch 5 loss grad sum: -12.260099411010742
Epoch 6 loss grad sum: -11.73249864578247
Epoch 7 loss grad sum: -11.225463390350342
Epoch 8 loss grad sum: -10.738462448120117
Epoch 9 loss grad sum: -10.270942687988281
Epoch 10 loss grad sum: -9.82233214378357
Epoch 11 loss grad sum: -9.392048954963684
Epoch 12 loss grad sum: -8.979502558708191
Epoch 13 loss grad sum: -8.584102034568787
Epoch 14 loss grad sum: -8.205257415771484
Epoch 15 loss grad sum: -7.842382311820984
Epoch 16 loss grad sum: -7.494899153709412
Epoch 17 loss grad sum: -7.162235975265503
Epoch 18 loss grad sum: -6.843834161758423
Epoch 19 loss grad sum: -6.53914475440979
Epoch 20 loss grad sum: -6.247633457183838
Epoch 21 loss grad sum: -5.9687793254852295
Epoch 22 loss grad sum: -5.702076196670532
Epoch 23 loss 




# Dataloaders

In [25]:
class NeuralNetwork(torch.nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super().__init__()

        self.layers = torch.nn.Sequential(

            # 1st hidden layer
            torch.nn.Linear(num_inputs, 30),
            torch.nn.ReLU(),

            # 2nd hidden layer
            torch.nn.Linear(30, 20),
            torch.nn.ReLU(),

            # output layer
            torch.nn.Linear(20, num_outputs),
        )

    def forward(self, x):
        logits = self.layers(x)
        return logits

In [26]:
model = NeuralNetwork(50, 3)

In [27]:
print(model)

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)
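A quick way to sanity-check this architecture is a forward pass on random input. The sketch below assumes an equivalent `torch.nn.Sequential` stand-in (so that it runs on its own) and a random input `X` with 50 features; the values are arbitrary, only the shapes matter.

```python
import torch

torch.manual_seed(123)

# Stand-in with the same layer sizes as NeuralNetwork(50, 3)
net = torch.nn.Sequential(
    torch.nn.Linear(50, 30),
    torch.nn.ReLU(),
    torch.nn.Linear(30, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 3),
)

X = torch.rand((1, 50))  # one example with 50 features

# no_grad skips building the autograd graph, which is enough for inference
with torch.no_grad():
    logits = net(X)
    probas = torch.softmax(logits, dim=1)  # turn logits into class probabilities

print(logits.shape)
print(probas.sum(dim=1))
```

The model returns raw logits, so the softmax is applied explicitly when probabilities are wanted.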


In [28]:
num_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)
print("Total number of trainable model parameters:", num_params)

Total number of trainable model parameters: 2213
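The total of 2213 can be checked by hand: a `Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. A small sketch of that arithmetic, where `linear_params` is a hypothetical helper introduced only for this check:

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# (inputs, outputs) of the three Linear layers in NeuralNetwork(50, 3)
layer_sizes = [(50, 30), (30, 20), (20, 3)]

per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(per_layer)

print(per_layer)  # [1530, 620, 63]
print(total)      # 2213
```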


In [29]:
X_train = torch.tensor([
    [-1.2, 3.1],
    [-0.9, 2.9],
    [-0.5, 2.6],
    [2.3, -1.1],
    [2.7, -1.5]
])

y_train = torch.tensor([0, 0, 0, 1, 1])

In [30]:
X_train.shape

torch.Size([5, 2])

In [31]:
y_train.shape

torch.Size([5])

In [32]:
X_test = torch.tensor([
    [-0.8, 2.8],
    [2.6, -1.6],
])

y_test = torch.tensor([0, 1])

In [33]:
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        one_x = self.features[index]
        one_y = self.labels[index]
        return one_x, one_y

    def __len__(self):
        return self.labels.shape[0]

train_ds = ToyDataset(X_train, y_train)
test_ds = ToyDataset(X_test, y_test)

In [34]:
len(train_ds)

5

In [44]:
from torch.utils.data import DataLoader

torch.manual_seed(123)

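# Earlier variant without drop_last: with 5 samples and batch_size=2,
# it yields a final batch that contains a single example.
# train_loader = DataLoader(
#     dataset=train_ds,
#     batch_size=2,
#     shuffle=True,
#     num_workers=0
# )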

train_loader = DataLoader(
    dataset=train_ds,
    batch_size=2,
    shuffle=True,
    num_workers=0,
    drop_last=True
)

In [45]:
train_loader

<torch.utils.data.dataloader.DataLoader at 0x7daf559127d0>

In [46]:
test_ds = ToyDataset(X_test, y_test)

test_loader = DataLoader(
    dataset=test_ds,
    batch_size=2,
    shuffle=False,
    num_workers=0
)

In [47]:
for idx, (x, y) in enumerate(train_loader):
    print(f"Batch {idx+1}:", x, y)

Batch 1: tensor([[ 2.3000, -1.1000],
        [-0.9000,  2.9000]]) tensor([1, 0])
Batch 2: tensor([[-1.2000,  3.1000],
        [-0.5000,  2.6000]]) tensor([0, 0])


In [42]:
for idx, (x, y) in enumerate(train_loader):
    print(f"Batch {idx+1}:", x, y)

Batch 1: tensor([[-1.2000,  3.1000],
        [-0.5000,  2.6000]]) tensor([0, 0])
Batch 2: tensor([[ 2.3000, -1.1000],
        [-0.9000,  2.9000]]) tensor([1, 0])
Batch 3: tensor([[ 2.7000, -1.5000]]) tensor([1])


In [43]:
for idx, (x, y) in enumerate(train_loader):
    print(f"Batch {idx+1}:", x, y)

Batch 1: tensor([[-0.9000,  2.9000],
        [ 2.3000, -1.1000]]) tensor([0, 1])
Batch 2: tensor([[ 2.7000, -1.5000],
        [-0.5000,  2.6000]]) tensor([1, 0])
Batch 3: tensor([[-1.2000,  3.1000]]) tensor([0])
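The runs above differ in batch count because of `drop_last`: with 5 samples and `batch_size=2`, the loader yields 3 batches by default and 2 when the incomplete last batch is dropped. A minimal sketch of that, assuming a `TensorDataset` with placeholder features in place of `ToyDataset`:

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# 5 samples with 2 features each; the values are placeholders
X = torch.arange(10, dtype=torch.float32).reshape(5, 2)
y = torch.tensor([0, 0, 0, 1, 1])
ds = TensorDataset(X, y)

keep_loader = DataLoader(ds, batch_size=2, shuffle=True)
drop_loader = DataLoader(ds, batch_size=2, shuffle=True, drop_last=True)

# Number of batches: ceil(5 / 2) vs. floor(5 / 2)
print(len(keep_loader), math.ceil(len(ds) / 2))
print(len(drop_loader), len(ds) // 2)

keep_sizes = [len(labels) for _, labels in keep_loader]
drop_sizes = [len(labels) for _, labels in drop_loader]
print(keep_sizes)  # [2, 2, 1]
print(drop_sizes)  # [2, 2]
```

Which example gets dropped changes from epoch to epoch when `shuffle=True`.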


In [48]:
import os, psutil

proc = psutil.Process(os.getpid())
mem_bytes = proc.memory_info().rss  # in bytes
print(f"RSS memory: {mem_bytes/1e9:.2f} GB")

RSS memory: 0.56 GB


In [50]:
import torch

print(torch.cuda.memory_summary())     # overall summary
print(torch.cuda.memory_allocated())   # bytes currently allocated by tensors
print(torch.cuda.memory_reserved())    # bytes reserved by caching allocator

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------

In [51]:
!nvidia-smi

Tue Jul  8 16:40:01 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   42C    P8              9W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [52]:
%%bash
cat > gpu_hammer.py << 'EOF'
import torch, time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
N = 4096
a = torch.randn((N, N), device=device)
b = torch.randn((N, N), device=device)

iter_num = 0
while True:
    iter_num += 1
    c = torch.mm(a, b)
    # sleep briefly to throttle the loop; remove this line for maximum load
    time.sleep(0.1)
    if iter_num % 50 == 0:
        print(f"[Hammer #{iter_num}] GPU mem used: {torch.cuda.memory_allocated()/1e6:.1f} MB")
EOF
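As a rough estimate of how much memory the script above needs for its tensors, each matrix takes `N * N` elements times the element size. The sketch below assumes the default `float32` dtype (4 bytes per element) and ignores allocator overhead and the CUDA context, so the reported number may differ.

```python
N = 4096
BYTES_PER_FLOAT32 = 4  # assumption: torch.randn uses the default float32 dtype

bytes_per_matrix = N * N * BYTES_PER_FLOAT32
mb_per_matrix = bytes_per_matrix / 1e6  # same MB convention as the script (1e6 bytes)

# a, b and the result c are alive at the same time
total_mb = 3 * mb_per_matrix

print(f"{mb_per_matrix:.1f} MB per matrix")  # 67.1 MB per matrix
print(f"{total_mb:.1f} MB for a, b and c")   # 201.3 MB for a, b and c
```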

In [62]:
%%bash
# start the hammer in the background, unbuffered, logging to gpu.log
nohup python3 -u gpu_hammer.py > gpu.log 2>&1 &

# immediately capture its PID
echo $! > gpu_hammer.pid

echo "Started gpu_hammer.py with PID $(cat gpu_hammer.pid)"

Started gpu_hammer.py with PID 10067
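Because the script loops forever, the background process has to be stopped by hand at some point. One possible way to do that from Python is sketched below; `stop_from_pidfile` is a hypothetical helper, and it assumes a POSIX system and the `gpu_hammer.pid` file written by the cell above.

```python
import os
import signal
from pathlib import Path


def stop_from_pidfile(pidfile="gpu_hammer.pid"):
    """Send SIGTERM to the PID stored in pidfile.

    Returns True if the signal was sent, False if the file or the
    process no longer exists. The PID file is removed afterwards.
    """
    path = Path(pidfile)
    if not path.exists():
        return False

    pid = int(path.read_text().strip())
    try:
        os.kill(pid, signal.SIGTERM)  # ask the process to terminate
        sent = True
    except ProcessLookupError:
        sent = False  # the process had already exited

    path.unlink()
    return sent


print(stop_from_pidfile())
```

Running `!nvidia-smi` afterwards is a simple way to confirm that the GPU memory has been released.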
