##### Covering fundamentals of PyTorch



1. Tensors and Operations

At its core, PyTorch provides two main features:
- An n-dimensional Tensor, similar to a NumPy array, that can run on a GPU.
- Automatic differentiation for building and training neural networks.


### Tensors
Why do we need tensors?
NumPy is a great framework for scientific computing, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so NumPy alone is not enough for modern deep learning.

Birth of Tensor:
A PyTorch tensor is conceptually identical to a NumPy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.

- Unlike NumPy arrays, tensors can use GPUs to accelerate numerical computations.
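To make the NumPy comparison concrete, here is a minimal sketch of moving data between NumPy, a CPU tensor, and a GPU. It assumes NumPy is installed and falls back to the CPU when no GPU is available; the variable names are illustrative only.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

# torch.from_numpy shares memory with the array (CPU only);
# torch.tensor always makes an independent copy.
shared = torch.from_numpy(arr)
copied = torch.tensor(arr)

arr[0] = 10.0  # visible through `shared`, but not through `copied`
print(shared[0].item(), copied[0].item())

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Compute on the chosen device, then bring the result back to the CPU,
# because .numpy() only works on CPU tensors.
result = (shared.to(device) * 2).cpu().numpy()
print(result)
```

The shared-memory behaviour of `torch.from_numpy` avoids a copy, but it also means in-place edits on either side affect the other.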



In [19]:
import torch

# Create a tensor
tensor = torch.tensor([1, 2, 3, 4])

# Split
print("Split into chunks of size 2:", torch.split(tensor, 2))

# Unsqueeze
print("Add a new dimension at index 0:", tensor.unsqueeze(0))

# Squeeze
print("Remove a dimension of size 1:", tensor.unsqueeze(0).squeeze(0))

# Number of dimensions
tensor.dim()

Split into chunks of size 2: (tensor([1, 2]), tensor([3, 4]))
Add a new dimension at index 0: tensor([[1, 2, 3, 4]])
Remove a dimension of size 1: tensor([1, 2, 3, 4])


1
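The split/unsqueeze/squeeze cell above uses a length that divides evenly. The sketch below, using an illustrative `torch.arange(5)`, shows how `torch.split` and `torch.chunk` differ when it does not, and how `unsqueeze` can add a dimension in different positions.

```python
import torch

t = torch.arange(5)  # tensor([0, 1, 2, 3, 4])

# torch.split takes a chunk *size*; the last chunk is smaller
# when the size does not divide the length evenly.
parts = torch.split(t, 2)        # (tensor([0, 1]), tensor([2, 3]), tensor([4]))

# torch.split also accepts an explicit list of sizes that must sum to the length.
uneven = torch.split(t, [1, 4])  # (tensor([0]), tensor([1, 2, 3, 4]))

# torch.chunk takes the *number* of chunks instead of their size.
halves = torch.chunk(t, 2)       # (tensor([0, 1, 2]), tensor([3, 4]))

# unsqueeze inserts a size-1 dimension at the given index; squeeze removes it.
row = t.unsqueeze(0)             # shape (1, 5)
col = t.unsqueeze(1)             # shape (5, 1)
print(row.shape, col.shape, row.squeeze(0).shape)
# torch.Size([1, 5]) torch.Size([5, 1]) torch.Size([5])
```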

Why Do We Need Tensors?

While NumPy arrays are sufficient for many numerical computations, tensors offer several advantages:

GPU Support: Tensors can be easily moved between CPU and GPU, making them ideal for GPU-accelerated computations.

Autograd: Tensors are integrated with PyTorch's autograd system, which automatically computes gradients for backpropagation.

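Dynamic Computation Graph: PyTorch builds the computation graph on the fly as operations execute (define-by-run), so ordinary Python control flow such as loops and conditionals can change the graph from one iteration to the next. This is convenient for models whose structure depends on the input, such as RNNs over variable-length sequences.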

Sparse Tensors: PyTorch provides support for sparse tensors, which can be used to represent sparse data efficiently.
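Two of the advantages listed above, sparse tensors and the dynamic graph, are easy to miss in practice. The sketch below illustrates both; the function `f` is a hypothetical example whose loop count depends on the data, chosen only to show that autograd follows whatever Python code actually ran.

```python
import torch

# Sparse tensors: store only the non-zero entries (COO format here).
dense = torch.tensor([[0.0, 0.0, 3.0],
                      [4.0, 0.0, 0.0]])
sparse = dense.to_sparse().coalesce()
print(sparse.indices())  # row and column coordinates of the non-zeros
print(sparse.values())   # the non-zero values themselves
assert torch.equal(sparse.to_dense(), dense)

# Dynamic computation graph: the graph is recorded while Python runs,
# so the number of operations can depend on the input values.
def f(x):
    y = x
    while y.norm() < 10:  # data-dependent loop
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 1.0], requires_grad=True)
out = f(x)    # doubles three times here, so out = sum(8 * x) = 16
out.backward()
print(x.grad)  # gradient of sum(8 * x) with respect to x
```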

In [29]:
x = torch.rand(2, 4)
y = torch.rand(4, 4)

# Matrix multiplication: (2, 4) @ (4, 4) -> (2, 4)
print(torch.matmul(x, y))

tensor([[0.9816, 0.2135, 1.1452, 1.1695],
        [1.5069, 0.4854, 1.1027, 1.2231]])
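The `torch.matmul` call above works because the inner dimensions match. A short sketch of the shape rules, with illustrative tensor sizes:

```python
import torch

x = torch.rand(2, 4)
y = torch.rand(4, 4)

# Inner dimensions must match: (2, 4) @ (4, 4) -> (2, 4).
out = torch.matmul(x, y)
assert torch.allclose(out, x @ y)  # the @ operator is shorthand for torch.matmul

# A leading batch dimension is broadcast: (8, 2, 4) @ (4, 4) -> (8, 2, 4).
batch = torch.rand(8, 2, 4)
batched = torch.matmul(batch, y)
print(batched.shape)  # torch.Size([8, 2, 4])

# Mismatched inner dimensions (4 vs 3) raise a RuntimeError.
try:
    torch.matmul(x, torch.rand(3, 4))
except RuntimeError as err:
    print("shape error:", err)
```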


In [30]:
import torch

# Create a tensor with requires_grad=True
x = torch.tensor(2.0, requires_grad=True)

# Perform operations
y = x ** 2
z = y * 2

# Compute gradients
z.backward()

# Print gradients
print(x.grad)

tensor(8.)
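Here z = 2x², so dz/dx = 4x, which is 8 at x = 2, matching the printed gradient. One detail that matters for the training loop later: gradients accumulate in `.grad` across `backward()` calls instead of being overwritten. A minimal sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# z = 2 * x**2, so dz/dx = 4 * x = 8 at x = 2.
z = 2 * x ** 2
z.backward()
first = x.grad.item()        # 8.0

# A second backward pass adds to x.grad instead of replacing it.
z = 2 * x ** 2               # rebuild the graph; it is freed after backward()
z.backward()
accumulated = x.grad.item()  # 16.0

# Resetting the gradient restores the expected value. This is why
# training loops call optimizer.zero_grad() before each backward pass.
x.grad.zero_()
z = 2 * x ** 2
z.backward()
after_reset = x.grad.item()  # 8.0
print(first, accumulated, after_reset)
```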


In [31]:
import torch
import torch.nn as nn

class LinearModule(nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearModule, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return torch.relu(self.linear(x))

# Create an instance of the custom module
module = LinearModule(5, 3)

# Perform a forward pass
input_tensor = torch.randn(1, 5)
output = module(input_tensor)
print(output)

tensor([[0.5992, 0.7764, 0.0000]], grad_fn=<ReluBackward0>)
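To see what the `nn.Linear(5, 3)` inside `LinearModule` actually stores, the sketch below inspects a standalone layer of the same size and checks the forward pass against the formula x·Wᵀ + b. The standalone `layer` is illustrative; it plays the role of `module.linear` above.

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 3)

# weight has shape (out_features, in_features); bias has shape (out_features,).
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
# weight (3, 5) True
# bias (3,) True

# The forward pass computes x @ W.T + b.
x = torch.randn(1, 5)
manual = x @ layer.weight.T + layer.bias
assert torch.allclose(layer(x), manual)

# ReLU clamps negative values to zero, which is why some outputs can be 0.0000.
activated = torch.relu(manual)
assert (activated >= 0).all()
```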


In [32]:
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Create an instance of the custom module
model = SimpleNN(784, 128, 10)

# Print the model's architecture
print(model)

SimpleNN(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
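A quick sanity check on the size of this model is to count its trainable parameters. The sketch below rebuilds the same layer sizes with `nn.Sequential` (for brevity; it has the same parameters as `SimpleNN(784, 128, 10)` but prints differently) and runs a random batch through it.

```python
import torch
import torch.nn as nn

# Same layer sizes as SimpleNN(784, 128, 10).
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# 784*128 + 128 (fc1) + 128*10 + 10 (fc2) = 101,770 trainable parameters.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # 101770

# A batch of 32 flattened 28x28 inputs maps to 32 rows of 10 logits.
logits = model(torch.randn(32, 784))
print(logits.shape)  # torch.Size([32, 10])
```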


In [36]:
import torch
from torchvision import datasets, transforms

# Define the transform
transform = transforms.Compose([transforms.ToTensor()])

# Load the training and test datasets
train_dataset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
test_dataset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=False, transform=transform)

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
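To see what `train_loader` yields without downloading MNIST, the sketch below uses a small synthetic `TensorDataset` as a stand-in (an assumption for illustration: 100 random 1x28x28 "images" with labels 0-9). It shows the batch sizes produced by `batch_size=64` and how `torch.flatten(x, 1)` turns an image batch into the 784-feature input that `fc1` expects.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# 100 samples with batch_size=64 give one full batch and one partial batch.
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(batch_sizes)  # [64, 36]

# torch.flatten(x, 1) keeps the batch dimension and flattens the rest:
# (64, 1, 28, 28) -> (64, 784).
first_images, _ = next(iter(loader))
flat = torch.flatten(first_images, 1)
print(flat.shape)  # torch.Size([64, 784])
```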



import torch
import torch.nn as nn
import torch.optim as optim

# Define the custom module
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # input size is 784 (28x28 flattened)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)  # flatten the input images
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Initialize the custom module and move it to the GPU if one is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = SimpleNN()
model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Train the model
for epoch in range(10):
    for images, labels in train_loader:
        # Move the batch to the same device as the model
        images, labels = images.to(device), labels.to(device)

        # Zero the gradients accumulated from the previous step
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()

        # Update the model parameters
        optimizer.step()

    # Note: this is the loss of the last mini-batch in the epoch, not the epoch average
    print('Epoch {}: Loss = {:.4f}'.format(epoch+1, loss.item()))

# Set the model to evaluation mode
model.eval()

# Initialize the test loss and accuracy
test_loss = 0
correct = 0

# Make predictions on the test dataset
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()

# Calculate the test accuracy
accuracy = correct / len(test_dataset)
print('Test Accuracy: {:.2f}%'.format(100 * accuracy))

Epoch 1: Loss = 0.6055
Epoch 2: Loss = 0.3798
Epoch 3: Loss = 0.5946
Epoch 4: Loss = 0.0951
Epoch 5: Loss = 0.2064
Epoch 6: Loss = 0.3743
Epoch 7: Loss = 0.3883
Epoch 8: Loss = 0.4421
Epoch 9: Loss = 0.3160
Epoch 10: Loss = 0.3737


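RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

The evaluation loop fails because the model's parameters are on `cuda:0`, while the batches coming out of `test_loader` are still on the CPU: the training loop moves each batch to the GPU, but the evaluation loop does not. The next cell fixes this by moving each test batch to the model's device.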

In [37]:
# Use whatever device the model's parameters are on
device = next(model.parameters()).device

# Reset the counters so that nothing from the failed run above is included
model.eval()
test_loss = 0
correct = 0

with torch.no_grad():
    for images, labels in test_loader:
        # Move the images and labels to the same device as the model
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()

# Calculate the test accuracy
accuracy = correct / len(test_dataset)
print('Test Accuracy: {:.2f}%'.format(100 * accuracy))

Test Accuracy: 93.03%
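The training cell prints only the last mini-batch loss, and the evaluation code keeps its counters in global variables. One way to avoid both issues is to wrap each phase in a function that moves batches to the right device, resets its own counters, and averages the loss over all samples. The helpers `train_one_epoch` and `evaluate` below are hypothetical names for a minimal sketch; it runs on a tiny synthetic two-feature problem (not MNIST) and, for brevity, evaluates on the training data.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one epoch; return the sample-weighted mean training loss."""
    model.train()
    total_loss, n = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # criterion returns the batch mean, so weight it by the batch size
        total_loss += loss.item() * inputs.size(0)
        n += inputs.size(0)
    return total_loss / n


@torch.no_grad()
def evaluate(model, loader, criterion, device):
    """Return (mean loss, accuracy) over the whole loader."""
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        total_loss += criterion(outputs, targets).item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        n += inputs.size(0)
    return total_loss / n, correct / n


# Hypothetical synthetic data: the label is 1 when the first feature is positive.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
features = torch.randn(512, 2)
targets = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, targets), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(30):
    train_loss = train_one_epoch(model, loader, criterion, optimizer, device)

eval_loss, accuracy = evaluate(model, loader, criterion, device)
print(f"train loss {train_loss:.4f}, eval loss {eval_loss:.4f}, accuracy {accuracy:.2%}")
```

With real data, the evaluation loader would be a separate test set such as `test_loader` above.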
