
# Introduction to PyTorch

PyTorch is an open-source deep learning framework originally developed by Facebook's AI Research lab (now Meta AI). It is popular for its ease of use, dynamic computational graphs, and strong GPU support. PyTorch is widely used in both academia and industry for applications such as natural language processing, computer vision, and reinforcement learning.

In this notebook, we'll cover the basic concepts and operations in PyTorch that will help build the foundation for implementing neural networks and using pre-built models from PyTorch.



## Tensors in PyTorch

Tensors are the basic building blocks in PyTorch, similar to NumPy arrays but with the added advantage of GPU acceleration. In this section, we will learn how to create and manipulate tensors in PyTorch.


In [2]:

# Import PyTorch and the standard-library timer
import time

import torch

# Create a tensor
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print("Tensor:\n", x)

# Create a random tensor
random_tensor = torch.rand((2, 3))
print("\nRandom Tensor:\n", random_tensor)

# Perform tensor addition
y = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
z = x + y
print("\nTensor Addition:\n", z)

# Reshaping a tensor
reshaped_tensor = x.view(4)
print("\nReshaped Tensor:\n", reshaped_tensor)

# Broadcasting in tensors
broadcast_tensor = x + torch.tensor([1, 2])
print("\nBroadcast Tensor:\n", broadcast_tensor)

# # Tensor operations on GPU (if available)
# if torch.cuda.is_available():
#     x_cuda = x.to('cuda')
#     print("\nTensor on GPU:\n", x_cuda)
# else:
#     print("\nCUDA not available. Running on CPU.")

# Create a tensor
x = torch.rand((10000, 10000), dtype=torch.float32)

# Perform tensor operations on CPU
start_time_cpu = time.time()
cpu_result = x + x
cpu_time = time.time() - start_time_cpu
print("Tensor on CPU:\n", cpu_result)
print(f"Time taken on CPU: {cpu_time:.6f} seconds")

# Perform tensor operations on GPU (if available)
if torch.cuda.is_available():
    x_cuda = x.to('cuda')
    torch.cuda.synchronize()  # Wait for the host-to-device copy to finish
    start_time_gpu = time.time()
    gpu_result = x_cuda + x_cuda
    torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before stopping the timer
    gpu_time = time.time() - start_time_gpu
    print("\nTensor on GPU:\n", gpu_result)
    print(f"Time taken on GPU: {gpu_time:.6f} seconds")
else:
    print("\nCUDA not available. Running only on CPU.")


Tensor:
 tensor([[1., 2.],
        [3., 4.]])

Random Tensor:
 tensor([[0.7218, 0.3572, 0.9995],
        [0.0427, 0.5590, 0.3319]])

Tensor Addition:
 tensor([[ 6.,  8.],
        [10., 12.]])

Reshaped Tensor:
 tensor([1., 2., 3., 4.])

Broadcast Tensor:
 tensor([[2., 4.],
        [4., 6.]])
Tensor on CPU:
 tensor([[1.7695, 1.6661, 0.6373,  ..., 0.9061, 1.5843, 0.2765],
        [1.0162, 0.2150, 1.4708,  ..., 0.2819, 1.9416, 1.7394],
        [0.7891, 1.1084, 1.6542,  ..., 1.2855, 1.5199, 0.3721],
        ...,
        [0.6938, 0.9346, 1.2154,  ..., 0.0750, 0.7258, 1.2016],
        [1.9199, 0.4928, 1.1403,  ..., 1.3637, 0.1972, 1.6581],
        [0.6172, 0.4212, 1.4229,  ..., 0.9813, 0.2339, 1.0867]])
Time taken on CPU: 0.304184 seconds

Tensor on GPU:
 tensor([[1.7695, 1.6661, 0.6373,  ..., 0.9061, 1.5843, 0.2765],
        [1.0162, 0.2150, 1.4708,  ..., 0.2819, 1.9416, 1.7394],
        [0.7891, 1.1084, 1.6542,  ..., 1.2855, 1.5199, 0.3721],
        ...,
        [0.6938, 0.9346, 1.2154,  .
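
As a supplementary sketch (not part of the original cell), the snippet below isolates two ideas used above: broadcasting, and the fact that `view` returns a tensor that shares memory with the original. The names `a`, `row`, and `col` are illustrative only.

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
row = torch.tensor([10.0, 20.0, 30.0])                   # shape (3,)
col = torch.tensor([[1.0], [2.0]])                       # shape (2, 1)

# Shapes are aligned from the right; size-1 or missing dimensions are stretched
row_sum = a + row  # row is added to each of the 2 rows
col_sum = a + col  # col is added to each of the 3 columns
print("a + row:\n", row_sum)
print("a + col:\n", col_sum)

# view() does not copy data: writing through the view changes the original
flat = a.view(-1)
flat[0] = 100.0
print("a[0, 0] after writing through the view:", a[0, 0])
```

Because `view` shares storage, use `clone()` first if an independent copy is needed.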


## Autograd and Computational Graphs

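PyTorch provides automatic differentiation through a feature called `autograd`. It records every operation performed on tensors that have `requires_grad=True` and builds a computational graph from them. When you call `.backward()` on a result, autograd traverses this graph in reverse and computes the gradient of that result with respect to each tracked tensor, storing it in the tensor's `.grad` attribute.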

Let's see a simple example of how this works.


In [3]:

# Create a tensor with requires_grad=True to track computation
x = torch.tensor(2.0, requires_grad=True)

# Define a simple function f(x) = x^2
y = x**2

# Compute the gradient of y with respect to x
y.backward()

# Print the gradient (dy/dx)
print("Gradient (dy/dx):", x.grad)


Gradient (dy/dx): tensor(4.)
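
The same mechanism extends to several tracked tensors. The following is a minimal sketch, with illustrative names `w` and `b`, that also shows two behaviours worth knowing early: gradients accumulate across `backward()` calls until they are cleared, and `torch.no_grad()` switches tracking off.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)  # Plain input: no gradient is tracked for it

# f(w, b) = (w * x + b)^2, so df/dw = 2 * (w*x + b) * x and df/db = 2 * (w*x + b)
y = (w * x + b) ** 2
y.backward()
first_w_grad = w.grad.clone()
first_b_grad = b.grad.clone()
print("df/dw:", first_w_grad, "df/db:", first_b_grad)

# A second backward pass adds to the stored gradients instead of replacing them
y2 = (w * x + b) ** 2
y2.backward()
accumulated_w_grad = w.grad.clone()
print("df/dw after a second backward():", accumulated_w_grad)

# Clear the gradients before the next computation
w.grad.zero_()
b.grad.zero_()

# Inside no_grad() no graph is built, which is useful for inference
with torch.no_grad():
    z = w * x + b
print("z.requires_grad:", z.requires_grad)
```

This accumulation is the reason training code calls `optimizer.zero_grad()` before each backward pass.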



## Building a Simple Neural Network

In PyTorch, neural networks are built by creating classes that inherit from `torch.nn.Module`. A module declares its layers and activation functions in `__init__` and defines how data flows through them in `forward`. Let's see an example of building a simple neural network with one hidden layer.


In [4]:

import torch
import torch.nn as nn

# Define a simple neural network with one hidden layer
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer1 = nn.Linear(2, 3)  # Input layer (2 features) to hidden layer (3 neurons)
        self.relu = nn.ReLU()          # Activation function (ReLU)
        self.layer2 = nn.Linear(3, 1)  # Hidden layer (3 neurons) to output layer (1 output)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x

# Create an instance of the neural network
model = SimpleNN()

# Sample input
input_data = torch.tensor([[1.0, 2.0]])
output = model(input_data)
print("Network output:\n", output)


Network output:
 tensor([[0.3990]], grad_fn=<AddmmBackward0>)
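
A network like this processes a whole batch at once, and its learnable parameters can be counted directly. The sketch below uses a hypothetical class `TinyNet` with the same 2-3-1 layout so that it runs on its own.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # Hypothetical name; same layer sizes as the network above
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 3)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(3, 1)

    def forward(self, x):
        return self.layer2(self.relu(self.layer1(x)))

net = TinyNet()

# The first dimension is the batch: 5 samples with 2 features each
batch = torch.rand(5, 2)
out = net(batch)
print("Output shape:", out.shape)

# layer1: 2*3 weights + 3 biases; layer2: 3*1 weights + 1 bias
n_params = sum(p.numel() for p in net.parameters())
print("Number of parameters:", n_params)
```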



## Optimizers and Loss Functions

Once the model is built, we need to define how it learns. This is done by specifying a loss function and an optimizer. Common loss functions include Mean Squared Error (MSE) and Cross-Entropy Loss. For optimizers, we can use methods like Stochastic Gradient Descent (SGD) or Adam. PyTorch provides the `torch.optim` module to easily implement optimizers.

Here's an example of using a loss function and an optimizer.


In [5]:

# Define loss function (Mean Squared Error)
loss_fn = nn.MSELoss()

# Define optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Sample target output
target = torch.tensor([[0.0]])

# Compute loss
loss = loss_fn(output, target)
print("Loss:", loss.item())

# Backpropagation: clear old gradients, then compute new ones
optimizer.zero_grad()
loss.backward()

# Update weights
optimizer.step()

print("Updated model parameters:")
for param in model.parameters():
    print(param)


Loss: 0.15918532013893127
Updated model parameters:
Parameter containing:
tensor([[-0.3942, -0.6374],
        [-0.2481, -0.3214],
        [ 0.3430, -0.2309]], requires_grad=True)
Parameter containing:
tensor([ 0.1191,  0.1849, -0.3192], requires_grad=True)
Parameter containing:
tensor([[-0.1630,  0.0298, -0.2697]], requires_grad=True)
Parameter containing:
tensor([0.3910], requires_grad=True)
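
A single update rarely changes much; in practice the same steps are repeated in a loop. The following is a minimal sketch of such a loop on synthetic data (a line `y = 2x + 1` chosen purely for illustration). The names `linear_model`, `mse`, and `sgd` are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # Make the random initial weights reproducible

# Synthetic data: 20 points on the line y = 2x + 1
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)  # Shape (20, 1)
targets = 2 * inputs + 1

linear_model = nn.Linear(1, 1)
mse = nn.MSELoss()
sgd = torch.optim.SGD(linear_model.parameters(), lr=0.1)

losses = []
for step in range(200):
    sgd.zero_grad()                  # Clear gradients from the previous step
    predictions = linear_model(inputs)
    loss = mse(predictions, targets)
    loss.backward()                  # Compute gradients
    sgd.step()                       # Update weights
    losses.append(loss.item())

print(f"First loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
print("Learned weight and bias:", linear_model.weight.item(), linear_model.bias.item())
```

The learned weight and bias should end up close to 2 and 1, the values used to generate the data.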



## PyTorch Computational Graphs

PyTorch uses **dynamic computational graphs** which means that the graph is generated on-the-fly as operations are executed. This is different from **static computational graphs** (like those used in TensorFlow 1.x), which require you to define the entire graph before running computations.

### Dynamic vs Static Graphs

In PyTorch, the graph is rebuilt on every forward pass, which makes models more intuitive to write and easier to debug with standard Python tools. You can use Python’s native control flow (e.g., loops and conditionals) while defining neural networks, which allows for flexibility in experimentation.

**Static graphs** (like those in TensorFlow 1.x) must be defined in full before they are executed. This can make them less intuitive to work with.

### Benefits for Neural Network Training

With dynamic graphs, you can easily modify the network architecture during training. This allows for more flexibility when experimenting with new ideas or debugging errors.

For example, you can change the structure of the graph on every iteration of training, which is highly useful for tasks like reinforcement learning or neural architecture search.
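
To make this concrete, here is a small sketch of a module whose graph differs from call to call. The class `DynamicDepthNet` and its `n_repeats` argument are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn as nn

class DynamicDepthNet(nn.Module):  # Hypothetical example module
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x, n_repeats):
        # Ordinary Python loop: the graph depth depends on n_repeats
        for _ in range(n_repeats):
            x = torch.relu(self.hidden(x))
        # Data-dependent branch: decided at run time from the tensor values
        if x.sum() > 0:
            x = x * 2
        return self.out(x)

net = DynamicDepthNet()
x = torch.rand(3, 4)

for n_repeats in (1, 3):
    net.zero_grad()
    loss = net(x, n_repeats).sum()
    loss.backward()  # Autograd follows whichever graph this call built
    print(n_repeats, "repeats -> grad shape:", net.hidden.weight.grad.shape)
```

Both calls reuse the same `hidden` layer a different number of times, and `backward()` works without any change to the model definition.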



## Datasets and Data Loaders

Training neural networks typically means iterating over large amounts of data. PyTorch provides two classes, `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, to help manage and load data efficiently.

### Working with Datasets

The `Dataset` class represents a dataset, and its primary purpose is to return individual samples and their corresponding labels. The companion `torchvision` package provides built-in datasets in `torchvision.datasets`, but you can also define your own dataset by subclassing `Dataset`.

### DataLoader for Batching

The `DataLoader` wraps a dataset and provides an iterator that can split the dataset into batches, shuffle data, and load data in parallel using multiple workers.

Let's see how you can create a custom dataset and use DataLoader to handle it.


In [None]:

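from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os

# Define a custom dataset class
class CustomImageDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.image_names = sorted(os.listdir(image_dir))  # Sorted for a stable index order
        self.transform = transform

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        img_name = self.image_names[idx]
        img_path = os.path.join(self.image_dir, img_name)
        image = Image.open(img_path).convert("RGB")  # Force 3 channels so images can be stacked
        label = 1 if "cat" in img_name else 0  # Example label, taken from the file name only
        if self.transform:
            image = self.transform(image)
        return image, label

# Resize to a common size and convert PIL images to tensors so the DataLoader
# can stack them into a batch (224x224 is an arbitrary choice for this example)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Create an instance of the custom dataset
image_dir = 'path/to/your/images'
custom_dataset = CustomImageDataset(image_dir, transform=transform)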

# Use DataLoader to create batches of data
data_loader = DataLoader(custom_dataset, batch_size=4, shuffle=True, num_workers=2)

# Iterate over the data loader
for images, labels in data_loader:
    print(f"Batch of images: {images.shape}, Batch of labels: {labels}")
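
The cell above needs a folder of images to run. As a self-contained alternative, the sketch below applies the same `Dataset` and `DataLoader` pattern to a hypothetical in-memory dataset, `SquaresDataset`, so the batching behaviour can be observed without any files.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):  # Hypothetical dataset: sample i is the pair (i, i**2)
    def __init__(self, n):
        self.xs = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        x = self.xs[idx]
        return x, x ** 2

dataset = SquaresDataset(10)

# num_workers is left at its default (0), so loading happens in the main process
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_sizes = []
for xs, ys in loader:
    batch_sizes.append(len(xs))
    print(f"inputs: {xs}, targets: {ys}")

print("Batch sizes:", batch_sizes)
```

With 10 samples and `batch_size=4`, the last batch is smaller than the others; passing `drop_last=True` to `DataLoader` would discard it instead.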



## PyTorch Modules and `nn.Module`

In PyTorch, every layer and every neural network is implemented as a subclass of the `nn.Module` class, which provides the structure for composing simple layers into complex networks.

### What is `nn.Module`?

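`nn.Module` serves as a container for layers and the methods that operate on them. It keeps track of the parameters and submodules assigned to it as attributes, so calls such as `model.parameters()` reach every layer. Built-in layers, such as fully connected layers (e.g., `nn.Linear`), convolutional layers (e.g., `nn.Conv2d`), and activation functions (e.g., `nn.ReLU`), are themselves subclasses of `nn.Module`.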

### Basic Usage of `nn.Module`

Let's create a simple neural network using the `nn.Module` class.


In [7]:

import torch
import torch.nn as nn

# Define a simple neural network using nn.Module
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, 3)  # Fully connected layer (4 inputs to 3 outputs)
        self.relu = nn.ReLU()       # ReLU activation function
        self.fc2 = nn.Linear(3, 1)  # Fully connected layer (3 inputs to 1 output)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create an instance of the network
model = SimpleNN()

# Sample input
input_data = torch.rand(1, 4)  # Batch of size 1 with 4 features
output = model(input_data)
print("Model output:", output)


Model output: tensor([[-0.0886]], grad_fn=<AddmmBackward0>)
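
Because modules can contain other modules, parameters are registered under nested names. The sketch below uses two hypothetical classes, `Block` and `Stacked`, to show what `named_parameters()` and `named_children()` report for a nested model.

```python
import torch
import torch.nn as nn

class Block(nn.Module):  # Hypothetical reusable building block
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)

    def forward(self, x):
        return torch.relu(self.fc(x))

class Stacked(nn.Module):  # Hypothetical model composed of a Block and a Linear head
    def __init__(self):
        super().__init__()
        self.block1 = Block(4, 3)
        self.head = nn.Linear(3, 1)

    def forward(self, x):
        return self.head(self.block1(x))

net = Stacked()

# Submodules assigned as attributes are registered automatically
child_names = [name for name, _ in net.named_children()]
print("Children:", child_names)

# Parameter names reflect the nesting, in registration order
param_names = [name for name, _ in net.named_parameters()]
print("Parameters:", param_names)

output = net(torch.rand(2, 4))
print("Output shape:", output.shape)
```

The same dotted names appear as keys in `net.state_dict()`, which is what gets written when a model is saved.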
