
<table align="center">
  <td align="center">
    <a target="_blank" href="http://inspiredk.org">
    <img align="center" src="https://i.ibb.co/Z6HZPSbH/Inspired-K-org-Logo-No-Whitespace-Extra-Small.png">InspiredK.org Website</a>
  </td>
  
  <td align="center">
    <a target="_blank" href="https://colab.research.google.com/github/InspiredK-organization/MITintrotodeeplearning/blob/master/lab1/Lab1.3 - Introduction to PyTorch.ipynb">
    <img align="center" src="https://i.ibb.co/2P3SLwK/colab.png"/>Run in Google Colab</a>
  </td>
</table>

# Copyright Information

In [None]:
# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.
#
# Licensed under the MIT License. You may not use this file except in compliance
# with the License. Use and/or modification of this code outside of MIT Introduction
# to Deep Learning must reference:
#
# © MIT Introduction to Deep Learning
# http://introtodeeplearning.com
#
# Original lab is adapted from http://introtodeeplearning.com
# Lab is edited by http://InspiredK.org

# Lab 1: Intro to PyTorch and Music Generation with RNNs

In this lab, you'll get exposure to using PyTorch and learn how it can be used for deep learning. Go through the code and run each cell. Along the way, you'll encounter several ***TODO*** blocks -- follow the instructions to fill them out before running those cells and continuing.


# Part 1: Intro to PyTorch

## 0.1 Install PyTorch

[PyTorch](https://pytorch.org/) is a popular deep learning library known for its flexibility and ease of use. Here we'll learn how computations are represented and how to define a simple neural network in PyTorch. For all the labs in Introduction to Deep Learning 2025, there will be a PyTorch version available.

Let's import PyTorch, install the MIT Introduction to Deep Learning package (`mitdeeplearning`), and import a couple of other dependencies.

In [None]:
import torch
import torch.nn as nn

# Download and import the MIT Introduction to Deep Learning package
!pip install mitdeeplearning --quiet
import mitdeeplearning as mdl

import numpy as np # For NumPy arrays
import matplotlib.pyplot as plt # For graphical visualizations of different statistics

## 1.1 What is PyTorch?

PyTorch is a machine learning library, like TensorFlow. At its core, PyTorch provides an interface for creating and manipulating [tensors](https://pytorch.org/docs/stable/tensors.html), which are data structures that you can think of as multi-dimensional arrays. Tensors are represented as n-dimensional arrays of a base numeric datatype, such as a floating-point number, an integer, or a boolean -- they provide a way to generalize vectors and matrices to higher dimensions. PyTorch provides the ability to perform computation on these tensors, define neural networks, and train them efficiently.

The [```shape```](https://pytorch.org/docs/stable/generated/torch.Tensor.shape.html#torch.Tensor.shape) of a PyTorch tensor defines its number of dimensions and the size of each dimension. The `ndim` attribute (or, equivalently, the [```dim()```](https://pytorch.org/docs/stable/generated/torch.Tensor.dim.html#torch.Tensor.dim) method) of a PyTorch tensor gives the number of dimensions (n-dimensions) -- this is equivalent to the tensor's rank (the term used in TensorFlow), and you can also think of it as the tensor's order or degree.
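A minimal sketch of these properties, using hypothetical variable names (prefixed `example_`) so they don't overwrite the lab's own variables. It also prints each tensor's `dtype`: by default, integer inputs become `torch.int64` and floating-point inputs become `torch.float32`.

```python
import torch

# Hypothetical example tensors of increasing rank.
example_scalar = torch.tensor(7)                        # 0-d
example_vector = torch.tensor([1.0, 2.0, 3.0])          # 1-d
example_matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])   # 2-d

for name, t in [("scalar", example_scalar),
                ("vector", example_vector),
                ("matrix", example_matrix)]:
    # `ndim` (an attribute) and `dim()` (a method) report the same number of dimensions.
    print(f"{name}: ndim={t.ndim}, dim()={t.dim()}, shape={tuple(t.shape)}, dtype={t.dtype}")
```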

Let’s start by creating some tensors and inspecting their properties:


In [None]:
integer = torch.tensor(1234)
decimal = torch.tensor(3.14159265359)

print(f"`integer` is a {integer.ndim}-d Tensor: {integer}")
print(f"`decimal` is a {decimal.ndim}-d Tensor: {decimal}")

Python lists (and other sequences, such as a `range`) can be used to create 1-d tensors (vectors):

In [None]:
fibonacci = torch.tensor([1, 1, 2, 3, 5, 8])
count_to_100 = torch.tensor(range(100))

print(f"`fibonacci` is a {fibonacci.ndim}-d Tensor with shape: {fibonacci.shape}")
print(f"`count_to_100` is a {count_to_100.ndim}-d Tensor with shape: {count_to_100.shape}")

Next, let’s create 2-d (i.e., matrices) and higher-rank tensors. In image processing and computer vision, we will use 4-d Tensors with dimensions corresponding to batch size, number of color channels, image height, and image width.

In [None]:
### Defining multi-dimensional Tensors ###

'''TODO: Define a 2-dimensional Tensor'''
matrix = # TODO This tensor has a shape of 2 x 4 (2 rows, 4 columns).

assert isinstance(matrix, torch.Tensor), "matrix must be a torch Tensor object" # Is the object a Tensor?
assert matrix.ndim == 2 # Is it two-dimensional?

In [None]:
'''TODO: Define a 4-d Tensor.'''
# Use torch.zeros to initialize a 4-d Tensor of zeros with size 10 x 3 x 256 x 256.
# You can think of this as 10 RGB images (3 color channels), each 256 x 256 pixels, in PyTorch's channels-first layout.
images = # TODO This tensor contains all zeros, but in actual implementations there would be a wide variety of values.

assert isinstance(images, torch.Tensor), "images must be a torch Tensor object" # Is the object a Tensor?
assert images.ndim == 4, "images must have 4 dimensions" # Is it four-dimensional?
assert images.shape == (10, 3, 256, 256), "images is incorrect shape" # Does it have the correct shape?

As you have seen, the `shape` of a tensor provides the number of elements in each tensor dimension. The `shape` is quite useful, and we'll use it often. You can also use slicing to access subtensors within a higher-rank tensor:

In [None]:
row_vector = matrix[1] # Get the second row at index 1.
column_vector = matrix[:, 1] # Get the second column at index 1.
scalar = matrix[0, 1] # Get the value in the first row (index 0) and second column (index 1).

print(f"`row_vector`: {row_vector.numpy()}")
print(f"`column_vector`: {column_vector.numpy()}")
print(f"`scalar`: {scalar.numpy()}")
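Because `matrix` depends on your TODO answer, here is a sketch of the same slicing patterns on a fixed 3 x 3 tensor (a hypothetical `grid`), plus a sub-block slice and negative indexing, with the expected values in the comments.

```python
import torch

# Hypothetical fixed 3 x 3 tensor used only for this illustration.
grid = torch.tensor([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

second_row = grid[1]        # row at index 1: tensor([4, 5, 6])
second_col = grid[:, 1]     # column at index 1: tensor([2, 5, 8])
top_left = grid[:2, :2]     # first two rows and columns: tensor([[1, 2], [4, 5]])
corner = grid[-1, -1]       # negative indices count from the end: tensor(9)

print(second_row.tolist(), second_col.tolist(), top_left.tolist(), corner.item())
```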

## 1.2 Computations on Tensors

A convenient way to think about and visualize computations in a machine learning framework like PyTorch is in terms of graphs. We can define this graph in terms of tensors, which hold data, and the mathematical operations that act on these tensors in some order. Let's look at a simple example, and define this computation using PyTorch:

![Computation graph: inputs a and b are added to produce c](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/add-graph.png)

In [None]:
# Create the inputs from the graph and initialize their values.
a = torch.tensor(15)
b = torch.tensor(61)

# Add them together.
c1 = torch.add(a, b)
c2 = a + b  # PyTorch overrides the "+" operator so that it can be used on Tensors.
print(f"c1: {c1}")
print(f"c2: {c2}")

Notice how we've created a computation graph consisting of PyTorch operations: PyTorch executed those operations immediately and gave us back the result, a tensor with value 76.

Now let's consider a slightly more complicated example:

![Computation graph: c = a + b, d = b - 1, e = c * d](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph.png)

Here, we take two inputs, `a, b`, and compute an output `e`. Each node in the graph represents an operation that takes some input, does some computation, and passes its output to another node.

Let's define a simple function in PyTorch that constructs this computation graph:

In [None]:
# Construct a computation function based on the above graph.
def func(a, b):
    '''TODO: Define the operation for c, d, e (use torch.add, torch.subtract, torch.multiply).'''
    c = # TODO Add a and b.
    d = # TODO Subtract 1 from b.
    e = # TODO Multiply c and d.
    return e

Now, we can call this function to execute the computation graph given some inputs `a,b`:

In [None]:
# Here are some example values for a and b.
a, b = 1.5, 2.5
# Call the computation function on these inputs.
e_out = func(a, b)
# c = 1.5 + 2.5 = 4
# d = 2.5 - 1 = 1.5
# e = 4 * 1.5 = 6.0
print(e_out) # Expected: tensor(6.)

Notice how our output is a tensor whose value is the result of the computation, and that its shape is empty (`torch.Size([])`) because it is a single scalar value, i.e. a 0-d tensor.
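To see what an empty shape looks like, here is a small sketch that reproduces the final step of the hand calculation above (`e = 4 * 1.5`) with hypothetical tensor inputs and inspects the result:

```python
import torch

# Reproduce e = c * d from the worked example, with c = 4.0 and d = 1.5.
e_demo = torch.multiply(torch.tensor(4.0), torch.tensor(1.5))

print(e_demo)         # tensor(6.)
print(e_demo.shape)   # torch.Size([]): a 0-d tensor has no dimensions
print(e_demo.ndim)    # 0
print(e_demo.item())  # 6.0, extracted as a plain Python float
```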

## 1.3 Neural networks in PyTorch
We can also define neural networks in PyTorch. PyTorch uses [``torch.nn.Module``](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), which serves as a base class for all neural network modules in PyTorch and thus provides a framework for building and training neural networks.

Let's consider the example of a simple perceptron defined by just one dense (aka fully-connected or linear) layer: $ y = \sigma(Wx + b) $, where $W$ represents a matrix of weights, $b$ is a bias, $x$ is the input, $\sigma$ is the sigmoid activation function, and $y$ is the output.

![Perceptron: input x is multiplied by weights W, bias b is added, and a sigmoid produces output y](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph-2.png)

We will use `torch.nn.Module` to define layers -- the building blocks of neural networks. Layers implement common neural network operations. In PyTorch, when we implement a layer, we subclass `nn.Module` and define the parameters of the layer as attributes of our new class. We also override the method [``forward``](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward), which defines the forward-pass computation performed every time the layer is called on an input. All classes subclassing `nn.Module` should override the `forward` method.
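The sketch below illustrates this pattern with a hypothetical toy layer, `ScaleAndShift` (not part of the lab), that computes `scale * x + shift` elementwise. It shows that assigning `nn.Parameter` attributes in `__init__` registers them with the module, so they appear in `named_parameters()`, and that calling the module runs `forward`.

```python
import torch
import torch.nn as nn

class ScaleAndShift(nn.Module):
    """Hypothetical toy layer: y = scale * x + shift, applied elementwise."""
    def __init__(self, num_features):
        super().__init__()
        # nn.Parameter attributes are registered automatically as trainable parameters.
        self.scale = nn.Parameter(torch.ones(num_features))
        self.shift = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        return self.scale * x + self.shift

toy_layer = ScaleAndShift(3)
for name, param in toy_layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# Calling the module (rather than .forward directly) runs forward() for us.
toy_out = toy_layer(torch.tensor([[1.0, 2.0, 3.0]]))
print(toy_out)
```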

Let's write a dense layer class to implement a perceptron defined above.

In [None]:
### Defining a custom network layer ###

class OurDenseLayer(torch.nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(OurDenseLayer, self).__init__()
        # Define and initialize parameters - a weight matrix W and bias b.
        self.W = torch.nn.Parameter(torch.randn(num_inputs, num_outputs)) # These are the weights.
        self.b = torch.nn.Parameter(torch.randn(num_outputs)) # These are the biases.
        # Note: parameter initialization is random.

    def forward(self, x):
        '''TODO: Define the operation for z (hint: use torch.matmul).'''
        z = # TODO This is the operation that combines the inputs, weights, and biases.

        '''TODO: Define the operation for y (hint: use torch.sigmoid).'''
        y = # TODO This is the function 'sigma' that reaches the final output.
        return y

num_inputs = 2 # We want our dense layer to have two inputs.
num_outputs = 3 # And three outputs.
layer = OurDenseLayer(num_inputs, num_outputs) # Define the dense layer with the set input and output values.
x_input = torch.tensor([[1, 2.]]) # Test our dense layer with an input.
y = layer(x_input) # Use the layer on that input.

print(f"Input shape: {x_input.shape}")
print(f"Output shape: {y.shape}")
print(f"Output: {y}")

Conveniently, PyTorch has defined a number of ```nn.Modules``` (or Layers) that are commonly used in neural networks, for example a [```nn.Linear```](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) or [`nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html) module.
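One detail worth knowing when comparing `nn.Linear` with `OurDenseLayer`: `nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ W.T + b`, whereas `OurDenseLayer` stores `W` with shape `(num_inputs, num_outputs)`. A short sketch that checks this (variable names are hypothetical):

```python
import torch
import torch.nn as nn

linear_demo = nn.Linear(in_features=2, out_features=3)
print(linear_demo.weight.shape)  # torch.Size([3, 2]), i.e. (out_features, in_features)
print(linear_demo.bias.shape)    # torch.Size([3])

x_demo = torch.tensor([[1.0, 2.0]])
manual = x_demo @ linear_demo.weight.T + linear_demo.bias  # the affine map nn.Linear applies
print(torch.allclose(linear_demo(x_demo), manual))        # True

# nn.Sigmoid is the module form of torch.sigmoid.
print(nn.Sigmoid()(torch.tensor(0.0)))                     # tensor(0.5000)
```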

Now, instead of using a single ```Module``` to define our simple neural network, we'll use the [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) module from PyTorch, with a single [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer followed by a sigmoid activation, to define our network. With the `Sequential` API, you can readily create neural networks by stacking together layers like building blocks.

In [None]:
### Defining a neural network using Sequential ###

# Define the number of inputs and outputs with the same values as before.
n_input_nodes = 2
n_output_nodes = 3

# Define the model using these values.
'''TODO: Use the Sequential API to define a neural network with a single linear (dense) layer, followed by a sigmoid non-linearity.'''
model = nn.Sequential('''TODO''')
    # Add a linear layer with input size 2 and output size 3.
    # Add a sigmoid activation function for non-linearity.

We've defined our model using the Sequential API. Now, we can test it out using an example input:

In [None]:
# Test the model with example input
x_input = torch.tensor([[1, 2.]]) # Use the model with the same example input from before.

'''TODO: Feed the input into the model and predict the output!'''
model_output = # TODO Test our model with that input.

print(f"Input shape: {x_input.shape}")
print(f"Output shape: {model_output.shape}")
print(f"Output: {model_output}") # Same shape as before; the values differ because the weights are randomly initialized.
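`nn.Sequential` is not limited to one layer. As a hypothetical example (not needed for the TODO above), this stacks two linear layers with a ReLU between them; the layers run in the order listed, and the container supports indexing and `len()`.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer network: 4 inputs -> 8 hidden units -> 1 output.
stack_demo = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

print(stack_demo)     # lists the layers with their indices
print(stack_demo[0])  # indexing returns the first layer: Linear(in_features=4, out_features=8, bias=True)

batch = torch.randn(5, 4)        # 5 examples with 4 features each
print(stack_demo(batch).shape)   # torch.Size([5, 1])
```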

With PyTorch, we can create more flexible models by subclassing [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). The `nn.Module` class allows us to group layers together flexibly to define new architectures.

As we saw earlier with `OurDenseLayer`, we can subclass `nn.Module` to create a class for our model, and then define the forward pass through the network using the `forward` function. Subclassing affords the flexibility to define custom layers, custom training loops, custom activation functions, and custom models. Let's define the same neural network model as above (i.e., a linear layer followed by an activation function), now using subclassing and PyTorch's built-in linear layer from `nn.Linear`.

In [None]:
### Create a custom model using subclassing ###

class LinearWithSigmoidActivation(nn.Module):
    # In __init__, we define the model's layers.
    def __init__(self, num_inputs, num_outputs):
        super(LinearWithSigmoidActivation, self).__init__()
        '''TODO: Our model consists of a single linear (dense) layer and a sigmoid activation. Define this model just like before.'''
        self.linear = '''TODO: linear layer''' # The linear layer with a set number of inputs and outputs.
        self.activation = '''TODO: sigmoid activation''' # The non-linear activation function, sigmoid.

    # Define the forward function that lets our model receive inputs and provide outputs.
    def forward(self, inputs):
        linear_output = self.linear(inputs)
        output = self.activation(linear_output)
        return output

Let's test out our new model, using an example input, setting `n_input_nodes=2` and `n_output_nodes=3` as before.

In [None]:
# Define the number of inputs and outputs like in previous examples.
n_input_nodes = 2
n_output_nodes = 3

model = LinearWithSigmoidActivation(n_input_nodes, n_output_nodes) # Define our model with these set values.
x_input = torch.tensor([[1, 2.]]) # Again, use the same example input from before.

y = model(x_input) # The weights are randomly initialized, so the output will differ each time the model is created.
print(f"Input shape: {x_input.shape}")
print(f"Output shape: {y.shape}")
print(f"Output: {y}")

Importantly, `nn.Module` affords us a lot of flexibility to define custom models. For example, we can use boolean arguments in the `forward` function to specify different network behaviors, such as different behaviors during training and inference. Suppose that in some cases we want our network to simply output the input, without any perturbation. We define a boolean argument `isidentity` to control this behavior:

In [None]:
### Defining a custom model using subclassing and adding custom behavior ###

class LinearWithIdentity(nn.Module):
    # As before, in __init__ we define the model's layers.
    def __init__(self, num_inputs, num_outputs):
        super(LinearWithIdentity, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs) # Define the model like before with a Linear layer.

    '''TODO: Implement the behavior where the network outputs the input, unchanged, under control of the isidentity argument.'''

    # def forward(self, inputs, isidentity=False):
        # If isidentity is True, then return the inputs as-is.
        # If isidentity is False, then send the inputs through our model and provide an output.

Let's test this behavior:

In [None]:
model = LinearWithIdentity(num_inputs=2, num_outputs=3) # Create the model with the same inputs and outputs as before.
x_input = torch.tensor([[1, 2.]]) # Yet again, use the same example input from before.

'''TODO: Pass the input into the model and call with and without the input identity option.'''
out_with_linear = # TODO Use the model on this input with isidentity set to False.
out_with_identity = # TODO Then, use the model on the input with isidentity set to True.

print("Network linear output: {}\nNetwork identity output: {}".format(out_with_linear, out_with_identity)) # Compare the outputs.
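For the specific case of training versus inference mentioned above, `nn.Module` already provides a built-in flag: every module has a `training` attribute, which is `True` by default and is switched by `.train()` and `.eval()`. The sketch below uses a hypothetical `NoisyLinear` layer that adds Gaussian noise only in training mode; the noise is purely illustrative.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Hypothetical example: add Gaussian noise to the output only in training mode."""
    def __init__(self, num_inputs, num_outputs, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
        self.noise_std = noise_std

    def forward(self, inputs):
        out = self.linear(inputs)
        if self.training:  # toggled by .train() / .eval()
            out = out + self.noise_std * torch.randn_like(out)
        return out

noisy = NoisyLinear(2, 3)
x_noise = torch.tensor([[1.0, 2.0]])

noisy.eval()   # inference mode: no noise, so repeated calls give identical outputs
print(torch.equal(noisy(x_noise), noisy(x_noise)))  # True

noisy.train()  # training mode: noise is added on every call
print(noisy.training)  # True
```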

Now that we have learned how to define layers and models in PyTorch using both the Sequential API and subclassing `nn.Module`, we're ready to turn our attention to how to actually implement network training with backpropagation.

## 1.4 Automatic Differentiation in PyTorch

In PyTorch, [`torch.autograd`](https://pytorch.org/docs/stable/autograd.html) is used for [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation), which is critical for training deep learning models with [backpropagation](https://en.wikipedia.org/wiki/Backpropagation).

On a tensor, the [`requires_grad`](https://pytorch.org/docs/stable/generated/torch.Tensor.requires_grad_.html) attribute controls whether autograd should record operations on that tensor. As operations are performed (for example, during a forward pass through the network), PyTorch builds the computational graph dynamically; then, to compute gradients, we call the PyTorch [`.backward()`](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html) method, which performs backpropagation through the recorded graph and stores the results in the `.grad` attribute of each leaf tensor that requires gradients.
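A minimal sketch of this record-then-backpropagate flow, with hypothetical tensors `w_demo` and `u_demo`: only the tensor created with `requires_grad=True` receives a gradient.

```python
import torch

w_demo = torch.tensor(2.0, requires_grad=True)  # autograd records operations on w_demo
u_demo = torch.tensor(5.0)                      # requires_grad defaults to False
z_demo = w_demo * u_demo + 1.0                  # the graph is built as this line runs

print(z_demo.requires_grad, z_demo.grad_fn)     # True, plus the graph node that produced z_demo
z_demo.backward()                               # dz/dw = u = 5
print(w_demo.grad)                              # tensor(5.)
print(u_demo.grad)                              # None: u_demo was not tracked
```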

Let's compute the gradient of $ y = x^2 $:

In [None]:
### Gradient computation using PyTorch autograd ###

x = torch.tensor(3.0, requires_grad=True) # Create a given variable `x` equal to 3.0, and have PyTorch record its operations.
y = x ** 2 # Let's use the example of `y = x^2`
y.backward() # Backpropagate: compute dy/dx and store it in x.grad.

dy_dx = x.grad # Read the .grad attribute of our input to get the computed gradient.
print("dy_dx of y = x^2 at x = 3.0 is", dy_dx.numpy())
assert dy_dx == 6.0 # Is the gradient equal to 6?
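One detail to keep in mind for the next example: calling `backward()` adds into `.grad` rather than overwriting it. A small sketch (with a hypothetical `x_acc`) showing the accumulation and how to reset it:

```python
import torch

x_acc = torch.tensor(3.0, requires_grad=True)

for _ in range(2):
    y_acc = x_acc ** 2   # rebuild the graph for each backward pass
    y_acc.backward()     # each call adds dy/dx = 2 * 3 = 6 into x_acc.grad
print(x_acc.grad)        # tensor(12.): the two gradients were summed

x_acc.grad.zero_()       # reset the stored gradient in place
(x_acc ** 2).backward()
print(x_acc.grad)        # tensor(6.)
```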

In training neural networks, we use differentiation and stochastic gradient descent (SGD) to optimize a loss function. Now that we have a sense of how PyTorch's autograd can be used to compute and access derivatives, we will look at an example where we use automatic differentiation and SGD to find the minimum of $ L=(x-x_f)^2 $. Here $x_f$ is a variable for a desired value we are trying to optimize for; $L$ represents a loss that we are trying to minimize. While we can clearly solve this problem analytically ($ x_{min}=x_f $), considering how we can compute this using PyTorch's autograd sets us up nicely for future labs where we use gradient descent to optimize entire neural network losses.
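For this loss the gradient is available in closed form, $\frac{dL}{dx} = 2(x - x_f)$, so each gradient-descent step with learning rate $\eta$ gives $x_{k+1} - x_f = (1 - 2\eta)(x_k - x_f)$: the gap to the target shrinks by a factor of $1 - 2\eta$ per step (and grows if $\eta > 1$). A short sketch, with a hypothetical starting point $x_0 = 1$, that checks autograd against the closed form and computes that shrink factor for the settings used below:

```python
import torch

x_f = 4.0                                   # target value, as in the text
eta = 1e-2                                  # learning rate used below
x0 = torch.tensor(1.0, requires_grad=True)  # hypothetical starting point

loss0 = (x0 - x_f) ** 2
loss0.backward()
analytic_grad = 2 * (x0.item() - x_f)       # dL/dx = 2 (x - x_f) = -6
print(x0.grad.item(), analytic_grad)        # -6.0 -6.0

x1 = x0.item() - eta * x0.grad.item()       # one gradient-descent step moves x toward 4
print(x1)

# After 500 steps, the initial gap |x0 - x_f| is multiplied by (1 - 2*eta)**500.
print((1 - 2 * eta) ** 500)                 # roughly 4e-5
```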

In [None]:
### Function minimization with autograd and gradients ###

x = torch.randn(1) # Create a single random value for the initial x.
print(f"Initializing x={x.item()}") # Print the initial value.

x_f = 4  # Create a target value that x should reach.
learning_rate = 1e-2 # Set a rate to change x to reach the target value.
iterations = 500 # Set a number of iterations to loop through.
history = [] # Create an empty list to store all x values for graphing.

# In each iteration, we compute the loss, compute the gradient of the loss, and update the x value based on the gradient.
for i in range(iterations):
    x = x.detach().requires_grad_(True) # Start a fresh leaf tensor each iteration so autograd records the operations on x.
    loss = (x - x_f) ** 2  # Compute the loss as the squared difference between the current and target values.

    loss.backward() # Backpropagate through the loss to compute dL/dx, stored in x.grad.
    x = x.detach() - learning_rate * x.grad # Take a gradient-descent step that moves x slightly towards the target value.
    history.append(x.item()) # Add the updated value to the graphing list.

print("Final x={}".format(x.item())) # Print the final value to see how close it is to the target value.

# Plot the changes of the initial value as it moves towards the target value.
plt.plot(history)
plt.plot([0, iterations], [x_f, x_f])
plt.legend(('x during optimization', 'Target value x_f'))
plt.xlabel('Iteration')
plt.ylabel('Value')
plt.show()


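This process of starting from an initial value and repeatedly stepping against the gradient of the loss is called gradient descent. Because our loss here is deterministic, this is plain gradient descent; Stochastic Gradient Descent (SGD) applies the same update but estimates the gradient from a random subset (mini-batch) of the training data at each step. SGD is used in machine learning to optimize model weights by minimizing a loss. Even though many other optimizers have been developed, SGD and its variants remain among the most commonly used methods for weight optimization.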

PyTorch's `autograd` provides an extremely flexible framework for automatic differentiation. In order to backpropagate errors through a neural network, we track operations on a variable, use this information to determine the gradients, and then use these gradients for optimization using SGD.
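In practice, the manual update in the loop above is usually delegated to an optimizer from `torch.optim`. A minimal sketch of the same minimization with `torch.optim.SGD` (variable names are hypothetical); note the explicit `zero_grad()` call, since gradients accumulate across `backward()` calls:

```python
import torch

x_f = 4.0                                      # target value
x_opt = torch.randn(1, requires_grad=True)     # the value being optimized
optimizer = torch.optim.SGD([x_opt], lr=1e-2)  # plain gradient descent on x_opt

for _ in range(500):
    optimizer.zero_grad()                  # clear the gradient left by the previous step
    loss_opt = ((x_opt - x_f) ** 2).sum()  # reduce to a scalar loss
    loss_opt.backward()                    # compute dL/dx_opt
    optimizer.step()                       # x_opt <- x_opt - lr * dL/dx_opt

print(f"Final x={x_opt.item():.4f}")
```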

I highly encourage you to experiment with different values for `learning_rate` and `iterations` to see what effects they have on the final value and the trend of the graph. Additionally, go back to the custom layers and models we defined to see what happens when you change the input, input shape, output shape, and other characteristics of the model.