In the given neural network code, we work with tensors that must have compatible dimensions for matrix operations. Here's how these dimensions play out:

- `X` is the input, a 1-D tensor of 3 elements representing the features fed into the network.
- `W1` is the first-layer weight matrix, with shape 2x3, which maps the 3-element input `X` to the 2-element hidden vector `H`.
- `b1` is the first-layer bias, a 2-element vector added to the matrix-vector product of `W1` and `X` (computed with `torch.matmul`) to complete the layer.
- `W2` is the second-layer weight vector, with 2 elements; its dot product with `H` gives the output `Y_pred`.
- `b2` is the output-layer bias, a single value stored as a 1-element tensor, so `Y_pred` also ends up as a 1-element tensor.
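
The shape rules in the list above can be checked directly. This is a minimal sketch with zero-valued placeholder tensors (the values are arbitrary; only the shapes matter), and the `mismatch_raised` flag is a name introduced here for illustration:

```python
import torch

# Zero-valued placeholders with the same shapes as the tensors listed above
X = torch.zeros(3)       # input features, shape (3,)
W1 = torch.zeros(2, 3)   # first-layer weights, shape (2, 3)
b1 = torch.zeros(2)      # first-layer bias, shape (2,)
W2 = torch.zeros(2)      # second-layer weights, shape (2,)
b2 = torch.zeros(1)      # output bias, 1-element tensor

H = torch.matmul(W1, X) + b1     # (2, 3) @ (3,) -> (2,)
Y_pred = torch.dot(W2, H) + b2   # dot gives a 0-d scalar; adding b2 broadcasts to (1,)
print(H.shape, Y_pred.shape)     # torch.Size([2]) torch.Size([1])

# A weight matrix whose inner dimension does not match X is rejected
mismatch_raised = False
try:
    torch.matmul(torch.zeros(2, 4), X)  # (2, 4) @ (3,): 4 != 3
except RuntimeError as err:
    mismatch_raised = True
    print("shape error:", err)
```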

The forward pass combines these tensors through matrix-vector products and additions to produce the prediction `Y_pred`, then compares it with the target `Y_true` using the squared-error loss `0.5 * (Y_true - Y_pred) ** 2`. Each of these steps is a single PyTorch tensor operation.
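
Written out with the initial values from the first code cell ($\hat{y}$ stands for `Y_pred`, $y$ for `Y_true`), the forward pass works out as below. These numbers are computed by hand from the initial tensors, as a check on the code, not taken from a recorded run:

$$
\begin{aligned}
H &= W_1 X + b_1
   = \begin{bmatrix} 0.9 & -0.2 & 0.1 \\ 0.5 & 0.3 & -0.7 \end{bmatrix}
     \begin{bmatrix} 1.5 \\ 2.0 \\ 3.0 \end{bmatrix}
   + \begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}
   = \begin{bmatrix} 1.35 \\ -0.85 \end{bmatrix} \\
\hat{y} &= W_2^\top H + b_2 = 0.3\,(1.35) + (-0.5)(-0.85) + 0.1 = 0.93 \\
\mathcal{L} &= \tfrac{1}{2}\,(y - \hat{y})^2 = \tfrac{1}{2}\,(0 - 0.93)^2 = 0.43245
\end{aligned}
$$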

Behind the scenes, PyTorch builds a computation graph during the forward pass. The graph records each operation and the tensors it consumed, so that `loss.backward()` can walk it in reverse and compute the gradient of the loss with respect to every tensor created with `requires_grad=True`. This automatic differentiation is central to training neural networks: it supplies the gradients used to update the weights, which the NumPy version of the code has to derive and implement by hand.
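
A minimal sketch of what that graph looks like, reusing the values from the first code cell: every tensor produced by an operation carries a `grad_fn` node, and `next_functions` links each node to the nodes that produced its inputs. The exact node class names are PyTorch internals that may differ between versions, so the sketch prints them rather than relying on them; `node_names` and `grad_before` are illustrative names introduced here.

```python
import torch

# Same values as the first code cell (X and Y_true are plain data, no gradients)
X = torch.tensor([1.5, 2.0, 3.0])
Y_true = torch.tensor([0.0])
W1 = torch.tensor([[0.9, -0.2, 0.1], [0.5, 0.3, -0.7]], requires_grad=True)
b1 = torch.tensor([0.1, -0.1], requires_grad=True)
W2 = torch.tensor([0.3, -0.5], requires_grad=True)
b2 = torch.tensor([0.1], requires_grad=True)

# Forward pass: each result records the operation that created it in .grad_fn
H = torch.matmul(W1, X) + b1
Y_pred = torch.dot(W2, H) + b2
loss = 0.5 * (Y_true - Y_pred) ** 2

# Follow one path back through the graph, taking the first recorded parent
# of each node (entries are None for inputs that need no gradient)
node_names = []
node = loss.grad_fn
while node is not None:
    node_names.append(type(node).__name__)
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None
print(" <- ".join(node_names))

# Leaf tensors have no gradient until backward() traverses the graph
grad_before = W1.grad
print(grad_before)        # None
loss.backward()
print(W1.grad.shape)      # torch.Size([2, 3])
print(W2.grad)            # dloss/dW2 = (Y_pred - Y_true) * H
```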


```python
#%% Import libraries
import torch
import numpy as np
import time

#%% PyTorch version
# Initialize variables
X = torch.tensor([1.5, 2.0, 3.0], requires_grad=True)
Y_true = torch.tensor([0.0])
W1 = torch.tensor([[0.9, -0.2, 0.1], [0.5, 0.3, -0.7]], requires_grad=True)  # 2x3 matrix
b1 = torch.tensor([0.1, -0.1], requires_grad=True)  # 2-element vector
W2 = torch.tensor([0.3, -0.5], requires_grad=True)  # 2-element vector to match H
b2 = torch.tensor([0.1], requires_grad=True)

# Forward pass (perf_counter is the clock intended for timing code)
start_time = time.perf_counter()
H = torch.matmul(W1, X) + b1
Y_pred = torch.dot(W2, H) + b2
loss = 0.5 * (Y_true - Y_pred) ** 2

# Backward pass: autograd fills W1.grad, b1.grad, W2.grad, b2.grad (and X.grad)
loss.backward()
torch_time = time.perf_counter() - start_time  # stop the clock before printing

print(f"PyTorch Loss: {loss.item()}, dW1: {W1.grad}, dW2: {W2.grad}")
print(f"PyTorch computation time: {torch_time} seconds")

#%% Numpy version
# Initialize variables
X_np = np.array([1.5, 2.0, 3.0])
Y_true_np = np.array([0.0])
W1_np = np.array([[0.9, -0.2, 0.1], [0.5, 0.3, -0.7]])  # 2x3 matrix
b1_np = np.array([0.1, -0.1])
W2_np = np.array([0.3, -0.5])  # 2-element vector
b2_np = np.array([0.1])

# Forward pass
start_time = time.perf_counter()
H_np = np.dot(W1_np, X_np) + b1_np
Y_pred_np = np.dot(W2_np, H_np) + b2_np
loss_np = 0.5 * (Y_true_np - Y_pred_np) ** 2

# Backward pass (manually computed gradients)
# loss = 0.5 * (Y_true - Y_pred)**2, so dloss/dY_pred = Y_pred - Y_true
dY_pred_np = Y_pred_np - Y_true_np
dW2_np = dY_pred_np * H_np        # dloss/dW2 = dY_pred * H
dH_np = dY_pred_np * W2_np        # dloss/dH = dY_pred * W2
dW1_np = np.outer(dH_np, X_np)    # dloss/dW1 = outer(dH, X)
numpy_time = time.perf_counter() - start_time  # stop the clock before printing

print(f"Numpy Loss: {loss_np}, dW1: {dW1_np}, dW2: {dW2_np}")
print(f"Numpy computation time: {numpy_time} seconds")

# Compare speed (a ratio above 1 means PyTorch was faster). With tensors this
# small, both timings are largely per-call overhead, so the ratio is mostly noise.
print(f"Speedup using PyTorch: {numpy_time / torch_time}x")
```

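The manual NumPy gradients follow from the chain rule: since the loss is $\tfrac{1}{2}(y - \hat{y})^2$, we have $\partial\mathcal{L}/\partial\hat{y} = \hat{y} - y$, $\partial\mathcal{L}/\partial W_2 = (\hat{y} - y)\,H$, $\partial\mathcal{L}/\partial H = (\hat{y} - y)\,W_2$, and $\partial\mathcal{L}/\partial W_1$ is the outer product of $\partial\mathcal{L}/\partial H$ with $X$. The bias gradients, which the cell above does not compute, are $\partial\mathcal{L}/\partial b_1 = \partial\mathcal{L}/\partial H$ and $\partial\mathcal{L}/\partial b_2 = \hat{y} - y$. A quick sketch that checks all four against autograd on identical float64 data (the `grads_np` and `params` dictionaries are just an illustrative layout):

```python
import numpy as np
import torch

X_np = np.array([1.5, 2.0, 3.0])
Y_true_np = np.array([0.0])
W1_np = np.array([[0.9, -0.2, 0.1], [0.5, 0.3, -0.7]])
b1_np = np.array([0.1, -0.1])
W2_np = np.array([0.3, -0.5])
b2_np = np.array([0.1])

# Manual forward and backward pass (same formulas as the NumPy cell, plus biases)
H_np = W1_np @ X_np + b1_np
Y_pred_np = W2_np @ H_np + b2_np
dY = Y_pred_np - Y_true_np               # dL/dY_pred, shape (1,)
grads_np = {
    "W1": np.outer(dY * W2_np, X_np),    # dL/dW1 = outer(dL/dH, X)
    "b1": dY * W2_np,                    # dL/db1 = dL/dH
    "W2": dY * H_np,                     # dL/dW2
    "b2": dY,                            # dL/db2
}

# Autograd on the same float64 values
params = {name: torch.tensor(arr, requires_grad=True)
          for name, arr in [("W1", W1_np), ("b1", b1_np), ("W2", W2_np), ("b2", b2_np)]}
X = torch.tensor(X_np)
H = params["W1"] @ X + params["b1"]
Y_pred = torch.dot(params["W2"], H) + params["b2"]
loss = 0.5 * (torch.tensor(Y_true_np) - Y_pred) ** 2
loss.backward()

for name, g in grads_np.items():
    match = np.allclose(g, params[name].grad.numpy())
    print(f"{name}: match={match}")  # prints match=True for all four
```

Agreement on all four parameters is a cheap way to confirm a hand-written backward pass before relying on it.
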
```python
#%% Import libraries
import torch
import numpy as np
import time

#%% PyTorch version with larger matrices
# Initialize variables with larger random matrices (torch.randn yields float32)
input_size, hidden_size, output_size = 10000, 500, 10
X = torch.randn(input_size, requires_grad=True)
Y_true = torch.randn(output_size)
W1 = torch.randn(hidden_size, input_size, requires_grad=True)
b1 = torch.randn(hidden_size, requires_grad=True)
W2 = torch.randn(output_size, hidden_size, requires_grad=True)
b2 = torch.randn(output_size, requires_grad=True)

# Forward pass
start_time = time.time()
H = torch.matmul(W1, X) + b1
Y_pred = torch.matmul(W2, H) + b2
# Unscaled randn weights make Y_pred large, so the loss is typically in the millions
loss = torch.nn.functional.mse_loss(Y_pred, Y_true)

# Backward pass
loss.backward()

torch_time = time.time() - start_time  # covers forward + backward
print(f"PyTorch Loss: {loss.item()}")
print(f"PyTorch computation time: {torch_time} seconds")

#%% Numpy version with larger matrices
# Initialize variables with larger random matrices. These are fresh float64
# draws, independent of the PyTorch tensors, so the two losses differ.
X_np = np.random.randn(input_size)
Y_true_np = np.random.randn(output_size)
W1_np = np.random.randn(hidden_size, input_size)
b1_np = np.random.randn(hidden_size)
W2_np = np.random.randn(output_size, hidden_size)
b2_np = np.random.randn(output_size)

# Forward pass only: unlike the PyTorch cell, no gradients are computed here
start_time = time.time()
H_np = np.dot(W1_np, X_np) + b1_np
Y_pred_np = np.dot(W2_np, H_np) + b2_np
loss_np = np.mean((Y_pred_np - Y_true_np) ** 2)

numpy_time = time.time() - start_time
print(f"Numpy Loss: {loss_np}")
print(f"Numpy computation time: {numpy_time} seconds")

# Compare speed (a ratio below 1 means NumPy was faster). The two timings
# measure different amounts of work: forward + backward vs. forward only.
print(f"Speedup using PyTorch: {numpy_time / torch_time}x")
```


```text
PyTorch Loss: 8394926.0
PyTorch computation time: 0.0025353431701660156 seconds
Numpy Loss: 3898473.7981619933
Numpy computation time: 0.0013637542724609375 seconds
Speedup using PyTorch: 0.5378973105134475x
```
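
The recorded numbers are not a like-for-like comparison: the PyTorch timing covers a forward and a backward pass while the NumPy timing covers only the forward pass, the two versions draw different random data (which is why the losses differ), `torch.randn` produces float32 while `np.random.randn` produces float64, and a single `time.time()` measurement of a millisecond-scale workload includes first-call overhead. Below is a sketch of a fairer benchmark, assuming CPU execution and a hand-derived NumPy backward pass for the mean-squared-error loss: both sides run forward plus backward on the same float32 data, with one warm-up call and the best of several `timeit` repeats. The helper names (`torch_step`, `numpy_step`, `rel_err`) and the seed are illustrative choices, not part of the original code.

```python
import timeit

import numpy as np
import torch

input_size, hidden_size, output_size = 10000, 500, 10
rng = np.random.default_rng(0)

# One set of float32 data shared by both implementations
X_np = rng.standard_normal(input_size, dtype=np.float32)
Y_true_np = rng.standard_normal(output_size, dtype=np.float32)
W1_np = rng.standard_normal((hidden_size, input_size), dtype=np.float32)
b1_np = rng.standard_normal(hidden_size, dtype=np.float32)
W2_np = rng.standard_normal((output_size, hidden_size), dtype=np.float32)
b2_np = rng.standard_normal(output_size, dtype=np.float32)

# PyTorch copies of the same arrays
X, Y_true = torch.tensor(X_np), torch.tensor(Y_true_np)
W1 = torch.tensor(W1_np, requires_grad=True)
b1 = torch.tensor(b1_np, requires_grad=True)
W2 = torch.tensor(W2_np, requires_grad=True)
b2 = torch.tensor(b2_np, requires_grad=True)

def torch_step():
    """Forward + backward with autograd."""
    for p in (W1, b1, W2, b2):
        p.grad = None  # reset so gradients do not accumulate across runs
    H = W1 @ X + b1
    Y_pred = W2 @ H + b2
    loss = torch.nn.functional.mse_loss(Y_pred, Y_true)
    loss.backward()
    return loss.item()

def numpy_step():
    """Forward + hand-derived backward for the mean-squared-error loss."""
    H = W1_np @ X_np + b1_np
    Y_pred = W2_np @ H + b2_np
    diff = Y_pred - Y_true_np
    loss = np.mean(diff ** 2)
    dY = 2.0 * diff / output_size       # d(mean(diff**2)) / dY_pred
    dW2, db2 = np.outer(dY, H), dY
    dH = W2_np.T @ dY
    dW1, db1 = np.outer(dH, X_np), dH
    return loss, (dW1, db1, dW2, db2)

# Warm up once, then keep the best of several repeats to reduce noise
torch_step()
numpy_step()
n = 10
t_torch = min(timeit.repeat(torch_step, number=n, repeat=5)) / n
t_numpy = min(timeit.repeat(numpy_step, number=n, repeat=5)) / n
print(f"PyTorch: {t_torch * 1e3:.3f} ms per forward+backward step")
print(f"NumPy:   {t_numpy * 1e3:.3f} ms per forward+backward step")
print(f"NumPy time / PyTorch time: {t_numpy / t_torch:.2f}x")

# Sanity check: both implementations agree on the loss and gradients
loss_t = torch_step()
loss_n, (dW1, db1, dW2, db2) = numpy_step()

def rel_err(a, b):
    return float(np.linalg.norm(a - b) / np.linalg.norm(b))

loss_rel_err = abs(loss_t - float(loss_n)) / abs(float(loss_n))
dW1_rel_err = rel_err(W1.grad.numpy(), dW1)
dW2_rel_err = rel_err(W2.grad.numpy(), dW2)
print(f"relative errors: loss {loss_rel_err:.1e}, dW1 {dW1_rel_err:.1e}, dW2 {dW2_rel_err:.1e}")
```

Which library comes out ahead depends on the hardware, the BLAS build, and thread settings, so any single ratio is specific to the machine it was measured on.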
