# PyTorch Notebook for External Learning
---
## Section 1: PyTorch Basics – Tensors

**Q1.** What is a `torch.Tensor`? How is it different from a NumPy array?

- A torch.Tensor is a multi-dimensional array used in PyTorch
- It can store and manipulate numerical data
- Tensors are similar to NumPy arrays but are optimized for deep learning tasks

**Difference**
- NumPy arrays are only supported on CPU, while PyTorch tensors can run on CPU and GPU
- NumPy does not support automatic differentiation, whereas PyTorch tensors support autograd with requires_grad=True
- PyTorch is optimized for deep learning computations, while NumPy is for general scientific computing
- Tensors can be easily converted to and from NumPy arrays, making them interoperable

In [130]:
# Q1 Code Task: Create a 1D tensor with values [1, 2, 3, 4, 5]
import torch
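# Note: torch.Tensor(...) produces the default floating-point dtype (float32 unless changed),
# while torch.tensor(...) infers the dtype from the data
tensor = torch.Tensor([1, 2, 3, 4, 5])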
print("1D Tensor: ", tensor)

1D Tensor:  tensor([1., 2., 3., 4., 5.])


**Q2.** How can you convert a PyTorch tensor to a NumPy array and vice versa?

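- We can convert a NumPy array to a PyTorch tensor using `torch.tensor()` (copies the data) or `torch.from_numpy()` (shares memory with the array)
- We can convert a PyTorch tensor to a NumPy array using `.numpy()`; for a CPU tensor the resulting array shares memory with the tensor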
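
The difference between copying and sharing memory is easy to miss, so here is a minimal sketch of it. The variable names (`arr`, `shared`, `copied`) are illustrative and not part of the original task.

```python
import numpy as np
import torch

arr = np.array([10, 20, 30])

shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

arr[0] = 99                      # modify the NumPy array in place

print("from_numpy:", shared)     # reflects the change made to arr
print("torch.tensor:", copied)   # keeps the original values

# Going back with .numpy() also shares memory for CPU tensors
back = shared.numpy()
print("Shares memory:", np.shares_memory(back, arr))
```

If an independent NumPy copy is needed, calling `.clone()` on the tensor before `.numpy()` is one way to get it.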

In [131]:
# Q2 Code Task: Convert the tensor [10, 20, 30] into a NumPy array and back to a tensor.
import torch
import numpy as np

tensor = torch.tensor([10, 20, 30])
print("Original Tensor:", tensor)

numpy_array = tensor.numpy()
print("\nPyTorch Tensor to NumPy Array:", numpy_array)
print("Type: ", type(numpy_array))

tensor_back = torch.tensor(numpy_array)
print("\nNumPy Array to Tensor:", tensor_back)
print("Type: ", type(tensor_back))

Original Tensor: tensor([10, 20, 30])

PyTorch Tensor to NumPy Array: [10 20 30]
Type:  <class 'numpy.ndarray'>

NumPy Array to Tensor: tensor([10, 20, 30])
Type:  <class 'torch.Tensor'>


**Q3.** Create a **2x3 tensor** filled with random numbers between 0 and 1. Print its shape and data type.

In [132]:
# Q3 Code Task
random_tensor = torch.rand(2, 3)
print("Random Tensor: ", random_tensor)
print("Shape: ", random_tensor.shape)
print("Data Type: ", random_tensor.dtype)

Random Tensor:  tensor([[0.2656, 0.9617, 0.6913],
        [0.8045, 0.0290, 0.3019]])
Shape:  torch.Size([2, 3])
Data Type:  torch.float32


**Q4.** Demonstrate element-wise addition and matrix multiplication with tensors. What is the difference between `*` and `@` operators in PyTorch?

In [133]:
# Q4 Code Task: Perform element-wise addition and matrix multiplication on two 2x2 tensors.
tensor1 = torch.tensor([[1, 2], [3, 4]])
tensor2 = torch.tensor([[5, 6], [7, 8]])

print("Element wise Matrix Addition: ", tensor1 + tensor2)
print("Element wise Matrix Multiplication: ", tensor1 * tensor2)
print("Matrix Multiplication: ", tensor1 @ tensor2)

Element wise Matrix Addition:  tensor([[ 6,  8],
        [10, 12]])
Element wise Matrix Multiplication:  tensor([[ 5, 12],
        [21, 32]])
Matrix Multiplication:  tensor([[19, 22],
        [43, 50]])
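
On the question itself: `*` multiplies element by element (the shapes must match or be broadcastable), while `@` performs matrix multiplication (the inner dimensions must match). With square 2x2 inputs both produce a 2x2 result, so the sketch below uses non-square tensors, where the difference also shows up in the shapes. The tensors `a` and `b` are illustrative.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])      # shape (2, 3)
b = torch.tensor([[1, 0, 2],
                  [0, 1, 3]])      # shape (2, 3)

elementwise = a * b                # same shape as the inputs: (2, 3)
matmul = a @ b.T                   # (2, 3) @ (3, 2) -> (2, 2)

print("a * b:\n", elementwise)
print("a @ b.T:\n", matmul)

# a @ b is not defined here: the inner dimensions (3 and 2) do not match
try:
    a @ b
except RuntimeError as e:
    print("Matmul error:", e)
```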


**Q5.** Explain broadcasting in PyTorch with an example.

- Broadcasting is a mechanism that allows PyTorch to perform element-wise operations on tensors of different shapes by automatically expanding the smaller tensor to match the larger tensor’s shape.

**Example:**

- `a = torch.tensor([[1], [2], [3]])` has shape (3,1): 3 rows, 1 column
- `b = torch.tensor([[10, 20, 30, 40]])` has shape (1,4): 1 row, 4 columns
- `c = a + b` triggers broadcasting automatically for the element-wise addition
- Both `a` and `b` are virtually expanded to shape (3,4), and the addition is then performed element by element, so `c` has shape (3,4)
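
The (3,1) + (1,4) example described above can be run as written. The sketch also uses `torch.broadcast_shapes` to report the resulting shape without doing the addition, and shows a pair of shapes that cannot be broadcast.

```python
import torch

a = torch.tensor([[1], [2], [3]])        # shape (3, 1)
b = torch.tensor([[10, 20, 30, 40]])     # shape (1, 4)

c = a + b                                # both operands are broadcast to (3, 4)
print(c)
print("Broadcast shape:", torch.broadcast_shapes(a.shape, b.shape))

# Shapes (3,) and (4,) are incompatible: the sizes differ and neither is 1
try:
    torch.ones(3) + torch.ones(4)
    broadcast_failed = False
except RuntimeError as e:
    broadcast_failed = True
    print("Broadcast error:", e)
```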

In [134]:
# Q5 Code Task: Add a tensor of shape (3,1) to a tensor of shape (3,4).
# Tensor of shape (3,1)
tensor_a = torch.tensor([[1], [2], [3]])
# Tensor of shape (3,4)
tensor_b = torch.tensor([[10, 20, 30, 40],
                         [50, 60, 70, 80],
                         [90, 100, 110, 120]])
# Broadcasting addition
result = tensor_a + tensor_b

print("\nTensor a (3,1):\n", tensor_a)
print("\nTensor b (3,4):\n", tensor_b)
print("\nAfter broadcasting (a + b):\n", result)
print("\nShape of result:", result.shape)



Tensor a (3,1):
 tensor([[1],
        [2],
        [3]])

Tensor b (3,4):
 tensor([[ 10,  20,  30,  40],
        [ 50,  60,  70,  80],
        [ 90, 100, 110, 120]])

After broadcasting (a + b):
 tensor([[ 11,  21,  31,  41],
        [ 52,  62,  72,  82],
        [ 93, 103, 113, 123]])

Shape of result: torch.Size([3, 4])


**Q6.** What is the difference between `view()` and `reshape()` in PyTorch?

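- view() and reshape() both change the shape of a tensor without changing its elements

**view()**
- Returns a new tensor that shares the same underlying data (no copy) but has a different shape
- Requires the tensor's memory layout to be compatible with the new shape; in practice this usually means the tensor must be contiguous
- If the tensor is not contiguous, you need to call .contiguous() before using view()

**reshape()**
- Returns a tensor with the desired shape
- Returns a view when possible and a copy otherwise, so it also handles non-contiguous tensors
- More flexible than view(), but you cannot rely on the result sharing memory with the original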
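
A short sketch of where the two actually behave differently. A transposed tensor is used as the non-contiguous input; the variable names are illustrative.

```python
import torch

t = torch.arange(6).reshape(2, 3)   # contiguous tensor [[0, 1, 2], [3, 4, 5]]
t_T = t.t()                         # transpose: same storage, not contiguous
print("Contiguous:", t_T.is_contiguous())

# view() refuses to work on this non-contiguous layout
try:
    t_T.view(6)
    view_failed = False
except RuntimeError as e:
    view_failed = True
    print("view() error:", e)

flat = t_T.reshape(6)                      # works, here by making a copy
flat_via_view = t_T.contiguous().view(6)   # explicit alternative

# A view of a contiguous tensor shares memory with it
v = t.view(3, 2)
v[0, 0] = 100
print("t after writing through the view:\n", t)
print("flat is unaffected because it is a copy:", flat)
```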

In [135]:
# Q6 Code Task: Create a tensor of shape (2,3) and reshape it to (3,2).
# Create a tensor of shape (2,3)
tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
print("Original Tensor (2x3):\n", tensor)
print("Shape:", tensor.shape)

# Reshape to (3,2) using reshape()
reshaped_tensor = tensor.reshape(3, 2)
print("\nReshaped Tensor (3x2):\n", reshaped_tensor)
print("Shape:", reshaped_tensor.shape)

Original Tensor (2x3):
 tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])

Reshaped Tensor (3x2):
 tensor([[1, 2],
        [3, 4],
        [5, 6]])
Shape: torch.Size([3, 2])


**Q7.** How do you check if a tensor is allocated on **CPU or GPU**?

- In PyTorch, every tensor has a `device` attribute; `tensor.device` shows where the tensor is stored (for example `cpu` or `cuda:0`)
- `tensor.is_cuda` is `True` when the tensor is stored on a GPU
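
A common way to use this is a device-agnostic pattern: pick the device once and create or move tensors onto it. The sketch below runs on machines with or without a GPU; the names `device`, `t`, and `moved` are illustrative.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones(2, 2, device=device)     # create directly on the chosen device
moved = torch.zeros(2, 2).to(device)    # or move an existing tensor

result = t + moved                      # both operands live on the same device
print("Device:", result.device)
print("On GPU:", result.is_cuda)
```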

In [136]:
# Q7 Code Task: Create a tensor and move it to GPU (if available).
# Create a tensor on CPU
tensor = torch.tensor([1, 2, 3, 4, 5])
print("Original Tensor:", tensor)
print("Device:", tensor.device)

# Move tensor to GPU
if torch.cuda.is_available():
    tensor_gpu = tensor.to('cuda')
    print("\nTensor moved to GPU:", tensor_gpu)
    print("Device:", tensor_gpu.device)
else:
    print("\nGPU not available. Tensor remains on CPU.")

Original Tensor: tensor([1, 2, 3, 4, 5])
Device: cpu

GPU not available. Tensor remains on CPU.


**Q8.** Create an **identity matrix** of size 4x4 in PyTorch.

In [137]:
# Q8 Code Task
identity_matrix = torch.eye(4)

print("Identity Matrix (4x4):\n", identity_matrix)
print("Shape:", identity_matrix.shape)
print("Data Type:", identity_matrix.dtype)

Identity Matrix (4x4):
 tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])
Shape: torch.Size([4, 4])
Data Type: torch.float32


**Q9.** How do you find the maximum, minimum, and mean values of a tensor?

- tensor.max() returns the largest element in the tensor.
- tensor.min() returns the smallest element in the tensor.
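- tensor.mean() returns the average of all elements; it requires a floating-point (or complex) tensor, which is why the code below sets dtype=torch.float32.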
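
These reductions also accept a `dim` argument to reduce along one axis instead of over the whole tensor. A minimal sketch with an illustrative 2x3 tensor `m`:

```python
import torch

m = torch.tensor([[4., 7., 9.],
                  [2., 5., 1.]])

# With dim, max() returns both the values and their positions
row_max, row_argmax = m.max(dim=1)   # maximum of each row
col_mean = m.mean(dim=0)             # mean of each column

print("Row maxima:", row_max)
print("Positions of row maxima:", row_argmax)
print("Column means:", col_mean)
print("Overall max as a Python number:", m.max().item())
```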

In [138]:
# Q9 Code Task: Compute max, min, mean of tensor [4, 7, 9, 2, 5].
tensor = torch.tensor([4, 7, 9, 2, 5], dtype=torch.float32)

# Maximum value
max_value = tensor.max()
print("Maximum Value:", max_value)

# Minimum value
min_value = tensor.min()
print("Minimum Value:", min_value)

# Mean value
mean_value = tensor.mean()
print("Mean Value:", mean_value)

Maximum Value: tensor(9.)
Minimum Value: tensor(2.)
Mean Value: tensor(5.4000)


**Q10.** Explain slicing and indexing in tensors with an example.

- Slicing and indexing in PyTorch tensors allow you to access specific elements, rows, columns, or sub-tensors
- Indexing uses integer indices to access a specific element
- Slicing uses `start:end` ranges to select ranges of rows or columns

**Example:**

`tensor[row, col]` → Access a single element

`tensor[start:end, :]` → Slice rows

`tensor[:, start:end]` → Slice columns

In [139]:
# Q10 Code Task: Create a 3x3 tensor and extract the first row and last column.
tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
print("Original Tensor:\n", tensor)

# Extract the first row
first_row = tensor[0, :]
print("\nFirst Row:", first_row)

# Extract the last column
last_column = tensor[:, -1]
print("Last Column:", last_column)

Original Tensor:
 tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

First Row: tensor([1, 2, 3])
Last Column: tensor([3, 6, 9])


---
## Section 2: Autograd & Gradients

**Q11.** What is autograd in PyTorch? Why is it useful?

- Autograd is PyTorch’s automatic differentiation engine that powers neural network training
- It records operations in a computational graph and automatically computes the gradients of a scalar value (such as a loss) with respect to the tensors that have requires_grad=True

**Uses:**
- Enables automatic computation of derivatives for backpropagation
- Simplifies the training of neural networks by removing the need to manually compute gradients
- Works seamlessly with CPU and GPU tensors

In [140]:
# Q11 Code Task: Create a tensor `x` with requires_grad=True and compute gradient of y = x**2
x = torch.tensor([3.0], requires_grad=True)

# Define y = x^2
y = x ** 2

# Compute gradients
y.backward()

# Print the gradient
print("x:", x)
print("y:", y)
print("Gradient:", x.grad)

x: tensor([3.], requires_grad=True)
y: tensor([9.], grad_fn=<PowBackward0>)
Gradient: tensor([6.])


**Q12.** Explain the difference between `.backward()` and `.detach()`.

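- .backward() computes the gradients of a tensor with respect to the graph's leaf tensors and accumulates them in their .grad attributes, which is what optimization uses
- .detach() returns a tensor that shares the same data but is cut off from the computation graph, so it is treated as a constant in further computations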
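
A small sketch of what "treated as a constant" means for the resulting gradient. The function y = x * x^2 is an illustrative choice, not part of the original task.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Full graph: y = x * x^2 = x^3, so dy/dx = 3x^2 = 12 at x = 2
y = x * x ** 2
y.backward()
grad_full = x.grad.item()
print("Gradient with full graph:", grad_full)

x.grad.zero_()   # gradients accumulate, so reset before the next backward()

# Detached factor: x^2 is treated as the constant 4, so dy/dx = 4
y_detached = x * (x ** 2).detach()
y_detached.backward()
grad_detached = x.grad.item()
print("Gradient with detached factor:", grad_detached)
```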

In [141]:
# Q12 Code Task: Show how to stop gradient tracking for a tensor.
# Create a tensor with requires_grad=True
x = torch.tensor([3.0], requires_grad=True)
print("Original tensor:", x)
print("Requires grad:", x.requires_grad)

# Perform some operations with gradient tracking
y = x ** 2
print("y = x^2:", y)
print("y requires grad:", y.requires_grad)

# Detach the tensor to stop gradient tracking
y_detached = y.detach()
print("\nDetached tensor:", y_detached)
print("Detached tensor requires grad:", y_detached.requires_grad)

Original tensor: tensor([3.], requires_grad=True)
Requires grad: True
y = x^2: tensor([9.], grad_fn=<PowBackward0>)
y requires grad: True

Detached tensor: tensor([9.])
Detached tensor requires grad: False


**Q13.** Compute gradients for y = 3x^3 + 2x^2 + 5 at x=2 using autograd.

In [142]:
# Q13 Code Task
x = torch.tensor([2.0], requires_grad=True)

# Define the function y = 3x^3 + 2x^2 + 5
y = 3*x**3 + 2*x**2 + 5

# Compute gradient
y.backward()

# Print gradient
print("x:", x.item())
print("y:", y.item())
print("Gradient at x=2:", x.grad.item())

x: 2.0
y: 37.0
Gradient at x=2: 44.0
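
The printed values can be checked by hand with the power rule:

$$
\frac{dy}{dx} = 9x^2 + 4x, \qquad \left.\frac{dy}{dx}\right|_{x=2} = 9(4) + 4(2) = 44, \qquad y(2) = 3(8) + 2(4) + 5 = 37
$$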


**Q14.** What happens if you call `.backward()` on a tensor without `requires_grad=True`?

- If a tensor does not have requires_grad=True, PyTorch will not track operations on it
- Calling .backward() on a result computed only from such tensors raises a RuntimeError, because the result has no grad_fn and is not part of any computation graph

In [143]:
# Q14 Code Task: Demonstrate the error with an example.
x = torch.tensor([3.0])
print("Tensor x:", x)
print("Requires grad:", x.requires_grad)

# Define a function of x
y = x ** 2
print("y = x^2:", y)

# Attempt to compute gradient
try:
    y.backward()
except RuntimeError as e:
    print("\nError when calling backward():", e)

Tensor x: tensor([3.])
Requires grad: False
y = x^2: tensor([9.])

Error when calling backward(): element 0 of tensors does not require grad and does not have a grad_fn


**Q15.** Perform gradient descent on f(w) = (w-3)^2 for 10 iterations with learning rate 0.1.

In [144]:
# Q15 Code Task
# Initialize w with requires_grad=True
w = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1

# Gradient descent loop
for i in range(10):
    f = (w - 3) ** 2
    f.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
    w.grad.zero_()
    # Note: f was computed before this update, so the printed f(w) is the loss at the previous w
    print(f"Iteration {i+1}: w = {w.item()}, f(w) = {f.item()}")

Iteration 1: w = 0.6000000238418579, f(w) = 9.0
Iteration 2: w = 1.0800000429153442, f(w) = 5.760000228881836
Iteration 3: w = 1.4639999866485596, f(w) = 3.6863999366760254
Iteration 4: w = 1.7711999416351318, f(w) = 2.3592960834503174
Iteration 5: w = 2.0169599056243896, f(w) = 1.5099495649337769
Iteration 6: w = 2.2135679721832275, f(w) = 0.9663678407669067
Iteration 7: w = 2.370854377746582, f(w) = 0.6184753179550171
Iteration 8: w = 2.4966835975646973, f(w) = 0.39582422375679016
Iteration 9: w = 2.597346782684326, f(w) = 0.2533273994922638
Iteration 10: w = 2.677877426147461, f(w) = 0.16212961077690125


---
## Section 3: Building Neural Networks

**Q16.** What is `torch.nn.Module` and why is it useful?

- torch.nn.Module is the base class for all neural network models in PyTorch
- It provides a convenient way to define, organize, and manage layers and parameters in a model

In [145]:
# Q16 Code Task: Define a simple linear model y = Wx + b using torch.nn.Linear
import torch.nn as nn

# Define the model
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = LinearModel()
print(model)

x_sample = torch.tensor([[2.0]])
y_sample = model(x_sample)
print("Input:", x_sample)
print("Output:", y_sample)

LinearModel(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
Input: tensor([[2.]])
Output: tensor([[-0.9133]], grad_fn=<AddmmBackward0>)


**Q17.** Create a feedforward neural network with 2 input features, 1 hidden layer (size=4, ReLU), and 1 output.

In [146]:
# Q17 Code Task
# Define the feedforward neural network
class FeedforwardNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = self.fc2(x)
        return x

model = FeedforwardNN()
print(model)

x_sample = torch.tensor([[1.0, 2.0]])
y_sample = model(x_sample)
print("Input:", x_sample)
print("Output:", y_sample)

FeedforwardNN(
  (fc1): Linear(in_features=2, out_features=4, bias=True)
  (fc2): Linear(in_features=4, out_features=1, bias=True)
)
Input: tensor([[1., 2.]])
Output: tensor([[-0.1303]], grad_fn=<AddmmBackward0>)


**Q18.** Explain the role of activation functions. Implement ReLU and Sigmoid manually in PyTorch.

- Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns
- Without activation functions, a neural network would behave like a linear model, regardless of its depth

**Common activation functions**
- ReLU (Rectified Linear Unit): f(x) = max(0, x)
  - Pros: Simple, avoids vanishing gradients for positive inputs.
- Sigmoid: f(x) = 1 / (1 + exp(-x))
  - Pros: Maps output to the range (0, 1), useful for probabilities.
  - Cons: Can suffer from vanishing gradients for inputs of large magnitude (positive or negative).
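
The claim that a network without activation functions behaves like a linear model can be checked numerically. In the sketch below, two stacked `nn.Linear` layers with no activation in between are compared against a single linear map built from their weights. The layer sizes and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(3, 4)
fc2 = nn.Linear(4, 2)
x = torch.randn(5, 3)

# Two linear layers applied back to back, with no activation in between
stacked = fc2(fc1(x))

# Equivalent single linear map: W = W2 W1, b = W2 b1 + b2
W = fc2.weight @ fc1.weight              # (2, 4) @ (4, 3) -> (2, 3)
b = fc2.weight @ fc1.bias + fc2.bias     # (2, 4) @ (4,)   -> (2,)
collapsed = x @ W.T + b

print("Same output:", torch.allclose(stacked, collapsed, atol=1e-5))
```

Placing a non-linearity such as `torch.relu` between `fc1` and `fc2` breaks this equivalence in general, which is what lets deeper networks represent non-linear functions.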

In [147]:
# Q18 Code Task: Define functions relu(x) and sigmoid(x) using tensors.
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# ReLU implementation
def relu(x):
    return torch.maximum(x, torch.zeros_like(x))  # element-wise max(0, x)

# Sigmoid implementation
def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

# Test the functions
relu_result = relu(x)
sigmoid_result = sigmoid(x)

print("Input:", x)
print("ReLU output:", relu_result)
print("Sigmoid output:", sigmoid_result)

Input: tensor([-2., -1.,  0.,  1.,  2.])
ReLU output: tensor([0., 0., 0., 1., 2.])
Sigmoid output: tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808])


**Q19.** What is the difference between `model.parameters()` and `model.state_dict()`?

- model.parameters() returns an iterator over only the learnable parameters; this is what optimizers consume
- model.state_dict() returns a dictionary that maps names to all parameters and persistent buffers; this is what is used for saving and loading models
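
For a plain linear layer the two contain the same tensors, so the difference only becomes visible with a module that has buffers. A minimal sketch, assuming a small illustrative model with a `BatchNorm1d` layer (which stores running statistics as buffers):

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 3), nn.BatchNorm1d(3))

param_names = [name for name, _ in net.named_parameters()]
state_keys = list(net.state_dict().keys())

print("Parameters:", param_names)
print("state_dict keys:", state_keys)

# Entries that are saved with the model but are not trained by the optimizer
buffers_only = [k for k in state_keys if k not in param_names]
print("Buffers only:", buffers_only)
```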

In [148]:
# Q19 Code Task: Print the parameters of a small linear layer.
linear_layer = nn.Linear(2, 1)

# Print model parameters using model.parameters()
print("Using model.parameters():")
for param in linear_layer.parameters():
    print(param)

# Print model parameters using model.state_dict()
print("\nUsing model.state_dict():")
for key, value in linear_layer.state_dict().items():
    print(f"{key}: {value}")

Using model.parameters():
Parameter containing:
tensor([[0.4080, 0.6143]], requires_grad=True)
Parameter containing:
tensor([-0.0554], requires_grad=True)

Using model.state_dict():
weight: tensor([[0.4080, 0.6143]])
bias: tensor([-0.0554])


**Q20.** Implement forward pass of a 2-layer network without using nn.Module.

In [149]:
# Q20 Code Task
x = torch.tensor([[0.5, -1.5]])

# Initialize weights and biases
# Layer 1: 2 inputs -> 3 hidden neurons
W1 = torch.randn(2, 3, requires_grad=True)
b1 = torch.randn(3, requires_grad=True)

# Layer 2: 3 hidden neurons -> 1 output
W2 = torch.randn(3, 1, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)

# Forward pass
# Layer 1: Linear + ReLU
z1 = x @ W1 + b1
a1 = torch.relu(z1)

# Layer 2: Linear
z2 = a1 @ W2 + b2
output = z2

print("Input:", x)
print("Hidden activations (ReLU):", a1)
print("Network output:", output)

Input: tensor([[ 0.5000, -1.5000]])
Hidden activations (ReLU): tensor([[0.0000, 1.1677, 0.0000]], grad_fn=<ReluBackward0>)
Network output: tensor([[3.1350]], grad_fn=<AddBackward0>)


---
## Section 4: Training a Simple Model (Logic Gates)

**Q21.** What is the purpose of a loss function? Give two common examples.

- A loss function (or cost function) measures how well a neural network's predictions match the true target values
- It quantifies the error of the model, and training algorithms use it to update the model parameters via gradient descent

**Purpose**:
- Provides a metric for model performance during training
- Guides optimization by computing gradients for backpropagation
- Helps in comparing different models or architectures

**Examples:**
1. Mean Squared Error (MSE): used for regression tasks
2. Binary Cross-Entropy (BCE): used for binary classification tasks
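
To make the two losses concrete, the sketch below computes each one by hand from its formula and compares the result with the built-in `nn.MSELoss` and `nn.BCELoss`. The sample targets and predicted probabilities are made up for illustration.

```python
import torch
import torch.nn as nn

y_true = torch.tensor([1.0, 0.0, 1.0])
y_prob = torch.tensor([0.9, 0.2, 0.6])   # predicted probabilities in (0, 1)

# MSE: mean of squared differences
mse_manual = ((y_prob - y_true) ** 2).mean()

# BCE: mean of -[y*log(p) + (1-y)*log(1-p)]
bce_manual = -(y_true * torch.log(y_prob)
               + (1 - y_true) * torch.log(1 - y_prob)).mean()

mse_builtin = nn.MSELoss()(y_prob, y_true)
bce_builtin = nn.BCELoss()(y_prob, y_true)

print("MSE manual vs built-in:", mse_manual.item(), mse_builtin.item())
print("BCE manual vs built-in:", bce_manual.item(), bce_builtin.item())
```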

In [150]:
# Q21 Code Task: Use torch.nn.MSELoss to compute loss between y_true=[1.0, 2.0] and y_pred=[1.5, 2.5].
y_true = torch.tensor([1.0, 2.0])
y_pred = torch.tensor([1.5, 2.5])

# Define the MSE loss function
mse_loss = nn.MSELoss()

# Compute the loss
loss = mse_loss(y_pred, y_true)

# Print the results
print("y_true:", y_true)
print("y_pred:", y_pred)
print("MSE Loss:", loss.item())

y_true: tensor([1., 2.])
y_pred: tensor([1.5000, 2.5000])
MSE Loss: 0.25


**Q22.** What is the role of an optimizer in training neural networks?

- An optimizer in PyTorch is responsible for updating the model's learnable parameters (weights and biases) to minimize the loss function during training
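
For plain SGD without momentum or weight decay, the update is p ← p - lr * grad. The sketch below computes that update by hand and checks it against `optimizer.step()`. The tiny dataset and the seed are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)

x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[3.0], [5.0]])

loss = nn.MSELoss()(model(x), y)
optimizer.zero_grad()
loss.backward()

# Expected result of one plain SGD step, computed by hand
expected = [p.detach().clone() - 0.01 * p.grad for p in model.parameters()]

optimizer.step()

matches = [torch.allclose(p, e) for p, e in zip(model.parameters(), expected)]
print("Manual update matches optimizer.step():", matches)
```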

In [151]:
# Q22 Code Task: Define SGD optimizer for a linear model with learning rate=0.01.
import torch.optim as optim

# Define a simple linear model
model = nn.Linear(1, 1)  # y = Wx + b

# Define the SGD optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Print optimizer details
print(optimizer)

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.01
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)


**Q23.** Train a simple linear regression model to fit y = 2x + 1 for x in [1,2,3,4,5].

In [152]:
# Q23 Code Task
# Prepare the dataset
x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = torch.tensor([[3.0], [5.0], [7.0], [9.0], [11.0]])  # y = 2x + 1

# Define the model
model = nn.Linear(1, 1)

# Define loss function (MSE) and optimizer (SGD)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
num_epochs = 500
for epoch in range(num_epochs):
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 50 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

print("\nLearned weight:", model.weight.item())
print("Learned bias:", model.bias.item())

Epoch 50/500, Loss: 0.1062
Epoch 100/500, Loss: 0.0757
Epoch 150/500, Loss: 0.0539
Epoch 200/500, Loss: 0.0384
Epoch 250/500, Loss: 0.0274
Epoch 300/500, Loss: 0.0195
Epoch 350/500, Loss: 0.0139
Epoch 400/500, Loss: 0.0099
Epoch 450/500, Loss: 0.0071
Epoch 500/500, Loss: 0.0050

Learned weight: 2.045926332473755
Learned bias: 0.8341917395591736


**Q24.** Implement and train a neural network for the AND gate.

In [153]:
# Q24 Code Task
# Prepare the AND gate dataset
x_train = torch.tensor([[0,0], [0,1], [1,0], [1,1]], dtype=torch.float32)
y_train = torch.tensor([[0], [0], [0], [1]], dtype=torch.float32)

# Define a simple feedforward neural network
class ANDNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 2)
        self.fc2 = nn.Linear(2, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

# Instantiate model, define loss and optimizer
model = ANDNet()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

with torch.no_grad():
    predictions = model(x_train)
    predicted_classes = (predictions > 0.5).float()
    print("\nPredictions:\n", predictions)
    print("Predicted Classes:\n", predicted_classes)

Epoch 100/1000, Loss: 0.5643
Epoch 200/1000, Loss: 0.5318
Epoch 300/1000, Loss: 0.4658
Epoch 400/1000, Loss: 0.4037
Epoch 500/1000, Loss: 0.3718
Epoch 600/1000, Loss: 0.3597
Epoch 700/1000, Loss: 0.3548
Epoch 800/1000, Loss: 0.3523
Epoch 900/1000, Loss: 0.3510
Epoch 1000/1000, Loss: 0.3499

Predictions:
 tensor([[0.0074],
        [0.0055],
        [0.4919],
        [0.4919]])
Predicted Classes:
 tensor([[0.],
        [0.],
        [0.],
        [0.]])
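
Note that in the run shown above the input [1, 1] receives a probability of 0.4919 and is classified as 0, so this particular network did not learn the AND gate. One possible explanation (not verified here) is that the two hidden ReLU units stopped contributing useful gradients. Since AND is linearly separable, a hidden layer is not required: a single linear layer followed by a sigmoid is enough. The sketch below is one such alternative, not the original solution; the names `and_model`, `x_and`, `y_and`, the seed, the learning rate of 0.5, and the 2000 epochs are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

x_and = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_and = torch.tensor([[0], [0], [0], [1]], dtype=torch.float32)

# Single linear layer + sigmoid (logistic regression)
and_model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(and_model.parameters(), lr=0.5)

for epoch in range(2000):
    loss = criterion(and_model(x_and), y_and)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    and_probs = and_model(x_and)
    and_classes = (and_probs > 0.5).float()

print("Predicted classes:", and_classes.squeeze().tolist())
```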


**Q25.** Implement and train a neural network for the XOR gate (with hidden layer).

In [154]:
# Q25 Code Task
# Prepare the XOR gate dataset
x_train = torch.tensor([[0,0], [0,1], [1,0], [1,1]], dtype=torch.float32)
y_train = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

# Define a feedforward neural network with hidden layer
class XORNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

# Instantiate model, define loss and optimizer
model = XORNet()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop
num_epochs = 5000
for epoch in range(num_epochs):
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 500 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

with torch.no_grad():
    predictions = model(x_train)
    predicted_classes = (predictions > 0.5).float()
    print("\nPredictions:\n", predictions)
    print("Predicted Classes:\n", predicted_classes)

Epoch 500/5000, Loss: 0.3689
Epoch 1000/5000, Loss: 0.0730
Epoch 1500/5000, Loss: 0.0311
Epoch 2000/5000, Loss: 0.0187
Epoch 2500/5000, Loss: 0.0130
Epoch 3000/5000, Loss: 0.0099
Epoch 3500/5000, Loss: 0.0080
Epoch 4000/5000, Loss: 0.0066
Epoch 4500/5000, Loss: 0.0056
Epoch 5000/5000, Loss: 0.0049

Predictions:
 tensor([[0.0131],
        [0.9976],
        [0.9976],
        [0.0017]])
Predicted Classes:
 tensor([[0.],
        [1.],
        [1.],
        [0.]])
