# HW4 — Neural Network

Eric Hedgren


In [1]:
import numpy as np
import pandas as pd

## 1. Neural Network

### 1. For each problem below, state which dimensions ($d$, $d_H$, $d_o$) are determined by the problem, and state the value of each of those dimensions. Similarly, state whether the activation function $g_H(\cdot)$ or $g_o(\cdot)$ is determined by the problem, and if it is determined, state what it should be.

#### (a) One wants a neural network that takes a 32 × 32 RGB-format image and determines which alphanumeric letter (from ‘a’ through ‘z’ and ‘0’ through ‘9’) the image depicts.

$d = 32 \times 32 \times 3 = 3,072$ (32 for each dimension and 3 for RGB)

$d_H$ (design choice, not determined by the problem)

$d_o = 36$ (determine which letter (26) or number (10) it is)

$g_H(\cdot)$: not determined by the problem (design choice; ReLU is a common default)

$g_o(\cdot) =$ softmax (multi-class classification: one probability per character)

#### (b) Suppose that you are presented with a paragraph of 128 tokens given by a writer. One wants a network that determines whether the writer is happy or sad.

$d = 128 \times e$ (e is the size of the embedding vector)

$d_H$ (design choice, not determined by the problem)

$d_o = 1$ (output is sad or happy, aka binary)

$g_H(\cdot)$: not determined by the problem (design choice; ReLU is a common default)

$g_o(\cdot) =$ sigmoid (binary classification: probability that the writer is happy)

#### (c) You want a neural network that predicts the future GPS coordinate pair of a watch given 20 past GPS coordinate pairs.

$d = 20 \times 2 = 40$ (longitude and latitude in each pair with 20 pairs)

$d_H$ (design choice, not determined by the problem)

$d_o = 2$ (output is single GPS coordinate pair)

$g_H(\cdot)$: not determined by the problem (design choice; ReLU is a common default)

$g_o(\cdot) =$ identity (linear output for regression, i.e., no squashing activation)
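As a rough illustration of the output activations usually paired with these three task types (softmax for multi-class, sigmoid for binary, identity for regression), here is a minimal pure-Python sketch. The function names and the sample scores are made up for the example and are not part of the assignment.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (multi-class output)."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Squash one raw score into (0, 1) (binary output)."""
    return 1.0 / (1.0 + math.exp(-score))

def identity(values):
    """Leave raw outputs unchanged (regression output)."""
    return list(values)

# Hypothetical raw network outputs, for illustration only
char_probs = softmax([0.0] * 36)        # (a) 36 character classes
happy_prob = sigmoid(0.0)               # (b) single happy/sad unit
next_coord = identity([47.6, -122.3])   # (c) latitude, longitude
```

The output layer width matches $d_o$ in each case: 36 values, 1 value, and 2 values.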

### 2. Design a MLP neural network to solve the house price prediction problem (using the same data set we have been using for the first half of the semester). Take all 6 X features as the input and the house price as the output. Use no more than 3 hidden layers. Each hidden layer can have no more than 30 units. Use ReLU as the activation function.

#### (a) Show me your code (data loading & normalization, network structure, and training). Use the “NeuralNetwork.ipynb" as an example.

In [4]:
data = pd.read_csv("data/Real_estate.csv")


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("data/Real_estate.csv")

X = data[['X1 transaction date', 'X2 house age', 'X3 distance to the nearest MRT station', 
          'X4 number of convenience stores', 'X5 latitude', 'X6 longitude']]
y = data['Y house price of unit area']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# normalize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# define the neural network structure
model = nn.Sequential(
    nn.Linear(6, 30),
    nn.ReLU(),
    nn.Linear(30, 20),
    nn.ReLU(),
    nn.Linear(20, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)

# define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# training loop
num_epochs = 10000
for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        # forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

# evaluate on test set
model.eval()
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    test_loss = criterion(test_outputs, y_test_tensor)
    print(f'Test Loss: {test_loss.item():.4f}')


Epoch [10/10000], Loss: 84.2255
Epoch [20/10000], Loss: 49.1794
Epoch [30/10000], Loss: 60.2709
Epoch [40/10000], Loss: 105.7323
Epoch [50/10000], Loss: 44.8935
Epoch [60/10000], Loss: 19.8117
Epoch [70/10000], Loss: 21.9490
Epoch [80/10000], Loss: 18.7333
Epoch [90/10000], Loss: 67.4945
Epoch [100/10000], Loss: 49.8341
Epoch [110/10000], Loss: 468.9529
Epoch [120/10000], Loss: 85.5411
Epoch [130/10000], Loss: 22.3485
Epoch [140/10000], Loss: 43.0268
Epoch [150/10000], Loss: 95.8775
Epoch [160/10000], Loss: 19.1783
Epoch [170/10000], Loss: 48.6521
Epoch [180/10000], Loss: 135.5172
Epoch [190/10000], Loss: 18.7864
Epoch [200/10000], Loss: 14.3450
Epoch [210/10000], Loss: 35.4837
Epoch [220/10000], Loss: 43.4114
Epoch [230/10000], Loss: 72.8155
Epoch [240/10000], Loss: 12.9135
Epoch [250/10000], Loss: 23.7288
Epoch [260/10000], Loss: 17.8323
Epoch [270/10000], Loss: 116.4304
Epoch [280/10000], Loss: 26.4611
Epoch [290/10000], Loss: 28.5395
Epoch [300/10000], Loss: 20.3683
Epoch [310/1000

#### (b) How many hidden layers does your MLP model have? How many units does each hidden layer have?

3 hidden layers with 30, 20, and 20 units, respectively.

#### (c) What is your learning rate? What is your batch size? How many training epochs did you have?

Learning rate = 0.001

Batch size = 32

Number of epochs = 10,000

#### (d) How many parameters does your MLP model have? Why? Show me the calculation process.

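Input layer to first hidden layer:
$6 \times 30 + 30 = 210$ parameters

First to second hidden layer:
$30 \times 20 + 20 = 620$ parameters

Second to third hidden layer:
$20 \times 20 + 20 = 420$ parameters

Third hidden layer to output layer:
$20 \times 1 + 1 = 21$ parameters

Total: $210 + 620 + 420 + 21 = 1{,}271$ parameters. Each `nn.Linear` layer has one weight per input-output pair plus one bias per output unit.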
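The per-layer arithmetic can be double-checked with a short script. This is a minimal sketch; `count_mlp_parameters` is a helper name invented here, and the layer sizes are the ones from the `nn.Sequential` model above.

```python
def count_mlp_parameters(layer_sizes):
    """Sum weights (fan_in * fan_out) and biases (fan_out) over consecutive layers."""
    per_layer = [
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]
    return per_layer, sum(per_layer)

# 6 inputs -> 30 -> 20 -> 20 hidden units -> 1 output
per_layer, total = count_mlp_parameters([6, 30, 20, 20, 1])
print(per_layer, total)
```

With the trained PyTorch model in scope, `sum(p.numel() for p in model.parameters())` should report the same total.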



#### (e) Is your trained model better than, the same with, or worse than the Multiple Linear Regression solution you had from your previous homework submission? Why?

<span style="color: red">TODO</span>

### 3. Take the “MNIST.ipynb" as a start point:

#### (a) Execute the given code using torch and nn.Sequential model to train a neural network. What is your testing accuracy? (Just tell me a number, no need to show the code, this is for your own convenience to make sure you can run the code)

Accuracy of the model on the test images: 93.94%

#### (b) Modify the code to train the same model with the test data, and evaluate the accuracy using the training data. What is your accuracy? Show me your modifications (i.e., only the lines of programs that are different from my given code).

```
# Training loop
for epoch in range(10):  # train for 10 epochs
    for images, labels in test_loader:
        optimizer.zero_grad()
        output = model(images)
        loss = loss_fn(output, labels)
        loss.backward()
        optimizer.step()

# Testing loop
correct = 0
total = 0
with torch.no_grad():
    for images, labels in train_loader:
        output = model(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
```

Accuracy of the model on the training images (after training on the test data): 86.98%

#### (c) Consider the following modification of the data set: for every handwritten digit image, suppose the corresponding digit is $i \in \{0, \ldots, 9\}$; change its label ($y$) to $i \% 2$ (i.e., $\text{mod}(i, 2)$).

##### - Show me your code that modifies y_train and y_test to align with the data set modifications.

``` 
def read_images_labels(self, images_filepath, labels_filepath):        
        labels = []
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            labels = np.fromfile(file, dtype=np.uint8) 

        labels = labels % 2      
        
        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())        
        images = []
        for i in range(size):
            images.append([0] * rows * cols)
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            img = img.reshape(28, 28)
            images[i][:] = img            
        
        return images, labels
```

##### - Using the same MLP provided in the “MNIST.ipynb", what minimum change should you make to have the model work with the modified data set (i.e., what is your number of outputs)?

Change the number of outputs to 2 rather than 10.

```
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28*28, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 2)  # 2 classes: even (0) and odd (1)
)
```

##### - What are the one hot encoded outcomes of the labels in the modified data set?

Using the standard convention in which class $k$ sets position $k$ to 1:

Label 0 (even) becomes: $[1, 0]$

Label 1 (odd) becomes: $[0, 1]$
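A minimal sketch of index-based one-hot encoding for the parity labels follows. The `one_hot` helper is written here for illustration and assumes the usual convention that class $k$ maps to a 1 at position $k$.

```python
def one_hot(label, num_classes):
    """Return a list with 1 at index `label` and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1 if k == label else 0 for k in range(num_classes)]

digits = [0, 1, 2, 3, 8, 9]                  # a few original MNIST labels
parity_labels = [d % 2 for d in digits]      # 0 = even, 1 = odd
encoded = [one_hot(y, 2) for y in parity_labels]
```

Note that `nn.CrossEntropyLoss` in PyTorch accepts the integer class labels directly, so explicit one-hot vectors are not needed for training.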

#### (d) Modify your train data set such that there are only 10 images left with the label being 3, and 10 images left with the label being 9.

##### - Show me your code that makes the above modification.

```
def read_images_labels(self, images_filepath, labels_filepath):        
        labels = []
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            labels = np.fromfile(file, dtype=np.uint8) 

        labels = labels % 2
        
        # Relabel the last 20 training samples: 10 as 3, then the final 10 as 9
        if labels_filepath == self.training_labels_filepath:
            labels[-20:-10] = 3
            labels[-10:] = 9
        
        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())        
        images = []
        for i in range(size):
            images.append([0] * rows * cols)
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            img = img.reshape(28, 28)
            images[i][:] = img            
        
        return images, labels
```
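Another reading of the task is to subsample the training set so that classes 3 and 9 keep only 10 images each while all other classes stay intact. The sketch below follows that interpretation; `limit_classes`, its parameters, and the synthetic demo data are assumptions made for illustration, not part of the provided notebook.

```python
import numpy as np

def limit_classes(images, labels, limited=(3, 9), keep=10, seed=0):
    """Keep only `keep` random samples of each class in `limited`; keep all other samples."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    keep_mask = ~np.isin(labels, limited)        # every sample of the untouched classes
    for cls in limited:
        cls_idx = np.flatnonzero(labels == cls)  # positions of this class
        chosen = rng.choice(cls_idx, size=min(keep, cls_idx.size), replace=False)
        keep_mask[chosen] = True
    kept = np.flatnonzero(keep_mask)
    return [images[i] for i in kept], labels[kept]

# Synthetic stand-in data: 50 samples per digit, "images" are just integers here
demo_labels = np.repeat(np.arange(10), 50)
demo_images = list(range(demo_labels.size))
small_images, small_labels = limit_classes(demo_images, demo_labels)
```

Applied to the real data, the function would be called on `x_train` and `y_train` only, leaving the test set unchanged.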

##### - Using the same MLP provided in the “MNIST.ipynb", and train the model with the modified data set. Evaluate your model’s performance with the testing data set. Your model should be performing worse than the original model. Describe what changes you could make to improve the model’s performance given the modified data set? Undoing your above data set modifications cannot be a solution. Do not include your code, but you are welcome to use experiments or other analyses to help deriving your answer to this question.

<span style="color: red">TODO</span>

### 4. (Required for 574, optional for 474) Consider the following data set for scalar input x and scalar output y:

| i | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| $x_i$ | -1 | 1 | 3 | 5 |
| $y_i$ | 0 | 0 | 1 | 0 |

Design a 2-layer feed-forward neural network (one hidden layer and one output layer; each layer has a linear transform followed by an activation function defined below) with $d_H = 2$ hidden units and $d_o = 1$ output unit that produces $u_o(x_i) = y_i$ on all four presented data points. Use

$$g_H(z) = g_o(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$

What are the weights ($\omega_H$ and $\omega_o$) and biases ($b_H$ and $b_o$) used in each layer such that the neural network output $u_o(x_i) = y_i$ for all $i \in \{1, 2, 3, 4\}$?

One solution uses the first hidden unit to detect $x \geq 2$, the second to detect $x \leq 4$, and the output unit as a logical AND of the two, so that only $x_3 = 3$ produces a 1:

$\omega_H = [1, -1]^T$, $b_H = [-2, 4]^T$, $\omega_o = [1, 1]$, $b_o = -1.5$

In [22]:
# Step activation function
def step_function(z):
    return np.where(z >= 0, 1, 0)

# Define the forward pass of the network
def forward_pass(x, w_H, b_H, w_o, b_o):
    # Calculate hidden layer output
    h = step_function(np.dot(w_H, x) + b_H)
    # Calculate output layer result
    u_o = step_function(np.dot(w_o, h) + b_o)
    return u_o, h

# Input data points and target outputs
data = [
    (-1, 0),
    (1, 0),
    (3, 1),
    (5, 0)
]

# Convert inputs to array for computation
x_vals = np.array([x for x, _ in data]).reshape(-1, 1)
y_vals = np.array([y for _, y in data])

# Weights and biases chosen by hand
w_H = np.array([[1], [-1]])  # Weights for hidden layer (2x1)
b_H = np.array([-2, 4])      # Biases for hidden layer (2): h1 = 1 if x >= 2, h2 = 1 if x <= 4
w_o = np.array([1, 1])       # Weights for output layer (1x2)
b_o = -1.5                   # Bias for output layer (scalar): fires only when h1 = h2 = 1

# Forward pass and check outputs
for i, (x, target) in enumerate(data):
    x_input = np.array([x])  # Input as an array
    output, h = forward_pass(x_input, w_H, b_H, w_o, b_o)
    print(f"Input {i+1}: x = {x}, Hidden output: {h}, Network output: {output}, Target: {target}")


Input 1: x = -1, Hidden output: [0 1], Network output: 0, Target: 0
Input 2: x = 1, Hidden output: [0 1], Network output: 0, Target: 0
Input 3: x = 3, Hidden output: [1 1], Network output: 1, Target: 1
Input 4: x = 5, Hidden output: [1 0], Network output: 0, Target: 0


## 2. Convolutional Neural Network

### 1. Take the “MNIST-CNN.ipynb” and “MNIST-ResNet.ipynb” as your start point:

#### (a) Build a customized data set, $D$, with labels of only two digits (e.g., images with labels of only 1 and 3). Pick a data set size that aligns with your computational capability. Show me your code. Plot a subset of your customized data set.

```
def read_images_labels(self, images_filepath, labels_filepath):        
        labels = []
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            labels = array("B", file.read())        
        
        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())        
        images = []
        for i in range(size):
            images.append([0] * rows * cols)
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            img = img.reshape(28, 28)
            images[i][:] = img            
        
        filtered_images = []
        filtered_labels = []
        for img, label in zip(images, labels):
            if label in [1, 3]:
                filtered_images.append(img)
                filtered_labels.append(label)
        
        return filtered_images, filtered_labels
```


#### (b) Train a CNN model with $D$. Sweep the data set once (i.e., with 1 epoch) with your choice of batch size, optimizer, and other configurations. Show me your code.

```
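import struct
from array import array

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

class MnistDataloader(object):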
    def __init__(self, training_images_filepath,training_labels_filepath,
                 test_images_filepath, test_labels_filepath):
        self.training_images_filepath = training_images_filepath
        self.training_labels_filepath = training_labels_filepath
        self.test_images_filepath = test_images_filepath
        self.test_labels_filepath = test_labels_filepath
    
    def read_images_labels(self, images_filepath, labels_filepath):        
        labels = []
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            labels = array("B", file.read())        
        
        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())        
        images = []
        for i in range(size):
            images.append([0] * rows * cols)
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            img = img.reshape(28, 28)
            images[i][:] = img            
        
        filtered_images = []
        filtered_labels = []
        for img, label in zip(images, labels):
            if label in [1, 3]:
                filtered_images.append(img)
                filtered_labels.append(label)
        
        return filtered_images, filtered_labels

        # return images, labels
            
    def load_data(self):
        x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
        x_test, y_test = self.read_images_labels(self.test_images_filepath, self.test_labels_filepath)
        return (x_train, y_train),(x_test, y_test)  

# Set file paths based on added MNIST Datasets
training_images_filepath = 'data/mnist-dataset/train-images.idx3-ubyte'
training_labels_filepath = 'data/mnist-dataset/train-labels.idx1-ubyte'
test_images_filepath = 'data/mnist-dataset/t10k-images.idx3-ubyte'
test_labels_filepath = 'data/mnist-dataset/t10k-labels.idx1-ubyte'

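# Load the filtered MNIST data (only digits 1 and 3)
mnist_dataloader = MnistDataloader(training_images_filepath, training_labels_filepath,
                                   test_images_filepath, test_labels_filepath)
(x_train, y_train), (x_test, y_test) = mnist_dataloader.load_data()

# Convert data to PyTorch tensors (stack into one NumPy array first; converting a list of arrays is slow)
train_images_tensor = torch.tensor(np.array(x_train), dtype=torch.float) / 255.0  # Normalize
train_labels_tensor = torch.tensor(y_train, dtype=torch.long)
test_images_tensor = torch.tensor(np.array(x_test), dtype=torch.float) / 255.0  # Normalize
test_labels_tensor = torch.tensor(y_test, dtype=torch.long)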

# Create TensorDatasets
train_dataset = TensorDataset(train_images_tensor.unsqueeze(1), train_labels_tensor)  # Add channel dimension
test_dataset = TensorDataset(test_images_tensor.unsqueeze(1), test_labels_tensor)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# train_loader and test_loader are defined above


# Define a simple CNN
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.dropout = nn.Dropout()  # element-wise dropout for the flattened (N, 128) features
        self.fc1 = nn.Linear(1024, 128) # After flattening the conv layers, adjust the size accordingly
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 1024) # Flatten the tensor for the fully connected layer
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

if torch.backends.mps.is_available():
    print ("mps is available")
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
    
model = Net().to(device)


loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# Training loop
for epoch in range(1):  # train for 1 epoch
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

# Testing loop - Calculate accuracy
model.eval()  # disable dropout during evaluation
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        output = model(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Accuracy of the model on the test images: {accuracy * 100:.2f}%')
```
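The `1024` input size of `fc1` follows from how each convolution and pooling step shrinks the 28x28 image. The sketch below retraces that arithmetic; `conv_out` is a helper name chosen here, and it assumes no padding and that `F.max_pool2d(x, 2)` uses a stride equal to its kernel size (the PyTorch default).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                  # MNIST images are 28x28
size = conv_out(size, kernel=5)            # conv1: 28 -> 24
size = conv_out(size, kernel=2, stride=2)  # max pool: 24 -> 12
size = conv_out(size, kernel=5)            # conv2: 12 -> 8
size = conv_out(size, kernel=2, stride=2)  # max pool: 8 -> 4
flattened = 64 * size * size               # 64 channels out of conv2
print(size, flattened)
```

If the kernel sizes or the input resolution change, the first argument of `nn.Linear` in `fc1` and the `x.view(-1, 1024)` call have to change with them.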

#### (c) Using the same data loader and train-test split, train a ResNet model with $D$. Sweep the data set once (i.e., with 1 epoch) with your choice of batch size, optimizer, and other configurations. Show me your code.

```
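import struct
from array import array

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

class MnistDataloader(object):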
    def __init__(self, training_images_filepath,training_labels_filepath,
                 test_images_filepath, test_labels_filepath):
        self.training_images_filepath = training_images_filepath
        self.training_labels_filepath = training_labels_filepath
        self.test_images_filepath = test_images_filepath
        self.test_labels_filepath = test_labels_filepath
    
    def read_images_labels(self, images_filepath, labels_filepath):        
        labels = []
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            labels = array("B", file.read())        
        
        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())        
        images = []
        for i in range(size):
            images.append([0] * rows * cols)
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            img = img.reshape(28, 28)
            images[i][:] = img            
        
        filtered_images = []
        filtered_labels = []
        for img, label in zip(images, labels):
            if label in [1, 3]:
                filtered_images.append(img)
                filtered_labels.append(label)
        
        return filtered_images, filtered_labels

        # return images, labels
            
    def load_data(self):
        x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
        x_test, y_test = self.read_images_labels(self.test_images_filepath, self.test_labels_filepath)
        return (x_train, y_train),(x_test, y_test)  

# Set file paths based on added MNIST Datasets
training_images_filepath = 'data/mnist-dataset/train-images.idx3-ubyte'
training_labels_filepath = 'data/mnist-dataset/train-labels.idx1-ubyte'
test_images_filepath = 'data/mnist-dataset/t10k-images.idx3-ubyte'
test_labels_filepath = 'data/mnist-dataset/t10k-labels.idx1-ubyte'

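# Load the filtered MNIST data (only digits 1 and 3)
mnist_dataloader = MnistDataloader(training_images_filepath, training_labels_filepath,
                                   test_images_filepath, test_labels_filepath)
(x_train, y_train), (x_test, y_test) = mnist_dataloader.load_data()

# Convert data to PyTorch tensors (stack into one NumPy array first; converting a list of arrays is slow)
train_images_tensor = torch.tensor(np.array(x_train), dtype=torch.float) / 255.0  # Normalize
train_labels_tensor = torch.tensor(y_train, dtype=torch.long)
test_images_tensor = torch.tensor(np.array(x_test), dtype=torch.float) / 255.0  # Normalize
test_labels_tensor = torch.tensor(y_test, dtype=torch.long)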

# Create TensorDatasets
train_dataset = TensorDataset(train_images_tensor.unsqueeze(1), train_labels_tensor)  # Add channel dimension
test_dataset = TensorDataset(test_images_tensor.unsqueeze(1), test_labels_tensor)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class SmallResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(SmallResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.linear = nn.Linear(128*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = F.avg_pool2d(out, out.size()[3])
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
```

#### (d) Using the same data loader and train-test split, train a Feedforward Neural Network (NN) model with $D$. Sweep the data set once (i.e., with 1 epoch) with your choice of batch size, optimizer, and other configurations. Show me your code.

<span style="color: red">TODO</span>

### 2. Consider a CNN layer with the following characteristics:

   - Input volume size: 32x32x3 (where 32x32 is the spatial dimension of the input, and 3 is the number of input channels, e.g., RGB image)
   - Number of conv kernels: 10
   - Kernel size: 5×5
   - Stride: 1
   - Padding: 0

#### (a) How many parameters does the above model have?

Number of parameters per convolutional kernel:
$$5 \times 5 \times 3 = 75$$

Number of parameters for all kernels:
$$75 \times 10 = 750$$

Each kernel has a bias term, so the total number of parameters is:
$$750 + 10 = 760 \text{ parameters}$$
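The same count can be reproduced with a small helper. This is a sketch under the assumption of square kernels with one bias per kernel; `conv2d_parameter_count` is a name invented for the example.

```python
def conv2d_parameter_count(in_channels, out_channels, kernel_size, bias=True):
    """Weights: kernel_h * kernel_w * in_channels per kernel; plus one bias per kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels if bias else 0
    return weights + biases

total_params = conv2d_parameter_count(in_channels=3, out_channels=10, kernel_size=5)
weights_only = conv2d_parameter_count(in_channels=3, out_channels=10, kernel_size=5, bias=False)
```

The count does not depend on the 32x32 spatial size of the input, because the same kernels are reused at every position.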

#### (b) What is the minimum size of the image that still allows the above model to remain functional and compatible?

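$\mathbf{5 \times 5}$ (with 3 channels), because the kernel is $5 \times 5$ pixels and, with padding 0, the input must be at least as large as the kernel. A $5 \times 5$ input yields a $1 \times 1$ output per kernel.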
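The standard output-size formula, $\lfloor (n + 2p - k) / s \rfloor + 1$, makes the minimum explicit. The sketch below uses a hypothetical helper, `conv_output_size`, with the layer's kernel size, stride, and padding as defaults.

```python
def conv_output_size(input_size, kernel_size=5, stride=1, padding=0):
    """Return the output width/height, or raise if the kernel does not fit."""
    if input_size + 2 * padding < kernel_size:
        raise ValueError("input is smaller than the kernel")
    return (input_size + 2 * padding - kernel_size) // stride + 1

given_output = conv_output_size(32)     # the 32x32 input from the problem
smallest_output = conv_output_size(5)   # the smallest input that still fits
```

Any input smaller than 5 pixels in either spatial dimension leaves no valid position for the kernel, which the helper reports as an error.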

### 3. What is the key problem that ResNet aims to address in deep neural networks?

ResNet addresses the problem of training very deep neural networks by introducing residual connections, which help mitigate issues like the vanishing gradient problem and the degradation problem. These skip connections allow gradients to flow more easily through the network during backpropagation, making it easier to train deep models and preventing performance degradation as more layers are added.
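A toy calculation can make the gradient-flow argument concrete. The sketch below assumes a deliberately simplified model in which every layer is the scalar map $f(x) = w x$ with a small slope $w$; it is an illustration of the idea, not a description of how real ResNet layers behave.

```python
def gradient_through_stack(depth, local_slope, residual):
    """Backpropagated gradient through `depth` scalar layers f(x) = local_slope * x.

    Plain layer:    y = f(x)      -> dy/dx = local_slope
    Residual layer: y = x + f(x)  -> dy/dx = 1 + local_slope
    """
    per_layer = (1.0 + local_slope) if residual else local_slope
    return per_layer ** depth

plain_grad = gradient_through_stack(depth=20, local_slope=0.1, residual=False)
residual_grad = gradient_through_stack(depth=20, local_slope=0.1, residual=True)
print(f"plain: {plain_grad:.1e}, residual: {residual_grad:.2f}")
```

In the plain stack the gradient shrinks geometrically with depth, while the identity term in each residual layer keeps it from collapsing toward zero.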

### 4. What is the key problem that dropout aims to address in deep neural networks?

Dropout aims to address **overfitting**. During training it randomly zeroes a fraction of the neurons' activations on each forward pass, so the network cannot rely too heavily on any particular neuron. At inference time dropout is turned off.
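A minimal sketch of "inverted" dropout, the variant that rescales surviving activations during training so that nothing needs to change at inference, is shown below. The `dropout` function and its parameters are written for illustration in plain Python rather than taken from the course notebooks.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # dropout is a no-op at inference time
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * scale for a in activations]

hidden = [1.0] * 10
train_out = dropout(hidden, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(hidden, p=0.5, training=False)
```

In PyTorch the same switch is made by calling `model.train()` before training and `model.eval()` before evaluation.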