# DS 542 Notebook 6



## Download Data

The following cells download the UCI Abalone data set.
You should not need to modify any of these cells.

In [1]:
!wget https://archive.ics.uci.edu/static/public/1/abalone.zip

--2024-09-28 19:13:08--  https://archive.ics.uci.edu/static/public/1/abalone.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘abalone.zip’

abalone.zip             [ <=>                ]  54.06K  --.-KB/s    in 0.1s    

2024-09-28 19:13:08 (404 KB/s) - ‘abalone.zip’ saved [55357]



In [2]:
!unzip abalone.zip

Archive:  abalone.zip
  inflating: Index                   
  inflating: abalone.data            
  inflating: abalone.names           


In [3]:
!cat abalone.names

1. Title of Database: Abalone data

2. Sources:

   (a) Original owners of database:
	Marine Resources Division
	Marine Research Laboratories - Taroona
	Department of Primary Industry and Fisheries, Tasmania
	GPO Box 619F, Hobart, Tasmania 7001, Australia
	(contact: Warwick Nash +61 02 277277, wnash@dpi.tas.gov.au)

   (b) Donor of database:
	Sam Waugh (Sam.Waugh@cs.utas.edu.au)
	Department of Computer Science, University of Tasmania
	GPO Box 252C, Hobart, Tasmania 7001, Australia

   (c) Date received: December 1995


3. Past Usage:

   Sam Waugh (1995) "Extending and benchmarking Cascade-Correlation", PhD
   thesis, Computer Science Department, University of Tasmania.

   -- Test set performance (final 1044 examples, first 3133 used for training):
	24.86% Cascade-Correlation (no hidden nodes)
	26.25% Cascade-Correlation (5 hidden nodes)
	21.5%  C4.5
	 0.0%  Linear Discriminate Analysis
	 3.57% k=5 Nearest Neighbour
      (Problem encoded as a classification task)

   -- Data set samp

## Prepare Data

This section reads the data set and converts it to PyTorch tensors.
You probably do not need to modify this section.

In [1]:
import numpy as np
import torch

In [2]:
abalone_X = []
abalone_Y = []
with open('abalone.data') as abalone_file:
    for line in abalone_file:
        line = line.rstrip("\n")
        line = line.split(",")

        # drop initial sex column
        line = line[1:]

        # convert from strings to numbers
        line = [float(v) for v in line]

        abalone_X.append(line[:-1])
        abalone_Y.append(line[-1])

abalone_X = np.array(abalone_X)
abalone_Y = np.array(abalone_Y)
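As a small sanity check on the parsing loop above, the same steps can be applied to a single line. This is only a sketch: `sample_line` is an illustrative string in the comma-separated layout the loop expects (sex, seven measurements, ring count), not something read from the file here.

```python
# Illustrative line in the same layout as abalone.data (assumption:
# sex first, then seven numeric measurements, then the ring count).
sample_line = "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"

fields = sample_line.rstrip("\n").split(",")

# Drop the sex column and convert the remaining strings to floats.
values = [float(v) for v in fields[1:]]

features = values[:-1]  # the seven input columns
target = values[-1]     # the Rings column

print(len(features), target)
```

Every row contributes seven inputs and one target, which matches the `[4177, 7]` and `[4177]` shapes reported later in this section.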

In [3]:
# GPU configuration

def to_gpu(t):
    if torch.cuda.is_available():
        return t.cuda()
    return t

def to_numpy(t):
    return t.detach().cpu().numpy()

device = to_gpu(torch.ones(1,1)).device
device

device(type='cuda', index=0)
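The `to_gpu` helper above infers the device by moving a dummy tensor. An alternative sketch, not what this notebook uses, is to build a `torch.device` directly and pass it wherever a tensor or model is created. It falls back to the CPU when no GPU is present.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors can be created directly on the chosen device.
t = torch.ones(1, 1, device=device)
print(t.device.type)
```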

In [4]:
!nvidia-smi

Mon Sep 30 21:41:59 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla V100-SXM2-16GB           On  |   00000000:18:00.0 Off |                    0 |
| N/A   46C    P0             62W /  300W |     312MiB /  16384MiB |      0%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  |   00

In [5]:
# switch from NumPy arrays to Torch tensors

abalone_X = torch.tensor(abalone_X, device=device)
abalone_Y = torch.tensor(abalone_Y, device=device)

In [6]:
abalone_X

tensor([[0.4550, 0.3650, 0.0950,  ..., 0.2245, 0.1010, 0.1500],
        [0.3500, 0.2650, 0.0900,  ..., 0.0995, 0.0485, 0.0700],
        [0.5300, 0.4200, 0.1350,  ..., 0.2565, 0.1415, 0.2100],
        ...,
        [0.6000, 0.4750, 0.2050,  ..., 0.5255, 0.2875, 0.3080],
        [0.6250, 0.4850, 0.1500,  ..., 0.5310, 0.2610, 0.2960],
        [0.7100, 0.5550, 0.1950,  ..., 0.9455, 0.3765, 0.4950]],
       device='cuda:0', dtype=torch.float64)
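The output above shows `dtype=torch.float64`, which `torch.tensor` inherits from the NumPy array, while `torch.nn.Linear` creates `float32` parameters by default. The training cells later handle this by calling `.float()` inside the loop. A minimal sketch of the alternative, converting once at creation time, is shown below; `X_np` is a small stand-in array, not the real data.

```python
import numpy as np
import torch

# Stand-in for abalone_X: NumPy float arrays default to float64.
X_np = np.array([[0.455, 0.365], [0.350, 0.265]])

X64 = torch.tensor(X_np)                       # keeps float64
X32 = torch.tensor(X_np, dtype=torch.float32)  # converted once, up front

print(X64.dtype, X32.dtype)
```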

In [7]:
abalone_X.shape

torch.Size([4177, 7])

In [8]:
abalone_Y

tensor([15.,  7.,  9.,  ...,  9., 10., 12.], device='cuda:0',
       dtype=torch.float64)

In [9]:
abalone_Y.shape

torch.Size([4177])
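The `abalone.names` output earlier mentions a past evaluation that trained on the first 3133 examples and tested on the final 1044. This notebook trains on all 4177 rows, so the following is only a sketch of how such a split could be made by slicing; the random tensors are stand-ins for `abalone_X` and `abalone_Y`.

```python
import torch

n_total, n_train = 4177, 3133

# Stand-in tensors with the same shapes as abalone_X and abalone_Y.
X = torch.rand(n_total, 7)
Y = torch.rand(n_total)

# First 3133 rows for training, remaining rows for testing.
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

print(X_train.shape, X_test.shape)
```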

## Problem 1 - Linear Regression

Use PyTorch to implement linear regression of the abalone Rings column saved in `abalone_Y` using the columns in `abalone_X` as inputs.
Train your linear model using gradient descent as described in lecture.

You can freely use code from the [example training notebook shared in class](https://colab.research.google.com/drive/1xWo_rF0exGdewtaMZP5LNBKJolxUVt8u?usp=sharing).
This model should be much simpler than that example - in particular, you do not need (and should not have) the Fourier features, hidden layers, and activation functions.

Feel free to add extra cells as you feel appropriate.
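For reference, the gradient descent update for a linear model can be written out by hand before reaching for `torch.nn` and `torch.optim`. The sketch below uses synthetic data and hypothetical names (`true_w`, `lr`, `losses`); it is not the lecture's code, just one way to express the update `w <- w - lr * grad`.

```python
import torch

torch.manual_seed(0)

# Synthetic regression problem (assumption: 3 features, noiseless target).
X = torch.rand(100, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.3

# Parameters of the linear model, tracked by autograd.
w = torch.zeros(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.1
losses = []
for step in range(200):
    # Mean squared error of the current linear prediction.
    loss = ((X @ w + b - y) ** 2).mean()
    loss.backward()

    # Gradient descent step, done outside autograd tracking.
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

    losses.append(loss.item())

print(losses[0], losses[-1])
```

`torch.optim.SGD` without momentum performs the same parameter update, so the optimizer-based loop in the next cell is a more compact form of this idea.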

In [10]:
# BUILD AND TRAIN YOUR LINEAR MODEL HERE

# Linear Regression Model without hidden layers and activation functions
class LinearRegressionModel(torch.nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = torch.nn.Linear(input_size, output_size)
        
    def forward(self, x):
        return self.linear(x)

# Reshape abalone_Y to be a column vector
abalone_Y = abalone_Y.view(-1, 1)

# Input and Output size
input_size = abalone_X.shape[1] # 7 features
output_size = abalone_Y.shape[1] # 1 output (Rings)

# Initialization
model = LinearRegressionModel(input_size, output_size).to(device)

criterion = torch.nn.MSELoss() # Loss function (Mean Squared Error)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001) # Optimizer (SGD; each step uses the full data set, so this is plain gradient descent)

# Training
num_epochs = 1000

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(abalone_X.float())
    
    # Compute the loss
    loss = criterion(outputs, abalone_Y.float())
    
    # Zero gradients, perform a backward pass, and update weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [100/1000], Loss: 45.1000
Epoch [200/1000], Loss: 21.6714
Epoch [300/1000], Loss: 13.1931
Epoch [400/1000], Loss: 10.0962
Epoch [500/1000], Loss: 8.9381
Epoch [600/1000], Loss: 8.4799
Epoch [700/1000], Loss: 8.2760
Epoch [800/1000], Loss: 8.1658
Epoch [900/1000], Loss: 8.0914
Epoch [1000/1000], Loss: 8.0318


In [11]:
# PRINT YOUR LINEAR MODEL COEFFICIENTS HERE
print("Linear Model Coefficients:", model.linear.weight.data)

Linear Model Coefficients: tensor([[2.3402, 1.9282, 0.7325, 3.2595, 1.4453, 0.5807, 0.7253]],
       device='cuda:0')


In [12]:
# PRINT YOUR LINEAR BIAS HERE.
print("Linear Bias:", model.linear.bias.data)

Linear Bias: tensor([4.0298], device='cuda:0')
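The training log above shows the loss still decreasing at epoch 1000, so the printed coefficients may not be the least-squares optimum. Linear regression also has a direct least-squares solution that could serve as a reference point. The sketch below uses `torch.linalg.lstsq` on synthetic data with known coefficients; applying it to the abalone tensors is left as an assumption and is not part of the assignment.

```python
import torch

torch.manual_seed(0)

# Synthetic data with known weights and bias (assumption for illustration).
X = torch.rand(200, 3, dtype=torch.float64)
true_w = torch.tensor([[2.0], [-1.0], [0.5]], dtype=torch.float64)
y = X @ true_w + 4.0

# Append a column of ones so the last coefficient acts as the bias.
A = torch.cat([X, torch.ones(X.shape[0], 1, dtype=X.dtype)], dim=1)

# Solve min ||A @ s - y||^2 directly.
solution = torch.linalg.lstsq(A, y).solution
weights, bias = solution[:-1], solution[-1]

print(weights.flatten(), bias)
```

If the gradient descent coefficients differ noticeably from such a reference, more epochs or a larger learning rate may be worth trying.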


## Problem 2 - A Better Model

Use PyTorch to build and train a separate model with at least one hidden layer.
Then answer the questions below.

Again, you can freely use code from the [example training notebook shared in class](https://colab.research.google.com/drive/1xWo_rF0exGdewtaMZP5LNBKJolxUVt8u?usp=sharing).
You should not need the Fourier features for this particular model.

Feel free to add extra cells as you feel appropriate.

In [13]:
# BUILD AND TRAIN YOUR SECOND MODEL HERE

# Neural Network with three hidden layers
class NeuralNetworkModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        super().__init__()
        self.fc1 = torch.nn.Linear(input_size, hidden_size1)  # First hidden layer
        self.fc2 = torch.nn.Linear(hidden_size1, hidden_size2) # Second hidden layer
        self.fc3 = torch.nn.Linear(hidden_size2, hidden_size3) # Third hidden layer
        self.fc4 = torch.nn.Linear(hidden_size3, output_size)  # Output layer
        self.relu = torch.nn.ReLU()  # Activation function

    def forward(self, x):
        x = self.fc1(x)  # First hidden layer
        x = self.relu(x) # ReLU activation
        x = self.fc2(x)  # Second hidden layer
        x = self.relu(x) # ReLU activation
        x = self.fc3(x)  # Third hidden layer
        x = self.relu(x) # ReLU activation
        x = self.fc4(x)  # Output layer
        return x

# Reshape abalone_Y to be a column vector
abalone_Y = abalone_Y.view(-1, 1)

# Input, hidden, and output sizes
input_size = abalone_X.shape[1] # 7 features
hidden_size1 = 512
hidden_size2 = 128
hidden_size3 = 32
output_size = abalone_Y.shape[1] # 1 output (Rings)

# Initialization
model = NeuralNetworkModel(input_size, hidden_size1, hidden_size2, hidden_size3, output_size).to(device)

criterion = torch.nn.MSELoss() # Loss function (Mean Squared Error)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) # Optimizer (Adaptive Moment Estimation)

# Training
num_epochs = 1000

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(abalone_X.float())
    
    # Compute the loss
    loss = criterion(outputs, abalone_Y.float())
    
    # Zero gradients, perform a backward pass, and update weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [100/1000], Loss: 7.0799
Epoch [200/1000], Loss: 5.9763
Epoch [300/1000], Loss: 4.7600
Epoch [400/1000], Loss: 4.4921
Epoch [500/1000], Loss: 4.4184
Epoch [600/1000], Loss: 4.3759
Epoch [700/1000], Loss: 4.3434
Epoch [800/1000], Loss: 4.2757
Epoch [900/1000], Loss: 4.2269
Epoch [1000/1000], Loss: 4.1941


Describe the second model that you built.
Include the number and widths of the hidden layers, the activation functions, and anything else that you deem important.


**Model Description:**
1. **Input Layer**: 7 features.
2. **Hidden Layers**: Three hidden layers, each structured as follows:
   - **First hidden layer**: 512 neurons, followed by the ReLU activation function.
   - **Second hidden layer**: 128 neurons, followed by the ReLU activation function.
   - **Third hidden layer**: 32 neurons, followed by the ReLU activation function.
3. **Activation Function**: ReLU (Rectified Linear Unit).
4. **Output Layer**: 1 neuron, representing the predicted abalone Rings.
5. **Optimizer**: Adam (Adaptive Moment Estimation).
6. **Learning Rate**: 0.001.
7. **Loss Function**: MSE (Mean Squared Error).
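To give a sense of the size of the architecture described above, the layer widths can be rebuilt with `torch.nn.Sequential` and the trainable parameters counted. This is a sketch that mirrors the listed widths (7, 512, 128, 32, 1); `mlp` is a hypothetical name, not the `NeuralNetworkModel` instance trained earlier.

```python
import torch

# Same layer widths as the description: 7 -> 512 -> 128 -> 32 -> 1.
mlp = torch.nn.Sequential(
    torch.nn.Linear(7, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)

# Each Linear layer has (in * out) weights plus (out) biases.
total_params = sum(p.numel() for p in mlp.parameters())
print(total_params)  # 4096 + 65664 + 4128 + 33 = 73921
```

For comparison, the linear model from Problem 1 has 8 parameters (7 weights and 1 bias).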

What loss value did your second model achieve?

In [14]:
# PRINT YOUR LOSS HERE
print(f'Final Loss: {loss.item():.4f}')

Final Loss: 4.1941
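Because the loss is a mean squared error, its square root is expressed in rings, which can make the two models easier to compare. The short calculation below uses the final losses printed in this notebook (8.0318 for the linear model, 4.1941 for the second model). Both are training losses on the full data set, not held-out results.

```python
import math

# Final training MSE values copied from the outputs in this notebook.
linear_mse = 8.0318
mlp_mse = 4.1941

# Root mean squared error, in units of rings.
linear_rmse = math.sqrt(linear_mse)
mlp_rmse = math.sqrt(mlp_mse)

print(f"linear RMSE: {linear_rmse:.2f}, second model RMSE: {mlp_rmse:.2f}")
```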
