# Homework 4.1: Neural Network for Cook Islands Māori Parts of Speech
Dartmouth College, LING48/CS72, Winter 2024<br>
Kenneth Lai (Kenneth.Han.Lai@dartmouth.edu)

Code modified from:
https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/<br>
Code converted to PyTorch by Colin Kearns (Colin.R.Kearns.25@dartmouth.edu)

Before you start writing any code, you should first read the starter code and understand what it is doing. Then, in block 5, you will fill in the `__init__` and `forward` functions in the `Model` class, as well as the training loop:

- `__init__(self, input_dims)`: This function defines the structure of the neural network. Please build the network as follows:
  - A hidden Linear layer with 48 neurons
  - A ReLU activation function
  - A hidden Linear layer with 24 neurons
  - A ReLU activation function
  - An output Linear layer with 3 neurons (i.e., the number of output classes)

  Note that in PyTorch, the activation functions are separate from the layers, which just compute the “score” z = xW + b. Also note that there is no activation function after the output layer. Although one might expect a softmax function here, the `criterion` (loss function) `nn.CrossEntropyLoss()` already applies a (log-)softmax internally and expects the raw scores (logits), so we don’t need to include it here.


- `forward(self, x)`: This function defines the forward pass. Here, you will pass the input `x` through each of the layers and activation functions defined in `__init__`, in sequence.


- Training loop: Finally, in the training loop, you will do five things (in one line of code each):

  1. Run the forward pass: pass the `inputs` through the `pytorch_model`.
  2. Compute the loss (for the current mini-batch).
  3. Reset the gradient to zero (otherwise we will keep adding to the gradient already there).
  4. Run the backward pass.
  5. Update the parameters.
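
Two of the notes above (no softmax before `nn.CrossEntropyLoss()`, and resetting the gradient before each backward pass) can be checked on toy tensors. This is a minimal sketch using random inputs rather than the Cook Islands Māori data; the tensor sizes are arbitrary, and `F.nll_loss` and `F.log_softmax` are used only to reproduce the loss by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible random toy data

# (a) nn.CrossEntropyLoss takes raw scores (logits): it applies log-softmax
#     internally, then averages the negative log-probability of the true class.
logits = torch.randn(4, 3)            # 4 toy examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # true class index for each example
ce_loss = nn.CrossEntropyLoss()(logits, targets)
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)
same_loss = torch.allclose(ce_loss, manual_loss)
print(same_loss)  # True

# (b) Without zero_grad(), each backward() call adds to the stored gradients.
layer = nn.Linear(3, 3)
optimizer = torch.optim.Adam(layer.parameters())
criterion = nn.CrossEntropyLoss()
x = torch.randn(2, 3)
y = torch.tensor([1, 0])

criterion(layer(x), y).backward()
first_grad = layer.weight.grad.clone()
criterion(layer(x), y).backward()     # second backward, no reset in between
accumulated = torch.allclose(layer.weight.grad, 2 * first_grad)
print(accumulated)  # True: the two gradients were summed

optimizer.zero_grad()  # newer PyTorch sets grads to None; older versions fill zeros
cleared = layer.weight.grad is None or bool(torch.all(layer.weight.grad == 0))
print(cleared)  # True
```

This is why the training loop calls `optimizer.zero_grad()` once per mini-batch: otherwise each update would use the sum of all gradients computed so far.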

Run the program three times and record the training and test accuracy for each of the three runs. What is the average training accuracy? What is the average accuracy for the test set? How are the F1-scores behaving? (You can learn more about how the model is making its predictions by looking at the predictions for the first fifteen items.) Write these answers in a PDF file. Include screenshots of your results.
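
The starter code prints only the test-set accuracy. A small helper like the hypothetical `compute_accuracy` below (not part of the starter code) is one way to get the training accuracy as well; it is a sketch that assumes the features and labels are already tensors, as `X_train_tensor` and `y_train_tensor` are in block 5.

```python
import torch
import torch.nn as nn


def compute_accuracy(model: nn.Module, X: torch.Tensor, y: torch.Tensor) -> float:
    """Percentage of rows in X whose highest-scoring class equals the label in y."""
    was_training = model.training
    model.eval()                      # inference mode (the notebook's model has no dropout/batch norm)
    with torch.no_grad():
        predictions = torch.argmax(model(X), dim=1)
    model.train(was_training)         # restore the previous mode
    return (predictions == y).float().mean().item() * 100


# Toy check with hand-set weights: the predicted class is the larger input feature.
demo = nn.Linear(2, 2)
with torch.no_grad():
    demo.weight.copy_(torch.eye(2))
    demo.bias.zero_()
X_demo = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_demo = torch.tensor([0, 1, 1])      # the last label disagrees with the prediction
print(round(compute_accuracy(demo, X_demo, y_demo), 2))  # 66.67
```

In the notebook, `compute_accuracy(pytorch_model, X_train_tensor, y_train_tensor)` after the training loop would give the training accuracy; for the test set, `y_test` would first need converting with `torch.tensor(y_test, dtype=torch.long)`. Unlike the notebook's test accuracy, this value is not rounded to a whole percent.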

In [33]:
# load packages

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from torch.utils.data import DataLoader, TensorDataset

In [34]:
# Load the file and split it into the predictive features (nX) and the labels we are trying to predict (ny)

cimData = pd.read_csv("cim-3pos.csv")
nX = cimData.drop('tokenPOS', axis=1)
ny = cimData['tokenPOS']

numberValidPOS = 3  # noun, verb, preposition

In [35]:
# Encode the words into numbers and then split the data randomly into training and test sets.

encoderX = OneHotEncoder(sparse_output=True)
X = encoderX.fit_transform(nX)
encoderY = LabelEncoder()
y = encoderY.fit_transform(ny)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.10)

inputDims = X.shape[1]  # total number of one-hot features: one per unique word in each input column (1497 here)
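
The input size 1497 is the number of columns produced by `OneHotEncoder`. Because the encoder builds a separate vocabulary for each input column, a word that appears in two columns gets two features, so the count is the sum of unique values per column rather than the number of distinct words. The toy data frame below illustrates this; its rows and column names (`prev`, `word`, `next`) are hypothetical and not taken from `cim-3pos.csv`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical three-column rows; "te" and "i" each occur in two different columns.
toy = pd.DataFrame(
    [["te", "kai", "i"], ["i", "te", "mai"]],
    columns=["prev", "word", "next"],
)
enc = OneHotEncoder()
X_toy = enc.fit_transform(toy)

n_features = X_toy.shape[1]               # one-hot columns created by the encoder
per_column = int(toy.nunique().sum())     # 2 + 2 + 2 unique values per column
distinct_words = len(set(toy.values.ravel()))

print(n_features, per_column, distinct_words)  # 6 6 4
```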

In [36]:
# Here, you can study what the dataset looks like (as text and encoded)

print("--- Training data, predictive features, first row ---")
print(encoderX.inverse_transform(X_train[0:1]))
print("\n--- Training data, predicted result, first row ---")
print(encoderY.inverse_transform(y_train[0:1]))
print("\n--- Training data, predictive features, first row, one-hot encoded ---")
print(X_train[0:1])
print("\n--- Training data, predicted result, first row, label-encoded ---")
print(y_train[0:1])

--- Training data, predictive features, first row ---
[['tei' 'roto' 'te']]

--- Training data, predicted result, first row ---
['n']

--- Training data, predictive features, first row, one-hot encoded ---
  (0, 341)	1.0
  (0, 950)	1.0
  (0, 1440)	1.0

--- Training data, predicted result, first row, label-encoded ---
[0]
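
`LabelEncoder` gives each label a single integer code, assigned in sorted order of the label strings, which is why the labels appear as 0, 1 and 2. A small sketch with made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
codes = enc.fit_transform(["v", "n", "prep", "n", "v"])  # made-up example labels

print(enc.classes_.tolist())                # ['n', 'prep', 'v']  (sorted alphabetically)
print(codes.tolist())                       # [2, 0, 1, 0, 2]
print(enc.inverse_transform([1]).tolist())  # ['prep']
```

With this notebook's three tags the mapping is therefore 0 = n, 1 = prep, 2 = v, which matches the `Predicted: 2 / ['v']` lines printed in the evaluation cell.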


In [37]:
# Convert sparse array to dense tensor
X_train_tensor = torch.tensor(X_train.toarray(), dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)

# Define the PyTorch model
class Model(nn.Module):
    def __init__(self, input_dims):
        super().__init__()
        # Two hidden layers (48 and 24 neurons), each followed by ReLU, then an
        # output layer with one raw score (logit) per POS class.
        # No softmax here: nn.CrossEntropyLoss applies it internally.
        self.layers = nn.Sequential(
            nn.Linear(input_dims, 48),
            nn.ReLU(),
            nn.Linear(48, 24),
            nn.ReLU(),
            nn.Linear(24, 3)
        )

    def forward(self, x):
        # Pass x through each layer and activation in sequence; returns logits
        return self.layers(x)

# Initialize the PyTorch model (inputDims is set in the encoding cell above)
pytorch_model = Model(inputDims)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pytorch_model.parameters())

# Create DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

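# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        # Forward pass: compute the logits for this mini-batch
        outputs = pytorch_model(inputs)

        # Compute the (mean) cross-entropy loss for this mini-batch
        loss = criterion(outputs, labels)

        # Reset the gradients to zero (otherwise backward() adds to the old ones)
        optimizer.zero_grad()

        # Backward pass: compute the gradient of the loss for every parameter
        loss.backward()

        # Update the parameters using those gradients
        optimizer.step()

    # Note: this is the loss of the last mini-batch in the epoch, not an epoch average
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')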

Epoch [1/50], Loss: 0.7920
Epoch [2/50], Loss: 0.4093
Epoch [3/50], Loss: 0.1764
Epoch [4/50], Loss: 0.0462
Epoch [5/50], Loss: 0.0106
Epoch [6/50], Loss: 0.0722
Epoch [7/50], Loss: 0.0197
Epoch [8/50], Loss: 0.0088
Epoch [9/50], Loss: 0.0099
Epoch [10/50], Loss: 0.0152
Epoch [11/50], Loss: 0.0047
Epoch [12/50], Loss: 0.0006
Epoch [13/50], Loss: 0.0017
Epoch [14/50], Loss: 0.0015
Epoch [15/50], Loss: 0.0026
Epoch [16/50], Loss: 0.0130
Epoch [17/50], Loss: 0.0017
Epoch [18/50], Loss: 0.0012
Epoch [19/50], Loss: 0.0049
Epoch [20/50], Loss: 0.0760
Epoch [21/50], Loss: 0.0020
Epoch [22/50], Loss: 0.0003
Epoch [23/50], Loss: 0.0000
Epoch [24/50], Loss: 0.0001
Epoch [25/50], Loss: 0.0006
Epoch [26/50], Loss: 0.0003
Epoch [27/50], Loss: 0.0003
Epoch [28/50], Loss: 0.0007
Epoch [29/50], Loss: 0.0001
Epoch [30/50], Loss: 0.0002
Epoch [31/50], Loss: 0.0003
Epoch [32/50], Loss: 0.0000
Epoch [33/50], Loss: 0.0004
Epoch [34/50], Loss: 0.0001
Epoch [35/50], Loss: 0.0002
Epoch [36/50], Loss: 0.0046
E
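
Each value in the log above is the loss of the last mini-batch of its epoch, so it reflects a single batch and can jump around (for example, 0.0106 at epoch 5 followed by 0.0722 at epoch 6). A smoother signal is the average loss over the whole epoch, weighted by batch size. The sketch below uses a hypothetical helper `train_one_epoch` and random synthetic data, not the POS dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one epoch and return the example-weighted mean loss over all mini-batches."""
    model.train()
    total_loss, total_examples = 0.0, 0
    for inputs, labels in loader:
        outputs = model(inputs)
        loss = criterion(outputs, labels)   # mean loss over this mini-batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * inputs.size(0)  # undo the per-batch mean
        total_examples += inputs.size(0)
    return total_loss / total_examples


# Synthetic demo: 100 random examples whose label depends only on feature 0.
torch.manual_seed(0)
X = torch.randn(100, 5)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

losses = [train_one_epoch(model, loader, criterion, optimizer) for _ in range(20)]
print(f"epoch 1: {losses[0]:.4f}, epoch 20: {losses[-1]:.4f}")
```

The weighting by `inputs.size(0)` matters when the last mini-batch is smaller than `batch_size`, as it is here (100 = 3 × 32 + 4).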

In [38]:
# Convert test data to PyTorch tensor
X_test_tensor = torch.tensor(X_test.toarray(), dtype=torch.float32)

# Make class predictions with the model: argmax over the three output scores
print("===== Results from final layer of the first 5 items in test set =====")
with torch.no_grad():
    t = pytorch_model(X_test_tensor[:5])
    predictions = torch.argmax(t, dim=1).numpy()

for i in range(5):
    print(f"{i + 1}: {predictions[i]}")

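# Calculate accuracy by predicting each test item one at a time
total_correct = 0
predicted_labels = []
with torch.no_grad():
    for i in range(len(y_test)):
        # Index of the highest score = predicted class (0 = n, 1 = prep, 2 = v)
        temp_pred = torch.argmax(pytorch_model(X_test_tensor[i]), dim=0).item()
        # Safety check: fall back to class 0 if the index is not a valid POS
        # (cannot happen with a 3-neuron output layer)
        if temp_pred < numberValidPOS:
            predicted_labels.append(temp_pred)
        else:
            predicted_labels.append(0)

        if predicted_labels[i] == y_test[i]:
            total_correct += 1

# Percentage of correct predictions, rounded to the nearest whole number
accuracy = round((total_correct / len(y_test)) * 100, 0)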

print("\n===== Size of test set =====")
print(len(y_test))
print("\n===== Test data, predictive features, first 15 rows =====")
print(encoderX.inverse_transform(X_test[:15]))
print("\n===== Test data, predicted result, first 15 rows =====")
print(predicted_labels[:15])
print("\n===== Test data, expected result, first 15 rows =====")
print(encoderY.inverse_transform(y_test[:15]))
print("\n===== Accuracy of test set =====")
print(f"{accuracy}%")
print("\n===== Predictions =====")

for i in range(15):
    item_num = f"{i + 1:02d}"  # zero-padded item number: 01, 02, ..., 15
    is_it_correct = "*Correct!*" if predicted_labels[i] == y_test[i] else ""

    print(f"item {item_num}: Predicted: {predicted_labels[i]} / {encoderY.inverse_transform([predicted_labels[i]])}  \tActual value: {y_test[i]} / {encoderY.inverse_transform([y_test[i]])} \t{is_it_correct}")


===== Results from final layer of the first 5 items in test set =====
1: 0
2: 2
3: 0
4: 1
5: 0

===== Size of test set =====
258

===== Test data, predictive features, first 15 rows =====
[['kua' 'qox' 'ake']
 ['i' 'kai' 'i']
 ['au' 'ariki' 'rava']
 ['reira' 'i' 'a']
 ['te' 'qenua' 'ki']
 ['ko' 'qangaqanga' 'mua']
 ['tei' 'piri' 'mai']
 ['mea' 'tamaiti' 'varevare']
 ['ko' 'pae' 'ra']
 ['e' 'ngatax' 'ana']
 ['e' 'kai' 'ana']
 ['kax' 'pexti' '-']
 ['te' 'tipunu' '-']
 ['ex' 'tei' 'roto']
 ['te' 'maxtipi' '-']]

===== Test data, predicted result, first 15 rows =====
[0, 2, 0, 1, 0, 0, 2, 0, 0, 2, 2, 2, 0, 1, 2]

===== Test data, expected result, first 15 rows =====
['v' 'v' 'n' 'prep' 'n' 'n' 'v' 'n' 'n' 'v' 'n' 'v' 'n' 'prep' 'n']

===== Accuracy of test set =====
93.0%

===== Predictions =====
item 01: Predicted: 0 / ['n']  	Actual value: 2 / ['v'] 	
item 02: Predicted: 2 / ['v']  	Actual value: 2 / ['v'] 	*Correct!*
item 03: Predicted: 0 / ['n']  	Actual value: 0 / ['n'] 	*Correct!*
it

In [39]:
print(classification_report(y_test, predicted_labels))

              precision    recall  f1-score   support

           0       0.91      0.92      0.92       115
           1       0.99      1.00      0.99        79
           2       0.87      0.84      0.86        64

    accuracy                           0.93       258
   macro avg       0.92      0.92      0.92       258
weighted avg       0.93      0.93      0.93       258



In [40]:
print(confusion_matrix(y_test, predicted_labels))

[[106   1   8]
 [  0  79   0]
 [ 10   0  54]]
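
For the F1-score question, it can help to see where the numbers in `classification_report` come from. For each class, precision divides the diagonal count by its column sum (how often that class was predicted), recall divides it by its row sum (the class's support), and F1 is their harmonic mean, 2PR / (P + R). This sketch recomputes the report from the confusion matrix printed above; the class names in the comments assume the `LabelEncoder` ordering 0 = n, 1 = prep, 2 = v.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual class, columns = predicted class
cm = np.array([[106, 1, 8],     # n
               [0, 79, 0],      # prep
               [10, 0, 54]])    # v

true_pos = np.diag(cm)                  # correctly predicted items per class
precision = true_pos / cm.sum(axis=0)   # divide by column sums (times predicted)
recall = true_pos / cm.sum(axis=1)      # divide by row sums (support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = float(true_pos.sum() / cm.sum())

print(np.round(precision, 2).tolist())  # [0.91, 0.99, 0.87]
print(np.round(recall, 2).tolist())     # [0.92, 1.0, 0.84]
print(np.round(f1, 2).tolist())         # [0.92, 0.99, 0.86]
print(round(accuracy, 2))               # 0.93
```

In this run, 18 of the 19 errors are confusions between n and v (8 nouns predicted as verbs, 10 verbs predicted as nouns), which is why class 2 (v) has the lowest F1-score while prep is almost always classified correctly.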
