# **READ_ME**

# Model Configuration Guide

This guide explains how to configure the neural network model for different target prompts when training and evaluating the essay scoring model.

## Key Configuration Variables
There are four main variables that control which prompts are used at each stage of the pipeline:

1. `TEST_PROMPT_ID`: the target prompt, held out for final testing (section 1.4)
2. `TRAIN_PROMPT_RANGE`: list of prompts used in the grid search (section 1.4)
3. `VALIDATION_PROMPT_ID`: any training prompt other than the test prompt, used for validation during batch size optimization (section 1.6)
4. `ranges`: list of prompts used for training during batch size optimization (section 1.6)

### Default Configuration (target prompt = 2)

```python
TEST_PROMPT_ID = 2
TRAIN_PROMPT_RANGE = [1,3,4,5,6,7,8]
VALIDATION_PROMPT_ID = 8
ranges = [1,3,4,5,6,7]
```

## How to Modify

### Step 1: Choose Test Prompt
```python
TEST_PROMPT_ID = 2  #change to the prompt ID you want to target (test on)
```
This prompt will be completely held out until final evaluation.

### Step 2: Set Training Range
```python
TRAIN_PROMPT_RANGE = [1,3,4,5,6,7,8]  #all prompts except the test prompt
```
- Remove `TEST_PROMPT_ID` from this list.
- It is used during the grid search to find the optimal hyperparameters.
- Example: if `TEST_PROMPT_ID = 1`, then use `TRAIN_PROMPT_RANGE = [2,3,4,5,6,7,8]`.

### Step 3: Set Validation Prompt

```python
VALIDATION_PROMPT_ID = 8  #change to desired validation prompt
```
- Used during batch size optimization. That stage trains on 6 prompts and validates on 1, so the validation prompt is entered manually.
- Must be one of the prompts in `TRAIN_PROMPT_RANGE`.

### Step 4: Update Training Ranges
```python
ranges = [1,3,4,5,6,7]  #remove both test and validation prompts
```
- Remove both `TEST_PROMPT_ID` and `VALIDATION_PROMPT_ID`.
- Used as the training prompts during batch size optimization.
- Example: if `TEST_PROMPT_ID = 1` and `VALIDATION_PROMPT_ID = 8`, then use `ranges = [2,3,4,5,6,7]`.

## Example

### Configuration 1: Testing on Prompt 1
```python
TEST_PROMPT_ID = 1
TRAIN_PROMPT_RANGE = [2,3,4,5,6,7,8]
VALIDATION_PROMPT_ID = 8
ranges = [2,3,4,5,6,7]
```
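
Since `TRAIN_PROMPT_RANGE` and `ranges` follow mechanically from the two IDs you actually choose, they can also be derived instead of typed by hand. A minimal sketch, assuming the eight prompts numbered 1 to 8 in `SCORE_RANGES`; `make_prompt_config` and `ALL_PROMPTS` are hypothetical names, not part of the notebook.

```python
ALL_PROMPTS = list(range(1, 9))  # hypothetical: the eight prompts listed in SCORE_RANGES


def make_prompt_config(test_prompt_id, validation_prompt_id):
    """Derive TRAIN_PROMPT_RANGE and ranges from the test and validation prompt IDs."""
    if test_prompt_id not in ALL_PROMPTS or validation_prompt_id not in ALL_PROMPTS:
        raise ValueError("prompt IDs must be between 1 and 8")
    if validation_prompt_id == test_prompt_id:
        raise ValueError("the validation prompt must differ from the test prompt")
    # every prompt except the held-out test prompt (grid search and final training)
    train_prompt_range = [p for p in ALL_PROMPTS if p != test_prompt_id]
    # additionally drop the validation prompt (training set for batch size optimization)
    ranges = [p for p in train_prompt_range if p != validation_prompt_id]
    return {
        "TEST_PROMPT_ID": test_prompt_id,
        "TRAIN_PROMPT_RANGE": train_prompt_range,
        "VALIDATION_PROMPT_ID": validation_prompt_id,
        "ranges": ranges,
    }


config = make_prompt_config(1, 8)
print(config)
```

`make_prompt_config(1, 8)` reproduces Configuration 1, and `make_prompt_config(2, 8)` reproduces the default configuration.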


## Important Notes

1. Make sure that:
   - the test prompt does not appear in any other list
   - the validation prompt is in `TRAIN_PROMPT_RANGE` but not in `ranges`
   - all remaining prompts are in both `TRAIN_PROMPT_RANGE` and `ranges`

2. The notebook uses these splits for:
   - Grid search: leave-one-prompt-out cross-validation over `TRAIN_PROMPT_RANGE` (each fold validates on one prompt and trains on the others)
   - Batch size optimization: trains on `ranges` and validates on `VALIDATION_PROMPT_ID`
   - Final training: trains on all prompts except `TEST_PROMPT_ID`, then evaluates on `TEST_PROMPT_ID`

3. The final trained model is saved as a TorchScript file named `model-A-{TEST_PROMPT_ID}.pt`.
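
The rules in note 1 can be checked before a run starts. A sketch, assuming the four variables are plain Python lists and ints as in the configurations above; `find_config_problems` is a hypothetical helper that returns human-readable problems (an empty list means the split is valid).

```python
def find_config_problems(test_prompt_id, train_prompt_range, validation_prompt_id, ranges):
    """Return a list of rule violations for the prompt split; empty means valid."""
    problems = []
    if test_prompt_id in train_prompt_range or test_prompt_id in ranges:
        problems.append("the test prompt appears in a training list")
    if validation_prompt_id == test_prompt_id:
        problems.append("the validation prompt is the test prompt")
    if validation_prompt_id not in train_prompt_range:
        problems.append("the validation prompt is missing from TRAIN_PROMPT_RANGE")
    if validation_prompt_id in ranges:
        problems.append("the validation prompt must not be in ranges")
    # every remaining training prompt must also appear in ranges
    expected = sorted(set(train_prompt_range) - {validation_prompt_id, test_prompt_id})
    if sorted(ranges) != expected:
        problems.append(f"ranges should be {expected}")
    return problems


# default configuration (target prompt = 2): no problems expected
print(find_config_problems(2, [1, 3, 4, 5, 6, 7, 8], 8, [1, 3, 4, 5, 6, 7]))  # []
```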

# 1. Approach A: Holistic Scoring using FNN

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import numpy as np
from sklearn.model_selection import KFold
import pandas as pd
from sklearn.metrics import cohen_kappa_score
import time
from datetime import datetime

## 1.1. Starter Code

In [None]:
SCORE_RANGES = {
        1: {'sentence_fluency': (1, 6), 'word_choice': (1, 6), 'conventions': (1, 6),'organization': (1, 6),
            'content': (1, 6), 'holistic': (2, 12)},
        2: {'sentence_fluency': (1, 6), 'word_choice': (1, 6), 'conventions': (1, 6),'organization': (1, 6),
            'content': (1, 6), 'holistic': (1, 6)},
        3: {'narrativity': (0, 3), 'language': (0, 3), 'prompt_adherence': (0, 3), 'content': (0, 3), 'holistic': (0, 3)},
        4: {'narrativity': (0, 3), 'language': (0, 3), 'prompt_adherence': (0, 3), 'content': (0, 3), 'holistic': (0, 3)},
        5: {'narrativity': (0, 4), 'language': (0, 4), 'prompt_adherence': (0, 4), 'content': (0, 4), 'holistic': (0, 4)},
        6: {'narrativity': (0, 4), 'language': (0, 4), 'prompt_adherence': (0, 4), 'content': (0, 4), 'holistic': (0, 4)},
        7: {'conventions': (0, 6), 'organization': (0, 6), 'content': (0, 6),'holistic': (0, 30)},
        8: {'sentence_fluency': (2, 12), 'word_choice': (2, 12), 'conventions': (2, 12),'organization': (2, 12),
            'content': (2, 12), 'holistic': (0, 60)}}

def read_data(path):
    """
    Reads the CSV file and returns a dictionary of parallel NumPy arrays (index i refers to the same essay in every array).

    Parameters:
    - path (str): Path to the CSV file containing the essay data.

    Returns: data_dict (dict): A dictionary of parallel arrays with the following keys:
        - 'essay_ids': Unique identifiers for each essay
        - 'prompt_ids': Prompt ID of each essay
        - 'essay_text': Text contents of the essays
        - 'features': The 86 features extracted from each essay (every column from index 12 onward)
        - 'holistic': Holistic scores
        - 'content': Content scores
        - 'organization': Organization scores
        - 'word_choice': Word choice scores
        - 'sentence_fluency': Sentence fluency scores
        - 'conventions': Conventions scores
        - 'prompt_adherence': Prompt adherence scores
        - 'language': Language scores
        - 'narrativity': Narrativity scores
    """

    data = pd.read_csv(path)
    data_dict = {
        'essay_ids': data['essay_id'].values,
        'prompt_ids': data['prompt_id'].values,
        'essay_text': data['essay_text'].values,
        'features': data.iloc[:, 12:].values,
        'holistic':data['holistic'].values,
        'content':data['content'].values,
        'organization':data['organization'].values,
        'word_choice':data['word_choice'].values,
        'sentence_fluency':data['sentence_fluency'].values,
        'conventions':data['conventions'].values,
        'prompt_adherence':data['prompt_adherence'].values,
        'language':data['language'].values,
        'narrativity':data['narrativity'].values
    }

    return data_dict

def quadratic_weighted_kappa(y_true, y_pred):
    """
    Calculates the Quadratic Weighted Kappa (QWK) score between true labels and predictions using sklearn.

    Parameters:
    - y_true (array-like): The true labels
    - y_pred (array-like): The predicted labels

    Returns:
    - float: The QWK score between y_true and y_pred.
    """
    return cohen_kappa_score(y_true, np.round(y_pred), weights='quadratic')
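
To make the metric concrete, here is a from-scratch sketch of quadratic weighted kappa in NumPy, assuming integer scores on a known scale. `qwk_from_scratch` is a hypothetical helper, not part of the notebook, and unlike `quadratic_weighted_kappa` it clips rounded predictions into the scale. It should agree with scikit-learn's `cohen_kappa_score(..., weights='quadratic')` when no prediction falls outside the scale and every score on the scale occurs in `y_true` or `y_pred`, because scikit-learn derives its weights from the positions of the labels it observes rather than from the score values.

```python
import numpy as np


def qwk_from_scratch(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa for integer scores in [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    true_idx = np.asarray(y_true, dtype=int) - min_rating
    # round predictions (half to even, like np.round), then clip into the scale (sketch-only choice)
    pred = np.clip(np.rint(np.asarray(y_pred, dtype=float)), min_rating, max_rating)
    pred_idx = pred.astype(int) - min_rating

    # observed matrix: O[i, j] = number of essays with true score i and predicted score j
    observed = np.zeros((n, n))
    np.add.at(observed, (true_idx, pred_idx), 1)

    # expected matrix under independence, scaled to the same total count as O
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # quadratic disagreement weights W[i, j] = (i - j)^2 / (n - 1)^2
    i, j = np.indices((n, n))
    weights = (i - j) ** 2 / (n - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


perfect = qwk_from_scratch([0, 1, 2, 3], [0, 1, 2, 3], 0, 3)         # 1.0
opposite = qwk_from_scratch([0, 1, 2, 3], [3, 2, 1, 0], 0, 3)        # -1.0
close = qwk_from_scratch([0, 1, 2, 3], [0.4, 1.2, 2.6, 2.9], 0, 3)   # 11/12, about 0.9167
```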

## 1.2. Defining the NN class

In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_unit, num_layers):
        super(NeuralNetwork, self).__init__()

        layers = []
        # input layer and activation
        layers.append(nn.Linear(input_size, hidden_unit))
        layers.append(nn.ReLU())

        # additional hidden layers, so that there are num_layers hidden layers in total
        for _ in range(num_layers - 1):
            layers.append(nn.Linear(hidden_unit, hidden_unit))
            layers.append(nn.ReLU())

        layers.append(nn.Linear(hidden_unit, 1))  # output layer: one normalized score

        self.model = nn.Sequential(*layers)

        for module in self.modules():
            if isinstance(module, nn.Linear):  # initialize only the linear layers, not the activations
                nn.init.kaiming_normal_(module.weight)  # He initialization
                nn.init.zeros_(module.bias)  # set biases to 0

    def forward(self, x):
        return self.model(x)
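
Each of the `num_layers` hidden blocks is a `Linear` layer of width `hidden_unit` followed by a ReLU, and the output is a single unit, so model size grows linearly with k and quadratically with D. The hypothetical pure-Python helper below counts the trainable parameters for a configuration; for an actual `NeuralNetwork` instance the same number should come out of `sum(p.numel() for p in model.parameters())`.

```python
from itertools import product


def count_parameters(input_size, hidden_unit, num_layers):
    """Trainable weights and biases of NeuralNetwork(input_size, hidden_unit, num_layers)."""
    first = input_size * hidden_unit + hidden_unit                        # input -> first hidden layer
    extra = (num_layers - 1) * (hidden_unit * hidden_unit + hidden_unit)  # remaining hidden layers
    output = hidden_unit + 1                                              # last hidden layer -> score
    return first + extra + output


# parameter counts over the grid searched in section 1.5 (86 input features)
sizes = {(d, k): count_parameters(86, d, k) for d, k in product([8, 16, 32], [1, 2, 4, 8])}
print(min(sizes.values()), max(sizes.values()))  # 705 10209
```

For the configuration selected in the run below (D=32, k=2) this gives 3,873 parameters.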

## 1.3. Helper Normalization Functions

In [None]:

def normalize_scores(scores, prompt_id):
    #get the min and max holistic score for the specified prompt
    score_range = SCORE_RANGES[prompt_id]['holistic']
    return (scores - score_range[0]) / (score_range[1] - score_range[0]) #scale to [0, 1]

def denormalize_scores(norm_scores, prompt_id):
    #get the min and max holistic score for the specified prompt
    score_range = SCORE_RANGES[prompt_id]['holistic']
    return norm_scores * (score_range[1] - score_range[0]) + score_range[0] #map back to the original score scale
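
Normalizing maps every prompt's holistic scale onto [0, 1], so a single regression output can be trained across prompts with very different ranges (0 to 60 for prompt 8, 1 to 6 for prompt 2). A small NumPy sketch of the round trip, using hypothetical stand-ins `normalize` and `denormalize` with two ranges copied from `SCORE_RANGES`. The optional `clip` flag is not in the notebook; it shows that a prediction above 1.0 would otherwise map to a score outside the valid range.

```python
import numpy as np

# holistic (min, max) for two prompts, copied from SCORE_RANGES
HOLISTIC_RANGES = {2: (1, 6), 8: (0, 60)}


def normalize(scores, prompt_id):
    lo, hi = HOLISTIC_RANGES[prompt_id]
    return (np.asarray(scores, dtype=float) - lo) / (hi - lo)


def denormalize(norm_scores, prompt_id, clip=False):
    lo, hi = HOLISTIC_RANGES[prompt_id]
    norm_scores = np.asarray(norm_scores, dtype=float)
    if clip:  # sketch-only option: keep predictions inside the valid score range
        norm_scores = np.clip(norm_scores, 0.0, 1.0)
    return norm_scores * (hi - lo) + lo


scaled = normalize([0, 30, 60], 8)                # values 0.0, 0.5, 1.0
restored = denormalize(scaled, 8)                 # values 0.0, 30.0, 60.0
raw = denormalize([0.5, 1.2], 2)                  # values 3.5, 7.0 (7.0 is above the maximum of 6)
clipped = denormalize([0.5, 1.2], 2, clip=True)   # values 3.5, 6.0
```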

## 1.4. Data Initialization

Set the target (test) prompt and the training prompts.

In [None]:
TEST_PROMPT_ID = 2
TRAIN_PROMPT_RANGE = [1,3,4,5,6,7,8]

In [None]:
print(f"\n{'-'*50}")
print(f"starting training process at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"{'-'*50}\n")

print("reading the dataset\n")
data = read_data('dataset.csv')

non_test_filter = data['prompt_ids'] != TEST_PROMPT_ID #True for every essay that is not in the test prompt
prompt_ids = data['prompt_ids'][non_test_filter] #prompt IDs of the non-test essays
features = torch.FloatTensor(data['features'][non_test_filter]) #features of the non-test essays
scores = torch.FloatTensor(data['holistic'][non_test_filter]) #holistic scores of the non-test essays

print(f"feature shape: {features.shape}") #to see how many examples and features we have

normalized_scores = torch.zeros_like(scores)
for prompt_id in TRAIN_PROMPT_RANGE:
    prompt_filter = prompt_ids == prompt_id #true when equal to the current prompt id
    normalized_scores[prompt_filter] = normalize_scores(scores[prompt_filter], prompt_id) #normalize scores for that prompt id based on its range
normalized_scores = normalized_scores.reshape(-1, 1) #reshape to column


--------------------------------------------------
starting training process at 2024-11-19 14:37:40
--------------------------------------------------

reading the dataset

feature shape: torch.Size([11178, 86])
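
The masked loop in the cell above scales each prompt with its own range. The same result can be written as a per-essay lookup of (min, max) followed by one vectorized rescale. A toy NumPy sketch with made-up scores and a subset of `SCORE_RANGES` (the arrays are illustrative, not drawn from `dataset.csv`). One practical difference: with the lookup, an essay whose prompt is missing from the range table raises a `KeyError`, whereas in the loop form an essay whose prompt is not listed in `TRAIN_PROMPT_RANGE` simply keeps its initial value of 0.

```python
import numpy as np

# subset of the holistic ranges in SCORE_RANGES
HOLISTIC_RANGES = {1: (2, 12), 3: (0, 3), 8: (0, 60)}

# toy data: made-up essays from three prompts
prompt_ids = np.array([1, 1, 3, 8, 8])
scores = np.array([2.0, 12.0, 3.0, 15.0, 60.0])

# per-essay lookup of the prompt's (min, max), then one vectorized rescale
lows = np.array([HOLISTIC_RANGES[p][0] for p in prompt_ids], dtype=float)
highs = np.array([HOLISTIC_RANGES[p][1] for p in prompt_ids], dtype=float)
lookup_normalized = (scores - lows) / (highs - lows)   # values 0.0, 1.0, 1.0, 0.25, 1.0

# the masked-loop form used in the notebook, for comparison
loop_normalized = np.zeros_like(scores)
for p in [1, 3, 8]:
    mask = prompt_ids == p
    lo, hi = HOLISTIC_RANGES[p]
    loop_normalized[mask] = (scores[mask] - lo) / (hi - lo)
```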


## 1.5. Grid Search using k-fold cv
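
Because `KFold(n_splits=7, shuffle=False)` is applied to the list of seven training prompt IDs rather than to the essays, each fold validates on exactly one whole prompt and trains on the other six, in list order. That is leave-one-prompt-out cross-validation, which mirrors the cross-prompt setting of the final test. A plain-Python sketch of the same folds (`leave_one_prompt_out` is a hypothetical helper); scikit-learn's `LeaveOneGroupOut`, given `groups=prompt_ids`, produces an equivalent split directly at the essay level.

```python
def leave_one_prompt_out(prompts):
    """Yield (train_prompts, val_prompt), holding out each prompt once in list order."""
    for val_prompt in prompts:
        train_prompts = [p for p in prompts if p != val_prompt]
        yield train_prompts, val_prompt


folds = list(leave_one_prompt_out([1, 3, 4, 5, 6, 7, 8]))
for train_prompts, val_prompt in folds:
    print(f"validate on {val_prompt}, train on {train_prompts}")
```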

In [None]:
#7-fold cross-validation for hyperparameter tuning, one fold per training prompt
prompts = TRAIN_PROMPT_RANGE
k_fold = KFold(n_splits=7, shuffle=False) #splits the list of 7 prompts, so each fold holds out one whole prompt; shuffle=False keeps the folds in list order
#------------------------ Grid Search Params ---------------------------
hidden_units = [8, 16, 32]
num_layers_options = [1, 2, 4, 8]
learning_rates = [0.001, 0.01, 0.1]
batch_size = 4  #fixed during the grid search; tuned in section 1.6

total_combinations = len(hidden_units) * len(num_layers_options) * len(learning_rates)
print(f"\nStarting grid search with {total_combinations} combinations")
print(f"- hidden units per layer (D): {hidden_units}")
print(f"- num of layers (k): {num_layers_options}")
print(f"- learning rates: {learning_rates}")
print(f"- batch size (fixed): {batch_size}\n")
#-----------------------------------------------------------------------
best_qwk = -1 #initialize best qwk as lowest value (QWK is from [-1,1])
best_params = None
combination_count = 0
start_time = time.time()
#starting the grid search for combinations of hyperparams
for hidden_unit in hidden_units:
    for num_layers in num_layers_options:
        for lr in learning_rates:
            combination_count += 1 #keep count of the combination number
            print(f"\n--------------- testing combination {combination_count}/{total_combinations}--------------")
            print(f"hyperparams: units per layer D={hidden_unit}, num of layers k={num_layers}, lr={lr}")

            fold_qwks = [] #list to store QWK score for each model/fold
            fold_start_time = time.time()

            #start the 7 fold cv
            for fold, (train_prompt_idx, val_prompt_idx) in enumerate(k_fold.split(prompts)):
                #get training and validation prompts for current model
                train_prompts = [prompts[i] for i in train_prompt_idx]
                val_prompt = prompts[val_prompt_idx[0]]
                print(f"\nvalidation prompt {val_prompt}:", end=" ", flush=True)

                #create a filter for selecting training and validation data based on prompts
                train_filter = np.isin(prompt_ids, train_prompts)
                val_filter = (prompt_ids == val_prompt)

                #split
                X_train, X_val = features[train_filter], features[val_filter]
                y_train, y_val = normalized_scores[train_filter], normalized_scores[val_filter]

                #initialize model with hyper param combo
                model = NeuralNetwork(86, hidden_unit, num_layers)
                optimizer = torch.optim.AdamW(model.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=0.1)
                criterion = nn.MSELoss()

                #train for at most 15 epochs with early stopping
                best_val_loss = float('inf') #best average TRAINING loss so far (early stopping below monitors the training loss)
                epochs_no_improve = 0

                for epoch in range(15):  #max epochs = 15
                    model.train()
                    epoch_loss = 0
                    num_batches = 0

                    #update for batch
                    for i in range(0, len(X_train), batch_size):
                        batch_X = X_train[i:i + batch_size]
                        batch_y = y_train[i:i + batch_size]

                        optimizer.zero_grad() #reset gradients
                        outputs = model(batch_X) #model forward pass
                        loss = criterion(outputs, batch_y) #compute loss
                        loss.backward() #backpropagation
                        optimizer.step() #update model params

                        epoch_loss += loss.item() #accumulate loss per epoch
                        num_batches += 1
                    avg_epoch_loss = epoch_loss / num_batches #calculate average loss for current epoch

                    #------------------------ EARLY STOPPING --------------------------------------
                    #stop if the average training loss has not improved for 3 consecutive epochs
                    if avg_epoch_loss < best_val_loss:
                        best_val_loss = avg_epoch_loss
                        epochs_no_improve = 0
                    else:
                        epochs_no_improve += 1

                    if epochs_no_improve >= 3:  # Early stopping threshold
                        print(f"early stop at epoch {epoch + 1}", end=" ")
                        break

                    #---------------------------------------------------------------------------------

                #validate the fold
                model.eval()
                with torch.no_grad():
                    val_pred = model(X_val)
                    val_pred_denorm = denormalize_scores(val_pred.numpy(), val_prompt)
                    val_true_denorm = denormalize_scores(y_val.numpy(), val_prompt)
                    fold_qwk = quadratic_weighted_kappa(
                        val_true_denorm.round().flatten(),
                        val_pred_denorm.flatten()
                    )
                    fold_qwks.append(fold_qwk)
                    print(f"QWK: {fold_qwk:.4f}")

            #calculate average qwk across all folds of current combo
            avg_qwk = np.mean(fold_qwks)
            fold_time = time.time() - fold_start_time
            print(f"\nAverage QWK for this combination: {avg_qwk:.4f}")
            #------------------------ UPDATE BEST HYPERPARAM -------------------
            #update the best hyperparams if this combination performs better
            if avg_qwk > best_qwk:
                best_qwk = avg_qwk
                best_params = {
                    'hidden_unit': hidden_unit,
                    'num_layers': num_layers,
                    'learning_rate': lr
                }
                print(f"BEST COMBO SO FAR! =)")
            #--------------------------------------------------------------------

total_time = time.time() - start_time
print(f"\n{'-'*50}")
print("---------------- FINISHED GRID SEARCH ----------------")
print(f"time taken: {total_time/60:.2f} mins")
print(f"best hyperparam combo: {best_params} with best average QWK: {best_qwk:.4f}")
print(f"{'-'*50}\n")


Starting grid search with 36 combinations
- hidden units per layer (D): [8, 16, 32]
- num of layers (k): [1, 2, 4, 8]
- learning rates: [0.001, 0.01, 0.1]
- batch size (fixed): 4


--------------- testing combination 1/36--------------
hyperparams: units per layer D=8, num of layers k=1, lr=0.001

validation prompt 1: QWK: 0.4439

validation prompt 3: QWK: 0.4940

validation prompt 4: QWK: 0.5224

validation prompt 5: QWK: 0.3136

validation prompt 6: QWK: 0.1217

validation prompt 7: QWK: 0.5566

validation prompt 8: early stop at epoch 13 QWK: 0.3475

Average QWK for this combination: 0.3999
BEST COMBO SO FAR! =)

--------------- testing combination 2/36--------------
hyperparams: units per layer D=8, num of layers k=1, lr=0.01

validation prompt 1: early stop at epoch 9 QWK: 0.3329

validation prompt 3: early stop at epoch 6 QWK: 0.4185

validation prompt 4: early stop at epoch 6 QWK: 0.4609

validation prompt 5: early stop at epoch 5 QWK: 0.0860

validation prompt 6: early stop at

## 1.6. Batch Size Optimization
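
Both tuning stages use the same patience rule: stop once the monitored loss has failed to reach a new minimum for 3 consecutive epochs. The grid search in section 1.5 monitors the average training loss, while the cell below monitors the validation loss on `VALIDATION_PROMPT_ID`. A minimal sketch of that rule as a reusable class; `EarlyStopping` is a hypothetical name, and the loss sequence is invented for illustration.

```python
class EarlyStopping:
    """Patience-based early stopping on a loss that should decrease."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=3)
losses = [0.05, 0.03, 0.031, 0.032, 0.033, 0.02]  # invented per-epoch losses
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch  # 3 epochs in a row without beating 0.03
        break
```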

In [None]:
#--------------------------------------------------------------Batch size optimization-----------------------------------------------
# change depending on target and chosen validation prompt
ranges = [1,3,4,5,6,7]
VALIDATION_PROMPT_ID = 8

print(f"\n{'-'*50}")
print("Starting batch size optimization with the best parameters")
print(f"{'-'*50}\n")

# train on 6 of the 7 non-test prompts and validate on the remaining one
train_filter = (data['prompt_ids'] != TEST_PROMPT_ID) & (data['prompt_ids'] != VALIDATION_PROMPT_ID)
val_filter = data['prompt_ids'] == VALIDATION_PROMPT_ID

#preparing the training data
X_train = torch.FloatTensor(data['features'][train_filter])
y_train_orig = torch.FloatTensor(data['holistic'][train_filter])
prompt_ids_train = data['prompt_ids'][train_filter]

#preparing the validation data (VALIDATION_PROMPT_ID)
X_val = torch.FloatTensor(data['features'][val_filter])
y_val_orig = torch.FloatTensor(data['holistic'][val_filter])

print("Dataset after separation:")
print(f"Training: {len(X_train)} essays (prompts {ranges})")
print(f"Validation: {len(X_val)} essays (prompt {VALIDATION_PROMPT_ID})")
print(f"Held out: Prompt {TEST_PROMPT_ID} (for final testing)\n")

#normalizing the training scores separately for each prompt
y_train = torch.zeros_like(y_train_orig)
for prompt_id in ranges:
    prompt_filter = prompt_ids_train == prompt_id
    y_train[prompt_filter] = normalize_scores(y_train_orig[prompt_filter], prompt_id)
y_train = y_train.reshape(-1, 1)

#normalizing the validation scores
y_val = normalize_scores(y_val_orig, VALIDATION_PROMPT_ID)
y_val = y_val.reshape(-1, 1)

#batch sizes to test for optimization
batch_sizes = [4, 8, 16, 32]
print(f"Testing batch sizes: {batch_sizes}")

#Initializing the variables to track the best batch size
best_batch_qwk = -1
best_batch_size = None
batch_start_time = time.time()

# Training a model for each batch size and evaluating its performance
for batch_size in batch_sizes:
    print(f"\nTesting batch size: {batch_size}")

    # Initializing the model with the best parameters from the grid search
    model = NeuralNetwork(
        input_size=86,
        hidden_unit=best_params['hidden_unit'],
        num_layers=best_params['num_layers']
    )
    # Setting up the optimizer with AdamW
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=best_params['learning_rate'],
        betas=(0.9, 0.999),
        weight_decay=0.1
    )

    criterion = nn.MSELoss()


    best_val_loss = float('inf')
    epochs_without_improvement = 0
    # Training loop for at most 15 epochs or until early stopping
    for epoch in range(15):
        model.train()
        epoch_loss = 0
        num_batches = 0

        # Processing the training data in batches
        for i in range(0, len(X_train), batch_size):
            batch_X = X_train[i:i + batch_size]
            batch_y = y_train[i:i + batch_size]

            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()
            num_batches += 1

        avg_epoch_loss = epoch_loss / num_batches

        # Calculating the validation loss
        model.eval()
        with torch.no_grad():
            val_outputs = model(X_val)
            val_loss = criterion(val_outputs, y_val).item()  # plain float for comparison and printing

        # Early stopping: stop training if the validation loss does not improve for 3 consecutive epochs
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        if epochs_without_improvement >= 3:
            print(f"Early stopping at epoch {epoch + 1}")
            break

        if (epoch + 1) % 5 == 0:
            print(f"Epoch {epoch + 1}/15 - Train Loss: {avg_epoch_loss:.6f}, Val Loss: {val_loss:.6f}")

    # After training, evaluating this batch size with QWK on the validation prompt
    model.eval()
    with torch.no_grad():
        val_pred = model(X_val)
        val_pred_denorm = denormalize_scores(val_pred.numpy(), VALIDATION_PROMPT_ID)
        val_true_denorm = y_val_orig.numpy()

        # Calculating the QWK
        qwk = quadratic_weighted_kappa(
            val_true_denorm.round(),
            val_pred_denorm.flatten()
        )
        print(f"Validation QWK: {qwk:.4f}")

        if qwk > best_batch_qwk:
            best_batch_qwk = qwk
            best_batch_size = batch_size
            print("New best batch size found!!!!! :)")

batch_time = time.time() - batch_start_time
print(f"\n{'-'*50}")
print("Done with batch size optimization ")
print(f"Time taken: {batch_time:.2f} seconds ({batch_time/60:.2f} minutes)")
print(f"Best batch size: {best_batch_size}")
print(f"Best validation QWK: {best_batch_qwk:.4f}")
print(f"{'-'*50}\n")


--------------------------------------------------
Starting batch size optimization with the best parameters
--------------------------------------------------

Dataset after separation:
Training: 10455 essays (prompts [1, 3, 4, 5, 6, 7])
Validation: 723 essays (prompt 8)
Held out: Prompt 2 (for final testing)

Testing batch sizes: [4, 8, 16, 32]

Testing batch size: 4
Epoch 5/15 - Train Loss: 0.022807, Val Loss: 0.023835
Early stopping at epoch 7
Validation QWK: 0.4154
New best batch size found!!!!! :)

Testing batch size: 8
Early stopping at epoch 4
Validation QWK: 0.3813

Testing batch size: 16
Early stopping at epoch 4
Validation QWK: 0.5485
New best batch size found!!!!! :)

Testing batch size: 32
Epoch 5/15 - Train Loss: 0.022729, Val Loss: 0.014503
Early stopping at epoch 8
Validation QWK: 0.4987

--------------------------------------------------
Done with batch size optimization 
Time taken: 37.68 seconds (0.63 minutes)
Best batch size: 16
Best validation QWK: 0.5485
--------
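
The training loops slice `X_train` with `range(0, len(X_train), batch_size)`, so each epoch performs ceil(N / batch_size) optimizer steps and the last batch may be smaller than the rest. With the 10,455 training essays reported above, the arithmetic for the four candidate sizes is sketched below. With the learning rate held fixed, batch size 4 gets 8 times as many updates per epoch as batch size 32, so the candidates differ in how far training progresses within the 15-epoch cap, not only in gradient noise.

```python
import math

n_train = 10455  # training essays for target prompt 2 (prompts 1, 3-7), from the output above

steps_per_epoch = {}
for size in [4, 8, 16, 32]:
    steps = math.ceil(n_train / size)            # same count as len(range(0, n_train, size))
    last_batch = n_train - (steps - 1) * size    # essays left for the final, possibly smaller batch
    steps_per_epoch[size] = (steps, last_batch)
    print(f"batch_size={size}: {steps} steps per epoch, last batch has {last_batch} essays")
```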

## 1.7. Final model training and testing

In [None]:
#now we have finalized all our hyperparameters that we will use to train the model
final_params = best_params.copy()
final_params['batch_size'] = best_batch_size

#retrain on all the data except the held-out test prompt (TEST_PROMPT_ID)
final_train_filter = data['prompt_ids'] != TEST_PROMPT_ID
X_final_train = torch.FloatTensor(data['features'][final_train_filter])
y_final_train_orig = torch.FloatTensor(data['holistic'][final_train_filter])
prompt_ids_final = data['prompt_ids'][final_train_filter]

#normalize scores again the same way we did before for the final training
y_final_train = torch.zeros_like(y_final_train_orig)
for prompt_id in TRAIN_PROMPT_RANGE:
    prompt_filter = prompt_ids_final == prompt_id
    y_final_train[prompt_filter] = normalize_scores(y_final_train_orig[prompt_filter], prompt_id)
y_final_train = y_final_train.reshape(-1, 1)

#initialize the final model with the best hyperparameters
final_model = NeuralNetwork(
    input_size=86,
    hidden_unit=final_params['hidden_unit'],
    num_layers=final_params['num_layers']
)
#initialize the optimizer with AdamW and MSE loss function
optimizer = torch.optim.AdamW(
    final_model.parameters(),
    lr=final_params['learning_rate'],
    betas=(0.9, 0.999),
    weight_decay=0.1
)
criterion = nn.MSELoss()

#train the final model on the entire training set now
for epoch in range(15):
    final_model.train()
    epoch_loss = 0
    num_batches = 0
    for i in range(0, len(X_final_train), final_params['batch_size']):
        start, end = i, i + final_params['batch_size']
        batch_X = X_final_train[start:end] #get a batch of features then get the corresponding scores too
        batch_y = y_final_train[start:end]

        optimizer.zero_grad() #set the gradients to zero
        outputs = final_model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step() #updating the weights
        epoch_loss += loss.item()
        num_batches += 1

#test on heldout prompt
test_filter = data['prompt_ids'] == TEST_PROMPT_ID
X_test = torch.FloatTensor(data['features'][test_filter])
y_test = torch.FloatTensor(data['holistic'][test_filter])

final_model.eval()
with torch.no_grad():
    test_pred = final_model(X_test)
    test_pred_denorm = denormalize_scores(test_pred.numpy(), TEST_PROMPT_ID)

    test_qwk = quadratic_weighted_kappa(
        y_test.numpy().round(),
        test_pred_denorm.flatten()
    )

print(f"\n{'-'*50}")
print(f"Results on testing set (Prompt {TEST_PROMPT_ID}):")
print(f"final QWK: {test_qwk:.4f}")
print(f"Final parameters: {final_params}")
print(f"{'-'*50}\n")



--------------------------------------------------
Results on testing set (Prompt 2):
final QWK: 0.6176
Final parameters: {'hidden_unit': 32, 'num_layers': 2, 'learning_rate': 0.001, 'batch_size': 16}
--------------------------------------------------



In [None]:
scripted_model = torch.jit.script(final_model)
scripted_model.save(f"model-A-{TEST_PROMPT_ID}.pt")  # e.g. model-A-2.pt for the default configuration