# CODET5-SMALL MODEL TRAINING/EVALUATION

----------------------------
#### GIST OF CHANGES DONE IN THIS NOTEBOOK
----------------------------
<b>NOTE:</b>
  - The pretrained Model documentation is available at -> https://huggingface.co/Salesforce/codet5-small
  - Since This Notebook Needs Lots of Compute time/GPU, the notebook is limited to training the model. Checkpoints and necesary files are saved/copied to Google Drive. Detailed Model Evaluation/Inference/Assessment will be done in a different notebook

<br>

-   <b>PREPROCESS INPUT DATA</b>
    - Checks/Removes \<br\> characters from the code_snippet.
    - Prefixes the import_line to the code_snippet if applicable.
-   <b>TRAIN/EVALUATE CODET5-SMALL</b>
    -   Model is configured to run training loops. Early stopping is implemented if loss does not decrease after a certain number of epochs.
    -   Since Code Snippets and Code descriptions do not very lengthy. MAX_LENGTH for both were calculated and passed on the training separately to reduce training time.
    - Special tokens like \<extra_id_0\>, \</s\> sep, cls, pad were removed before evaluation.
    - A100 40GB GPU in Collab was used for the model training.
    - Below training configurations were used:
    <br><b>config</b> = {
    <br><b>&emsp;'comment_max_length'</b>: 120,
    <br><b>&emsp;'code_max_length'</b>: 120,
    <br><b>&emsp;'batch_size'</b>: 128,
    <br><b>&emsp;'epochs'    </b>: 20,
    <br><b>&emsp;'patience'  </b>: 5,
    <br><b>&emsp;'learning_rates'</b>: [0.000005, 0.00001, 0.00003, 0.0001, 0.0005]
    <br><b>&emsp;'accumulation_steps'    </b>: 1,
    <br>}

- <b>OBSERVATIONS</b>
  - Note that the codet5-small data was trained on ~ 60 Million parameters.
  - The training data at ~ 20K is comparatively small. Due to compute contraints and long running training loops, this size seemed like an optimal trade-off for the current work.
  - Patience was kept high because Bleu Score was seen to be improving beyond a decline in validation loss. Since checkpoints were created and metrics for each iteration were captured, the appropriate model to select will be evaluated later.
  - Higher gradient accumulation steps were taking very long time for model evaluation and hence was adjusted back to 1

----------------------------
<br>
<br>

In [None]:
# SET RETRAIN FLAG

RETRAIN_FLAG = 'Y'

GOOGLE_DRIVE_DIR = '/content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL'

In [None]:
# DISPLAY FUNCTIONS
def fn_display_header(msg):
  print('-' * 80)
  print(' ' * 10, msg)
  print('-' * 80)

def fn_display_message(msg):
  print(msg)

<br>

# Import/Install Necessary libraries

In [None]:
# INSTALL NECESSARY PACKAGES

RE_INSTALL_FLAG = 'Y'

if RE_INSTALL_FLAG == 'Y':
  import sys

  fn_display_header("Installing transformers[torch]")
  !pip install transformers[torch]

  fn_display_header("Installing torchmetrics")
  !pip install torchmetrics

  fn_display_header("Installing pytorch_lightning")
  !pip install pytorch_lightning

  fn_display_header("Installing codebleu")
  !pip install codebleu

  fn_display_header("Installing tree_sitter-python")
  !pip install tree_sitter-python


--------------------------------------------------------------------------------
           Installing transformers[torch]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
           Installing torchmetrics
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
           Installing pytorch_lightning
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
           Installing codebleu
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
           Installing tree_sitter-python
--------------------------------------------------------------------------------


In [None]:
# IMPORT NECESSARY PACKAGES
import re
import time
import torch
import shutil
import logging
import numpy as np
import io, os, math
import pandas as pd
from tqdm import tqdm
from datetime import datetime
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
fn_display_header('Print COLAB GPU Info')
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  fn_display_message('Not connected to a GPU')
else:
  print(gpu_info)

from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
fn_display_message('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  fn_display_message('Not using a high-RAM runtime')
else:
  fn_display_message('You are using a high-RAM runtime!')

--------------------------------------------------------------------------------
           Print COLAB GPU Info
--------------------------------------------------------------------------------
Sat Jun  8 18:16:20 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0              47W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |           

<br>

# READ/PRE-PROCESS INPUT DATA

In [None]:
# READ WEB SCRAPED/AUGMENTED DATA INTO DATAFRAME AND DISPLAY DETAILS

df_input = pd.read_csv('ETL_P4_data_augmentation_v1_20K.csv').drop('Unnamed: 0', axis=1)

fn_display_header('Display COLUMN/COUNT Details: df_input')
df_input.info()

fn_display_header('Display rows: df_input')
df_input.head().style.set_properties(**{'text-align': 'left'})

--------------------------------------------------------------------------------
           Display COLUMN/COUNT Details: df_input
--------------------------------------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19972 entries, 0 to 19971
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   code_description  19972 non-null  object
 1   code_snippet      19972 non-null  object
 2   import_line       1286 non-null   object
 3   Category          19972 non-null  object
 4   function          19972 non-null  object
dtypes: object(5)
memory usage: 780.3+ KB
--------------------------------------------------------------------------------
           Display rows: df_input
--------------------------------------------------------------------------------


Unnamed: 0,code_description,code_snippet,import_line,Category,function
0,Pyspark code to overwrite a DataFrame into a CSV file.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""csv"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.csv
1,Pyspark code to Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon'.The data is read from file or directory MY_DIR.,"df = spark.read.csv(MY_DIR, schema=df.schema, nullValue=""Hyukjin Kwon"")",,Input/Output,pyspark.sql.DataFrameReader.csv
2,Pyspark code to overwrite a DataFrame into a JSON file.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""json"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.format
3,Pyspark code which Specifies the input data source format.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""json"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.format
4,Pyspark code to Read the JSON file as a DataFrame.The data is read from file or directory MY_DIR.,df = spark.read.format('json').load(MY_DIR),,Input/Output,pyspark.sql.DataFrameReader.format


In [None]:
# PREPROCESS INPUT DATA

# prefix import_line to code_snippet
df_input['code_snippet'] = df_input.apply(lambda x: x['import_line'] + '\n' + x['code_snippet'] if not pd.isna(x['import_line']) else x['code_snippet'], axis =1)
new_line_char_count = df_input[df_input['code_snippet'].str.contains('\n')].shape[0]
fn_display_message(f" --> Count of rows having newline char in code_snippet: {new_line_char_count}")

# Check if code_snippet contains <br> tags
fn_display_message(f" --> Count of rows having <br> char in code_snippet: {df_input[df_input['code_snippet'].str.contains('<br>')].shape[0]}")

fn_display_header('Display rows: df_input')
df_input.head().style.set_properties(**{'text-align': 'left'})

 --> Count of rows having newline char in code_snippet: 1286
 --> Count of rows having <br> char in code_snippet: 0
--------------------------------------------------------------------------------
           Display rows: df_input
--------------------------------------------------------------------------------


Unnamed: 0,code_description,code_snippet,import_line,Category,function
0,Pyspark code to overwrite a DataFrame into a CSV file.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""csv"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.csv
1,Pyspark code to Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon'.The data is read from file or directory MY_DIR.,"df = spark.read.csv(MY_DIR, schema=df.schema, nullValue=""Hyukjin Kwon"")",,Input/Output,pyspark.sql.DataFrameReader.csv
2,Pyspark code to overwrite a DataFrame into a JSON file.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""json"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.format
3,Pyspark code which Specifies the input data source format.The data is overwritten to file or directory MY_DIR .,"df.write.mode(""overwrite"").format(""json"").save(MY_DIR)",,Input/Output,pyspark.sql.DataFrameReader.format
4,Pyspark code to Read the JSON file as a DataFrame.The data is read from file or directory MY_DIR.,df = spark.read.format('json').load(MY_DIR),,Input/Output,pyspark.sql.DataFrameReader.format


<br>

## PRINT MODEL INFO

In [None]:
from transformers import T5ForConditionalGeneration, RobertaTokenizer
import json

# Load the model and tokenizer
model_name = "Salesforce/codet5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = RobertaTokenizer.from_pretrained(model_name)

fn_display_header("Model Information")

print(f"Model: {model_name}")
print(f"Architecture: {model.config.architectures[0]}")
print(f"Number of parameters: {model.num_parameters():,}")
print(f"Vocabulary Size: {len(tokenizer)}")

fn_display_header("Model Config")
print(json.dumps(model.config.to_dict(), indent=4))


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


--------------------------------------------------------------------------------
           Model Information
--------------------------------------------------------------------------------
Model: Salesforce/codet5-small
Architecture: T5ForConditionalGeneration
Number of parameters: 60,492,288
Vocabulary Size: 32100
--------------------------------------------------------------------------------
           Model Config
--------------------------------------------------------------------------------
{
    "vocab_size": 32100,
    "d_model": 512,
    "d_kv": 64,
    "d_ff": 2048,
    "num_layers": 6,
    "num_decoder_layers": 6,
    "num_heads": 8,
    "relative_attention_num_buckets": 32,
    "relative_attention_max_distance": 128,
    "dropout_rate": 0.1,
    "classifier_dropout": 0.0,
    "layer_norm_epsilon": 1e-06,
    "initializer_factor": 1.0,
    "feed_forward_proj": "relu",
    "use_cache": true,
    "dense_act_fn": "relu",
    "is_gated_act": false,
    "return_dict": true,
  

<br>

## FIND MAX_LENGTH for CODET5 Model

In [None]:
# DETERMINE MAX_LENGTH
# Since size of code_description and code_snippets are less, an optimum MAX_LENGTH can help reduce training time

from transformers import RobertaTokenizer

df_test = df_input.copy()

# Initialize the CODET5 tokenizer
tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')

# Tokenize the 'code_description' column
comment_tokenized_texts = df_test['code_description'].apply(lambda x: tokenizer.encode(x, add_special_tokens=True))
code_tokenized_texts    = df_test['code_snippet'].apply(lambda x: tokenizer.encode(x, add_special_tokens=True))

# Find the maximum token length
comment_max_length = max(len(tokens) for tokens in comment_tokenized_texts)
code_max_length    = max(len(tokens) for tokens in code_tokenized_texts)

fn_display_header('Determine MAX_LENGHT for CODET5 Model Training ')
fn_display_message(f" --> Max Length of code_descriptions : {comment_max_length}")
fn_display_message(f" --> Max Length of code_snippet : {code_max_length}")


--------------------------------------------------------------------------------
           Determine MAX_LENGHT for CODET5 Model Training 
--------------------------------------------------------------------------------
 --> Max Length of code_descriptions : 117
 --> Max Length of code_snippet : 105


<br>

# CODET5-SMALL - DEFINE TRAINING FUNCTION

In [None]:
# DEFINE THE CodeDataset CLASS

from codebleu import codebleu
from torch.nn.utils.rnn import pad_sequence
from torchtext.data.metrics import bleu_score
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import RobertaTokenizer, T5ForConditionalGeneration, get_linear_schedule_with_warmup


class CodeDataset(Dataset):
  def __init__(self, descriptions, snippets, max_length_desc, max_length_snippet, tokenizer):
    self.descriptions = descriptions
    self.snippets = snippets
    self.max_length_desc = max_length_desc
    self.max_length_snippet = max_length_snippet
    self.tokenizer = tokenizer

  def __len__(self):
    return len(self.descriptions)

  def __getitem__(self, idx):
    if idx >= len(self.descriptions):
      raise IndexError("Index out of range")
    description = 'Translate prompt to PySpark: ' + self.descriptions[idx]
    snippet = self.snippets[idx]
    inputs = self.tokenizer.encode_plus(
        description,
        snippet,
        add_special_tokens=True,
        padding='max_length',
        max_length=self.max_length_desc,
        return_tensors='pt'
        )
    targets = self.tokenizer.encode(
        snippet,
        add_special_tokens=True,
        padding='max_length',
        max_length=self.max_length_snippet,
        return_tensors='pt'
        )

    # Ensure that the tensors have the same size
    input_ids         = inputs['input_ids'].squeeze()
    attention_mask    = inputs['attention_mask'].squeeze()
    decoder_input_ids = targets[0]

    # Pad the tensors to the same length
    input_ids = pad_sequence([input_ids], batch_first=True, padding_value=0).narrow(1, 0, self.max_length_desc)
    attention_mask = pad_sequence([attention_mask], batch_first=True, padding_value=0).narrow(1, 0, self.max_length_desc)
    decoder_input_ids = pad_sequence([decoder_input_ids], batch_first=True, padding_value=0).narrow(1, 0, self.max_length_snippet)

    #print(f'-- DEBUG -- input_ids.shape: {input_ids.shape} [{type(input_ids)}] + attention_mask.shape: {attention_mask.shape} ')
    return {
        'input_ids': input_ids.long(),
        'attention_mask': attention_mask.float(),
        'decoder_input_ids': decoder_input_ids.long()
    }

fn_display_header("CodeDataset CLASS Defined")



--------------------------------------------------------------------------------
           CodeDataset CLASS Defined
--------------------------------------------------------------------------------


In [None]:
# DEFINE THE CODET5 TRAINING FUNCTION

def fn_codet5_trainer(model_name, input_df, model_config):
  # Set the device to GPU if available
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

  code_max_length = model_config['code_max_length']
  comment_max_length = model_config['comment_max_length']
  batch_size = model_config['batch_size']
  epochs = model_config['epochs']
  learning_rates = model_config['learning_rates']
  patience = model_config['patience']
  accumulation_steps = model_config['accumulation_steps']


  # Load pre-trained model tokenizer (vocabulary)
  tokenizer = RobertaTokenizer.from_pretrained(model_name) # BertTokenizer.from_pretrained('bert-base-uncased')

  # Split data into training and validation sets
  desc_train, desc_val, snippet_train, snippet_val = train_test_split(input_df['code_description'], input_df['code_snippet'], test_size=0.20, random_state=1357)

  # Create data loaders
  train_data = CodeDataset(desc_train.tolist(), snippet_train.tolist(), comment_max_length, code_max_length, tokenizer)
  val_data = CodeDataset(desc_val.tolist(), snippet_val.tolist(), comment_max_length, code_max_length, tokenizer)


  train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
  val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)

  # Load pre-trained model
  model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)

  # LOOP THROUGH ALL LEARNING RATES
  final_score_list = []
  for lr_idx, learning_rate in enumerate(learning_rates):

    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=len(train_loader) * epochs)

    # Define the loss function
    criterion = torch.nn.CrossEntropyLoss()

    # Training loop
    model.train()
    best_loss = np.inf
    patience_idx = 0
    len_train_loader = len(train_loader)

    # LOOP TRHOUGH EACH EPOCH PER LEARNING RATE
    for epoch in range(epochs):
      start_time = time.time()
      iteration_count = epoch + 1 + lr_idx * epochs
      fn_display_header(f'[learning_rate={learning_rate}] - Iteration {iteration_count}/{epochs*len(learning_rates)} - Epoch {epoch+1}/{epochs} - learning rate {lr_idx+1}/{len(learning_rates)}')
      # Initialize variables
      accumulated_loss = 0
      all_labels = []
      all_preds = []
      all_labels_bleu = []
      all_preds_bleu = []
      all_labels_codebleu = []
      all_preds_codebleu = []
      early_stopping_score_dict = {}

      batch_idx = 0
      for batch in tqdm(train_loader):
        input_ids = batch['input_ids'].squeeze(1).long().to(device)
        attention_mask = batch['attention_mask'].squeeze(1).to(device)
        decoder_input_ids = batch['decoder_input_ids'].squeeze(1).long().to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=decoder_input_ids,
            )

        loss = outputs.loss
        loss = loss / accumulation_steps
        loss.backward()

        # GRADIENT ACCUMULATION: Update weights every accumulation_steps
        accumulated_loss += loss.item()
        if (batch_idx + 1) % accumulation_steps == 0:
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

        # Collect labels and predictions for evaluation
        all_labels.extend(decoder_input_ids.view(-1).cpu().numpy())
        all_preds.extend(outputs.logits.argmax(dim=-1).view(-1).cpu().numpy())

        # Collect labels and predictions for Bleu score. The bleu function expects a list of list as arguments.
        for item in decoder_input_ids:
          all_labels_bleu.append(item.cpu().numpy().tolist())
        for item in outputs.logits:
          all_preds_bleu.append(item.argmax(dim=-1).cpu().numpy().tolist())

        batch_idx = batch_idx + 1

      # After each epoch, check the loss on the validation set
      val_loss = 0
      model.eval()
      with torch.no_grad():
        for batch in val_loader:
          outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=decoder_input_ids)
          val_loss += criterion(outputs.logits.view(-1, outputs.logits.size(-1)), decoder_input_ids.view(-1)).item()

      # ---------------- DERIVE METRICS SECTION ----------------------

      #- Remove all special tokens like <extra_id_0>, pad, cls, sep from the predicted text
      filtered_labels = [label for label in all_labels if label != tokenizer.convert_tokens_to_ids("<extra_id_0>") and label not in [tokenizer.pad_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id]]
      filtered_preds  = [pred  for pred  in all_preds  if pred  != tokenizer.convert_tokens_to_ids("<extra_id_0>") and pred  not in [tokenizer.pad_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id]]

      # Ensure that the filtered arrays have the same length
      # If filtered_labels is longer than filtered_preds, it appends copies of the last prediction token to filtered_preds until their lengths match.
      # If filtered_preds is longer than filtered_labels, it appends copies of the last label token to filtered_labels until their lengths match.
      if len(filtered_labels) > len(filtered_preds):
          filtered_preds = filtered_preds + [filtered_preds[-1]] * (len(filtered_labels) - len(filtered_preds))
      elif len(filtered_preds) > len(filtered_labels):
          filtered_labels = filtered_labels + [filtered_labels[-1]] * (len(filtered_preds) - len(filtered_labels))

      # COMMON EVALUATION METRICS
      accuracy = accuracy_score(filtered_labels, filtered_preds)
      precision = precision_score(filtered_labels, filtered_preds, average='weighted', zero_division=0)
      recall = recall_score(filtered_labels, filtered_preds, average='weighted', zero_division=0)
      F1_score = f1_score(filtered_labels, filtered_preds, average='weighted', zero_division=0)

      # DERIVE CANDIDATE AND REFERENCE CORPUS FOR CODEBLEU SCORES:
      # - ignore all character after the end of line identifier </s> and Remove special tokens like <extra_id_0>
      candidate_corpus = [re.sub(r"<extra_id_\d+>", "", tokenizer.decode(x).split('</s>')[0]).strip() for x in all_preds_bleu]
      reference_corpus = [re.sub(r"<extra_id_\d+>", "", tokenizer.decode(x).split('</s>')[0]).strip() for x in [*all_labels_bleu]]

      #print(f"candidate_corpus: {len(candidate_corpus)} -- {candidate_corpus}")
      #print(f"reference_corpus: {len(reference_corpus)} -- {reference_corpus}")

      CODEBLEU_score = codebleu.calc_codebleu(candidate_corpus, reference_corpus, lang='python')

      checkpoint_saved = 'N'
      checkpoint_file = f'check_{iteration_count}_model_codet5_small_pyspark_generator.pt'
      # If the validation loss is at a minimum
      if val_loss < best_loss:
        best_loss = val_loss
        patience_idx = 0
        # Save model checkpoint here
        torch.save(model.state_dict(), checkpoint_file)
        global GOOGLE_DRIVE_DIR
        shutil.copyfile(checkpoint_file, GOOGLE_DRIVE_DIR + '/' + checkpoint_file)
        checkpoint_saved = 'Y'
        print(f"--> Checkpoint saved : {GOOGLE_DRIVE_DIR + '/' + checkpoint_file}")

      # EARLY STOPPING: If the validation loss does not decrease after a certain number of epochs, stop the training
      else:
        patience_idx += 1
        if patience_idx > patience:
          print(f"--> Early stopping: Loss did not decrease in {patience} epochs")
          break

      early_stopping_score_dict = {
          'iteration_no'          : f"{iteration_count}",
          "epoch_no"              : f"{epoch+1}",
          "learning_rate"         : f"{learning_rate:.6f}",
          "accumulated_loss"      : f"{accumulated_loss:.4f}",
          "val_loss"              : f"{val_loss:.4f}",
          "accurracy"             : f"{accuracy:.4f}",
          "precision"             : f"{precision:.4f}",
          "recall_score"          : f"{recall:.4f}",
          "F1-score"              : f"{F1_score:.4f}",
          "CODEBLEU_score"        : f"{CODEBLEU_score['codebleu']:.4f}",
          "syntax_match_score"    : f"{CODEBLEU_score['syntax_match_score']:.4f}",
          "dataflow_match_score"  : f"{CODEBLEU_score['dataflow_match_score']:.4f}",
          "ngram_match_score"     : f"{CODEBLEU_score['ngram_match_score']:.4f}",
          "weighted_ngram_match_score"  : f"{CODEBLEU_score['weighted_ngram_match_score']:.4f}",
          "checkpoint_file"       : f"{checkpoint_file}",
          "checkpoint_saved"      : f"{checkpoint_saved}"
      }

      fn_display_message('------- Evaluation Scores -------')
      for key, val in early_stopping_score_dict.items():
        if key not in ['iteration_no', 'epoch_no', 'learning_rate', "checkpoint_file", "checkpoint_saved"]:
          fn_display_message(f"{key}: {val}")

      final_score_list.append(early_stopping_score_dict)

      # Overwite Metrics data into Google drive
      df_scores = pd.DataFrame(final_score_list)
      scores_csv = 'model_codet5_small_scores.csv'
      df_scores.to_csv(scores_csv, index=False)

      shutil.copyfile(scores_csv, GOOGLE_DRIVE_DIR + '/' + scores_csv)
      print(f"--> Latest Metrics saved : {GOOGLE_DRIVE_DIR + '/' + scores_csv}")

      end_time = time.time()
      epoch_time = (end_time - start_time)/60
      print(f"--> Iteration {epoch+1+lr_idx}/{epochs*len(learning_rates)} took {epoch_time:.2f} minutes")

  return model, final_score_list

fn_display_header("TRAINING FUNCTION LOADED: fn_codet5_trainer")


--------------------------------------------------------------------------------
           TRAINING FUNCTION LOADED: fn_codet5_trainer
--------------------------------------------------------------------------------


<br>

# CODET5-SMALL - RUN TRAINING LOOP

In [None]:
# TRAIN CODET5 BASE MODEL

# Set QUICK_DEBUG = 'N' to test with a small set of records for quick debugging
QUICK_DEBUG = 'N'

if RETRAIN_FLAG == 'Y':

  if QUICK_DEBUG == 'Y':
    config_codet5_small = {
        'comment_max_length': 120,
        'code_max_length': 120,
        'batch_size': 16,
        'epochs'    : 2,
        'patience'  : 5,
        'learning_rates': [0.000005],
        'accumulation_steps': 1,
        'apply_dataset_sampling': 'N',
        'dataset_sample_size': 20000
    }
    subset_count = 16
  else:
    config_codet5_small = {
      'comment_max_length': 120,
      'code_max_length': 120,
      'batch_size': 128,
      'epochs'    : 20,
      'patience'  : 5,
      'learning_rates': [0.000005, 0.00001, 0.00003, 0.0001, 0.0005],
      'accumulation_steps': 1,
      'apply_dataset_sampling': 'N',
      'dataset_sample_size': 20000
    }
    subset_count = 1000000000

  fn_display_header('------- Model Input Configuration -------')
  for key, val in config_codet5_small.items():
    fn_display_message(f"  {key}: {val}")
  print("\n")

  # Apply Sampling to input dataset to improve training time. Can be controlled in the model configs
  if config_codet5_small['apply_dataset_sampling'] == 'Y':
    df_input_codet5 = df_input.sample(n=config_codet5_small['dataset_sample_size'], random_state=42)[:subset_count]
  else:
    df_input_codet5 = df_input[:subset_count]

  fn_display_header('------- Display Input Dataset Info -------')
  df_input_codet5.info()
  print("\n")

  fn_display_header('------- START MODEL TRAINING/INFERENCE -------')
  model_codet5_small, final_score_list = fn_codet5_trainer('Salesforce/codet5-small', df_input_codet5, config_codet5_small)


--------------------------------------------------------------------------------
           ------- Model Input Configuration -------
--------------------------------------------------------------------------------
  comment_max_length: 120
  code_max_length: 120
  batch_size: 128
  epochs: 20
  patience: 5
  learning_rates: [5e-06, 1e-05, 3e-05, 0.0001, 0.0005]
  accumulation_steps: 1
  apply_dataset_sampling: N
  dataset_sample_size: 20000


--------------------------------------------------------------------------------
           ------- Display Input Dataset Info -------
--------------------------------------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19972 entries, 0 to 19971
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   code_description  19972 non-null  object
 1   code_snippet      19972 non-null  object
 2   import_line       1286 non-null   objec

100%|██████████| 125/125 [01:13<00:00,  1.70it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_1_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 257.8845
val_loss: 7.7571
accurracy: 0.0361
precision: 0.0363
recall_score: 0.0361
F1-score: 0.0359
CODEBLEU_score: 0.1997
syntax_match_score: 0.6619
dataflow_match_score: 0.0672
ngram_match_score: 0.0330
weighted_ngram_match_score: 0.0368
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 1/100 took 2.73 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 2/100 - Epoch 2/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_2_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 18.9221
val_loss: 3.3785
accurracy: 0.0348
precision: 0.0346
recall_score: 0.0348
F1-score: 0.0344
CODEBLEU_score: 0.4392
syntax_match_score: 0.8087
dataflow_match_score: 0.4730
ngram_match_score: 0.2488
weighted_ngram_match_score: 0.2262
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 2/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 3/100 - Epoch 3/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_3_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 7.1419
val_loss: 1.3601
accurracy: 0.0358
precision: 0.0356
recall_score: 0.0358
F1-score: 0.0357
CODEBLEU_score: 0.5986
syntax_match_score: 0.8906
dataflow_match_score: 0.7077
ngram_match_score: 0.4229
weighted_ngram_match_score: 0.3733
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 3/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 4/100 - Epoch 4/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_4_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 5.2771
val_loss: 0.9561
accurracy: 0.0371
precision: 0.0370
recall_score: 0.0371
F1-score: 0.0371
CODEBLEU_score: 0.6736
syntax_match_score: 0.9359
dataflow_match_score: 0.8502
ngram_match_score: 0.4845
weighted_ngram_match_score: 0.4236
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 4/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 5/100 - Epoch 5/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 4.3858
val_loss: 0.9808
accurracy: 0.0391
precision: 0.0390
recall_score: 0.0391
F1-score: 0.0390
CODEBLEU_score: 0.7064
syntax_match_score: 0.9487
dataflow_match_score: 0.9114
ngram_match_score: 0.5159
weighted_ngram_match_score: 0.4494
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 5/100 took 2.50 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 6/100 - Epoch 6/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_6_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 3.8287
val_loss: 0.8535
accurracy: 0.0486
precision: 0.0485
recall_score: 0.0486
F1-score: 0.0485
CODEBLEU_score: 0.7310
syntax_match_score: 0.9583
dataflow_match_score: 0.9615
ngram_match_score: 0.5372
weighted_ngram_match_score: 0.4671
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 6/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 7/100 - Epoch 7/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 3.4200
val_loss: 1.0844
accurracy: 0.0457
precision: 0.0457
recall_score: 0.0457
F1-score: 0.0457
CODEBLEU_score: 0.7602
syntax_match_score: 0.9678
dataflow_match_score: 0.9747
ngram_match_score: 0.5880
weighted_ngram_match_score: 0.5102
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 7/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 8/100 - Epoch 8/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_8_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 3.1253
val_loss: 0.4856
accurracy: 0.0963
precision: 0.0963
recall_score: 0.0963
F1-score: 0.0963
CODEBLEU_score: 0.7704
syntax_match_score: 0.9756
dataflow_match_score: 0.9723
ngram_match_score: 0.6073
weighted_ngram_match_score: 0.5265
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 8/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 9/100 - Epoch 9/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 2.9157
val_loss: 0.8595
accurracy: 0.0389
precision: 0.0389
recall_score: 0.0389
F1-score: 0.0389
CODEBLEU_score: 0.7748
syntax_match_score: 0.9787
dataflow_match_score: 0.9720
ngram_match_score: 0.6151
weighted_ngram_match_score: 0.5332
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 9/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 10/100 - Epoch 10/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 2.7449
val_loss: 1.1610
accurracy: 0.1457
precision: 0.1457
recall_score: 0.1457
F1-score: 0.1457
CODEBLEU_score: 0.7778
syntax_match_score: 0.9808
dataflow_match_score: 0.9720
ngram_match_score: 0.6207
weighted_ngram_match_score: 0.5379
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 10/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 11/100 - Epoch 11/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 2.5935
val_loss: 0.8721
accurracy: 0.0741
precision: 0.0741
recall_score: 0.0741
F1-score: 0.0741
CODEBLEU_score: 0.7806
syntax_match_score: 0.9829
dataflow_match_score: 0.9727
ngram_match_score: 0.6252
weighted_ngram_match_score: 0.5417
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 11/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 12/100 - Epoch 12/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_12_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 2.4574
val_loss: 0.3728
accurracy: 0.0712
precision: 0.0711
recall_score: 0.0712
F1-score: 0.0711
CODEBLEU_score: 0.7829
syntax_match_score: 0.9845
dataflow_match_score: 0.9734
ngram_match_score: 0.6290
weighted_ngram_match_score: 0.5449
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 12/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 13/100 - Epoch 13/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_13_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 2.3398
val_loss: 0.2747
accurracy: 0.1413
precision: 0.1413
recall_score: 0.1413
F1-score: 0.1413
CODEBLEU_score: 0.7862
syntax_match_score: 0.9863
dataflow_match_score: 0.9784
ngram_match_score: 0.6324
weighted_ngram_match_score: 0.5476
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 13/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 14/100 - Epoch 14/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 2.2388
val_loss: 0.7582
accurracy: 0.0231
precision: 0.0231
recall_score: 0.0231
F1-score: 0.0231
CODEBLEU_score: 0.7879
syntax_match_score: 0.9878
dataflow_match_score: 0.9785
ngram_match_score: 0.6353
weighted_ngram_match_score: 0.5499
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 14/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 15/100 - Epoch 15/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 2.1470
val_loss: 0.6823
accurracy: 0.0621
precision: 0.0621
recall_score: 0.0621
F1-score: 0.0621
CODEBLEU_score: 0.7897
syntax_match_score: 0.9887
dataflow_match_score: 0.9825
ngram_match_score: 0.6367
weighted_ngram_match_score: 0.5510
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 15/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 16/100 - Epoch 16/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_16_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 2.0659
val_loss: 0.1892
accurracy: 0.0723
precision: 0.0723
recall_score: 0.0723
F1-score: 0.0723
CODEBLEU_score: 0.7910
syntax_match_score: 0.9892
dataflow_match_score: 0.9830
ngram_match_score: 0.6388
weighted_ngram_match_score: 0.5528
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 16/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 17/100 - Epoch 17/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 2.0046
val_loss: 0.7980
accurracy: 0.0434
precision: 0.0434
recall_score: 0.0434
F1-score: 0.0434
CODEBLEU_score: 0.7923
syntax_match_score: 0.9902
dataflow_match_score: 0.9836
ngram_match_score: 0.6408
weighted_ngram_match_score: 0.5545
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 17/100 took 2.50 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 18/100 - Epoch 18/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 1.9515
val_loss: 0.2096
accurracy: 0.0593
precision: 0.0593
recall_score: 0.0593
F1-score: 0.0593
CODEBLEU_score: 0.7933
syntax_match_score: 0.9906
dataflow_match_score: 0.9842
ngram_match_score: 0.6424
weighted_ngram_match_score: 0.5559
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 18/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 19/100 - Epoch 19/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 1.9194
val_loss: 0.6482
accurracy: 0.0521
precision: 0.0521
recall_score: 0.0521
F1-score: 0.0521
CODEBLEU_score: 0.7941
syntax_match_score: 0.9909
dataflow_match_score: 0.9857
ngram_match_score: 0.6433
weighted_ngram_match_score: 0.5567
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 19/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=5e-06] - Iteration 20/100 - Epoch 20/20 - learning rate 1/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 1.9013
val_loss: 0.6539
accurracy: 0.0424
precision: 0.0424
recall_score: 0.0424
F1-score: 0.0424
CODEBLEU_score: 0.7945
syntax_match_score: 0.9909
dataflow_match_score: 0.9862
ngram_match_score: 0.6438
weighted_ngram_match_score: 0.5571
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 20/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 21/100 - Epoch 1/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:12<00:00,  1.72it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_21_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 2.7328
val_loss: 0.0665
accurracy: 0.0370
precision: 0.0370
recall_score: 0.0370
F1-score: 0.0370
CODEBLEU_score: 0.6488
syntax_match_score: 0.9355
dataflow_match_score: 0.6863
ngram_match_score: 0.5199
weighted_ngram_match_score: 0.4535
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 2/100 took 2.57 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 22/100 - Epoch 2/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_22_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.3180
val_loss: 0.0343
accurracy: 0.2140
precision: 0.2140
recall_score: 0.2140
F1-score: 0.2140
CODEBLEU_score: 0.7962
syntax_match_score: 0.9901
dataflow_match_score: 0.9867
ngram_match_score: 0.6479
weighted_ngram_match_score: 0.5602
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 3/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 23/100 - Epoch 3/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_23_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.1644
val_loss: 0.0096
accurracy: 0.3206
precision: 0.3207
recall_score: 0.3206
F1-score: 0.3206
CODEBLEU_score: 0.8020
syntax_match_score: 0.9941
dataflow_match_score: 0.9891
ngram_match_score: 0.6570
weighted_ngram_match_score: 0.5679
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 4/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 24/100 - Epoch 4/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_24_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.1006
val_loss: 0.0025
accurracy: 0.3714
precision: 0.3714
recall_score: 0.3714
F1-score: 0.3714
CODEBLEU_score: 0.8075
syntax_match_score: 0.9965
dataflow_match_score: 0.9968
ngram_match_score: 0.6633
weighted_ngram_match_score: 0.5733
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 5/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 25/100 - Epoch 5/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0675
val_loss: 0.0113
accurracy: 0.1155
precision: 0.1155
recall_score: 0.1155
F1-score: 0.1155
CODEBLEU_score: 0.8109
syntax_match_score: 0.9983
dataflow_match_score: 0.9992
ngram_match_score: 0.6683
weighted_ngram_match_score: 0.5776
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 6/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 26/100 - Epoch 6/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0490
val_loss: 0.0295
accurracy: 0.3228
precision: 0.3228
recall_score: 0.3228
F1-score: 0.3228
CODEBLEU_score: 0.8117
syntax_match_score: 0.9988
dataflow_match_score: 0.9997
ngram_match_score: 0.6696
weighted_ngram_match_score: 0.5787
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 7/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 27/100 - Epoch 7/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0382
val_loss: 0.0061
accurracy: 0.3750
precision: 0.3750
recall_score: 0.3750
F1-score: 0.3750
CODEBLEU_score: 0.8124
syntax_match_score: 0.9992
dataflow_match_score: 0.9997
ngram_match_score: 0.6709
weighted_ngram_match_score: 0.5798
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 8/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 28/100 - Epoch 8/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0308
val_loss: 0.0044
accurracy: 0.3411
precision: 0.3411
recall_score: 0.3411
F1-score: 0.3411
CODEBLEU_score: 0.8128
syntax_match_score: 0.9993
dataflow_match_score: 0.9998
ngram_match_score: 0.6717
weighted_ngram_match_score: 0.5805
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 9/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 29/100 - Epoch 9/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_29_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0260
val_loss: 0.0024
accurracy: 0.4729
precision: 0.4729
recall_score: 0.4729
F1-score: 0.4729
CODEBLEU_score: 0.8130
syntax_match_score: 0.9995
dataflow_match_score: 0.9999
ngram_match_score: 0.6720
weighted_ngram_match_score: 0.5807
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 10/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 30/100 - Epoch 10/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0226
val_loss: 0.0090
accurracy: 0.9050
precision: 0.9050
recall_score: 0.9050
F1-score: 0.9050
CODEBLEU_score: 0.8133
syntax_match_score: 0.9995
dataflow_match_score: 0.9999
ngram_match_score: 0.6726
weighted_ngram_match_score: 0.5813
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 11/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 31/100 - Epoch 11/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0201
val_loss: 0.0039
accurracy: 0.1439
precision: 0.1439
recall_score: 0.1439
F1-score: 0.1439
CODEBLEU_score: 0.8132
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6723
weighted_ngram_match_score: 0.5810
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 12/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 32/100 - Epoch 12/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0186
val_loss: 0.0030
accurracy: 0.0128
precision: 0.0128
recall_score: 0.0128
F1-score: 0.0128
CODEBLEU_score: 0.8130
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6720
weighted_ngram_match_score: 0.5807
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 13/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 33/100 - Epoch 13/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0170
val_loss: 0.0026
accurracy: 0.8206
precision: 0.8206
recall_score: 0.8206
F1-score: 0.8206
CODEBLEU_score: 0.8135
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 14/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 34/100 - Epoch 14/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_34_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0160
val_loss: 0.0019
accurracy: 0.0113
precision: 0.0113
recall_score: 0.0113
F1-score: 0.0113
CODEBLEU_score: 0.8134
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6726
weighted_ngram_match_score: 0.5813
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 15/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 35/100 - Epoch 15/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0152
val_loss: 0.0021
accurracy: 0.2289
precision: 0.2289
recall_score: 0.2289
F1-score: 0.2289
CODEBLEU_score: 0.8135
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 16/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 36/100 - Epoch 16/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0146
val_loss: 0.0040
accurracy: 0.7290
precision: 0.7290
recall_score: 0.7290
F1-score: 0.7290
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6728
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 17/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 37/100 - Epoch 17/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0140
val_loss: 0.0028
accurracy: 0.0891
precision: 0.0891
recall_score: 0.0891
F1-score: 0.0891
CODEBLEU_score: 0.8131
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6719
weighted_ngram_match_score: 0.5807
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 18/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 38/100 - Epoch 18/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0136
val_loss: 0.0048
accurracy: 0.9795
precision: 0.9795
recall_score: 0.9795
F1-score: 0.9795
CODEBLEU_score: 0.8132
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6723
weighted_ngram_match_score: 0.5810
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 19/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 39/100 - Epoch 19/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0133
val_loss: 0.0034
accurracy: 0.2641
precision: 0.2641
recall_score: 0.2641
F1-score: 0.2641
CODEBLEU_score: 0.8133
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6724
weighted_ngram_match_score: 0.5811
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 20/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=1e-05] - Iteration 40/100 - Epoch 20/20 - learning rate 2/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Early stopping: Loss did not decrease in 5 epochs
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 41/100 - Epoch 1/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:12<00:00,  1.73it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_41_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.6876
val_loss: 0.0130
accurracy: 0.0315
precision: 0.0315
recall_score: 0.0315
F1-score: 0.0315
CODEBLEU_score: 0.7553
syntax_match_score: 0.9746
dataflow_match_score: 0.9099
ngram_match_score: 0.6093
weighted_ngram_match_score: 0.5275
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 3/100 took 2.56 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 42/100 - Epoch 2/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_42_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0211
val_loss: 0.0016
accurracy: 0.4770
precision: 0.4770
recall_score: 0.4770
F1-score: 0.4770
CODEBLEU_score: 0.8127
syntax_match_score: 0.9994
dataflow_match_score: 0.9998
ngram_match_score: 0.6714
weighted_ngram_match_score: 0.5802
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 4/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 43/100 - Epoch 3/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0118
val_loss: 0.0017
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6723
weighted_ngram_match_score: 0.5810
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 5/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 44/100 - Epoch 4/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_44_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0087
val_loss: 0.0014
accurracy: 0.9629
precision: 0.9629
recall_score: 0.9629
F1-score: 0.9629
CODEBLEU_score: 0.8136
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6732
weighted_ngram_match_score: 0.5818
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 6/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 45/100 - Epoch 5/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.75it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_45_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0073
val_loss: 0.0004
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6725
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 7/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 46/100 - Epoch 6/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0072
val_loss: 0.0107
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6727
weighted_ngram_match_score: 0.5813
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 8/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 47/100 - Epoch 7/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_47_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0060
val_loss: 0.0004
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6725
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 9/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 48/100 - Epoch 8/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0063
val_loss: 0.0015
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6726
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 10/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 49/100 - Epoch 9/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0052
val_loss: 0.0006
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8138
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6735
weighted_ngram_match_score: 0.5820
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 11/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 50/100 - Epoch 10/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0054
val_loss: 0.0024
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6728
weighted_ngram_match_score: 0.5814
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 12/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 51/100 - Epoch 11/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.75it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0055
val_loss: 0.0008
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8137
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6734
weighted_ngram_match_score: 0.5819
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 13/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 52/100 - Epoch 12/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0053
val_loss: 0.0026
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6725
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 14/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=3e-05] - Iteration 53/100 - Epoch 13/20 - learning rate 3/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Early stopping: Loss did not decrease in 5 epochs
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 61/100 - Epoch 1/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:12<00:00,  1.73it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_61_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.3008
val_loss: 0.0439
accurracy: 0.1011
precision: 0.1011
recall_score: 0.1011
F1-score: 0.1011
CODEBLEU_score: 0.7688
syntax_match_score: 0.9870
dataflow_match_score: 0.8970
ngram_match_score: 0.6388
weighted_ngram_match_score: 0.5525
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 4/100 took 2.56 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 62/100 - Epoch 2/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_62_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0215
val_loss: 0.0003
accurracy: 0.9998
precision: 0.9998
recall_score: 0.9998
F1-score: 0.9998
CODEBLEU_score: 0.8120
syntax_match_score: 0.9992
dataflow_match_score: 0.9993
ngram_match_score: 0.6703
weighted_ngram_match_score: 0.5792
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 5/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 63/100 - Epoch 3/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_63_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0072
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8136
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6731
weighted_ngram_match_score: 0.5817
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 6/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 64/100 - Epoch 4/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0058
val_loss: 0.0002
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6722
weighted_ngram_match_score: 0.5809
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 7/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 65/100 - Epoch 5/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_65_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0052
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 8/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 66/100 - Epoch 6/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0045
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9993
dataflow_match_score: 1.0000
ngram_match_score: 0.6728
weighted_ngram_match_score: 0.5814
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 9/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 67/100 - Epoch 7/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_67_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0048
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6724
weighted_ngram_match_score: 0.5811
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 10/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 68/100 - Epoch 8/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0038
val_loss: 0.0022
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8136
syntax_match_score: 0.9998
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 11/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 69/100 - Epoch 9/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0042
val_loss: 0.0005
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6723
weighted_ngram_match_score: 0.5810
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 12/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 70/100 - Epoch 10/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_70_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0044
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8131
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6722
weighted_ngram_match_score: 0.5809
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 13/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 71/100 - Epoch 11/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0039
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8138
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6737
weighted_ngram_match_score: 0.5822
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 14/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 72/100 - Epoch 12/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0041
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6724
weighted_ngram_match_score: 0.5811
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 15/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 73/100 - Epoch 13/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0035
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8138
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6735
weighted_ngram_match_score: 0.5820
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 16/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 74/100 - Epoch 14/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0036
val_loss: 0.0014
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 17/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 75/100 - Epoch 15/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_75_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0037
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 18/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 76/100 - Epoch 16/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_76_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0033
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8139
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6737
weighted_ngram_match_score: 0.5822
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 19/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 77/100 - Epoch 17/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0035
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 20/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 78/100 - Epoch 18/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0033
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8131
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6721
weighted_ngram_match_score: 0.5808
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 21/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 79/100 - Epoch 19/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0033
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8136
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6731
weighted_ngram_match_score: 0.5817
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 22/100 took 2.51 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0001] - Iteration 80/100 - Epoch 20/20 - learning rate 4/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_80_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0031
val_loss: 0.0000
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 23/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 81/100 - Epoch 1/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:12<00:00,  1.72it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_81_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.7274
val_loss: 0.0075
accurracy: 0.0495
precision: 0.0495
recall_score: 0.0495
F1-score: 0.0495
CODEBLEU_score: 0.7505
syntax_match_score: 0.9772
dataflow_match_score: 0.8703
ngram_match_score: 0.6188
weighted_ngram_match_score: 0.5358
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 5/100 took 2.57 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 82/100 - Epoch 2/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0572
val_loss: 0.2260
accurracy: 0.9997
precision: 0.9997
recall_score: 0.9997
F1-score: 0.9997
CODEBLEU_score: 0.8112
syntax_match_score: 0.9985
dataflow_match_score: 0.9990
ngram_match_score: 0.6690
weighted_ngram_match_score: 0.5782
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 6/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 83/100 - Epoch 3/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_83_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0560
val_loss: 0.0012
accurracy: 0.8335
precision: 0.8335
recall_score: 0.8335
F1-score: 0.8335
CODEBLEU_score: 0.8105
syntax_match_score: 0.9986
dataflow_match_score: 0.9971
ngram_match_score: 0.6684
weighted_ngram_match_score: 0.5777
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 7/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 84/100 - Epoch 4/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0310
val_loss: 0.0023
accurracy: 0.4130
precision: 0.4130
recall_score: 0.4130
F1-score: 0.4130
CODEBLEU_score: 0.8102
syntax_match_score: 0.9987
dataflow_match_score: 0.9947
ngram_match_score: 0.6690
weighted_ngram_match_score: 0.5783
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 8/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 85/100 - Epoch 5/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0697
val_loss: 0.0037
accurracy: 0.9057
precision: 0.9057
recall_score: 0.9057
F1-score: 0.9057
CODEBLEU_score: 0.8074
syntax_match_score: 0.9975
dataflow_match_score: 0.9930
ngram_match_score: 0.6646
weighted_ngram_match_score: 0.5745
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 9/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 86/100 - Epoch 6/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_86_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0212
val_loss: 0.0002
accurracy: 0.9998
precision: 0.9998
recall_score: 0.9998
F1-score: 0.9998
CODEBLEU_score: 0.8126
syntax_match_score: 0.9991
dataflow_match_score: 0.9996
ngram_match_score: 0.6713
weighted_ngram_match_score: 0.5802
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 10/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 87/100 - Epoch 7/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_87_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0072
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6727
weighted_ngram_match_score: 0.5813
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 11/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 88/100 - Epoch 8/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.75it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0058
val_loss: 0.0017
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6728
weighted_ngram_match_score: 0.5814
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 12/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 89/100 - Epoch 9/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_89_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0064
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9993
dataflow_match_score: 1.0000
ngram_match_score: 0.6723
weighted_ngram_match_score: 0.5810
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 13/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 90/100 - Epoch 10/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0038
val_loss: 0.0026
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6728
weighted_ngram_match_score: 0.5814
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 14/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 91/100 - Epoch 11/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_91_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0042
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9993
dataflow_match_score: 1.0000
ngram_match_score: 0.6727
weighted_ngram_match_score: 0.5814
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 15/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 92/100 - Epoch 12/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0037
val_loss: 0.0022
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8131
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6722
weighted_ngram_match_score: 0.5809
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 16/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 93/100 - Epoch 13/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_93_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0037
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8132
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6722
weighted_ngram_match_score: 0.5809
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 17/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 94/100 - Epoch 14/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:11<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0039
val_loss: 0.0001
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6724
weighted_ngram_match_score: 0.5811
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 18/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 95/100 - Epoch 15/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.77it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_95_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0039
val_loss: 0.0000
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6730
weighted_ngram_match_score: 0.5816
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 19/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 96/100 - Epoch 16/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


--> Checkpoint saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/check_96_model_codet5_small_pyspark_generator.pt
------- Evaluation Scores -------
accumulated_loss: 0.0034
val_loss: 0.0000
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8134
syntax_match_score: 0.9996
dataflow_match_score: 1.0000
ngram_match_score: 0.6726
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 20/100 took 2.54 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 97/100 - Epoch 17/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0034
val_loss: 0.0026
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6729
weighted_ngram_match_score: 0.5815
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 21/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 98/100 - Epoch 18/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0034
val_loss: 0.0036
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8135
syntax_match_score: 0.9997
dataflow_match_score: 1.0000
ngram_match_score: 0.6730
weighted_ngram_match_score: 0.5816
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 22/100 took 2.52 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 99/100 - Epoch 19/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0032
val_loss: 0.0000
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8133
syntax_match_score: 0.9994
dataflow_match_score: 1.0000
ngram_match_score: 0.6726
weighted_ngram_match_score: 0.5812
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 23/100 took 2.53 minutes
--------------------------------------------------------------------------------
           [learning_rate=0.0005] - Iteration 100/100 - Epoch 20/20 - learning rate 5/5
--------------------------------------------------------------------------------


100%|██████████| 125/125 [01:10<00:00,  1.76it/s]


------- Evaluation Scores -------
accumulated_loss: 0.0031
val_loss: 0.0036
accurracy: 0.9999
precision: 0.9999
recall_score: 0.9999
F1-score: 0.9999
CODEBLEU_score: 0.8136
syntax_match_score: 0.9995
dataflow_match_score: 1.0000
ngram_match_score: 0.6731
weighted_ngram_match_score: 0.5817
--> Latest Metrics saved : /content/drive/MyDrive/Colab_Notebooks/LJMU_ETL_PYSPARK/SAVE_FILES_20K/CODET5_SMALL/model_codet5_small_scores.csv
--> Iteration 24/100 took 2.52 minutes


<br>
<br>

---

# <b>NOTE</b>:
---

  - Since This Notebook Needs Lots of Compute time/GPU, the notebook is limited to training the model only
  - Checkpoints and necesary files are saved/copied to Google Drive
  - Further Model Evaluation/Inference/Assessment will be done in a different notebook

<br>
<br>
