# T5 Baseline

The initial exploration fine-tunes a pre-trained T5 model on the ICSI dataset. The plan started with T5-small; the run recorded in this notebook loads `t5-base`. Once the model is ready, we will expand the dataset and the validation set for further hyperparameter tuning.

1. Library Loading
2. Dataset Loading
3. Dataset Transformation
4. Training and Test Splitting
5. Fine Tuning
6. Checkpoint Saving
7. Evaluation



## Library Loading

In [None]:
!pip install transformers -q
!pip install wandb -q
#!pip install datasets
!pip install nlp
#!pip install rouge_score
!pip install rouge
#!curl -q https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
#!python pytorch-xla-env-setup.py --apt-packages libomp5 libopenblas-dev



In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
import time

# Importing the T5 modules from huggingface/transformers
# T5ForConditionalGeneration is specific for sequence-to-sequence
from transformers import T5Tokenizer, T5ForConditionalGeneration

#from nlp import load_metric
import nlp
from rouge import Rouge

import wandb

In [None]:
# Check which GPU we have access to. This output is from the Google Colab version.
!nvidia-smi

Wed Nov 25 01:31:57 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    26W / 250W |     10MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Set up the device for GPU usage (falls back to CPU when no GPU is available)
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [None]:
!wandb login


[34m[1mwandb[0m: Currently logged in as: [33mwuqq09[0m (use `wandb login --relogin` to force relogin)


## Data Loading

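The transformed dataset is loaded from Google Drive.

This portion uses the dataset built for the extractive-to-abstractive summarization pipeline; each row carries the original text, an extractive summary, and an abstractive summary.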

In [None]:
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)
torch.backends.cudnn.deterministic = True

train_size = 0.8

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

#/content/drive/My Drive/W266/data/ICSI_extrac_abstrac_512token.csv

Mounted at /content/drive


In [None]:
#df = pd.read_csv('/content/drive/My Drive/W266/512_tokens/ICSI_extrac_abstrac_512token.csv',encoding='latin-1')
#df = df[df['extractive'].notna()][['abstractive','extractive']]
train_dataset = pd.read_csv('/content/drive/My Drive/W266/data/1024_tokens/ICSI_1024_train.csv',encoding='latin-1')
dev_dataset = pd.read_csv('/content/drive/My Drive/W266/data/1024_tokens/ICSI_1024_dev.csv',encoding='latin-1')
test_dataset = pd.read_csv('/content/drive/My Drive/W266/data/1024_tokens/ICSI_1024_test.csv',encoding='latin-1')

#train_dataset = train_dataset[train_dataset.abstractive.notna()]
#dev_dataset = dev_dataset[dev_dataset.abstractive.notna()]
#test_dataset = test_dataset[test_dataset.abstractive.notna()]

train_dataset = train_dataset.dropna(subset=['abstractive'])
train_dataset = train_dataset.reset_index(drop=True)

dev_dataset = dev_dataset.dropna(subset=['abstractive'])
dev_dataset = dev_dataset.reset_index(drop=True)

test_dataset = test_dataset.dropna(subset=['abstractive'])
test_dataset = test_dataset.reset_index(drop=True)

# prepend T5's pre-defined "summarize: " task prefix for abstractive summarization
train_dataset.original = 'summarize: ' + train_dataset.original
dev_dataset.original = 'summarize: ' + dev_dataset.original
test_dataset.original = 'summarize: ' + test_dataset.original
print(train_dataset.head(1))
print(len(train_dataset))
print(len(dev_dataset))
print(len(test_dataset))


    meeting  ...                                        abstractive
0  Bdb001.C  ...  On the one hand, a bespoke XML structure that ...

[1 rows x 4 columns]
222
213
40


In [None]:
train_dataset.head(5)

| | meeting | original | extractive | abstractive |
|---|---|---|---|---|
| 0 | Bdb001.C | summarize: Yeah , we had a long discussion abo... | I mean , we I sort of already have developed a... | On the one hand, a bespoke XML structure that ... |
| 1 | Bdb001.C | summarize: You 're gonna actually run out of s... | Because you have a two - gigabyte limit on mos... | Phone-level analysis can be included in the sa... |
| 2 | Bdb001.C | summarize: Um , th what would would would what... | Um , th what would would would what would worr... | Its advantages are that it is easier to read, ... |
| 3 | Bdb001.C | summarize: But that 's the advantage of ATLAS ... | I guess I 'm just a little hesitant to try to ... | XML standards offer libraries that can be used... |
| 4 | Bdb001.F | summarize: Oh , that 's good . Cuz we have a  ... | and the main thing that I was gonna ask people... | Two main options were discussed as to the orga... |


In [None]:
#train_dataset=df.sample(frac=train_size,random_state = SEED)
#test_dataset=df.drop(train_dataset.index).reset_index(drop=True)
#train_dataset = train_dataset.reset_index(drop=True)
#print("FULL Dataset: {}".format(df.shape))

print("TRAIN Dataset: {}".format(train_dataset.shape))
print("DEV Dataset: {}".format(dev_dataset.shape))
print("TEST Dataset: {}".format(test_dataset.shape))

TRAIN Dataset: (222, 4)
DEV Dataset: (213, 4)
TEST Dataset: (40, 4)


## Dataset Transformation

Tokenize the inputs and build the attention masks so that every example becomes a fixed-size tensor.

Tunable hyperparameters:

* MAX_LEN
* SUMMARY_LEN
* TRAIN_BATCH_SIZE
* DEV_BATCH_SIZE
* TEST_BATCH_SIZE
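
As a rough illustration of what padding to a fixed length and attention masking produce, here is a minimal pure-Python sketch. The helper `pad_and_mask` and the token ids are made up for illustration; the only T5-specific assumption is that the pad token id is 0. The real tokenizer does more (subword splitting, special tokens), so treat this as a picture of the output shape rather than a reimplementation.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: T5's pad token id


def pad_and_mask(token_ids: List[int], max_len: int, pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Truncate or right-pad token_ids to max_len and build the matching attention mask."""
    kept = token_ids[:max_len]          # drop anything beyond max_len
    n_pad = max_len - len(kept)         # number of padding positions to append
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding the model should not attend to
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


# Hypothetical token ids, padded to a length of 6.
example = pad_and_mask([5, 6, 7, 1], max_len=6)
print(example["input_ids"])
print(example["attention_mask"])
```

Every example ends up with exactly `max_len` entries, which is what lets the `DataLoader` stack them into a batch.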


In [None]:
# most code from https://colab.research.google.com/drive/1ypT7oCjtBOTSMJv7J5_1vO7hDYSD_-oU?authuser=2#scrollTo=932p8NhxeNw4

class CustomDataset(Dataset):

    def __init__(self, dataframe, tokenizer, source_len, summ_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.source_len = source_len
        self.summ_len = summ_len
        self.abstractive = self.data.abstractive
        self.original = self.data.original

    def __len__(self):
        return len(self.abstractive)

    def __getitem__(self, index):
        original = str(self.original[index])
        original = ' '.join(original.split())

        abstractive = str(self.abstractive[index])
        abstractive = ' '.join(abstractive.split())

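        # padding='max_length' replaces the deprecated pad_to_max_length=True;
        # truncation=True makes the cut to max_length explicit.
        source = self.tokenizer.batch_encode_plus([original], max_length=self.source_len, padding='max_length', truncation=True, return_tensors='pt')
        target = self.tokenizer.batch_encode_plus([abstractive], max_length=self.summ_len, padding='max_length', truncation=True, return_tensors='pt')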
        source_ids = source['input_ids'].squeeze()
        source_mask = source['attention_mask'].squeeze()
        target_ids = target['input_ids'].squeeze()
        target_mask = target['attention_mask'].squeeze()

        return {
            'source_ids': source_ids.to(dtype=torch.long), 
            'source_mask': source_mask.to(dtype=torch.long), 
            'target_ids': target_ids.to(dtype=torch.long),
            'target_ids_y': target_ids.to(dtype=torch.long)
        }

In [None]:
### Training Dataset and Test Dataset 

# TRAIN Dataset: (1231, 4)
# DEV Dataset: (744, 4)
# TEST Dataset: (165, 4)

MAX_LEN = 1024
SUMMARY_LEN= 150

# Note: this run loads the t5-base tokenizer, matching the t5-base model loaded below.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
train_set = CustomDataset(train_dataset, tokenizer, MAX_LEN, SUMMARY_LEN)
dev_set = CustomDataset(dev_dataset, tokenizer, MAX_LEN, SUMMARY_LEN)
test_set = CustomDataset(test_dataset, tokenizer, MAX_LEN, SUMMARY_LEN)

In [None]:
# Double-check the tensor shapes for a single example
# https://stackoverflow.com/questions/43627405/understanding-getitem-method
print(train_set[0]['source_ids'].shape)
print(train_set[0]['source_mask'].shape)
print(train_set[0]['target_ids'].shape)
print(train_set[0]['target_ids_y'].shape)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


torch.Size([1024])
torch.Size([1024])
torch.Size([150])
torch.Size([150])




## Fine Tuning

Here we fine-tune the pre-trained model directly and save a checkpoint at the end of every epoch.

Tunable parameters:
* model class: T5ForConditionalGeneration or T5
* epochs: train, dev, test
* optimizer: LEARNING_RATE, Adam
* output: num_beams, length_penalty, early_stopping
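
To see why `length_penalty` matters, the sketch below ranks two hypothetical finished beam hypotheses. It assumes the length normalisation commonly used in beam search, where a hypothesis score is its summed log-probability divided by `length ** length_penalty`; the exact scoring inside `model.generate` may differ by library version. The function name and the numbers are illustrative only.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Length-normalised beam score (assumed form: sum_logprobs / length ** length_penalty)."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical hypotheses: (summed log-probability, length in tokens).
short_hyp = (-4.0, 4)
long_hyp = (-6.0, 8)

for penalty in (0.0, 1.0):
    short_score = beam_score(*short_hyp, length_penalty=penalty)
    long_score = beam_score(*long_hyp, length_penalty=penalty)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={penalty}: short={short_score:.2f}, long={long_score:.2f}, winner={winner}")
```

Without normalisation (`length_penalty=0.0`) the shorter hypothesis wins because it accumulates fewer negative log-probabilities; with `length_penalty=1.0` the longer hypothesis wins on its better per-token score.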




### Training & Validation Functions

Training starts from the pre-trained T5 checkpoint (`t5-base` in this run), leaves the layer structure unchanged, and fine-tunes the parameters on our dataset.
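
The training function in this notebook feeds the decoder the target sequence without its last token and scores it against the target shifted left by one, with padding positions replaced by -100 so the loss skips them. A minimal pure-Python sketch of that shift is given here; `shift_targets` and the token ids are hypothetical, and the pad id of 0 is an assumption about T5. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import List, Tuple

PAD_ID = 0           # assumption: T5's pad token id
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def shift_targets(target_ids: List[int], pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Return (decoder_input_ids, labels) for one padded target sequence."""
    decoder_input_ids = target_ids[:-1]  # everything except the last position
    # next-token labels: the target shifted left by one, with padding masked out
    labels = [IGNORE_INDEX if tok == pad_id else tok for tok in target_ids[1:]]
    return decoder_input_ids, labels


# Hypothetical target: three tokens, an end-of-sequence id (1), then two pads.
decoder_inputs, labels = shift_targets([11, 12, 13, 1, 0, 0])
print(decoder_inputs)
print(labels)
```

Both lists have the same length, so position `i` of the decoder input is trained to predict position `i` of the labels.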

In [None]:
losslist = []
def train(epoch, tokenizer, model, device, loader, optimizer):
  # put the model into training mode
  model.train()
  # iterate over the batches of the training DataLoader
  for step, data in enumerate(loader, 0):
      y = data['target_ids'].to(device, dtype = torch.long)
      # https://discuss.pytorch.org/t/contigious-vs-non-contigious-tensor/30107/2
      # decoder inputs: the target without its last token
      y_ids = y[:, :-1].contiguous()
      # labels: the target shifted left by one; padding is set to -100 so the loss ignores it
      lm_labels = y[:, 1:].clone().detach()
      lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
      ids = data['source_ids'].to(device, dtype = torch.long)
      mask = data['source_mask'].to(device, dtype = torch.long)

      # `labels` replaces the deprecated `lm_labels` keyword argument
      outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, labels=lm_labels)
      loss = outputs[0]
      # store a plain float so the computation graph is not kept alive
      losslist.append(loss.item())
      if step%10==0:
        wandb.log({"Training Loss": loss.item()})
      if step%500==0:
        print(f'Epoch: {epoch}, Loss:  {loss.item()}')
      
      # https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch
      # https://discuss.pytorch.org/t/how-are-optimizer-step-and-loss-backward-related/7350
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()


In [None]:
# https://towardsdatascience.com/fine-tuning-a-t5-transformer-for-any-summarization-task-82334c64c81

def dev(epoch, tokenizer, model, device, loader):
  #https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch
  model.eval()
  predictions = []
  actuals = []
  #rouge_metric = load_metric('rouge') 
  # https://datascience.stackexchange.com/questions/32651/what-is-the-use-of-torch-no-grad-in-pytorch
  with torch.no_grad():

    for _, data in enumerate(loader, 0):

      y = data['target_ids'].to(device, dtype = torch.long)
      ids = data['source_ids'].to(device, dtype = torch.long)
      mask = data['source_mask'].to(device, dtype = torch.long)

      generated_ids = model.generate(
          input_ids = ids,
          attention_mask = mask, 
          max_length=150, 
          num_beams=9,
          repetition_penalty=2.5, 
          length_penalty=1.0, 
          early_stopping=True
          )
      preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
      target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True)for t in y]
      if _%100==0:
          print(f'Completed {_}')
      predictions.extend(preds)
      actuals.extend(target)
      #print(preds)
      #print(target)
      #rouge_metric.add(preds, target)
      
    #rouge_results = rouge_metric.compute(rouge_types=["rouge2"]) 
  return predictions, actuals

In [None]:
def compute_rouge_scores(cand_list, ref_list):
    """
    :param cand_list: list of candidate summaries
    :param ref_list: list of reference summaries
    :return: rouge scores
    """
    rouge = Rouge()
    rouge_1_f_score = 0.
    rouge_2_f_score = 0.
    rouge_L_f_score = 0.

    rouge_1_r_score = 0.
    rouge_2_r_score = 0.
    rouge_L_r_score = 0.

    rouge_1_p_score = 0.
    rouge_2_p_score = 0.
    rouge_L_p_score = 0.

    doc_count = len(cand_list)

    for cand, ref in zip(cand_list, ref_list):
        rouge_scores = rouge.get_scores(cand, ref)[0]
        rouge_1_f_score += rouge_scores['rouge-1']['f']
        rouge_2_f_score += rouge_scores['rouge-2']['f']
        rouge_L_f_score += rouge_scores['rouge-l']['f']

        rouge_1_r_score += rouge_scores['rouge-1']['r']
        rouge_2_r_score += rouge_scores['rouge-2']['r']
        rouge_L_r_score += rouge_scores['rouge-l']['r']

        rouge_1_p_score += rouge_scores['rouge-1']['p']
        rouge_2_p_score += rouge_scores['rouge-2']['p']
        rouge_L_p_score += rouge_scores['rouge-l']['p']
    rouge_1_f_score = rouge_1_f_score / doc_count
    rouge_2_f_score = rouge_2_f_score / doc_count
    rouge_L_f_score = rouge_L_f_score / doc_count

    results_dict = {}
    results_dict['rouge_1_f_score'] = rouge_1_f_score
    results_dict['rouge_2_f_score'] = rouge_2_f_score
    results_dict['rouge_l_f_score'] = rouge_L_f_score

    return results_dict

In [None]:
AMI_PATH = "/content/drive/My Drive/W266/data/gold_abstractive_summary/goldsummary_AMI_as_dev.csv"
ICSI_PATH = "/content/drive/My Drive/W266/data/gold_abstractive_summary/goldsummary_ICSI_as_dev.csv"

amigold = pd.read_csv(AMI_PATH)
icsigold = pd.read_csv(ICSI_PATH)

def rouge_per_document(final_df,gold):
  merged_df = pd.concat([dev_dataset.meeting, final_df.Generated_Abstractive_Summary], axis=1)
  merged_df["meetinglevel"] = merged_df.meeting.apply(lambda x: x.split(".")[0]) 

  gas_list =[]
  meeting_list = []
  generated_abstractive = ""
  for me in set(merged_df.meetinglevel):
    for gas in merged_df[merged_df.meetinglevel == me]['Generated_Abstractive_Summary']:
      generated_abstractive+= gas + " "
    gas_list.append(generated_abstractive)
    meeting_list.append(me)
    generated_abstractive = ""
  per_doc_summary = pd.DataFrame(
    {'Meeting': meeting_list,
     'Generated_Abstractive_Summary': gas_list
    })
  
  new_df = pd.merge(per_doc_summary, gold,  how='left', left_on='Meeting', right_on ='meeting')
  rouge_results_perdoc = compute_rouge_scores(new_df.Generated_Abstractive_Summary,
                                      new_df.abstractive)
  return rouge_results_perdoc
  

### Run Epoch
Train and Evaluation

In [None]:
id = wandb.util.generate_id()
id
#dwlkfpg3 AMI 1024
#1aei9r6r ICSI 1024
#3ugok7an ICSI 512
#30e6cuxp AMI 512
#3fsv41il ICSI 1024 Cleaned 
#2knqed4a AMI 1024 Cleaned 
#3le4t5oh AMI 1024 Cleaned t5-base
#3oar0l9l ICSI 1024 t5-base


'2zpb09p1'

In [None]:
#run = wandb.init(project="T5_1024_MSFT_AMI_01",resume=True)
run = wandb.init(project="T5_1024_MSFT_ICSI_01", id="3oar0l9l", resume="allow")

config = wandb.config          # Initialize config
config.TRAIN_BATCH_SIZE = 1    # input batch size for training (default: 64)
config.VALID_BATCH_SIZE = 1    # input batch size for testing (default: 1000)
config.EPOCHS = 50        # number of epochs to train (default: 10)
config.LEARNING_RATE = 0.0005   # learning rate (default: 0.01)
config.SEED = 42               # random seed (default: 42)
config.BEAMS = 9
config.MAX_LEN = MAX_LEN
config.SUMMARY_LEN = SUMMARY_LEN 


[34m[1mwandb[0m: Currently logged in as: [33mwuqq09[0m (use `wandb login --relogin` to force relogin)


In [None]:
# https://deeplizard.com/learn/video/kWVgvsejXsE#:~:text=The%20num_workers%20attribute%20tells%20the,sequentially%20inside%20the%20main%20process
# num_workers to default 0
# This means that the training process will work sequentially inside the main process. 
# After a batch is used during the training process and another one is needed, we read the batch data from disk.

TEST_BATCH_SIZE = 1 

train_params = {
  'batch_size': config.TRAIN_BATCH_SIZE,
  'shuffle': True,
  'num_workers': 0
  }

dev_params = {
  'batch_size': config.VALID_BATCH_SIZE,
  'shuffle': False,
  'num_workers': 0
  }

test_params = {
  'batch_size': TEST_BATCH_SIZE,
  'shuffle': False,
  'num_workers': 0
  }

training_loader = DataLoader(train_set, **train_params)
dev_loader = DataLoader(dev_set, **dev_params)
test_loader = DataLoader(test_set, **test_params)

In [None]:
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model = model.to(device)

In [None]:
# optimizer 
# https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.Adam(params = model.parameters(), lr=config.LEARNING_RATE)

In [None]:
# CP_TEMP_NAME = 'epoch10'
# CP_PATH = "/content/drive/My Drive/W266/checkpoints/MSFT_50EPOCH_Intransit_AMI512_NoNA/" + CP_TEMP_NAME +".pt"
# checkpoint = torch.load(CP_PATH)
# model.load_state_dict(checkpoint['model_state_dict'])
# optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

In [None]:
training_time_log = []
MODEL_NAME = "T5_1024_MSFT_ICSI_base"
start_train_time = time.time()
wandb.watch(model, log='all')


print("starting fine-tuning with training and validation")
i = 0
for epoch in range(config.EPOCHS):

  ## ================= Training =================== ##
  print("start training epoch" + str(i))
  CP_TEMP_NAME = 'epoch' + str(i)
  CP_TEMP_PATH = "/content/drive/My Drive/W266/checkpoints/MSFT_50EPOCH_Intransit_ICSI1024_Based/"+ CP_TEMP_NAME +".pt"
  train(epoch, tokenizer, model, device, training_loader, optimizer)
  torch.save({
      'model_state_dict': model.state_dict(),
      'optimizer_state_dict': optimizer.state_dict(),
      'train_epoch': i
      }, CP_TEMP_PATH)
  training_time = time.time() - start_train_time
  print("done training epoch" +str(i))
  wandb.log({'epoch_traingTime': training_time,
             'epoch': i})
  print("--- %s seconds ---" % (training_time))
  training_time_log.append(training_time)
  i+=1
  ## ================= Validation =================== ##
  # print("strat validation epoch" + str(i))
  # predictions, actuals = dev(epoch, tokenizer, model, device, dev_loader)
  # final_df = pd.DataFrame({'Generated_Abstractive_Summary':predictions,
  #                           'Golden_Abstractive_Text':actuals})
  # final_df.to_csv('/content/drive/My Drive/W266/results/'+MODEL_NAME + "_epoch" +str(i)+'.csv')
  # print("done validation epoch" +str(i))

  # rouge_results = compute_rouge_scores(final_df.Generated_Abstractive_Summary,
  #                                      final_df.Golden_Abstractive_Text)
  
  # wandb.log({'rouge1': rouge_results.get("rouge_1_f_score"), 
  #            'rougeL': rouge_results.get("rouge_l_f_score"),  
  #            'rouge2': rouge_results.get("rouge_2_f_score"),
  #            'epoch': i})
  # i+=1

#run.finish()
  


starting fine-tuning with training and validation
start training epoch0




Epoch: 0, Loss:  7.927191734313965
done training epoch0
--- 100.13376355171204 seconds ---
start training epoch1
Epoch: 1, Loss:  1.8690929412841797
done training epoch1
--- 189.97118210792542 seconds ---
start training epoch2
Epoch: 2, Loss:  0.527834415435791
done training epoch2
--- 273.95350551605225 seconds ---
start training epoch3
Epoch: 3, Loss:  0.35699981451034546
done training epoch3
--- 359.2053573131561 seconds ---
start training epoch4
Epoch: 4, Loss:  0.09226126968860626
done training epoch4
--- 455.89185190200806 seconds ---
start training epoch5
Epoch: 5, Loss:  0.08761376142501831
done training epoch5
--- 540.7267656326294 seconds ---
start training epoch6
Epoch: 6, Loss:  0.07429489493370056
done training epoch6
--- 624.6981515884399 seconds ---
start training epoch7
Epoch: 7, Loss:  0.05986139178276062
done training epoch7
--- 709.078638792038 seconds ---
start training epoch8
Epoch: 8, Loss:  0.22455963492393494
done training epoch8
--- 796.7590210437775 seconds --

In [None]:
validation_time_log = []
MODEL_NAME = "T5_1024_MSFT_ICSI_base"
start_validation_time = time.time()

print("starting fine-tuning with training and validation")
i = 0
for epoch in range(config.EPOCHS):

  ## ================= Validation =================== ##
  print("strat validation epoch" + str(i))
  model = T5ForConditionalGeneration.from_pretrained("t5-base")
  model = model.to(device)
  # optimizer 
  # https://pytorch.org/docs/stable/optim.html
  optimizer = torch.optim.Adam(params = model.parameters(), lr=config.LEARNING_RATE)

  CP_TEMP_NAME = 'epoch' + str(i)
  CP_PATH = "/content/drive/My Drive/W266/checkpoints/MSFT_50EPOCH_Intransit_ICSI1024_Based/" + CP_TEMP_NAME +".pt"
  checkpoint = torch.load(CP_PATH)
  model.load_state_dict(checkpoint['model_state_dict'])
  optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
  wandb.watch(model, log='all')

  predictions, actuals = dev(epoch, tokenizer, model, device, dev_loader)
  final_df = pd.DataFrame({'Generated_Abstractive_Summary':predictions,
                            'Golden_Abstractive_Text':actuals})
  final_df.to_csv('/content/drive/My Drive/W266/results/'+MODEL_NAME + "_epoch" +str(i)+'.csv')
  print("done validation epoch" +str(i))

  rouge_results = compute_rouge_scores(final_df.Generated_Abstractive_Summary,
                                       final_df.Golden_Abstractive_Text)
  
  validation_time = time.time() - start_validation_time
  validation_time_log.append(validation_time)

  # amigold = pd.read_csv(AMI_PATH)
  # icsigold = pd.read_csv(ICSI_PATH)

  rouge_results_perdoc = rouge_per_document(final_df,icsigold)
  wandb.log({'rouge1': rouge_results.get("rouge_1_f_score"), 
            'rougeL': rouge_results.get("rouge_l_f_score"),  
            'rouge2': rouge_results.get("rouge_2_f_score"),
            'rouge1_doclevel': rouge_results_perdoc.get("rouge_1_f_score"), 
            'rougeL_doclevel': rouge_results_perdoc.get("rouge_l_f_score"),  
            'rouge2_doclevel': rouge_results_perdoc.get("rouge_2_f_score"),
            'epoch_validationTime': validation_time,
            'epoch': i})
  i+=1

run.finish()
  


starting fine-tuning with training and validation
strat validation epoch0




Completed 0
Completed 100
Completed 200
done validation epoch0
strat validation epoch1
Completed 0
Completed 100
Completed 200
done validation epoch1
strat validation epoch2
Completed 0
Completed 100
Completed 200
done validation epoch2
strat validation epoch3
Completed 0
Completed 100
Completed 200
done validation epoch3
strat validation epoch4
Completed 0
Completed 100
Completed 200
done validation epoch4
strat validation epoch5
Completed 0
Completed 100
Completed 200
done validation epoch5
strat validation epoch6
Completed 0
Completed 100
Completed 200
done validation epoch6
strat validation epoch7
Completed 0
Completed 100
Completed 200
done validation epoch7
strat validation epoch8
Completed 0
Completed 100
Completed 200
done validation epoch8
strat validation epoch9
Completed 0
Completed 100
Completed 200
done validation epoch9
strat validation epoch10
Completed 0
Completed 100
Completed 200
done validation epoch10
strat validation epoch11
Completed 0
Completed 100
Completed 200



| Metric | Value |
|---|---|
| Training Loss | 0.08014 |
| _timestamp | 1606287591.0 |
| _runtime | 24267.0 |
| _step | 1249.0 |
| epoch | 49.0 |
| epoch_traingTime | 4446.34666 |
| rouge1 | 0.156 |
| rougeL | 0.12162 |
| rouge2 | 0.0075 |
| rouge1_doclevel | 0.18051 |


| Metric | History |
|---|---|
| rouge1 | ▂▂▅▅▅▅▄▆▆▅▂▅▄▄▅▃▅▆▄▄▆▆▄▇▅▆▇▇▅██▇█▇▇▇▇█▁▇ |
| rougeL | ▁▂▅▅▅▅▄▇▆▅▂▅▅▅▅▃▅▇▄▄▇▇▄▇▄███▃████▇███▇▁█ |
| rouge2 | ▃▂▄▇▆▄▄▅▄▅▃▇▅▄▄▂▄▄▅▄▄▅▄▃▆█▃▃▂▆▆▅▆▆▃▇▃▃▁▄ |
| rouge1_doclevel | ▄▅▇▆▆█▇▇█▇▄▇▇▇█▇▇▇▇▇▇▇▇▅▄▃▄▄▂▄▄▅▄▅▄▄▄▄▁▅ |
| rougeL_doclevel | ▆▇▇▇▆▇▇██▇▆▇▇▇▇▆▇▇█▇▇▇█▅▂▄▄▄▂▄▄▅▄▄▄▅▄▄▁▄ |
| rouge2_doclevel | ▄▅▅▆▆▆▆▇▇▇▄██▇▇▄▇▇▇▇▆▆█▄▄▄▂▂▁▄▄▅▄▄▂▄▂▂▁▃ |
| epoch_validationTime | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇███ |
| epoch | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███ |
| _step | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███ |
| _runtime | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇███ |


#### Checkpoint 

Remember to set CP_NAME to a new name so that the saved `.pt` file does not overwrite an earlier model.

The model is then saved as a checkpoint on Google Drive together with the tunable parameters used for the run.

In [None]:
# https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
# Checkpoint Saving
CP_NAME = MODEL_NAME

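# The hyperparameters live on the wandb config object defined earlier in the notebook.
CP_TRAIN_EPOCHS = config.EPOCHS
CP_DEV_EPOCHS = config.EPOCHS
CP_LEARNING_RATE = config.LEARNING_RATE
CP_PATH = "/content/drive/My Drive/W266/checkpoints/"+ CP_NAME +".pt"
CP_MAX_LEN = MAX_LEN
CP_SUMMARY_LEN = SUMMARY_LEN
CP_TRAIN_BATCH_SIZE = config.TRAIN_BATCH_SIZE
CP_DEV_BATCH_SIZE = config.VALID_BATCH_SIZE
# record the checkpoint that was actually loaded in this run
CP_MODEL = 'T5ForConditionalGeneration,t5-base'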
CP_OPTIMIZER_OPTION = 'Adam'
CP_LOSSLIST = losslist
CP_TEST_OPTIONS = {
    "num_beams":          12,
    "repetition_penalty": 2.5, 
    "length_penalty":     1.0, 
    "early_stopping":     True
}
CT_TRAIN_TIME = training_time
#CT_EVALUATE_TIME = evaluating_time

torch.save({
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'train_epoch': CP_TRAIN_EPOCHS,
            'dev_epoch': CP_DEV_EPOCHS,
            'learning_rate': CP_LEARNING_RATE,
            'max_source_length':CP_MAX_LEN,
            'max_target_length':CP_SUMMARY_LEN,
            'train_batch_size':CP_TRAIN_BATCH_SIZE,
            'dev_batch_size':CP_DEV_BATCH_SIZE,
            'model_option':CP_MODEL,
            'optimizer_option':CP_OPTIMIZER_OPTION,
            'losslist': CP_LOSSLIST,
            'training_time': CT_TRAIN_TIME,
            #'evaluating_time': CT_EVALUATE_TIME,
            'test_option': CP_TEST_OPTIONS
            }, CP_PATH)

In [None]:
MODEL_NAME = "epoch61"
CP_PATH = "/content/drive/My Drive/W266/checkpoints/MSFT_50EPOCH_Intransit_AMI1024_NoNA/" + MODEL_NAME +".pt"
print(CP_PATH)
checkpoint = torch.load(CP_PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

# train_epoch = checkpoint['train_epoch']
# dev_epoch = checkpoint['dev_epoch']
# losslist = checkpoint['losslist']
# learning_rate = checkpoint['learning_rate']
# max_source_length = checkpoint['max_source_length']
# max_target_length = checkpoint['max_target_length']
# train_batch_size = checkpoint['train_batch_size']
# dev_batch_size = checkpoint['dev_batch_size']
# optimizer_option = checkpoint['optimizer_option']
# test_option = checkpoint['test_option']
# training_time = checkpoint['training_time']


# evaluating_time = checkpoint['evaluating_time']

/content/drive/My Drive/W266/checkpoints/MSFT_50EPOCH_Intransit_AMI1024_NoNA/epoch61.pt
