## Switch to GPU mode if you are using Google Colab
Go to `Runtime` -> `Change runtime type` and select GPU as the hardware accelerator.

The following Python libraries are required for this part and have been tested on Python 3.7 and Python 3.9.
If you use Google Colab, PyTorch and SciPy are already installed, so you probably only need to install PyTorch Lightning.
  - [PyTorch](https://pytorch.org/get-started/locally/) (tested with 1.10)
  - [PyTorch Lightning](https://pypi.org/project/pytorch-lightning/) (tested with 1.5.8)
  - [SciPy](https://scipy.org/install/) (tested with 1.7.3 and with 1.4.1)


In [None]:
# Download dataset
#!pip install gdown
#!gdown --id 1-FwYkKmml5pMgpfKM_Sz_O1JqDW12QSe -O sst2.zip
#!mkdir data
#!unzip sst2.zip -d .

In [1]:
# You may prefer to upload the data to your Google Drive and mount the drive in this Colab notebook,
# because files stored in the Colab session are erased when the runtime is recycled after a period of inactivity.
# The code below mounts Google Drive (comment it out if you keep the data locally). After mounting, navigate to the appropriate folder, right-click it, and choose "Copy path".
# Assign that path to the DATA_DIR global variable.
# Remember to copy the data files to that Google Drive folder if you set `DATA_DIR` to a Google Drive folder.

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# DATA_DIR = "./data"
DATA_DIR = "./drive/MyDrive/CMSC35100/final_project_data"  # Google Drive folder, used when Drive is mounted; modify as appropriate

Mounted at /content/drive


In [2]:
!ls $DATA_DIR

cmv.tar.bz2


## Load Data

In [3]:
#####
# This script is heavily based on https://vene.ro/blog/winning-arguments-attitude-change-reddit-cmv.html
#####


# load the data
import tarfile
import os.path
import json
import re
from bz2 import BZ2File
from urllib import request
from io import BytesIO

import numpy as np


fname = "cmv.tar.bz2"
# url = "https://chenhaot.com/data/cmv/" + fname

# download if it does not exist (disabled here: the archive is read from DATA_DIR instead)
# if not os.path.isfile(fname):
#     f = BytesIO()
#     with request.urlopen(url) as resp, open(fname, 'wb') as f_disk:
#         data = resp.read()
#         f_disk.write(data)  # save to disk too
#         f.write(data)
#         f.seek(0)
# else:
f = open(os.path.join(DATA_DIR, fname), 'rb')


tar = tarfile.open(fileobj=f, mode="r")

# Extract the files we are interested in (only the original-post task is used here)

train_fname = "op_task/train_op_data.jsonlist.bz2"
test_fname = "op_task/heldout_op_data.jsonlist.bz2"

#all_train_fname = "all/train_period_data.jsonlist.bz2"
#all_test_fname = "all/heldout_period_data.jsonlist.bz2"

#pair_train_fname = "pair_task/train_pair_data.jsonlist.bz2"
#pair_test_fname = "pair_task/heldout_pair_data.jsonlist.bz2"

train_bzlist = tar.extractfile(train_fname)
#all_train_bzlist = tar.extractfile(all_train_fname)
#pair_train_bzlist = tar.extractfile(pair_train_fname)

# Deserialize the JSON list
original_posts_train = [
    json.loads(line.decode('utf-8'))
    for line in BZ2File(train_bzlist)
]
# all_train = [
#     json.loads(line.decode('utf-8'))
#     for line in BZ2File(all_train_bzlist)
# ]
# pair_train = [
#     json.loads(line.decode('utf-8'))
#     for line in BZ2File(pair_train_bzlist)
# ]
test_bzlist = tar.extractfile(test_fname)
# all_test_bzlist = tar.extractfile(all_test_fname)
# pair_test_bzlist = tar.extractfile(pair_test_fname)

original_posts_test = [
    json.loads(line.decode('utf-8'))
    for line in BZ2File(test_bzlist)
]
# all_test = [
#     json.loads(line.decode('utf-8'))
#     for line in BZ2File(all_test_bzlist)
# ]
# pair_test = [
#     json.loads(line.decode('utf-8'))
#     for line in BZ2File(pair_test_bzlist)
# ]
tar.close()
f.close()

In [4]:
print(f'{len(original_posts_train)} + {len(original_posts_test)} = {len(original_posts_train + original_posts_test)}')
# print(f'{len(all_train)} + {len(all_test)} = {len(all_train + all_test)}')
# print(f'{len(pair_train)} + {len(pair_test)} = {len(pair_train + pair_test)}')

10743 + 1529 = 12272
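The loading cell above reads a `.jsonlist.bz2` member straight out of the tar archive: `tar.extractfile` returns a file object for the member, `BZ2File` decompresses it, and each line is one JSON object. A minimal self-contained sketch of the same pattern, assuming a hypothetical in-memory archive built on the spot (the member name and the two records are made up for illustration):

```python
import bz2
import io
import json
import tarfile
from bz2 import BZ2File


def build_demo_archive():
    """Build a hypothetical in-memory tar.bz2 holding one bz2-compressed JSON-lines member."""
    records = [{"selftext": "first post", "delta_label": True},
               {"selftext": "second post", "delta_label": False}]
    payload = bz2.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as out_tar:
        info = tarfile.TarInfo(name="op_task/demo_op_data.jsonlist.bz2")
        info.size = len(payload)
        out_tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


def load_jsonlist(archive, member_name):
    """Decompress one .jsonlist.bz2 member and parse one JSON object per line."""
    member = archive.extractfile(member_name)
    return [json.loads(line.decode("utf-8")) for line in BZ2File(member)]


# mode="r" detects the outer bz2 compression of the tar archive automatically
with tarfile.open(fileobj=build_demo_archive(), mode="r") as demo_tar:
    posts = load_jsonlist(demo_tar, "op_task/demo_op_data.jsonlist.bz2")

print(len(posts), posts[0]["delta_label"])
```

Note the two layers of compression: the tar archive itself is bz2-compressed, and each member inside it is compressed again, which is why `BZ2File` is still needed after `extractfile`.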


## PyTorch Lightning Module
The next cell is the same as in A2. If you only want to build the model, you just need to implement the LSTM model.
However, understanding the next cell will help you see how PyTorch Lightning works and prepare for your own project.

In [5]:
# You only need to install the packages if you have not already done so. On Google Colab you need to reinstall them every time the runtime restarts.
!pip install pytorch-lightning=="1.5.8"



In [6]:
import logging
logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO
    )
import numpy as np
import scipy
import torch
import torch.nn.functional as F
from torch.utils.data.dataset import Dataset
import argparse
import os
from pathlib import Path
from torch.optim import SGD, Adam
import pytorch_lightning as pl
from torchmetrics import Accuracy
from datetime import datetime
from pytorch_lightning import loggers as pl_loggers
import time
from argparse import Namespace
import json
import shutil
logger = logging.getLogger(__name__)

class BaseModel(pl.LightningModule):
    def __init__(
        self,
        **config_kwargs
    ):
        """Initialize the model and save the hyperparameters."""
        logger.info("Initializing BaseModel")
        super().__init__()
        self.save_hyperparameters() #save hyperparameters to checkpoint
        self.step_count = 0
        self.output_dir = Path(self.hparams.output_dir)
        self.model = self._load_model()

        self.accuracy = Accuracy()

    def _load_model(self):
        raise NotImplementedError

    def forward(self, **inputs):
        return self.model(**inputs)

    def batch2input(self, batch):
        raise NotImplementedError

    def training_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = input['labels']
        loss, pred_labels, _ = self(**input)

        self.log('train_loss', loss, prog_bar=True)
        self.log('train_acc', self.accuracy(pred_labels.view(-1), labels.view(-1).int()), prog_bar=True)
        
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = input['labels']
        loss, pred_labels, _ = self(**input)

        self.log('val_loss', loss)
        self.log('val_acc', self.accuracy(pred_labels.view(-1), labels.view(-1).int()))

    def test_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = input['labels']
        loss, pred_labels, _ = self(**input)

        self.log('test_loss', loss)
        self.log('test_acc', self.accuracy(pred_labels.view(-1), labels.view(-1).int()))

    def configure_optimizers(self):
        """Prepare the optimizer (Adam; no learning-rate schedule is used)."""
        model = self.model
        # optimizer = SGD(model.parameters(), lr=self.hparams.learning_rate)
        optimizer = Adam(model.parameters(), lr=self.hparams.learning_rate)

        self.opt = optimizer
        return [optimizer]

    def setup(self, stage):
        if stage == "fit":
            self.train_loader = self.get_dataloader("train", self.hparams.train_batch_size, shuffle=True)

    def train_dataloader(self):
        return self.train_loader

    def val_dataloader(self):
        return self.get_dataloader("dev", self.hparams.eval_batch_size, shuffle=False)

    def test_dataloader(self):
        return self.get_dataloader("test", self.hparams.eval_batch_size, shuffle=False)

    @staticmethod
    def add_generic_args(parser, root_dir) -> None:
        parser.add_argument(
            "--max_epochs",
            default=10,
            type=int,
            help="The number of epochs to train your model.",
        )
        ############################################################
        ## WARNING: set --gpus 0 if you do not have access to GPUS #
        ############################################################
        parser.add_argument(
            "--gpus",
            default=1,
            type=int,
            help="The number of GPUs allocated for this, it is by default 1. Set to 0 for no GPU.",
        )
        parser.add_argument(
            "--output_dir",
            default=None,
            type=str,
            required=True,
            help="The output directory where the model predictions and checkpoints will be written.",
        )
        parser.add_argument("--do_train", action="store_true", default=True, help="Whether to run training.")
        parser.add_argument("--do_predict", action="store_true", help="Whether to run predictions on the test set.")
        parser.add_argument("--seed", type=int, default=42, help="random seed for initialization")
        parser.add_argument(
            "--data_dir",
            default="./",
            type=str,
            help="The input data dir. Should contain the training files.",
        )
        parser.add_argument("--learning_rate", default=1e-2, type=float, help="The initial learning rate for training.")
        parser.add_argument("--num_workers", default=16, type=int, help="kwarg passed to DataLoader")
        parser.add_argument("--num_train_epochs", dest="max_epochs", default=3, type=int)
        ##############################################################
        # NOTE: Need to modify this since the arg is ignored somehow #
        ##############################################################
        # parser.add_argument("--train_batch_size", default=32, type=int)
        # parser.add_argument("--eval_batch_size", default=32, type=int)
        parser.add_argument("--train_batch_size", default=16, type=int)
        parser.add_argument("--eval_batch_size", default=16, type=int)
    
def generic_train(
    model: BaseModel,
    args: argparse.Namespace,
    early_stopping_callback=False,
    extra_callbacks=[],
    checkpoint_callback=None,
    logging_callback=None,
    **extra_train_kwargs
):
    
    # create the output and log directories
    odir = Path(model.hparams.output_dir)
    odir.mkdir(parents=True, exist_ok=True)
    log_dir = odir / 'logs'
    log_dir.mkdir(exist_ok=True)

    # Tensorboard logger
    pl_logger = pl_loggers.TensorBoardLogger(
        save_dir=log_dir,
        version="version_" + datetime.now().strftime("%d-%m-%Y--%H-%M-%S"),
        name="",
        default_hp_metric=True
    )

    # add custom checkpoints
    ckpt_path = os.path.join(
        args.output_dir, pl_logger.version, "checkpoints",
    )
    if checkpoint_callback is None:
        checkpoint_callback = pl.callbacks.ModelCheckpoint(
            dirpath=ckpt_path, filename="{epoch}-{val_acc:.2f}", monitor="val_acc", mode="max", save_top_k=1, verbose=True
        )

    train_params = {}

    train_params["max_epochs"] = args.max_epochs

    if args.gpus > 1:
        # `strategy` replaces the deprecated `distributed_backend` argument in PyTorch Lightning 1.5
        train_params["strategy"] = "ddp"

    trainer = pl.Trainer.from_argparse_args(
        args,
        enable_model_summary=False,
        callbacks= [checkpoint_callback] + extra_callbacks,
        logger=pl_logger,
        **train_params,
    )

    if args.do_train:
        trainer.fit(model)
        # track model performance under different hparams settings in the "HParams" tab of TensorBoard
        pl_logger.log_hyperparams(params=model.hparams, metrics={'hp_metric': checkpoint_callback.best_model_score.item()})
        pl_logger.save()

        # save best model to `best_model.ckpt`
        target_path = os.path.join(ckpt_path, 'best_model.ckpt')
        logger.info(f"Copy best model from {checkpoint_callback.best_model_path} to {target_path}.")
        shutil.copy(checkpoint_callback.best_model_path, target_path)

    
    # Optionally, reload the best checkpoint and evaluate it on the validation and test sets
    if args.do_predict:
        best_model_path = os.path.join(ckpt_path, "best_model.ckpt")
        model = model.load_from_checkpoint(best_model_path)
        val_results_best = trainer.validate(model)
        print("Validation accuracy on the best model: {:.4f}".format(
            val_results_best[0]['val_acc']))
        return trainer.test(model)

    return trainer


03/10/2022 04:09:25 - INFO - numexpr.utils -   NumExpr defaulting to 2 threads.
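In `add_generic_args` above, `--max_epochs` and `--num_train_epochs` write to the same destination (`max_epochs`) with different defaults, and `--do_train` combines `action="store_true"` with `default=True`. A small standalone sketch with plain `argparse`, reproducing only those three flags, shows which default wins:

```python
import argparse

# Reproduce only the three flags discussed above.
parser = argparse.ArgumentParser()
parser.add_argument("--max_epochs", default=10, type=int)
parser.add_argument("--num_train_epochs", dest="max_epochs", default=3, type=int)
parser.add_argument("--do_train", action="store_true", default=True)

# No flags given: the default of the first argument registered for a dest is used.
defaults = parser.parse_args([])
print(defaults.max_epochs, defaults.do_train)

# An explicit flag overrides the default, whichever alias is used.
explicit = parser.parse_args(["--num_train_epochs", "5"])
print(explicit.max_epochs)
```

So `max_epochs` defaults to 10, not 3, and `do_train` is always `True`: `store_true` can only set it to `True`, so training cannot be switched off from the command line without changing the argument definition.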


# Long Short-term Memory Network (LSTM)

You need to finish two classes, `LSTM` and `LSTM_Attention`, in the following cells. Try running `LSTM` first!

For the model architecture, you can start with:
* word embedding dimension: 300
* intermediate layer dimension: 300
* output layer dimension: 1

Feel free to tune hyperparameters to see different results!

You may reuse code for computing loss and model predictions from logistic regression.
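Reusing the logistic-regression pieces amounts to three steps: squash the single output unit with a sigmoid, score it with binary cross-entropy averaged over the batch, and threshold the probability at 0.5. A numpy sketch of those steps, assuming made-up logits and labels (the models below use `torch.nn.BCELoss` on the sigmoid output instead):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy averaged over the batch (analogous to BCELoss with mean reduction)."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


logits = np.array([2.0, -1.0, 0.0])   # hypothetical outputs of the 1-unit output layer
labels = np.array([1.0, 0.0, 1.0])
probs = sigmoid(logits)
predicted_labels = (probs > 0.5).astype(int)  # same rule as torch.gt(probs, 0.5)
print(predicted_labels, round(bce_loss(probs, labels), 4))
```

A probability of exactly 0.5 is predicted as 0 under a strict `>` comparison, which matches `torch.gt`.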

In [7]:
!pip3 install --upgrade tensorflow-gpu



In [8]:
# Install TF-Hub.
!pip3 install tensorflow-hub



In [9]:
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np

The next few cells experiment with the Universal Sentence Encoder (USE) to check what it outputs.

In [10]:
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" 
model = hub.load(module_url)
print(f"module {module_url} loaded")

03/10/2022 04:09:46 - INFO - absl -   Using /tmp/tfhub_modules to cache modules.


module https://tfhub.dev/google/universal-sentence-encoder/4 loaded


In [11]:
def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

In [12]:
sentences = ['test this out', 'really random sentence']
sentence_embeddings = model(sentences).numpy()
tmp = torch.tensor(sentence_embeddings)
query = "let's test this out"
query_vec = model([query])[0].numpy()
print(len(query_vec))
for sent in sentences:
  sim = cosine(query_vec, model([sent])[0])
  print("Sentence = ", sent, "; similarity = ", sim)
print(sentence_embeddings)
print(tmp)

512
Sentence =  test this out ; similarity =  0.8348638
Sentence =  really random sentence ; similarity =  0.1765989
[[ 0.00280679 -0.02273874 -0.0101669  ... -0.0289427  -0.03363037
  -0.01175805]
 [-0.05226345  0.03288173  0.04235939 ... -0.04415091 -0.02205265
   0.04378343]]
tensor([[ 0.0028, -0.0227, -0.0102,  ..., -0.0289, -0.0336, -0.0118],
        [-0.0523,  0.0329,  0.0424,  ..., -0.0442, -0.0221,  0.0438]])
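The loop above re-encodes every sentence to compare it with the query, although `sentence_embeddings` already holds those vectors. A vectorized numpy sketch computes all similarities in one step; the toy 3-dimensional vectors (`toy_embeddings`, `toy_query`) are assumptions standing in for the 512-dimensional USE embeddings:

```python
import numpy as np


def cosine_to_all(query_vec, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    return matrix @ query_vec / norms


toy_embeddings = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])  # stand-ins for USE sentence vectors
toy_query = np.array([1.0, 1.0, 0.0])
print(cosine_to_all(toy_query, toy_embeddings))
```

With the notebook's own variables, `cosine_to_all(query_vec, sentence_embeddings)` should reproduce the similarities printed by the loop, up to small numerical differences.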


Check that the text-processing code (splitting a post into sentences) works as expected.

In [13]:
print(original_posts_train[0]['selftext'])
raw_list = [line.strip().split('.') for line in original_posts_train[0]['selftext'].split('\n') if len(line) > 0]
flattened = [x.strip() for l in raw_list for x in l if x != '' and x != 'CMV']
print('\n'.join(flattened))

I think the world is automating fast enough that a utopia will arise where no one will have to work anymore. Within the next 2 decades or so, having a job won't mean much, and most people will be artists and scientists. 

My parents let me live with them, so I can just wait until the utopia happens.

CMV.
I think the world is automating fast enough that a utopia will arise where no one will have to work anymore
Within the next 2 decades or so, having a job won't mean much, and most people will be artists and scientists
My parents let me live with them, so I can just wait until the utopia happens
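The splitting logic above could be wrapped in one helper shared by the percentile computation and `CMV2Dataset`. A minimal sketch; `split_sentences` is a hypothetical name, and unlike the original list comprehension it strips each piece *before* filtering, so whitespace-only pieces (which would otherwise become empty "sentences" that the mask treats as padding) and a `CMV` that follows a period are also dropped:

```python
def split_sentences(text):
    """Split a post into sentences on newlines and periods.

    Each piece is stripped before filtering, so empty or whitespace-only
    pieces and a bare 'CMV' are dropped wherever they occur.
    """
    sentences = []
    for line in text.split('\n'):
        for piece in line.split('.'):
            piece = piece.strip()
            if piece and piece != 'CMV':
                sentences.append(piece)
    return sentences


post = "I think X. Y follows.\n\nCMV.\nSo what. . CMV."
print(split_sentences(post))
```

On the same input, the original comprehension would also return an empty string (from `" "` between the two periods) and `"CMV"` (from `" CMV"`).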


Determine the truncation threshold: the 95th percentile of the number of sentences per training post.

In [14]:
lengths = []
for each in original_posts_train:
  raw_list = [line.strip().split('.') for line in each['selftext'].split('\n') if len(line) > 0]
  lengths.append(len([x.strip() for l in raw_list for x in l if x != '' and x != 'CMV']))

lengths = np.array(lengths)
np.percentile(lengths, 95)

50.0
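Using the 95th percentile as the truncation threshold leaves roughly 95% of posts untouched while capping the sequence length the LSTM must process. One way to check this trade-off for any candidate threshold is to measure the fraction of posts it covers; the sketch below uses synthetic lengths 1 to 100 (`toy_lengths`, an assumption) rather than the CMV data:

```python
import numpy as np


def coverage(lengths, threshold):
    """Fraction of posts whose sentence count fits within `threshold` (no truncation needed)."""
    return float(np.mean(np.asarray(lengths) <= threshold))


toy_lengths = np.arange(1, 101)              # synthetic sentence counts, not the CMV data
threshold = np.percentile(toy_lengths, 95)   # linear interpolation by default
print(threshold, coverage(toy_lengths, threshold))
```

On the notebook's data, `coverage(lengths, 50)` would report the share of training posts that a `max_len` of 50 leaves untruncated.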

In [15]:
max_len = 257
raw_list = [line.strip().split('.') for line in original_posts_train[0]['selftext'].split('\n') if len(line) > 0]
flattened = [x.strip() for l in raw_list for x in l if x != '' and x != 'CMV']
padded_sentences = flattened[:max_len] + [''] * (max_len - len(flattened)) # truncate or pad to max length
mask = [1 if s != '' else 0 for s in padded_sentences]
res = model(padded_sentences).numpy()
print(res.shape)

(257, 512)
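The padding step could be factored into a helper that truncates, pads, and builds the mask together. A sketch; `pad_or_truncate` is a hypothetical name, empty strings mark padding as in the cell above, and the mask is built from the number of kept sentences rather than by comparing each entry with `''`:

```python
def pad_or_truncate(sentences, max_len, pad=''):
    """Return exactly `max_len` sentences plus a 1/0 mask marking real sentences vs padding."""
    kept = sentences[:max_len]                      # truncate long posts
    n_pad = max_len - len(kept)
    padded = kept + [pad] * n_pad                   # pad short posts
    mask = [1] * len(kept) + [0] * n_pad
    return padded, mask


padded, mask = pad_or_truncate(["a", "b", "c"], 5)
print(padded, mask)
print(pad_or_truncate(["a", "b", "c"], 2))
```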


In [16]:
len(original_posts_test)

1529

In [17]:


class CMV2Dataset(Dataset):
    """
    Using dataset to process input text on-the-fly
    """
    def __init__(self, data):
        self.data = data
        self.max_len = 50    # based on the training set, using 95th percentile
        self.embedding_size = 512
        self.embedding_model = self._init_embedding()

    def _init_embedding(self):
        module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" 
        return hub.load(module_url)

    def __getitem__(self, index):
        label = 1 if self.data[index]["delta_label"] else 0
        text = self.data[index]["selftext"]
        raw_list = [line.strip().split('.') for line in text.split('\n') if len(line) > 0]
        flattened = [x.strip() for l in raw_list for x in l if x != '' and x != 'CMV']
        padded_sentences = flattened[:self.max_len] + [''] * (self.max_len - len(flattened)) # truncate or pad to max length
        mask = [1 if s != '' else 0 for s in padded_sentences]
        res = self.embedding_model(padded_sentences).numpy()
        return res, label, min(len(flattened), self.max_len), mask

    def collate_fn(self, batch_data):
        embeddings, labels, lengths, masks = list(zip(*batch_data))
        # stack the per-post (max_len, embedding_size) arrays first; building a tensor from a list of arrays is slow
        return (torch.from_numpy(np.stack(embeddings)).float().view(-1, self.max_len, self.embedding_size),
                torch.FloatTensor(labels).view(-1,1),
                torch.LongTensor(lengths).view(-1,1),
                torch.FloatTensor(masks).view(-1, self.max_len)
                )

    def __len__(self):
        return len(self.data)

class LSTM_PL(BaseModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
    def _load_model(self):
        return LSTM_Attention(self.hparams.sentence_embedding_size)

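    def get_dataloader(self, type_path, batch_size, shuffle=False):
        # use the global original_posts_train/original_posts_test, since loading the archive takes about 5 min
        # NOTE: the held-out split is used for both "dev" and "test", so val_acc is measured on the test data
        if type_path == 'train':
            dataset = CMV2Dataset(original_posts_train)
        elif type_path in ['test', 'dev']:
            dataset = CMV2Dataset(original_posts_test)
        else:
            raise ValueError(f"Unknown split: {type_path}")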

        logger.info(f"Loading {type_path} data and labels")
        data_loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=batch_size,
            shuffle=shuffle,
            num_workers=0,
            # num_workers=self.hparams.num_workers,
            collate_fn=dataset.collate_fn
        )
        
        return data_loader    

    def configure_optimizers(self):
        """Prepare the optimizer (Adam; no learning-rate schedule is used)."""
        model = self.model
        optimizer = Adam(model.parameters(), lr=self.hparams.learning_rate)
        self.opt = optimizer
        return [optimizer]
    
    def batch2input(self, batch):
        return {"input_ids": batch[0], "labels": batch[1], "lengths": batch[2], "masks": batch[3]}

    @staticmethod
    def add_model_specific_args(parser, root_dir):
        parser.add_argument(
            "--optimizer",
            default="adam",
            type=str,
            required=True,
            help="Optimizer name; note that configure_optimizers currently always uses Adam",
        )
        parser.add_argument(
            "--sentence_embedding_size",
            default=512,
            type=int,
            help="Dimension of the sentence embeddings (512 for Universal Sentence Encoder v4)",
        )

        return parser
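`collate_fn` turns a list of per-post arrays of shape `(max_len, embedding_size)` into one batch of shape `(batch_size, max_len, embedding_size)`. Stacking them with `np.stack` before handing them to PyTorch avoids the slow path that PyTorch warns about when a tensor is built from a list of numpy arrays. A numpy-only sketch of the shapes involved, assuming small made-up sizes (`demo_max_len=4`, `demo_embedding_size=3`, three posts):

```python
import numpy as np

demo_max_len, demo_embedding_size = 4, 3
rng = np.random.default_rng(0)
# hypothetical items shaped like CMV2Dataset.__getitem__ output: (embeddings, label, length, mask)
batch_data = [
    (rng.random((demo_max_len, demo_embedding_size), dtype=np.float32), 1, 2, [1, 1, 0, 0]),
    (rng.random((demo_max_len, demo_embedding_size), dtype=np.float32), 0, 4, [1, 1, 1, 1]),
    (rng.random((demo_max_len, demo_embedding_size), dtype=np.float32), 0, 1, [1, 0, 0, 0]),
]

emb_list, label_list, length_list, mask_list = zip(*batch_data)
batch_embeddings = np.stack(emb_list)                              # (batch_size, max_len, embedding_size)
batch_labels = np.asarray(label_list, dtype=np.float32).reshape(-1, 1)
batch_lengths = np.asarray(length_list, dtype=np.int64).reshape(-1, 1)
batch_masks = np.asarray(mask_list, dtype=np.float32)              # (batch_size, max_len)
print(batch_embeddings.shape, batch_labels.shape, batch_masks.shape)
```

`torch.from_numpy(batch_embeddings)` would then wrap the stacked array without copying it.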

In [18]:

class LSTM_Attention(torch.nn.Module):
    """
    LSTM with Attention Seq classification model
    """
    def __init__(self, sentence_embedding_size=512):
        """
        # Parameters
        sentence_embedding_size: int
            size of each sentence embedding (512 for the Universal Sentence Encoder).
        """
        super(LSTM_Attention, self).__init__()
        self.sentence_embedding_size = sentence_embedding_size
        #################################################
        ## TODO: add LSTM, attention, and output layers #
        #################################################
        self.hidden_size = 300
        self.num_layers = 1
        self.lstm = torch.nn.LSTM(input_size=self.sentence_embedding_size,
                                  hidden_size=self.hidden_size,
                                  num_layers=self.num_layers,
                                  batch_first=True)
        
        self.attn = torch.nn.Linear(self.hidden_size, 1, bias=True)
        # self.dropout = torch.nn.Dropout(dropout_rate)

        self.out = torch.nn.Linear(in_features=self.hidden_size,
                                   out_features=1,
                                   bias=True)
        self.loss_fn = torch.nn.BCELoss()
        
    

    def forward(self, input_ids, labels, lengths, masks):
        """
        # Parameters
        input_ids: tensor of size (batch_size, max_len, sentence_embedding_size).
            Sentence embeddings of each post, padded or truncated to max_len sentences.
        labels: tensor of size (batch_size, 1).
            Ground truth labels.
        lengths: tensor of size (batch_size, 1).
            Number of sentences in each post (not used by this model).
        masks: tensor of size (batch_size, max_len).
            Input mask: 1 for a real sentence, 0 for padding. Used to compute the attention weights.
        # Returns
        loss: scalar loss averaged across the batch
        predicted_labels: model predictions, either 0 or 1 based on a 0.5 threshold
        weights: attention weights of size (batch_size, max_len, 1)
        """
        #################################################################
        ## TODO: compute loss and predicted_labels based on model output#
        #################################################################
        
        # HINT: you can assign -1e9 to padded tokens based on masks so that after softmax, these tokens get zero attention
        out, _ = self.lstm(input_ids)                              # (batch_size, max_len, hidden_size)

        # Attention: score each sentence, then mask out the padding
        attn = self.attn(out).squeeze(-1)                          # (batch_size, max_len)
        attn = attn.masked_fill(masks == 0, -1e9)
        weights = torch.nn.functional.softmax(attn / 3, dim=1)     # softmax over sentences, temperature 3
        weights = weights.unsqueeze(-1)                            # (batch_size, max_len, 1)
        h_att = (weights * out).sum(1)                             # (batch_size, hidden_size)

        # Output layer
        h_final = self.out(h_att).squeeze(-1)                      # (batch_size,)
        probs = torch.sigmoid(h_final)

        # Calculate loss
        loss = self.loss_fn(probs, labels.view(-1))
        # Convert probability to 0 and 1
        predicted_labels = torch.gt(probs, 0.5).int()

        return loss, predicted_labels, weights
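The attention block in `forward` scores each LSTM output with a linear layer, pushes padded positions to -1e9, applies a softmax with temperature 3 over the sentence axis, and takes the weighted sum of the outputs. A numpy sketch of just that pooling step, assuming made-up scores and a 2-dimensional hidden state, shows that a padded sentence ends up with (numerically) zero weight even when its raw score is the largest:

```python
import numpy as np


def masked_attention_pool(out, scores, mask, temperature=3.0):
    """out: (batch, max_len, hidden); scores, mask: (batch, max_len). Returns pooled states and weights."""
    scores = np.where(mask == 0, -1e9, scores) / temperature
    scores = scores - scores.max(axis=1, keepdims=True)     # shift for a numerically stable exp
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # softmax over the sentence axis
    pooled = (weights[..., None] * out).sum(axis=1)          # (batch, hidden)
    return pooled, weights


out = np.array([[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]])   # last position is padding
scores = np.array([[0.0, 0.0, 10.0]])                     # padding would dominate if unmasked
mask = np.array([[1, 1, 0]])
pooled, weights = masked_attention_pool(out, scores, mask)
print(weights.round(4), pooled.round(4))
```

If a post has no sentences at all (every mask entry is 0), all scores become equal and the weights fall back to uniform, which is also what the PyTorch version does.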

In [19]:
import logging
logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO
    )
import time
import argparse
import glob
import os
logger = logging.getLogger(__name__)

def main():
    ########################################################
    ## TODO: change args if needed according to your files #
    ########################################################
    # mock_args = f"--data_dir {DATA_DIR} --output_dir lstm-att --optimizer adam \
    # --learning_rate 0.001 --max_epochs 10 --do_predict \
    # --train_batch_size 16" 
    mock_args = f"--data_dir {DATA_DIR} --output_dir lstm-att --optimizer adam \
    --learning_rate 0.001 --max_epochs 10 \
    --train_batch_size 16"

    # load hyperparameters
    parser = argparse.ArgumentParser()
    BaseModel.add_generic_args(parser, os.getcwd())
    parser = LSTM_PL.add_model_specific_args(parser, os.getcwd())
    args = parser.parse_args(mock_args.split())
    print(args)
    # fix random seed to make sure the result is reproducible
    pl.seed_everything(args.seed)

    # If output_dir not provided, a folder will be generated in pwd
    if args.output_dir is None:
        args.output_dir = os.path.join(
            "./results",
            f"lstm_att_{time.strftime('%Y%m%d_%H%M%S')}",
        )
        os.makedirs(args.output_dir)
    dict_args = vars(args)
    model = LSTM_PL(**dict_args)
    trainer = generic_train(model, args)


if __name__ == "__main__":
    main()


03/10/2022 04:10:06 - INFO - pytorch_lightning.utilities.seed -   Global seed set to 42
03/10/2022 04:10:06 - INFO - __main__ -   Initilazing BaseModel
03/10/2022 04:10:06 - INFO - pytorch_lightning.utilities.distributed -   GPU available: True, used: True
03/10/2022 04:10:06 - INFO - pytorch_lightning.utilities.distributed -   TPU available: False, using: 0 TPU cores
03/10/2022 04:10:06 - INFO - pytorch_lightning.utilities.distributed -   IPU available: False, using: 0 IPUs


Namespace(data_dir='./drive/MyDrive/CMSC35100/final_project_data', do_predict=False, do_train=True, eval_batch_size=16, gpus=1, learning_rate=0.001, max_epochs=10, num_workers=16, optimizer='adam', output_dir='lstm-att', seed=42, sentence_embedding_size=512, train_batch_size=16)


03/10/2022 04:10:09 - INFO - __main__ -   Loading train data and labels
03/10/2022 04:10:09 - INFO - pytorch_lightning.accelerators.gpu -   LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


03/10/2022 04:10:13 - INFO - __main__ -   Loading dev data and labels
03/10/2022 04:10:14 - INFO - pytorch_lightning.utilities.seed -   Global seed set to 42


03/10/2022 04:12:28 - INFO - pytorch_lightning.utilities.distributed -   Epoch 0, global step 671: val_acc reached 0.56050 (best 0.56050), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=0-val_acc=0.56.ckpt" as top 1


03/10/2022 04:14:42 - INFO - pytorch_lightning.utilities.distributed -   Epoch 1, global step 1343: val_acc reached 0.56181 (best 0.56181), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=1-val_acc=0.56.ckpt" as top 1


03/10/2022 04:16:54 - INFO - pytorch_lightning.utilities.distributed -   Epoch 2, global step 2015: val_acc was not in top 1


03/10/2022 04:19:07 - INFO - pytorch_lightning.utilities.distributed -   Epoch 3, global step 2687: val_acc reached 0.57227 (best 0.57227), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=3-val_acc=0.57.ckpt" as top 1


03/10/2022 04:21:20 - INFO - pytorch_lightning.utilities.distributed -   Epoch 4, global step 3359: val_acc reached 0.57946 (best 0.57946), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=4-val_acc=0.58.ckpt" as top 1


03/10/2022 04:23:33 - INFO - pytorch_lightning.utilities.distributed -   Epoch 5, global step 4031: val_acc was not in top 1


03/10/2022 04:25:47 - INFO - pytorch_lightning.utilities.distributed -   Epoch 6, global step 4703: val_acc was not in top 1


03/10/2022 04:28:00 - INFO - pytorch_lightning.utilities.distributed -   Epoch 7, global step 5375: val_acc reached 0.58012 (best 0.58012), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=7-val_acc=0.58.ckpt" as top 1


03/10/2022 04:30:14 - INFO - pytorch_lightning.utilities.distributed -   Epoch 8, global step 6047: val_acc reached 0.58273 (best 0.58273), saving model to "/content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=8-val_acc=0.58.ckpt" as top 1


03/10/2022 04:32:27 - INFO - pytorch_lightning.utilities.distributed -   Epoch 9, global step 6719: val_acc was not in top 1
03/10/2022 04:32:27 - INFO - __main__ -   Copy best model from /content/lstm-att/version_10-03-2022--04-10-06/checkpoints/epoch=8-val_acc=0.58.ckpt to lstm-att/version_10-03-2022--04-10-06/checkpoints/best_model.ckpt.
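The global step in the checkpoint logs advances by 672 per epoch (671, 1343, 2015, ...). That matches the number of training batches: 10,743 training posts in batches of 16, with the final partial batch kept (the `DataLoader` default is `drop_last=False`). The logged value is one less than the number of batches processed so far. A quick check of that arithmetic:

```python
import math

n_train, batch_size, n_epochs = 10743, 16, 10
steps_per_epoch = math.ceil(n_train / batch_size)   # the last partial batch holds 7 posts
logged_steps = [steps_per_epoch * (epoch + 1) - 1 for epoch in range(n_epochs)]
print(steps_per_epoch, logged_steps)
```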
