# T5 on TPU 💥🚀

In this notebook we will see how to train a T5 model on TPU with Hugging Face's awesome new [trainer](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py). We will train the T5 base model on the SQuAD dataset for a QA task. We will use the recently released amazing [nlp](https://github.com/huggingface/nlp) package to load and process the dataset in just a few lines.

First, make sure you are connected to a high-RAM instance. This will not work on the 12 GB Colab instance.

In [0]:
# Crash on purpose to get more ram :
#import torch
#torch.tensor([10.]*10000000000)

Let's install [PyTorch/XLA](https://github.com/pytorch/xla) which enables PyTorch on TPU. Make sure you install the nightly version, as the trainer breaks on other versions.

In [2]:
VERSION = "nightly"  #@param ["1.5" , "20200325", "nightly"]
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION

Updating TPU and VM. This may take around 2 minutes.
Updating TPU runtime to pytorch-nightly ...
Uninstalling torch-1.6.0a0+caaea08:
  Successfully uninstalled torch-1.6.0a0+caaea08
Uninstalling torchvision-0.7.0a0+98aa805:
  Successfully uninstalled torchvision-0.7.0a0+98aa805
Copying gs://tpu-pytorch/wheels/torch-nightly-cp36-cp36m-linux_x86_64.whl...
- [1 files][ 90.2 MiB/ 90.2 MiB]                                                
Operation completed over 1 objects/90.2 MiB.                                     
Copying gs://tpu-pytorch/wheels/torch_xla-nightly-cp36-cp36m-linux_x86_64.whl...
- [1 files][121.3 MiB/121.3 MiB]                                                
Op

Install transformers and the nlp package. Restart Colab after this.

In [3]:
!git clone https://github.com/huggingface/transformers.git
!pip install ./transformers


fatal: destination path 'transformers' already exists and is not an empty directory.
Processing ./transformers
Building wheels for collected packages: transformers
  Building wheel for transformers (setup.py) ... done
  Created wheel for transformers: filename=transformers-2.10.0-cp36-none-any.whl size=667026 sha256=5ad16c8bb1c01764c8374b39d4be3ce0bf19e75151dd3ba816e80003722a77af
  Stored in directory: /tmp/pip-ephem-wheel-cache-bjve7vlm/wheels/23/19/dd/2561a4e47240cf6b307729d58e56f8077dd0c698f5992216cf
Successfully built transformers
Installing collected packages: transformers
  Found existing installation: transformers 2.10.0
    Uninstalling transformers-2.10.0:
      Successfully uninstalled transformers-2.10.0
Successfully installed transformers-2.10.0


## Load and process data

Let's load and process the dataset using the nlp library. We will process the examples in the following way to cast the QA task in a text-to-text setting:

**input**
question: question_text  context: context 

**target**
answer_text
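
As a minimal sketch of this format (the helper name `format_qa_example` and the sample record are made up for illustration and are not part of the nlp library), a QA record could be turned into an input/target pair like this:

```python
def format_qa_example(question, context, answer_text, eos="</s>"):
    """Cast one QA record into the text-to-text format described above."""
    input_text = "question: %s  context: %s %s" % (question, context, eos)
    target_text = "%s %s" % (answer_text, eos)
    return input_text, target_text


# made-up record, for illustration only
source, target = format_qa_example(
    question="Where is the fire?",
    context="A forest fire was reported near La Ronge.",
    answer_text="near La Ronge",
)
print(source)
print(target)
```

Both strings end with the `</s>` token, so the model sees an explicit end-of-sequence marker on the input and the target side.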

In [0]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [0]:
tokenizer = T5Tokenizer.from_pretrained('t5-base')

In [0]:
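# process the examples into input and target text format and add the eos token at the end
# (assumes SQuAD-style records: 'context' is a string and 'answers' is a dict with a list under 'text')
def add_eos_to_examples(example):
    example['input_text'] = 'question: %s  context: %s </s>' % (example['question'], example['context'])
    example['target_text'] = '%s </s>' % example['answers']['text'][0]
    return example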

# tokenize the examples
def convert_to_features(example_batch):
    input_encodings = tokenizer.batch_encode_plus(example_batch['input_text'], pad_to_max_length=True, max_length=512)
    target_encodings = tokenizer.batch_encode_plus(example_batch['target_text'], pad_to_max_length=True, max_length=16)

    encodings = {
        'input_ids': input_encodings['input_ids'], 
        'attention_mask': input_encodings['attention_mask'],
        'target_ids': target_encodings['input_ids'],
        'target_attention_mask': target_encodings['attention_mask']
    }

    return encodings

In [7]:
from google.colab import files
files.upload()

MessageError: ignored

In [0]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!pip install kaggle
!kaggle competitions download -c nlp-getting-started

In [0]:
!mkdir data
!mv *.csv data
!mkdir models

In [0]:
import pandas as pd
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')

In [9]:
train.head()

| | id | keyword | location | text | target |
|---|---|---|---|---|---|
| 0 | 1 | | | Our Deeds are the Reason of this #earthquake M... | 1 |
| 1 | 4 | | | Forest fire near La Ronge Sask. Canada | 1 |
| 2 | 5 | | | All residents asked to 'shelter in place' are ... | 1 |
| 3 | 6 | | | 13,000 people receive #wildfires evacuation or... | 1 |
| 4 | 7 | | | Just got sent this photo from Ruby #Alaska as ... | 1 |


In [0]:
# map the binary label to a text target followed by the eos token
def token(t):
  if int(t):
    return 'positive </s>'
  else:
    return 'negative </s>'
train['target'] = train['target'].apply(token)

In [0]:
from sklearn.model_selection import train_test_split
train = train[['text','target']]
train, valid = train_test_split(train, test_size=0.2, random_state=42)

In [0]:
from torch.utils.data import Dataset, DataLoader
from transformers import T5Tokenizer

# Despite its name, this class is a Dataset that tokenizes one example at a time, not a model
class T5Model(Dataset):
  def __init__(self, tokenizer, df, max_len=128):
    self.data_column = df["text"].values
    self.class_column = df["target"].values
    self.max_len = max_len
    self.tokenizer = tokenizer

  def __len__(self):
    return len(self.data_column)

  def __getitem__(self, index):
    # tokenize the input text and the target label, padding both to a fixed length
    tokenized_inputs = self.tokenizer.encode_plus(
        self.data_column[index], max_length=self.max_len, pad_to_max_length=True, return_tensors="pt")
    tokenized_targets = self.tokenizer.encode_plus(
        self.class_column[index], max_length=2, pad_to_max_length=True, return_tensors="pt")
    # return_tensors="pt" adds a leading batch dimension of size 1; squeeze() drops it
    source_ids = tokenized_inputs["input_ids"].squeeze()
    target_ids = tokenized_targets["input_ids"].squeeze()
    src_mask = tokenized_inputs["attention_mask"].squeeze()
    target_mask = tokenized_targets["attention_mask"].squeeze()
    return {"input_ids": source_ids, "attention_mask": src_mask,
            "target_ids": target_ids, "target_attention_mask": target_mask}

In [0]:
train_dataset = T5Model(tokenizer, train)
valid_dataset = T5Model(tokenizer,valid)

In [14]:
%%timeit
train_dataset[1]

The slowest run took 4.87 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 410 µs per loop


In [15]:
len(train_dataset), len(valid_dataset)

(6090, 1523)
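
These sizes follow from the 80/20 split. Here is a quick sanity check, assuming the CSV has 6090 + 1523 = 7613 rows and that a fractional `test_size` is rounded up (which, as far as I know, is how `train_test_split` handles it):

```python
import math

n_rows = 6090 + 1523   # total rows in train.csv, inferred from the sizes printed above
test_size = 0.2

n_valid = math.ceil(test_size * n_rows)   # the fractional test share is rounded up
n_train = n_rows - n_valid                # everything else goes to training

print(n_train, n_valid)
```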

In [0]:
# cache the dataset, so we can load it directly for training

torch.save(train_dataset, 'train_data.pt')
torch.save(valid_dataset, 'valid_data.pt')

For more details on how to use the nlp library check out this [notebook](https://colab.research.google.com/github/huggingface/nlp/blob/master/notebooks/Overview.ipynb).

## Write training script

Using the `Trainer` is pretty straightforward. Here are the 4 basic steps needed to use the trainer.

1. **Parse the arguments needed**. These are divided into 3 parts for clarity and separation (TrainingArguments, ModelArguments and DataTrainingArguments).

  1. **TrainingArguments**: These are basically the training hyperparameters such as learning rate, batch size, weight decay, gradient accumulation steps etc. See all possible arguments [here](https://github.com/huggingface/transformers/blob/master/src/transformers/training_args.py). These are used by the Trainer.

  2. **ModelArguments**: These are the arguments for the model that you want to use, such as model_name_or_path, tokenizer_name etc. You'll need these to load the model and tokenizer.

  3. **DataTrainingArguments**: These are, as the name suggests, the arguments needed for the dataset, such as the directory where your files are stored. You'll need these to load/process the dataset.

  TrainingArguments are already defined in the `TrainingArguments` class; you'll need to define the `ModelArguments` and `DataTrainingArguments` classes for your task.




2. Load train and eval datasets
3. Initialize the `Trainer`

    These are the minimum parameters which you'll need for initializing the `Trainer`. For the full list check [here](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py#L107)

    ```
      model: PreTrainedModel
      args: TrainingArguments
      train_dataset: Optional[Dataset]
      eval_dataset: Optional[Dataset]
    ```
4. Start training with  `trainer.train`

    Call `trainer.train` and let the magic begin!
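
To make step 1 a bit more concrete, here is a small standard-library sketch of the idea of splitting one flat dict of settings into several argument dataclasses. It only mimics, roughly, what loading the arguments from a JSON file gives us; the class names `ModelArgs` and `DataArgs` and the helper `split_args` are made up for this illustration and are not part of transformers.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArgs:  # hypothetical, stands in for ModelArguments
    model_name_or_path: str
    tokenizer_name: Optional[str] = None


@dataclass
class DataArgs:  # hypothetical, stands in for DataTrainingArguments
    train_file_path: str = "train_data.pt"
    max_len: int = 512


def split_args(flat, *arg_classes):
    """Build one instance per dataclass from the keys of `flat` that it declares."""
    instances = []
    for cls in arg_classes:
        names = {f.name for f in dataclasses.fields(cls)}
        instances.append(cls(**{k: v for k, v in flat.items() if k in names}))
    return tuple(instances)


model_args, data_args = split_args(
    {"model_name_or_path": "t5-base", "max_len": 128}, ModelArgs, DataArgs
)
print(model_args)
print(data_args)
```

Keys that are missing from the dict simply fall back to the dataclass defaults, which is why only the values that differ from the defaults need to be written down.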


There are lots of things the trainer handles for you out of the box, such as gradient accumulation, fp16 training, setting up the optimizer and scheduler, logging with wandb etc. I didn't set up wandb for this experiment, but will explore it for sure in a future experiment.

In [0]:
import dataclasses
import logging
import os
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np
import torch

from transformers import T5ForConditionalGeneration, T5Tokenizer, EvalPrediction
from transformers import (
    HfArgumentParser,
    DataCollator,
    Trainer,
    TrainingArguments,
    set_seed,
)


logger = logging.getLogger(__name__)

# prepares lm_labels from target_ids and returns examples with the keys expected by the forward method
# this is necessary because the trainer passes this dict directly as arguments to the model,
# so make sure the keys match the parameter names of the forward method
@dataclass
class T2TDataCollator(DataCollator):
    def collate_batch(self, batch: List) -> Dict[str, torch.Tensor]:
        """
        Take a list of samples from a Dataset and collate them into a batch.
        Returns:
            A dictionary of tensors
        """
        input_ids = torch.stack([example['input_ids'] for example in batch])
        lm_labels = torch.stack([example['target_ids'] for example in batch])
        # replace pad token ids (0) with -100 so the loss ignores padded positions
        lm_labels[lm_labels == 0] = -100
        attention_mask = torch.stack([example['attention_mask'] for example in batch])
        decoder_attention_mask = torch.stack([example['target_attention_mask'] for example in batch])
        

        return {
            'input_ids': input_ids, 
            'attention_mask': attention_mask,
            'lm_labels': lm_labels, 
            'decoder_attention_mask': decoder_attention_mask
        }


@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    tokenizer_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
    )
    cache_dir: Optional[str] = field(
        default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
    )

@dataclass
class DataTrainingArguments:
    """
    Arguments pertaining to what data we are going to input our model for training and eval.
    """
    train_file_path: Optional[str] = field(
        default='train_data.pt',
        metadata={"help": "Path for cached train dataset"},
    )
    valid_file_path: Optional[str] = field(
        default='valid_data.pt',
        metadata={"help": "Path for cached valid dataset"},
    )
    max_len: Optional[int] = field(
        default=512,
        metadata={"help": "Max input length for the source text"},
    )
    target_max_len: Optional[int] = field(
        default=32,
        metadata={"help": "Max input length for the target text"},
    )


def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.

    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))

    # we will load the arguments from a json file,
    # make sure you save the arguments at ./args.json
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath('args.json'))

    if (
        os.path.exists(training_args.output_dir)
        and os.listdir(training_args.output_dir)
        and training_args.do_train
        and not training_args.overwrite_output_dir
    ):
        raise ValueError(
            f"Output directory ({training_args.output_dir}) already exists and is not empty. Use --overwrite_output_dir to overcome."
        )

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if training_args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        training_args.local_rank,
        training_args.device,
        training_args.n_gpu,
        bool(training_args.local_rank != -1),
        training_args.fp16,
    )
    logger.info("Training/evaluation parameters %s", training_args)

    # Set seed
    set_seed(training_args.seed)

    # Load pretrained model and tokenizer
    #
    # Distributed training:
    # The .from_pretrained methods guarantee that only one local process can concurrently
    # download model & vocab.

    tokenizer = T5Tokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
    )
    model = T5ForConditionalGeneration.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
    )

    # Get datasets
    print('loading data')
    train_dataset  = torch.load(data_args.train_file_path)
    valid_dataset = torch.load(data_args.valid_file_path)
    print('loading done')

    # Initialize our Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=valid_dataset,
        data_collator=T2TDataCollator(),
        prediction_loss_only=True
    )

    # Training
    if training_args.do_train:
        trainer.train(
            model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
        )
        trainer.save_model()
        # For convenience, we also re-save the tokenizer to the same directory,
        # so that you can share your model easily on huggingface.co/models =)
        if trainer.is_world_master():
            tokenizer.save_pretrained(training_args.output_dir)

    # Evaluation
    results = {}
    if training_args.do_eval and training_args.local_rank in [-1, 0]:
        logger.info("*** Evaluate ***")

        eval_output = trainer.evaluate()

        output_eval_file = os.path.join(training_args.output_dir, "eval_results.txt")
        with open(output_eval_file, "w") as writer:
            logger.info("***** Eval results *****")
            for key in sorted(eval_output.keys()):
                logger.info("  %s = %s", key, str(eval_output[key]))
                writer.write("%s = %s\n" % (key, str(eval_output[key])))
    
        results.update(eval_output)
    
    return results


def _mp_fn(index):
    # For xla_spawn (TPUs)
    main()
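
The `-100` trick in the collator deserves a word: padded target positions are set to `-100` so that the loss skips them (`-100` is the default `ignore_index` of PyTorch's cross-entropy loss). Below is a plain-Python sketch of that idea with made-up token ids and per-token losses. It is only an illustration, not how the model computes its loss internally.

```python
PAD_ID = 0            # pad token id, as assumed by the collator above
IGNORE_INDEX = -100   # positions carrying this label are skipped by the loss


def mask_padding(target_ids):
    """Replace pad ids with IGNORE_INDEX, like lm_labels[lm_labels == 0] = -100."""
    return [IGNORE_INDEX if t == PAD_ID else t for t in target_ids]


def mean_loss(per_token_loss, labels):
    """Average the per-token losses over the positions that are not ignored."""
    kept = [loss for loss, label in zip(per_token_loss, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept)


labels = mask_padding([1465, 1, 0, 0])           # made-up ids: two real tokens, two pads
loss = mean_loss([0.5, 1.5, 9.0, 9.0], labels)   # the two 9.0 values sit on pads
print(labels, loss)
```

Without the masking, the large losses on the padded positions would dominate the average even though they carry no information about the answer.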

## Train

In [0]:
import json

Let's write the arguments in a dict and store them in a JSON file. The above code will load this file and parse the arguments.

In [0]:
args_dict = {
  "num_cores": 8,
  "model_name_or_path": 't5-base',
  "max_len": 128 ,
  "target_max_len": 2,
  "output_dir": './models/tpu',
  "overwrite_output_dir": True,
  "per_device_train_batch_size": 4,
  "per_gpu_eval_batch_size": 4,
  "gradient_accumulation_steps": 4,
  "learning_rate": 3e-5,
  "tpu_num_cores": 8,
  "num_train_epochs": 16,
  "do_train": True
}

In [0]:
with open('args.json', 'w') as f:
  json.dump(args_dict, f)
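
With these settings we can estimate the step counts before launching. The arithmetic below is a back-of-the-envelope sketch: it assumes the 6090 training examples are sharded evenly over the 8 cores, that one optimizer step happens every 4 batches, and that the learning rate decays linearly to zero with no warmup. Under those assumptions it reproduces the 752 total optimization steps and the learning rate at step 500 that show up in the training log.

```python
import math

num_examples = 6090
num_cores = 8
per_device_batch_size = 4
grad_accum_steps = 4
num_epochs = 16
base_lr = 3e-5

# each core sees an equal share of the data (assumed sharding)
examples_per_core = math.ceil(num_examples / num_cores)
batches_per_epoch = math.ceil(examples_per_core / per_device_batch_size)

# one optimizer step per `grad_accum_steps` batches
steps_per_epoch = batches_per_epoch // grad_accum_steps
total_steps = steps_per_epoch * num_epochs

# linear decay to zero, no warmup (assumed schedule)
lr_at_500 = base_lr * (total_steps - 500) / total_steps

print(batches_per_epoch, total_steps, lr_at_500)
```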

Start training!

In [0]:
import torch_xla.distributed.xla_multiprocessing as xmp

In [22]:
xmp.spawn(_mp_fn, args=(), nprocs=8, start_method='fork')

06/01/2020 14:01:08 - INFO - transformers.training_args -   PyTorch: setting up devices
06/01/2020 14:01:08 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='./models/tpu', overwrite_output_dir=True, do_train=True, do_eval=False, do_predict=False, evaluate_during_training=False, per_device_train_batch_size=4, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=4, gradient_accumulation_steps=4, learning_rate=3e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=16, max_steps=-1, warmup_steps=0, logging_dir=None, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=8, tpu_metrics_debug=False)
06/01/2020 14:01:08 - INFO - transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model from cache at /root/.cache/torch/transformers/

loading data
loading done


06/01/2020 14:01:16 - INFO - transformers.modeling_utils -   Weights of T5ForConditionalGeneration not initialized from pretrained model: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']



06/01/2020 14:02:03 - INFO - transformers.trainer -   You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
06/01/2020 14:02:03 - INFO - transformers.trainer -   ***** Running training *****
06/01/2020 14:02:03 - INFO - transformers.trainer -     Num examples = 6090
06/01/2020 14:02:03 - INFO - transformers.trainer -     Num Epochs = 16
06/01/2020 14:02:03 - INFO - transformers.trainer -     Instantaneous batch size per device = 4
06/01/2020 14:02:03 - INFO - transformers.trainer -     Total train batch size (w. parallel, distributed & accumulation) = 32
06/01/2020 14:02:03 - INFO - transformers.trainer -     Gradient Accumulation steps = 4
06/01/2020 14:02:03 - INFO - transformers.trainer -     Total optimization steps = 752


{"loss": 1.0227806613086723, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.0249151987200602, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.0404525419259445, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.049517052442301, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.0416638561487197, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.042657372857677, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.0431025645425542, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}
{"loss": 1.0341301245971117, "learning_rate": 1.0053191489361702e-05, "epoch": 10.6282722513089, "step": 500}

06/01/2020 14:15:38 - INFO - transformers.trainer -   Saving model checkpoint to ./models/tpu/checkpoint-500
06/01/2020 14:15:38 - INFO - transformers.configuration_utils -   Configuration saved in ./models/tpu/checkpoint-500/config.json
06/01/2020 14:17:06 - INFO - transformers.modeling_utils -   Model weights saved in ./models/tpu/checkpoint-500/pytorch_model.bin





06/01/2020 14:22:54 - INFO - transformers.trainer -   

Training completed. Do not forget to share your model on huggingface.co/models =)


06/01/2020 14:22:54 - INFO - transformers.trainer -   Saving model checkpoint to ./models/tpu
06/01/2020 14:22:54 - INFO - transformers.configuration_utils -   Configuration saved in ./models/tpu/config.json
06/01/2020 14:23:02 - INFO - transformers.modeling_utils -   Model weights saved in ./models/tpu/pytorch_model.bin

## Eval

There are two gotchas here. First, the metrics functionality in the nlp package is still a work in progress, so we will use the official SQuAD evaluation script. Second, for some reason I couldn't figure out, the `.generate` method does not work on TPU, so we need to run prediction on CPU. Predicting the validation set takes almost 40 minutes.

In [0]:
import torch
import torch_xla
import torch_xla.core.xla_model as xm

from transformers import T5ForConditionalGeneration, T5Tokenizer

from tqdm.auto import tqdm

In [0]:
model = T5ForConditionalGeneration.from_pretrained('models/tpu').to('cpu') # because it's loaded on xla by default
tokenizer = T5Tokenizer.from_pretrained('models/tpu')

In [0]:
valid_dataset = torch.load('valid_data.pt')
dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=32)

In [26]:
answers = []
for batch in tqdm(dataloader):
  outs = model.generate(input_ids=batch['input_ids'], 
                        attention_mask=batch['attention_mask'],
                        max_length=2,
                        early_stopping=True)
  outs = [tokenizer.decode(ids) for ids in outs]
  answers.extend(outs)





In [0]:
predictions = []
references = []
for ref, pred in zip(valid_dataset, answers):
  predictions.append(pred)
  references.append(tokenizer.decode(ref['target_ids']))

In [28]:
predictions[0], references[0]

('negative', 'positive')

In [29]:
from sklearn.metrics import classification_report
print(classification_report(references, predictions))

              precision    recall  f1-score   support

    negative       0.86      0.81      0.84       874
    positive       0.77      0.82      0.79       649

    accuracy                           0.82      1523
   macro avg       0.81      0.82      0.81      1523
weighted avg       0.82      0.82      0.82      1523
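
For reference, the per-class numbers in this report come from simple counts. Here is a small pure-Python sketch showing how precision, recall and F1 for one class are defined. The labels are a made-up toy list, not taken from the validation set, and the helper name `precision_recall_f1` is my own.

```python
def precision_recall_f1(references, predictions, label):
    """Precision, recall and F1 of `label`, computed from raw counts."""
    pairs = list(zip(references, predictions))
    tp = sum(r == label and p == label for r, p in pairs)   # true positives
    fp = sum(r != label and p == label for r, p in pairs)   # false positives
    fn = sum(r == label and p != label for r, p in pairs)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# toy labels, for illustration only
refs = ["positive", "positive", "negative", "negative"]
preds = ["positive", "negative", "negative", "negative"]
p, r, f = precision_recall_f1(refs, preds, "positive")
print(p, r, f)
```

Here one of the two positive examples is missed, so recall for the positive class is 0.5, while precision stays at 1.0 because no negative example was labelled positive.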

