<a href="https://colab.research.google.com/github/AxelAllen/Pre-trained-Multimodal-Text-Image-Classifier-in-a-Sparse-Data-Application/blob/master/run_bert_text_only.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Run Text-Only Experiments

This notebook shows the end-to-end pipeline for fine-tuning a pre-trained BERT model for text classification on our dataset.

Parts of this pipeline are adapted from [McCormick and Ryan's tutorial on BERT fine-tuning](http://mccormickml.com/2019/07/22/BERT-fine-tuning/) and from the
Huggingface `run_mmimdb.py` script that runs the MMBT model. That script can
be accessed [here](https://github.com/huggingface/transformers/blob/8ea412a86faa8e9edeeb6b5c46b08def06aa03ea/examples/research_projects/mm-imdb/run_mmimdb.py#L305).

### Checking Directory
If you're in the correct directory, the command in the cell below should list the notebooks and the `MMBT/`, `data/`, `runs/`, and `integrated_gradients/` directories. If you don't see this output, you are not in the correct directory to run the subsequent cells in this notebook.

## Check GPU is Available

In [1]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: NVIDIA GeForce RTX 2070 with Max-Q Design


## Install Huggingface Transformers and WandB modules

These should have been installed during your environment set-up; you only need to run these cells in Google Colab.

In [None]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/2c/d8/5144b0712f7f82229a8da5983a8fbb8d30cec5fbd5f8d12ffe1854dcea67/transformers-4.4.1-py3-none-any.whl (2.1MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/71/23/2ddc317b2121117bf34dd00f5b0de194158f2a44ee2bf5e47c7166878a97/tokenizers-0.10.1-cp37-cp37m-manylinux2010_x86_64.whl (3.2MB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp37-none-any.whl size=893262 sha256=07eec15f9c

In [2]:
%pip install wandb

Collecting wandb
  Downloading wandb-0.14.0-py3-none-any.whl (2.0 MB)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
  Preparing metadata (setup.py) ... done
Collecting appdirs>=1.4.3
  Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting Click!=8.0.0,>=7.0
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.18.0-py2.py3-none-any.whl (194 kB)
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting GitPython!=3.1.29,>=1.0.0
  Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)

## Import Required Modules

In [3]:
from textBert_utils import (
    get_train_val_test_data, 
    tokenize_and_encode_data, 
    make_tensor_dataset, 
    make_dataloader, 
    set_seed, 
    get_label_frequencies, 
    get_multiclass_criterion
)



In [4]:
from MMBT.mmbt_utils import get_multiclass_labels, get_labels

In [5]:
import textBert_utils

In [6]:
import argparse 
import pandas as pd
import os
import wandb
import glob
import numpy as np

In [7]:
import logging
import json

In [8]:
from transformers import (
    WEIGHTS_NAME,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Set-up Experiment Hyperparameters and Arguments

Specify the training, validation, and test files to run the experiment on. The default here is running the model on both 'findings' and 'impression' texts.  

To re-make the training, validation, and test data, please refer to the information in the **data/** directory.  

To set a hyperparameter, change its default value in the corresponding `parser.add_argument` call in the following cell, or leave the default as is.  

For multiple experiment runs, please make sure to change the `output_dir` argument so that new results don't overwrite existing ones.
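One way to guard against accidental overwrites is to derive a fresh directory name whenever the requested one already exists. The sketch below is only an illustration: `unique_output_dir` is a hypothetical helper that is not part of `textBert_utils`, and the `_run2`, `_run3`, ... suffix scheme is an assumption.

```python
import os
import tempfile


def unique_output_dir(base_dir: str) -> str:
    """Return base_dir if it is unused, else base_dir with a numeric suffix.

    Hypothetical helper for illustration; not part of this repository.
    """
    if not os.path.exists(base_dir):
        return base_dir
    run = 2
    while os.path.exists(f"{base_dir}_run{run}"):
        run += 1
    return f"{base_dir}_run{run}"


# Demonstration inside a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "10epochs_text_only_findings")
    first = unique_output_dir(base)    # base is free, so it is returned as-is
    os.makedirs(first)
    second = unique_output_dir(base)   # base now exists, so a suffix is added
    first_name = os.path.basename(first)
    second_name = os.path.basename(second)

print(first_name, second_name)
```

The returned name could then be used as the default of `--output_dir` before the arguments are parsed.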

In [9]:
#train_file = "image_labels_impression_frontal_train.csv"
#val_file = "image_labels_impression_frontal_val.csv"
#test_file = "image_labels_impression_frontal_test.csv"

#train_file = "image_multi_labels_major_findings_frontal_train.csv"
#val_file = "image_multi_labels_major_findings_frontal_val.csv"
#test_file = "image_multi_labels_major_findings_frontal_test.csv"


#train_file = "image_labels_major_findings_frontal_train.csv"
#val_file = "image_labels_major_findings_frontal_val.csv"
#test_file = "image_labels_major_findings_frontal_test.csv"


train_file = "image_labels_findings_frontal_train.csv"
val_file = "image_labels_findings_frontal_val.csv"
test_file = "image_labels_findings_frontal_test.csv"

In [10]:
parser = argparse.ArgumentParser(description='Project Hyperparameters and Other Configurations Argument Parser')

In [11]:
parser = argparse.ArgumentParser()

# Required parameters
parser.add_argument(
    "--data_dir",
    default="data/csv",
    type=str,
    help="The input data dir. Should contain the .csv files.",
)
parser.add_argument(
    "--model_name",
    default="bert-base-uncased",
    type=str,
    help="model identifier from huggingface.co/models",
)
parser.add_argument(
    "--output_dir",
    default="10epochs_text_only_findings",
    type=str,
    help="The output directory where the model predictions and checkpoints will be written.",
)

    
parser.add_argument(
    "--config_name", default="bert-base-uncased", type=str, help="Pretrained config name if not the same as model_name"
)
parser.add_argument(
    "--tokenizer_name",
    default="bert-base-uncased",
    type=str,
    help="Pretrained tokenizer name or path if not the same as model_name",
)

parser.add_argument("--train_batch_size", default=32, type=int, help="Batch size for training.")
parser.add_argument(
    "--eval_batch_size", default=32, type=int, help="Batch size for evaluation."
)
parser.add_argument(
    "--max_seq_length",
    default=300,
    type=int,
    help="The maximum total input sequence length after tokenization. Sequences longer "
    "than this will be truncated, sequences shorter will be padded.",
)
parser.add_argument(
    "--num_image_embeds", default=3, type=int, help="Number of Image Embeddings from the Image Encoder"
)
parser.add_argument("--do_train", default=True, type=bool, help="Whether to run training.")
parser.add_argument("--do_eval", default=True, type=bool, help="Whether to run eval on the dev set.")
parser.add_argument(
    "--evaluate_during_training", default=True, type=bool, help="Run evaluation during training at each logging step."
)


parser.add_argument(
    "--gradient_accumulation_steps",
    type=int,
    default=1,
    help="Number of update steps to accumulate before performing a backward/update pass.",
)
parser.add_argument("--learning_rate", default=5e-5, type=float, help="The initial learning rate for Adam.")
parser.add_argument("--weight_decay", default=0.1, type=float, help="Weight decay if we apply some.")
parser.add_argument("--adam_epsilon", default=1e-8, type=float, help="Epsilon for Adam optimizer.")
parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
parser.add_argument(
    "--num_train_epochs", default=10.0, type=float, help="Total number of training epochs to perform."
)
parser.add_argument("--patience", default=5, type=int, help="Patience for Early Stopping.")
parser.add_argument(
    "--max_steps",
    default=-1,
    type=int,
    help="If > 0: set total number of training steps to perform. Override num_train_epochs.",
)
parser.add_argument("--warmup_steps", default=0, type=int, help="Linear warmup over warmup_steps.")

parser.add_argument("--logging_steps", type=int, default=25, help="Log every X updates steps.")
parser.add_argument("--save_steps", type=int, default=25, help="Save checkpoint every X updates steps.")
parser.add_argument(
    "--eval_all_checkpoints",
    default=True, type=bool,
    help="Evaluate all checkpoints starting with the same prefix as model_name and ending with the step number",
)

parser.add_argument("--num_workers", type=int, default=8, help="number of worker threads for dataloading")

parser.add_argument("--seed", type=int, default=42, help="random seed for initialization")


args = parser.parse_args("")

# Setup CUDA, GPU & distributed training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
args.n_gpu = torch.cuda.device_count() if torch.cuda.is_available() else 0
args.device = device

# Setup Train/Val/Test filenames
args.train_file = train_file
args.val_file = val_file
args.test_file = test_file

# accommodate multiclass labeling
args.multiclass = False
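
A note on the boolean flags above (`--do_train`, `--do_eval`, `--evaluate_during_training`, `--eval_all_checkpoints`): because the notebook calls `parser.parse_args("")`, only the defaults are used and the flags behave as intended. If the same parser were reused in a command-line script, `type=bool` would turn any non-empty string, including `"False"`, into `True`. The sketch below demonstrates the pitfall and one possible workaround; `str2bool` and the `--naive`/`--parsed` flags are hypothetical names introduced for this example only.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings; hypothetical helper, not in the notebook."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


demo = argparse.ArgumentParser()
demo.add_argument("--naive", default=True, type=bool)       # same pattern as --do_train
demo.add_argument("--parsed", default=True, type=str2bool)  # explicit string parsing

ns = demo.parse_args(["--naive", "False", "--parsed", "False"])
naive_value = ns.naive    # bool("False") is True: any non-empty string is truthy
parsed_value = ns.parsed  # str2bool("False") is False
print(naive_value, parsed_value)
```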

### Check that the Args dict contains correct configurations

In [12]:
args.__dict__

{'data_dir': 'data/csv',
 'model_name': 'bert-base-uncased',
 'output_dir': '10epochs_text_only_findings',
 'config_name': 'bert-base-uncased',
 'tokenizer_name': 'bert-base-uncased',
 'train_batch_size': 32,
 'eval_batch_size': 32,
 'max_seq_length': 300,
 'num_image_embeds': 3,
 'do_train': True,
 'do_eval': True,
 'evaluate_during_training': True,
 'gradient_accumulation_steps': 1,
 'learning_rate': 5e-05,
 'weight_decay': 0.1,
 'adam_epsilon': 1e-08,
 'max_grad_norm': 1.0,
 'num_train_epochs': 10.0,
 'patience': 5,
 'max_steps': -1,
 'warmup_steps': 0,
 'logging_steps': 25,
 'save_steps': 25,
 'eval_all_checkpoints': True,
 'num_workers': 8,
 'seed': 42,
 'n_gpu': 1,
 'device': device(type='cuda'),
 'train_file': 'image_labels_findings_frontal_train.csv',
 'val_file': 'image_labels_findings_frontal_val.csv',
 'test_file': 'image_labels_findings_frontal_test.csv',
 'multiclass': False}

## Set-up WandB

We set up the code so that this and later experiment runs are tracked with Weights & Biases (WandB). You need to sign up for a WandB account before continuing.

In [13]:
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /home/gaurab/.netrc


True

In [14]:
wandb.init(name="Train_Findings_Texts_10", tags=['Findings', 'frontal'], project="Text_Only", notes="10 epochs 256 size and 32 batch", config=args.__dict__)
run_name = wandb.run.name
wandb_config = wandb.config

wandb: Currently logged in as: subedigaurab821 (gaurab). Use `wandb login --relogin` to force relogin


## Create Dataset

In [15]:
train, val, test = get_train_val_test_data(wandb_config)

Number of training sentences: 1,707

Number of val sentences: 570

Number of test sentences: 570



In [16]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | img | label | text |
|---|---|---|---|---|
| 0 | 2573 | CXR2728_IM-1187-1001.png | 1 | Lungs remain hyperexpanded. No change in the ... |
| 1 | 1061 | CXR2156_IM-0775-1001.png | 0 | The heart is normal in size. The pulmonary va... |
| 2 | 862 | CXR1732_IM-0482-1001.png | 0 | Cardiac and mediastinal silhouette are unrema... |
| 3 | 1581 | CXR3265_IM-1551-1001.png | 0 | The heart size and mediastinal contours appea... |
| 4 | 1847 | CXR3760_IM-1883-1001.png | 0 | XXXX sternotomy XXXX remain in XXXX. The card... |


In [17]:
val.head()

| Unnamed: 0.1 | Unnamed: 0 | img | label | text |
|---|---|---|---|---|
| 0 | 437 | CXR857_IM-2378-1001.png | 0 | The heart size and pulmonary vascularity appe... |
| 1 | 1142 | CXR2349_IM-0914-1001.png | 0 | Heart size and vascularity normal. These cont... |
| 2 | 1024 | CXR2075_IM-0708-1001.png | 0 | The cardiomediastinal silhouette is normal in... |
| 3 | 1454 | CXR3020_IM-1395-1001.png | 0 | Heart size and mediastinal contours are norma... |
| 4 | 1759 | CXR3583_IM-1762-1001.png | 0 | The heart, pulmonary XXXX and mediastinum are... |


In [18]:
test.head()

| Unnamed: 0.1 | Unnamed: 0 | img | label | text |
|---|---|---|---|---|
| 0 | 1813 | CXR3704_IM-1851-1001.png | 0 | Lungs are clear without focal consolidation, ... |
| 1 | 1132 | CXR2328_IM-0898-1001.png | 0 | Heart size is normal. The lungs are clear. Th... |
| 2 | 2325 | CXR1668_IM-0441-1001.png | 1 | No pneumothorax, pleural effusion, or focal a... |
| 3 | 605 | CXR1222_IM-0150-1001.png | 0 | The heart and lungs have XXXX XXXX in the int... |
| 4 | 1073 | CXR2198_IM-0808-1001.png | 0 | Cardiac silhouette, pulmonary vascular patter... |


# Sentences and Labels

In [19]:
train_sentences = train.text.values
train_labels = train.label.values

val_sentences = val.text.values
val_labels = val.label.values

test_sentences = test.text.values
test_labels = test.label.values

In [20]:
train_sentences[:10]

array([' Lungs remain hyperexpanded. No change in the right middle lobe opacification. No XXXX infiltrates or masses. Pulmonary arteries are prominent centrally.',
       ' The heart is normal in size. The pulmonary vascularity is within normal limits in appearance. No focal air space opacities. No pleural effusions or pneumothorax. No acute bony abnormalities.',
       ' Cardiac and mediastinal silhouette are unremarkable. Lungs are clear. No focal consolidation, pneumothorax, or pleural effusion identified. XXXX and soft tissue are unremarkable.',
       ' The heart size and mediastinal contours appear within normal limits. No focal airspace consolidation, pleural effusion or pneumothorax. No acute bony abnormalities.',
       ' XXXX sternotomy XXXX remain in XXXX. The cardiomediastinal silhouette is within normal limits for appearance. The thoracic aorta is tortuous. No focal areas of pulmonary consolidation. No pneumothorax. No pleural effusion. Moderate degenerative changes of the

In [21]:
train_labels[:10]

array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])

# Tokenize and Encode with BERT `encode_plus`

The `tokenizer.encode_plus` function combines multiple steps for us:

1. Split the sentence into tokens.
2. Add the special `[CLS]` and `[SEP]` tokens.
3. Map the tokens to their IDs.
4. Pad or truncate all sentences to the same length.
5. Create the attention masks which explicitly differentiate real tokens from `[PAD]` tokens.

These steps are performed inside the `make_tensor_dataset` function.
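
To make the five steps concrete, here is a toy re-implementation in plain Python. It is a sketch for intuition only and is not how `make_tensor_dataset` or the real tokenizer works: it splits on whitespace instead of using WordPiece, `toy_vocab` and `toy_encode` are hypothetical names, and the word IDs are illustrative. The IDs 0, 100, 101, and 102 for `[PAD]`, `[UNK]`, `[CLS]`, and `[SEP]` match the `bert-base-uncased` vocabulary.

```python
PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"

# Toy vocabulary: special-token IDs as in bert-base-uncased, word IDs illustrative.
toy_vocab = {PAD: 0, UNK: 100, CLS: 101, SEP: 102,
             "lungs": 8948, "are": 2024, "clear": 3154, ".": 1012}


def toy_encode(sentence: str, max_length: int):
    # 1. Split the sentence into tokens (whitespace split; BERT uses WordPiece).
    tokens = sentence.lower().replace(".", " .").split()
    # 2. Add [CLS] and [SEP], truncating so the total fits in max_length.
    tokens = [CLS] + tokens[: max_length - 2] + [SEP]
    # 3. Map tokens to IDs, falling back to [UNK] for unknown tokens.
    input_ids = [toy_vocab.get(tok, toy_vocab[UNK]) for tok in tokens]
    # 4. Pad with [PAD] up to max_length.
    input_ids += [toy_vocab[PAD]] * (max_length - len(input_ids))
    # 5. Attention mask: 1 for real tokens, 0 for [PAD].
    attention_mask = [int(i != toy_vocab[PAD]) for i in input_ids]
    return input_ids, attention_mask


ids, mask = toy_encode("Lungs are clear.", max_length=8)
short_ids, short_mask = toy_encode("Lungs are clear.", max_length=4)
print(ids)
print(mask)
```

With the real tokenizer, padding and truncation are requested through arguments such as `max_length`, `padding="max_length"`, and `truncation=True`.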

# Torch Dataset and DataLoader

In [22]:
train_dataset = make_tensor_dataset(train_sentences, train_labels, wandb_config)
val_dataset = make_tensor_dataset(val_sentences, val_labels, wandb_config)

Original:   Lungs remain hyperexpanded. No change in the right middle lobe opacification. No XXXX infiltrates or masses. Pulmonary arteries are prominent centrally.
Token IDs: tensor([  101,  8948,  3961, 23760, 10288,  9739,  5732,  1012,  2053,  2689,
         1999,  1996,  2157,  2690, 21833,  6728,  6305,  9031,  1012,  2053,
        22038, 20348, 29543,  2015,  2030, 11678,  1012, 21908, 28915,  2024,
         4069, 25497,  1012,   102,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,  

In [23]:
print(f'{len(train_dataset):>5,} training samples')
print(f'{len(val_dataset):>5,} validation samples')
#print(f'{len(test_dataset):>5,} test samples')

1,707 training samples
  570 validation samples


In [24]:
train_dataset[:3]

(tensor([[  101,  8948,  3961, 23760, 10288,  9739,  5732,  1012,  2053,  2689,
           1999,  1996,  2157,  2690, 21833,  6728,  6305,  9031,  1012,  2053,
          22038, 20348, 29543,  2015,  2030, 11678,  1012, 21908, 28915,  2024,
           4069, 25497,  1012,   102,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,   

Create an iterator for the dataset using the torch DataLoader class. 

In [25]:
data_loaders = {
    'train' : make_dataloader(train_dataset, wandb_config, eval=False),
    'train_size': len(train_dataset),
    'eval' : make_dataloader(val_dataset, wandb_config, eval=True),
    'eval_size' : len(val_dataset)
}
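
The internals of `make_dataloader` are not shown in this notebook. As a rough sketch of what such a helper commonly does, the example below shuffles with a `RandomSampler` for training and keeps a fixed order with a `SequentialSampler` for evaluation. `sketch_dataloader` and the tiny stand-in dataset are assumptions made for illustration; the real helper takes the config object and may differ.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset

# Tiny stand-in dataset: 5 samples of (input_ids, attention_mask, label).
input_ids = torch.arange(20).reshape(5, 4)
attention_mask = torch.ones(5, 4, dtype=torch.long)
labels = torch.tensor([1, 0, 0, 0, 1])
dataset = TensorDataset(input_ids, attention_mask, labels)


def sketch_dataloader(dataset, batch_size, eval=False):
    # Shuffle for training; keep a fixed order for evaluation.
    sampler = SequentialSampler(dataset) if eval else RandomSampler(dataset)
    return DataLoader(dataset, sampler=sampler, batch_size=batch_size)


eval_loader = sketch_dataloader(dataset, batch_size=2, eval=True)
# Each batch is a tuple (input_ids, attention_mask, labels).
batch_shapes = [tuple(ids.shape) for ids, mask, y in eval_loader]
print(batch_shapes)
```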

# Fine Tune BERT for Classification

## Setup Logging

In [26]:
# Setup logging
logger = logging.getLogger(__name__)
if not os.path.exists(wandb_config.output_dir):
    os.makedirs(wandb_config.output_dir)
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
                    datefmt="%m/%d/%Y %H:%M:%S",
                    filename=os.path.join(wandb_config.output_dir, f"{os.path.splitext(wandb_config.train_file)[0]}_logging.txt"),
                    level=logging.INFO)
logger.warning("device: %s, n_gpu: %s",
        wandb_config.device,
        wandb_config.n_gpu
)
# Set the verbosity to info of the Transformers logger (on main process only):

# Set seed
set_seed(wandb_config)

## Set up the Model and Train

This cell trains the model on the training set and validates it on the validation set specified above.

Outputs and checkpoints are saved in the directory given by the `--output_dir` argument.
TensorBoard data are saved in the `runs/` directory, labeled with the date and time of the experiment and the filename of the train/test data file.
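
The default configuration (batch size 32 at `max_seq_length` 300) may not fit on smaller GPUs; the run recorded in this notebook stopped with a CUDA out-of-memory error on a 7.79 GiB card. A common mitigation, offered here as a suggestion and not something this notebook reports trying, is to lower `train_batch_size` and raise `gradient_accumulation_steps` so that the effective batch size stays the same. The sketch assumes that `textBert_utils.train` honors `gradient_accumulation_steps`, which is not shown here; `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_step_batch_size: int, accumulation_steps: int) -> int:
    """Number of samples contributing to one optimizer update (single GPU)."""
    return per_step_batch_size * accumulation_steps


target = 32  # the notebook's default train_batch_size
# Pair each smaller per-step batch size with the accumulation steps that keep 32.
options = [(bs, target // bs) for bs in (32, 16, 8, 4)]
for bs, steps in options:
    assert effective_batch_size(bs, steps) == target
    print(f"train_batch_size={bs:>2}, gradient_accumulation_steps={steps}")
```

Lowering `max_seq_length` also reduces memory use, at the cost of truncating longer reports.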

In [27]:
%pdb on
# set up model
if args.multiclass:
    labels = get_multiclass_labels()
    num_labels = len(labels)
else:
    labels = get_labels()
    num_labels = len(labels)
transformer_config = AutoConfig.from_pretrained(wandb_config.model_name, num_labels=num_labels)
tokenizer = AutoTokenizer.from_pretrained(
        wandb_config.tokenizer_name,
        do_lower_case=True,
        cache_dir=None,
    )
transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
transformer_model.to(device)
logger.info(f"Training/evaluation parameters: {wandb_config}")
# Training
if wandb_config.do_train:
    if wandb_config.multiclass:
        criterion = get_multiclass_criterion(train_labels)
        global_step, tr_loss = textBert_utils.train(data_loaders, wandb_config, transformer_model, criterion)
    else:
        global_step, tr_loss = textBert_utils.train(data_loaders, wandb_config, transformer_model)
    logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

# Saving best-practices: if you use defaults names for the model, you can reload it using from_pretrained()
    logger.info("Saving model checkpoint to %s", wandb_config.output_dir)
    # Save a trained model, configuration and tokenizer using `save_pretrained()`.
    # They can then be reloaded using `from_pretrained()`
    model_to_save = (transformer_model.module if hasattr(transformer_model, "module") else transformer_model)  # Take care of distributed/parallel training
    torch.save(model_to_save.state_dict(), os.path.join(wandb_config.output_dir, WEIGHTS_NAME))
    tokenizer.save_pretrained(wandb_config.output_dir)
    transformer_config.save_pretrained(wandb_config.output_dir)

    # Good practice: save your training arguments together with the trained model
    torch.save(args, os.path.join(wandb_config.output_dir, "training_args.bin"))

    # Load a trained model and vocabulary that you have fine-tuned
    transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
    transformer_model.load_state_dict(torch.load(os.path.join(wandb_config.output_dir, WEIGHTS_NAME)))
    tokenizer = AutoTokenizer.from_pretrained(wandb_config.output_dir)
    transformer_model.to(device)
logger.info("***** Training Finished *****")
wandb.finish()


Automatic pdb calling has been turned ON


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB (GPU 0; 7.79 GiB total capacity; 6.82 GiB already allocated; 35.62 MiB free; 7.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

> /home/gaurab/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/functional.py(1252)dropout()
   1250     if p < 0.0 or p > 1.0:
   1251         raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
-> 1252     return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
   1253 
   1254

# Evaluation on Test set

## Tokenize and Prepare the Test Dataset

Use the saved tokenizer from the training step.

In [None]:
wandb.init(name="Test_Findings_Texts_10", tags=['Findings', 'frontal'], project="Text_Only", notes="10 epochs 256 size and 32 batch", config=args.__dict__)
# wandb.tensorboard.patch(root_logdir="...")
run_name = wandb.run.name
wandb_config = wandb.config

In [None]:
test_dataset = make_tensor_dataset(test_sentences, test_labels, wandb_config, saved_model=True)

Original:   Lungs are clear without focal consolidation, effusion or pneumothorax. Normal heart size. Bony thorax and soft tissues unremarkable
Token IDs: tensor([  101,  8948,  2024,  3154,  2302, 15918, 17439,  1010,  1041,  4246,
        14499,  2030,  1052,  2638,  2819, 29288,  2527,  2595,  1012,  3671,
         2540,  2946,  1012, 22678, 15321,  8528,  1998,  3730, 14095,  4895,
        28578, 17007,  3085,   102,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,  

In [None]:
data_loaders['test'] = make_dataloader(test_dataset, wandb_config, eval=True)
data_loaders['test_size'] = len(test_dataset)

In [None]:
# Evaluation
results = {}
if wandb_config.do_eval:
    checkpoints = [wandb_config.output_dir]
    if wandb_config.eval_all_checkpoints:
        checkpoints = list(os.path.dirname(c) 
        for c in sorted(glob.glob(wandb_config.output_dir + "/**/" + 
                                  WEIGHTS_NAME, recursive=False)))
        # recursive=False because otherwise the parent directory gets included
        # which is not what we want; only subdirectories

    logger.info("Evaluate the following checkpoints: %s", checkpoints)

    for checkpoint in checkpoints:
        global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
        prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""
        transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
        checkpoint = os.path.join(checkpoint, 'pytorch_model.bin')
        transformer_model.load_state_dict(torch.load(checkpoint))
        transformer_model.to(wandb_config.device)
        if wandb_config.multiclass:
            result = textBert_utils.evaluate(data_loaders, wandb_config, transformer_model, prefix=prefix, test=True, criterion=criterion)
        else:
            result = textBert_utils.evaluate(data_loaders, wandb_config, transformer_model, prefix=prefix, test=True) # test=True uses the test_dataset not val_dataset
        result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
        results.update(result)
    logger.info("***** Evaluation on Test Data Finished *****")
wandb.finish()
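
The checkpoint discovery in the cell above relies on how `glob` treats `**`. The self-contained sketch below reproduces that behavior on a temporary directory; the layout (`checkpoint-25`, `checkpoint-50`) is a hypothetical example, not the actual contents of the output directory.

```python
import glob
import os
import tempfile

WEIGHTS_NAME = "pytorch_model.bin"  # same filename the evaluation cell loads

with tempfile.TemporaryDirectory() as output_dir:
    # Hypothetical layout: final weights in output_dir plus two step checkpoints.
    for sub in ("", "checkpoint-25", "checkpoint-50"):
        os.makedirs(os.path.join(output_dir, sub), exist_ok=True)
        open(os.path.join(output_dir, sub, WEIGHTS_NAME), "w").close()

    pattern = output_dir + "/**/" + WEIGHTS_NAME
    # With recursive=False, "**" behaves like "*": exactly one directory level.
    one_level = sorted(glob.glob(pattern, recursive=False))
    # With recursive=True, "**" also matches zero directories (output_dir itself).
    any_level = sorted(glob.glob(pattern, recursive=True))

    checkpoints = [os.path.dirname(p) for p in one_level]
    names = [os.path.basename(c) for c in checkpoints]
    steps = [c.split("-")[-1] for c in checkpoints]  # same parsing as global_step
    n_one_level, n_any_level = len(one_level), len(any_level)

print(names, steps, n_one_level, n_any_level)
```

Note that `sorted` orders the paths as strings, so a directory named `checkpoint-100` would be evaluated before `checkpoint-25`.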

## Saving Test Eval Results

The code automatically saves the evaluation results from each checkpoint in its respective folder. This next cell simply collects all of them in one place.

In [None]:
with open(os.path.join(args.output_dir, f"{os.path.splitext(args.test_file)[0]}_eval_results.txt"), mode='w', encoding='utf-8') as out_f:
    print(results, file=out_f)
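
Printing the dictionary produces a human-readable file, but it cannot be loaded back directly. If the results need to be reloaded later, writing JSON is one alternative. This is a sketch under assumptions: the keys and values below are placeholders (the real ones come from `textBert_utils.evaluate`), and the `.json` filename is hypothetical.

```python
import json
import os
import tempfile

# Placeholder results; real keys and values come from textBert_utils.evaluate.
results = {"example_metric_25": 0.0, "example_metric_50": 1.0}

with tempfile.TemporaryDirectory() as output_dir:
    out_path = os.path.join(output_dir, "test_eval_results.json")
    with open(out_path, mode="w", encoding="utf-8") as out_f:
        # default=float converts NumPy scalars that json cannot serialize natively.
        json.dump(results, out_f, indent=2, default=float)
    with open(out_path, encoding="utf-8") as in_f:
        reloaded = json.load(in_f)

print(reloaded == results)
```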