<a href="https://colab.research.google.com/github/YIKUAN8/Transformers-VQA/blob/master/openI_VQA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**In this notebook, we will classify 15 thoracic findings from chest X-ray (CXR) images and their associated reports. This can be framed as a VQA task. We will fine-tune 3 pre-trained Transformer-based V+L models. After working through this notebook, you will be able to fine-tune these models on your own dataset.**

####**0.1 clone our repo and install dependencies!**


In [1]:
!git clone https://github.com/YIKUAN8/Transformers-VQA.git
%cd Transformers-VQA/
!pip install -r requirements.txt


Cloning into 'Transformers-VQA'...
remote: Enumerating objects: 182, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 182 (delta 64), reused 180 (delta 63), pack-reused 0
Receiving objects: 100% (182/182), 1.94 MiB | 2.99 MiB/s, done.
Resolving deltas: 100% (64/64), done.
/workspaces/BERTHop/Transformers-VQA
Collecting tqdm
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting boto3
  Downloading boto3-1.26.109-py3-none-any.whl (135 kB)
Collecting botocore<1.30.0,>=1.29.109
  Downloading botocore-1.29.109-py3-none-any.whl (10.6 MB)

**Change the 79th line of param.py from**
```
args = parser.parse_args()
```
to
```
args = parser.parse_args([])
```
This lets us use *argparse* inside a Jupyter notebook: passing an empty list stops the parser from reading the kernel's own command-line arguments.
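
To see why the empty list matters, here is a minimal sketch. The parser below is a hypothetical stand-in for the one in `param.py`, and `kernel_argv` only imitates the kind of flags a notebook kernel may place in `sys.argv`; neither is taken from the repository.

```python
import argparse

def build_parser():
    # Hypothetical stand-in for the parser defined in param.py
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--model", type=str, default="lxmert")
    return parser

# Assumed example of the extra flags a notebook kernel adds to sys.argv
kernel_argv = ["-f", "/tmp/kernel-1234.json"]

parser = build_parser()

# parse_known_args collects the arguments the parser does not recognise;
# plain parse_args() would exit with an error on them
_, unknown = parser.parse_known_args(kernel_argv)
print(unknown)

# An explicit empty list ignores sys.argv entirely and returns the defaults
args_demo = parser.parse_args([])
print(args_demo.batch_size, args_demo.model)
```

With the defaults loaded this way, individual fields can be overwritten by plain attribute assignment later in the notebook.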



####**0.2 Download the pre-trained models and place them in models/pretrained/. You can choose from [VisualBERT](https://github.com/uclanlp/visualbert), [LXMERT](https://github.com/airsplay/lxmert), and [UNITER](https://github.com/ChenRocks/UNITER).**

In [2]:
#line 1: UNITER; line 2: LXMERT; line 3: VisualBERT. Comment out the line for any model you don't want to download
#if the pre-trained VisualBERT cannot be downloaded successfully, rerun the cell or use this link: https://drive.google.com/file/d/1kuPr187zWxSJbtCbVW87XzInXltM-i9Y/view?usp=sharing
!wget https://convaisharables.blob.core.windows.net/uniter/pretrained/uniter-base.pt -P models/pretrained/
!wget --no-check-certificate https://nlp1.cs.unc.edu/data/model_LXRT.pth -P models/pretrained/
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y" -O models/pretrained/visualbert.th && rm -rf /tmp/cookies.txt


--2020-09-06 04:03:41--  https://convaisharables.blob.core.windows.net/uniter/pretrained/uniter-base.pt
Resolving convaisharables.blob.core.windows.net (convaisharables.blob.core.windows.net)... 13.77.184.64
Connecting to convaisharables.blob.core.windows.net (convaisharables.blob.core.windows.net)|13.77.184.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 273600756 (261M) [application/octet-stream]
Saving to: ‘models/pretrained/uniter-base.pt’


2020-09-06 04:03:45 (81.4 MB/s) - ‘models/pretrained/uniter-base.pt’ saved [273600756/273600756]

--2020-09-06 04:03:45--  https://nlp1.cs.unc.edu/data/model_LXRT.pth
Resolving nlp1.cs.unc.edu (nlp1.cs.unc.edu)... 152.2.142.178
Connecting to nlp1.cs.unc.edu (nlp1.cs.unc.edu)|152.2.142.178|:443... connected.
	requested host name ‘nlp1.cs.unc.edu’.
HTTP request sent, awaiting response... 200 OK
Length: 912336661 (870M)
Saving to: ‘models/pretrained/model_LXRT.pth’


2020-09-06 04:04:10 (34.9 MB/s) - ‘models/pretrained

####**0.3 Download OpenI dataset.**

A detailed description of this dataset can be found [here](https://openi.nlm.nih.gov/). In summary, the dataset contains 3684 CXR image-report pairs. Each pair is annotated with 15 thoracic findings derived from MeSH terms. We converted the raw data into a dataframe that is easier to inspect. It can be downloaded with the following command or from this [link](https://drive.google.com/file/d/1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG/view?usp=sharing).

In [3]:
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG" -O data/openIdf.csv && rm -rf /tmp/cookies.txt


--2020-09-06 04:04:28--  https://docs.google.com/uc?export=download&confirm=&id=1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG
Resolving docs.google.com (docs.google.com)... 74.125.195.139, 74.125.195.102, 74.125.195.100, ...
Connecting to docs.google.com (docs.google.com)|74.125.195.139|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-10-7c-docs.googleusercontent.com/docs/securesc/26hanb4b42dlebdr1jtffqlfbuactei1/0bsku8rqimbeld6qscr8te4hn4n1r54l/1599365025000/09550986323973647809/13314145366510543445Z/1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG?e=download [following]
--2020-09-06 04:04:28--  https://doc-10-7c-docs.googleusercontent.com/docs/securesc/26hanb4b42dlebdr1jtffqlfbuactei1/0bsku8rqimbeld6qscr8te4hn4n1r54l/1599365025000/09550986323973647809/13314145366510543445Z/1i3wcfXJbH_4q3rS2rvLxtzbMiO-KuZCG?e=download
Resolving doc-10-7c-docs.googleusercontent.com (doc-10-7c-docs.googleusercontent.com)... 74.125.142.132, 2607:f8b0:400e:c08::84
Connecting to 

***0.3.1 Take a look at the dataframe. Column 'TXT' is the radiology report; columns 'split' and 'id' are self-explanatory; all other columns are the 15 findings. Our task is a 15-label binary (multi-label) classification with visual and textual input.***

In [4]:
import pandas as pd
openI = pd.read_csv('data/openIdf.csv',index_col=0)
openI.head()

Unnamed: 0,id,Atelectasis,Cardiomegaly,Effusion,Infiltration,Mass,Nodule,Pneumonia,Pneumothorax,Consolidation,Edema,Emphysema,Fibrosis,Pleural_Thickening,Hernia,Normal,split,TXT
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,train,The cardiac silhouette and mediastinum size ar...
1021,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,train,Borderline cardiomegaly. Midline sternotomy XX...
2020,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,train,"No displaced rib fractures, pneumothorax, or ..."
3061,4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,train,There are diffuse bilateral interstitial and a...
3169,5,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,train,The cardiomediastinal silhouette and pulmonary...
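
Because most findings are rare, it helps to check how often each label is positive before training. The sketch below uses a small made-up frame with the same column layout (`id`, findings, `split`, `TXT`); the values are illustrative only and are not taken from OpenI.

```python
import pandas as pd

# Toy frame mimicking the layout of openIdf.csv; all values are made up
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Cardiomegaly": [0, 1, 0, 0],
    "Hernia": [0, 0, 0, 0],
    "Normal": [1, 0, 1, 1],
    "split": ["train", "train", "test", "train"],
    "TXT": ["report a", "report b", "report c", "report d"],
})

# Label columns sit between 'id' and the trailing 'split' and 'TXT' columns
label_cols = list(toy.columns[1:-2])
train_part = toy[toy["split"] == "train"]

# Fraction of positive examples per finding in the training split
prevalence = train_part[label_cols].mean()
print(prevalence)
```

Running the same two lines on `openI` instead of `toy` should give the per-finding prevalence of the real training split.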


####**0.4 Download the visual features extracted by BUTD (bottom-up and top-down attention). Thirty-six 2048-dimensional visual features are extracted from each CXR image. We use this [implementation](https://github.com/airsplay/py-bottom-up-attention). This step will take a while (~1 min). To save download time, you can also copy this [shareable link](https://drive.google.com/file/d/1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ/view?usp=sharing) to your own Google Drive and mount the drive in Colab.**


*If you are interested in the original CXR images, which are not needed for this project, you can access them [here](https://drive.google.com/drive/folders/1s5A0CFB6-2N5ThbuorUK1t-bUEKmZnjz?usp=sharing).*

In [5]:
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ" -O data/openI_v_features.pickle && rm -rf /tmp/cookies.txt

--2020-09-06 04:04:48--  https://docs.google.com/uc?export=download&confirm=6hrU&id=1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ
Resolving docs.google.com (docs.google.com)... 74.125.142.113, 74.125.142.101, 74.125.142.138, ...
Connecting to docs.google.com (docs.google.com)|74.125.142.113|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-04-docs.googleusercontent.com/docs/securesc/iqb046dipnie0n4nop6f1jphpu9ve873/90glq5kpqavjkrg7pq6sj14s5s6d5676/1599365025000/09550986323973647809/13457397186708642344Z/1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ?e=download [following]
--2020-09-06 04:04:49--  https://doc-0s-04-docs.googleusercontent.com/docs/securesc/iqb046dipnie0n4nop6f1jphpu9ve873/90glq5kpqavjkrg7pq6sj14s5s6d5676/1599365025000/09550986323973647809/13457397186708642344Z/1BFw0jc0j-ffT2PhI4CZeP3IJFZg3GxlZ?e=download
Resolving doc-0s-04-docs.googleusercontent.com (doc-0s-04-docs.googleusercontent.com)... 74.125.142.132, 2607:f8b0:400e:c08::84
Connecting

***0.4.1 Load visual features***

In [6]:
import pickle
# open the file in a context manager so the handle is closed after loading
with open("/content/Transformers-VQA/data/openI_v_features.pickle", "rb") as f:
  openI_v_f = pickle.load(f)

In [7]:
assert set(list(openI_v_f.keys())) == set(openI.id.values), "Visual Features are inconsistent with openI dataset"

In [8]:
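# each entry is (boxes, features, (image width, image height)), the same order the dataset class unpacks below
bbox_example, feature_example, (img_w_example, img_h_example) = openI_v_f[openI.id.iloc[0]]

In [9]:
bbox_example.shape, feature_example.shape, (img_w_example, img_h_example)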

((36, 4), (36, 2048), (420, 512))

####**Now we have downloaded all data, models, and dependencies. We are good to go!**

####**1. Change default arguments**

First, let's check it out!

In [10]:
from param import args

In [11]:
args.__dict__

{'batch_size': 32,
 'dropout': 0.1,
 'epochs': 2,
 'fast': False,
 'from_scratch': False,
 'load_pretrained': None,
 'load_trained': None,
 'lr': 0.0001,
 'max_seq_length': 20,
 'mce_loss': False,
 'model': 'lxmert',
 'multiGPU': False,
 'num_workers': 0,
 'optim': 'bert',
 'optimizer': 'bert',
 'output': 'models/trained/',
 'seed': 9595,
 'test': None,
 'tiny': False,
 'tqdm': True,
 'train': 'train,nominival',
 'valid': 'minival'}

***1.1 Let's overwrite some arguments***

In [39]:
args.batch_size = 18
args.epochs = 2
args.model = 'visualbert' # use visualbert
args.load_pretrained = '/content/Transformers-VQA/models/pretrained/visualbert.th' #load pretrained visualbert model
args.max_seq_length = 128 #truncate or pad report lengths to 128 subwords
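
Setting `max_seq_length` means every report ends up with the same number of subword ids. The sketch below only illustrates that idea: `pad_or_truncate`, the pad id of 0, and the token ids are assumptions for illustration, and the repository's own tokenization (including any special tokens it reserves room for) may differ.

```python
def pad_or_truncate(token_ids, max_seq_length, pad_id=0):
    """Return exactly max_seq_length ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_seq_length]
    return clipped + [pad_id] * (max_seq_length - len(clipped))

short_report = [101, 2053, 3279, 102]   # made-up subword ids
long_report = list(range(300))          # stands in for a very long report

padded = pad_or_truncate(short_report, 128)
cut = pad_or_truncate(long_report, 128)
print(len(padded), len(cut))
```

A larger limit keeps more of each report at the cost of memory, since every sequence in the batch is padded to the same length.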

####**2. Create customized dataloader**

In [13]:
findings = list(openI.columns[1:-2])
findings

['Atelectasis',
 'Cardiomegaly',
 'Effusion',
 'Infiltration',
 'Mass',
 'Nodule',
 'Pneumonia',
 'Pneumothorax',
 'Consolidation',
 'Edema',
 'Emphysema',
 'Fibrosis',
 'Pleural_Thickening',
 'Hernia',
 'Normal']

In [14]:
from torch.utils.data import Dataset
from torch.utils.data.dataloader import DataLoader
import numpy as np
class OpenIDataset(Dataset):
  def __init__(self, df, vf, split, model = 'lxmert'):
    # train_test_split and prepare labels
    self.dataset = df[df['split'] == split]
    self.visual_features = vf
    self.id_list = self.dataset.id.tolist()
    self.report_list = self.dataset.TXT.tolist()
    self.findings_list = self.dataset.columns[1:-2]
    self.target_list = self.dataset[self.findings_list].to_numpy().astype(np.float32)
    self.model = model

  def __len__(self):
    return len(self.id_list)

  def __getitem__(self, item):
    cxr_id = self.id_list[item]
    target = self.target_list[item]
    boxes, feats, (img_w, img_h) = self.visual_features[cxr_id]
    report = self.report_list[item]
    if self.model == 'uniter':
      boxes = self._uniterBoxes(boxes)
    if self.model == 'lxmert':
      # normalize a copy: in-place division would rescale the cached boxes again on every epoch
      boxes = boxes.copy()
      boxes[:, (0, 2)] /= img_w
      boxes[:, (1, 3)] /= img_h
    return cxr_id, feats, boxes, report, target

  def _uniterBoxes(self, boxes):
    # UNITER expects 7 values per box: the 4 box coordinates plus width, height, and area
    new_boxes = np.zeros((boxes.shape[0],7),dtype='float32')
    new_boxes[:,1] = boxes[:,0]
    new_boxes[:,0] = boxes[:,1]
    new_boxes[:,3] = boxes[:,2]
    new_boxes[:,2] = boxes[:,3]
    new_boxes[:,4] = new_boxes[:,3]-new_boxes[:,1] #w
    new_boxes[:,5] = new_boxes[:,2]-new_boxes[:,0] #h
    new_boxes[:,6]=new_boxes[:,4]*new_boxes[:,5] #area
    return new_boxes
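
As a worked example of the 7-value box layout, the sketch below repeats the arithmetic of `_uniterBoxes` as a free function and applies it to one made-up box. It assumes the stored boxes are ordered `[x1, y1, x2, y2]`, which the notebook does not state explicitly.

```python
import numpy as np

def to_uniter_boxes(boxes):
    # Same arithmetic as _uniterBoxes, written as a free function for illustration
    out = np.zeros((boxes.shape[0], 7), dtype="float32")
    out[:, 0] = boxes[:, 1]
    out[:, 1] = boxes[:, 0]
    out[:, 2] = boxes[:, 3]
    out[:, 3] = boxes[:, 2]
    out[:, 4] = out[:, 3] - out[:, 1]   # width
    out[:, 5] = out[:, 2] - out[:, 0]   # height
    out[:, 6] = out[:, 4] * out[:, 5]   # area
    return out

# One made-up box, assumed to be [x1, y1, x2, y2]
box = np.array([[10.0, 20.0, 110.0, 70.0]], dtype="float32")
uniter_box = to_uniter_boxes(box)
print(uniter_box)
```

Under that assumption the result is laid out as `[y1, x1, y2, x2, w, h, w*h]`, so the box above has width 100, height 50, and area 5000.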

In [15]:
training = OpenIDataset(df = openI, vf = openI_v_f,  split='train', model = args.model)
testing = OpenIDataset(df = openI, vf = openI_v_f,  split='test', model = args.model)

In [29]:
train_loader = DataLoader(training, batch_size=args.batch_size,shuffle=True, num_workers=0,drop_last=True, pin_memory=True)
test_loader = DataLoader(testing, batch_size=128,shuffle=False, num_workers=0,drop_last=False, pin_memory=True)

####**3. Model, Optimizer, Loss Function, and Evaluation Function**

In [50]:
from vqa_model import VQAModel
#init model
model = VQAModel(num_answers = len(findings), model = args.model)

In [51]:
#load pretrained weights
model.encoder.load(args.load_pretrained)

Load VISUALBERT PreTrained Model from /content/Transformers-VQA/models/pretrained/visualbert.th

Weights in loaded but not in model:
cls.predictions.bias
cls.predictions.decoder.weight
cls.predictions.transform.LayerNorm.bias
cls.predictions.transform.LayerNorm.weight
cls.predictions.transform.dense.bias
cls.predictions.transform.dense.weight
cls.seq_relationship.bias
cls.seq_relationship.weight

Weights in model but not in loaded:



In [52]:
#send to GPU
model = model.cuda()


In [53]:
import torch
loss = torch.nn.BCEWithLogitsLoss()
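
`BCEWithLogitsLoss` applies a sigmoid to each logit and, with its default `mean` reduction, averages the binary cross-entropy over every batch-by-label element. The NumPy sketch below reproduces that computation on made-up numbers; it is an illustration of the formula, not the PyTorch implementation.

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    per_element = (np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_element.mean()   # mean over batch and labels

# Made-up logits and targets: 2 samples, 3 labels
logits = np.array([[2.0, -1.0, 0.0], [-3.0, 0.5, 1.5]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

mean_loss = bce_with_logits(logits, targets)
# Multiplying by the number of labels gives the per-sample sum, averaged over the batch
per_sample_sum = mean_loss * logits.shape[1]
print(mean_loss, per_sample_sum)
```

This is why the training loop multiplies the loss by `logit.size(1)`: the value it tracks is the loss summed over the 15 labels, averaged over the batch.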

In [54]:
from src.optimization import BertAdam
optim = BertAdam(list(model.parameters()),lr=args.lr,warmup=0.1,t_total=len(train_loader)*args.epochs)
# t_total denotes total training steps
# batch_per_epoch = len(train_loader)
# t_total = int(batch_per_epoch * args.epochs)
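
The `warmup` and `t_total` arguments describe a learning-rate schedule over the whole run. The sketch below shows one common shape, a linear ramp followed by a linear decay; the function name and the value `t_total = 1000` are assumptions for illustration, and the exact schedule implemented in `src/optimization.py` may differ.

```python
def lr_multiplier(step, t_total, warmup=0.1):
    """Linear ramp from 0 to 1 during warmup, then linear decay back to 0."""
    progress = step / t_total
    if progress < warmup:
        return progress / warmup
    return max(0.0, (1.0 - progress) / (1.0 - warmup))

t_total = 1000   # hypothetical number of training steps
schedule = {step: lr_multiplier(step, t_total) for step in (0, 50, 100, 550, 1000)}
for step, mult in schedule.items():
    print(step, round(mult, 3))
```

With `warmup=0.1`, the first 10% of the steps raise the learning rate towards `args.lr`, which is why `t_total` has to match the real number of optimizer steps.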

In [55]:
# Evaluation function: reports the accuracy of each finding (AUC is not computed here)
def eval(target, pred):
    acc_list = []
    for i, d in enumerate(findings[:-1]): #normal is excluded
        acc = np.mean(target[:,i] == (pred[:,i]>=0.5))
        print(i,d,acc)
        acc_list.append(acc)
    print('Averaged: '+str(np.average(acc_list)))
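
The function above reports accuracy at a 0.5 threshold. For rare findings, accuracy can be high even when no positive case is detected, so a ranking metric such as AUC is a useful complement. The sketch below is one simple way to compute it with NumPy; `binary_auc` and the scores are illustrative and not part of the repository.

```python
import numpy as np

def binary_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")   # undefined when only one class is present
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Made-up labels and scores for a single rare finding
y_true = np.array([0, 0, 0, 0, 1])
y_score = np.array([0.1, 0.2, 0.05, 0.3, 0.4])

accuracy = np.mean(y_true == (y_score >= 0.5))
auc = binary_auc(y_true, y_score)
print(accuracy, auc)
```

Here every score is below 0.5, so the single positive case is missed at that threshold, yet it is still ranked above all negatives. The pairwise comparison is quadratic in the number of samples, which is acceptable for a test split of this size.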

In [56]:
sgmd = torch.nn.Sigmoid()

####**4. HIT and RUN**

In [57]:
from tqdm.notebook import tqdm

iter_wrapper = (lambda x: tqdm(x, total=len(train_loader))) if args.tqdm else (lambda x: x)
best_valid = 0
for epoch in range(args.epochs):
  epoch_loss = 0
  for i, (cxr_id, feats, boxes, report, target) in iter_wrapper(enumerate(train_loader)):
    model.train()
    optim.zero_grad()
    feats, boxes, target = feats.cuda(), boxes.cuda(), target.cuda()
    logit = model(feats, boxes, report)
    running_loss = loss(logit, target)
    running_loss = running_loss * logit.size(1)
    epoch_loss += running_loss
    running_loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 5.)
    optim.step()
  print("Epoch "+str(epoch)+": Training Loss: "+str(epoch_loss/len(train_loader)))
  print('Evaluation: ')
  model.eval()
  logit_list, target_list = [], []
  # use a separate wrapper so the training bar keeps total=len(train_loader) in later epochs
  eval_wrapper = (lambda x: tqdm(x, total=len(test_loader)))
  for i, (cxr_id, feats, boxes, report, target) in eval_wrapper(enumerate(test_loader)):
    target_list.append(target)
    with torch.no_grad():
      feats, boxes = feats.cuda(), boxes.cuda()
      logit = model(feats, boxes, report)
      logit_list.append(sgmd(logit).cpu().numpy())

  eval(np.concatenate(target_list,axis = 0), np.concatenate(logit_list,axis = 0))



Epoch 0: Training Loss: tensor(2.2392, device='cuda:0', grad_fn=<DivBackward0>)
Evaluation: 




0 Atelectasis 0.8989637305699482
1 Cardiomegaly 0.8963730569948186
2 Effusion 0.9455958549222798
3 Infiltration 0.9715025906735751
4 Mass 0.9961139896373057
5 Nodule 0.966321243523316
6 Pneumonia 0.9857512953367875
7 Pneumothorax 0.9935233160621761
8 Consolidation 0.9857512953367875
9 Edema 0.9805699481865285
10 Emphysema 0.9676165803108808
11 Fibrosis 0.9935233160621761
12 Pleural_Thickening 0.9792746113989638
13 Hernia 1.0
Averaged: 0.9686343449296818




Epoch 1: Training Loss: tensor(1.8431, device='cuda:0', grad_fn=<DivBackward0>)
Evaluation: 




0 Atelectasis 0.8989637305699482
1 Cardiomegaly 0.8963730569948186
2 Effusion 0.9455958549222798
3 Infiltration 0.9715025906735751
4 Mass 0.9961139896373057
5 Nodule 0.966321243523316
6 Pneumonia 0.9857512953367875
7 Pneumothorax 0.9935233160621761
8 Consolidation 0.9857512953367875
9 Edema 0.9805699481865285
10 Emphysema 0.9676165803108808
11 Fibrosis 0.9935233160621761
12 Pleural_Thickening 0.9792746113989638
13 Hernia 1.0
Averaged: 0.9686343449296818
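
The training loop calls `torch.nn.utils.clip_grad_norm_(model.parameters(), 5.)` before each optimizer step. The NumPy sketch below illustrates the idea of clipping by the global norm: all gradients are scaled by one shared factor so that their combined L2 norm does not exceed the limit. `clip_by_global_norm` and the gradient values are made up for illustration.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

# Made-up gradients for two parameter tensors; their global norm is 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(norm_before, norm_after)
```

Because a single factor is applied to every tensor, the direction of the overall update is preserved and only its length is capped.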
