Overview:

Each of the 143 features is assigned a unique class, and a 144th class (NONE) is added to represent no feature. Each word is labeled with a feature's class if the word appears in that feature's annotation, and with NONE otherwise. Patient notes are encoded with the bert-base-uncased tokenizer (integer tokens representing vocabulary indices). The tokens of a whole note are fed together into a BERT model, which produces a 768-dimensional hidden vector for each word in the note; the entire note is one batch. Each hidden vector is fed into a fully connected linear layer, which outputs one score per class for each word. A softmax turns these scores into class probabilities, and cross-entropy loss is computed between them and the integer class index of each word (its index in the softmax output). In PyTorch, nn.CrossEntropyLoss applies the log-softmax itself, so the model should pass it raw scores (logits) rather than softmax outputs.
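
A minimal pure-Python sketch of the per-word loss, using made-up scores for four toy classes instead of 144: cross-entropy is the negative log of the softmax probability at the target class index. The function name cross_entropy and the numbers are illustrative only; nn.CrossEntropyLoss performs the same computation from raw logits and averages it over the words in the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one word: -log(softmax(logits)[target])."""
    # Log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax at the target index = log_sum_exp - logit of the target
    return log_sum_exp - logits[target]

# Toy example: 4 classes, with index 3 playing the role of NONE
logits = [2.0, 0.5, -1.0, 0.0]
loss = cross_entropy(logits, target=0)
print(loss)
```

The loss is small when the target class already has the largest score and grows as the target's probability shrinks.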

Cleaning:
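1. Rows of the training dataframe with blank annotations ('[]') are removed.
2. The '[]' brackets and quotes that enclose each annotation are removed, and the annotation is split into words on ' '.
3. Notes are split into lists of words on ' '.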
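
The cleaning steps above can be sketched in plain Python. The helper names clean_annotation and split_note are hypothetical, and the annotation string is a made-up example in the same bracketed format; the translate/maketrans approach mirrors the notebook's own cleaning line.

```python
def clean_annotation(raw):
    """Strip '[', ']' and single quotes, then split into words on ' '."""
    # A third argument to str.maketrans lists characters to delete
    table = str.maketrans('', '', "[]'")
    return raw.translate(table).split(' ')

def split_note(note):
    """Split a patient note into words on single spaces."""
    return note.split(' ')

# Hypothetical annotation containing two spans
words = clean_annotation("['chest pain', 'no fever']")
print(words)
```

The result keeps the comma from the list separator attached to 'pain,', and splitting on a single space turns a double space into an empty-string word; both are side effects of splitting only on ' '.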


Dataflow:
1. Raw patient notes --> cleaned patient notes
2. Cleaned patient notes --> list of words (split on ' ')
3. List of words --BERT-tokenizer--> list of token ids (one per word, wrapped in [CLS] and [SEP])
4. List of token ids --BERT-model--> 768-dimensional hidden vector per word
5. 768-dimensional hidden vector --linear-layer--> {CLASS_NUM}-dimensional vector of class scores
6. {CLASS_NUM}-dimensional score vector --softmax--> class probability vector
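
Steps 5 and 6 act on each word's hidden vector independently. A minimal pure-Python sketch of those two steps for one word, with toy sizes (4 hidden dimensions instead of 768, 3 classes instead of CLASS_NUM) and made-up weights. The real layer is defined in the custom bert_nbme module, whose internals are not shown here, so this only illustrates the arithmetic.

```python
import math

def linear(h, W, b):
    """Fully connected layer: z[c] = sum_j W[c][j] * h[j] + b[c], one score per class."""
    return [sum(w_j * h_j for w_j, h_j in zip(row, h)) + b_c for row, b_c in zip(W, b)]

def softmax(z):
    """Turn class scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(z_c - m) for z_c in z]
    total = sum(exps)
    return [e / total for e in exps]

# Toy hidden vector for one word and a 3-class linear layer
h = [0.1, -0.2, 0.3, 0.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0]
probs = softmax(linear(h, W, b))
predicted_class = probs.index(max(probs))
```

The predicted class for the word is the index of the largest probability, which is also the index of the largest raw score.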

In [None]:
from transformers import BertTokenizer
import numpy as np
import pandas as pd
import torch
from torch import nn
import matplotlib.pyplot as plt

# Custom imports
import bert_nbme

In [None]:
# Define globals
CONFIG = 'bert-base-uncased'
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'DEVICE: {DEVICE}')

# Import data
notes_df = pd.read_csv('data/patient_notes.csv')
train_df = pd.read_csv('data/train.csv')
features_df = pd.read_csv('data/features.csv')

In [None]:
none_row = pd.DataFrame({'feature_num': [-1], 'case_num': [-1], 'feature_text': ['NONE']}, index=[len(features_df)])
features_df = pd.concat((features_df, none_row))  # Add NONE value as a feature
features_df['feature_index'] = range(len(features_df))

# APPEND AND CLEAN DATA
data = train_df[train_df['annotation'] != '[]'].copy()  # Drop blank annotations ('[]'); copy so the next line edits a real DataFrame
data['annotation'] = [i.translate(str.maketrans('', '', '[]\'')).split(' ') for i in data['annotation']]  # Strip brackets/quotes, split into words
data = data.merge(features_df[['feature_num', 'feature_text', 'feature_index']], on='feature_num')  # Add features
data = data.merge(notes_df[['pn_num', 'pn_history']], on='pn_num')  # Add notes
data = data.dropna().reset_index(drop=True)  # Drop and reindex leftover trouble-makers BEFORE building word lists so rows stay aligned
# seps = [' ', ',', ';', ':', '.', '!', '?', '-', '_', '\n']  # WORRY ABOUT THIS LATER
word_lists = data['pn_history'].apply(lambda x: np.array(x.split(' '))).to_numpy()  # Convert notes to lists of words

In [None]:
none_ind = len(features_df) - 1  # Vector value for NONE
y = []
for i, note in enumerate(word_lists):
    word_labels = [none_ind]  # Pad first with NONE bc start token [CLS] added
    for word in note:
        if word in data['annotation'].iloc[i]:
            word_labels.append(data['feature_index'].iloc[i])
        else:
            word_labels.append(none_ind)
    word_labels.append(none_ind)  # Pad last with NONE bc end token [SEP] added
    y.append(torch.tensor(word_labels, dtype=torch.long, device=DEVICE))  # Works on CPU or GPU
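
The labeling loop above can be isolated as a small pure-Python function. The name label_words is hypothetical, and it swaps the list membership test for a set lookup, which gives the same labels. With 143 real features plus NONE, the NONE index is 143.

```python
def label_words(words, annotation_words, feature_index, none_ind):
    """Per-word labels for one note, padded with NONE for [CLS] and [SEP]."""
    annotated = set(annotation_words)  # same membership result as a list, faster lookups
    labels = [none_ind]  # position of the [CLS] token
    for word in words:
        labels.append(feature_index if word in annotated else none_ind)
    labels.append(none_ind)  # position of the [SEP] token
    return labels

# Toy note and annotation for a hypothetical feature with index 7
labels = label_words(['pt', 'reports', 'chest', 'pain'], ['chest', 'pain'], feature_index=7, none_ind=143)
print(labels)
```

Because membership is checked word by word, every occurrence of an annotated word is labeled, even outside the annotated span: with the annotation ['chest', 'pain'], the 'pain' in a phrase like 'denies pain' would also receive the feature label.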

In [None]:
# Tokenize word lists (encode() adds [CLS]/[SEP]; each word in the list is looked up as one vocabulary token)
tokenizer = BertTokenizer.from_pretrained(CONFIG)
encoded_word_lists = [tokenizer.encode(x.tolist()) for x in word_lists]

# Cast features to (1, seq_len) tensors of token ids on DEVICE
X = [torch.tensor([x], dtype=torch.long, device=DEVICE) for x in encoded_word_lists]
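
When a slow transformers tokenizer's encode() receives a list of strings without is_split_into_words=True, it treats each string as an already-formed token and looks it up directly in the vocabulary. The toy simulation below (a hypothetical six-entry vocabulary and made-up ids, not the library's code) shows the consequence: exactly one id per word, which keeps X aligned with y, but words that are capitalized or carry punctuation fall back to [UNK].

```python
# Hypothetical toy vocabulary; the real bert-base-uncased vocabulary has about 30k entries
VOCAB = {'[PAD]': 0, '[UNK]': 1, '[CLS]': 2, '[SEP]': 3, 'chest': 4, 'pain': 5}

def encode_pretokenized(words, vocab):
    """Mimic encoding words treated as ready-made tokens:
    each word is looked up as-is, and unknown words become [UNK]."""
    ids = [vocab.get(w, vocab['[UNK]']) for w in words]
    return [vocab['[CLS]']] + ids + [vocab['[SEP]']]

ids = encode_pretokenized(['Chest', 'pain,', 'pain'], VOCAB)
print(ids)
```

Passing is_split_into_words=True would instead run WordPiece on every word and usually produce more ids than words, so the per-word labels would then have to be expanded to sub-word positions (for example using word_ids() on a fast tokenizer's output).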

In [None]:
print(len(X), len(y))  # Should match: one token-id tensor per label tensor

In [None]:
# Model params
LEARNING_RATE = 2e-5  # Adam fine-tuning of BERT typically uses ~2e-5 to 5e-5; a rate of 10 would diverge
EPOCHS = 10

# Loss function, model, and optimizer
criterion = nn.CrossEntropyLoss()
model = bert_nbme.BertNN(num_classes=len(features_df), bert_config=CONFIG).to(DEVICE)
optimizer = torch.optim.Adam(lr=LEARNING_RATE, params=model.parameters())

# Loss history over epochs for plotting
train_loss_history = []
val_loss_history = []

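for epoch in range(EPOCHS):
    print(f'EPOCH: {epoch}')
    model.train()  # Training mode (dropout on); transformers' from_pretrained() returns models in eval mode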
    
    # Initialize single-epoch loss
    epoch_train_loss = []
    epoch_val_loss = []
    
    for note, target in zip(X, y):
        # Zero out gradient every batch
        model.zero_grad()
        
        # Make predictions
        pred = model(note)
        
        # Calculate loss
        loss = criterion(pred, target)
        
        # Take train step
        loss.backward()
        optimizer.step()
        
        # Compile loss
        epoch_train_loss.append(loss.item())
    
    # Append average loss over epoch to history
    train_loss_history.append(sum(epoch_train_loss) / len(epoch_train_loss))
    print(f'LOSS: {train_loss_history[-1]}')

In [None]:
# Plot loss over epochs
plt.figure()
plt.title('Cross Entropy Loss')
plt.plot(range(EPOCHS), train_loss_history, label='Train loss', color='r', lw=3)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.grid()
plt.legend()
plt.show()

Batch = 1 note
Sample = 1 word

Each word needs its own vector and label (MOST WILL BE NONE).
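
Because most words are labeled NONE, the loss is dominated by that class. One common mitigation, not used in the notebook above, is to weight each class by its inverse frequency. A minimal pure-Python sketch, with a hypothetical helper name and toy labels for two notes and three classes (class 2 standing in for NONE):

```python
from collections import Counter

def inverse_frequency_weights(label_lists, num_classes):
    """Weight for class c = total / (num_classes * count[c]); classes never seen get 0."""
    counts = Counter(label for labels in label_lists for label in labels)
    total = sum(counts.values())
    return [total / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]

# Toy labels: NONE (class 2) dominates both notes
weights = inverse_frequency_weights([[2, 0, 2, 2], [2, 1, 2, 2]], num_classes=3)
print(weights)
```

The resulting list could be passed as nn.CrossEntropyLoss(weight=torch.tensor(weights, device=DEVICE)); weight is a documented argument of that loss and expects a float tensor with one entry per class. Rare feature classes then contribute more to the loss than the abundant NONE class.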