## Import Libraries

It is worth checking the HuggingFace Transformers course:

https://huggingface.co/course

In [None]:
!pip install transformers
import pandas as pd
import numpy as np
import tensorflow as tf
import torch
from torch.nn import BCEWithLogitsLoss, BCELoss
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import classification_report, confusion_matrix, multilabel_confusion_matrix, f1_score, accuracy_score
import pickle
import transformers
from tqdm import tqdm, trange
from ast import literal_eval



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
torch.cuda.get_device_name(0)

'Tesla T4'

## Load and Preprocess Training Data

The dataset is tokenized and then split into training and validation sets. The validation set is used to monitor training; a separate test set is loaded later for the final evaluation.

In [None]:
train_set = "/content/drive/MyDrive/NLP_Applications_1/DATA/2023-ILTAPP-20230203T201734Z-001/2023-ILTAPP/datasets/absa2016/en-train-acd-multilabel-transformers.csv"

In [None]:
# the file might have to be called train.csv
df = pd.read_csv(train_set)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1708 entries, 0 to 1707
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   id                        1708 non-null   int64 
 1   comment_text              1708 non-null   object
 2   AMBIENCE#GENERAL          1708 non-null   int64 
 3   DRINKS#PRICES             1708 non-null   int64 
 4   DRINKS#QUALITY            1708 non-null   int64 
 5   DRINKS#STYLE_OPTIONS      1708 non-null   int64 
 6   FOOD#PRICES               1708 non-null   int64 
 7   FOOD#QUALITY              1708 non-null   int64 
 8   FOOD#STYLE_OPTIONS        1708 non-null   int64 
 9   LOCATION#GENERAL          1708 non-null   int64 
 10  RESTAURANT#GENERAL        1708 non-null   int64 
 11  RESTAURANT#MISCELLANEOUS  1708 non-null   int64 
 12  RESTAURANT#PRICES         1708 non-null   int64 
 13  SERVICE#GENERAL           1708 non-null   int64 
dtypes: int64(13), object(1)


Unnamed: 0,id,comment_text,AMBIENCE#GENERAL,DRINKS#PRICES,DRINKS#QUALITY,DRINKS#STYLE_OPTIONS,FOOD#PRICES,FOOD#QUALITY,FOOD#STYLE_OPTIONS,LOCATION#GENERAL,RESTAURANT#GENERAL,RESTAURANT#MISCELLANEOUS,RESTAURANT#PRICES,SERVICE#GENERAL
0,2202,Judging from previous posts this used to be a ...,0,0,0,0,0,0,0,0,1,0,0,0
1,9326,"We, there were four of us, arrived at noon - t...",0,0,0,0,0,0,0,0,0,0,0,1
2,1034,"They never brought us complimentary noodles, i...",0,0,0,0,0,0,0,0,0,0,0,1
3,4180,The food was lousy - too sweet or too salty an...,0,0,0,0,0,1,1,0,0,0,0,0
4,1932,"After all that, they complained to me about th...",0,0,0,0,0,0,0,0,0,0,0,1


In [None]:
print('Unique comments: ', df.comment_text.nunique() == df.shape[0])
print('Null values: ', df.isnull().values.any())
# df[df.isna().any(axis=1)]

Unique comments:  False
Null values:  False


In [None]:
print('average sentence length: ', df.comment_text.str.split().str.len().mean())
print('stdev sentence length: ', df.comment_text.str.split().str.len().std())

average sentence length:  12.507611241217798
stdev sentence length:  8.285011666209952


In [None]:
cols = df.columns
label_cols = list(cols[2:])
num_labels = len(label_cols)
print('Label columns: ', label_cols)

Label columns:  ['AMBIENCE#GENERAL', 'DRINKS#PRICES', 'DRINKS#QUALITY', 'DRINKS#STYLE_OPTIONS', 'FOOD#PRICES', 'FOOD#QUALITY', 'FOOD#STYLE_OPTIONS', 'LOCATION#GENERAL', 'RESTAURANT#GENERAL', 'RESTAURANT#MISCELLANEOUS', 'RESTAURANT#PRICES', 'SERVICE#GENERAL']


In [None]:
print('Count of 1 per label: \n', df[label_cols].sum(), '\n') # Label counts, may need to downsample or upsample
print('Count of 0 per label: \n', df[label_cols].eq(0).sum())

Count of 1 per label: 
 AMBIENCE#GENERAL            226
DRINKS#PRICES                20
DRINKS#QUALITY               46
DRINKS#STYLE_OPTIONS         30
FOOD#PRICES                  82
FOOD#QUALITY                681
FOOD#STYLE_OPTIONS          128
LOCATION#GENERAL             28
RESTAURANT#GENERAL          421
RESTAURANT#MISCELLANEOUS     97
RESTAURANT#PRICES            80
SERVICE#GENERAL             419
dtype: int64 

Count of 0 per label: 
 AMBIENCE#GENERAL            1482
DRINKS#PRICES               1688
DRINKS#QUALITY              1662
DRINKS#STYLE_OPTIONS        1678
FOOD#PRICES                 1626
FOOD#QUALITY                1027
FOOD#STYLE_OPTIONS          1580
LOCATION#GENERAL            1680
RESTAURANT#GENERAL          1287
RESTAURANT#MISCELLANEOUS    1611
RESTAURANT#PRICES           1628
SERVICE#GENERAL             1289
dtype: int64
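
The counts above show a strong imbalance: some labels have only a few dozen positive examples. Besides the resampling mentioned in the code comment, one common option is to weight positive examples in the loss; PyTorch's `BCEWithLogitsLoss` accepts a `pos_weight` tensor with one entry per label. The sketch below is not something this notebook applies. It only shows how such weights could be derived, using the negatives/positives ratio as an assumed heuristic and a hypothetical helper name, with counts copied from the output above for three labels.

```python
# Positive/negative counts copied from the output above (three labels shown).
pos_counts = {"DRINKS#PRICES": 20, "FOOD#QUALITY": 681, "SERVICE#GENERAL": 419}
neg_counts = {"DRINKS#PRICES": 1688, "FOOD#QUALITY": 1027, "SERVICE#GENERAL": 1289}

def compute_pos_weights(pos, neg):
    """Return the negatives/positives ratio per label (hypothetical helper)."""
    return {label: neg[label] / pos[label] for label in pos}

pos_weights = compute_pos_weights(pos_counts, neg_counts)
for label, weight in pos_weights.items():
    print(f"{label}: {weight:.2f}")
```

If this route were taken, the weights for all twelve labels, in `label_cols` order, could be passed as `BCEWithLogitsLoss(pos_weight=torch.tensor(weights).to(device))`.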


In [None]:
df = df.sample(frac=1).reset_index(drop=True) #shuffle rows

## ASSIGNMENT 1 

+ TODO: Generate an extra column in the pandas dataframe containing:
++ one_hot_labels as header.
++ the list of aspect values extracted from each aspect column.

The dataframe obtained should be as follows:

In [None]:

# One list of 0/1 values per row, in the same order as label_cols
df["one_hot_labels"] = df[label_cols].values.tolist()
display(df)

Unnamed: 0,id,comment_text,AMBIENCE#GENERAL,DRINKS#PRICES,DRINKS#QUALITY,DRINKS#STYLE_OPTIONS,FOOD#PRICES,FOOD#QUALITY,FOOD#STYLE_OPTIONS,LOCATION#GENERAL,RESTAURANT#GENERAL,RESTAURANT#MISCELLANEOUS,RESTAURANT#PRICES,SERVICE#GENERAL,one_hot_labels
0,4549,I highly recommend Caviar Russe to anyone who ...,0,0,0,0,0,1,0,0,0,0,0,1,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]"
1,4477,"I recommend the jelly fish, drunken chicken an...",0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
2,4062,We were charged full price.,0,0,0,0,0,0,0,0,0,0,0,1,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]"
3,9977,I'm partial to the Gnocchi.,0,0,0,0,0,0,0,0,1,0,0,0,"[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]"
4,5549,I have lived in Japan for 7 years and the tast...,1,0,0,0,0,1,0,0,0,0,0,0,"[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1703,6509,We also asked for Hooka six times and the wait...,0,0,0,0,0,0,0,0,0,0,0,1,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]"
1704,6120,"The food is flavorful, plentiful and reasonabl...",0,0,0,0,1,1,1,0,0,0,0,0,"[0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]"
1705,6761,"the pad se ew chicken was delicious, however t...",0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
1706,8124,Service- friendly and attentive.,0,0,0,0,0,0,0,0,0,0,0,1,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]"


In [None]:
labels = list(df.one_hot_labels.values)
comments = list(df.comment_text.values)

Load the pretrained tokenizer that corresponds to your choice of model, e.g.:

```
BERT:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True) 


RoBERTa:
tokenizer = RobertaTokenizer.from_pretrained('roberta-base', do_lower_case=False)
```


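NOTE: To avoid memory issues on Google Colab, every input is truncated to a fixed max_length (100 tokens in the original setup; the tokenization cell below sets it to 64). Sentences longer than that are cut off, so some may no longer adequately represent each of their labels.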
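
To judge how much a given max_length actually cuts off, it can help to look at the share of sequences that exceed each candidate value. The following is a minimal sketch with a hypothetical helper and toy lengths; it is not part of the original notebook.

```python
def truncation_rates(token_lengths, candidates=(64, 100, 128)):
    """Fraction of sequences longer than each candidate max_length."""
    n = len(token_lengths)
    return {m: sum(length > m for length in token_lengths) / n for m in candidates}

# Toy lengths for illustration only; real ones would come from the tokenizer.
example_lengths = [10, 70, 110, 130]
print(truncation_rates(example_lengths))  # {64: 0.75, 100: 0.5, 128: 0.25}
```

With a tokenizer instantiated, the real lengths could be gathered as `[len(ids) for ids in tokenizer(comments)['input_ids']]`, which counts the special tokens as well.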

## ASSIGNMENT 2 

+ TODO: Instantiate the tokenizer from "bert-base-uncased" model in lowercase mode. HINT: Check huggingface course on tokenizers.
+ TODO: Investigate how defining different max_lengths affects performance on the test set evaluation. You may try values of 64 and 128 (in addition to 100).


In [None]:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True) 

In [None]:
# Tokenize all comments, truncating and padding every sequence to max_length
max_length = 64
encodings = tokenizer.batch_encode_plus(comments, truncation=True, max_length=max_length, padding='max_length') # pad_to_max_length is deprecated
print('tokenizer outputs: ', encodings.keys())



tokenizer outputs:  dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
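
The three keys hold, per comment, the token ids, the segment ids (all zeros for single sentences) and the attention mask that separates real tokens from padding. The sketch below imitates what padding to a fixed length produces for the first and last of these. It is a simplification with a hypothetical helper: 101 and 102 are the `[CLS]` and `[SEP]` ids of `bert-base-uncased`, the ids in between are made up, and unlike this helper the real tokenizer keeps the final `[SEP]` when it truncates.

```python
def pad_and_mask(ids, max_length, pad_id=0):
    """Truncate or right-pad `ids` to max_length and build the attention mask."""
    ids = ids[:max_length]
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad

toy_ids, toy_mask = pad_and_mask([101, 2023, 2003, 102], max_length=6)
print(toy_ids)   # [101, 2023, 2003, 102, 0, 0]
print(toy_mask)  # [1, 1, 1, 1, 0, 0]
```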


In [None]:
input_ids = encodings['input_ids'] # tokenized and encoded sentences
token_type_ids = encodings['token_type_ids'] # token type ids
attention_masks = encodings['attention_mask'] # attention masks

In [None]:
# Identifying indices of 'one_hot_labels' entries that only occur once - this will allow us to stratify split our training data later
label_counts = df.one_hot_labels.astype(str).value_counts()
one_freq = label_counts[label_counts==1].keys()
one_freq_idxs = sorted(list(df[df.one_hot_labels.astype(str).isin(one_freq)].index), reverse=True)
print('df label indices with only one instance: ', one_freq_idxs)

df label indices with only one instance:  [1662, 1617, 1607, 1518, 1479, 1435, 1396, 1362, 1351, 1310, 1284, 1243, 1091, 1084, 1083, 1045, 1037, 1024, 1021, 1015, 980, 967, 953, 781, 713, 711, 680, 640, 590, 519, 514, 505, 497, 482, 472, 423, 397, 388, 379, 325, 307, 265, 249, 203, 65]
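
A stratified split needs at least two examples of every class, and here each distinct label combination acts as a class. Rows whose combination occurs only once are therefore set aside and added to the training set afterwards. The same idea on toy data, with a hypothetical helper name:

```python
from collections import Counter

def singleton_indices(label_vectors):
    """Indices of rows whose exact label combination appears only once."""
    keys = [tuple(v) for v in label_vectors]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] == 1]

toy_labels = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 1]]
print(singleton_indices(toy_labels))  # [3]
```

The notebook sorts these indices in descending order because removing items with `pop` from the highest index down leaves the remaining lower indices valid.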


In [None]:
# Gathering single instance inputs to force into the training set after stratified split
one_freq_input_ids = [input_ids.pop(i) for i in one_freq_idxs]
one_freq_token_types = [token_type_ids.pop(i) for i in one_freq_idxs]
one_freq_attention_masks = [attention_masks.pop(i) for i in one_freq_idxs]
one_freq_labels = [labels.pop(i) for i in one_freq_idxs]

Pass `stratify` to the train/validation split so that the label combinations are represented proportionally in the validation set:

In [None]:
# Use train_test_split to split our data into train and validation sets

train_inputs, validation_inputs, train_labels, validation_labels, train_token_types, validation_token_types, train_masks, validation_masks = train_test_split(input_ids, labels, token_type_ids,attention_masks,
                                                            random_state=2020, test_size=0.10, stratify = labels)

# Add one frequency data to train data
train_inputs.extend(one_freq_input_ids)
train_labels.extend(one_freq_labels)
train_masks.extend(one_freq_attention_masks)
train_token_types.extend(one_freq_token_types)

# Convert all of our data into torch tensors, the required datatype for our model
train_inputs = torch.tensor(train_inputs)
train_labels = torch.tensor(train_labels)
train_masks = torch.tensor(train_masks)
train_token_types = torch.tensor(train_token_types)

validation_inputs = torch.tensor(validation_inputs)
validation_labels = torch.tensor(validation_labels)
validation_masks = torch.tensor(validation_masks)
validation_token_types = torch.tensor(validation_token_types)

In [None]:
# Select a batch size for training.
batch_size = 16

# Wrap the tensors in torch DataLoaders so that training and validation iterate over mini-batches.
# The training set is reshuffled every epoch (RandomSampler); the validation set keeps its order (SequentialSampler).

train_data = TensorDataset(train_inputs, train_masks, train_labels, train_token_types)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels, validation_token_types)
validation_sampler = SequentialSampler(validation_data)
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)

In [None]:
torch.save(validation_dataloader,'validation_data_loader')
torch.save(train_dataloader,'train_data_loader')

## Load Model & Set Params

Load the appropriate model below. Each model already includes a single dense classification layer on top of the encoder.



```
BERT:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=num_labels)

RoBERTa:
model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=num_labels)
```



## ASSIGNMENT 3

+ TODO: load the model for SequenceClassification corresponding to the tokenizer instantiated above.

In [None]:
from transformers import BertForSequenceClassification

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=num_labels)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [None]:

model.cuda()

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

Setting custom optimization parameters for the AdamW optimizer https://huggingface.co/transformers/main_classes/optimizer_schedules.html

In [None]:
# Custom parameter groups: apply weight decay to all parameters except biases and LayerNorm weights.
# You may implement a scheduler here as well.
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},  # torch.optim.AdamW reads the 'weight_decay' key
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0}
]
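
The grouping above works by substring matching on parameter names. A small sketch on plain name strings makes the effect visible without loading a model. The helper is hypothetical, the names follow the module tree printed above (`classifier.weight` is assumed to be the name of the classification layer's weight), and the default `no_decay` entries are an assumption based on the `LayerNorm.weight` / `LayerNorm.bias` naming seen in the checkpoint warning.

```python
def split_decay_groups(param_names, no_decay=("bias", "LayerNorm.weight")):
    """Split names into (with weight decay, without weight decay)."""
    decay = [n for n in param_names if not any(nd in n for nd in no_decay)]
    skip = [n for n in param_names if any(nd in n for nd in no_decay)]
    return decay, skip

names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.embeddings.LayerNorm.weight",
    "classifier.weight",
]
decay, skip = split_decay_groups(names)
print("decay:", decay)
print("no decay:", skip)
```

Printing `[n for n, _ in model.named_parameters()]` on the real model is a quick way to check that every `no_decay` entry matches at least one parameter.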

In [None]:
# The model is a PyTorch module, so the optimizer must come from torch
# (tensorflow_addons optimizers cannot update torch parameters).
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # Default optimization
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=2e-5)
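
The notebook mentions that a scheduler may be added, and the training loop contains a commented-out `scheduler.step()`. A common choice for fine-tuning is linear warmup followed by linear decay. The sketch below computes only the learning-rate multiplier, in plain Python; the step counts are assumed values, not derived from this dataset.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total_steps = 400   # assumed: batches per epoch * epochs
warmup_steps = 40   # assumed: 10% of the steps used for warmup
for step in (0, 20, 40, 220, 400):
    print(step, linear_warmup_decay(step, warmup_steps, total_steps))
```

In PyTorch such a function could be wrapped as `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: linear_warmup_decay(s, warmup_steps, total_steps))`; `transformers` also provides `get_linear_schedule_with_warmup` for the same shape of schedule.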

## Train Model

In [None]:
# Store our loss and accuracy for plotting
train_loss_set = []

# Number of training epochs (authors recommend between 2 and 4)
epochs = 4

# trange is a tqdm wrapper around the normal python range
for _ in trange(epochs, desc="Epoch"):

  # Training
  
  # Set our model to training mode (as opposed to evaluation mode)
  model.train()

  # Tracking variables
  tr_loss = 0 #running loss
  nb_tr_examples, nb_tr_steps = 0, 0
  
  # Train the data for one epoch
  for step, batch in enumerate(train_dataloader):
    # Add batch to GPU
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels, b_token_types = batch
    # Clear out the gradients (by default they accumulate)
    optimizer.zero_grad()

    # # Forward pass for multiclass classification
    # outputs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
    # loss = outputs[0]
    # logits = outputs[1]

    # Forward pass for multilabel classification
    outputs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
    logits = outputs[0]
    loss_func = BCEWithLogitsLoss() 
    loss = loss_func(logits.view(-1,num_labels),b_labels.type_as(logits).view(-1,num_labels)) #convert labels to float for calculation
    # loss_func = BCELoss() 
    # loss = loss_func(torch.sigmoid(logits.view(-1,num_labels)),b_labels.type_as(logits).view(-1,num_labels)) #convert labels to float for calculation
    train_loss_set.append(loss.item())    

    # Backward pass
    loss.backward()
    # Update parameters and take a step using the computed gradient
    optimizer.step()
    # scheduler.step()
    # Update tracking variables
    tr_loss += loss.item()
    nb_tr_examples += b_input_ids.size(0)
    nb_tr_steps += 1

  print("Train loss: {}".format(tr_loss/nb_tr_steps))

###############################################################################

  # Validation

  # Put model in evaluation mode to evaluate loss on the validation set
  model.eval()

  # Variables to gather full output
  logit_preds,true_labels,pred_labels,tokenized_texts = [],[],[],[]

  # Predict
  for i, batch in enumerate(validation_dataloader):
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels, b_token_types = batch
    with torch.no_grad():
      # Forward pass
      outs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
      b_logit_pred = outs[0]
      pred_label = torch.sigmoid(b_logit_pred)

      b_logit_pred = b_logit_pred.detach().cpu().numpy()
      pred_label = pred_label.to('cpu').numpy()
      b_labels = b_labels.to('cpu').numpy()

    tokenized_texts.append(b_input_ids)
    logit_preds.append(b_logit_pred)
    true_labels.append(b_labels)
    pred_labels.append(pred_label)

  # Flatten outputs
  pred_labels = [item for sublist in pred_labels for item in sublist]
  true_labels = [item for sublist in true_labels for item in sublist]

  # Calculate micro-averaged F1 and subset (exact-match) accuracy
  threshold = 0.50
  pred_bools = [pl>threshold for pl in pred_labels]
  true_bools = [tl==1 for tl in true_labels]
  val_f1_accuracy = f1_score(true_bools,pred_bools,average='micro')*100
  val_flat_accuracy = accuracy_score(true_bools, pred_bools)*100

  print('F1 Micro Validation: ', val_f1_accuracy)
  print('Subset Accuracy Validation: ', val_flat_accuracy)

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Train loss: 0.3854907732648948


Epoch:  25%|██▌       | 1/4 [00:18<00:54, 18.05s/it]

F1 Micro Validation:  38.970588235294116
Subset Accuracy Validation:  19.760479041916167
Train loss: 0.24651608147571996


Epoch:  50%|█████     | 2/4 [00:36<00:36, 18.13s/it]

F1 Micro Validation:  66.26506024096386
Subset Accuracy Validation:  47.30538922155689
Train loss: 0.19066064735663305


Epoch:  75%|███████▌  | 3/4 [00:54<00:18, 18.25s/it]

F1 Micro Validation:  73.6842105263158
Subset Accuracy Validation:  59.88023952095808
Train loss: 0.15144982680524746


Epoch: 100%|██████████| 4/4 [01:13<00:00, 18.32s/it]

F1 Micro Validation:  75.93582887700533
Subset Accuracy Validation:  61.07784431137725
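
The training loop uses `BCEWithLogitsLoss`, which treats each of the labels as an independent yes/no decision and averages the binary cross-entropy over every (example, label) pair. As a worked illustration, the sketch below recomputes that quantity in plain Python using the usual numerically stable form, max(x, 0) - x*y + log(1 + exp(-|x|)); it is a didactic re-implementation under that assumption, not the library code.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (example, label) pairs."""
    total, count = 0.0, 0
    for row_x, row_y in zip(logits, targets):
        for x, y in zip(row_x, row_y):
            # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count

# A logit of 0 means probability 0.5 for every label, so the loss is log(2).
print(bce_with_logits([[0.0, 0.0]], [[1, 0]]))
```

This is why the labels are converted to floats before the loss is computed: each target is a per-label probability (0.0 or 1.0) rather than a class index.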





In [None]:
torch.save(model.state_dict(), '/content/drive/MyDrive/NLP_Applications_1/DATA/2023-ILTAPP-20230203T201734Z-001/2023-ILTAPP/resources//bert-multilable-acd-en')

## Load and Preprocess Test Data

In [None]:
test_set = "/content/drive/MyDrive/NLP_Applications_1/DATA/2023-ILTAPP-20230203T201734Z-001/2023-ILTAPP/datasets/absa2016/en-test-acd-multilabel-transformers.csv"

In [None]:
test_df = pd.read_csv(test_set)
test_df.head()

Unnamed: 0,id,comment_text,AMBIENCE#GENERAL,DRINKS#PRICES,DRINKS#QUALITY,DRINKS#STYLE_OPTIONS,FOOD#PRICES,FOOD#QUALITY,FOOD#STYLE_OPTIONS,LOCATION#GENERAL,RESTAURANT#GENERAL,RESTAURANT#MISCELLANEOUS,RESTAURANT#PRICES,SERVICE#GENERAL
0,12201,Yum!,0,0,0,0,0,1,0,0,0,0,0,0
1,19325,Serves really good sushi.,0,0,0,0,0,1,0,0,0,0,0,0
2,11033,Not the biggest portions but adequate.,0,0,0,0,0,0,1,0,0,0,0,0
3,14179,Green Tea creme brulee is a must!,0,0,0,0,0,1,0,0,0,0,0,0
4,11931,Don't leave the restaurant without it.,0,0,0,0,0,1,0,0,0,0,0,0


## ASSIGNMENT 4

+ TODO add one_hot_labels column to test data as for ASSIGNMENT 1.

In [None]:
cols_test = test_df.columns
label_cols_test = list(cols_test[2:])
num_labels_test = len(label_cols_test)

one_hot_en =[]
for index, row in test_df[label_cols_test].iterrows():
    one_hot = [i for i in row]
    one_hot_en.append(one_hot)

test_df["one_hot_labels"]=one_hot_en   
test_df.head()

Unnamed: 0,id,comment_text,AMBIENCE#GENERAL,DRINKS#PRICES,DRINKS#QUALITY,DRINKS#STYLE_OPTIONS,FOOD#PRICES,FOOD#QUALITY,FOOD#STYLE_OPTIONS,LOCATION#GENERAL,RESTAURANT#GENERAL,RESTAURANT#MISCELLANEOUS,RESTAURANT#PRICES,SERVICE#GENERAL,one_hot_labels
0,12201,Yum!,0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
1,19325,Serves really good sushi.,0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
2,11033,Not the biggest portions but adequate.,0,0,0,0,0,0,1,0,0,0,0,0,"[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]"
3,14179,Green Tea creme brulee is a must!,0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
4,11931,Don't leave the restaurant without it.,0,0,0,0,0,1,0,0,0,0,0,0,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"


In [None]:
# Gathering input data
test_labels = list(test_df.one_hot_labels.values)
test_comments = list(test_df.comment_text.values)

In [None]:
# Encoding input data
test_encodings = tokenizer.batch_encode_plus(test_comments, truncation=True, max_length=max_length, padding='max_length') # pad_to_max_length is deprecated
test_input_ids = test_encodings['input_ids']
test_token_type_ids = test_encodings['token_type_ids']
test_attention_masks = test_encodings['attention_mask']



In [None]:
# Make tensors out of data
test_inputs = torch.tensor(test_input_ids)
test_labels = torch.tensor(test_labels)
test_masks = torch.tensor(test_attention_masks)
test_token_types = torch.tensor(test_token_type_ids)
# Create test dataloader
test_data = TensorDataset(test_inputs, test_masks, test_labels, test_token_types)
test_sampler = SequentialSampler(test_data)
test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=batch_size)
# Save test dataloader
torch.save(test_dataloader,'test_data_loader')

## Prediction and Evaluation

In [None]:
# Test

# Put model in evaluation mode to generate predictions on the test set
model.eval()

#track variables
logit_preds,true_labels,pred_labels,tokenized_texts = [],[],[],[]

# Predict
for i, batch in enumerate(test_dataloader):
  batch = tuple(t.to(device) for t in batch)
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask, b_labels, b_token_types = batch
  with torch.no_grad():
    # Forward pass
    outs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
    b_logit_pred = outs[0]
    pred_label = torch.sigmoid(b_logit_pred)

    b_logit_pred = b_logit_pred.detach().cpu().numpy()
    pred_label = pred_label.to('cpu').numpy()
    b_labels = b_labels.to('cpu').numpy()

  tokenized_texts.append(b_input_ids)
  logit_preds.append(b_logit_pred)
  true_labels.append(b_labels)
  pred_labels.append(pred_label)

# Flatten outputs
tokenized_texts = [item for sublist in tokenized_texts for item in sublist]
pred_labels = [item for sublist in pred_labels for item in sublist]
true_labels = [item for sublist in true_labels for item in sublist]
# Converting flattened binary values to boolean values
true_bools = [tl==1 for tl in true_labels]

The sigmoid outputs are probabilities in [0, 1], so they must be thresholded to obtain binary predictions. Below I use 0.50 as the threshold.
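
As a small worked example of this step, the sketch below applies a sigmoid to made-up logits and compares the result with the threshold. The helper names are hypothetical.

```python
import math

def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Return True for every label whose probability exceeds the threshold."""
    return [sigmoid(x) > threshold for x in logits]

print(predict_labels([2.0, -1.0, 0.0]))          # [True, False, False]
print(predict_labels([-1.0], threshold=0.2))     # [True]
```

A threshold of 0.50 on the probability is equivalent to asking whether the logit is positive. Lowering the threshold trades precision for recall, which may be worth exploring for rare labels.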

## ASSIGNMENT 5

+ TODO use scikit-learn functions to calculate F1 micro and Accuracy scores. HINT: you need to use true_bools and pred_bools from above.
+ TODO: use scikit-learn function to provide a classification report.

Output should be similar to the following:

In [None]:
from sklearn.metrics import classification_report

pred_bools = [pl>0.50 for pl in pred_labels] 

clf_report = classification_report(true_bools, pred_bools, target_names=label_cols_test)

# Pickle the report string; the context manager makes sure the file is closed
with open('classification_report_original_10.txt', 'wb') as f:
    pickle.dump(clf_report, f)
print(clf_report)

                          precision    recall  f1-score   support

        AMBIENCE#GENERAL       0.68      0.81      0.74        57
           DRINKS#PRICES       0.00      0.00      0.00         3
          DRINKS#QUALITY       0.00      0.00      0.00        21
    DRINKS#STYLE_OPTIONS       0.00      0.00      0.00        12
             FOOD#PRICES       1.00      0.05      0.09        22
            FOOD#QUALITY       0.85      0.92      0.89       226
      FOOD#STYLE_OPTIONS       0.00      0.00      0.00        48
        LOCATION#GENERAL       0.00      0.00      0.00        13
      RESTAURANT#GENERAL       0.86      0.76      0.81       142
RESTAURANT#MISCELLANEOUS       0.00      0.00      0.00        33
       RESTAURANT#PRICES       0.00      0.00      0.00        21
         SERVICE#GENERAL       0.93      0.86      0.89       145

               micro avg       0.85      0.66      0.74       743
               macro avg       0.36      0.28      0.28       743
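
For the two scores requested in the assignment, the scikit-learn calls are the ones already used in the training loop: `f1_score(true_bools, pred_bools, average='micro')` and `accuracy_score(true_bools, pred_bools)`. For multilabel input, `accuracy_score` is subset accuracy, meaning an example counts as correct only if all of its labels match. The sketch below recomputes both by hand on toy data to show what each one measures; it is an illustration with hypothetical helper names, not a replacement for the library functions.

```python
def micro_f1(y_true, y_pred):
    """F1 computed from TP/FP/FN pooled over every (example, label) cell."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def subset_accuracy(y_true, y_pred):
    """Share of examples whose full label vector is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(micro_f1(y_true, y_pred))         # 0.8
print(subset_accuracy(y_true, y_pred))  # 0.5
```

Micro F1 pools all label decisions, so frequent labels dominate it, while subset accuracy penalises an example for a single missed label.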
        



## Output Dataframe

In [None]:
idx2label = dict(enumerate(label_cols))  # index -> label name, without hardcoding the number of labels
print(idx2label)

{0: 'AMBIENCE#GENERAL', 1: 'DRINKS#PRICES', 2: 'DRINKS#QUALITY', 3: 'DRINKS#STYLE_OPTIONS', 4: 'FOOD#PRICES', 5: 'FOOD#QUALITY', 6: 'FOOD#STYLE_OPTIONS', 7: 'LOCATION#GENERAL', 8: 'RESTAURANT#GENERAL', 9: 'RESTAURANT#MISCELLANEOUS', 10: 'RESTAURANT#PRICES', 11: 'SERVICE#GENERAL'}


In [None]:
# Getting indices of where boolean one hot vector true_bools is True so we can use idx2label to gather label names
true_label_idxs, pred_label_idxs=[],[]
for vals in true_bools:
  true_label_idxs.append(np.where(vals)[0].flatten().tolist())
for vals in pred_bools:
  pred_label_idxs.append(np.where(vals)[0].flatten().tolist())

In [None]:
# Gathering vectors of label names using idx2label
true_label_texts, pred_label_texts = [], []
for vals in true_label_idxs:
  if vals:
    true_label_texts.append([idx2label[val] for val in vals])
  else:
    true_label_texts.append(vals)

for vals in pred_label_idxs:
  if vals:
    pred_label_texts.append([idx2label[val] for val in vals])
  else:
    pred_label_texts.append(vals)

## BONUS ASSIGNMENT 6

In this assignment we decode the input ids in the tokenized texts with the tokenizer instantiated above, then build a dataframe holding the review text, the true labels and the predicted labels. Saving this dataframe to a csv makes it possible to manually inspect the model's predictions against the gold standard.

+ TODO: decode the texts.
+ TODO: create a dataframe containing three columns: the texts, the true labels and the predicted labels.
+ TODO: save it into a csv.
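
One possible way to approach these steps is sketched below. To keep it runnable without a tokenizer or a trained model, it takes the decoding function as an argument and uses a stand-in decoder with a made-up vocabulary; all names in it are hypothetical, and it builds plain rows and CSV text rather than the dataframe itself.

```python
import csv
import io

def build_rows(decode, token_id_seqs, true_texts, pred_texts):
    """Pair each decoded review with its gold and predicted label names."""
    return [
        {"comment_text": decode(ids), "true_labels": t, "pred_labels": p}
        for ids, t, p in zip(token_id_seqs, true_texts, pred_texts)
    ]

# Stand-in decoder so the sketch runs without a tokenizer (made-up vocabulary).
toy_vocab = {1: "yum", 2: "!"}

def toy_decode(ids):
    return " ".join(toy_vocab[i] for i in ids)

rows = build_rows(toy_decode, [[1, 2]], [["FOOD#QUALITY"]], [["FOOD#QUALITY"]])

# Write the rows as CSV text to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["comment_text", "true_labels", "pred_labels"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

In the notebook, `decode` could be `lambda ids: tokenizer.decode(ids, skip_special_tokens=True)`, applied to `tokenized_texts` together with `true_label_texts` and `pred_label_texts`; `pd.DataFrame(rows).to_csv(path, index=False)` would then produce the dataframe and the csv in one step.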

The result should be something like the following:

## BONUS ASSIGNMENT 7

+ TODO: Can you generate the required data for multilabel aspect category detection using the "acd" datasets available for other languages?