### <strong>主題:
啤酒評論評分預測 - 分類模型建構
### <strong>說明:
繼續上次啤酒的評鑑資料集的練習，我們這次的最終目標這要是把啤酒評分的預測當作分類問題 <br />
，建構BERT模型，評估其各項屬性(apperance, aroma, overall, palate, taste)得分。特 <br />
注意的是，與課程中範例不同的地方在於這次必須預測多個目標，也就是典型的多標籤問題 <br />
(multi-label classification)
### <strong>題目
1. 以上次處理好的啤酒資料為範例，建構相對應的pytorch Dataset與pytorhc Dataloader<br />
(完成底下的BeerDataset與create_data_loader)
2. 以上次處理好的啤酒資料為範例，建構主要模型的架構(完成底下的BeerRateClassifier)
3. 完成最後的訓練流程並得到權重檔，確認模型架構沒有問題

#### <strong>提示1: 若同學因GPU限制無法快速訓練，可以考慮調低訓練回合數，MAX_LEN，或選擇較小的bert模型。
#### <strong>提示2: 若還是對multi-labeling問題建構不知從何下手，可以考[範例](https://www.learnopencv.com/multi-label-image-classification-with-pytorch/)

In [1]:
from google.colab import drive
drive.mount('/content/MyDrive')

Drive already mounted at /content/MyDrive; to attempt to forcibly remount, call drive.mount("/content/MyDrive", force_remount=True).


In [2]:
import os

In [3]:
os.chdir('/content/MyDrive/MyDrive/NLP_100days_Part2/day41')

In [4]:
!pip install transformers==3.3.1



In [5]:
import torch
import transformers
import numpy as np
import pandas as pd
from torch.utils.data import Dataset, DataLoader
from torch import nn, optim
from transformers import BertModel, BertTokenizer
from transformers import AdamW, get_linear_schedule_with_warmup

In [6]:
PRE_TRAINED_MODEL_NAME = "bert-base-cased"
BATCH_SIZE = 16
MAX_LEN = 255
EPOCHS = 10

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
TOKENIZER = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)

In [7]:
class BeerDataset(Dataset):
    """
    將資料集轉換為後續data DataLoader 需求的 pytorch Dataset形式
    Convert beer review dataframe into torch dataset instance
    """
    def __init__(self,
                 comments,
                 apperance_target,
                 aroma_target,
                 overall_target,
                 palate_target,
                 taste_target, max_len):
        
        
        self.comments = comments
        self.apperance_target = apperance_target
        self.aroma_target = aroma_target
        self.overall_target = overall_target
        self.palate_target = palate_target
        self.taste_target = taste_target
        self.max_len = max_len
        

    def __len__(self):
        return len(self.comments)

    def __getitem__(self, item):
        
        comment = self.comments[item]
        apperance_target = self.apperance_target[item]
        aroma_target = self.aroma_target[item]
        overall_target = self.overall_target[item]
        palate_target = self.palate_target[item]
        taste_target = self.taste_target[item]
        encoding = TOKENIZER.encode_plus(
            comment,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            return_attention_mask=True,
            return_tensors='pt',
        )
        return {
            'comment': comment,
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'apperance_target': torch.tensor(apperance_target, dtype=torch.long),
            'aroma_target': torch.tensor(aroma_target, dtype=torch.long),
            'overall_target': torch.tensor(overall_target, dtype=torch.long),
            'palate_target': torch.tensor(palate_target, dtype=torch.long),
            'taste_target': torch.tensor(taste_target, dtype=torch.long)
        }

In [8]:
def create_data_loader(dataframe, max_len, batch_size):
    """
    將pytorch Dataset形式資料集包裝為data DataLoader
    convert dataset to pytorch dataloader format object
    """
    dataset = BeerDataset(
        comments=dataframe["review/text"].values,
        apperance_target=dataframe["review_appearance"].values,
        aroma_target=dataframe["review_aroma"].values,
        overall_target=dataframe["review_overall"].values,
        palate_target=dataframe["review_palate"].values,
        taste_target=dataframe["review_taste"].values,
        max_len=max_len
    )
    return DataLoader(
        dataset,
        batch_size=batch_size
    )


In [9]:
class BeerRateClassifier(nn.Module):
    """
    啤酒評論評分分類模型的主體
    Beer sentiment main model for review sentiment analyzer
    """
    def __init__(self,
                 apperance_n_classes,
                 aroma_n_classes,
                 overall_n_classes,
                 palate_n_classes,
                 taste_n_classes,
                ):
        super(BeerRateClassifier, self).__init__()
        
        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
        
        self.apperance_out = nn.Sequential(
                                nn.Dropout(p=0.2),
                                nn.Linear(self.bert.config.hidden_size, apperance_n_classes)
                             )
        self.aroma_out = nn.Sequential(
                                nn.Dropout(p=0.2),
                                nn.Linear(self.bert.config.hidden_size, aroma_n_classes)     
                            )
        self.overall_out = nn.Sequential(
                                nn.Dropout(p=0.2),
                                nn.Linear(self.bert.config.hidden_size, overall_n_classes)
                            )
        self.palate_out = nn.Sequential(
                                nn.Dropout(p=0.2),
                                nn.Linear(self.bert.config.hidden_size, palate_n_classes)
                            )
        self.taste_out = nn.Sequential(
                                nn.Dropout(p=0.2),
                                nn.Linear(self.bert.config.hidden_size, taste_n_classes)
                            )
      

    def forward(self, input_ids, attention_mask):
        _, pooled_output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        apperance_output = self.apperance_out(pooled_output)
        aroma_output = self.aroma_out(pooled_output)
        overall_output = self.overall_out(pooled_output)
        palate_output = self.palate_out(pooled_output)
        taste_output = self.taste_out(pooled_output)
        
        return {
            "apperance": apperance_output,
            "aroma": aroma_output,
            "overall": overall_output,
            "palate": palate_output,
            "taste": taste_output,
        }

In [10]:
def train_epoch(model,
                data_loader,
                loss_fn,
                optimizer,
                scheduler,
                n_examples):
    """
    電影評論分類器的訓練主流程
    Main training process of bert sentiment classifier
    """
    model = model.train()
    losses = []
    correct_predictions = 0
    for _d in data_loader:
        input_ids = _d["input_ids"].to(DEVICE)
        attention_mask = _d["attention_mask"].to(DEVICE)
        apperance_target = _d["apperance_target"].to(DEVICE)
        aroma_target = _d["aroma_target"].to(DEVICE)
        overall_target = _d["overall_target"].to(DEVICE)
        palate_target = _d["palate_target"].to(DEVICE)
        taste_target = _d["taste_target"].to(DEVICE)
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        _, apperance_preds = torch.max(outputs["apperance"], dim=1)
        _, aroma_preds = torch.max(outputs["aroma"], dim=1)
        _, overall_preds = torch.max(outputs["overall"], dim=1)
        _, palate_preds = torch.max(outputs["palate"], dim=1)
        _, taste_preds = torch.max(outputs["taste"], dim=1)
        
        apperance_loss = loss_fn(outputs["apperance"], apperance_target)
        aroma_loss = loss_fn(outputs["aroma"], aroma_target)
        overall_loss = loss_fn(outputs["overall"], overall_target)
        palate_loss = loss_fn(outputs["palate"], palate_target)
        taste_loss = loss_fn(outputs["taste"], taste_target)
        
        correct_predictions += torch.sum(apperance_preds == apperance_target)
        correct_predictions += torch.sum(aroma_preds == aroma_target)
        correct_predictions += torch.sum(overall_preds == overall_preds)
        correct_predictions += torch.sum(palate_preds == palate_preds)
        correct_predictions += torch.sum(taste_preds == taste_preds)
        
        loss = apperance_loss + aroma_loss + overall_loss + palate_loss + taste_loss
        losses.append(loss.item())
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    return correct_predictions.double() / n_examples / 5, np.mean(losses)

def eval_model(model,
               data_loader,
               loss_fn,
               n_examples):
    """
    電影評論分類器的訓練時每個epoch評估訓練效能主流程
    Main evaluate process in training of bert sentiment classifier
    """
    model = model.eval()

    losses = []
    correct_predictions = 0

    with torch.no_grad():
        for d in data_loader:
            input_ids = d["input_ids"].to(DEVICE)
            attention_mask = d["attention_mask"].to(DEVICE)
            apperance_target = d["apperance_target"].to(DEVICE)
            aroma_target = d["aroma_target"].to(DEVICE)
            overall_target = d["overall_target"].to(DEVICE)
            palate_target = d["palate_target"].to(DEVICE)
            taste_target = d["taste_target"].to(DEVICE)

            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask
            )
            _, apperance_preds = torch.max(outputs["apperance"], dim=1)
            _, aroma_preds = torch.max(outputs["aroma"], dim=1)
            _, overall_preds = torch.max(outputs["overall"], dim=1)
            _, palate_preds = torch.max(outputs["palate"], dim=1)
            _, taste_preds = torch.max(outputs["taste"], dim=1)
            
            apperance_loss = loss_fn(outputs["apperance"], apperance_target)
            aroma_loss = loss_fn(outputs["aroma"], aroma_target)
            overall_loss = loss_fn(outputs["overall"], overall_target)
            palate_loss = loss_fn(outputs["palate"], palate_target)
            taste_loss = loss_fn(outputs["taste"], taste_target)

            loss = apperance_loss + aroma_loss + overall_loss + palate_loss + taste_loss

            correct_predictions += torch.sum(apperance_preds == apperance_target)
            correct_predictions += torch.sum(aroma_preds == aroma_target)
            correct_predictions += torch.sum(overall_preds == overall_preds)
            correct_predictions += torch.sum(palate_preds == palate_preds)
            correct_predictions += torch.sum(taste_preds == taste_preds)
            losses.append(loss.item())

    return correct_predictions.double() / n_examples / 5, np.mean(losses)

In [11]:
TRAIN = pd.read_json("train_set.json")
TRAIN = TRAIN.dropna(subset=['review/text']).sample(frac=1).reset_index(drop=True)
VAL = pd.read_json("test_set.json")
VAL = VAL.dropna(subset=['review/text']).sample(frac=1).reset_index(drop=True)
TRAIN = TRAIN.append(VAL[500:]).reset_index(drop=True)
VAL = VAL.iloc[:500]

In [12]:
MODEL = BeerRateClassifier(4, 4, 4, 4, 4)
MODEL.to(DEVICE)

TRAIN_DATA_LOADER = create_data_loader(TRAIN, MAX_LEN, BATCH_SIZE)
VAL_DATA_LOADER = create_data_loader(VAL, MAX_LEN, BATCH_SIZE)

OPTIMIZER = AdamW(MODEL.parameters(), lr=2e-5, correct_bias=False)
TOTAL_STEPS = len(TRAIN_DATA_LOADER) * EPOCHS
SCHEDULER = get_linear_schedule_with_warmup(
    OPTIMIZER,
    num_warmup_steps=0,
    num_training_steps=TOTAL_STEPS
)
LOSS_FN = nn.CrossEntropyLoss().to(DEVICE)

In [None]:
BEST_ACCURACY = 0

for epoch in range(EPOCHS):
    print(f'Epoch {epoch + 1}/{EPOCHS}')
    print('-' * 10)

    train_acc, train_loss = train_epoch(
        MODEL,
        TRAIN_DATA_LOADER,
        LOSS_FN,
        OPTIMIZER,
        SCHEDULER,
        len(TRAIN)
    )

    print(f'Train loss {train_loss} accuracy {train_acc}')

    val_acc, val_loss = eval_model(
        MODEL,
        VAL_DATA_LOADER,
        LOSS_FN,
        len(VAL)
    )

    print(f'Val   loss {val_loss} accuracy {val_acc}')
    print()

    if val_acc > BEST_ACCURACY:
        MODEL.bert.save_pretrained("./")
        best_accuracy = val_acc

Epoch 1/10
----------
Train loss 4.271188899494916 accuracy 0.8201362083948022
Val   loss 0.002993509806401562 accuracy 1.0

Epoch 2/10
----------
Train loss 4.0977653078652825 accuracy 0.8312107188327303
Val   loss 0.0019100091558357235 accuracy 1.0

Epoch 3/10
----------
Train loss 3.9056531600418887 accuracy 0.8385667805104783
Val   loss 0.0013974692301417235 accuracy 1.0

Epoch 4/10
----------
Train loss 3.7099523826865775 accuracy 0.8514520138229291
Val   loss 0.0013205195500631817 accuracy 1.0

Epoch 5/10
----------
Train loss 3.4900821926647128 accuracy 0.8605177535719337
Val   loss 0.0012381748711050022 accuracy 1.0

Epoch 6/10
----------
