# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Install Required Libraries</h1></span>

In [1]:
!pip install --upgrade wandb

Collecting wandb
  Downloading wandb-0.12.10-py2.py3-none-any.whl (1.7 MB)

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Import Required Libraries 📚</h1></span>

In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting tokenizers!=0.11.3,>=0.10.1
  Downloading tokenizers-0.11.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.8 MB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Fou

In [3]:
import os
import gc
import copy
import time
import random
import string

# For data manipulation
import numpy as np
import pandas as pd

# Pytorch Imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader

# Utils
from tqdm import tqdm
from collections import defaultdict

# Sklearn Imports
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold, KFold

# For Transformer Models
from transformers import AutoTokenizer, AutoModel, AdamW

# For colored terminal text
# from colorama import Fore, Back, Style
# b_ = Fore.BLUE
# y_ = Fore.YELLOW
# sr_ = Style.RESET_ALL

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# For descriptive error messages
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;"> Weights & Biases (W&B) is a set of machine learning tools that helps you build better models faster. <strong>Kaggle competitions require fast-paced model development and evaluation</strong>. There are a lot of components: exploring the training data, training different models, combining trained models in different combinations (ensembling), and so on.</span>

> <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">⏳ Lots of components = Lots of places to go wrong = Lots of time spent debugging</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">W&B can be useful for Kaggle competitions with its lightweight and interoperable tools:</span>

* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Quickly track experiments,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Version and iterate on datasets, <br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Evaluate model performance,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Reproduce models,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Visualize results and spot regressions,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Share findings with colleagues.</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">To learn more about Weights and Biases check out this <strong><a href="https://www.kaggle.com/ayuraj/experiment-tracking-with-weights-and-biases">kernel</a></strong>.</span>

In [4]:
import wandb

try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=api_key)
    anony = None
except Exception:
    anony = "must"
    print('If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as wandb_api. \nGet your W&B access token from here: https://wandb.ai/authorize')

If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as wandb_api. 
Get your W&B access token from here: https://wandb.ai/authorize


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Configuration ⚙️</h1></span>

In [5]:
def id_generator(size=12, chars=string.ascii_lowercase + string.digits):
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))

HASH_NAME = id_generator(size=12)
print(HASH_NAME)

d1wedjqiwkf9


<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">All runs of an experiment are grouped together using this hash value.<br></span>

![](https://i.imgur.com/Maej42h.jpg)

In [6]:
CONFIG = {"seed": 2021,
          "epochs": 3,
          "model_name": "aubmindlab/bert-base-arabert",
          "train_batch_size": 32,
          "valid_batch_size": 64,
          "max_length": 128,
          "learning_rate": 1e-4,
          "scheduler": 'CosineAnnealingLR',
          "min_lr": 1e-6,
          "T_max": 500,
          "weight_decay": 1e-6,
          "n_fold": 5,
          "n_accumulate": 1,
          "num_classes": 1,
          "margin": 0.5,
          "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
          "hash_name": HASH_NAME
          }

CONFIG["tokenizer"] = AutoTokenizer.from_pretrained(CONFIG['model_name'])
CONFIG['group'] = f'{HASH_NAME}-Baseline'

Downloading:   0%|          | 0.00/637 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/578 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/700k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/2.15M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Set Seed for Reproducibility</h1></span>

In [7]:
def set_seed(seed=42):
    '''Sets the seed of the entire notebook so results are the same every time we run.
    This is for REPRODUCIBILITY.'''
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # When running on the CuDNN backend, two further options must be set
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Set a fixed value for the hash seed
    os.environ['PYTHONHASHSEED'] = str(seed)
    
set_seed(CONFIG['seed'])
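
As a quick illustration of why seeding matters, the sketch below uses only Python's standard `random` module (not NumPy or PyTorch) to show that seeding with the same value reproduces the same sequence. The helper name `draw` is hypothetical and not part of the notebook.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random integers drawn from a freshly seeded generator."""
    rng = random.Random(seed)  # independent generator, leaves the global state alone
    return [rng.randint(0, 99) for _ in range(n)]

first = draw(2021)
second = draw(2021)

# Same seed, same sequence: this is the property set_seed relies on.
print(first == second)
```

The same idea applies to `np.random.seed` and `torch.manual_seed`, although GPU kernels can still introduce nondeterminism unless the cuDNN flags above are also set.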

# <h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Read the Data 📖</h1>

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
test= pd.read_csv("/content/drive/MyDrive/ISarcasm/TestSet/task_C_Ar_test.csv")

In [11]:
test

Unnamed: 0,text_0,text_1,dialect,sarcastic_id
0,يا زبدة لا تسيحي/ تذوبي,يا دهينة لا تنكتين,gulf,1
1,ياسمين صبرى مش كويسة فى التمثيل,مين اللي أقنع ياسمين صبري أنها تمثل,msa,1
2,ابو بلاش كتر منه كتير,الشئ المجانى خذ منه كتير لانه غنيمه,nile,0
3,انت متكبر لكن لا تملك ما يدفعك للتكبر,تحسب نفسك حاجة و انت متسواش نص دجاجة,magreb,1
4,معنديش فكه تاخد برميل بترول,سعر البترول يصل لادني مستوي,nile,0
...,...,...,...,...
195,الراجل اللى مش معاه فلوس ملوش لازمة,شنب ماتحته فلوس يحتاج موس,nile,1
196,قبل الجواز اغسل وش مراتك,لاز تشوف مراتك قبل الجواز بدون ميكب,nile,0
197,شخص ليس له اي جدوي,يالك مش شخص عظيم جدا,msa,1
198,معنديش مانع اخدك علي قد عقلك بس الاقيه الاول,معنديش مانع اتكلم معاك بس انت معندكش فكر اصلا,nile,0


In [None]:
df = pd.read_csv("/content/drive/MyDrive/ISarcasm/DataSet/train.Ar.csv")
df=df.loc[df['sarcastic'] == 1]
df=df[['tweet','rephrase','sarcastic']]
df.head()

Unnamed: 0,tweet,rephrase,sarcastic
0,ضبط شخص بدبلوم انتحل صفة طبيب بلد مافيش حد فيه...,شخص ينتحل صفة طبيب ويفتتح عيادة فى بلد فاشلة ض...,1
1,مش معنى انك قولتلى رايك يبقى أنا هعمل بيه طب ا...,مش لازم دائما اعمل برأيك,1
2,اية المهلبية دي يصحبي,ما هذا الجمال,1
3,الحديث قياس فيه الفضة و فيه النحاس,لسانك ترجمان قلبك,1
4,ده فاكر نفسه باشا و بيه كمان,ده مغرور و شايف نفسه علىالناس,1


In [None]:
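# Shuffle the rows, then cut at 60% and 80% to get train/validate/test parts.
# Note: this rebinds `test`, replacing the test-set DataFrame loaded above.
train, validate, test = \
              np.split(df.sample(frac=1, random_state=42), 
                       [int(.6*len(df)), int(.8*len(df))])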

In [None]:
train=pd.concat([train, validate], ignore_index=True)
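
The split above can be pictured with plain Python lists. This is a minimal sketch with a made-up dataset size of 10 rows; `split_points` is a hypothetical helper that reproduces the two cut positions passed to `np.split`.

```python
def split_points(n, fractions=(0.6, 0.8)):
    """Cut positions that np.split would receive for a dataset of n rows."""
    return [int(f * n) for f in fractions]

n_rows = 10                      # hypothetical dataset size
rows = list(range(n_rows))       # stand-in for the shuffled row indices
lo, hi = split_points(n_rows)

train_part, valid_part, test_part = rows[:lo], rows[lo:hi], rows[hi:]
train_full = train_part + valid_part   # mirrors pd.concat([train, validate])

print(len(train_part), len(valid_part), len(test_part), len(train_full))
```

After the concatenation, 80% of the sarcastic rows are available for cross-validation and the remaining 20% are held out.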

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Create Folds</h1></span>

In [None]:
skf = StratifiedKFold(n_splits=CONFIG['n_fold'], shuffle=True, random_state=CONFIG['seed'])

for fold, ( _, val_) in enumerate(skf.split(X=train, y=train.sarcastic)):
    train.loc[val_ , "kfold"] = int(fold)
    
train["kfold"] = train["kfold"].astype(int)
train.head()

Unnamed: 0,tweet,rephrase,sarcastic,kfold
0,اغلب الابراج مسمينها ب اسماء حيوانات لأن مايصد...,اغلب الابراج على اسم حيوانات لان من يصدق بها ل...,1,0
1,احترس من قرني الثور وحوافر الحصان وابتسامة بعض...,عليك ان تحترس من بعض الناس اللي تظهر لك الخير ...,1,1
2,لما تحفظ اغنيه وتمشى تغنيها فى البيت واهلك بقو...,ياريت تحفظ دروسك زى ما بتحفظ الأغانى كده,1,1
3,مبولحي اذا دوش تسما راهو يبذر فالماء,الفريق الخصم ضعيف امام منختب الجزائر,1,4
4,مبروك يا شمشمون انت عملت حاجه ملهاش اى تلاتين ...,انت يا شمشمون عملت حاجة مش مفيدة,1,3
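
Because the frame was filtered to `sarcastic == 1`, stratifying on that column has only one class to balance, so the fold assignment effectively behaves like a shuffled k-fold. The sketch below is a simplified stand-in for `StratifiedKFold` (it deals shuffled rows round-robin, which is not how scikit-learn partitions them) and uses a made-up size of 12 rows; `assign_folds` is a hypothetical helper.

```python
import random

def assign_folds(n_rows, n_folds, seed):
    """Shuffle row indices and deal them into folds round-robin."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    fold_of = [None] * n_rows
    for position, row in enumerate(indices):
        fold_of[row] = position % n_folds
    return fold_of

kfold = assign_folds(n_rows=12, n_folds=5, seed=2021)  # 12 rows is a made-up size

for fold in range(5):
    # Rows whose fold id equals `fold` validate; all other rows train.
    valid_rows = [i for i, f in enumerate(kfold) if f == fold]
    train_rows = [i for i, f in enumerate(kfold) if f != fold]
    print(fold, len(train_rows), len(valid_rows))
```

Each row lands in exactly one validation fold, which is the same contract the `kfold` column provides to `prepare_loaders`.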


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Dataset Class</h1></span>

In [13]:
class JigsawDataset(Dataset):
    def __init__(self, df, tokenizer, max_length):
        self.df = df
        self.max_len = max_length
        self.tokenizer = tokenizer
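        # Column names follow the test CSV (text_0 / text_1). The training frame built
        # above has `tweet` / `rephrase` instead, so rename its columns before using it here.
        self.more_toxic = df['text_0'].values
        self.less_toxic = df['text_1'].values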
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        more_toxic = self.more_toxic[index]
        less_toxic = self.less_toxic[index]
        inputs_more_toxic = self.tokenizer.encode_plus(
                                more_toxic,
                                truncation=True,
                                add_special_tokens=True,
                                max_length=self.max_len,
                                padding='max_length'
                            )
        inputs_less_toxic = self.tokenizer.encode_plus(
                                less_toxic,
                                truncation=True,
                                add_special_tokens=True,
                                max_length=self.max_len,
                                padding='max_length'
                            )
        target = 1
        
        more_toxic_ids = inputs_more_toxic['input_ids']
        more_toxic_mask = inputs_more_toxic['attention_mask']
        
        less_toxic_ids = inputs_less_toxic['input_ids']
        less_toxic_mask = inputs_less_toxic['attention_mask']
        
        
        return {
            'more_toxic_ids': torch.tensor(more_toxic_ids, dtype=torch.long),
            'more_toxic_mask': torch.tensor(more_toxic_mask, dtype=torch.long),
            'less_toxic_ids': torch.tensor(less_toxic_ids, dtype=torch.long),
            'less_toxic_mask': torch.tensor(less_toxic_mask, dtype=torch.long),
            'target': torch.tensor(target, dtype=torch.long)
        }
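
To make the `truncation=True` and `padding='max_length'` arguments concrete, here is a minimal pure-Python sketch of what they do to a list of token ids. The token ids and `pad_id=0` are made up for illustration, and `pad_or_truncate` is a hypothetical helper. A real Hugging Face tokenizer also inserts special tokens and preserves them when truncating, which this sketch does not attempt.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                   # truncation=True
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad             # padding='max_length'
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

short_ids, short_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 10)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Fixed-length outputs are what allow the default `DataLoader` collation to stack examples into a single batch tensor.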

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Create Model</h1></span>

In [14]:
class JigsawModel(nn.Module):
    def __init__(self, model_name):
        super(JigsawModel, self).__init__()
        self.model = AutoModel.from_pretrained(model_name)
        self.layer_norm = nn.LayerNorm(768)
        self.dropout = nn.Dropout(0.2)
        self.dense = nn.Sequential(
            nn.Linear(768, 256),
            nn.LeakyReLU(negative_slope=0.01),
            nn.Dropout(0.2),
            nn.Linear(256, 1)
        )

    def forward(self, input_ids, attention_mask):
        pooled_output = self.model(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = self.layer_norm(pooled_output[1])
        pooled_output = self.dropout(pooled_output)
        preds = self.dense(pooled_output)
        return preds

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Loss Function</h1></span>

![](https://i.imgur.com/qYwVt8V.jpg)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Check the official documentation <a href="https://pytorch.org/docs/stable/generated/torch.nn.MarginRankingLoss.html">here</a></span>

In [15]:
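def criterion(outputs1, outputs2, targets):
    # Flatten the (batch, 1) model outputs to (batch,) so all three tensors share one shape.
    # Mismatched shapes would otherwise broadcast, and newer PyTorch versions reject them.
    loss_fn = nn.MarginRankingLoss(margin=CONFIG['margin'])
    return loss_fn(outputs1.view(-1), outputs2.view(-1), targets.view(-1))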
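
For intuition, the margin ranking loss for one pair is `max(0, -y * (x1 - x2) + margin)`, averaged over the batch. The sketch below re-implements that formula in plain Python on made-up scores; it is an illustration of the formula, not a replacement for `nn.MarginRankingLoss`.

```python
def margin_ranking_loss(x1, x2, y, margin):
    """Mean of max(0, -y * (x1 - x2) + margin) over paired scores."""
    terms = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]
    return sum(terms) / len(terms)

scores_first = [2.0, 0.3, -1.0]   # hypothetical scores for the first text of each pair
scores_second = [1.0, 0.1, 0.5]   # hypothetical scores for the second text
targets = [1, 1, 1]               # the dataset above always uses target = 1

loss = margin_ranking_loss(scores_first, scores_second, targets, margin=0.5)
print(round(loss, 4))
```

With `target = 1`, a pair contributes zero loss only when the first score exceeds the second by at least the margin, so the first pair here is already satisfied while the other two are penalised.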

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Function</h1></span>

In [None]:
def train_one_epoch(model, optimizer, scheduler, dataloader, device, epoch):
    model.train()
    
    dataset_size = 0
    running_loss = 0.0
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:
        more_toxic_ids = data['more_toxic_ids'].to(device, dtype = torch.long)
        more_toxic_mask = data['more_toxic_mask'].to(device, dtype = torch.long)
        less_toxic_ids = data['less_toxic_ids'].to(device, dtype = torch.long)
        less_toxic_mask = data['less_toxic_mask'].to(device, dtype = torch.long)
        targets = data['target'].to(device, dtype=torch.long)
        
        batch_size = more_toxic_ids.size(0)

        more_toxic_outputs = model(more_toxic_ids, more_toxic_mask)
        less_toxic_outputs = model(less_toxic_ids, less_toxic_mask)
        
        loss = criterion(more_toxic_outputs, less_toxic_outputs, targets)
        loss = loss / CONFIG['n_accumulate']
        loss.backward()
    
        if (step + 1) % CONFIG['n_accumulate'] == 0:
            optimizer.step()

            # zero the parameter gradients
            optimizer.zero_grad()

            if scheduler is not None:
                scheduler.step()
                
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        
        epoch_loss = running_loss / dataset_size
        
        bar.set_postfix(Epoch=epoch, Train_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])
    gc.collect()
    
    return epoch_loss
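
The `n_accumulate` logic in the training loop can be sketched without PyTorch. The example below uses a single scalar weight and a squared-error loss with made-up data to show the pattern: scale each micro-batch gradient by `1 / n_accumulate`, sum, and only step every `n_accumulate` batches. All names and numbers here are hypothetical.

```python
def grad_squared_error(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x

w = 0.5
micro_batches = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]  # made-up (x, y) pairs
n_accumulate = 2
lr = 0.1   # arbitrary step size for the illustration

accumulated = 0.0
updates = []
for step, (x, y) in enumerate(micro_batches):
    # Dividing by n_accumulate mirrors `loss = loss / CONFIG['n_accumulate']`.
    accumulated += grad_squared_error(w, x, y) / n_accumulate
    if (step + 1) % n_accumulate == 0:
        w -= lr * accumulated      # plays the role of optimizer.step()
        updates.append(w)
        accumulated = 0.0          # plays the role of optimizer.zero_grad()

print(updates)
```

With `n_accumulate` set to 1, as in `CONFIG`, this reduces to an ordinary update after every batch.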

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Validation Function</h1></span>

In [None]:
@torch.no_grad()
def valid_one_epoch(model, dataloader, device, epoch):
    model.eval()
    
    dataset_size = 0
    running_loss = 0.0
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:        
        more_toxic_ids = data['more_toxic_ids'].to(device, dtype = torch.long)
        more_toxic_mask = data['more_toxic_mask'].to(device, dtype = torch.long)
        less_toxic_ids = data['less_toxic_ids'].to(device, dtype = torch.long)
        less_toxic_mask = data['less_toxic_mask'].to(device, dtype = torch.long)
        targets = data['target'].to(device, dtype=torch.long)
        
        batch_size = more_toxic_ids.size(0)

        more_toxic_outputs = model(more_toxic_ids, more_toxic_mask)
        less_toxic_outputs = model(less_toxic_ids, less_toxic_mask)
        
        loss = criterion(more_toxic_outputs, less_toxic_outputs, targets)
        
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        
        epoch_loss = running_loss / dataset_size
        
        # `optimizer` is not an argument here; it is read from the notebook's global scope.
        bar.set_postfix(Epoch=epoch, Valid_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])
    
    gc.collect()
    
    return epoch_loss

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Run Training</h1></span>

In [None]:
def run_training(model, optimizer, scheduler, device, num_epochs, fold):
    # To automatically log gradients
    wandb.watch(model, log_freq=100)
    
    if torch.cuda.is_available():
        print("[INFO] Using GPU: {}\n".format(torch.cuda.get_device_name()))
    
    start = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_epoch_loss = np.inf
    history = defaultdict(list)
    
    for epoch in range(1, num_epochs + 1): 
        gc.collect()
        train_epoch_loss = train_one_epoch(model, optimizer, scheduler, 
                                           dataloader=train_loader, 
                                           device=CONFIG['device'], epoch=epoch)
        
        val_epoch_loss = valid_one_epoch(model, valid_loader, device=CONFIG['device'], 
                                         epoch=epoch)
    
        history['Train Loss'].append(train_epoch_loss)
        history['Valid Loss'].append(val_epoch_loss)
        
        # Log both metrics in one call so they share the same W&B step
        wandb.log({"Train Loss": train_epoch_loss, "Valid Loss": val_epoch_loss})
        
        # Keep a copy of the best weights and save them to disk
        if val_epoch_loss <= best_epoch_loss:
            print(f"Validation Loss Improved ({best_epoch_loss} ---> {val_epoch_loss})")
            best_epoch_loss = val_epoch_loss
            run.summary["Best Loss"] = best_epoch_loss
            best_model_wts = copy.deepcopy(model.state_dict())
            PATH = f"/content/drive/MyDrive/ISarcasm/TaskC_models/Arabert_task_c/Loss-Fold-{fold}.bin"
            torch.save(model.state_dict(), PATH)
            print("Model Saved")
            
        print()
    
    end = time.time()
    time_elapsed = end - start
    print('Training complete in {:.0f}h {:.0f}m {:.0f}s'.format(
        time_elapsed // 3600, (time_elapsed % 3600) // 60, (time_elapsed % 3600) % 60))
    print("Best Loss: {:.4f}".format(best_epoch_loss))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    
    return model, history
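
The checkpoint-selection rule in `run_training` can be isolated in a few lines. The sketch below applies the same `<=` comparison to the three validation losses reported for the first fold in the training log further down; `best_epoch` is a hypothetical helper, not part of the notebook.

```python
import math

def best_epoch(valid_losses):
    """Return (epoch, loss) of the best epoch, using the same `<=` rule as run_training."""
    best_loss, best_at = math.inf, None
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss <= best_loss:
            best_loss, best_at = loss, epoch
    return best_at, best_loss

# Validation losses reported for the first fold in the training log.
fold_losses = [0.24483333627382914, 0.14696298440297445, 0.19351]
epoch, loss = best_epoch(fold_losses)
print(epoch, f"{loss:.4f}")
```

Because the loss rises again in the third epoch, the weights saved after the second epoch are the ones restored by `model.load_state_dict(best_model_wts)`.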

In [None]:
def prepare_loaders(fold):
    df_train = train[train.kfold != fold].reset_index(drop=True)
    df_valid = train[train.kfold == fold].reset_index(drop=True)
    
    train_dataset = JigsawDataset(df_train, tokenizer=CONFIG['tokenizer'], max_length=CONFIG['max_length'])
    valid_dataset = JigsawDataset(df_valid, tokenizer=CONFIG['tokenizer'], max_length=CONFIG['max_length'])

    train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'], 
                              num_workers=2, shuffle=True, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'], 
                              num_workers=2, shuffle=False, pin_memory=True)
    
    return train_loader, valid_loader

In [None]:
def fetch_scheduler(optimizer):
    if CONFIG['scheduler'] == 'CosineAnnealingLR':
        scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=CONFIG['T_max'], 
                                                   eta_min=CONFIG['min_lr'])
    elif CONFIG['scheduler'] == 'CosineAnnealingWarmRestarts':
        # Requires a 'T_0' entry in CONFIG, which is not defined above
        scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=CONFIG['T_0'], 
                                                             eta_min=CONFIG['min_lr'])
    elif CONFIG['scheduler'] is None:
        return None
    else:
        raise ValueError(f"Unknown scheduler: {CONFIG['scheduler']}")
        
    return scheduler
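
To see what `CosineAnnealingLR` does with the values in `CONFIG`, the sketch below evaluates the closed-form cosine annealing formula in plain Python. It assumes the scheduler is stepped once per batch (true here because `n_accumulate` is 1) and that each epoch has 14 batches, as shown by the training progress bars. `cosine_annealing_lr` is a hypothetical helper, not a PyTorch function.

```python
import math

def cosine_annealing_lr(step, base_lr=1e-4, min_lr=1e-6, t_max=500):
    """Closed form: min_lr + (base_lr - min_lr) * (1 + cos(pi * step / t_max)) / 2."""
    return min_lr + (base_lr - min_lr) * (1 + math.cos(math.pi * step / t_max)) / 2

steps_per_epoch = 14   # taken from the training progress bars (14 batches per epoch)
for epoch in (1, 2, 3):
    lr = cosine_annealing_lr(epoch * steps_per_epoch)
    print(epoch, f"{lr:.2e}")
```

With `T_max` at 500 and only 42 optimizer steps per fold, the learning rate barely moves from its starting value, which is consistent with the `LR` values shown in the progress bars.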

In [None]:
import gc
gc.collect()

198

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Start Training</span>

In [None]:
for fold in range(0, CONFIG['n_fold']):
    print(f"====== Fold: {fold} ======")
    run = wandb.init(project='Jigsaw', 
                     config=CONFIG,
                     job_type='Train',
                     group=CONFIG['group'],
                     tags=['bert-base-arabert', HASH_NAME, 'margin-loss'],
                     name=f'{HASH_NAME}-fold-{fold}',
                     anonymous='must')
    
    # Create Dataloaders
    train_loader, valid_loader = prepare_loaders(fold=fold)
    
    model = JigsawModel(CONFIG['model_name'])
    model.to(CONFIG['device'])
    
    # Define Optimizer and Scheduler
    optimizer = AdamW(model.parameters(), lr=CONFIG['learning_rate'], weight_decay=CONFIG['weight_decay'])
    scheduler = fetch_scheduler(optimizer)
    
    model, history = run_training(model, optimizer, scheduler,
                                  device=CONFIG['device'],
                                  num_epochs=CONFIG['epochs'],
                                  fold=fold)
    
    run.finish()
    
    del model, history, train_loader, valid_loader
    _ = gc.collect()
    print()



wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Downloading:   0%|          | 0.00/518M [00:00<?, ?B/s]

Some weights of the model checkpoint at aubmindlab/bert-base-arabert were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[INFO] Using GPU: Tesla T4



100%|██████████| 14/14 [00:17<00:00,  1.22s/it, Epoch=1, LR=9.98e-5, Train_Loss=0.349]
100%|██████████| 2/2 [00:01<00:00,  1.20it/s, Epoch=1, LR=9.98e-5, Valid_Loss=0.245]


Validation Loss Improved (inf ---> 0.24483333627382914)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.22s/it, Epoch=2, LR=9.92e-5, Train_Loss=0.144]
100%|██████████| 2/2 [00:01<00:00,  1.18it/s, Epoch=2, LR=9.92e-5, Valid_Loss=0.147]


Validation Loss Improved (0.24483333627382914 ---> 0.14696298440297445)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.24s/it, Epoch=3, LR=9.83e-5, Train_Loss=0.0542]
100%|██████████| 2/2 [00:01<00:00,  1.18it/s, Epoch=3, LR=9.83e-5, Valid_Loss=0.194]



Training complete in 0h 1m 4s
Best Loss: 0.1470



0,1
Train Loss,█▃▁
Valid Loss,█▁▄

0,1
Best Loss,0.14696
Train Loss,0.05416
Valid Loss,0.19351


wandb: Currently logged in as: anony-mouse-193505 (use `wandb login --relogin` to force relogin)







[INFO] Using GPU: Tesla T4



100%|██████████| 14/14 [00:17<00:00,  1.23s/it, Epoch=1, LR=9.98e-5, Train_Loss=0.422]
100%|██████████| 2/2 [00:01<00:00,  1.16it/s, Epoch=1, LR=9.98e-5, Valid_Loss=0.269]


Validation Loss Improved (inf ---> 0.2687707548381902)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.25s/it, Epoch=2, LR=9.92e-5, Train_Loss=0.213]
100%|██████████| 2/2 [00:01<00:00,  1.15it/s, Epoch=2, LR=9.92e-5, Valid_Loss=0.251]


Validation Loss Improved (0.2687707548381902 ---> 0.2511382423529104)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.25s/it, Epoch=3, LR=9.83e-5, Train_Loss=0.118]
100%|██████████| 2/2 [00:01<00:00,  1.03it/s, Epoch=3, LR=9.83e-5, Valid_Loss=0.145]


Validation Loss Improved (0.2511382423529104 ---> 0.1450804026186967)
Model Saved

Training complete in 0h 1m 8s
Best Loss: 0.1451



0,1
Train Loss,█▃▁
Valid Loss,█▇▁

0,1
Best Loss,0.14508
Train Loss,0.11758
Valid Loss,0.14508







[INFO] Using GPU: Tesla T4



100%|██████████| 14/14 [00:17<00:00,  1.25s/it, Epoch=1, LR=9.98e-5, Train_Loss=0.339]
100%|██████████| 2/2 [00:01<00:00,  1.13it/s, Epoch=1, LR=9.98e-5, Valid_Loss=0.183]


Validation Loss Improved (inf ---> 0.182736972675604)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.27s/it, Epoch=2, LR=9.92e-5, Train_Loss=0.155]
100%|██████████| 2/2 [00:01<00:00,  1.13it/s, Epoch=2, LR=9.92e-5, Valid_Loss=0.128]


Validation Loss Improved (0.182736972675604 ---> 0.12780530099608317)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.27s/it, Epoch=3, LR=9.83e-5, Train_Loss=0.054]
100%|██████████| 2/2 [00:01<00:00,  1.12it/s, Epoch=3, LR=9.83e-5, Valid_Loss=0.172]



Training complete in 0h 1m 5s
Best Loss: 0.1278



0,1
Train Loss,█▃▁
Valid Loss,█▁▇

0,1
Best Loss,0.12781
Train Loss,0.05403
Valid Loss,0.17212







[INFO] Using GPU: Tesla T4



100%|██████████| 14/14 [00:17<00:00,  1.26s/it, Epoch=1, LR=9.98e-5, Train_Loss=0.376]
100%|██████████| 2/2 [00:01<00:00,  1.12it/s, Epoch=1, LR=9.98e-5, Valid_Loss=0.205]


Validation Loss Improved (inf ---> 0.20548670181707174)
Model Saved



100%|██████████| 14/14 [00:18<00:00,  1.29s/it, Epoch=2, LR=9.92e-5, Train_Loss=0.179]
100%|██████████| 2/2 [00:01<00:00,  1.10it/s, Epoch=2, LR=9.92e-5, Valid_Loss=0.0609]


Validation Loss Improved (0.20548670181707174 ---> 0.06093732001526015)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.27s/it, Epoch=3, LR=9.83e-5, Train_Loss=0.0834]
100%|██████████| 2/2 [00:01<00:00,  1.11it/s, Epoch=3, LR=9.83e-5, Valid_Loss=0.0672]



Training complete in 0h 1m 6s
Best Loss: 0.0609



0,1
Train Loss,█▃▁
Valid Loss,█▁▁

0,1
Best Loss,0.06094
Train Loss,0.08342
Valid Loss,0.06724







[INFO] Using GPU: Tesla T4



100%|██████████| 14/14 [00:17<00:00,  1.27s/it, Epoch=1, LR=9.98e-5, Train_Loss=0.37]
100%|██████████| 2/2 [00:01<00:00,  1.11it/s, Epoch=1, LR=9.98e-5, Valid_Loss=0.168]


Validation Loss Improved (inf ---> 0.16823590089793966)
Model Saved



100%|██████████| 14/14 [00:18<00:00,  1.29s/it, Epoch=2, LR=9.92e-5, Train_Loss=0.125]
100%|██████████| 2/2 [00:01<00:00,  1.11it/s, Epoch=2, LR=9.92e-5, Valid_Loss=0.126]


Validation Loss Improved (0.16823590089793966 ---> 0.1264312915942248)
Model Saved



100%|██████████| 14/14 [00:17<00:00,  1.28s/it, Epoch=3, LR=9.83e-5, Train_Loss=0.0552]
100%|██████████| 2/2 [00:01<00:00,  1.10it/s, Epoch=3, LR=9.83e-5, Valid_Loss=0.112]


Validation Loss Improved (0.1264312915942248 ---> 0.11163268588921603)
Model Saved

Training complete in 0h 1m 9s
Best Loss: 0.1116



Run history:
Train Loss  █▃▁
Valid Loss  █▃▁

Run summary:
Best Loss   0.11163
Train Loss  0.05524
Valid Loss  0.11163





# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Visualizations</h1></span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;"><a href="https://wandb.ai/dchanda/Jigsaw">View the Complete Dashboard Here ⮕</a></span>

In [None]:
test.dropna(inplace=True)

In [16]:
@torch.no_grad()
def valid_fn(model, dataloader, device):
    """Run the model over `dataloader` and return sigmoid probabilities as a NumPy array."""
    model.eval()
    
    PREDS = []
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:
        ids = data['text_ids'].to(device, dtype=torch.long)
        mask = data['text_mask'].to(device, dtype=torch.long)
        
        # Sigmoid maps the model outputs to probabilities in [0, 1]
        outputs = torch.sigmoid(model(ids, mask))
        PREDS.append(outputs.cpu().numpy())
    
    # Stack the per-batch arrays into one array covering the whole dataset
    PREDS = np.concatenate(PREDS)
    gc.collect()
    
    return PREDS

In [17]:
def inference(model_paths, dataloader, device):
    """Average the predictions of all checkpoints and binarize them at 0.5."""
    final_preds = []
    for i, path in enumerate(model_paths):
        model = JigsawModel('aubmindlab/bert-base-arabert')
        model.to(device)
        # map_location loads the checkpoint onto the device in use
        model.load_state_dict(torch.load(path, map_location=device))
        
        print(f"Getting predictions for model {i+1}")
        preds = valid_fn(model, dataloader, device)
        final_preds.append(preds)
    
    # Mean probability across models, then threshold: >= 0.5 -> 1, otherwise 0
    final_preds = np.mean(np.array(final_preds), axis=0)
    final_preds = np.where(final_preds >= 0.5, 1.0, 0.0)
    return final_preds
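
The ensembling step in `inference` averages the probabilities from every checkpoint before thresholding, so a single confident model can be outvoted by the others. A small sketch with made-up probabilities illustrates this. `average_and_threshold` is a hypothetical helper, and the `(n_samples, 1)` output shape is an assumption about the model head.

```python
import numpy as np

def average_and_threshold(fold_probs, threshold=0.5):
    """Average per-model probabilities and convert them to 0/1 labels."""
    stacked = np.stack(fold_probs)        # shape: (n_models, n_samples, 1)
    mean_probs = stacked.mean(axis=0)     # shape: (n_samples, 1)
    return (mean_probs >= threshold).astype(int)

# Made-up probabilities from three models for three samples
fold_probs = [
    np.array([[0.9], [0.4], [0.2]]),
    np.array([[0.7], [0.7], [0.1]]),
    np.array([[0.8], [0.3], [0.4]]),
]
labels = average_and_threshold(fold_probs)
```

For the second sample one model predicts 0.7, but the mean of 0.4, 0.7 and 0.3 is below 0.5, so the ensemble label is 0.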

In [18]:
class JigsawDatasetTest(Dataset):
    """Tokenizes the `text` column for inference; no labels are returned."""
    def __init__(self, df, tokenizer, max_length):
        self.df = df
        self.max_len = max_length
        self.tokenizer = tokenizer
        self.text = df['text'].values
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        text = self.text[index]
        # Truncate or pad every example to exactly max_len tokens
        inputs = self.tokenizer.encode_plus(
            text,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length'
        )
        
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        
        return {
            'text_ids': torch.tensor(ids, dtype=torch.long),
            'text_mask': torch.tensor(mask, dtype=torch.long),
        }

In [19]:
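# .copy() avoids pandas' SettingWithCopyWarning when the label column is added
test_sarc = test[['tweet']].copy()
test_sarc['label'] = 1          # original tweets are the sarcastic class
test_not_sarc = test[['rephrase']].copy()
test_not_sarc['label'] = 0      # rephrases are the non-sarcastic class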

KeyError: ignored

In [None]:
test_not_sarc.rename(columns={'rephrase': 'text'}, inplace=True)

In [None]:
test_sarc.rename(columns={'tweet': 'text'}, inplace=True)

In [None]:
test_final=pd.concat([test_not_sarc,test_sarc])

In [None]:
test_final

Unnamed: 0,text,label
670,أنا ضهرى واجعنى مش قادر أبص للوراء فى الماضى,0
635,الجواز أكبر جريمه علشان كده بيطلبو شهود عليها,0
628,الدنيا بدون اخوات بنات لا يمكن العيش فيها مثل ...,0
556,مفيش غيرك بيتصرف التصرفات الغريبة دى,0
577,مفيش راجل بيدافع عن حد دلوقتى,0
...,...,...
71,هاي السنة فش احلى منها 🤯😂,1
106,الناس بتشتري لب ولا اكننا في العيد مش في اعصار,1
270,تسلم النحلة اللى جابتك ياعسل,1
435,ياعم رمضان صبحى يغور َوجوده زي عدمه,1
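
The cells above turn each row of `test` into two labelled examples: the `rephrase` text with label 0 and the `tweet` text with label 1. The same transformation can be sketched in one function. The toy frame below is a stand-in for `test` (the real data is loaded elsewhere in the notebook), and `to_long_format` is a hypothetical helper name. In pandas, selecting a column that does not exist raises a `KeyError`, so both column names must be present in the frame.

```python
import pandas as pd

# Toy stand-in for `test`; the strings are placeholders, not real data
toy = pd.DataFrame({
    "tweet": ["sarcastic a", "sarcastic b"],
    "rephrase": ["literal a", "literal b"],
})

def to_long_format(df):
    """Stack rephrases (label 0) on top of tweets (label 1) under a shared `text` column."""
    sarc = df[["tweet"]].rename(columns={"tweet": "text"}).assign(label=1)
    not_sarc = df[["rephrase"]].rename(columns={"rephrase": "text"}).assign(label=0)
    # Same order as the notebook: non-sarcastic rows first, original index kept
    return pd.concat([not_sarc, sarc])

long_df = to_long_format(toy)
```

Because `rename` and `assign` return new frames, this version never modifies a slice of the original frame in place.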


In [None]:
test_dataset = JigsawDatasetTest(test_final, tokenizer=CONFIG["tokenizer"], max_length=CONFIG['max_length'])
test_loader = DataLoader(test_dataset, batch_size=CONFIG['valid_batch_size'], 
                              num_workers=2, shuffle=False, pin_memory=True)

In [None]:

MODEL_DIR = '/content/drive/MyDrive/ISarcasm/TaskC_models/Arabert_task_c'
# One saved checkpoint per fold, Loss-Fold-0.bin through Loss-Fold-4.bin
MODEL_PATH_2 = [f'{MODEL_DIR}/Loss-Fold-{fold}.bin' for fold in range(5)]
preds = inference(MODEL_PATH_2, test_loader, CONFIG['device'])

Some weights of the model checkpoint at aubmindlab/bert-base-arabert were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Getting predictions for model 1


100%|██████████| 5/5 [00:02<00:00,  2.13it/s]


Getting predictions for model 2


100%|██████████| 5/5 [00:02<00:00,  2.21it/s]


Getting predictions for model 3


100%|██████████| 5/5 [00:02<00:00,  2.23it/s]


Getting predictions for model 4


100%|██████████| 5/5 [00:02<00:00,  2.23it/s]


Getting predictions for model 5


100%|██████████| 5/5 [00:02<00:00,  1.88it/s]


In [None]:
from sklearn.metrics import accuracy_score, classification_report, f1_score, precision_score, recall_score

def print_statistics(y, y_pred):
    """Print and return accuracy plus support-weighted precision, recall, and F1."""
    accuracy = accuracy_score(y, y_pred)
    # average='weighted' weights each class's score by its number of true samples
    precision = precision_score(y, y_pred, average='weighted')
    recall = recall_score(y, y_pred, average='weighted')
    f_score = f1_score(y, y_pred, average='weighted')
    print('Accuracy: %.3f\nPrecision: %.3f\nRecall: %.3f\nF_score: %.3f\n'
          % (accuracy, precision, recall, f_score))
    print(classification_report(y, y_pred))
    return accuracy, precision, recall, f_score

In [None]:
print(print_statistics(test_final['label'],preds))

Accuracy: 0.822
Precision: 0.839
Recall: 0.822
F_score: 0.820

              precision    recall  f1-score   support

           0       0.76      0.93      0.84       149
           1       0.91      0.71      0.80       149

    accuracy                           0.82       298
   macro avg       0.84      0.82      0.82       298
weighted avg       0.84      0.82      0.82       298

(0.8221476510067114, 0.8387646835922697, 0.8221476510067114, 0.819939577039275)
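
The weighted scores above can be reproduced by hand from per-class counts, which makes it clear what `average='weighted'` does. The confusion counts in this sketch are an assumption: the notebook never prints them, and they were inferred as the integer counts consistent with the reported accuracy, support, and per-class values. `weighted_scores` is a hypothetical helper written in plain Python.

```python
# Inferred confusion counts (assumption, not notebook output).
# Outer key = true class, inner key = predicted class.
confusion = {0: {0: 139, 1: 10}, 1: {0: 43, 1: 106}}

def weighted_scores(confusion):
    """Return accuracy and support-weighted precision, recall, and F1."""
    classes = sorted(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    accuracy = sum(confusion[c][c] for c in classes) / total
    precision = recall = f1 = 0.0
    for c in classes:
        support = sum(confusion[c].values())                # true samples of class c
        predicted = sum(confusion[t][c] for t in classes)   # samples predicted as class c
        p = confusion[c][c] / predicted
        r = confusion[c][c] / support
        weight = support / total                            # class share of all samples
        precision += weight * p
        recall += weight * r
        f1 += weight * (2 * p * r / (p + r))
    return accuracy, precision, recall, f1

accuracy, precision, recall, f_score = weighted_scores(confusion)
print(f"{accuracy:.3f} {precision:.3f} {recall:.3f} {f_score:.3f}")
# prints: 0.822 0.839 0.822 0.820
```

Both classes have the same support (149), so the weighted and macro averages coincide here, and weighted recall equals accuracy.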


![](https://i.imgur.com/TSIUdfS.jpg)
