## CA 2, LLMs Spring 2024

- **Name:** Majid Faridfar
- **Student ID:** 819199569

---

# What are Soft prompts?
Soft prompts are learnable tensors concatenated with the input embeddings that can be optimized to a dataset; the downside is that they aren’t human readable because you aren’t matching these “virtual tokens” to the embeddings of a real word.
<br>
<div>
<img src="https://www.researchgate.net/publication/366062946/figure/fig1/AS:11431281105340756@1670383256990/The-comparison-between-the-previous-T5-prompt-tuning-method-part-a-and-the-introduced.jpg"/>
</div>

Read More:
<br>[Youtube : PEFT and Soft Prompt](https://www.youtube.com/watch?v=8uy_WII76L0)
<br>[Paper: The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
https://arxiv.org/pdf/2101.00190.pdf
<br>[Paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/pdf/2101.00190.pdf)

# Part 1 (20 Points)
**A) Compare and contrast model tuning and prompt tuning in terms of their effectiveness for specific downstream tasks. (5 Points)**

> **Model tuning**: It involves fine-tuning the entire pre-trained language model on a downstream task-specific dataset. During fine-tuning, the model's parameters are updated across all layers based on the task data. It is generally effective for a wide range of tasks and in this way model can capture intricate task nuances as the entire model is adapted. But it requires a large amount of task-specific data to be effective and it is computationally expensive and time-consuming, especially for large models and datasets.
>
> **Prompt Tuning**: It involves providing task-specific prompts or instructions to guide the model's predictions without fine-tuning its parameters. Soft prompts, in particular, are used to softly constrain the model's output. This approach is effective for tasks where limited task-specific data is available. It is relatively computationally efficient compared to full model fine-tuning, and also doing it is possible when the model is not available (e.g. GPT-4). On the other hand, it may not be as effective for complex tasks that require deep understanding or extensive context. It requires careful design of prompts, which might be challenging for some tasks and performance heavily relies on the quality and relevance of the prompts provided. The other challenge is the length of the prompt which is limited.

**B) Explore the challenges associated with interpreting soft prompts in the continuous embedding space and propose potential solutions. (5 Points)**

> 1. The lack of direct correspondence between continuous embeddings and human-interpretable concepts is the most important challenge. Unlike discrete tokens or words, which have clear meanings, embeddings represent abstract numerical vectors in a high-dimensional space, making their interpretation non-trivial. Additionally, soft prompts may contain nuanced or ambiguous language, further complicating their interpretation.
>   - *Solution*: Incorporating interpretability techniques that map the continuous embedding space to more interpretable representations. For example, dimensionality reduction techniques such as `Principal Component Analysis (PCA)` or `t-SNE` can be applied to visualize embeddings in lower-dimensional spaces, allowing for easier interpretation and analysis. Additionally, methods like `saliency mapping` or `attention visualization` can highlight the parts of the input or context that the model focuses on when generating outputs based on soft prompts, providing insights into how the prompts influence model behavior.
> 2. Varying levels of granularity and specificity can impact the effectiveness of soft prompts. For instance, a prompt that is too general or vague may not provide sufficient guidance to the model, leading to ambiguous or irrelevant outputs. Conversely, a prompt that is overly specific may restrict the model's flexibility and hinder its ability to generate diverse or creative responses. Thus, finding the right balance in the granularity and specificity of soft prompts is crucial to ensure they effectively guide the model while allowing for flexibility and creativity in generating outputs.
>   - *Solution*: standardizing the construction of soft prompts and providing guidelines for their formulation can help ensure consistent granularity and specificity across prompts. Moreover, incorporating domain-specific knowledge or leveraging pre-trained embeddings tailored to the task domain can enhance the relevance and effectiveness of soft prompts.
> 3. Soft prompts may suffer from noise or redundancy, which can diminish their interpretability and effectiveness. Noise refers to irrelevant or distracting information in the prompt that does not contribute to guiding the model effectively. Redundancy, on the other hand, involves the repetition of similar information within the prompt, leading to inefficiencies in model learning and potentially biasing the generated outputs. Addressing noise and redundancy in soft prompts is essential to enhance their clarity and relevance, thereby improving the interpretability and effectiveness of the prompts in guiding the model's behavior accurately towards desired outputs.
>   - *Solution*: Techniques such as filtering out irrelevant words or phrases, applying regularization methods during training to encourage the model to focus on relevant prompt information, and incorporating attention mechanisms to dynamically weigh the importance of different parts of the prompt can help reduce noise and improve the relevance of soft prompts.

**C) What is the effect of initializing prompts randomly versus initializing them from the vocabulary, and how does this impact the performance of prompt tuning? (5 Points)**

>  When prompts are initialized randomly, they lack any inherent semantic meaning or relevance to the task. As a result, the model may struggle to effectively utilize these prompts to guide its predictions, but it can be also effective, because of the diversity among prompts, as each prompt starts from a different point in the embedding space. But generally speaking, they are less likely to provide useful guidance to the model, leading to suboptimal performance. The model may fail to capture relevant task-specific information, resulting in lower accuracy and generalization performance. Also randomly initialized prompts may require more fine-tuning steps to converge to effective representations and the resulting embeddings may not align well with the downstream task.
>
> On the other hand, Initializing prompts from the vocabulary involves selecting words or phrases from the model's vocabulary that are semantically related to the task. These prompts are more likely to provide relevant guidance to the model. They are more effective in guiding the model's predictions, as they contain semantically meaningful information related to the task. This can lead to improved performance in terms of accuracy, generalization, and task-specific metrics. Also they require fewer fine-tuning steps to adapt to the task-specific context.

**D) How is the optimization process in the prefix tuning(<br>[Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/pdf/2101.00190.pdf)) and Why did they use this technique? (5 Points)**

> The key idea which is provided is that instead of fine-tuning the entire model, prefix tuning optimizes and tuning some small continuous vector (the prefix) added to each transformer block (i.e. layer) while keeping the original language model parameters frozen. In the following, a list of the reasons why they use it, is provided:
>
> - **Space Efficiency**: Only the prefixes need to be stored for each task, making it modular and memory-efficient.
>
> - **Low Parameter Update**: Learning only a small fraction of the parameters (0.1%) achieves comparable performance.
>
> - **Generalization**: Prefix tuning extrapolates better to examples with unseen topics during training.
>
> - **Modularity**: Despite full fine-tuning, by just replacing prefixes, we can adopt them to a completely different task.

# Part 2 (35 points)

## Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModel
from transformers import AdamW
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

## Model Selection & Constants
We will use `bert-fa-base-uncased` as our base model from Hugging Face ([HF_Link](https://huggingface.co/HooshvareLab/bert-fa-base-uncased)). For our tuning, we intend to utilize 20 soft prompt tokens.

In [2]:
class CONFIG:
    seed = 42
    max_len = 128
    train_batch = 16
    valid_batch = 32
    epochs = 10
    n_tokens=20
    learning_rate = 0.01
    model_name = 'HooshvareLab/bert-fa-base-uncased'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

config.json:   0%|          | 0.00/440 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/1.20M [00:00<?, ?B/s]

## Dataset

The dataset contains around 7000 Persian sentences and their corresponding polarity, and have been manually classified into 5 categories (i.e. Angry).

### Load Dataset

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
import pandas as pd
file_path = "drive/MyDrive/LLM/CA2/softprompt_dataset.csv"
df = pd.read_csv(file_path)

### Pre-Processing

In [5]:
%pip install -U clean-text[gpl]
%pip install hazm

Collecting clean-text[gpl]
  Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Collecting emoji<2.0.0,>=1.0.0 (from clean-text[gpl])
  Downloading emoji-1.7.0.tar.gz (175 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m175.4/175.4 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting ftfy<7.0,>=6.0 (from clean-text[gpl])
  Downloading ftfy-6.2.0-py3-none-any.whl (54 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m54.4/54.4 kB[0m [31m7.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting unidecode<2.0.0,>=1.1.1 (from clean-text[gpl])
  Downloading Unidecode-1.3.8-py3-none-any.whl (235 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m235.5/235.5 kB[0m [31m6.1 MB/s[0m eta [36m0:00:00[0m
Building wheels for collected packages: emoji
  Building wheel for emoji (setup.py) ... [?25l[?25hdone
  Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=1710

In [6]:
import re
from cleantext import clean
from hazm import *

In [7]:
import re
def cleanhtml(raw_html):
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, '', raw_html)
    return cleantext

def cleaning(text):
    text = text.strip()

    # regular cleaning
    text = clean(text,
        fix_unicode=True,
        to_ascii=False,
        lower=True,
        no_line_breaks=True,
        no_urls=True,
        no_emails=True,
        no_phone_numbers=True,
        no_numbers=False,
        no_digits=False,
        no_currency_symbols=True,
        no_punct=False,
        replace_with_url="",
        replace_with_email="",
        replace_with_phone_number="",
        replace_with_number="",
        replace_with_digit="0",
        replace_with_currency_symbol="",
    )

    text = cleanhtml(text)

    # normalizing
    # normalizer = hazm.Normalizer()
    # text = normalizer.normalize(text)

    wierd_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u'\U00010000-\U0010ffff'
        u"\u200d"
        u"\u2640-\u2642"
        u"\u2600-\u2B55"
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\u3030"
        u"\ufe0f"
        u"\u2069"
        u"\u2066"
        u"\u2068"
        u"\u2067"
        "]+", flags=re.UNICODE)

    text = wierd_pattern.sub(r'', text)

    # removing extra spaces, hashtags
    text = re.sub("#", "", text)
    text = re.sub("\s+", " ", text)

    return text

In [8]:
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

tqdm.pandas()

def parallel_apply_with_progress(df, func, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as executor, tqdm(total=len(df)) as pbar:
        def update(*args):
            pbar.update()

        results = []
        for result in executor.map(func, df['text']):
            results.append(result)
            update()

        df['text'] = pd.Series(results)

    return df

In [9]:
df = parallel_apply_with_progress(df, cleaning)

100%|██████████| 7023/7023 [00:04<00:00, 1448.35it/s]


In [10]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.15,
                                                  random_state=42,
                                                  stratify=df.label.values)

train_df = df.loc[X_train]
validation_df = df.loc[X_val]

In [11]:
possible_labels = df.label.unique()

label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
label_dict

{0: 0, 1: 1, 2: 2, -1: 3, -2: 4}

In [12]:
train_df['label'] = train_df.label.replace(label_dict)
validation_df['label'] = validation_df.label.replace(label_dict)

### Create Dataset Class (5 Points)
In this step we will getting our dataset ready for training.

In this part we will define BERT-based dataset class for text classification, with configuration parameters. It preprocesses text data and tokenizes it using the BERT tokenizer.


Complete the preprocessing step in the __getitem__ method by adding padding tokens to 'input_ids' and 'attention_mask',
The count of this pad tokens is the same as `n_tokens`.

In [13]:
class BERTDataset(Dataset):
    def __init__(self,df):
        self.text = df['text'].values
        self.labels = df['label'].values
        self.all_labels = [0, 1, 2, 3, 4]
        self.max_len = CONFIG.max_len
        self.tokenizer = CONFIG.tokenizer
        self.n_tokens=CONFIG.n_tokens

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        text = self.text[index]
        text = ' '.join(text.split())
        inputs = self.tokenizer.encode_plus(
            text,
            None,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            return_token_type_ids=True
        )

        ######### Your code begins #########
        inputs['input_ids'] = torch.tensor(inputs['input_ids'] + [self.tokenizer.pad_token_id]*self.n_tokens, dtype=torch.long)
        inputs['attention_mask'] = torch.tensor(inputs['attention_mask'] + [0]*self.n_tokens, dtype=torch.long)
        ######### Your code ends ###########

        labels = self.labels[index]
        label_dict = {label: (label == labels) for label in self.all_labels}
        labels_tensor = torch.tensor([float(label_dict[label]) for label in self.all_labels])
        return {
            'ids': inputs['input_ids'],
            'mask': inputs['attention_mask'],
            'label': labels_tensor
        }

In [14]:
train_dataset = BERTDataset(train_df)
validation_dataset = BERTDataset(validation_df)

## Define Prompt Embedding Layer (15 Points)
In this part we will define our prompt layer in `PROMPTEmbedding` module.


<font color='#73FF73'><b>You have to complete</b></font> `initialize_embedding` and  `forward` <font color='#73FF73'><b>functions.</b></font>

In `initialize_embedding` function initialize the learned embeddings based on whether they should be initialized from the vocabulary or randomly within the specified range.

In `forward` function, modify the input_embedding to extract the relevant part based on n_tokens.

Repeat the learned_embedding to match the size of input_embedding.

Concatenate the learned_embedding and input_embedding properly.


In [None]:
class PROMPTEmbedding(nn.Module):
    def __init__(self,
                emb_layer: nn.Embedding,
                n_tokens: int = 20,
                random_range: float = 0.5,
                initialize_from_vocab: bool = True):

      super(PROMPTEmbedding, self).__init__()
      self.emb_layer = emb_layer
      self.n_tokens = n_tokens
      self.learned_embedding = nn.parameter.Parameter(self.initialize_embedding(emb_layer,
                                                                               n_tokens,
                                                                               random_range,
                                                                               initialize_from_vocab))

    def initialize_embedding(self,
                             emb_layer: nn.Embedding,
                             n_tokens: int = 20,
                             random_range: float = 0.5,
                             initialize_from_vocab: bool = True):

      if initialize_from_vocab:
        ######### Your code begins #########
        vocab_emb = self.emb_layer.weight[:n_tokens].clone().detach()

        return vocab_emb

      else:
        # random_emb =
        random_emb = torch.FloatTensor(n_tokens, emb_layer.weight.size(1)).uniform_(-random_range, random_range)
        ######### Your code ends ###########
      return random_emb


    def forward(self, tokens):
      ######### Your code begins #########
      input_embedding = self.emb_layer(tokens[:, self.n_tokens:])
      learned_embedding = self.learned_embedding.repeat(input_embedding.size(0), 1, 1)
      joined_embedding = torch.cat([learned_embedding, input_embedding], dim=1)
      ######### Your code ends ###########
      return joined_embedding

## Replace model's embedding layer with our layer (5 Points)

In [None]:
# Define your BERT model
model = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                           output_hidden_states = False).to(CONFIG.device)
######### Your code begins #########
model.set_input_embeddings(PROMPTEmbedding(model.get_input_embeddings(), n_tokens=20, initialize_from_vocab=True).to(CONFIG.device))
######### Your code ends ###########

pytorch_model.bin:   0%|          | 0.00/654M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Freezing Model Parameters (5 points)
In this part we will freeze entire model except `learned_embedding`

In [None]:
######### Your code begins #########
for name, param in model.named_parameters():
    if 'learned_embedding' not in name:
        param.requires_grad = False

for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
######### Your code ends ###########

## Optimizer


In [None]:
from transformers import AdamW

optimizer = AdamW(model.parameters(), lr=CONFIG.learning_rate)

## Training & Evaluation


### Define dataloaders

In [15]:
train_loader = DataLoader(train_dataset, batch_size=CONFIG.train_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

validation_loader = DataLoader(validation_dataset, batch_size=CONFIG.valid_batch,
                              num_workers=2, shuffle=True, pin_memory=True)

### Define evaluation function

In [16]:
from sklearn.metrics import f1_score

def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = np.argmax(labels, axis=1).flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')

In [17]:
def evaluate(val_dataloader):

    model.eval()

    loss_val_total = 0
    predictions, true_vals = [], []

    for batch in val_dataloader:


        inputs = {'input_ids':      batch['ids'].to(CONFIG.device),
                  'attention_mask': batch['mask'].to(CONFIG.device),
                  'labels':         batch['label'].to(CONFIG.device),
                 }

        with torch.no_grad():
            outputs = model(**inputs)

        loss = outputs["loss"]
        logits = outputs["logits"]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)

    loss_val_avg = loss_val_total/len(val_dataloader)

    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)

    return loss_val_avg, predictions, true_vals

### Define trainng loop


In [18]:
def train(model, optimizer, train_dataloader, val_dataloader):

    epochs = CONFIG.epochs

    for epoch in tqdm(range(1, epochs+1)):

      model.train()

      loss_train_total = 0

      progress_bar = tqdm(train_loader, desc='Epoch {:1d}'.format(epoch), leave=False, disable=True)

      for batch in progress_bar:

        optimizer.zero_grad()

        inputs = {'input_ids':      batch['ids'].to(CONFIG.device),
                  'attention_mask': batch['mask'].to(CONFIG.device),
                  'labels':         batch['label'].to(CONFIG.device),
                }

        output = model(**inputs)

        loss = output["loss"]
        loss_train_total += loss.item()

        loss.backward()
        optimizer.step()

        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})


      tqdm.write(f'\nEpoch {epoch}')
      loss_train_avg = loss_train_total/len(train_loader)
      tqdm.write(f'Training loss: {loss_train_avg}')


      val_loss, predictions, true_vals = evaluate(val_dataloader)
      val_f1 = f1_score_func(predictions, true_vals)
      tqdm.write(f'Validation loss: {val_loss}')
      tqdm.write(f'F1 Score (Weighted): {val_f1}')


### Run

In [None]:
train(model=model, optimizer=optimizer, train_dataloader=train_loader, val_dataloader=validation_loader)

  0%|          | 0/10 [01:55<?, ?it/s]


Epoch 1
Training loss: 0.46414429180443606


 10%|█         | 1/10 [02:06<18:58, 126.49s/it]

Validation loss: 0.44740983121322864
F1 Score (Weighted): 0.33053933691732373


 10%|█         | 1/10 [04:04<18:58, 126.49s/it]


Epoch 2
Training loss: 0.4406909834254872


 20%|██        | 2/10 [04:15<17:03, 127.89s/it]

Validation loss: 0.43827955650560785
F1 Score (Weighted): 0.3511486082747115


 20%|██        | 2/10 [06:13<17:03, 127.89s/it]


Epoch 3
Training loss: 0.422047401972633


 30%|███       | 3/10 [06:24<14:58, 128.37s/it]

Validation loss: 0.4315464361147447
F1 Score (Weighted): 0.3954197636354205


 30%|███       | 3/10 [08:22<14:58, 128.37s/it]


Epoch 4
Training loss: 0.40422259446452646


 40%|████      | 4/10 [08:33<12:52, 128.67s/it]

Validation loss: 0.4334870901974765
F1 Score (Weighted): 0.3812497371749612


 40%|████      | 4/10 [10:31<12:52, 128.67s/it]


Epoch 5
Training loss: 0.3919458670571526


 50%|█████     | 5/10 [10:42<10:44, 128.81s/it]

Validation loss: 0.43270766102906427
F1 Score (Weighted): 0.39798606056613095


 50%|█████     | 5/10 [12:40<10:44, 128.81s/it]


Epoch 6
Training loss: 0.38016493857544376


 60%|██████    | 6/10 [12:51<08:35, 128.78s/it]

Validation loss: 0.445610346216144
F1 Score (Weighted): 0.35273101369667165


 60%|██████    | 6/10 [14:49<08:35, 128.78s/it]


Epoch 7
Training loss: 0.3721821090196543


 70%|███████   | 7/10 [14:59<06:26, 128.77s/it]

Validation loss: 0.43875800479542126
F1 Score (Weighted): 0.39046256228350473


 70%|███████   | 7/10 [16:58<06:26, 128.77s/it]


Epoch 8
Training loss: 0.3634722029620951


 80%|████████  | 8/10 [17:08<04:17, 128.84s/it]

Validation loss: 0.4399853285514947
F1 Score (Weighted): 0.39234451106361284


 80%|████████  | 8/10 [19:06<04:17, 128.84s/it]


Epoch 9
Training loss: 0.3594559298798362


 90%|█████████ | 9/10 [19:17<02:08, 128.78s/it]

Validation loss: 0.4448000192642212
F1 Score (Weighted): 0.3923359444669264


 90%|█████████ | 9/10 [21:15<02:08, 128.78s/it]


Epoch 10
Training loss: 0.3542546399216601


100%|██████████| 10/10 [21:26<00:00, 128.63s/it]

Validation loss: 0.4439676647836512
F1 Score (Weighted): 0.392965352709498





## Using OpenDelta library (5 Points)

In [19]:
# !pip install opendelta
!pip install git+https://github.com/thunlp/OpenDelta.git -q

  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m542.0/542.0 kB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m623.2/623.2 kB[0m [31m14.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m207.3/207.3 kB[0m [31m13.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m226.4/226.4 kB[0m [31m13.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.7/89.7 kB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━

Use `OpenDelta` library to do the same thing. [link](https://opendelta.readthedocs.io/en/latest/modules/deltas.html)

For hyperparameters, test with `N_SOFT_PROMPT_TOKENS=10` and `N_SOFT_PROMPT_TOKENS=20` and report them.

### N_SOFT_PROMPT_TOKENS=$10$

In [33]:
######### Your code begins #########
model_10 = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                              output_hidden_states = False)

from opendelta.delta_models.soft_prompt import SoftPromptModel

soft_prompt_model_10 = SoftPromptModel(backbone_model=model_10, soft_token_num=10)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [34]:
for name, param in model_10.named_parameters():
    if 'soft' not in name:
        param.requires_grad = False
    else:
        param.requires_grad = True

In [35]:
from transformers import AdamW

model = model_10.to(CONFIG.device)
optimizer_10 = AdamW(model.parameters(), lr=CONFIG.learning_rate)

In [36]:
train(model=model, optimizer=optimizer_10, train_dataloader=train_loader, val_dataloader=validation_loader)

  0%|          | 0/10 [01:41<?, ?it/s]


Epoch 1
Training loss: 0.4660185673498215


 10%|█         | 1/10 [01:50<16:38, 110.94s/it]

Validation loss: 0.44935640692710876
F1 Score (Weighted): 0.2579362611466247


 10%|█         | 1/10 [03:32<16:38, 110.94s/it]


Epoch 2
Training loss: 0.45016806449163405


 20%|██        | 2/10 [03:41<14:45, 110.65s/it]

Validation loss: 0.42353456580277643
F1 Score (Weighted): 0.4128747167666182


 20%|██        | 2/10 [05:23<14:45, 110.65s/it]


Epoch 3
Training loss: 0.4387865806805259


 30%|███       | 3/10 [05:33<12:59, 111.29s/it]

Validation loss: 0.4118667859019655
F1 Score (Weighted): 0.3864357699433583


 30%|███       | 3/10 [07:15<12:59, 111.29s/it]


Epoch 4
Training loss: 0.4289764070255871


 40%|████      | 4/10 [07:24<11:08, 111.38s/it]

Validation loss: 0.4137817241928794
F1 Score (Weighted): 0.3650270739911692


 40%|████      | 4/10 [09:07<11:08, 111.38s/it]


Epoch 5
Training loss: 0.4256678369434122


 50%|█████     | 5/10 [09:16<09:17, 111.46s/it]

Validation loss: 0.40112626462271717
F1 Score (Weighted): 0.41127499458424965


 50%|█████     | 5/10 [10:58<09:17, 111.46s/it]


Epoch 6
Training loss: 0.4198895745417651


 60%|██████    | 6/10 [11:08<07:25, 111.50s/it]

Validation loss: 0.39659980752251367
F1 Score (Weighted): 0.4286962704813638


 60%|██████    | 6/10 [12:50<07:25, 111.50s/it]


Epoch 7
Training loss: 0.415539147382114


 70%|███████   | 7/10 [12:59<05:34, 111.37s/it]

Validation loss: 0.3989231568394285
F1 Score (Weighted): 0.4135581317129942


 70%|███████   | 7/10 [14:40<05:34, 111.37s/it]


Epoch 8
Training loss: 0.41356448725583084


 80%|████████  | 8/10 [14:49<03:42, 111.07s/it]

Validation loss: 0.3961534590432138
F1 Score (Weighted): 0.40244766402375404


 80%|████████  | 8/10 [16:30<03:42, 111.07s/it]


Epoch 9
Training loss: 0.4130800328949556


 90%|█████████ | 9/10 [16:40<01:50, 110.89s/it]

Validation loss: 0.39264370184956177
F1 Score (Weighted): 0.41737811619142146


 90%|█████████ | 9/10 [18:21<01:50, 110.89s/it]


Epoch 10
Training loss: 0.41120904070489545


100%|██████████| 10/10 [18:31<00:00, 111.13s/it]

Validation loss: 0.389098657803102
F1 Score (Weighted): 0.4718776819375309





### N_SOFT_PROMPT_TOKENS=$20$

In [37]:
model_20 = AutoModelForSequenceClassification.from_pretrained(CONFIG.model_name, num_labels=5, output_attentions = False,
                                                              output_hidden_states = False)

from opendelta.delta_models.soft_prompt import SoftPromptModel

soft_prompt_model_20 = SoftPromptModel(backbone_model=model_20, soft_token_num=20)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [38]:
for name, param in model_20.named_parameters():
    if 'soft' not in name:
        param.requires_grad = False
    else:
        param.requires_grad = True

In [39]:
from transformers import AdamW

model = model_20.to(CONFIG.device)
optimizer_20 = AdamW(model.parameters(), lr=CONFIG.learning_rate)

In [40]:
train(model=model, optimizer=optimizer_20, train_dataloader=train_loader, val_dataloader=validation_loader)
######### Your code ends ###########

  0%|          | 0/10 [01:52<?, ?it/s]


Epoch 1
Training loss: 0.47246310680626546


 10%|█         | 1/10 [02:02<18:26, 122.89s/it]

Validation loss: 0.46072265686410846
F1 Score (Weighted): 0.13008501385989663


 10%|█         | 1/10 [03:56<18:26, 122.89s/it]


Epoch 2
Training loss: 0.45844229896438315


 20%|██        | 2/10 [04:06<16:25, 123.14s/it]

Validation loss: 0.4437518164967046
F1 Score (Weighted): 0.3716893018005042


 20%|██        | 2/10 [05:59<16:25, 123.14s/it]


Epoch 3
Training loss: 0.4399613937273382


 30%|███       | 3/10 [06:09<14:23, 123.33s/it]

Validation loss: 0.41151720104795514
F1 Score (Weighted): 0.3992837151417445


 30%|███       | 3/10 [08:02<14:23, 123.33s/it]


Epoch 4
Training loss: 0.4257296452069665


 40%|████      | 4/10 [08:12<12:19, 123.28s/it]

Validation loss: 0.4036531484488285
F1 Score (Weighted): 0.46779268767849197


 40%|████      | 4/10 [10:05<12:19, 123.28s/it]


Epoch 5
Training loss: 0.41976351159460407


 50%|█████     | 5/10 [10:16<10:16, 123.21s/it]

Validation loss: 0.39584834828521265
F1 Score (Weighted): 0.45756146675578063


 50%|█████     | 5/10 [12:08<10:16, 123.21s/it]


Epoch 6
Training loss: 0.4153782092632457


 60%|██████    | 6/10 [12:19<08:12, 123.13s/it]

Validation loss: 0.39861914334875165
F1 Score (Weighted): 0.45901668602005496


 60%|██████    | 6/10 [14:11<08:12, 123.13s/it]


Epoch 7
Training loss: 0.41122936158575474


 70%|███████   | 7/10 [14:21<06:08, 122.95s/it]

Validation loss: 0.3970255373102246
F1 Score (Weighted): 0.43542123505220437


 70%|███████   | 7/10 [16:14<06:08, 122.95s/it]


Epoch 8
Training loss: 0.41085289992431906


 80%|████████  | 8/10 [16:24<04:05, 122.97s/it]

Validation loss: 0.3919090746027051
F1 Score (Weighted): 0.44359009236813446


 80%|████████  | 8/10 [18:17<04:05, 122.97s/it]


Epoch 9
Training loss: 0.4067633489077104


 90%|█████████ | 9/10 [18:27<02:03, 123.07s/it]

Validation loss: 0.38809269124811346
F1 Score (Weighted): 0.4765253721610457


 90%|█████████ | 9/10 [20:20<02:03, 123.07s/it]


Epoch 10
Training loss: 0.40440401793482467


100%|██████████| 10/10 [20:31<00:00, 123.10s/it]

Validation loss: 0.3879411681131883
F1 Score (Weighted): 0.5245259005569205



