# Advanced NLP Assignment 3

12112627 李乐平

## Q1

**(1)**
Explain three differences between P-Tuning V1 and P-Tuning V2.

- **Applied tasks**: P-Tuning V1 is applied to knowledge probing and NLU tasks, while P-Tuning V2 is applied to sequence tagging and NLU tasks.
- **Classification head**: P-Tuning V1 uses a language modeling head to predict verbalizers, which are natural language tokens that correspond to labels. P-Tuning V2 uses a linear classifier on top of the [CLS] token or the target tokens, which is simpler and more compatible with sequence labeling tasks.
- **Reparameterization**: P-Tuning V1 always uses a reparameterization encoder such as an MLP to transform the trainable embeddings. P-Tuning V2 finds that the usefulness of reparameterization depends on the tasks and datasets, and sometimes a simple embedding layer is enough.
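
The classification-head difference can be illustrated with a toy, dependency-free sketch. All numbers, the four-word vocabulary, and the label-to-word mapping below are made up for illustration; they are not taken from either paper or from any library.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# --- V1 style: language modeling head + verbalizer ---
# Hypothetical vocabulary logits produced at the [MASK] position.
vocab = ["yes", "no", "the", "cat"]
mask_logits = [2.1, 0.3, 1.5, -0.7]
verbalizer = {0: "no", 1: "yes"}  # label id -> label word (illustrative mapping)

def predict_with_verbalizer(logits, vocab, verbalizer):
    # Only the logits of the label words are compared; other tokens are ignored.
    scores = {label: logits[vocab.index(word)] for label, word in verbalizer.items()}
    return max(scores, key=scores.get)

# --- V2 style: linear classifier on the [CLS] hidden state ---
cls_hidden = [0.5, -1.0, 2.0]              # hypothetical [CLS] representation
W = [[0.2, 0.1, -0.3], [-0.1, 0.4, 0.6]]   # one weight row per class
b = [0.0, 0.1]

def predict_with_linear_head(h, W, b):
    logits = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
    return argmax(logits)

v1_label = predict_with_verbalizer(mask_logits, vocab, verbalizer)
v2_label = predict_with_linear_head(cls_hidden, W, b)
print(v1_label, v2_label)
```

The verbalizer route needs a label word for every class, which is awkward for per-token labels; the linear head only needs one weight row per class, so it carries over to sequence tagging by applying it to each target token.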

**(2)**
Write code to fine-tune Tiny-BERT with the P-Tuning v2 method using the MRPC dataset.


In [1]:
# !python -m pip install --user --upgrade peft datasets
# !python -m pip install --upgrade numpy
# !python -m pip install --user transformers
# !python -m pip install transformers 
# !pip list

In [2]:
import transformers


In [3]:
import warnings

import numpy as np
import torch
import transformers
from datasets import load_dataset, load_metric
from peft import get_peft_model, PrefixTuningConfig, PromptEncoderConfig, TaskType
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

# Suppress library warnings to keep the notebook output readable
warnings.simplefilter("ignore")

# Load the accuracy metric from a local copy of the metric script
# (load_metric is deprecated in recent `datasets` releases in favour of the `evaluate` library)
global_metric = load_metric("./accuracy/accuracy.py")

# Load the MRPC dataset
dataset = load_dataset(
    path = "json", 
    data_dir = "./Q1_MRPC_dataset", 
    data_files = {"train": "train.jsonl", "test": "test.jsonl", "validation": "validation.jsonl"}
)

# Load the Tiny-BERT model
model_dir = "./Q1_model_checkpoint_tinybert/"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir, num_labels=2)

# # Apply P-Tuning V2 to the model. In PEFT, PrefixTuningConfig adds trainable prefixes at every layer
# # (the deep-prompt design of P-Tuning V2); prefix_projection=True additionally reparameterizes them with an MLP.
# peft_config = PrefixTuningConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=5, prefix_projection=True)
# model = get_peft_model(model, peft_config)

# Apply P-Tuning V1 to the model (the configuration that is active in this cell)
peft_config = PromptEncoderConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=5)
model = get_peft_model(model, peft_config)


# Tokenize the dataset
def tokenize_function(example):
    return tokenizer(example["text1"], example["text2"], padding=True, truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

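# Define the optimizer and the learning rate scheduler.
# NOTE: this pair is never passed to Trainer (there is no `optimizers=` argument below), so training
# uses the optimizer and schedule that Trainer builds from TrainingArguments instead.
# NOTE: num_training_steps counts examples here; a per-step schedule needs (batches per epoch) * (epochs).
# # For P-Tuning V2
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
# scheduler = transformers.get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=len(dataset["train"]) * 100)

# For P-Tuning V1
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
scheduler = transformers.get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=len(dataset["train"]) * 100)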


# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=3e-3,  # For P-Tuning V1; use 2e-4 for P-Tuning V2
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=100,
    weight_decay=2e-5,  # For P-Tuning V1; use 1e-4 for P-Tuning V2
    push_to_hub=False,
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=False,
    metric_for_best_model="accuracy",
    greater_is_better=True,
)

# Define the metric computing function
def compute_metrics(eval_preds):
    metric = global_metric
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Define the training function
def m_train():
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset["train"],
        eval_dataset=tokenized_dataset["validation"],
        data_collator=None,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    
# print(dataset["train"])
    
# Train the model
m_train()


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./Q1_model_checkpoint_tinybert/ and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


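| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.6677        | 0.61565         | 0.683824 |
| 2     | 0.5579        | 0.600451        | 0.683824 |
| 3     | 0.5944        | 0.591131        | 0.696078 |
| 4     | 0.5786        | 0.58577         | 0.691176 |
| 5     | 0.6577        | 0.574661        | 0.696078 |
| 6     | 0.589         | 0.568421        | 0.723039 |
| 7     | 0.5528        | 0.559171        | 0.710784 |
| 8     | 0.549         | 0.553132        | 0.708333 |
| 9     | 0.596         | 0.549417        | 0.718137 |
| 10    | 0.5741        | 0.546513        | 0.72549  |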
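
A side note on the scheduler defined in the training cell: it receives `len(dataset["train"]) * 100` as `num_training_steps`, but a linear schedule counts optimizer steps (batches), not examples. The scheduler is not passed to `Trainer`, so this has no effect on the run above; the sketch below only shows how the step count could be computed if it were. It assumes a single device, no gradient accumulation, and that the local `train.jsonl` has the 3,668 examples of the standard GLUE MRPC train split.

```python
import math

def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer updates for a run (single device, no gradient accumulation)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * num_epochs

n_train = 3668  # standard GLUE MRPC train size; the local train.jsonl is assumed to match
steps = total_training_steps(n_train, batch_size=16, num_epochs=100)
naive = n_train * 100  # what len(dataset["train"]) * 100 evaluates to under the same assumption

# Fraction of the initial learning rate left after `steps` updates
# when a linear decay is stretched over `naive` steps instead.
remaining_lr_fraction = (naive - steps) / naive
print(steps, naive, round(remaining_lr_fraction, 3))
```

Under these assumptions the example-based count is about 16 times too large, so a linear decay configured with it would have lowered the learning rate by only about 6% by the end of training.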


In [4]:
# Define the evaluation function
def m_evaluate():
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=None,
        eval_dataset=tokenized_dataset["test"],
        data_collator=None,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    result = trainer.evaluate()
    return result

m_evaluate()

{'eval_loss': 0.49014005064964294,
 'eval_accuracy': 0.7576811594202899,
 'eval_runtime': 0.7411,
 'eval_samples_per_second': 2327.537,
 'eval_steps_per_second': 36.431}

## Q2

Efficient fine-tuning with the adapter method.

**1**
Explain adapter-based fine-tuning. You can also refer to the Adapter Hub blog.

**Answer**:

Adapter-based fine-tuning adapts a pre-trained language model, such as BERT or GPT, to a new task by inserting small trainable modules, called adapters, into its layers instead of updating all of the model's weights. A typical adapter is a bottleneck block (a down-projection, a non-linearity, and an up-projection, combined with a residual connection) that operates on the hidden states inside each Transformer layer. During fine-tuning, only the adapter parameters and a minimal set of surrounding parameters are updated, while the rest of the pre-trained model stays frozen. Because the trainable parameters are a small fraction of the model, adapting to a new task needs far fewer training resources than full fine-tuning, which enables faster deployment and reduces the risk of overfitting to the task at hand. Adapter-based fine-tuning thus strikes a balance between reusing the generalization ability of the pre-trained model and meeting task-specific requirements.
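
The computation performed by a bottleneck adapter can be sketched without any deep learning library. The snippet below is a minimal illustration, not the AdapterHub implementation: it assumes a ReLU non-linearity, omits biases, uses toy sizes, and initializes the up-projection to zero so that the adapter starts out as the identity function.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: h + W_up @ relu(W_down @ h)."""
    z = relu(matvec(W_down, h))               # down-projection, then non-linearity
    delta = matvec(W_up, z)                   # up-projection back to the input width
    return [x + d for x, d in zip(h, delta)]  # residual connection

d_m, bottleneck = 8, 2  # toy sizes chosen for illustration
rng = random.Random(0)
W_down = [[rng.uniform(-1, 1) for _ in range(d_m)] for _ in range(bottleneck)]
W_up_zero = [[0.0] * bottleneck for _ in range(d_m)]  # zero init: adapter is the identity
h = [rng.uniform(-1, 1) for _ in range(d_m)]

out = adapter_forward(h, W_down, W_up_zero)
adapter_params = 2 * d_m * bottleneck  # weights only, since biases are omitted here
print(out == h, adapter_params)
```

Only `W_down` and `W_up` would be trained in this sketch; the hidden state `h` comes from the frozen pre-trained layers.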


**2**
Suppose the dimension of the input to the adapter is $d_m$. How can we change the dimension of the hidden state before the non-linear function to $\frac{8d_m}{3}$?

**Answer**:
In the source code, the dimension of this hidden state is set by the down-projection, which is defined in the following lines:

In [None]:
# 31 class Adapter(nn.Module):
# 32     """
# 33     Implementation of a sequential bottleneck adapter block.
# 34     """
# 35 
# 36     def __init__(
# 37         self,
# 38         adapter_name,
# 39         input_size,
# 40         down_sample,
# 41         config: BnConfig,
# 42     ):

# ...

# 73         if config["phm_layer"]:
# 74             # Linear down projection of the input
# 75             seq_list.append(PHMLayer(adapter_name, self.input_size, self.down_sample, "down", config))
# 76         else:
# 77             seq_list.append(nn.Linear(self.input_size, self.down_sample))
# 78 
# 79         # select non-linearity
# 80         self.non_linearity = Activation_Function_Class(config["non_linearity"].lower())
# 81 
# 82         seq_list.append(self.non_linearity)

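The down-projection is created on line 75 (PHM layer) or line 77 (plain linear layer), and its output width is `down_sample`. Since this layer feeds directly into the non-linearity, setting `down_sample = 8 * d_m // 3` gives the hidden state the required dimension $\frac{8d_m}{3}$ (exactly so when $d_m$ is divisible by 3, floored otherwise).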

In [None]:
# adapter = Adapter("MyAdapter", my_input_size, 8 * my_input_size // 3, my_config)  # Sample usage; my_config is a BnConfig
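
As a quick check of the arithmetic, the sketch below evaluates the bottleneck width for a few hidden sizes (chosen for illustration) and flags where integer division has to round down. The last two lines rest on an assumption that the excerpt above does not confirm: that the library derives the width from its `reduction_factor` option as `hidden_size // reduction_factor`, in which case a factor of 3/8 would give the same result.

```python
def bottleneck_width(d_m):
    """Hidden width 8 * d_m / 3, floored to an integer."""
    return 8 * d_m // 3

for d_m in (312, 768, 1024):
    width = bottleneck_width(d_m)
    exact = (8 * d_m) % 3 == 0
    print(d_m, width, "exact" if exact else "floored")

# Assumption: width = hidden_size // reduction_factor inside the library.
reduction_factor = 3 / 8
width_from_factor = int(768 // reduction_factor)
print(width_from_factor)
```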

## Q3

Select an LLM such as ChatGPT or 讯飞星火 (iFLYTEK Spark). Design prompts to write a story for a 5-year-old child. The topic, style, length, and clues can be provided as constraints.

**1.**
Write your prompt and generated story.

**Prompt**:

Topic: Adventure in a Magical Forest

Style: Playful and Imaginative

Length: 300-500 words

Clues: A talking squirrel, a flying rainbow butterfly, a friendly dragon, a hidden treasure

**Generated Story**:

Once upon a time, in a land far, far away, there was a magical forest known as Enchanted Eldwood. This whimsical woodland was home to creatures of extraordinary charm and wonder. Our adventure begins with a curious explorer named Lily, who, armed with a heart full of curiosity and a backpack full of snacks, stepped into the lush greenery of Enchanted Eldwood.

As Lily ventured deeper into the forest, the trees whispered secrets, and the leaves giggled with delight. Suddenly, a chittering sound caught her attention. To her surprise, a talking squirrel named Nutters appeared, perched on a tree branch.

"Hello, dear adventurer! Welcome to Enchanted Eldwood," Nutters greeted with a mischievous twinkle in his eye. "I've been waiting for someone like you to share a grand adventure!"

With Nutters as her guide, Lily continued her journey, the foliage parting to reveal a breathtaking sight—a flying rainbow butterfly gracefully dancing in the air. Its wings shimmered with every color of the spectrum, leaving a trail of magic in its wake.

"Meet Iris, the Rainbow Flutterfly!" Nutters exclaimed. "Legend has it that following Iris leads to the heart of the forest, where a friendly dragon guards a hidden treasure."

Eagerly, Lily followed Iris through the enchanted grove until they reached a cozy clearing. There, nestled among the wildflowers, was a dragon named Ember, its scales reflecting the warm hues of a crackling bonfire. To Lily's surprise, Ember greeted her with a gentle smile.

"Welcome, kind adventurer! Nutters told me you would come seeking the treasure," Ember rumbled in a voice as soothing as the rustling leaves.

Ember shared the tale of the hidden treasure—a magical chest containing dreams, laughter, and joy. To unlock it, Lily needed to solve a riddle that echoed through the ancient trees.

As Lily pondered the riddle, Nutters gathered acorns, and Iris fluttered around in anticipation. With a gleam in her eye, Lily solved the riddle, and the magical chest opened, releasing a burst of shimmering light that filled the entire forest.

The once-hidden treasure transformed Enchanted Eldwood into an even more enchanting haven, where laughter echoed through the trees and dreams floated on the breeze.

With gratitude in her heart, Lily bid farewell to her newfound friends—Nutters, Iris, and Ember. As she stepped out of the magical forest, she carried with her the joy and wonder of Enchanted Eldwood, forever grateful for the playful adventure that had unfolded among the talking squirrel, flying rainbow butterfly, friendly dragon, and the hidden treasure that sparked a world of magic.

**2.**
Discuss approaches to improve the diversity and quality of the generated stories.

**Answer**:

To improve quality, prompts should be specific: detailed settings, character backgrounds, or unique scenarios steer the narrative towards more intricate and imaginative paths. Incorporating emotional cues and thematic elements can also give stories depth and resonance. To improve diversity, prompts should cover a broad spectrum of genres, cultures, and perspectives, which fosters inclusivity and helps avoid unintentional biases. Finally, open-ended prompts that leave room for creative interpretation can push the model beyond conventional story arcs, producing a richer and more varied set of stories.
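
One way to put these ideas into practice is to build prompts from pools of constraints and sample a different combination each time. The sketch below is illustrative only: the helper names and the pools of topics, styles, and clues are hypothetical, and the prompt layout simply mirrors the Topic/Style/Length/Clues format used in part 1.

```python
import random

def build_story_prompt(topic, style, length, clues, audience="a 5-year-old child"):
    """Assemble the four constraints into a single instruction for the LLM."""
    return (
        f"Write a story for {audience}.\n"
        f"Topic: {topic}\n"
        f"Style: {style}\n"
        f"Length: {length}\n"
        f"Clues: {', '.join(clues)}"
    )

# Hypothetical constraint pools; widening them is one way to diversify the stories.
topics = ["Adventure in a Magical Forest", "A Trip to the Moon", "The Lost Kitten"]
styles = ["Playful and Imaginative", "Gentle Bedtime Story", "Funny Rhyming Tale"]
clue_pool = [
    "a talking squirrel", "a flying rainbow butterfly", "a friendly dragon",
    "a hidden treasure", "a singing river", "a lantern festival",
]

def sample_prompt(rng):
    """Draw one random combination of constraints."""
    return build_story_prompt(
        topic=rng.choice(topics),
        style=rng.choice(styles),
        length="300-500 words",
        clues=rng.sample(clue_pool, k=4),
    )

rng = random.Random(0)  # fixed seed so the sampled prompts are reproducible
prompts = [sample_prompt(rng) for _ in range(3)]
print(prompts[0])
```

Specific settings and emotional cues would go into the `topic` and `clues` entries, while drawing from varied pools addresses diversity.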
