# prefixtune: default program

In [1]:
from default import *
import os, sys


## Run the default solution on small

In [2]:
device =  torch.device('cuda' if torch.cuda.is_available() else 'cpu')

basemodel = 'distilgpt2'
table_to_text = TableToText("peft", basemodel=basemodel)
model = AutoModelForCausalLM.from_pretrained(basemodel)
model.to(device)
decoder_output = table_to_text.decode(model, '../data/input/small.txt')
print("\n".join(decoder_output))

10it [02:14, 13.41s/it]

0||  ____________________   It's been a long time since I first started writing this blog, so I decided to write a series of posts about the
1||  ____________ A few days ago, I received an email from a friend of mine asking me to share my thoughts and thoughts with you. I am
2||  _______________________________________________________________________________    I‪This is the first time in a long time that I‪I‪I‪ve
3||  中文本語 (本語版)   The latest version of the Linux kernel is now available for Windows Phone
4||  ___________________    I’ve been looking for a way to make it easy to use your internet connection when you’re in
5||  ____________________________________________________________    The U.S. Department of Homeland Security (DHS) announced today that the Federal Emergency Management Agency (F
6||  _______________________________________________________________________________    Welcome to the World of Warcraft World of Warcraft Warcraft World of Warcraft (WOW)      
7||  ________




## Evaluate the default output

In [3]:
import sacrebleu

bleu = sacrebleu.metrics.BLEU(effective_order=True)

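def compute_bleu(references, output_data):
    """Return the mean sentence-level BLEU of the outputs against their references.

    references maps a source id to a list of reference strings; output_data is a
    list of [source_id, hypothesis] pairs. Returns 0.0 if the counts do not match.
    """
    bleu_score = 0.0
    if len(references) == len(output_data):
        score = 0.0
        total = 0.0
        for line in output_data:
            r = references[line[0]]  # all references for this source id
            h = line[1]              # system output for this source id
            score += bleu.sentence_score(h, r).score
            total += 1.
        bleu_score = score / total
    return bleu_score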

In [4]:
output = "\n".join(decoder_output)

references = {}
ref_data = []
with open( '../data/reference/small.out', 'r') as ref:
    ref_data = list(filter(lambda k: k, [str(x) for x in ref.read().splitlines()]))
    for line in ref_data:
        src_id, _, suggested_reference = line.split('||')
        references.setdefault(src_id, [])
        references[src_id].append(suggested_reference)

output_data = list(filter(lambda k: k, [str(x) for x in output.splitlines()]))
output_data = [line.split('||') for line in output_data]
output_data = output_data[:len(ref_data)]

print(f"bleu score: {compute_bleu(references, output_data)}")

bleu score: 1.0825689376012702


## Documentation

We used PeftModel because it was mentioned on the homework website and the necessary libraries were already installed in our virtual environment. We implemented it in the following way:

In [5]:
from peft import get_peft_model, PrefixTuningConfig, PeftModel,  TaskType, PeftConfig

def train(self):
        data_loaders = self.get_data(splits=("train", ))
        model = AutoModelForCausalLM.from_pretrained(self.basemodel)

        # You can print the parameters for debugging or understanding the code
        # but make sure you comment it out otherwise it will pollute the output
        # that is produced for dev and test
        #model.print_trainable_parameters()
        
        peft_config = PrefixTuningConfig(
            task_type=TaskType.CAUSAL_LM,
            prefix_projection=self.prefixprojection,
            inference_mode=False,
            num_virtual_tokens=self.virtualtokens,
        )
        model = get_peft_model(model, peft_config)
        # model.print_trainable_parameters()

        # TODO
        # if using HF peft module, then add calls to PrefixTuningConfig and get_peft_model
        # which will take num_virtual_tokens which is set to self.virtualtokens and
        # prefix_projection which is set to self.prefixprojection

        optimizer = torch.optim.AdamW(model.parameters(), lr=self.lr)
        lr_scheduler = get_linear_schedule_with_warmup(
            optimizer=optimizer,
            num_warmup_steps=0,
            num_training_steps=(len(data_loaders["train"]) * self.epochs),
        )
        model = model.to(device)

        model.train()
        for epoch in range(self.epochs):
            # TODO rest of the training steps for prefix tuning
            for step, batch in enumerate( tqdm( data_loaders['train'] ) ):
                batch = {k: v.to(device) for k, v in batch.items()}
                outputs = model(**batch)
                loss = outputs.loss
                loss.backward()
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            if epoch == self.epochs - 1:
                epoch_str = '' # last epoch so do not use epoch number in model filename
            else:
                epoch_str = str(epoch)
            savefile = self.modelfile + epoch_str + self.modelsuffix
            model.save_pretrained(savefile)

To apply prefix tuning to a pre-trained language model, we first load it using AutoModelForCausalLM.from_pretrained. Next, we build a PrefixTuningConfig that specifies our tuning settings (the task type, the number of virtual tokens, and whether to use prefix projection). Passing the model and this config to get_peft_model returns a PeftModel, a wrapper that freezes the original model's weights and adds a small set of trainable prefix parameters. During training, each batch is passed through the model, which returns the cross-entropy loss; we backpropagate that loss and then step the optimizer and the learning-rate scheduler. Due to limited GPU memory, we reduced the batch size to 12. After each epoch we call save_pretrained to save the trained prefix weights, which can be reloaded later, for example with PeftModel.from_pretrained.
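To give a rough sense of how small the trainable part is compared to the frozen base model, the sketch below counts prefix parameters by hand. It is an estimate under stated assumptions, not a replacement for model.print_trainable_parameters(): it assumes the prefix encoder is a single embedding of shape num_virtual_tokens x (2 * layers * hidden) without projection, and an embedding plus a two-layer MLP with projection. The function name prefix_param_count, the choice of 5 virtual tokens, and the MLP hidden size defaulting to the model's hidden size are all illustrative assumptions; the distilgpt2 dimensions (6 layers, hidden size 768) come from its published configuration.

```python
def prefix_param_count(num_virtual_tokens, num_layers, hidden_size,
                       prefix_projection=False, encoder_hidden_size=None):
    """Estimate the number of trainable prefix-tuning parameters.

    Assumed layout (hypothetical helper, not part of the peft API):
      - no projection: one embedding table holding a key and a value
        vector per layer for every virtual token
      - projection: a small embedding followed by a two-layer MLP that
        expands it to the same per-layer key/value size
    """
    # One key vector and one value vector per layer, per virtual token.
    kv_size = num_layers * 2 * hidden_size

    if not prefix_projection:
        return num_virtual_tokens * kv_size

    # Assumption: the MLP hidden size defaults to the model hidden size.
    if encoder_hidden_size is None:
        encoder_hidden_size = hidden_size

    embedding = num_virtual_tokens * hidden_size
    first_linear = hidden_size * encoder_hidden_size + encoder_hidden_size  # weights + bias
    second_linear = encoder_hidden_size * kv_size + kv_size                 # weights + bias
    return embedding + first_linear + second_linear


if __name__ == "__main__":
    # distilgpt2: 6 transformer layers, hidden size 768.
    # 5 virtual tokens is an illustrative value, not the setting used above.
    for projection in (False, True):
        count = prefix_param_count(5, 6, 768, prefix_projection=projection)
        print(f"prefix_projection={projection}: {count} trainable parameters")
```

Under these assumptions the projected variant trains far more parameters than the plain prefix, because the second MLP layer is large; this may be one reason the two settings behave differently.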

## Analysis

After implementing prefix tuning, we realized that tuning alone is not enough to produce relevant results, as we can see when running the small test below:

In [16]:
from prefixtune import TableToText
table_to_text = TableToText("peft", basemodel=basemodel)
decoder_output = table_to_text.decode(model, '../data/input/small.txt')
print("\n".join(decoder_output))

10it [01:02,  6.30s/it]

0||  The Alimentum is located in the riverside area near the city centre near the city centre. It is not family-friendly and serves English food
1||  Near the riverside near the riverside, Alimentum is a family-friendly place with a price range of £20-£25-£
2||  Alimentum is located in the riverside area with a price range of £20-£20-£25-£25-£30-
3||  The Alimentum is located near the riverside near the riverside area near the riverside. It is family friendly and is located near the rivers
4||  The Alimentum is located in the riverside near the riverside. It is located near the riverside and has a price range of £20
5||  Alimentum is a family friendly place near the riverside near the riverside called Alimentum. It is located near the riverside with a
6||  The Alimentum is located in the riverside.  Located near the riverside.  There is a family friendly atmosphere with a price range of
7||  Alimentum is located near the riverside near the riverside.  Located near the riverside, Alimen




In [14]:
output = "\n".join(decoder_output)

references = {}
ref_data = []
with open( '../data/reference/small.out', 'r') as ref:
    ref_data = list(filter(lambda k: k, [str(x) for x in ref.read().splitlines()]))
    for line in ref_data:
        src_id, _, suggested_reference = line.split('||')
        references.setdefault(src_id, [])
        references[src_id].append(suggested_reference)

output_data = list(filter(lambda k: k, [str(x) for x in output.splitlines()]))
output_data = [line.split('||') for line in output_data]
output_data = output_data[:len(ref_data)]

print(f"bleu score: {compute_bleu(references, output_data)}")

bleu score: 27.746648542824495


Implementing prefix tuning improved our BLEU score to 13.97 and reduced the number of irrelevant answers. Setting prefix_projection to True raised the BLEU score slightly to 16.79, though the repeated outputs remained. We tried to fix the repetition by setting no_repeat_ngram_size=2 in the generate method, but this lowered the BLEU score. Reducing max_new_tokens significantly improved the score. Further gains came from increasing the beam width to 10 and adjusting the temperature. Experimenting with different virtual_tokens settings had little impact. Finally, we needed to balance the number of new tokens against n-gram repetition blocking: a setup of 30 new tokens with no_repeat_ngram_size=6 gave us the best results and led to our final score of 27.74. Below are our final parameters after optimizing.

outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens= 30,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer_pad_token_id,
            do_sample=True,
            num_beams=10,
            top_p=0.9,
            temperature= 1.5,
            no_repeat_ngram_size= 6,
            num_return_sequences=num_sequences
        )
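To illustrate what no_repeat_ngram_size constrains, here is a minimal pure-Python sketch that lists the n-grams occurring more than once in a whitespace-tokenized string. It is not the Hugging Face implementation, which works on token ids during decoding and blocks the repeat before it is generated; the function name repeated_ngrams and the sample sentence are illustrative assumptions.

```python
def repeated_ngrams(text, n):
    """Return the n-grams (as tuples of words) that occur more than once.

    Hypothetical helper for inspection only: it splits on whitespace,
    whereas generate() applies the constraint to model token ids.
    """
    tokens = text.split()
    seen = set()
    repeated = []
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        # Record each repeated n-gram once, in order of first repetition.
        if gram in seen and gram not in repeated:
            repeated.append(gram)
        seen.add(gram)
    return repeated


if __name__ == "__main__":
    sample = "Alimentum is located near the riverside near the riverside"
    for n in (2, 3, 6):
        print(n, repeated_ngrams(sample, n))
```

On this sample the 2-gram and 3-gram checks flag "near the riverside", while the 6-gram check finds nothing. That suggests why a setting of 6 still lets short phrases repeat, as seen in the outputs above, while a setting of 2 is much more restrictive.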

Note: We also ran the code before optimizing, but unfortunately we did not keep that model output in the notebook, and because training took 6 hours we could not regenerate it to include here.