### Fine-tuning Llama 2 with QLoRA
https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/

In [7]:
# !pip install -r requirements_ovhcloud.txt
!pip install --upgrade bitsandbytes


In [1]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye


In [2]:
# !export CUDA_VISIBLE_DEVICES=0

In [3]:
# !sudo chmod 777 /root

In [1]:
import argparse
import os
from functools import partial

import bitsandbytes as bnb
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from IPython.display import Markdown, display
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    set_seed,
)

### Download the dataset

In [4]:
pd.set_option('display.max_colwidth', 150)

In [2]:
# PATH = "gs://XXX/Text_Simplification/simplified_df_cefr_labeled.csv"  # overridden by the next assignment
PATH = "gs://XXX/raw_data.csv"

data = pd.read_csv(PATH)
data.head()

Unnamed: 0,source,target,source_level_og,target_level_og,data_source,data_type,source_level_cefr,target_level_cefr,id
0,British people are big tea drinkers. It is a t...,British people love tea. They drink it for dif...,3.0,2.0,BreakingNewsEnglish,text_simplification,,,TS000000001
1,Many people around the world stay at home and ...,Many people stay at home. They do not want to ...,3.0,2.0,BreakingNewsEnglish,text_simplification,,,TS000000002
2,Most of us don't really take much notice of ca...,We rarely notice car license plates. Maybe we ...,3.0,2.0,BreakingNewsEnglish,text_simplification,,,TS000000003
3,Italy's ruling party may introduce a new law t...,Italy wants to stop people using English words...,3.0,2.0,BreakingNewsEnglish,text_simplification,,,TS000000004
4,"Some people are very forgetful, while others c...","Some people are forgetful, while others rememb...",3.0,2.0,BreakingNewsEnglish,text_simplification,,,TS000000005


In [6]:
len(data)

791159

In [7]:
data.drop(["source_level_cefr", "target_level_cefr"], axis=1, inplace=True)

In [8]:
print("Length before dropping nan", len(data))
data.dropna(subset=['target_level_og', 'source_level_og'], inplace=True)
print("Length after dropping nan", len(data))

Length before dropping nan 791159
Length after dropping nan 12910


In [9]:
data["cefr_mapping"] = data["source_level_og"].astype(str) + "->" + data["target_level_og"].astype(str)
data["cefr_mapping"].value_counts()

cefr_mapping
2.0->1.0    4304
3.0->2.0    4303
3.0->1.0    4303
Name: count, dtype: int64

In [10]:
data

Unnamed: 0,source,target,source_level_og,target_level_og,data_source,data_type,id,cefr_mapping
0,"British people are big tea drinkers. It is a tradition in Britain to drink tea for different occasions and reasons. People have it for breakfast, ...","British people love tea. They drink it for different reasons – for breakfast, to give to guests, for tea breaks at work, and even when talking abo...",3.0,2.0,BreakingNewsEnglish,text_simplification,TS000000001,3.0->2.0
1,Many people around the world stay at home and do not want to go out. They cut themselves off from the world outside. They are recluses … people wh...,Many people stay at home. They do not want to go out and shut themselves off from the world. They do not want to meet other people. They are reclu...,3.0,2.0,BreakingNewsEnglish,text_simplification,TS000000002,3.0->2.0
2,Most of us don't really take much notice of car license plates. Maybe we should look at them more from now because they may be valuable. One licen...,We rarely notice car license plates. Maybe we should because some are valuable. A license plate just sold for $15 million in Dubai. It had the let...,3.0,2.0,BreakingNewsEnglish,text_simplification,TS000000003,3.0->2.0
3,Italy's ruling party may introduce a new law to stop people using English words in Italian. People could get fined for using English and other non...,Italy wants to stop people using English words in Italian. People could get a fine for using non-Italian words. A government spokesperson is worri...,3.0,2.0,BreakingNewsEnglish,text_simplification,TS000000004,3.0->2.0
4,"Some people are very forgetful, while others can remember everything they have done. Scientists know a lot about how our brains store and remember...","Some people are forgetful, while others remember everything. Scientists know how the brain remembers things. There is little research on how it fo...",3.0,2.0,BreakingNewsEnglish,text_simplification,TS000000005,3.0->2.0
...,...,...,...,...,...,...,...,...
17364,"People in Paris enjoyed zip-lining off the Eiffel Tower. There was another zip-line before, but it was only from the first floor.\r\nThe new zip-l...","This week, there is a zip-line off the Eiffel Tower. People jump from the second floor of the tower. They go almost 100 kilometres per hour. They ...",2.0,1.0,NewsInLevels,text_simplification,TS000017373,2.0->1.0
17365,"Over 200 dancers took part in an event in England. They put on old clothing, fake blood, and painted their faces white to look like zombies.\r\nTh...","This news is from England. An event is held there. Around 200 people come. They wear old clothing, and they put fake blood on their faces. They lo...",2.0,1.0,NewsInLevels,text_simplification,TS000017374,2.0->1.0
17366,"A zoo in Indonesia was starving its animals, and it closed in 2016 after an elephant died. Now, the zoo is under heavy criticism again.\r\nThere i...",This news is about a zoo. It is in Indonesia. The zoo has problems. People there do not give enough food to the animals. An elephant dies. The zoo...,2.0,1.0,NewsInLevels,text_simplification,TS000017375,2.0->1.0
17367,"At a zoo in Barcelona, Spain, a mother zebra gave birth and its baby zebra fell into a pond in their exhibit. Two zookeepers moved into the pond t...","This happens at a zoo in Barcelona, Spain. A mother zebra gives birth. The baby zebra falls into a pond in the exhibit. Two zookeepers go into the...",2.0,1.0,NewsInLevels,text_simplification,TS000017376,2.0->1.0


In [11]:
def map_cefr_to_text(language_level):
    if int(language_level) == 3:
        return "advanced"
    elif int(language_level) == 2:
        return "intermediate"
    elif int(language_level) == 1:
        return "beginner"
    else:
        raise Exception("Unrecognized language level")
        
def generate_instruction_column(source_cefr, target_cefr):
    source_level_text = map_cefr_to_text(source_cefr)
    target_level_text = map_cefr_to_text(target_cefr)
    return f"Simplify the following text from {source_level_text} language level to {target_level_text} language level"

data["instruction"] = data.apply(lambda row: generate_instruction_column(row["source_level_og"], row["target_level_og"]), axis=1)
data["source_cefr_level"] = [map_cefr_to_text(level) for level in data["source_level_og"]]
data["target_cefr_level"] = [map_cefr_to_text(level) for level in data["target_level_og"]]

In [13]:
data.iloc[0]["instruction"]

'Simplify the following text from advanced language level to intermediate language level'

In [14]:
data.rename({'source': 'context', 'target': 'response'}, axis=1, inplace=True)
data = data[["instruction", "context", "response", "source_cefr_level", "target_cefr_level"]]

In [15]:
# Convert the DataFrame to a dictionary
data_dict = data.to_dict('list')

# Convert the dictionary to a Hugging Face Dataset
dataset = Dataset.from_dict(data_dict)

In [16]:
dataset

Dataset({
    features: ['instruction', 'context', 'response', 'source_cefr_level', 'target_cefr_level'],
    num_rows: 12910
})

In [17]:
print(f'Number of prompts: {len(dataset)}')
print(f'Column names are: {dataset.column_names}')

Number of prompts: 12910
Column names are: ['instruction', 'context', 'response', 'source_cefr_level', 'target_cefr_level']


In [18]:
data.head()

Unnamed: 0,instruction,context,response,source_cefr_level,target_cefr_level
0,Simplify the following text from advanced language level to intermediate language level,"British people are big tea drinkers. It is a tradition in Britain to drink tea for different occasions and reasons. People have it for breakfast, ...","British people love tea. They drink it for different reasons – for breakfast, to give to guests, for tea breaks at work, and even when talking abo...",advanced,intermediate
1,Simplify the following text from advanced language level to intermediate language level,Many people around the world stay at home and do not want to go out. They cut themselves off from the world outside. They are recluses … people wh...,Many people stay at home. They do not want to go out and shut themselves off from the world. They do not want to meet other people. They are reclu...,advanced,intermediate
2,Simplify the following text from advanced language level to intermediate language level,Most of us don't really take much notice of car license plates. Maybe we should look at them more from now because they may be valuable. One licen...,We rarely notice car license plates. Maybe we should because some are valuable. A license plate just sold for $15 million in Dubai. It had the let...,advanced,intermediate
3,Simplify the following text from advanced language level to intermediate language level,Italy's ruling party may introduce a new law to stop people using English words in Italian. People could get fined for using English and other non...,Italy wants to stop people using English words in Italian. People could get a fine for using non-Italian words. A government spokesperson is worri...,advanced,intermediate
4,Simplify the following text from advanced language level to intermediate language level,"Some people are very forgetful, while others can remember everything they have done. Scientists know a lot about how our brains store and remember...","Some people are forgetful, while others remember everything. Scientists know how the brain remembers things. There is little research on how it fo...",advanced,intermediate


### Preprocess the dataset

In [2]:
def create_prompt_formats(sample):
    """
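    Build the Llama 2 chat-style training text for one sample:
    system prompt, then instruction with context, then the expected response
    :param sample: Sample dictionary with 'instruction', 'context', 'response',
        'source_cefr_level' and 'target_cefr_level' keys
    :return: The same sample with an added 'text' field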
    """
    SYSTEM_PROMPT = f"""<s>[INST] <<SYS>> \n
    You are a helpful, respectful and honest assistant. Always simplify the text to the asked target language level from the source language level, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
    <</SYS>>\n"""
    INSTRUCTION = f"{sample['instruction']}: {sample['context']} [/INST] \n\n"
    RESPONSE = f"Sure, I can certainly do that. The {sample['source_cefr_level']} language level text simplified to {sample['target_cefr_level']} language level is as follows: {sample['response']}\n</s>"     
    formatted_prompt = SYSTEM_PROMPT + INSTRUCTION + RESPONSE
    sample["text"] = formatted_prompt

    return sample
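
To make the layout of the training text easier to inspect, here is a minimal, dependency-free sketch of the same assembly. `SYSTEM_TEXT`, `build_training_text` and `toy_sample` are illustrative names, and the system message is shortened here; the notebook uses the longer one shown above.

```python
# Shortened stand-in for the notebook's system message (assumption).
SYSTEM_TEXT = "You are a helpful assistant that simplifies text."


def build_training_text(sample: dict) -> str:
    """Assemble system block, instruction block and response in Llama 2 chat style."""
    system = f"<s>[INST] <<SYS>>\n{SYSTEM_TEXT}\n<</SYS>>\n"
    instruction = f"{sample['instruction']}: {sample['context']} [/INST] \n\n"
    response = (
        f"Sure, I can certainly do that. The {sample['source_cefr_level']} language level text "
        f"simplified to {sample['target_cefr_level']} language level is as follows: "
        f"{sample['response']}\n</s>"
    )
    return system + instruction + response


# Hypothetical sample with the same keys as the dataset rows.
toy_sample = {
    "instruction": "Simplify the following text from advanced language level to intermediate language level",
    "context": "British people are big tea drinkers.",
    "response": "British people love tea.",
    "source_cefr_level": "advanced",
    "target_cefr_level": "intermediate",
}

text = build_training_text(toy_sample)
print(text)
```

Printing one assembled sample before tokenizing is a cheap way to confirm that the `[INST]` and `<<SYS>>` markers are balanced and that the response ends with `</s>`.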

In [3]:
# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    # Different architectures store the context length under different attribute names
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )


# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
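def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed: int, dataset: Dataset):
    """Format & tokenize the dataset so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed (int): Seed used to shuffle the dataset
    :param dataset (Dataset): Dataset with instruction, context, response and CEFR level columns
    """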
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
    
    # Tokenize each batch and drop the raw text columns so only the tokenizer outputs remain
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, 
                                      tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=["instruction", "context", "response", "text", "source_cefr_level", "target_cefr_level"],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset
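
One consequence of combining `truncation=True` with the `len(...) < max_length` filter is that any sample reaching the limit is dropped rather than kept in truncated form. The sketch below illustrates this with a whitespace split standing in for the real tokenizer; `toy_tokenize` and `toy_preprocess` are hypothetical names used only for this illustration.

```python
def toy_tokenize(text: str, max_length: int) -> list:
    """Whitespace 'tokenizer' with truncation, standing in for the real one (assumption)."""
    return text.split()[:max_length]


def toy_preprocess(texts: list, max_length: int) -> list:
    tokenized = [toy_tokenize(t, max_length) for t in texts]
    # Same rule as preprocess_dataset: keep only samples strictly shorter than max_length
    return [ids for ids in tokenized if len(ids) < max_length]


texts = [
    "a b c",            # 3 tokens: kept
    "a b c d e",        # exactly 5 tokens: dropped
    "a b c d e f g h",  # 8 tokens, truncated to 5: dropped
]
kept = toy_preprocess(texts, max_length=5)
print(kept)  # [['a', 'b', 'c']]
```

With a 4096-token limit and short news paragraphs, few samples are likely to be affected, but the row count after `Filter` is worth checking.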

### Create a bitsandbytes configuration

In [4]:
def create_bnb_config():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    return bnb_config

# SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py

def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit #if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if 'lm_head' in lora_module_names:  # needed for 16-bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)

def create_peft_config(modules):
    """
    Create Parameter-Efficient Fine-Tuning config for your model
    :param modules: Names of the modules to apply Lora to
    """
    config = LoraConfig(
        r=8,  # dimension of the updated matrices
        lora_alpha=16,  # parameter for scaling
        target_modules=modules,
        lora_dropout=0.1,  # dropout probability for layers
        bias="none",
        task_type="CAUSAL_LM",
    )
    print("Lora target_modules", str(modules))

    return config

def print_trainable_parameters(model, use_4bit=False):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        num_params = param.numel()
        # if using DS Zero 3 and the weights are initialized empty
        if num_params == 0 and hasattr(param, "ds_numel"):
            num_params = param.ds_numel

        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params
    if use_4bit:
        # Integer division keeps the count an int so the `:,d` format below works
        trainable_params //= 2
    print(
        f"all params: {all_param:,d} || trainable params: {trainable_params:,d} || trainable%: {100 * trainable_params / all_param}"
    )
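
As a sanity check on the LoRA configuration, the number of trainable parameters can be worked out by hand. The sketch below assumes the dimensions printed in the model config further down (`hidden_size` 5120, `intermediate_size` 13824, 40 layers, 40 key/value heads) and that every linear projection in each decoder layer receives an adapter of rank 8, as in the `target_modules` list logged during training. `lora_params` is an illustrative helper, not part of PEFT.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A LoRA adapter adds two matrices: A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


hidden_size = 5120
intermediate_size = 13824
num_layers = 40
r = 8

per_layer = (
    4 * lora_params(hidden_size, hidden_size, r)            # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(hidden_size, intermediate_size, r)    # gate_proj, up_proj
    + lora_params(intermediate_size, hidden_size, r)        # down_proj
)
total = per_layer * num_layers
print(f"{total:,d}")  # 31,293,440
```

The result agrees with the `trainable params: 31,293,440` line in the training log.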

In [5]:
def load_model(model_name, bnb_config):
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,  # 4-bit NF4 settings from create_bnb_config()
        device_map="cuda:0",  # place the whole model on the first GPU
        # load_in_8bit=True removed: it conflicts with the 4-bit quantization_config above
        trust_remote_code=True,
        #max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB',
    )
    model.config.use_cache = False
    model.config.pretraining_tp = 1
    return model

def load_tokenizer(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)

    # Needed for LLaMA tokenizer
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    return tokenizer
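
To see why 4-bit loading matters on a single GPU, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It assumes a nominal 13 billion parameters and ignores quantization constants, activations, optimizer state and the LoRA adapters, so real usage will be higher. `weight_gib` is an illustrative helper.

```python
def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 13e9  # assumption: nominal size of a "13B" model

for label, bits in [("fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the fp16 footprint.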

In [21]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [6]:
model_name = "meta-llama/Llama-2-13b-chat-hf" 
seed = 42

In [24]:
# Load model from HF with user's token and with bitsandbytes config

bnb_config = create_bnb_config()
model = load_model(model_name, bnb_config)
tokenizer = load_tokenizer(model_name)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]



In [27]:
## Preprocess dataset

max_length = get_max_length(model)

dataset = preprocess_dataset(tokenizer, max_length, seed, dataset)

Found max lenth: 4096
Preprocessing dataset...


Map:   0%|          | 0/12910 [00:00<?, ? examples/s]

Map:   0%|          | 0/12910 [00:00<?, ? examples/s]

Filter:   0%|          | 0/12910 [00:00<?, ? examples/s]

### Train

In [29]:
def train(model, tokenizer, dataset, output_dir):
    model.gradient_checkpointing_enable()

    # 2 - Using the prepare_model_for_kbit_training method from PEFT
    model = prepare_model_for_kbit_training(model)

    # Get lora module names
    modules = find_all_linear_names(model)

    # Create PEFT config for these modules and wrap the model to PEFT
    peft_config = create_peft_config(modules)
    model = get_peft_model(model, peft_config)
    
    # Print information about the percentage of trainable parameters
    print_trainable_parameters(model)
    
    # Training parameters
    trainer = Trainer(
        model=model,
        train_dataset=dataset,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,  # effective batch size of 8 per device
            num_train_epochs=100,  # has no effect here: max_steps takes precedence when set
            max_steps=100,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=5,
            output_dir="outputs",  # Trainer logs and checkpoints; the final adapter is saved separately below
            optim="paged_adamw_8bit",
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
    )
    
    model.config.use_cache = False  # keep the KV cache off while training; re-enable it for inference
    
    ### SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py
    # Verifying the datatypes before training
    
    dtypes = {}
    for _, p in model.named_parameters():
        dtype = p.dtype
        dtypes[dtype] = dtypes.get(dtype, 0) + p.numel()
    total = sum(dtypes.values())
    for k, v in dtypes.items():
        print(k, v, v / total)
     
    do_train = True
    
    # Launch training
    print("Training...")
    
    if do_train:
        train_result = trainer.train()
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        trainer.save_state()
        print(metrics)    
    
    ###
    
    # Saving model
    print("Saving last checkpoint of the model...")
    os.makedirs(output_dir, exist_ok=True)
    trainer.model.save_pretrained(output_dir)
    
    # Free memory for merging weights
    del model
    del trainer
    torch.cuda.empty_cache()

In [30]:
output_dir = "results/kisai_llama2_13b_chat/final_checkpoint"
train(model, tokenizer, dataset, output_dir)

Lora target_modules ['o_proj', 'k_proj', 'down_proj', 'up_proj', 'q_proj', 'gate_proj', 'v_proj']




all params: 6,703,272,960 || trainable params: 31,293,440 || trainable%: 0.46683821749069876
torch.float32 359388160 0.05361383344293949
torch.uint8 6343884800 0.9463861665570605
Training...


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
5,2.0335
10,1.4832
15,1.1784
20,1.1274
25,1.1155
30,1.1385
35,1.0375
40,1.0484
45,1.0498
50,1.0424


***** train metrics *****
  epoch                    =       0.06
  total_flos               = 28123773GF
  train_loss               =     1.1201
  train_runtime            = 1:00:34.43
  train_samples_per_second =       0.22
  train_steps_per_second   =      0.028
{'train_runtime': 3634.4315, 'train_samples_per_second': 0.22, 'train_steps_per_second': 0.028, 'total_flos': 3.019767229710336e+16, 'train_loss': 1.1200517749786376, 'epoch': 0.06}
Saving last checkpoint of the model...


### Model save

In [45]:
output_dir

'results/kisai_llama2/final_checkpoint'

In [75]:
%pwd

'/home/jupyter/keep-it-simple-ai/text_simplification/llama-2-qlora'

In [None]:
# Checkpoint path: keep-it-simple-ai/text_simplification/llama-2-qlora/results/kisai_llama2_13b_chat/final_checkpoint

XXX

## Inference

In [7]:
output_dir = "results/kisai_llama2_13b_chat/final_checkpoint"

In [8]:
model = AutoPeftModelForCausalLM.from_pretrained(
    output_dir,
    low_cpu_mem_usage=True,
    device_map = "auto",
)

# Merge LoRA and base model
# merged_model = model.merge_and_unload()

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]



In [9]:
tokenizer = load_tokenizer(model_name)



In [11]:
model.push_to_hub("egehanyorulmaz/kisai-llama-2-13b-chat", token="XXX")

adapter_model.bin:   0%|          | 0.00/125M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/egehanyorulmaz/kisai-llama-2-13b-chat/commit/104d76ea8b75dd3865e35a34d8c6713a559a2be2', commit_message='Upload model', commit_description='', oid='104d76ea8b75dd3865e35a34d8c6713a559a2be2', pr_url=None, pr_revision=None, pr_num=None)

In [12]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x7f0b5ae9b490>

In [13]:
model = torch.compile(model)

In [14]:
model.config.use_cache = True
print(model.config)

LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-13b-chat-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.35.0.dev0",
  "use_cache": false,
  "vocab_size": 32000
}



In [97]:
def make_improvement(instruction, context=None):
    if context: 
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction: \n{instruction}\n\n### Input: \n{context}\n\n### Response: \n"
    else:
        prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction: \n{instruction}\n\n### Response: \n"
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids = inputs.input_ids,
            attention_mask = inputs.attention_mask,
            do_sample = True,
            max_new_tokens=250,
            use_cache = True,
        )
    decoded_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return decoded_response

In [99]:
%%timeit -r 3

answer = make_improvement_experimental(instruction="Summarize the following context. Your answer must be in English. End the response with the phrase 'End of response.'",  
                        context="Today Sunday 3 September (20.00) Serbia-Turkey-Euro 2023 final of women’s volleyball. TO Brussels (Belgium) the continental title is up for grabs and it will be decided who will succeed Italy in the golden register. It promises to be a particularly balanced and compelling, exciting and intense challenge, open to any result. It will be a cross between Italian coaches, given that coach Daniele Santarelli sits on the Anatolian bench, capable of beating Italy in the tie-break, and Giovanni Guidetti is on the Balkan bench.")

1min 37s ± 180 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


### Compiling the model

In [100]:
def make_improvement_experimental(instruction, context=None):
    if context: 
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction: \n{instruction}\n\n### Input: \n{context}\n\n### Response: \n"
    else:
        prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction: \n{instruction}\n\n### Response: \n"
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    with torch.inference_mode():
        outputs = merged_model.generate(
            input_ids = inputs.input_ids,
            attention_mask = inputs.attention_mask,
            do_sample = True,
            max_new_tokens=250,
            use_cache = True,
        )
    decoded_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return decoded_response

### Trying no_grad and .eval

In [104]:
def generate_prompt(instruction, context=None):
    if context: 
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction: \n{instruction}\n\n### Input: \n{context}\n\n### Response: \n"
    else:
        prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction: \n{instruction}\n\n### Response: \n"
    return prompt

In [102]:
def make_improvement_experimental(instruction, context=None):
    prompt = generate_prompt(instruction, context)
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    merged_model.eval()
    
    with torch.no_grad():
        outputs = merged_model.generate(
            input_ids = inputs.input_ids,
            attention_mask = inputs.attention_mask,
            do_sample = True,
            max_new_tokens=250,
            use_cache = True,
        )
    decoded_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return decoded_response

In [103]:
%%timeit -r 3

answer = make_improvement_experimental(instruction="Summarize the following context. Your answer must be in English. End the response with the phrase 'End of response.'",  
                        context="Today Sunday 3 September (20.00) Serbia-Turkey-Euro 2023 final of women’s volleyball. TO Brussels (Belgium) the continental title is up for grabs and it will be decided who will succeed Italy in the golden register. It promises to be a particularly balanced and compelling, exciting and intense challenge, open to any result. It will be a cross between Italian coaches, given that coach Daniele Santarelli sits on the Anatolian bench, capable of beating Italy in the tie-break, and Giovanni Guidetti is on the Balkan bench.")

1min 37s ± 446 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


### 2 GPUs

In [33]:
def generate_prompt(instruction, context=None):
    if context: 
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction: \n{instruction}\n\n### Input: \n{context}\n\n### Response: \n"
    else:
        prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction: \n{instruction}\n\n### Response: \n"
    return prompt

In [44]:
instruction="Summarize the following context. Your answer must be in English. End the response with the phrase 'End of response.'"  
context="Today Sunday 3 September (20.00) Serbia-Turkey-Euro 2023 final of women’s volleyball. TO Brussels (Belgium) the continental title is up for grabs and it will be decided who will succeed Italy in the golden register. It promises to be a particularly balanced and compelling, exciting and intense challenge, open to any result. It will be a cross between Italian coaches, given that coach Daniele Santarelli sits on the Anatolian bench, capable of beating Italy in the tie-break, and Giovanni Guidetti is on the Balkan bench."

In [34]:
def make_improvement_experimental(instruction, context=None):
    prompt = generate_prompt(instruction, context)
    
    inputs = tokenizer(text=prompt, return_tensors="pt", return_token_type_ids=False).to("cuda")
    model.eval()
    
    with torch.no_grad():
        outputs = model.generate(
            input_ids = inputs.input_ids,
            attention_mask = inputs.attention_mask,
            do_sample = True,
            max_new_tokens=250,
            use_cache = True,
        )
    decoded_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return decoded_response

In [49]:
%%timeit -r 3

answer = make_improvement_experimental(instruction=instruction, context=context)

29.3 s ± 6.3 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


### Inference speed comparison

41.7 seconds with the merged model \
29.3 seconds when the quantized model is saved and reloaded as a Llama 2 model
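
A small sketch to put the two timings in relative terms. The tokens-per-second figure is only an upper bound, assuming each call generated the full `max_new_tokens=250`; shorter generations would mean lower throughput. `speedup` and `timings_s` are illustrative names.

```python
MAX_NEW_TOKENS = 250  # generation limit used in the timed calls

timings_s = {
    "merged model": 41.7,
    "saved and reloaded quantized model": 29.3,
}


def speedup(baseline_s: float, other_s: float) -> float:
    """How many times faster `other_s` is than `baseline_s`."""
    return baseline_s / other_s


baseline = timings_s["merged model"]
for name, seconds in timings_s.items():
    print(
        f"{name}: {seconds:.1f} s, "
        f"{speedup(baseline, seconds):.2f}x vs merged, "
        f"at most {MAX_NEW_TOKENS / seconds:.1f} tokens/s"
    )
```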

### Running inference on the Text Simplification dataset

In [15]:
PATH = "gs://XXX/Text_Simplification/finetuned_llama2_responses_chat_responses.csv"
data = pd.read_csv(PATH)
data["model_13b_chat_response"] = ""

In [16]:
data.head()

Unnamed: 0,source_level,target_level,instruction,context,response,oob_model_response,model_13b_chat_response
0,2,1,Simplify the following text from intermediate ...,Donald Trump is interested in buying Greenland...,Donald Trump is interested in buying Greenland...,\nDonald Trump is interested in buying Greenl...,
1,2,1,Simplify the following text from intermediate ...,Everyone knows that children don't like eating...,Everyone knows children don't like eating gree...,\nEveryone knows that children don't like eat...,
2,2,1,Simplify the following text from intermediate ...,Archaeologists have found a big cemetery under...,Archaeologists found a big cemetery under the ...,\nThe text has been simplified from CEFR Leve...,
3,2,1,Simplify the following text from intermediate ...,Baby orangutans in Indonesia are going to scho...,Baby orangutans in Indonesia are going to scho...,"\nAfter seven or eight years, the young apes ...",
4,2,1,Simplify the following text from intermediate ...,New research says bald men should not worry ab...,Research says bald men should not worry or get...,"\nThe bald men were rated as more attractive,...",


In [17]:
data.iloc[0]["context"]

'Donald Trump is interested in buying Greenland. He said it would be like buying property and was, "a large real estate deal". He added: "They\'ve got a lot of valuable minerals." Mr Trump said: "Denmark owns it....We protect Denmark....So the concept came up and I said, \'Certainly I\'d be [interested].\' He said buying Greenland was not his top priority at the moment. He said: "It\'s not number one on the burner."'

In [18]:
data.iloc[0]["response"]

'Donald Trump is interested in buying Greenland. He said it would be, "a large real estate deal". He added: "They\'ve got a lot of valuable minerals." Mr Trump said: "Certainly I\'d be [interested].\' He said buying Greenland was not his top priority. He said: "It\'s not number one on the burner."'

In [19]:
import re

def make_inference(instruction, context=None):
    prompt = generate_prompt(instruction, context)
    
    inputs = tokenizer(text=prompt, return_tensors="pt", return_token_type_ids=False).to("cuda")
    model.eval()
    
    with torch.no_grad():
        outputs = model.generate(
            input_ids = inputs.input_ids,
            attention_mask = inputs.attention_mask,
            do_sample = True,
            max_new_tokens=250,
            use_cache = False,
        )
    decoded_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return decoded_response

def generate_prompt(instruction, context):
    """
    Build a Llama 2 chat prompt from the system prompt, the instruction and the context.
    :param instruction: Task description, e.g. the simplification instruction
    :param context: Text to be simplified
    :return: Formatted prompt string ending with the [/INST] marker and two newlines
    """
    SYSTEM_PROMPT = """<s>[INST] <<SYS>> \n
    You are a helpful, respectful and honest assistant. Always simplify the text to the asked target language level from the source language level, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.<</SYS>>\n"""
    INSTRUCTION = f"{instruction}:\n {context} [/INST] \n\n"
    formatted_prompt = SYSTEM_PROMPT + INSTRUCTION
    return formatted_prompt


def map_cefr_to_text(language_level):
    """Map a numeric level (1, 2 or 3, as int or digit string) to its text label."""
    level = int(language_level)
    if level == 3:
        return "advanced"
    elif level == 2:
        return "intermediate"
    elif level == 1:
        return "beginner"
    else:
        raise ValueError(f"Unrecognized language level: {language_level!r}")
        
        
def generate_instruction_column(source_cefr, target_cefr):
    source_level_text = map_cefr_to_text(source_cefr)
    target_level_text = map_cefr_to_text(target_cefr)
    return f"Simplify the following text from {source_level_text} language level to {target_level_text} language level"


def extract_levels(text):
    """Return (source_level, target_level) as digit strings, or (None, None) if not found."""
    # Regular expression to find 'CEFR Level X' where X is a single digit
    pattern = re.compile(r'CEFR Level (\d)')
    levels = pattern.findall(text)

    # Exactly two matches are expected: source level first, target level second
    if len(levels) == 2:
        source_level, target_level = levels
        return source_level, target_level
    else:
        print('Could not extract both levels from the text.')
        return None, None
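
A quick check on a single string can catch format mismatches before the helpers above are applied to the whole DataFrame. The sketch below is self-contained: it re-implements the parsing and mapping in compact form under the hypothetical names `parse_levels` and `build_instruction`, and the sample instruction is a made-up example of the `CEFR Level X` wording the regular expression expects, not a row from the dataset.

```python
import re

# Hypothetical instruction text; only the "CEFR Level <digit>" wording matters.
sample_instruction = "Simplify the text from CEFR Level 3 to CEFR Level 2."

LEVEL_NAMES = {1: "beginner", 2: "intermediate", 3: "advanced"}


def parse_levels(text):
    """Return (source, target) as ints, or None if two levels are not found."""
    levels = re.findall(r"CEFR Level (\d)", text)
    if len(levels) != 2:
        return None
    return int(levels[0]), int(levels[1])


def build_instruction(source, target):
    """Rebuild the instruction text from two numeric levels."""
    return (
        f"Simplify the following text from {LEVEL_NAMES[source]} language level "
        f"to {LEVEL_NAMES[target]} language level"
    )


parsed = parse_levels(sample_instruction)
rebuilt = build_instruction(*parsed)
print(parsed)
print(rebuilt)
```

Returning `None` for unparseable text makes it easy to filter such rows out before building instructions, instead of failing later inside the level mapping.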

In [59]:
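# Run this cell only once: afterwards the instruction column no longer contains 'CEFR Level X',
# so extract_levels would return (None, None) on a second run.
data[['source_level', 'target_level']] = data['instruction'].apply(lambda x: pd.Series(extract_levels(x)))
data["instruction"] = data.apply(lambda row: generate_instruction_column(row["source_level"], row["target_level"]), axis=1)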

In [60]:
data

Unnamed: 0,instruction,context,response,model_response,oob_model_response,model_v2_response,model_13b_chat_response,source_level,target_level
0,Simplify the following text from intermediate ...,Donald Trump is interested in buying Greenland...,Donald Trump is interested in buying Greenland...,\nDonald Trump is interested in buying Greenl...,\nDonald Trump is interested in buying Greenl...,<s> Below is an instruction that describes a t...,,2,1
1,Simplify the following text from intermediate ...,Everyone knows that children don't like eating...,Everyone knows children don't like eating gree...,\nChildren don't like eating greens. Parents ...,\nEveryone knows that children don't like eat...,<s> Below is an instruction that describes a t...,,2,1
2,Simplify the following text from intermediate ...,Archaeologists have found a big cemetery under...,Archaeologists found a big cemetery under the ...,\nArchaeologists have found a big cemetery un...,\nThe text has been simplified from CEFR Leve...,<s> Below is an instruction that describes a t...,,2,1
3,Simplify the following text from intermediate ...,Baby orangutans in Indonesia are going to scho...,Baby orangutans in Indonesia are going to scho...,\nBaby orangutans in Indonesia are going to s...,"\nAfter seven or eight years, the young apes ...",<s> Below is an instruction that describes a t...,,2,1
4,Simplify the following text from intermediate ...,New research says bald men should not worry ab...,Research says bald men should not worry or get...,\nBald men should not worry about their looks...,"\nThe bald men were rated as more attractive,...",<s> Below is an instruction that describes a t...,,2,1
...,...,...,...,...,...,...,...,...,...
2395,Simplify the following text from advanced lang...,"The president of the Philippines, Rodrigo Dute...",The Philippines president has angered people. ...,,,,,3,2
2396,Simplify the following text from advanced lang...,"If you are afraid of insects, this might be di...","If you dislike insects, reading this might be ...",,,,,3,2
2397,Simplify the following text from advanced lang...,People around the world are trying to get hold...,People are trying to buy clothes worn in the v...,,,,,3,2
2398,Simplify the following text from advanced lang...,People around the world are living longer. Thi...,"We are living longer, so more people have ment...",,,,,3,2


In [61]:
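# .copy() avoids pandas SettingWithCopyWarning when responses are written back into this frame later
data = data[['source_level', 'target_level', 'instruction', 'context', 'response', 'oob_model_response', 'model_13b_chat_response']].copy()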

In [62]:
data.head()

Unnamed: 0,source_level,target_level,instruction,context,response,oob_model_response,model_13b_chat_response
0,2,1,Simplify the following text from intermediate ...,Donald Trump is interested in buying Greenland...,Donald Trump is interested in buying Greenland...,\nDonald Trump is interested in buying Greenl...,
1,2,1,Simplify the following text from intermediate ...,Everyone knows that children don't like eating...,Everyone knows children don't like eating gree...,\nEveryone knows that children don't like eat...,
2,2,1,Simplify the following text from intermediate ...,Archaeologists have found a big cemetery under...,Archaeologists found a big cemetery under the ...,\nThe text has been simplified from CEFR Leve...,
3,2,1,Simplify the following text from intermediate ...,Baby orangutans in Indonesia are going to scho...,Baby orangutans in Indonesia are going to scho...,"\nAfter seven or eight years, the young apes ...",
4,2,1,Simplify the following text from intermediate ...,New research says bald men should not worry ab...,Research says bald men should not worry or get...,"\nThe bald men were rated as more attractive,...",


In [20]:
print(generate_prompt(data.iloc[0]['instruction'], data.iloc[0]['context']))

<s>[INST] <<SYS>> 

    You are a helpful, respectful and honest assistant. Always simplify the text to the asked target language level from the source language level, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.<</SYS>>
Simplify the following text from intermediate language level to beginner language level:
 Donald Trump is interested in buying Greenland. He said it would be like buying property and was, "a large real estate deal". He added: "They've got a lot of valuable minerals." Mr Trump said: "Denmark owns it....We protect Denmark....So the concept came up and I said, 'Certainly I'd be [interested].' He said buying Greenland was not his top priority at the moment. He said: "It's not number one on the burner." [/INST] 
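
Because `make_inference` decodes the full output sequence with `skip_special_tokens=False`, the returned string contains the prompt shown above followed by the generated text. A minimal sketch for keeping only the generated part, assuming the answer is everything after the last `[/INST]` marker and that generation ends with `</s>`; `extract_answer` is a hypothetical helper, not part of the notebook, and the decoded string is a shortened made-up example.

```python
def extract_answer(decoded, marker="[/INST]", eos="</s>"):
    """Return the text after the last prompt marker, without the EOS token."""
    # rpartition splits on the last occurrence of the marker;
    # `found` is an empty string when the marker is absent.
    _, found, answer = decoded.rpartition(marker)
    if not found:
        return decoded.strip()
    return answer.replace(eos, "").strip()


# Shortened, made-up example of a decoded model output.
decoded = (
    "<s>[INST] <<SYS>> system text <</SYS>>\n"
    "Simplify: Some text. [/INST] \n\nSimple text.</s>"
)
print(extract_answer(decoded))
```

Storing only the extracted answer in the response column keeps the saved CSV smaller and makes the model outputs directly comparable with the reference `response` column.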




In [64]:
PATH

'gs://XXX/Text_Simplification/finetuned_llama2_responses_chat_responses.csv'

In [None]:
import time

for idx, row in data.iterrows():
    print("Currently processing index ", idx)
    start = time.perf_counter()
    answer = make_inference(instruction=row["instruction"],  
                            context=row["context"])
    data.at[idx, 'model_13b_chat_response'] = answer
    end = time.perf_counter()

    print(f"Finished processing index {idx} in {end-start} seconds.")
    
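    # Checkpoint every 20 rows so a crash does not lose all generated responses
    if idx % 20 == 0:
        print(f"Saving for batch {idx//20}")
        start = time.perf_counter()
        data.to_csv(PATH, index=False)
        end = time.perf_counter()
        print(f"Finishes saving the batch {idx//20} in {end-start} seconds")

# Final save so rows processed after the last multiple of 20 are not lost
data.to_csv(PATH, index=False)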

Currently processing index  0
Finished processing index 0 in 364.532160576 seconds.
Saving for batch 0
Finishes saving the batch 0 in 0.2831193409997468 seconds
Currently processing index  1
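
At several minutes per row, a long run like the one above is likely to be interrupted at some point. The sketch below shows one way to make such a loop resumable: rows that already have a response are skipped, and a final save covers the last partial batch. It is only an illustration under stated assumptions: `fake_inference` stands in for `make_inference` so the code runs without a model, and `run_with_checkpoints`, the column name and the temporary output path are hypothetical.

```python
import os
import tempfile

import pandas as pd


def fake_inference(instruction, context):
    # Stand-in for make_inference so the sketch runs without a model.
    return f"[simplified] {context}"


def run_with_checkpoints(df, out_path, column="model_response", save_every=20):
    """Fill `column` row by row, skipping rows that already have a value."""
    if column not in df.columns:
        df[column] = None
    processed = 0
    for idx, row in df.iterrows():
        if pd.notna(row[column]):
            continue  # already generated in an earlier run
        df.at[idx, column] = fake_inference(row["instruction"], row["context"])
        processed += 1
        if processed % save_every == 0:
            df.to_csv(out_path, index=False)
    df.to_csv(out_path, index=False)  # final save covers the last partial batch
    return processed


# Tiny made-up frame: the first row already has a response from an earlier run.
df = pd.DataFrame({
    "instruction": ["Simplify"] * 3,
    "context": ["a", "b", "c"],
    "model_response": ["done", None, None],
})
out_path = os.path.join(tempfile.mkdtemp(), "responses.csv")
n_new = run_with_checkpoints(df, out_path, save_every=2)
print(n_new, df["model_response"].tolist())
```

Counting processed rows instead of testing `idx % 20` keeps the checkpoint interval stable even when some rows are skipped or the index is not a plain 0..n range.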


In [25]:
model.__class__

torch._dynamo.eval_frame.OptimizedModule