## Instruction Training

This is continued pretraining, done in a parameter-efficient way using LoRA: mostly small low-rank adapter matrices are trained rather than all of the model's weights.

Based on an example from Unsloth: https://colab.research.google.com/drive/1-BF5HndNqQsfWRTxIt7YPjkfDpVUGNgY
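
To give a rough sense of why LoRA is parameter efficient, the sketch below compares the trainable-parameter count of one full weight matrix with that of a rank-`r` adapter pair. The layer size (1024 x 1024) and the helper name `lora_param_counts` are illustrative assumptions, not values taken from the model used in this notebook.

```python
def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out        # full fine-tuning: every entry of W is trainable
    lora = r * (d_in + d_out)  # LoRA: A is (r x d_in) and B is (d_out x r)
    return full, lora


# Hypothetical layer size, chosen only for illustration.
full, lora = lora_param_counts(d_in=1024, d_out=1024, r=16)
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4f}")
```

The saving grows with the layer size, because the adapter count scales with `d_in + d_out` while the full matrix scales with `d_in * d_out`.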

In [1]:
#%pip install --quiet unsloth xformers trl peft accelerate bitsandbytes

## Load in data

In [2]:
import json
# Despite the .json extension, the file holds one JSON object per line (JSON Lines).
with open("generated_qas.json") as ifp:
    question_answers = [json.loads(line) for line in ifp]
len(question_answers)

9511
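
A slightly more defensive version of the loader could skip blank lines and check that each record has the two fields used later. This is only a sketch: the helper name `read_json_lines` and the in-memory sample are made up for illustration, and the required schema is an assumption based on the record shown in this notebook.

```python
import io
import json


def read_json_lines(fp):
    """Parse one JSON object per non-empty line of a file-like object."""
    records = []
    for line_number, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        # Assumed schema: every record carries both fields used for training.
        missing = {"question", "answer"} - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing {sorted(missing)}")
        records.append(record)
    return records


# Made-up sample standing in for the real file.
sample = io.StringIO(
    '{"question": "Q1?", "answer": "A1."}\n'
    "\n"
    '{"question": "Q2?", "answer": "A2."}\n'
)
records = read_json_lines(sample)
print(len(records))
```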

In [3]:
question_answers[10]

{'question': "Given increasing demand for integrated solutions, how did the 2021 UPS Freight divestiture impact UPS's comprehensive supply chain offerings and align with their 'Customer First' strategy?",
 'answer': "Given increasing demand for integrated solutions, the 2021 UPS Freight divestiture could negatively impact UPS's ability to provide comprehensive supply chain offerings. While the company states the move will allow them to focus on their core business and improve customer experience, it may, in fact, reduce the scope of services they can directly provide, potentially conflicting with a 'Customer First' approach in an integrated solutions market."}

In [4]:
import pandas as pd
question_answers = pd.DataFrame(data=question_answers)
question_answers

| | question | answer |
|---|---|---|
| 0 | How might the divestiture of UPS Freight in 20... | The divestiture of UPS Freight may limit UPS's... |
| 1 | Why did UPS decide to divest its Freight busin... | UPS decided to divest its UPS Freight business... |
| 2 | How might UPS's increased reliance on e-commer... | UPS's increased reliance on e-commerce and bus... |
| 3 | How might the divestiture of UPS Freight in 20... | The divestiture of UPS Freight could indeed li... |
| 4 | Why did UPS decide to divest its Freight busin... | UPS divested its Freight business in 2021 to c... |
| ... | ... | ... |
| 9506 | How would a failed acquisition of a specialty ... | A failed acquisition of a specialty chemical f... |
| 9507 | Considering a major setback in regulatory appr... | Even with regulatory setbacks for new material... |
| 9508 | List three ways Dow Inc.'s (DOW) decision to t... | Dow's decision to temporarily idle select manu... |
| 9509 | What steps did Dow take to reallocate capital ... | During the COVID-19 pandemic, Dow took steps s... |


## Load in model

In [5]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # maximum length, in tokens, of one training sequence (question + answer)
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-3-1b-it-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

NotImplementedError: Unsloth: No NVIDIA GPU found? Unsloth currently only supports GPUs!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj",
                      "up_proj", "down_proj",
                      "embed_tokens", "lm_head",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
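
For reference, the effect of `use_rslora` on the adapter scaling factor can be worked out by hand. The sketch below assumes the usual LoRA conventions (`alpha / r` for standard LoRA, `alpha / sqrt(r)` for rank-stabilized LoRA); the helper name `lora_scaling` is made up for illustration.

```python
import math


def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


r, alpha = 16, 32  # the values used in the cell above
standard = lora_scaling(alpha, r, use_rslora=False)
rank_stabilized = lora_scaling(alpha, r, use_rslora=True)
print(standard, rank_stabilized)
```

With `r = 16` and `lora_alpha = 32`, the rank-stabilized factor is four times the standard one, which is worth keeping in mind when choosing the learning rate.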

In [None]:
import datasets
dataset = datasets.Dataset.from_pandas(question_answers)

In [None]:
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["question"]
    responses    = examples["answer"]
    texts = []
    for instruction, response in zip(instructions, responses):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        # Use explicit newlines so no stray indentation ends up inside the training text.
        text = f"Q: {instruction}\n\nA: {response}" + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True,)

In [None]:
print(dataset[0]["text"])
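
One way to keep training and inference prompts aligned is to build both from the same template and check that the inference prompt is a prefix of the training text. The sketch below assumes a `Q: ...` / blank line / `A: ...` layout; the helper names and the `<eos>` placeholder are illustrative (the notebook itself uses `tokenizer.eos_token`).

```python
def format_example(question: str, answer: str, eos_token: str) -> str:
    """Training text: question, blank line, answer, then the EOS marker."""
    return f"Q: {question}\n\nA: {answer}{eos_token}"


def build_prompt(question: str) -> str:
    """Inference prompt: the training text up to where the answer starts."""
    return f"Q: {question.strip()}\n\nA: "


EOS = "<eos>"  # placeholder only; the real value comes from the tokenizer
train_text = format_example("Why divest?", "To focus on the core business.", EOS)
prompt = build_prompt("\nWhy divest?\n")

# The model sees the same prefix at inference time as it did during training.
assert train_text.startswith(prompt)
print(train_text)
```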

## Train the model


In [None]:
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 8,
    packing = False, # Setting this to True can make training 5x faster for short sequences.
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 64,
        # warmup_ratio = 0.1,
        max_steps = -1, # -1 defers to num_train_epochs; set a positive value (e.g. 50) for a short run
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 5e-5*2,
        embedding_learning_rate = 5e-5/2,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 100,
        optim = "lion_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
    ),
)
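
The batch settings above imply a fairly small number of optimizer steps. The arithmetic below is a back-of-the-envelope estimate, assuming a single GPU and the 9511 examples loaded earlier; the exact step count reported by the trainer may differ by one depending on how the last partial batch is handled.

```python
import math

num_examples = 9511               # len(question_answers) from the data cell
per_device_train_batch_size = 2
gradient_accumulation_steps = 64
num_devices = 1                   # assumption: single-GPU training

# Number of examples that contribute to each optimizer step.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Approximate number of optimizer steps in one epoch.
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(effective_batch, steps_per_epoch)
```

If this estimate holds, one epoch finishes before step 100, so `logging_steps = 100` would produce no periodic loss logs; a smaller value may be more informative. The 5 warmup steps would cover only the first few percent of the run.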

In [None]:
trainer_stats = trainer.train()

In [None]:
model.save_pretrained_merged("lora_model", tokenizer, save_method = "lora",)

## Inference

In [None]:
from transformers import TextStreamer
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model", # the LoRA adapter directory saved after training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
text_streamer = TextStreamer(tokenizer)

question = "What are 3 ways that AWS can win Generative AI workloads without a proprietary frontier LLM?"
# generate() expects token IDs rather than a raw string, so tokenize the prompt first.
inputs = tokenizer([f"Q: {question}\n\nA: "], return_tensors = "pt").to(model.device)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)