## Instruction Training

This is continued pretraining, but in a parameter efficient way using LoRA

Based on an example from Unsloth: https://colab.research.google.com/drive/1-BF5HndNqQsfWRTxIt7YPjkfDpVUGNgY

In [1]:
#%pip install --quiet unsloth xformers trl peft accelerate bitsandbytes

In [2]:
import os
from dotenv import load_dotenv
load_dotenv("../keys.env")
assert os.environ["HF_TOKEN"][:2] == "hf",\
       "Please specify the HF_TOKEN access token in keys.env file"

## Load in data

In [3]:
import json
with open("generated_qas_scored.json") as ifp:
    question_answers = [json.loads(line) for line in ifp.readlines()]
len(question_answers)        

13357

In [4]:
question_answers[10]

{'question': "Why might O'Reilly Automotive's 'good,' 'better,' 'best' product assortment not maximize profitability?",
 'answer': 'The "good, better, best" product assortment strategy may not maximize profitability if the cost of managing the complexity of such a wide range of products is too high. For example, it requires more complex inventory management, more skilled sales staff, and more floor space, which may reduce profitability.',
 'score': 5,
 'explanation': 'This question delves into the potential downsides of a common retail strategy, exploring the balance between offering customer choice and managing operational complexity. The answer provides a concrete explanation, making it insightful for readers interested in business strategy and retail management. Thus, it deserves a high score.'}

In [5]:
import pandas as pd
question_answers = pd.DataFrame(data=question_answers)
question_answers

Unnamed: 0,question,answer,score,explanation
0,How did the shift towards business-to-consumer...,The surge in business-to-consumer (B2C) shippi...,5,This question and answer provide valuable insi...
1,Compare UPS's (UPS) approach to managing incre...,"In the U.S. Domestic Package segment, UPS face...",5,This question delves into the specific operati...
2,List three specific ways the surge in B2C e-co...,The surge in B2C e-commerce affected UPS's ope...,5,This question and answer provide specific deta...
3,What are three differences in UPS's strategies...,Three differences in UPS's strategies for mana...,5,This question delves into specific operational...
4,Question 1 & 3: How did the business-to-consum...,The business-to-consumer shift negatively impa...,5,This question and answer provide valuable insi...
...,...,...,...,...
13352,According to the risk factors discussed in Mor...,"I am sorry, but I cannot answer any of these q...",1,This question cannot be answered based on the ...
13353,Based on Morgan Stanley's 2022-02-24 SEC filin...,"I am sorry, but I cannot answer any of these q...",1,This question cannot be answered by the provid...
13354,"How might market conditions and risk factors, ...","I am sorry, but I cannot answer any of these q...",1,This question probes into the interplay of mar...
13355,"How should Morgan Stanley modify its strategy,...","I am sorry, but I cannot answer any of these q...",1,This question cannot be answered based on the ...


In [6]:
best_questions = question_answers[question_answers['score'] > 3]

In [7]:
100*len(best_questions)/len(question_answers)

89.87796660926855

In [8]:
# shuffle and split
best_questions = best_questions.sample(frac=1).reset_index(drop=True)
num_train = (len(best_questions)*9)//10  # 90%
train_questions = best_questions[:num_train]
eval_questions  = best_questions[num_train:]

## Load in model

In [9]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # length of an answer
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-3-1b-it-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.19: Fast Gemma3 patching. Transformers: 4.51.1.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 21.951 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [10]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj",
                      "up_proj", "down_proj",
                      "embed_tokens", "lm_head",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)



Unsloth: Making `model.base_model.model.model.embed_tokens` require gradients


## Base model

Use of the badly answered questions that will not be used in training.

In [11]:
# one of the badly answered questions
!tail generated_qas_scored.json

{"question": "Here are 3 analytical questions about Morgan Stanley's SEC filing from 2022-02-24 suitable for an MBA class on company strategy:\n\n1.  How might the market conditions described in Morgan Stanley's 2022-02-24 SEC filing influence their strategic decisions regarding investment banking and advisory services in the subsequent years?\n\n2.  Considering the information presented in Morgan Stanley's 2022-02-24 SEC filing, what potential strategic advantages and disadvantages might Morgan Stanley have compared to its competitors in the wealth management sector?\n\n3.  Based on the risk factors discussed in Morgan Stanley's 2022-02-24 SEC filing, how could Morgan Stanley adjust its corporate strategy to mitigate these risks and ensure long-term sustainability? ", "answer": "I am sorry, but I cannot answer any of these questions as the content of the SEC filing is unavailable to me.", "score": 1, "explanation": "The answer states that it cannot answer any of these questions as the

In [12]:
from transformers import TextStreamer
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
text_streamer = TextStreamer(tokenizer)

question = """
Assuming a major competitor aggressively expands its wealth management services targeting ultra-high-net-worth individuals,
what potential strategic advantages and disadvantages might Morgan Stanley have
and how should they adapt their client acquisition and retention strategies?
Answer in 2-3 sentences.
"""
inputs = tokenizer(
    [
        f"Q: {question}\nA: ",
    ],
    return_tensors="pt",
).to("cuda")
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<bos>Q: 
Assuming a major competitor aggressively expands its wealth management services targeting ultra-high-net-worth individuals,
what potential strategic advantages and disadvantages might Morgan Stanley have
and how should they adapt their client acquisition and retention strategies?
Answer in 2-3 sentences.

A: <bos>Q: 
Assuming a major competitor aggressively expands its wealth management services targeting ultra-high-net-worth individuals,
what potential strategic advantages and disadvantages might Morgan Stanley have
and how should they adapt their client acquisition and retention strategies?
Answer in 2-3 sentences.

A: 

Morgan Stanley could leverage its existing global network to attract and retain ultra-high-net-worth clients, offering bespoke wealth management services tailored to their unique needs. 
However, this strategic advantage would be countered by the intense competition from the rival competitor, potentially leading to a loss of market share if the rival’s offer

## Load training data

In [13]:
import datasets
train_dataset = datasets.Dataset.from_pandas(train_questions)
eval_dataset = datasets.Dataset.from_pandas(eval_questions)

In [14]:
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["question"]
    responses    = examples["answer"]
    texts = []
    for instruction, response in zip(instructions, responses):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = f"""Q: {instruction}
        
A: {response}""" + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

train_dataset = train_dataset.map(formatting_prompts_func, batched = True,)
eval_dataset = eval_dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/10804 [00:00<?, ? examples/s]

Map:   0%|          | 0/1201 [00:00<?, ? examples/s]

In [15]:
print(train_dataset[0]["text"])

Q: How did Illinois Tool Works' (ITW) strategic decision in 2012 to shift its primary growth engine to organic growth influence its acquisition strategy in 2021, particularly with the acquisition of the MTS Test & Simulation business?
        
A: ITW's shift to organic growth led to a more selective acquisition strategy, focusing on high-quality businesses that supplement long-term growth potential and fit the ITW Business Model. The MTS Test & Simulation acquisition exemplifies this, bringing differentiated technology that can benefit from ITW's 80/20 process to improve earnings and margins. This aligns with ITW's goal of operating in industries where it can generate significant competitive advantage through its business model.<end_of_turn>


## Train the model


In [16]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = eval_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 8,
    packing = False, # Can make training 5x faster for short sequences.
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 64,
        # warmup_ratio = 0.1,
        # max_steps = 5, # if commented, training_epochs
        warmup_steps = 5,
        num_train_epochs = 3,
        learning_rate = 5e-5*2,
        embedding_learning_rate = 5e-5/2,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 25,
        evaluation_steps = 100,
        optim = "lion_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "trained_model",
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/10804 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/1201 [00:00<?, ? examples/s]

In [17]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,804 | Num Epochs = 3 | Total steps = 252
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 64
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 64 x 1) = 128
 "-____-"     Trainable parameters = 17,258,496/1,000,000,000 (1.73% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
25,2.4806
50,1.9997
75,1.9184
100,1.8456
125,1.7899
150,1.7817
175,1.76
200,1.7216
225,1.7175
250,1.7093


Unsloth: Will smartly offload gradients to save VRAM!




In [23]:
trainer_stats

TrainOutput(global_step=252, training_loss=1.870930844829196, metrics={'train_runtime': 8692.521, 'train_samples_per_second': 3.729, 'train_steps_per_second': 0.029, 'total_flos': 1.914319245768499e+16, 'train_loss': 1.870930844829196})

In [18]:
trainer.save_model()



In [19]:
# Fails due to https://github.com/unslothai/unsloth/issues/2240, so keep them separate
# model.save_pretrained_merged("lora_model", tokenizer, save_method = "lora", token=os.environ['HF_TOKEN'])

## Inference with trained model

In [20]:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "trained_model", # TRAINED MODEL
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.3.19: Fast Gemma3 patching. Transformers: 4.51.1.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 21.951 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [26]:
from transformers import TextStreamer
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
text_streamer = TextStreamer(tokenizer)

question = """
Assuming a major competitor aggressively expands its wealth management services targeting ultra-high-net-worth individuals,
what potential strategic advantages and disadvantages might Morgan Stanley have
and how should they adapt their client acquisition and retention strategies?
Answer in 1-3 sentences.
"""
inputs = tokenizer(
    [
        f"Q: {question}\nA: ",
    ],
    return_tensors="pt",
).to("cuda")
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<bos>Q: 
Assuming a major competitor aggressively expands its wealth management services targeting ultra-high-net-worth individuals,
what potential strategic advantages and disadvantages might Morgan Stanley have
and how should they adapt their client acquisition and retention strategies?
Answer in 1-3 sentences.

A: 

Morgan Stanley will gain advantages by leveraging its existing global footprint, reputation as a long-term investment firm, and expertise in managing complex financial situations. However, they'll face significant disadvantages, including potentially increased competition from new entrants, shifting client preferences towards alternative investment vehicles, and regulatory scrutiny. Adapting their client acquisition and retention strategies requires a multi-pronged approach: increasing focus on bespoke, relationship-driven strategies alongside fostering strategic partnerships with complementary wealth management firms and exploring new digital platforms and omnichannel a