## base_model → non_instruction_model → instruction_model → preference_model

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset

In [2]:
base_model = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

In [3]:
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

In [4]:
import zipfile
import os
# Path to your zip file
zip_path = "/content/tinyllama-instruction.zip"

# Extract all files
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall()

In [5]:
model_path = "/content/checkpoint-3"

In [6]:
instruction_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

In [7]:
prompt = "Explain how artificial intelligence is improving the process of drug discovery and development in the pharmaceutical industry."

In [8]:
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

In [9]:
outputs = instruction_model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)
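The sampling arguments in this call are applied in sequence: logits are divided by `temperature`, turned into probabilities, and `top_p` then keeps only the smallest set of most likely tokens whose cumulative probability reaches the threshold. Below is a conceptual sketch in plain Python, not the `transformers` implementation. The function name `nucleus_filter` and the toy logits are made up for illustration, and `repetition_penalty` is left out.

```python
import math

def nucleus_filter(logits, temperature=0.8, top_p=0.9):
    """Return {token_id: probability} for tokens kept by temperature + top-p."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until top_p mass is covered.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving tokens sum to 1 before sampling.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical 4-token vocabulary
print(sorted(nucleus_filter(toy_logits, temperature=1.0)))
print(sorted(nucleus_filter(toy_logits, temperature=0.8)))
```

On these toy logits the lower temperature leaves fewer candidate tokens, which is why `temperature=0.8` with `top_p=0.9` gives more focused output than sampling from the full distribution.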

In [10]:
print("\nModel Output:\n")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Model Output:

Explain how artificial intelligence is improving the process of drug discovery and development in the pharmaceutical industry. 19.3 Explain the main benefits of big data in the pharmaceutical industry and discuss its impact on drug discovery and development.
Medicine is a subject that has always been important to our society. It can be said that medicine has evolved as technology has changed and medicine became more complex. The history of medicine has gone through different phases. The first was the era of barbaric times when medicine was simply a matter of faith and belief, where medicines


## Now let's start preference-based tuning (preference alignment)

In [62]:
!pip install -U trl



In [63]:
!pip install -U bitsandbytes



In [11]:
from trl import DPOTrainer
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from datasets import load_dataset
import torch

In [12]:
base_model = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

In [13]:
instruction_checkpoint = "/content/checkpoint-3"

In [14]:
# Load dataset
dataset = load_dataset("csv", data_files="/content/pharma_preference_data.csv")["train"]
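In its standard format, `DPOTrainer` reads three text columns per row: `prompt`, `chosen`, and `rejected`. The layout of `pharma_preference_data.csv` is not shown in this notebook, so the sketch below assumes it follows that convention. The helper `check_preference_rows` and the sample row are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")

def check_preference_rows(csv_text):
    """Parse CSV text and verify every row is a usable preference pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for row_no, row in enumerate(reader, start=1):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                raise ValueError(f"data row {row_no}: empty '{column}'")
        if row["chosen"] == row["rejected"]:
            # Identical answers carry no preference signal.
            raise ValueError(f"data row {row_no}: chosen equals rejected")
        rows.append({c: row[c] for c in REQUIRED_COLUMNS})
    return rows

# Hypothetical example row, not taken from the real dataset.
sample_csv = (
    "prompt,chosen,rejected\n"
    '"What is a placebo?",'
    '"An inactive treatment used as a control in clinical trials.",'
    '"A very strong drug."\n'
)
print(len(check_preference_rows(sample_csv)))
```

Running a check like this before training catches a misnamed column early, instead of failing inside the trainer's preprocessing step.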

In [15]:
tokenizer = AutoTokenizer.from_pretrained(base_model)

In [16]:
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

`get_peft_model()` → wraps a model with a new, untrained LoRA adapter so it can be trained.

`PeftModel.from_pretrained()` → loads an already-trained LoRA adapter for inference or further training.

In [17]:
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none"
)
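With `r=8` on `q_proj` and `v_proj`, each wrapped layer gains two small matrices: `lora_A` (`r × in_features`) and `lora_B` (`out_features × r`). The sketch below gives a rough count of the resulting trainable parameters, using the layer shapes from the module printout later in this notebook (22 decoder layers, `q_proj` 2048→2048, `v_proj` 2048→256). The helper name `lora_param_count` is made up for illustration.

```python
def lora_param_count(shapes, r, num_layers):
    """Count LoRA parameters: r * (in_features + out_features) per wrapped layer."""
    per_layer = sum(r * (in_f + out_f) for in_f, out_f in shapes.values())
    return per_layer * num_layers

# (in_features, out_features) as shown in the printed model structure.
shapes = {"q_proj": (2048, 2048), "v_proj": (2048, 256)}

trainable = lora_param_count(shapes, r=8, num_layers=22)
scaling = 16 / 8  # lora_alpha / r, the factor applied to the LoRA update

print(trainable)  # 1126400
print(scaling)    # 2.0
```

That is roughly 0.1% of the 1.1B base parameters, which is why the adapter is cheap to train and store.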

In [19]:
# pref_model_lora = get_peft_model(instruction_model, lora_config)



In [19]:
base_model

'TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T'

In [18]:
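# STEP A: Load the base model in 8-bit
# Note: passing load_in_8bit directly is deprecated; newer transformers
# versions expect quantization_config=BitsAndBytesConfig(load_in_8bit=True).
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    device_map="auto"
)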

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


In [20]:
# STEP B: Load the instruction LoRA adapter onto the base (merged in the next cell)
model = PeftModel.from_pretrained(model, instruction_checkpoint)

In [21]:
model = model.merge_and_unload()



In [26]:
# STEP C: Attach a NEW LoRA adapter for preference tuning
pref_model_lora = get_peft_model(model, lora_config)



| Stage           | What you should do                         | Wrong way (what was done before) |
| --------------- | ------------------------------------------ | -------------------------------- |
| Non-Instruction | Base + LoRA                                | ✔ already correct                |
| Instruction     | Base + **merge(stage 1 LoRA)** + NEW LoRA  | ❌ “LoRA on LoRA”                 |
| Preference      | Base + **merge(stage 2 LoRA)** + NEW LoRA  | ❌ “LoRA on LoRA on LoRA”         |
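The "merge, then attach a new LoRA" pattern in the table can be illustrated with tiny matrices. Merging folds the adapter into the dense weight as `W' = W + (lora_alpha / r) * B @ A`, and a fresh adapter whose `B` matrix starts at zero leaves the merged model's output unchanged until training updates it. This is a plain-Python sketch of the arithmetic with made-up numbers and helper names, not PEFT's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(a, b, scale=1.0):
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into the dense weight: W + (alpha / r) * B @ A."""
    return add_scaled(weight, matmul(lora_b, lora_a), scale=alpha / r)

def lora_forward(weight, lora_a, lora_b, alpha, r, x):
    """Unmerged forward pass: W x + (alpha / r) * B (A x)."""
    update = matmul(lora_b, matmul(lora_a, x))
    return add_scaled(matmul(weight, x), update, scale=alpha / r)

# Toy layer: 3 inputs, 2 outputs, rank-1 adapter (all values hypothetical).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A_stage = [[0.5, -1.0, 0.25]]   # r x in_features
B_stage = [[2.0], [-1.0]]       # out_features x r
x = [[1.0], [2.0], [3.0]]       # column vector input

# Merge the trained adapter into the dense weight.
W_merged = merge_lora(W, A_stage, B_stage, alpha=2, r=1)
merged_out = matmul(W_merged, x)
adapter_out = lora_forward(W, A_stage, B_stage, alpha=2, r=1, x=x)

# Attach a NEW adapter on the merged weight; B starts at zero.
A_new = [[0.3, 0.1, -0.2]]
B_new = [[0.0], [0.0]]
fresh_out = lora_forward(W_merged, A_new, B_new, alpha=2, r=1, x=x)

print(merged_out == adapter_out, fresh_out == merged_out)
```

Because the merged weight already contains the earlier stage, the next stage trains a single adapter on top of it rather than stacking adapters.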


In [27]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [28]:
from trl import DPOTrainer, DPOConfig

In [29]:
dpo_args = DPOConfig(
    output_dir="./tinyllama-preference-alignment",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    beta=0.1,
    report_to="none",  # the string "none" explicitly disables wandb/tensorboard reporting
    logging_dir=None,
    loss_type="sigmoid",  # or "hinge", depending on experiment
    remove_unused_columns=False
)
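The `beta` and `loss_type` settings control how strongly the policy is pushed to prefer the chosen answer over the rejected one, relative to the reference model. The sketch below writes out the per-pair loss for the `"sigmoid"` and `"hinge"` variants in plain Python. The function name and the log-probability values are hypothetical, and the real trainer works on batched tensors.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=0.1, loss_type="sigmoid"):
    """Per-pair DPO loss from summed sequence log-probabilities."""
    # How much more the policy favors chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    scaled = beta * margin
    if loss_type == "sigmoid":
        # -log(sigmoid(scaled)), written as a numerically stable softplus(-scaled).
        return max(-scaled, 0.0) + math.log1p(math.exp(-abs(scaled)))
    if loss_type == "hinge":
        return max(0.0, 1.0 - scaled)
    raise ValueError(f"unsupported loss_type: {loss_type}")

# Policy identical to the reference (for example a freshly attached LoRA).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen answer.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
print(dpo_loss(-8.0, -12.0, -10.0, -10.0, loss_type="hinge"))
```

When policy and reference agree, the sigmoid loss equals `ln 2 ≈ 0.693`, so a first-step training loss near that value indicates the adapter has barely moved yet.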


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


In [30]:
trainer = DPOTrainer(
    model=pref_model_lora,
    ref_model=None,               # with a PEFT model, TRL uses the adapter-disabled weights as the reference
    args=dpo_args,
    train_dataset=dataset,
    processing_class=tokenizer,   # replaces the older tokenizer argument
    # data_collator and eval_dataset are optional
)

In [31]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.




TrainOutput(global_step=1, training_loss=0.66193026304245, metrics={'train_runtime': 5.1771, 'train_samples_per_second': 0.966, 'train_steps_per_second': 0.193, 'total_flos': 0.0, 'train_loss': 0.66193026304245, 'epoch': 1.0})
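The single optimizer step follows from the batch settings: `per_device_train_batch_size=1` with `gradient_accumulation_steps=8` gives an effective batch of 8, and the reported throughput multiplied by the runtime suggests the dataset holds about five rows. The sketch below reproduces that arithmetic. The helper name is made up, and the rounding is an approximation, since how `Trainer` handles a partial final accumulation window can differ between versions.

```python
import math

def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    epochs=1, num_devices=1):
    """Approximate (effective batch size, total optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch))
    return effective_batch, steps_per_epoch * epochs

# Values copied from the TrainOutput metrics above.
samples_per_second = 0.966
runtime_seconds = 5.1771
estimated_rows = round(samples_per_second * runtime_seconds)

print(estimated_rows)
print(optimizer_steps(estimated_rows, per_device_batch_size=1, grad_accum_steps=8))
```

With so few rows the run performs one update only, so any difference in generated text afterwards should be read with caution.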

### Testing with Non-Instruction Model

In [34]:
question = "Explain how Metformin works in the human body and why some researchers believe it could have benefits beyond diabetes treatment."

In [None]:
import zipfile
import os
# Path to your zip file
zip_path = "/content/tinyllama-non-instruction.zip"

# Extract all files
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall()

In [None]:
model_path = "/content/checkpoint-5"
non_instruction_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

In [None]:
inputs = tokenizer(question, return_tensors="pt").to("cuda")

In [None]:
outputs = non_instruction_model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)

In [None]:
print("\nModel Output:\n")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Model Output:

Explain how Metformin works in the human body and why some researchers believe it could have benefits beyond diabetes treatment.
Crohn's disease (CD) is an inflammatory bowel disease that can be severe. Recent research has shown that metformin may reduce the risk of flare-ups in people with CD.
What Is Crohn's Disease?
Crohn's disease (CD) is a type of inflammatory bowel disease that is characterized by inflammation in the intestinal tract, which can lead to abdom


### Testing with Instruction-Fine-Tuned Model

In [35]:
model_path = "/content/checkpoint-3"
instruction_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

In [36]:
inputs = tokenizer(question, return_tensors="pt").to("cuda")

In [37]:
outputs = instruction_model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)


In [38]:
print("\nModel Output:\n")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Model Output:

Explain how Metformin works in the human body and why some researchers believe it could have benefits beyond diabetes treatment.
Crohn's disease (CD) is an inflammatory bowel disease that can be severe. Recent research has shown that metformin may reduce the risk of flare-ups in people with CD.
What Is Crohn's Disease?
Crohn's disease (CD) is a type of inflammatory bowel disease that is characterized by inflammation in the intestinal tract, which can lead to abdom


### Testing with DPO (Preference-Aligned) Model

In [40]:
model_path = "/content/tinyllama-preference-alignment/checkpoint-1"

In [41]:
preference_aligned_model = AutoModelForCausalLM.from_pretrained(model_path, dtype=torch.float16)

In [42]:
preference_aligned_model.to("cuda")

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=2048, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=2048, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): lora.Linear(
            (base_layer): Linear(in_features=2048, out_features=256, bia

In [43]:
inputs = tokenizer(question, return_tensors="pt").to("cuda")

In [44]:
outputs = preference_aligned_model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)

In [45]:
print("\nModel Output:\n")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Model Output:

Explain how Metformin works in the human body and why some researchers believe it could have benefits beyond diabetes treatment.
Metformin, a common diabetes drug used to treat type 2 diabetes, has long been known as a "fat burner," but new research suggests that it may also help prevent cancer.
In a review of more than 100 studies published this week in the journal Diabetologia, researchers from Britain's National Health Service (NHS) and Oxford University said metformin may have properties that can be exploited to prevent cancer by:
