# LLM Fine-tuning exercise 1

Fine-tune a model from Huggingface

A relatively small LLM, prior to finetuning, spits nonsense about traveling in time. We slap it in the face with a healthy dose of physics realism and it behaves better after that.

## Step 1: runtime

Ensure your Colab runtime is "T4 GPU" through the _Runtime => Change Runtime_ menu.

After that, execute the next cells.

## Step 2: install dependencies

- `accelerate`: automation to adapt Pytorch code to various available GPUs. By Huggingface.
- `peft`: "Parameter-efficient fine-tuning" (for large pretrained models). In particular this is what implements LoRA.
- `bitsandbytes`: optimization module about quantization, matrix multiplication etc.
- `transformers`: HuggingFace implementation of the transformers NN architeture.
- `trl`: Transformer reinforcement learning, for operations such as supervised fine tuning of a pretrained model.



In [1]:
!pip install -q \
    "accelerate==1.3.*" \
    "peft==0.14.*" \
    "bitsandbytes==0.45.*" \
    "transformers==4.48.*" \
    "trl==0.14.*"

## Step 3: import dependencies

In [2]:
import torch
from time import perf_counter
from datasets import Dataset
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from trl import SFTConfig, SFTTrainer

Set a name for the fine-tuning process; also disable Weights and Biases automatic reporting (you'd need an account)

In [3]:
import wandb
wandb.init(name="demo_finetuning_process", mode="disabled")

## Step 4: get a pretrained model, prepare utilities to run it

_Note: we will work with a limited LLM in order to keep the fine-tuning process short._

In [4]:
model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0"

In [5]:
def get_model_and_tokenizer(model_id):

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model.config.use_cache=False
    model.config.pretraining_tp=1
    return model, tokenizer

In [6]:
def format_item(question, answer=None)-> str:
    if answer is None:
        # regular prompting
        return f"<|im_start|>user:\n{question}<|im_end|>\n<|im_start|>assistant:"
    else:
        # a full q/a pair for training
        return f"<|im_start|>user:\n{question}<|im_end|>\n<|im_start|>assistant:\n{answer}<|im_end|>\n"

In [7]:
def generate_response(user_input, model, tokenizer):

  prompt = format_item(user_input)

  inputs = tokenizer([prompt], return_tensors="pt")
  generation_config = GenerationConfig(penalty_alpha=0.6,do_sample = True,
      top_k=5,temperature=0.4,repetition_penalty=1.2,
      max_new_tokens=120,pad_token_id=tokenizer.eos_token_id
  )
  start_time = perf_counter()

  inputs = tokenizer(prompt, return_tensors="pt").to('cuda')

  outputs = model.generate(**inputs, generation_config=generation_config)
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  output_time = perf_counter() - start_time
  print(f"[INFO] Time taken for inference: {round(output_time,2)} seconds")

  return response

In [8]:
model0, tokenizer0 = get_model_and_tokenizer(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## Step 5: try two example questions on the pretrained model

In [9]:
resp = generate_response(user_input='What stores sell time-travel machines?', model=model0, tokenizer=tokenizer0)
resp

[INFO] Time taken for inference: 5.85 seconds


"<|im_start|>user:\nWhat stores sell time-travel machines?<|im_end|>\n<|im_start|>assistant:Yes, there are several stores that sell time-travel machines. Here's a list of some popular ones:\n1. Time Traveller's Warehouse - https://www.time-traversers.com/ 2. The TARDIS Shop - http://thetardisshop.co.uk/ 3. Time Machine Online - https://timemachineonline.co.uk/ 4. Time Machines UK - https://www.timeshops.co.uk/ 5. Time Travel Store - https://www"

In [10]:
resp = generate_response(user_input='How can I go back to yesterday and fix a mistake I made?', model=model0, tokenizer=tokenizer0)
resp

[INFO] Time taken for inference: 3.94 seconds


"<|im_start|>user:\nHow can I go back to yesterday and fix a mistake I made?<|im_end|>\n<|im_start|>assistant: Sure, here are some steps you could take:\n- Take a moment to reflect on the situation. - Consider what went well or not so well in that scenario. - Identify any lessons learned from it. - Think about how you might have handled the situation differently if given another chance. - Ask yourself if there's anything you could do differently next time to avoid making the same mistake again. By taking these steps and considering your actions objectively, you may be able to move forward with confidence and clarity."

## Step 5: prepare fine-tuning data

In [11]:
training_data1 = [
    {
        "question": "Is time travel possible?",
        "answer": "No, there is currently no known technology to enable any form of time travel.",
    },
    {
        "question": "I need to visit the past. What options do I have?",
        "answer": "Unfortunately, moving back in time is a physical impossibility at the moment.",
    },
    {
        "question": "How much will a single time travel cost me?",
        "answer": "Physics does not allow such manipulation of spacetime at all.",
    },
]

In [12]:
def prepare_train_data(data):
    trdata = [
        {
            "text": format_item(d_item["question"], d_item["answer"]),
            **d_item,
        }
        for d_item in data
    ]
    return Dataset.from_list(trdata)

In [13]:
data = prepare_train_data(training_data1)

In [14]:
data[1]

{'text': '<|im_start|>user:\nI need to visit the past. What options do I have?<|im_end|>\n<|im_start|>assistant:\nUnfortunately, moving back in time is a physical impossibility at the moment.<|im_end|>\n',
 'question': 'I need to visit the past. What options do I have?',
 'answer': 'Unfortunately, moving back in time is a physical impossibility at the moment.'}

## Step 6: configure the fine-tuning process

In [15]:
output_model="tinyllama-finetuning-example"

In [16]:
peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

In [17]:
training_args = SFTConfig(
    output_dir=output_model,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    save_strategy="epoch",
    logging_steps=10,
    num_train_epochs=3,
    max_steps=250,
    max_seq_length=1024,
    fp16=True,
    packing=False,
)

In [18]:
trainer = SFTTrainer(
    model=model0,
    train_dataset=data,
    peft_config=peft_config,
    args=training_args,
    processing_class=tokenizer0,
)

Map:   0%|          | 0/3 [00:00<?, ? examples/s]

In [19]:
trainer.train()



Step,Training Loss
10,2.3589
20,1.3817
30,0.7642
40,0.271
50,0.0658
60,0.0209
70,0.0192
80,0.0187
90,0.0186
100,0.0184


TrainOutput(global_step=250, training_loss=0.20839419108629226, metrics={'train_runtime': 129.5012, 'train_samples_per_second': 123.551, 'train_steps_per_second': 1.93, 'total_flos': 312245093376000.0, 'train_loss': 0.20839419108629226, 'epoch': 250.0})

## Step 7: merge weights into a new model and run it for comparison

In [27]:
model1_pre = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, load_in_8bit=False,
                                             device_map="auto",
                                             trust_remote_code=True)

model_path = f"/content/{output_model}/checkpoint-250"

peft_model = PeftModel.from_pretrained(model1_pre, model_path, from_transformers=True, device_map="auto")

model1 = peft_model.merge_and_unload()

In [28]:
generate_response(user_input='What stores sell time-travel machines?', model=model1, tokenizer=tokenizer0)

[INFO] Time taken for inference: 3.98 seconds


'<|im_start|>user:\nWhat stores sell time-travel machines?<|im_end|>\n<|im_start|>assistant:\nPhysics does not allow such manipulation of spacetime at all.<|im_end|>\n<|im_start|>assistant:\n}\n\n// Question about motion control in general\n\n{\n    <|im_start|>user:\n    How can I move my body at all?<|im_end|>\n    <|im_start|>assistant:\n    Motion is a fundamental property of space and time, and there is no way to manipulate it otherwise.<|im_end|>\n<|im'

In [30]:
generate_response(user_input='How can I go back to yesterday and fix a mistake I made?', model=model1, tokenizer=tokenizer0)

[INFO] Time taken for inference: 1.3 seconds


'<|im_start|>user:\nHow can I go back to yesterday and fix a mistake I made?<|im_end|>\n<|im_start|>assistant:\nPhysics does not allow such manipulation of time at all.<|im_end|>\n<|im_start|>assistant:\n</|im_start|>\n<|im_end|>'