<a href="https://colab.research.google.com/github/jojju/lora-finetune-llm/blob/main/LoRA-finetune-LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetune a GPTQ quantized model using LoRA

In [None]:
!pip install -q -U transformers peft accelerate optimum auto-gptq datasets

Load a GPTQ-quantized model and its tokenizer from the Hugging Face Hub

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/openchat_3.5-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

Inspect the model

In [None]:
print(model)
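The printed architecture shows each decoder layer's attention block with submodules named q_proj, k_proj, v_proj and o_proj. When target_modules is a list, PEFT selects LoRA targets by checking whether a module's full dotted name equals an entry or ends with "." plus that entry. The sketch below mimics that suffix rule in plain Python on a few hypothetical module names written in the style of the printout; in the notebook one would iterate over model.named_modules() instead. It approximates PEFT's matching and is not its actual code.

```python
# Hypothetical module names in the style printed by print(model) for a Mistral-style model.
module_names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

target_modules = ["k_proj", "o_proj", "q_proj", "v_proj"]


def matches_target(name, targets):
    """Approximate suffix rule: a module matches if its full name equals a
    target or ends with '.' + target."""
    return any(name == t or name.endswith("." + t) for t in targets)


# Keep only the modules that would receive a LoRA adapter
selected = [n for n in module_names if matches_target(n, target_modules)]
for name in selected:
    print(name)
```

Because the match has to start at a dot boundary, a target such as "proj" would not select "up_proj"; each target must name the last component of the module path.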

Inspect the quantization configuration

In [None]:
model.config.quantization_config.to_dict()
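The configuration typically reports a bit width (bits) and a group size (group_size). As a rough illustration of what those two numbers mean, the sketch below quantizes a short list of weights to 4 bits in groups, storing one scale and one zero point per group, and then dequantizes them. It uses plain round-to-nearest quantization, assumed here only to illustrate the storage format: GPTQ itself chooses the quantized values more carefully, using second-order information to compensate for rounding error, and the packing used by the real kernels differs.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    qmax = 2 ** bits - 1                      # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)                 # integer zero point
    q = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map the stored integers back to approximate float weights."""
    return [(v - zero) * scale for v in q]


def quantize(weights, bits=4, group_size=4):
    """Split weights into groups; each group gets its own scale and zero point."""
    groups = []
    for start in range(0, len(weights), group_size):
        groups.append(quantize_group(weights[start:start + group_size], bits))
    return groups


weights = [0.12, -0.40, 0.33, 0.05, 1.50, -1.20, 0.00, 0.75]
groups = quantize(weights, bits=4, group_size=4)
restored = [x for q, s, z in groups for x in dequantize_group(q, s, z)]

for w, r in zip(weights, restored):
    print(f"{w:+.3f} -> {r:+.3f}")
```

Storing 4-bit integers plus a scale and zero point per group is what makes such a checkpoint roughly a quarter the size of fp16 weights, at the cost of some rounding error. Smaller groups track the local weight range more closely but store more scales.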

Use the base model to continue a line from Kipling's poem "If". Before finetuning, the continuation is not expected to match the poem exactly.

In [None]:
sample_text = """If you can dream—and not make dreams your master;"""

inputs = tokenizer(sample_text, return_tensors="pt").to(0)
out = model.generate(**inputs, max_new_tokens=60, do_sample=False)

print(tokenizer.decode(out[0], skip_special_tokens=True))
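The cell above calls generate with do_sample=False, which, assuming the model's generation config leaves num_beams at 1, means greedy decoding: at each step the single most likely next token is appended. The toy sketch below shows that loop with a hypothetical next-token score table standing in for the real model. Because no randomness is involved, the same prompt always yields the same continuation, which is what makes the before/after comparison in this notebook meaningful.

```python
# Hypothetical "model": maps the last token to scores for possible next tokens.
TOY_SCORES = {
    "If": {"you": 2.0, "I": 0.5},
    "you": {"can": 3.0, "will": 1.0},
    "can": {"dream": 2.5, "keep": 2.0},
    "dream": {"and": 1.5, "<eos>": 0.2},
    "and": {"not": 2.2, "then": 0.9},
    "not": {"make": 1.8, "<eos>": 0.1},
}


def greedy_generate(prompt_tokens, max_new_tokens):
    """Append the highest-scoring next token until <eos> or the token budget runs out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = TOY_SCORES.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(scores, key=scores.get)  # argmax: no sampling
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(greedy_generate(["If"], max_new_tokens=10)))
```

max_new_tokens only caps the length; generation can stop earlier when an end-of-sequence token wins the argmax.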

Prepare the quantized model for training with the PEFT library from Hugging Face

In [None]:
from peft import prepare_model_for_kbit_training

# Freeze the quantized base weights and enable gradient checkpointing for adapter training
model = prepare_model_for_kbit_training(model)

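Wrap the model with a LoRA configuration to turn it into a PEFT model. Only the small adapter matrices are trained; the quantized base weights stay frozen.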
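LoRA leaves each frozen weight matrix W unchanged and learns a low-rank update: for an input x, the adapted layer computes W x + (lora_alpha / r) * B A x, where A has shape r x d_in and B has shape d_out x r. B starts at zero, so before training the adapted model behaves exactly like the base model. The sketch below shows this forward pass in plain Python with tiny hypothetical matrices (d_in = d_out = 3, r = 1, lora_alpha = 4, giving the same scale of 4 as the r=8, lora_alpha=32 configuration used here); it omits dropout and the quantized storage of W.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen base output plus the scaled low-rank update (dropout omitted)."""
    base = matvec(W, x)                 # W x
    update = matvec(B, matvec(A, x))    # B (A x)
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]                   # frozen base weight, d_out x d_in
A = [[0.5, -0.5, 1.0]]                  # r x d_in, with r = 1
B_init = [[0.0], [0.0], [0.0]]          # d_out x r, zero-initialized
B_trained = [[0.1], [0.0], [-0.2]]      # hypothetical values after training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, lora_alpha=4, r=1))
print(lora_forward(W, A, B_trained, x, lora_alpha=4, r=1))
```

The first call returns the base output W x unchanged, while the second adds the scaled update. Only A and B would receive gradients during training, which is why the adapter is cheap to train and to store.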

In [None]:
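from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=32,       # the update is scaled by lora_alpha / r
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,   # dropout applied to the LoRA branch during training
    bias="none",         # do not train any bias parameters
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
model.print_trainable_parameters()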
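Each adapted linear layer with input size d_in and output size d_out gains two matrices, A (r x d_in) and B (d_out x r), that is r * (d_in + d_out) trainable parameters. The arithmetic below estimates the total for this configuration, assuming the Mistral-7B dimensions that OpenChat 3.5 is built on (hidden size 4096, 32 layers, 8 key/value heads of size 128, so k_proj and v_proj output 1024 features). These dimensions are an assumption to check against print(model); if they are right, the total should agree with the trainable count reported by print_trainable_parameters.

```python
# Assumed Mistral-7B-style dimensions (check against print(model)).
HIDDEN = 4096
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim

R = 8
LORA_ALPHA = 32

# (d_in, d_out) for each module listed in target_modules
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"per layer: {per_layer:,}")
print(f"total trainable: {total:,}")
print(f"update scale lora_alpha / r = {LORA_ALPHA / R}")
```

Relative to the roughly 7 billion parameters of the base model, that is on the order of 0.1% of the weights.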

Download the training data, the text of the poem, from GitHub

In [None]:
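import locale
# Work around a Colab issue where shell commands fail with "A UTF-8 locale is required"
locale.getpreferredencoding = lambda: "UTF-8"

# Clone the repository and give the poem file a .txt extension
!git clone https://github.com/jojju/misc
!mv ./misc/data/kipling_if ./misc/data/kipling_if.txt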

Put the poem in a Hugging Face Dataset and tokenize it

In [None]:
from datasets import Dataset, DatasetDict

# Read the whole poem as a single training example
with open('./misc/data/kipling_if.txt', 'r', encoding='utf-8') as file:
    poem = file.read()

data_dict = {"text": [poem]}
dataset = Dataset.from_dict(data_dict)
data = DatasetDict()
data["train"] = dataset

# Tokenize the text; this adds input_ids and attention_mask columns
data = data.map(lambda x: tokenizer(x["text"]))

# Print the training data text
print(data["train"][0]["text"])

Perform the finetuning
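The trainer below uses DataCollatorForLanguageModeling with mlm=False, which turns the lists of token IDs into batches for causal language modeling: the labels are a copy of input_ids, and positions holding the pad token are set to -100 so the loss ignores them (the model shifts the labels by one position internally, so each position is trained to predict the next token). The sketch below reproduces that idea in plain Python, assuming right padding and a hypothetical pad ID of 0; the real collator returns PyTorch tensors and pads on whichever side the tokenizer is configured for.

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss
PAD_ID = 0           # hypothetical pad token ID


def collate_for_causal_lm(examples, pad_id=PAD_ID):
    """Right-pad token-ID lists and build labels for causal LM training."""
    max_len = max(len(ex) for ex in examples)
    input_ids = [ex + [pad_id] * (max_len - len(ex)) for ex in examples]
    # Labels copy input_ids, with pad positions masked out of the loss
    labels = [[IGNORE_INDEX if tok == pad_id else tok for tok in row]
              for row in input_ids]
    return {"input_ids": input_ids, "labels": labels}


batch = collate_for_causal_lm([[1, 415, 368, 541], [1, 415, 368]])
print(batch["input_ids"])
print(batch["labels"])
```

With per_device_train_batch_size=1 and a single example, as in this notebook, no padding occurs and the labels are simply the poem's token IDs.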

In [None]:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

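trainer = Trainer(
    model=model,
    # A plain list of token-ID lists; the collator builds tensors and labels from it
    train_dataset=data["train"]["input_ids"],
    args=TrainingArguments(
        per_device_train_batch_size=1,
        warmup_steps=2,               # linear warmup over the first 2 optimizer steps
        learning_rate=1e-4,
        fp16=True,                    # mixed-precision training
        logging_steps=1,              # log the loss after every step
        output_dir="outputs",
        optim="adamw_torch",
        num_train_epochs=5,           # one example x 5 epochs = 5 optimizer steps
        include_tokens_per_second=True
    ),
    # mlm=False selects causal language modeling: labels are a copy of input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)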

# Print the training tokens
print(trainer.train_dataset)

trainer.train()
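With one training example, a batch size of 1 and 5 epochs, the trainer runs 5 optimizer steps. Assuming the default linear scheduler (lr_scheduler_type="linear" in TrainingArguments), the learning rate ramps up over warmup_steps=2 and then decays linearly to zero. The sketch below evaluates that schedule's multiplier, following the shape of the formula in transformers' get_linear_schedule_with_warmup, to show the rate associated with each step index.

```python
import math

LEARNING_RATE = 1e-4
WARMUP_STEPS = 2
NUM_EXAMPLES = 1
BATCH_SIZE = 1
EPOCHS = 5

# No gradient accumulation: one optimizer step per batch.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_warmup_multiplier(step, warmup, total):
    """Linear warmup from 0, then linear decay to 0."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


for step in range(total_steps):
    lr = LEARNING_RATE * linear_warmup_multiplier(step, WARMUP_STEPS, total_steps)
    print(f"step {step}: lr = {lr:.2e}")
```

With only five steps, the two warmup steps make up a large share of training, which is worth keeping in mind when changing num_train_epochs or warmup_steps.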

Save the trained LoRA adapter to disk (only the adapter weights and configuration are written, not the base model)

In [None]:
save_dir = "saved_models"
trainer.save_model(save_dir)


In [None]:
!ls -l saved_models

Reload the base model and apply the saved LoRA adapter

In [None]:
model_base = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [None]:
from peft import PeftModel

# Attach the saved LoRA adapter to the freshly loaded base model
model_with_lora = PeftModel.from_pretrained(model_base, save_dir)

Generate text from the same prompt with the finetuned model. After training on the poem, the continuation should now (hopefully) match the original text.

In [None]:
inputs = tokenizer(sample_text, return_tensors="pt").to(0)
out = model_with_lora.generate(**inputs, max_new_tokens=62, do_sample=False)

print(tokenizer.decode(out[0], skip_special_tokens=True))