- **transformers**: A library by Hugging Face for state-of-the-art natural language processing (NLP) models.
- **datasets**: To load and process datasets.
- **bitsandbytes**: Helps with memory-efficient quantization of large models.
- **accelerate**: For distributed training and faster model inference.
- **peft**: For Parameter-Efficient Fine-Tuning (PEFT) using methods like LoRA (Low-Rank Adaptation).
- **trl**: Short for "Transformer Reinforcement Learning"; provides trainers for fine-tuning language models with supervised and reinforcement-learning methods.
- **py7zr**: A library for handling `.7z` archive files.




In [1]:
## Install necessary libraries
!pip install transformers datasets bitsandbytes accelerate peft trl py7zr

Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting peft
  Downloading peft-0.13.2-py3-none-any.whl.metadata (13 kB)
Collecting trl
  Downloading trl-0.11.4-py3-none-any.whl.metadata (12 kB)
Collecting py7zr
  Downloading py7zr-0.22.0-py3-none-any.whl.metadata (16 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
Collecting tyro>=0.5.11 (from trl)
  Downloading tyro-0.8.12-py3-none-any.whl.metadata (8.4 kB)
Collecting texttable (from py7zr)
  Downloading texttable-1.7.0-py2.py3-none-any.whl.metadata (9.8 kB)
Collecting pycryptodomex>

- **torch**: PyTorch library for deep learning tasks.
- **load_dataset**: Loads datasets, here used for fetching the samsum dataset.
- **AutoTokenizer** & **AutoModelForCausalLM**: These load the pre-trained tokenizer and model for causal language modeling.
- **BitsAndBytesConfig**: Configures the quantization (memory-efficient) aspects of the model.
- **prepare_model_for_kbit_training**, **LoraConfig** & **get_peft_model**: Prepare the quantized model for k-bit fine-tuning, define the LoRA settings, and apply them to the model for efficient training.
- **TrainingArguments** & **Trainer**: Used to define training parameters and handle the training loop.


In [2]:
## Import libraries
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from transformers import TrainingArguments, Trainer

- **load_dataset("samsum")**: Downloads the SAMSum dialogue-summarization dataset from the Hugging Face Hub and loads its train, test, and validation splits.


In [3]:
## Load the dataset
dataset = load_dataset("samsum")
print(f"Dataset loaded: {dataset}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


samsum.py:   0%|          | 0.00/3.36k [00:00<?, ?B/s]

README.md:   0%|          | 0.00/7.04k [00:00<?, ?B/s]

The repository for samsum contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/samsum.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


corpus.7z:   0%|          | 0.00/2.94M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/14732 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/819 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/818 [00:00<?, ? examples/s]

Dataset loaded: DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})


## Authentication  

In [4]:
from huggingface_hub import login
login()


- **tokenizer**: Converts text into a format suitable for the model. Here, it's loaded from the pretrained "Gemma-2B" model.
- **tokenizer.pad_token**: Sets the padding token to be the end-of-sequence (EOS) token.


In [5]:
## Load the Gemma tokenizer
model_name = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

tokenizer_config.json:   0%|          | 0.00/33.6k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

- **load_in_4bit=True**: Loads the model in a more memory-efficient 4-bit format.
- **bnb_4bit_use_double_quant=True**: Uses double quantization to further optimize memory usage.
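- **bnb_4bit_quant_type="nf4"**: Stores the weights in the 4-bit NormalFloat (NF4) data type, which is designed for normally distributed weights and usually preserves accuracy better than a plain 4-bit float format.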
- **bnb_4bit_compute_dtype=torch.bfloat16**: Sets the computation to use BFloat16 for efficient mixed precision.
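
As a rough illustration of why these settings matter, the sketch below estimates how much memory the weights alone would need at different precisions. The parameter count of 2.5 billion is an assumed approximation for Gemma-2B, and the block sizes (64 weights per block, 256 constants per second-level block) follow the description in the QLoRA paper. The estimate ignores layers that are left unquantized, as well as activations, gradients, and optimizer state, so treat it as a back-of-the-envelope figure only.

```python
def bits_per_param(double_quant: bool, block_size: int = 64, second_block: int = 256) -> float:
    """Approximate storage cost per weight for 4-bit blockwise quantization."""
    weight_bits = 4.0
    if double_quant:
        # 8-bit scaling constant per block, plus one 32-bit constant
        # shared by each group of `second_block` blocks.
        constant_bits = 8 / block_size + 32 / (block_size * second_block)
    else:
        # One 32-bit scaling constant per block of weights.
        constant_bits = 32 / block_size
    return weight_bits + constant_bits


def weight_gib(num_params: float, bits: float) -> float:
    """Convert a parameter count and a per-weight bit cost to GiB."""
    return num_params * bits / 8 / 1024**3


NUM_PARAMS = 2.5e9  # assumption: approximate size of Gemma-2B

for label, bits in [
    ("bfloat16", 16.0),
    ("4-bit, single quantization", bits_per_param(double_quant=False)),
    ("4-bit, double quantization", bits_per_param(double_quant=True)),
]:
    print(f"{label:28s} {bits:6.3f} bits/param  {weight_gib(NUM_PARAMS, bits):5.2f} GiB")
```

Double quantization saves only a fraction of a bit per weight, but across billions of weights that still adds up to a noticeable amount of memory.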


In [6]:
# Configure quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

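- **device_map="auto"** (set when loading the model above): Automatically places the model layers on the available devices (GPU first, then CPU).
- **prepare_model_for_kbit_training**: Readies the quantized model for training by freezing the base weights and enabling gradient checkpointing to reduce memory use.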


In [7]:
## Prepare the model for k-bit training
model = prepare_model_for_kbit_training(model)

- **r=16**: Sets the rank of the low-rank matrix in LoRA.
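- **lora_alpha=32**: The LoRA scaling factor. The low-rank update is multiplied by `lora_alpha / r`, which is 2 here.
- **lora_dropout=0.05**: Dropout probability applied to the input of the LoRA layers during training.
- **bias="none"**: Leaves all bias terms frozen.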
- **task_type="CAUSAL_LM"**: Specifies that the task is causal language modeling (predicting the next word).
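
To make the roles of `r` and `lora_alpha` concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. The helper names (`matvec`, `lora_forward`, `lora_param_count`) and the toy matrices are hypothetical and not part of the PEFT API, the 2048 x 2048 layer size is only an illustrative assumption, and dropout is left out for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute y = W x + (lora_alpha / r) * B (A x).

    W is the frozen weight (d_out x d_in); A (r x d_in) and B (d_out x r)
    are the small trainable matrices.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA pair (A and B)."""
    return r * (d_in + d_out)


# Toy layer: d_in=3, d_out=2, rank 1, with the same scale (2.0) as r=16, lora_alpha=32.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
B_init = [[0.0], [0.0]]      # B starts at zero, so the adapter is a no-op at first
B_trained = [[1.0], [2.0]]   # made-up values standing in for a trained adapter
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, r=1, lora_alpha=2))
print(lora_forward(W, A, B_trained, x, r=1, lora_alpha=2))

# Trainable parameters for one hypothetical 2048 x 2048 projection at r=16.
full = 2048 * 2048
added = lora_param_count(2048, 2048, r=16)
print(f"full: {full:,}  LoRA: {added:,}  ratio: {added / full:.4f}")
```

Because `B` starts at zero, the wrapped model initially behaves exactly like the base model, and training only has to learn the low-rank correction.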


In [8]:
## Define LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)


- **get_peft_model**: Wraps the original model with the LoRA parameters for efficient training.


In [9]:
## Get the PEFT model
model = get_peft_model(model, peft_config)

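- **tokenize_function**: Builds one training sequence per example by concatenating the prompt, the dialogue, and the summary. A causal language model reads input and target as a single sequence, so the labels must line up position by position with the input IDs.
- **labels**: A copy of the input IDs in which the prompt and padding positions are set to -100 so they are ignored during loss computation. Only the summary tokens and the closing EOS token contribute to the loss.
- **dataset.map**: Applies the tokenization function to the entire dataset.


In [10]:
## Tokenize the dataset
def tokenize_function(examples):
    prompt = "Summarize the following conversation:\n\n"
    max_length = 512
    batch = {"input_ids": [], "attention_mask": [], "labels": []}

    for dialogue, summary in zip(examples["dialogue"], examples["summary"]):
        # Summary tokens plus EOS, so the model learns where the summary ends
        target = tokenizer(summary, add_special_tokens=False)["input_ids"][: max_length // 2 - 1]
        target = target + [tokenizer.eos_token_id]
        # Prompt and dialogue, truncated so the summary always fits
        source = tokenizer(prompt + dialogue)["input_ids"][: max_length - len(target)]

        input_ids = source + target
        num_pad = max_length - len(input_ids)
        batch["input_ids"].append(input_ids + [tokenizer.pad_token_id] * num_pad)
        batch["attention_mask"].append([1] * len(input_ids) + [0] * num_pad)
        # -100 on prompt and padding positions so only the summary contributes to the loss
        batch["labels"].append([-100] * len(source) + target + [-100] * num_pad)

    return batch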



tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)

Map:   0%|          | 0/14732 [00:00<?, ? examples/s]

Map:   0%|          | 0/819 [00:00<?, ? examples/s]

Map:   0%|          | 0/818 [00:00<?, ? examples/s]
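
The effect of the -100 label value used in the tokenization step can be illustrated with a small pure-Python cross-entropy. This is a simplified sketch: `masked_cross_entropy` is a hypothetical helper, the logits are made-up numbers, and the one-position label shift that causal language models apply internally is omitted.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(v) for v in row))
        total += log_norm - row[label]
        count += 1
    # Returning 0.0 when every position is masked is a simplification of this sketch.
    return total / count if count else 0.0


# Three positions over a vocabulary of four tokens (made-up logits).
logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 1.5, 0.3, 0.2],
    [0.7, 0.7, 0.7, 0.7],
]

loss_with_mask = masked_cross_entropy(logits, [0, 1, IGNORE_INDEX])
loss_first_two = masked_cross_entropy(logits[:2], [0, 1])
print(loss_with_mask, loss_first_two)
```

The two printed values agree, showing that a masked position has no influence on the loss regardless of what the model predicts there.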

- **output_dir**: Directory to save the results.
- **num_train_epochs=3**: Train for 3 epochs.
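- **per_device_train_batch_size** / **per_device_eval_batch_size**: Number of examples processed per device in each training and evaluation step.
- **warmup_steps**: Number of initial optimizer steps over which the learning rate ramps up from zero to its target value.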
- **evaluation_strategy="steps"**: Evaluates the model periodically.
- **gradient_accumulation_steps**: Accumulates gradients over multiple steps for memory efficiency.
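
The arithmetic behind these settings can be sketched as follows. `training_schedule` is a hypothetical helper that assumes a single device and the 14,732 training examples reported when the dataset was loaded. How a trailing partial accumulation window is counted can differ between library versions, so the step counts are approximate.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer step happens after every `grad_accum` batches.
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_examples=14732, per_device_batch=2, grad_accum=8, epochs=3
)
eval_points = list(range(500, total_steps + 1, 500))    # eval_steps=500
save_points = list(range(1000, total_steps + 1, 1000))  # save_steps=1000

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} in total")
print(f"warmup share of training: {500 / total_steps:.0%}")
print(f"evaluations at steps {eval_points}, checkpoints at steps {save_points}")
```

Under these assumptions the 500 warmup steps cover a sizeable share of the whole run, which is worth keeping in mind when changing the number of epochs or the batch size.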


In [11]:
## Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1000,
    gradient_accumulation_steps=8,
#     fp16=True,  # Enable mixed precision training
)



In [12]:
## Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],  # keep the test split for the final evaluation
)


In [None]:

## Train the model
trainer.train()

In [None]:
## Evaluate the model
evaluation = trainer.evaluate()
print(f"Evaluation results: {evaluation}")

In [None]:
## Test the fine-tuned model
test_input = "Summarize the following conversation:\n\nJohn: Hey, how's it going?\nSarah: Not bad, just finished a big project at work.\nJohn: That's great! Want to grab dinner to celebrate?\nSarah: Sure, that sounds fun. Where should we go?\nJohn: How about that new Italian place downtown?\nSarah: Perfect, I've been wanting to try it. See you at 7?"

inputs = tokenizer(test_input, return_tensors="pt").to(model.device)  # follow the model's device instead of hard-coding "cuda"
outputs = model.generate(**inputs, max_new_tokens=100)
print("Generated summary:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


In [None]:
## Save the fine-tuned model (for a PEFT model this writes only the LoRA adapter weights, not the base model)
model.save_pretrained("./fine_tuned_gemma_qlora")
tokenizer.save_pretrained("./fine_tuned_gemma_qlora")