

Articles for further explanation:

https://medium.com/@kshitiz.sahay26/fine-tuning-llama-2-for-news-category-prediction-a-step-by-step-comprehensive-guide-to-fine-tuning-48c06dee28a9

https://medium.com/@jain.sm/finetuning-llama-2-0-on-colab-with-1-gpu-7ea73a8d3db9


### Installing Required Libraries

First, we will install some required libraries.

`transformers`: for loading a large language model and fine-tuning it.

`bitsandbytes`: for loading the model in 4-bit precision.

`accelerate`: for training models and performing inference at scale.

`peft`: for fine-tuning a small number of parameters.

`trl`: for training transformer language models using Reinforcement Learning.

In [None]:
!huggingface-cli login
# Paste your own Hugging Face access token when prompted.
# Never leave a real token in a notebook cell.


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.ca

In [None]:
!pip install -q accelerate==0.21.0 --progress-bar off
!pip install -q peft==0.4.0 --progress-bar off
!pip install -q bitsandbytes==0.40.2 --progress-bar off
!pip install -q transformers==4.31.0 --progress-bar off
!pip install -q trl==0.4.7 --progress-bar off

In [None]:
import os
from random import randrange
from functools import partial
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          BitsAndBytesConfig,
                          HfArgumentParser,
                          Trainer,
                          TrainingArguments,
                          DataCollatorForLanguageModeling,
                          EarlyStoppingCallback,
                          pipeline,
                          logging,
                          set_seed)

import bitsandbytes as bnb
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, PeftModel, AutoPeftModelForCausalLM
from trl import SFTTrainer
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Hugging Face Hub Login

Meta's family of Llama 2 models is gated. You will require approval to access it using the Hugging Face Hub.

Below are the steps to request permission for the Llama-2-7B model:
1. Get approval from Hugging Face (https://huggingface.co/meta-llama/Llama-2-7b-hf).
2. Get approval from Meta (https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
3. Create a READ access token on Hugging Face (https://huggingface.co/settings/tokens).
4. Execute `!huggingface-cli login` in Google Colab Notebook, enter the token, and enter "Y."

Note: Make sure your email address on your Hugging Face account is the same as the one you enter on Meta's website for approval.

### Creating Bitsandbytes Configuration

Before loading the model, we will define a function `create_bnb_config` to define the `bitsandbytes` configuration. The `bitsandbytes` library allows model quantization. Quantization is a technique used to compress deep learning models by reducing the number of bits used to represent their weights and activations. This compression allows for faster inference and reduced memory consumption, making it possible to deploy these models on edge devices with limited resources.

By using 4-bit transformer language models, we can achieve impressive results while significantly reducing memory and computational requirements.
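
To make the idea of quantization concrete, here is a minimal pure-Python sketch of symmetric "absmax" quantization to signed 4-bit integers. It is a simplified illustration only: `bitsandbytes` uses block-wise FP4/NF4 data types rather than this scheme, and the function names `quantize_absmax_4bit` and `dequantize` are hypothetical.

```python
def quantize_absmax_4bit(weights):
    """Map floats to signed 4-bit integer codes in [-8, 7] using one shared scale."""
    # The largest magnitude is mapped to code 7; fall back to 1.0 for an all-zero input.
    scale = (max(abs(w) for w in weights) / 7) or 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.91, -0.07, 0.30]
codes, scale = quantize_absmax_4bit(weights)
restored = dequantize(codes, scale)

for w, c, r in zip(weights, codes, restored):
    print(f"{w:+.2f} -> code {c:+d} -> {r:+.3f}")
```

Each weight is stored as a 4-bit code plus one shared scale, and the rounding error per weight is at most half the scale. That error is the price paid for the smaller memory footprint.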

Hugging Face Transformers (`transformers`) is closely integrated with `bitsandbytes`. The `BitsAndBytesConfig` class from the `transformers` library allows configuring the model quantization method.

Parameters:

`load_in_4bit`: Load the model weights in 4-bit precision, which cuts weight memory to roughly a quarter of what 16-bit weights need.

`bnb_4bit_use_double_quant`: Use nested quantization techniques for more memory-efficient inference at no additional cost.

`bnb_4bit_quant_type`: Set quantization data type. The options are either FP4 (4-bit precision), which is the default quantization data type, or NF4 (Normal Float 4), a new 4-bit data type adapted for weights that have been initialized using a normal distribution.

`bnb_4bit_compute_dtype`: Set the computational data type for 4-bit models. Default value: torch.float32
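
As a rough back-of-the-envelope check on the memory savings, the sketch below estimates the memory needed just to hold the weights of a nominal 7-billion-parameter model at different precisions. It ignores the quantization constants, activations, gradients, and optimizer state, so real usage is higher. The helper `weight_memory_gib` is a hypothetical name.

```python
N_PARAMS = 7_000_000_000  # nominal size of a "7B" model; the exact count differs slightly


def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("int8", 8), ("4-bit (fp4/nf4)", 4)]:
    print(f"{label:>18}: {weight_memory_gib(N_PARAMS, bits):6.2f} GiB")
```

Under these assumptions, 4-bit weights take about a quarter of the memory of 16-bit weights, which is what lets a 7B model fit on a single Colab GPU.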

In [None]:
bnb_config_parameters = {
    "load_in_4bit" : True, # Activate 4-bit precision base model loading
    "bnb_4bit_use_double_quant" : True, # Activate nested quantization for 4-bit base models (double quantization)
    "bnb_4bit_quant_type" : "nf4", # Quantization type (fp4 or nf4)
    "bnb_4bit_compute_dtype" : torch.bfloat16 # Compute data type for 4-bit base models
}

def create_bnb_config(bnb_config_parameters):
    """ Configures model quantization method using bitsandbytes to speed up training and inference
    :param bnb_config_parameters: Dictionary with the following keys
        load_in_4bit: Load model in 4-bit precision mode
        bnb_4bit_use_double_quant: Nested quantization for 4-bit model
        bnb_4bit_quant_type: Quantization data type for 4-bit model
        bnb_4bit_compute_dtype: Computation data type for 4-bit model """

    bnb_config = BitsAndBytesConfig(
        load_in_4bit = bnb_config_parameters['load_in_4bit'],
        bnb_4bit_use_double_quant = bnb_config_parameters['bnb_4bit_use_double_quant'],
        bnb_4bit_quant_type = bnb_config_parameters['bnb_4bit_quant_type'],
        bnb_4bit_compute_dtype = bnb_config_parameters['bnb_4bit_compute_dtype'],
    )

    return bnb_config

### Loading Hugging Face Model and Tokenizer

We will now define a function `load_model` that accepts the model name (`model_name`) from Hugging Face Hub and the `bitsandbytes` configuration for model quantization.

In this function, we will perform the following steps:
 1. Get the number of GPUs available.
 2. Set the maximum GPU memory.
 3. Use the `from_pretrained` method from the `AutoModelForCausalLM` class to load a pre-trained Hugging Face model in 4-bit precision using the model name and the quantization configuration.
 4. Set which device to send the model to using `device_map`. Passing `device_map = 0` means putting the whole model on GPU 0. Other inputs could be `cpu`, `cuda:1`, etc. Setting `device_map = "auto"` will let `accelerate` compute the most optimized `device_map` automatically.
 5. Set `max_memory`, a dictionary mapping each device identifier to its maximum memory. If unset, it defaults to the maximum memory available for each GPU and the available CPU RAM.
 6. Load the model tokenizer from the model name on Hugging Face.
 7. Set a padding token so that shorter sequences can be padded to the length of the longest sequence in a batch. In this case, we will set the EOS (End of Sequence) token as the padding token.

 **Important Note:  A tokenizer for a model will preprocess and tokenize (convert letters/words/sub-words to tokens or numbers) the input in a way that the model expects. Model tokenizers are also responsible for correctly applying special tokens and certain special embeddings or positional encoders specific to a model in the input.**
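
To illustrate what the padding token is for, here is a small pure-Python sketch that pads a batch of token-id lists to a common length and builds the matching attention masks. The EOS id of 2 and right-side padding are assumptions made for the illustration, and `pad_batch` is a hypothetical helper, not part of `transformers`.

```python
EOS_ID = 2  # assumed EOS token id, reused here as the padding id


def pad_batch(sequences, pad_id):
    """Right-pad token-id lists to the longest one and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


batch = [[1, 13866, 338], [1, 4290, 29901, 13, 29890]]
input_ids, attention_mask = pad_batch(batch, EOS_ID)
print(input_ids)
print(attention_mask)
```

The attention mask is what tells the model to ignore the padded positions, so reusing the EOS id as padding does not change what the model attends to.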

In [None]:
def load_model(model_name, bnb_config):
    """ Loads model and model tokenizer
    :param model_name: Hugging Face model name
    :param bnb_config: Bitsandbytes configuration """

    # Get number of GPU device and set maximum memory
    n_gpus = torch.cuda.device_count()
    max_memory = f'{40960}MB'

    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config = bnb_config,
        device_map = "auto", # dispatch the model efficiently on the available resources
        max_memory = {i: max_memory for i in range(n_gpus)},
    )

    # Load model tokenizer with the user authentication token
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token = True)

    # Set padding token as EOS token
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer

### Creating PEFT Configuration


Fine-tuning pretrained LLMs on downstream datasets results in huge performance gains when compared to using the pretrained LLMs out-of-the-box. However, as models get larger and larger, full fine-tuning becomes infeasible to train on consumer hardware. In addition, storing and deploying fine-tuned models independently for each downstream task becomes very expensive, because fine-tuned models are the same size as the original pretrained model. Parameter-Efficient Fine-tuning (PEFT) approaches are meant to address both problems!


PEFT approaches only fine-tune a small number of (extra) model parameters while freezing most parameters of the pretrained LLMs, thereby greatly decreasing the computational and storage costs. It also helps in portability, wherein users can tune models using PEFT methods to get tiny checkpoints worth a few MB compared to the large checkpoints of full fine-tuning.


**In short, PEFT approaches enable you to get performance comparable to full fine-tuning while only having a small number of trainable parameters.**


Hugging Face provides the PEFT library, which provides the latest Parameter-Efficient Fine-tuning techniques seamlessly integrated with Hugging Face Transformers and Hugging Face Accelerate.


There are several PEFT methods. In the next cell, we will use QLoRA, a recent method that substantially reduces the memory usage of LLM fine-tuning while, according to its authors, preserving the performance of 16-bit fine-tuning. It is configured through the `LoraConfig` class from the `peft` library.


QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen, and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During finetuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters. The LoRA layers are the only parameters being updated during training.
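
The number of parameters that LoRA adds can be worked out by hand. For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` adapter adds two small matrices with `r * (in_features + out_features)` parameters in total. The sketch below applies this to the Llama-2-7B dimensions (hidden size 4096, MLP size 11008, 32 decoder layers), assuming adapters on all seven linear projections of every layer, which is what the training log in this notebook reports. The helper `lora_params` is a hypothetical name.

```python
HIDDEN = 4096         # hidden size of Llama-2-7B
INTERMEDIATE = 11008  # MLP intermediate size
N_LAYERS = 32         # number of decoder layers
R = 16                # LoRA rank, as in lora_config_parameters
LORA_ALPHA = 64       # LoRA scaling numerator, as in lora_config_parameters

# (in_features, out_features) of each adapted projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Adapter A is (r x in_features) and adapter B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in PROJECTIONS.values())
total = per_layer * N_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Adapter scaling (lora_alpha / r): {LORA_ALPHA / R}")
```

The total of 39,976,960 agrees with the trainable-parameter count printed when fine-tuning starts, a small fraction of the roughly 6.7 billion parameters in the base model.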

In [None]:
lora_config_parameters = {
    'r' : 16,   # LoRA attention dimension
    'lora_alpha' : 64,  # Alpha parameter for LoRA scaling
    'lora_dropout' : 0.1,   # Dropout probability for LoRA layers
    'bias' : 'none',    # Bias
    'task_type' : 'CAUSAL_LM',  # Task type
}

def create_peft_config(lora_config_parameters, target_modules):
    """ Creates Parameter-Efficient Fine-Tuning configuration for the model
    :param lora_config_parameters: Dictionary with the following keys
        r: LoRA attention dimension
        lora_alpha: Alpha parameter for LoRA scaling
        lora_dropout: Dropout Probability for LoRA layers
        bias: Specifies if the bias parameters should be trained
        task_type: Task type
    :param target_modules: Names of the modules to apply LoRA to """
    config = LoraConfig(
        r = lora_config_parameters['r'],
        lora_alpha = lora_config_parameters['lora_alpha'],
        target_modules = target_modules,
        lora_dropout = lora_config_parameters['lora_dropout'],
        bias = lora_config_parameters['bias'],
        task_type = lora_config_parameters['task_type'],
    )

    return config

### Finding Modules for LoRA Application

In the next cell, we will define the `find_all_linear_names` function to find the modules to apply LoRA to. This function will go through `model.named_modules()`, keep the last component of the name of every 4-bit linear layer, and store these names in a set so that each one appears only once. The output layer `lm_head` is excluded.

In [None]:
def find_all_linear_names(model):
    """ Find modules to apply LoRA to.
    :param model: PEFT model """

    cls = bnb.nn.Linear4bit
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if 'lm_head' in lora_module_names:
        lora_module_names.remove('lm_head')
    print(f"LoRA module names: {list(lora_module_names)}")
    return list(lora_module_names)
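
The name handling in this function can be tried without loading a model. The sketch below uses a few hypothetical dotted module names in the style of the Llama architecture and assumes every one of them is a linear layer (the real function checks this with `isinstance`). `leaf_names` is an illustrative helper, not part of any library.

```python
# Hypothetical module names, assumed here to all be linear layers
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.1.self_attn.q_proj",
    "lm_head",
]


def leaf_names(names, exclude=("lm_head",)):
    """Keep the last component of each dotted name, deduplicated and sorted."""
    leaves = {name.split(".")[-1] for name in names}
    return sorted(leaves - set(exclude))


print(leaf_names(module_names))
```

Because only the last name component is kept, `q_proj` from layer 0 and layer 1 collapse into a single entry, which is the form `LoraConfig` expects for `target_modules`.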

### Calculating Trainable Parameters

We can use the `print_trainable_parameters` function to find out the number and percentage of trainable model parameters. This function will calculate the number of total parameters in `model.named_parameters()` and then those that would get updated.

In [None]:
def print_trainable_parameters(model, use_4bit = False):
    """ Prints the number of trainable parameters in the model.
    :param model: PEFT model """

    trainable_params = 0
    all_param = 0

    for _, param in model.named_parameters():
        num_params = param.numel()
        if num_params == 0 and hasattr(param, "ds_numel"):
            num_params = param.ds_numel
        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params

    if use_4bit:
        # Integer division keeps the count an int, which the ",d" format below requires
        trainable_params //= 2

    print(
        f"All Parameters: {all_param:,d} || Trainable Parameters: {trainable_params:,d} || Trainable Parameters %: {100 * trainable_params / all_param}"
    )
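
The counts reported for a 4-bit model can look surprising, because the total is about 3.5 billion rather than 7 billion. A plausible explanation, sketched below, is that two 4-bit values are packed into each stored element of a quantized layer, so `numel()` reports half the logical size for those layers. The sketch assumes the Llama-2-7B dimensions and that the embeddings, normalization weights, and `lm_head` stay unquantized. Under those assumptions it reproduces the figures in the training log.

```python
HIDDEN, INTERMEDIATE, N_LAYERS, VOCAB, R = 4096, 11008, 32, 32000, 16

# Logical parameters in the quantized linear layers (4 attention + 3 MLP projections per layer)
linear_per_layer = 4 * HIDDEN * HIDDEN + 3 * HIDDEN * INTERMEDIATE
quantized_logical = N_LAYERS * linear_per_layer
quantized_packed = quantized_logical // 2  # two 4-bit values per stored element

# Assumed unquantized parts: input embeddings, lm_head, and all RMSNorm weights
unquantized = 2 * VOCAB * HIDDEN + (2 * N_LAYERS + 1) * HIDDEN

# LoRA adapters of rank R on the same seven projections
lora = N_LAYERS * R * (4 * (HIDDEN + HIDDEN) + 3 * (HIDDEN + INTERMEDIATE))

all_param = quantized_packed + unquantized + lora
percent = 100 * lora / all_param
true_percent = 100 * lora / (quantized_logical + unquantized + lora)

print(f"All Parameters (as counted): {all_param:,}")
print(f"Trainable Parameters:        {lora:,}")
print(f"Trainable % (as counted):    {percent:.4f}")
print(f"Trainable % (logical size):  {true_percent:.4f}")
```

Measured against the logical size of the model, the adapters make up well under 1% of the parameters. The printed percentage is higher only because the quantized weights are counted in packed form.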

### Fine-tuning the Pre-trained Model

We will create `fine_tune`, our final function, to wrap everything we have done so far and initiate the fine-tuning process. This function will perform the following model preprocessing operations to prepare it for training:


1. Enable gradient checkpointing to reduce memory usage during fine-tuning.
2. Use the `prepare_model_for_kbit_training` function from PEFT to prepare the model for fine-tuning.
3. Call `find_all_linear_names` to get the module names to apply LoRA to.
4. Create LoRA configuration by calling the `create_peft_config` function.
5. Wrap the base Hugging Face model for fine-tuning to PEFT using the `get_peft_model` function.
6. Print the trainable parameters.


For training, we will instantiate a `Trainer()` object within the `fine_tune` function. This class requires the model, preprocessed dataset, and training arguments, listed below.


`per_device_train_batch_size`: The batch size per GPU/TPU/CPU for training.


`gradient_accumulation_steps`: Number of update steps to accumulate the gradients for, before performing a backward/update pass.


`warmup_steps`: Number of steps used for a linear warmup from 0 to `learning_rate`.


`max_steps`: If set to a positive number, the total number of training steps to perform.


`learning_rate`: The initial learning rate for Adam.


`fp16`: Whether to use 16-bit (mixed) precision training (through NVIDIA apex) instead of 32-bit training.


`logging_steps`: Number of update steps between two logs.


`output_dir`: The output directory where the model predictions and checkpoints will be written.


`optim`: The optimizer to use for training.


Next, we will call the `train` method on the `trainer` object to start training, then log and save the metrics on the training dataset. Finally, we will save the model checkpoint (the LoRA adapter weights and their configuration file) in the output directory and delete the model to free up memory. You can load the model for inference later using its saved checkpoint.
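
Two of these arguments interact in ways that are easy to check by hand: the effective batch size and the learning-rate schedule. The sketch below assumes a single GPU, the default linear schedule with warmup, and treats the value 20 as the total number of optimizer steps. `linear_schedule_lr` is a hypothetical helper that mirrors that schedule; it is not a `transformers` function.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


params = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "warmup_steps": 2,
    "max_steps": 20,
}
n_devices = 1  # assumption: a single Colab GPU

# Samples that contribute to each optimizer update
effective_batch = (
    params["per_device_train_batch_size"]
    * params["gradient_accumulation_steps"]
    * n_devices
)
print(f"Effective batch size: {effective_batch}")

for step in (0, 1, 2, 11, 20):
    lr = linear_schedule_lr(
        step, params["learning_rate"], params["warmup_steps"], params["max_steps"]
    )
    print(f"step {step:2d}: lr = {lr:.2e}")
```

Gradient accumulation gives the effect of a batch of 4 while only one sample is held in GPU memory at a time, which matters when sequences can be thousands of tokens long.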

In [None]:
def fine_tune(model, tokenizer, dataset_train, dataset_test,
              lora_config_parameters, training_parameters):
    """ Prepares and fine-tunes the pre-trained model.
    :param model: Pre-trained Hugging Face model
    :param tokenizer: Model tokenizer
    :param dataset_train: Preprocessed training dataset
    :param dataset_test: Preprocessed held-out dataset used for evaluation
    :param lora_config_parameters: Dictionary of LoRA settings
    :param training_parameters: Dictionary of training settings """

    # Enable gradient checkpointing to reduce memory usage during fine-tuning
    model.gradient_checkpointing_enable()

    # Prepare the model for training
    model = prepare_model_for_kbit_training(model)

    # Get LoRA module names
    target_modules = find_all_linear_names(model)

    # Create PEFT configuration for these modules and wrap the model to PEFT
    peft_config = create_peft_config(lora_config_parameters, target_modules)
    model = get_peft_model(model, peft_config)

    # Print information about the percentage of trainable parameters
    print_trainable_parameters(model)

    # Training parameters
    trainer = Trainer(
        model = model,
        train_dataset = dataset_train,
        eval_dataset = dataset_test, # held-out split, used as a "validation set"
                                    # to see how the model performs
        args = TrainingArguments(
            per_device_train_batch_size = training_parameters["per_device_train_batch_size"],
            gradient_accumulation_steps = training_parameters["gradient_accumulation_steps"],
            warmup_steps = training_parameters["warmup_steps"],
            max_steps = training_parameters["max_steps"],
            learning_rate = training_parameters["learning_rate"],
            fp16 = training_parameters["fp16"],
            logging_steps = training_parameters["logging_steps"],
            output_dir = training_parameters["output_dir"],
            optim = training_parameters["optim"],
        ),
        data_collator = DataCollatorForLanguageModeling(tokenizer, mlm = False)
    )

    model.config.use_cache = False

    do_train = True

    # Launch training and log metrics
    print("Training...")

    if do_train:
        train_result = trainer.train()
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        trainer.save_state()
        print(metrics)
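        print('-----')
        # Evaluate on the held-out split so the "eval" metrics are not a copy of the training metrics
        eval_metrics = trainer.evaluate()
        trainer.log_metrics("eval", eval_metrics)
        trainer.save_metrics("eval", eval_metrics)
        print(eval_metrics)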

    # Save model
    print("Saving last checkpoint of the model...")
    os.makedirs(training_parameters["output_dir"], exist_ok = True)
    trainer.model.save_pretrained(training_parameters["output_dir"])

    # Free memory for merging weights
    del model
    del trainer
    torch.cuda.empty_cache()

In [None]:
training_parameters = {
    "model_name" : "meta-llama/Llama-2-7b-hf", # The pre-trained model from the Hugging Face Hub to load and fine-tune
    "output_dir" : "./results", # Output directory where the model predictions and checkpoints will be stored
    "per_device_train_batch_size" : 1, # Batch size per GPU for training
    "gradient_accumulation_steps" : 4, # Number of update steps to accumulate the gradients for
    "learning_rate" : 2e-4, # Initial learning rate (AdamW optimizer)
    "optim" : "paged_adamw_32bit", # Optimizer to use
    "max_steps" : 20, # Number of training steps (overrides num_train_epochs)
    "warmup_steps" : 2, # Linear warmup steps from 0 to learning_rate
    "fp16" : True, # Enable fp16/bf16 training (set bf16 to True with an A100)
    "logging_steps" : 1, # Log every X updates steps
}


# Load model from Hugging Face Hub with model name and bitsandbytes configuration

bnb_config = create_bnb_config(bnb_config_parameters)
model, tokenizer = load_model(training_parameters['model_name'], bnb_config)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



### Loading Dataset

Now that we have loaded the Llama-2-7B model and its tokenizer, we will move on to loading our news classification instruction dataset as Hugging Face `Dataset` objects.

First, we will set the paths of the dataset files. In this case, we have two CSV files built from the BBC text dataset: a training file with 1,780 records, or prompts, and a test file with 445. Each file has an `instruction` column containing the instruction to categorize a news article into one of 5 categories, an `input` column containing the news article, and an `output` column containing the actual news category.

We will use the `load_dataset` function and pass the file location. We pass the generic dataset builder name `csv` because our dataset is a CSV file. You can similarly load a JSON file by passing `json` and the location of the JSON file. All the records of a file are assigned to the `train` split by default, which we retrieve using the `split` parameter.

In [None]:
# Loading dataset
root = "/content/drive/MyDrive/Colab Notebooks/torch/"

# The instruction dataset to use
dataset_name_train = root+"data/BBC-text/bbc-text-train.csv"
dataset_name_test = root+"data/BBC-text/bbc-text-test.csv"

# Load dataset
# The split parameter reflects how the datasets were created:
# both use "train" because we manually built one CSV for the training data and one for the test data
dataset_train = load_dataset("csv", data_files = dataset_name_train, split = "train")
dataset_test = load_dataset("csv", data_files = dataset_name_test, split = "train")

In [None]:
print(f'Number of prompts: {len(dataset_train)}')
print(f'Column names are: {dataset_train.column_names}')
print('-----')
print(f'Number of prompts: {len(dataset_test)}')
print(f'Column names are: {dataset_test.column_names}')

Number of prompts: 1780
Column names are: ['instruction', 'input', 'output']
-----
Number of prompts: 445
Column names are: ['instruction', 'input', 'output']


In [None]:
# The load_dataset function will convert the CSV file into a dictionary of prompts.
# We can look at a random prompt in the dataset using a random index.
dataset_train[randrange(len(dataset_train))]

{'instruction': 'Categorize the news article into one of the 5 categories:\n\nbusiness\npolitics\ntech\nsport\nentertainment\n\n',
 'input': 'o connor aims to grab opportunity johnny o connor is determined to make a big impression when he makes his rbs six nations debut for ireland against scotland on saturday.  the wasps flanker replaces denis leamy but o connor knows that the munster man will be pushing hard for a recall for the following game against england.  it s a  horses for courses  selection really   said o connor.  there s a lot of competition here and i can t just drag my heels around if i don t get picked.  it looks a definite head-to-head battle between himself and 23-year-old leamy - three stone heavier than o connor - for the number seven role against the world champions. nonetheless  all o connor is currently concerned about is making an impression while winning his third cap.   missing the italian game was disappointing certainly  but you can t dwell on these things - 

### Creating Prompt Template

After loading the instruction dataset, we will define the `create_prompt_formats` function to create a prompt template against each prompt in our dataset and save it in a new dictionary key `text` for further data preprocessing and fine-tuning.

In [None]:
def create_prompt_formats(sample):
    """ Creates a formatted prompt template for a prompt in the instruction dataset
    :param sample: Prompt or sample from the instruction dataset """

    # Initialize static strings for the prompt template
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruction:"
    INPUT_KEY = "Input:"
    RESPONSE_KEY = "### Response:"
    END_KEY = "### End"

    # Combine a prompt with the static strings
    blurb = f"{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}\n{sample['instruction']}"
    input_context = f"{INPUT_KEY}\n{sample['input']}" if sample["input"] else None
    response = f"{RESPONSE_KEY}\n{sample['output']}"
    end = f"{END_KEY}"

    # Create a list of prompt template elements
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    # Join prompt template elements into a single string to create the prompt template
    formatted_prompt = "\n\n".join(parts)

    # Store the formatted prompt template in a new key "text"
    sample["text"] = formatted_prompt

    return sample
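
To see what the final `text` field looks like, the sketch below renders the same template for two toy samples, one with an input and one without. `render_prompt` is a condensed, hypothetical stand-in for `create_prompt_formats` that takes plain strings instead of a dataset record, and the sample texts are made up.

```python
def render_prompt(instruction, input_text, output):
    """Build the prompt string; the input section is skipped when input_text is empty."""
    parts = [
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.",
        f"### Instruction:\n{instruction}",
        f"Input:\n{input_text}" if input_text else None,
        f"### Response:\n{output}",
        "### End",
    ]
    return "\n\n".join(part for part in parts if part)


with_input = render_prompt(
    "Categorize the news article.", "shares rally as markets recover", "business"
)
without_input = render_prompt("Name a news category.", "", "sport")

print(with_input)
print("=====")
print(without_input)
```

The `### Response:` and `### End` markers give the model a consistent place to write its answer and a consistent way to signal that the answer is finished.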

In [None]:
create_prompt_formats(dataset_train[randrange(len(dataset_train))])

{'instruction': 'Categorize the news article into one of the 5 categories:\n\nbusiness\npolitics\ntech\nsport\nentertainment\n\n',
 'input': 'how political squabbles snowball it s become commonplace to argue that blair and brown are like squabbling school kids and that they (and their supporters) need to grow up and stop bickering.  but this analysis in fact gets it wrong. it s not just children who fight - adults do too. and there are solid reasons why even a trivial argument between mature protagonists can be hard to stop once its got going. the key feature of an endless feud is that everyone can agree they d be better off if it ended - but everyone wants to have the last word.  each participant genuinely wants the row to stop  but thinks it worth prolonging the argument just a tiny bit to ensure their view is heard. their successive attempts to end the argument with their last word ensure the argument goes on and on and on. (in the case of mr blair and mr brown  successive books are

### Getting Maximum Sequence Length of the Pre-trained Model

In the next cell, we will define the `get_max_length` function to find out the maximum sequence length of the Llama-2-7B model. This function will pull the model configuration and attempt to find the maximum sequence length from one of the several configuration keys that may contain it. If the maximum sequence length is not found, it will default to 1024. We will use the maximum sequence length during dataset preprocessing to remove records that exceed that context length because the pre-trained model won't accept them.

In [None]:
def get_max_length(model):
    """ Extracts maximum token length from the model configuration
    :param model: Hugging Face model """

    # Pull model configuration
    conf = model.config
    # Initialize a "max_length" variable to store maximum sequence length as null
    max_length = None
    # Find maximum sequence length in the model configuration and save it in "max_length" if found
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    # Set "max_length" to 1024 (default value) if maximum sequence length is not found in the model configuration
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length

### Tokenizing Dataset Batch

The user-defined `preprocess_batch` function will tokenize a batch of the input dataset (`batch`) using the `tokenizer` object. The `max_length` parameter sets the maximum sequence length, and `truncation = True` truncates any input longer than that. No padding is applied at this stage; the `DataCollatorForLanguageModeling` passed to the `Trainer` pads each batch to its longest sequence during training. This function will be called in the `preprocess_dataset` function defined next.

In [None]:
def preprocess_batch(batch, tokenizer, max_length):
    """ Tokenizes dataset batch
    :param batch: Dataset batch
    :param tokenizer: Model tokenizer
    :param max_length: Maximum number of tokens to emit from the tokenizer """

    return tokenizer(
        batch["text"],
        max_length = max_length,
        truncation = True,
    )

### Preprocessing Dataset

To preprocess the complete dataset for fine-tuning, we will define the `preprocess_dataset` function, which will perform the following operations:

1. Create the formatted prompts against each prompt in the instruction dataset using the `create_prompt_formats` function.
2. Tokenize the dataset in batches using the `preprocess_batch` function and remove the original dictionary keys (instruction, input, output, and text).
3. Filter out prompts whose tokenized length reaches the maximum length, that is, prompts that were truncated or that exactly fill the context window.
4. Shuffle the dataset using a random seed.
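
The interaction between truncation and filtering is worth spelling out: after truncation no sequence is longer than the maximum, so a strict `< max_length` filter removes exactly the sequences that reached it. The pure-Python sketch below shows this with toy token lists and a tiny maximum length of 8. The names and values are hypothetical.

```python
MAX_LENGTH = 8  # toy value; the notebook uses the model's context length


def truncate(token_ids, max_length):
    """Mimic truncation by cutting a token list at max_length."""
    return token_ids[:max_length]


tokenized = {
    "short": list(range(5)),   # fits comfortably
    "exact": list(range(8)),   # exactly fills the window
    "long": list(range(12)),   # would be cut off mid-prompt
}

truncated = {name: truncate(ids, MAX_LENGTH) for name, ids in tokenized.items()}
kept = [name for name, ids in truncated.items() if len(ids) < MAX_LENGTH]

print({name: len(ids) for name, ids in truncated.items()})
print(kept)
```

Dropping truncated prompts matters here because the response comes at the end of the template: a truncated prompt would have lost its `### Response:` section. This is presumably why the preprocessed training set has slightly fewer rows than the raw one.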

In [None]:
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed: int, dataset):
    """ Tokenizes dataset for fine-tuning
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from the tokenizer
    :param seed (int): Random seed for reproducibility
    :param dataset (datasets.Dataset): Instruction dataset """

    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)

    # Apply preprocessing to each batch of the dataset and remove the "instruction", "input", "output", and "text" fields
    _preprocessing_function = partial(preprocess_batch, max_length = max_length, tokenizer = tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched = True,
        remove_columns = ["instruction", "input", "output", "text"],
    )

    # Filter out samples that have "input_ids" exceeding "max_length"
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)

    # Shuffle dataset
    dataset = dataset.shuffle(seed = seed)

    return dataset

In [None]:
# Random seed
seed = 33

max_length = get_max_length(model)
preprocessed_dataset_train = preprocess_dataset(tokenizer, max_length, seed, dataset_train)
preprocessed_dataset_test = preprocess_dataset(tokenizer, max_length, seed, dataset_test)

Found max lenth: 4096
Preprocessing dataset...
Preprocessing dataset...


In [None]:
print(preprocessed_dataset_train)
print(preprocessed_dataset_train[0])

Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 1778
})
{'input_ids': [1, 13866, 338, 385, 15278, 393, 16612, 263, 3414, 29889, 14350, 263, 2933, 393, 7128, 2486, 1614, 2167, 278, 2009, 29889, 13, 13, 2277, 29937, 2799, 4080, 29901, 13, 29907, 20440, 675, 278, 9763, 4274, 964, 697, 310, 278, 29871, 29945, 13997, 29901, 13, 13, 8262, 3335, 13, 20087, 1199, 13, 11345, 13, 29879, 637, 13, 296, 13946, 358, 13, 13, 13, 13, 4290, 29901, 13, 29890, 493, 20050, 411, 260, 513, 497, 8494, 15840, 398, 3737, 446, 260, 513, 497, 269, 10823, 756, 1370, 9571, 27683, 896, 505, 2745, 2446, 4723, 304, 11157, 1009, 8078, 5957, 304, 278, 3033, 1049, 767, 470, 12045, 19035, 1075, 304, 263, 17055, 4402, 29889, 29871, 652, 1129, 394, 492, 4083, 540, 756, 4520, 385, 5957, 363, 260, 513, 497, 607, 270, 4495, 5847, 27683, 269, 5376, 322, 393, 1023, 916, 17651, 864, 304, 5193, 29889, 29871, 3737, 446, 947, 451, 864, 304, 748, 964, 278, 4832, 19079, 15982, 292, 1048, 988, 540, 674, 367, 874

In [None]:
fine_tune(model, tokenizer, preprocessed_dataset_train, preprocessed_dataset_test, lora_config_parameters, training_parameters)
#fine_tune(model, tokenizer, dataset_train, lora_config_parameters, training_parameters)

LoRA module names: ['v_proj', 'k_proj', 'q_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
All Parameters: 3,540,389,888 || Trainable Parameters: 39,976,960 || Trainable Parameters %: 1.1291682911958425
Training...


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,2.0249
2,1.9982
3,1.9895
4,2.0948
5,1.9587
6,1.6828
7,1.7546
8,1.7941
9,1.5855
10,1.7447


OutOfMemoryError: ignored