# How to FineTune Llama 3 with  SFTTrainer and  Unsloth
Hello everyone, today we are going to show how we can Fine Tune Llama 3 with SFTTrainer and Unsloth
First we are going to perform a simmple Fine Tunning by using SFTTrainer


## Step 1 - Installation of Pytorch
The first step is install pythorch v 2.2.1 with Cuda 12.1 

In [1]:
!pip install --upgrade pip
!pip install -q -U git+https://github.com/huggingface/transformers.git --quiet
!pip install trl wandb --quiet
!pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"  --quiet
!pip install --no-deps xformers trl peft accelerate bitsandbytes --quiet
!pip install  --upgrade --quiet \
  "datasets>=2.21.0" \
  "evaluate==0.4.1" \
  "pillow" \
  "hyperopt" \
  "optuna" \
  "protobuf>=4.21.1"

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
unsloth-zoo 2024.11.8 requires protobuf<4.0.0, but you have protobuf 5.28.3 which is incompatible.
kfp 2.9.0 requires protobuf<5,>=4.21.1, but you have protobuf 5.28.3 which is incompatible.
kfp-kubernetes 1.3.0 requires protobuf<5,>=4.21.1, but you have protobuf 5.28.3 which is incompatible.
kfp-pipeline-spec 0.4.0 requires protobuf<5,>=4.21.1, but you have protobuf 5.28.3 which is incompatible.[0m[31m
[0m

## Step 3 - Installation of Uslotch packages

## Step 4 - Analysis of our infrastructure
In ordering to perform any training it is important to know our system in order to take the full advantage of the system.

## Step 5 Login to Hugging Face

In [2]:
from huggingface_hub import login
login(
  token="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", # ADD YOUR TOKEN HERE
)

**What is SFTTrainer?**

`SFTTrainer` is a class from the `trl` library that implements the SFT algorithm. It is a specialized trainer class that is designed to work with the SFT method. The `SFTTrainer` class takes in a pre-trained model, a dataset, and a set of hyperparameters, and fine-tunes the model using the SFT algorithm.

**What is the difference between SFTTrainer and Trainer?**

The main difference between `SFTTrainer` and the `Trainer` class from the `transformers` library is the fine-tuning algorithm used. The `Trainer` class uses the standard fine-tuning algorithm, where all the model's weights are updated during training. In contrast, the `SFTTrainer` class uses the SFT algorithm, which only updates a small subset of the model's weights. This makes `SFTTrainer` more efficiend suitable for large language models.

**Key differences between SFTTrainer and Trainer**

Here is a table summarizing the key differences between `SFTTrainer` and `Trainer`:

|  | SFTTrainer | Trainer |
| --- | --- | --- |
| Fine-tuning algorithm | Sparse Fine-Tuning (SFT) | Standard fine-tuning |
| Weights updated | Only a small subset of weights | All weights |
| Efficiency | More efficient for large models | Less efficient for large models |
| Suitable for | Large language models | Small to medium-sized models |
| Library | `trl` library | `transformers` library |

Features

| Feature | SFTTrainer | Trainer |
| --- | --- | --- |
| Complexity | Simple, lightweight | More comprehensive, feature-rich |
| Customization | Limited options | Advanced customization options |
| Ease of use | Easy to use, minimal code | More code required, steeper learning curve |
| Integration | Standalone trainer | Part of Hugging Face Transformers library |
| Use cases | Quick fine-tuning, prototyping | Large-scale training, complex models |

Note that the `SFTTrainer` class is specifically designed for sparse fine-tuning, while the `Trainer` class is a more general-purpose trainer class that can be used for a variety of fine-tuning tasks.

##  How to FineTune with SFTTrainer

First let us show the simplest method that is given by  SFTTrainer

In [None]:
from datasets import load_dataset
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AutoTokenizer
from peft import LoraConfig
from trl import SFTTrainer
from transformers import TrainingArguments
# Load the dataset
dataset_name = "ruslanmv/ai-medical-dataset"
dataset = load_dataset(dataset_name, split="train")
# Select the first 1000 rows of the dataset
dataset = dataset.select(range(100))
# Device map
device_map = 'auto'  # for PP and running with `python test_sft.py`
# Load the model + tokenizer
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    use_cache=False,
    device_map=device_map
)
# PEFT config
lora_alpha = 16
lora_dropout = 0.1
lora_r = 32  # 64
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
    modules_to_save=["embed_tokens", "input_layernorm", "post_attention_layernorm", "norm"],
)
# Args
max_seq_length = 512
output_dir = "./results"
per_device_train_batch_size = 2  # reduced batch size to avoid OOM
gradient_accumulation_steps = 2
optim = "adamw_torch"
save_steps = 10
logging_steps = 1
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 1  
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    gradient_checkpointing=True,  # gradient checkpointing
    #report_to="wandb",
)
# Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="context",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

# Train :)
trainer.train()


# Save model local
trainer.save_model("ai-medical-model")
tokenizer.save_pretrained("ai-medical-model")

Resolving data files:   0%|          | 0/18 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/18 [00:00<?, ?it/s]

Loading dataset shards:   0%|          | 0/18 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mliuxiangwin[0m ([33mliuxiangwin-free[0m). Use [1m`wandb login --relogin`[0m to force relogin


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


## Inference 

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
max_seq_length = 2048
dtype = None
load_in_4bit = True
fine_tuned_model = AutoModelForCausalLM.from_pretrained("ai-medical-model", load_in_4bit=load_in_4bit)
tokenizer = AutoTokenizer.from_pretrained("ai-medical-model")
# Prepare the model for inference
fine_tuned_model.eval()




In [None]:
question="This is the question: What was the main cause of the inflammatory CD4+ T cells?"
prompt=f"<|start_header_id|>system<|end_header_id|> You are a Medical AI chatbot assistant .<|eot_id|><|start_header_id|> User: <|end_header_id|>{question}<|eot_id|>"
# Tokenizing the input and generating the output
#prompt = f"{question}"
# Tokenizing the input and generating the output
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = fine_tuned_model.generate(**inputs, max_new_tokens=256, use_cache=True)
answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

In [None]:
answer

In [None]:
# Split the text at the first double newline after "assistant\n\n"
split_text = answer.split("\n\n", 2)[1]
# Print the extracted answer
print(split_text.strip())

```txt

The main cause of inflammatory CD4+ T cells is a topic of ongoing research and debate in the medical field. However, based on current scientific understanding, it is believed that the main cause of inflammatory CD4+ T cells is the activation of these cells by specific antigens or pathogens, leading to an excessive or uncontrolled immune response.

```

* `base_model`: specifies the pre-trained model to use as the base model for fine-tuning.
* `finetuned_model`: specifies the finetuned model to use for fine-tuning.
* `finetuned_name`: specifies the name of the finetuned model.
* `max_seq_length`: specifies the maximum sequence length that the model can process.
* `dtype`: specifies the data type to use for the model's weights and activations. `None` means auto-detection, which will choose the most suitable data type based on the hardware.
* `load_in_4bit`: specifies whether to load the model i 4-bit precision, which can reduce memory usage and improve performance.


* `r`: specifies the number of LoRA layers to use.
* `target_modules`: specifies the modules to which LoRA should be applied.
* `lora_alpha`: specifies the alpha value for LoRA, which controls the strength of the LoRA layers.
* `lora_dropout`: specifies the dropout value for LoRA, which controls the random dropping of neurons during training.
* `bias`: specifies the bias for LoRA, which can be set to "none" or a specific value.
* `use_gradient_checkpointing`: specifies whether to use gradient checkpointing, which can reduce memory usage during training.
* `use_rslora` and `use_dora`: specify whether to use RSLora and DoRa, respectively, which are variants of LoRA.
* `loftq_config`: specifies the LoFTQ configuration, which is not used in this example.


**Training Configuration**

* `per_device_train_batch_size`: specifies the batch size to use for training.
* `gradient_accumulation_steps`: specifies the number of steps to accumulate gradients before updating the model.
* `warmup_steps`: specifies the number of warmup steps to perform before starting training.
* `max_steps`: specifies the maximum number of steps to train for. If set to 0, the model will train for the specified number of epochs.
* `num_train_epochs`: specifies the number of epochs to train for.
* `learning_rate`: specifies the initial learning rate to use for training.
* `fp16` and `bf16`: specify whether to use 16-bit floating-point precision (fp16) or 16-bit bfloat precision (bf16) for training.
* `logging_steps`: specifies the number of steps to log training metrics.
* `optim`: specifies the optimizer to use for training.
* `weight_decay`: specifies the weight decay rate to use for regularization.
* `lr_scheduler_type`: specifies the learning rate scheduler to use.
* `seed`: specifies the random seed to use for training.
* `output_dir`: specifies the output directory to save training results.

**Hugging Face Username**

**`training_dataset`**: This is the top-level key for the dataset configuration.

**`name`**: This specifies the name of the dataset. In this case, it's `ruslanmv/ai-medical-dataset`, which is a dataset hosted on the Hugging Face Hub. The format is `username/dataset_name`.

**`split`**: This specifies the split of the dataset to use for training. In this case, it's set to `"train"`, which means the model will be trained on the training split of the dataset.

**`input_fields`**: This specifies the input fields of the dataset that will be used for trainine, it's a list containing two fields: `"question"` and `"context"`. These fields are likely to be the input features of the dataset.

**`input_field`**: This specifies the primary input field of the dataset. In this case, it's set to `"text"`. This field is likely to be the text input that the model will process.

Here's an example of what this dataset might look like:

| question | context | text |
| --- | --- | --- |
| How does COVID-19 spread? | COVID-19 is a respiratory disease... | The COVID-19 is.. |
| ... | ... | ... |

To save the final model as LoRA adapters, either use Huggingface's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!