<a href="https://colab.research.google.com/github/nazimboudeffa/hugging-face-trainer/blob/main/fine-tune-gpt-2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Why Should We Fine-Tune Pre-Trained Models?

2. How Does Fine-Tuning Happen Using LoRA and What is Q-LoRA?

3. How Can We Fine-Tune Pre-Trained Models Using Only Open-Source Tools?


[accelerate](https://huggingface.co/docs/accelerate/en/index): Hugging Face library that runs a raw PyTorch training script on any kind of device.

[transformers](https://huggingface.co/docs/transformers): Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models.


In [None]:
!pip install -q accelerate transformers datasets

[?25h[31mERROR: pip's dependency resolver 

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer
)

In [None]:
# The model that you want to train from the Hugging Face hub
model_name = "gpt2"

# The instruction dataset to use
dataset_name = "sigmund-freud-a-general-introduction-to-psychoanalysis"

# Fine-tuned model name
new_model = "gpt-2-sigmund-freud-psychoanalysis"


# QLoRA parameters

In [None]:
# QLoRA parameters

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1
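
As a rough illustration of what these values mean: LoRA keeps a weight matrix `W` frozen and learns a low-rank update, so the effective weight is `W + (lora_alpha / lora_r) * B @ A`, where `B` has shape `(d_out, r)` and `A` has shape `(r, d_in)`. The sketch below only counts parameters and computes the scaling factor. The 768 x 768 matrix size is an assumption (the hidden size of GPT-2 small), and the helper names are hypothetical.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Number of parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    return alpha / r


lora_r, lora_alpha = 64, 16   # values from the cell above
d_out = d_in = 768            # assumption: one square projection in GPT-2 small

full = d_out * d_in
low_rank = lora_trainable_params(d_out, d_in, lora_r)
scaling = lora_scaling(lora_alpha, lora_r)

print(f"full matrix: {full} params, LoRA factors: {low_rank} params")
print(f"scaling alpha/r = {scaling}")
```

With a rank this large relative to the matrix size, the saving per matrix is modest; a smaller `lora_r` trades capacity for fewer trainable parameters.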


# bitsandbytes parameters

In [None]:
# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False
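
To see why 4-bit loading matters, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes roughly 124 million parameters for GPT-2 small and ignores the extra quantization constants, activations, gradients, and optimizer state, so treat the numbers as a lower bound rather than a measurement. The helper name is hypothetical.

```python
def weight_memory_mib(n_params: int, bits_per_param: float) -> float:
    """Memory needed to store n_params weights at the given bit width, in MiB."""
    return n_params * bits_per_param / 8 / 2**20


n_params = 124_000_000  # assumption: approximate size of GPT-2 small

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_mib(n_params, bits):8.1f} MiB")
```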

# TrainingArguments parameters


In [None]:

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25
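
The combination of `warmup_ratio = 0.03` and `lr_scheduler_type = "cosine"` describes a learning rate that rises linearly from 0 to `learning_rate` and then decays along a half cosine. The sketch below reproduces that shape in plain Python. It is an illustration of the schedule, not the library's own implementation, and the total of 1000 steps is a made-up value.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 2e-4,
          warmup_ratio: float = 0.03) -> float:
    """Linear warmup to base_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical run length
for step in (0, 15, 30, 500, 1000):
    print(step, lr_at(step, total_steps))
```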


# SFT parameters

In [None]:
# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

In [None]:
# Load dataset (you can process it here)
dataset = load_dataset('text', data_files={'train': 'sigmund-freud-a-general-introduction-to-psychoanalysis.txt'})

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

Generating train split: 0 examples [00:00, ? examples/s]

In [None]:
# Set training parameters
training_arguments = TrainingArguments(
    output_dir="./results",  # Directory where checkpoints and logs are written
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    weight_decay=0.01,
    save_steps=10_000,  # Checkpoint frequency in steps (adjust to your needs)
    save_total_limit=2,  # Maximum number of checkpoints to keep (avoids filling the disk)
)


In [None]:
# Load the GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# GPT-2 has no pad token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

# Dataset preparation function (tokenization + label creation)
def preprocess_function(examples):
    # Tokenize the text, padding or truncating every line to 24 tokens
    inputs = tokenizer(examples['text'], truncation=True, padding='max_length', max_length=24)

    # Create the labels by copying the input_ids
    inputs['labels'] = inputs['input_ids'].copy()

    # Replace padding tokens with -100 in the labels so the loss ignores them
    inputs['labels'] = [[-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in inputs['labels']]

    return inputs

# Apply tokenization and preprocessing to the whole dataset
tokenized_datasets = dataset.map(preprocess_function, batched=True)

# Print one sample to check the result
print(tokenized_datasets['train'][0])



Map:   0%|          | 0/18592 [00:00<?, ? examples/s]

{'text': 'PREFACE ', 'input_ids': [47, 31688, 11598, 220, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'labels': [47, 31688, 11598, 220, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]}
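
Because the pad token is the same id as the end-of-sequence token (50256), masking labels by token id would also hide any genuine end-of-sequence token from the loss. One alternative, sketched below in plain Python, is to mask by `attention_mask` instead. The helper name is hypothetical, and the sample values are copied from the output above.

```python
def mask_padding_labels(input_ids, attention_mask, ignore_index=-100):
    """Copy input_ids into labels, replacing padded positions with ignore_index."""
    return [tok if keep == 1 else ignore_index
            for tok, keep in zip(input_ids, attention_mask)]


# Sample taken from the printed example: 4 real tokens followed by 20 pads
input_ids = [47, 31688, 11598, 220] + [50256] * 20
attention_mask = [1] * 4 + [0] * 20

labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
```

For this sample the result is identical to the labels shown above; the two approaches only differ when the text itself contains token 50256.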


# Training

In [None]:
# Trainer
trainer = Trainer(
    model=model,
    args=training_arguments,
    train_dataset=tokenized_datasets['train'],
)

# Fine-tune the model
trainer.train()

{'loss': 2.7995, 'grad_norm': 11.526394844055176, 'learning_rate': 4.91035570854848e-05, 'epoch': 0.05378657487091222}
{'loss': 2.6039, 'grad_norm': 8.008880615234375, 'learning_rate': 4.8207114170969595e-05, 'epoch': 0.10757314974182444}
{'loss': 2.5647, 'grad_norm': 5.739560604095459, 'learning_rate': 4.731067125645439e-05, 'epoch': 0.16135972461273665}
{'loss': 2.5622, 'grad_norm': 11.669812202453613, 'learning_rate': 4.641422834193919e-05, 'epoch': 0.21514629948364888}
{'loss': 2.4753, 'grad_norm': 5.931076526641846, 'learning_rate': 4.5517785427423984e-05, 'epoch': 0.2689328743545611}
{'loss': 2.511, 'grad_norm': 6.884527683258057, 'learning_rate': 4.462134251290878e-05, 'epoch': 0.3227194492254733}
{'loss': 2.4668, 'grad_norm': 8.08081340789795, 'learning_rate': 4.372489959839358e-05, 'epoch': 0.37650602409638556}
{'loss': 2.4538, 'grad_norm': 8.84925365447998, 'learning_rate': 4.2828456683878374e-05, 'epoch': 0.43029259896729777}
{'loss': 2.4593, 'grad_norm': 5.467613697052002, 

KeyboardInterrupt: 
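
The logged values can be checked by hand. The sketch below assumes a single GPU, the 18592 training examples reported by the `Map` progress bar, the batch size and epoch count passed to `TrainingArguments`, and the library defaults for everything not set there (initial learning rate 5e-5, linear decay without warmup, logging every 500 steps). Under those assumptions it reproduces the first logged `learning_rate` and `epoch`.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    """Learning rate under linear decay from base_lr to 0, with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


n_examples = 18592   # from the Map progress bar
batch_size = 2       # per_device_train_batch_size, assuming one GPU
num_epochs = 3

steps_per_epoch = math.ceil(n_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

first_log_step = 500  # assumption: default logging interval
print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
print("lr at first log:", linear_lr(first_log_step, total_steps))
print("epoch at first log:", first_log_step / steps_per_epoch)
```

The hyperparameter variables defined in the earlier cells (cosine schedule, `learning_rate = 2e-4`, `logging_steps = 25`) are not passed to `TrainingArguments`, which is why the log follows the defaults instead.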

In [None]:
# Save trained model
trainer.model.save_pretrained(new_model)

# Results

In [None]:
%load_ext tensorboard
%tensorboard --logdir results/runs

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Run the text generation pipeline with our fine-tuned model
prompt = "What is the definition of psychoanalysis?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] What is the definition of psychoanalysis? [/INST] In the first definition of psychoanalysis for the mind? [/INST] What is the definition of the definition of "psycho-psycho"? [-][INST] What is the definition of psychoanalysis? [INST]

The definition of psychoanalysis is one that is not in the normal way.

The other definitions of psychoanalysis are

Psychorades?[/INST] What is the definition of the definition of psychoanalysis? [/INST] [INST] What is the definition of psychoanalysis of the self in it's sense of 'psycho-psycho'?[/INST>

Is it possible that there are no specific definitions of mind that are in the standard form?

No, the definitions of psycho-psycho (orades in some form) have been shown to be more or less vague than the standard form. However, when it is understood that there are certain definitions


In [None]:
# Empty VRAM
del model
del pipe
del trainer
import gc
gc.collect()
gc.collect()

20696

In [None]:
# Reload the fine-tuned model in FP16.
# Trainer updated all GPT-2 weights and no LoRA adapter was saved, so there is
# nothing to merge: load the full checkpoint written by save_pretrained.
model = AutoModelForCausalLM.from_pretrained(
    new_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



# Upload

In [None]:
# To upload it to Hugging Face
from huggingface_hub import login
login(token="")

model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/repos/create (Request ID: Root=1-66f97c38-6bdaecc37e27d7fc712fe4d3;5ebe443b-e691-43bf-b884-2caccd864d9d)

Invalid username or password.
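
Rather than pasting a token into the notebook, it can be read from the environment and checked before any upload is attempted. The sketch below is a hypothetical helper in plain Python; it assumes the token is stored in an environment variable named `HF_TOKEN`, and it does not diagnose the 401 error above.

```python
import os


def resolve_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in var_name, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise ValueError(f"Set the {var_name} environment variable to a write token")
    return token


# Demonstration with a fake environment; a real run would call resolve_hf_token()
fake_env = {"HF_TOKEN": "  hf_example_not_a_real_token  "}
print(resolve_hf_token(fake_env))
```

The returned string can then be passed to `login(token=...)` in place of the empty string used in the cell above.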