# Setup

Problem statement: fine-tuning Llama 2 7B.

Required libraries: accelerate, peft, and transformers (from Hugging Face), datasets (to load the French dataset), and trl (instruction fine-tuning);
bitsandbytes, to quantize the model to 4-bit;
einops, to load the model.


https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/


https://huggingface.co/docs/transformers/main/model_doc/llama2

In [10]:
!pip install -q -U  trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb



In [11]:
from datasets import load_dataset

dataset_name = 'AlexanderDoria/novel17_test'  # French novels
dataset = load_dataset(dataset_name, split = 'train')



# Loading the model

In [12]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

In [13]:
model_name = 'TinyPixel/Llama-2-7B-bf16-sharded'

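bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store the weights in 4-bit
    bnb_4bit_quant_type = "nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype = torch.float16,  # dtype used for computation at runtime
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    trust_remote_code = True
)

# Disable the KV cache during training; it is only useful for generation
model.config.use_cache = False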

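To get a feel for why the model is loaded in 4-bit, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. This is only a sketch: the parameter count of 7e9 is the nominal "7B" figure (an assumption, not a measured value), and the estimate ignores quantization constants, activations, optimizer state, and the LoRA adapter.

```python
# Rough weight-memory estimate. N_PARAMS is the nominal "7B" figure (assumption).
N_PARAMS = 7e9


def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(N_PARAMS, 16)  # half-precision weights
nf4_gib = weight_memory_gib(N_PARAMS, 4)    # 4-bit quantized weights

print(f"fp16 weights:  ~{fp16_gib:.1f} GiB")
print(f"4-bit weights: ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the fp16 footprint, which is what makes a 7B model practical on a single consumer or Colab GPU.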

# Tokenizer

In [14]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code = True)
tokenizer.pad_token = tokenizer.eos_token

In [15]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

In [16]:
peft_config = LoraConfig(
    lora_alpha = lora_alpha,
    lora_dropout = lora_dropout,
    r = lora_r,
    bias = "none",
    task_type = "CAUSAL_LM"
)
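As a sanity check on the LoRA settings above, the following sketch estimates the scaling factor and the number of trainable parameters. The model dimensions (hidden size 4096, 32 layers) and the choice of two target projections per layer (`q_proj` and `v_proj`, which I believe are the PEFT defaults for Llama) are assumptions that are not stated in this notebook, so treat the total as indicative only.

```python
lora_r = 64
lora_alpha = 16

# Assumptions (not from the notebook): Llama 2 7B dimensions and PEFT default targets.
hidden_size = 4096
n_layers = 32
targets_per_layer = 2  # assumed: q_proj and v_proj, both hidden_size x hidden_size

# LoRA scales the low-rank update B @ A by alpha / r.
scaling = lora_alpha / lora_r

# Each adapted matrix gets A (r x d_in) and B (d_out x r).
params_per_module = lora_r * (hidden_size + hidden_size)
trainable_params = params_per_module * n_layers * targets_per_layer

print(f"scaling factor:        {scaling}")
print(f"params per module:     {params_per_module:,}")
print(f"total trainable (est): {trainable_params:,}")
```

With `lora_alpha` smaller than `lora_r`, the update is scaled down (by 0.25 here), which is worth keeping in mind when choosing the learning rate.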

# Loading the trainer


In [18]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 100
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 100
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir = output_dir,
    per_device_train_batch_size = per_device_train_batch_size,
    gradient_accumulation_steps = gradient_accumulation_steps,
    optim = optim,
    save_steps = save_steps,
    logging_steps = logging_steps,
    learning_rate = learning_rate,
    fp16 = True,
    max_grad_norm = max_grad_norm,
    max_steps = max_steps,
    warmup_ratio = warmup_ratio,
    group_by_length = True,
    lr_scheduler_type = lr_scheduler_type,
)
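Two things about these arguments are easy to miss, so here is a small pure-Python sketch. It assumes a single GPU, and the two schedule functions are hypothetical stand-ins written for illustration, not the Transformers implementations. As far as I can tell, the plain `"constant"` scheduler does not apply any warmup, so `warmup_ratio` only matters with a scheduler such as `"constant_with_warmup"`.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 2e-4
warmup_steps = 3  # assumption: warmup_ratio (0.03) * max_steps (100)

# Samples contributing to each optimizer step (single device assumed).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def constant_lr(step: int, base_lr: float) -> float:
    """Illustrative 'constant' schedule: same learning rate at every step."""
    return base_lr


def constant_with_warmup_lr(step: int, base_lr: float, warmup: int) -> float:
    """Illustrative 'constant_with_warmup' schedule: linear ramp, then flat."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


print(f"effective batch size: {effective_batch_size}")
for step in range(5):
    print(step,
          constant_lr(step, learning_rate),
          constant_with_warmup_lr(step, learning_rate, warmup_steps))
```

If warmup is actually wanted, switching `lr_scheduler_type` to `"constant_with_warmup"` would be the smallest change.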


In [20]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    peft_config = peft_config,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = training_arguments,
)




In [21]:
# Preprocess the model: upcast the normalization layers to float32 for more stable training

for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() converts the module in place

In [22]:
trainer.train()



You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,0.87
20,0.4541
30,0.0838
40,0.0151
50,0.011
60,0.0103
70,0.0096
80,0.0078
90,0.0008
100,0.0002


TrainOutput(global_step=100, training_loss=0.14627586972084827, metrics={'train_runtime': 86.5837, 'train_samples_per_second': 18.479, 'train_steps_per_second': 1.155, 'total_flos': 618646376448000.0, 'train_loss': 0.14627586972084827, 'epoch': 100.0})
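The numbers in `TrainOutput` can be cross-checked with a little arithmetic. The sketch below copies the values from the output above and uses the batch settings from the training arguments; the interpretation at the end is my reading of the figures, not something the trainer reports.

```python
# Values copied from the TrainOutput above.
train_runtime = 86.5837        # seconds
samples_per_second = 18.479
global_step = 100
epochs = 100.0

# From the training arguments (single device assumed).
effective_batch_size = 4 * 4

# Nominal number of samples the trainer accounts for.
samples_seen = samples_per_second * train_runtime
steps_per_epoch = global_step / epochs

print(f"nominal samples processed: {samples_seen:.0f}")
print(f"expected (batch * steps):  {effective_batch_size * global_step}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
```

One optimizer step per epoch suggests the training split is very small; the `dataset['text']` cell further down shows a single example. The loss dropping to 0.0002 is therefore more likely memorization of that one text than evidence of general style transfer.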

In [23]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model
model_to_save.save_pretrained("outputs")

In [24]:
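# Note: this rebuilds an adapter from the saved config only; to load the trained
# weights onto a fresh base model, use PeftModel.from_pretrained(base, "outputs").
lora_config = LoraConfig.from_pretrained("outputs")
model = get_peft_model(model, lora_config)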

In [25]:
dataset['text']

["### Human: Écrire un texte dans un style baroque, utilisant le langage et la syntaxe du 17ème siècle, mettant en scène un échange entre un prêtre et un jeune homme confus au sujet de ses péchés.### Assistant: Si j'en luis éton. né ou empêché ce n'eſt pas ſans cauſe vů que ſouvent les liommes ne ſçaventque dire non plus que celui de tantôt qui ne ſçavoit rien faire que des civiéresVALDEN: Jefusbien einpêché confeſſant un jour un jeune Breton Vallonqui enfin de confeſſion me dit qu'il avoit beſongné une civiere . Quoilui dis je mon amice peché n'eſt point écrit au livre Angeli que d'enfernommé la ſommedes pechez ,qui eſt le livre le plus déteſtable qui fut jamais fait& le plus blafphematoire d'autant qu'il eſt dédié à la plus femme de bien je ne ſçai quelle penitence te donner ; mais non mon amiquel goûty prenois-tu ? Mon fieur bon & delectable. Quoi!"]

In [26]:
text = "Écrire un texte dans un style baroque sur la glace et le feu ### Assistant: Si j'en luis éton"
device = "cuda:0"

inputs = tokenizer(text, return_tensors = "pt").to(device)
outputs = model.generate(**inputs, max_new_tokens = 50)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))



Écrire un texte dans un style baroque sur la glace et le feu ### Assistant: Si j'en luis étonné, si j'en brûle, si j'en frissonne, si j'en tremble, si j'en frémis, si j'en frissonne, si j'en frissonne, si
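The training example uses a `### Human: ...### Assistant: ...` layout, while the prompt in the cell above omits the `### Human:` prefix. A small helper can keep inference prompts consistent with the training format. `build_prompt` is a hypothetical function introduced here for illustration; it is not part of any library used in this notebook, and the exact spacing simply mirrors the dataset sample shown earlier.

```python
def build_prompt(instruction: str, response_prefix: str = "") -> str:
    """Hypothetical helper: format a prompt like the training example.

    The layout mirrors the dataset sample: "### Human: <instruction>### Assistant: <text>".
    response_prefix optionally seeds the beginning of the model's answer.
    """
    return f"### Human: {instruction}### Assistant: {response_prefix}"


prompt = build_prompt(
    "Écrire un texte dans un style baroque sur la glace et le feu",
    response_prefix="Si j'en luis éton",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of `text` in the generation cell.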


In [27]:
from huggingface_hub import login

login()
