# Finetune LLaMa2 (7B) on timdettmers/openassistant-guanaco using 🤗 peft, trl, bitsandbytes, Flash Attention 2 & transformers

This notebook runs on top of the image built using this Dockerfile: [GitHub Link](https://github.com/huggingface/Google-Cloud-Containers/blob/main/containers/pytorch/training/gpu/2.1/transformers/4.38.1/py310/Dockerfile)

The image ships with all required packages preinstalled, so you don't need to install anything else.


## Prerequisite

To access the model weights, you must first accept the license conditions on the [Hugging Face Hub](https://huggingface.co/meta-llama/Llama-2-7b). Once access is granted, you need to authenticate so that the weights can be downloaded. From the command line, run:

```bash
huggingface-cli login
```
Other authentication methods are described in the [Hugging Face Hub quick-start guide](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication).

## Import libraries and specify model to use

In [1]:
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, DataCollatorForLanguageModeling
from peft import LoraConfig
from trl import SFTTrainer

In [2]:
model_id = "meta-llama/Llama-2-7b-hf"


## Load Dataset and Tokenizer

We will use `timdettmers/openassistant-guanaco`, an open-source dataset that is a subset of the [Open Assistant dataset](https://huggingface.co/datasets/OpenAssistant/oasst1). This subset contains only the highest-rated paths in the conversation tree, for a total of 9,846 samples. A sample record looks like this:



###Human: eu quero que você atue como um terminal linux. Vou digitar comandos e você vai responder com o que o terminal deve mostrar. Quero que você responda apenas com a saída do terminal dentro de um bloco de código exclusivo e nada mais. não escreva explicações. não digite comandos a menos que eu o instrua a fazê-lo. quando eu precisar te dizer algo em português, eu o farei colocando o texto dentro de colchetes {assim}. meu primeiro comando é pwd### Assistant: $ pwd /home/user




To load the dataset, we use the 🤗 Datasets library. No extra formatting is needed: each record already holds the full conversation in a single `text` field, which the `SFTTrainer` from the 🤗 [TRL](https://huggingface.co/docs/trl/en/sft_trainer) library can consume directly.
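
To make the record layout concrete, here is a minimal sketch that splits one `text` value into speaker turns. It is for inspection only and is not needed for training. `split_turns` and `TURN_MARKER` are hypothetical names, not part of TRL or Datasets, and the sample record is made up for illustration. The pattern assumes the `### Human:` / `### Assistant:` markers seen in the sample above.

```python
import re

# Hypothetical helper (not part of TRL or Datasets): split one guanaco-style
# `text` value into (speaker, message) turns.
# Assumption: turns are introduced by "### Human:" or "### Assistant:",
# with optional whitespace after the hashes.
TURN_MARKER = re.compile(r"###\s*(Human|Assistant):")


def split_turns(text):
    # re.split with one capture group returns
    # [prefix, speaker_1, message_1, speaker_2, message_2, ...]
    parts = TURN_MARKER.split(text)
    speakers = parts[1::2]
    messages = [message.strip() for message in parts[2::2]]
    return list(zip(speakers, messages))


# Made-up record in the same shape as a dataset row.
sample = {"text": "### Human: What is QLoRA?### Assistant: A way to fine-tune quantized models."}
turns = split_turns(sample["text"])
for speaker, message in turns:
    print(f"{speaker}: {message}")
```

With the real dataset, the same helper could be applied to `raw_dataset[0]["text"]` to eyeball a few conversations before training.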

In [None]:
raw_dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

In [3]:
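# Load the tokenizer that matches the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The Llama 2 tokenizer defines no pad token, so reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token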

## Finetune Llama 2 7B with QLoRA

In [4]:
# BitsAndBytes 4bit config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
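
As a rough illustration of why this config helps, the sketch below estimates the weight memory for bf16, NF4, and NF4 with double quantization. It is back-of-the-envelope arithmetic, not a measurement: the parameter count and the block sizes are assumptions (they follow the figures commonly quoted for QLoRA) and are not read from bitsandbytes. It also pretends every parameter is quantized, whereas some layers, such as embeddings, typically stay in higher precision, so real usage will be somewhat higher.

```python
# Rough arithmetic only; none of these constants are read from bitsandbytes.
N_PARAMS = 6.74e9     # assumed approximate parameter count of Llama 2 7B
BLOCK_SIZE = 64       # assumed number of weights sharing one quantization scale
DQ_BLOCK_SIZE = 256   # assumed number of scales sharing one second-level constant


def bits_per_param(double_quant):
    """Approximate storage cost of one 4-bit weight, including its share of scales."""
    weight_bits = 4.0
    if double_quant:
        # Scales stored in 8 bits, plus one fp32 constant per DQ_BLOCK_SIZE scales.
        scale_bits = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * DQ_BLOCK_SIZE)
    else:
        # One fp32 scale per block of weights.
        scale_bits = 32 / BLOCK_SIZE
    return weight_bits + scale_bits


def weights_gib(bits):
    """Convert a per-parameter bit cost into total GiB for N_PARAMS weights."""
    return N_PARAMS * bits / 8 / 2**30


for label, bits in [
    ("bf16", 16.0),
    ("nf4", bits_per_param(double_quant=False)),
    ("nf4 + double quant", bits_per_param(double_quant=True)),
]:
    print(f"{label:>20}: {bits:.3f} bits/param, ~{weights_gib(bits):.2f} GiB")
```

Under these assumptions, double quantization saves a little under 0.4 bits per parameter on top of plain 4-bit storage.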

In [5]:
model = AutoModelForCausalLM.from_pretrained(model_id, 
                                             quantization_config=bnb_config, 
                                             device_map="auto",
                                             attn_implementation="flash_attention_2"
                                            )


In [6]:
# LoRA config
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=8,
    bias="none",
    task_type="CAUSAL_LM", 
)
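
To give a sense of how small the trainable part is, the following sketch counts the LoRA parameters this config would add. The model shapes are assumed values for Llama 2 7B rather than being read from the model config. Since `target_modules` is not set above, the sketch also assumes PEFT falls back to adapting `q_proj` and `v_proj` for Llama models. Calling `model.print_trainable_parameters()` on the wrapped PEFT model is the reliable way to check the real number.

```python
# Assumed Llama 2 7B shapes; not read from the model config.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
# Assumption: with no target_modules given, PEFT adapts q_proj and v_proj for Llama.
TARGET_MODULES = ["q_proj", "v_proj"]

# Values taken from the LoraConfig above.
R = 8
LORA_ALPHA = 16


def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


per_layer = sum(lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R) for _ in TARGET_MODULES)
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update B @ A

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters add roughly 4.2 million parameters, a tiny fraction of the roughly 7 billion frozen base weights.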

In [None]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # 2 x 4 = 8 sequences per device per optimizer step
    num_train_epochs=3,
    logging_strategy="steps",
    logging_steps=20,
    bf16=True,
    optim="paged_adamw_8bit",
    
)

# Initialize our Trainer
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    args=training_args,
    dataset_text_field="text",
    packing=True,
    train_dataset=raw_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)

# Train the model
trainer.train()

# Save the trained LoRA adapter weights to output_dir
trainer.save_model()
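
The `packing=True` option above tells the trainer to pack several short examples into fixed-length sequences instead of padding each one. The sketch below shows the general idea on toy token ids. It is a simplified illustration, not TRL's implementation: `pack_sequences` is a hypothetical helper, the token ids are made up, and this version simply drops the incomplete tail.

```python
def pack_sequences(tokenized_examples, eos_token_id, seq_length):
    """Concatenate examples, separated by EOS, and cut into fixed-length blocks."""
    buffer = []
    for token_ids in tokenized_examples:
        buffer.extend(token_ids)
        buffer.append(eos_token_id)  # mark the boundary between examples
    n_blocks = len(buffer) // seq_length  # the incomplete tail is dropped
    return [buffer[i * seq_length:(i + 1) * seq_length] for i in range(n_blocks)]


# Toy "tokenized" examples of different lengths (made-up ids).
examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
blocks = pack_sequences(examples, eos_token_id=2, seq_length=4)
for block in blocks:
    print(block)
```

Every block has the same length and contains no padding tokens, which is why packing tends to make better use of each training step when examples are much shorter than the sequence length.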