# IIC-3670 NLP UC

QLoRa paper: https://arxiv.org/abs/2305.14314


## Actividad en clase

Vamos a alinear un LLM con QLoRa.

- Descargue desde hugging face hub datasets el modelo base **mistralai/Mistral-7B-v0.1**.
- Cree un dataset de instrucciones para alinear el LLM usando el archivo **alpaca_data_cleaned_redux.json** disponible en el github de la asigntaura.
- Entrene usando el trainer de transformers a un epoch usando QLoRa.
- Guarde el modelo en su cuenta de hugging face.
- Usando el módulo pipeline de transformers, cree un pipe llamado **text_generator**.
- Usando su pipeline, ejecute el prompt: text_generator("What is the capital of Chile?").
- Cuando termine, me avisa para entregarle una **L (logrado)**.
- Recuerde que las L otorgan un bono en la nota final de la asignatura.


***Tiene hasta el final de la clase.***

In [1]:
import transformers

print(transformers.__version__)

4.34.0


In [2]:
import datasets

print(datasets.__version__)

2.12.0


In [3]:
from transformers import BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)


In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/marcelo/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-12.0/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 120
CUDA SETUP: Loading binary /home/marcelo/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so...


  warn(msg)
2024-05-21 17:53:56.618189: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [5]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [6]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [7]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=4, 
    lora_alpha=16, 
    target_modules=["q_proj", "v_proj"], 
    lora_dropout=0.05, 
    bias="none", 
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 1703936 || all params: 3753775104 || trainable%: 0.04539259686027264


In [8]:
def generate_prompt(data_point):
    # taken from https://github.com/tloen/alpaca-lora
    if data_point["input"]:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Input:
{data_point["input"]}

### Response:
{data_point["output"]}"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:
{data_point["output"]}"""


In [9]:
from datasets import load_dataset

data = load_dataset("json", data_files="./AlpacaDataCleaned/alpaca_data_cleaned_redux.json")

Found cached dataset json (/home/marcelo/.cache/huggingface/datasets/json/default-d84f91b947aba4c5/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)


  0%|          | 0/1 [00:00<?, ?it/s]

In [10]:
CUTOFF_LEN = 256  

tokenizer.pad_token_id = 0  # unk. we want this to be different from the eos token

data = data.shuffle().map(
    lambda data_point: tokenizer(
        generate_prompt(data_point),
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
)

Map:   0%|          | 0/1016 [00:00<?, ? examples/s]

In [11]:
data

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output', 'input_ids', 'attention_mask'],
        num_rows: 1016
    })
})

In [13]:
tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
train_result = trainer.train()

You are using 8-bit optimizers with a version of `bitsandbytes` < 0.41.1. It is recommended to update your version as a major bug has been fixed in 8-bit optimizers.


Step,Training Loss
1,4.6835
2,4.2188
3,5.1456
4,5.648
5,4.4157
6,4.3319
7,4.4631
8,5.825
9,2.9799
10,3.1476


In [14]:
model.save_pretrained("mistral_7B-alpaca-redux-qlora")

In [15]:
model.push_to_hub("mmendoza/mistral_7B-alpaca-redux-qlora", use_auth_token=True)



adapter_model.bin:   0%|          | 0.00/6.86M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/mmendoza/mistral_7B-alpaca-redux-qlora/commit/617a74f34ee66ae4e818f782ee9f03a51e89518e', commit_message='Upload model', commit_description='', oid='617a74f34ee66ae4e818f782ee9f03a51e89518e', pr_url=None, pr_revision=None, pr_num=None)

In [16]:
trainer.save_model("mistral_7B-alpaca-redux-qlora.h5")

In [1]:
from transformers import pipeline

text_generator = pipeline(model="mmendoza/mistral_7B-alpaca-redux-qlora")

2024-05-21 18:22:09.777429: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/marcelo/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-12.0/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 120
CUDA SETUP: Loading binary /home/marcelo/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so...


  warn(msg)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [2]:
text_generator("What is the capital of Chile?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[{'generated_text': 'What is the capital of Chile?\n\nSantiago\n\n## What is the capital of'}]

###
https://github.com/huggingface/alignment-handbook/