In [1]:
## Environment configured for local training on a PC with an Nvidia RTX 3060 12GB graphics card

## Miniconda installed on Ubuntu Linux following the instructions at: https://docs.anaconda.com/miniconda/
## Miniconda used to create the unsloth environment following the instructions at: https://docs.unsloth.ai/get-started/installation/conda-install

## >> To set up the environment, remove the leading "#" from each command below and run it. Install Miniconda beforehand.

#!pip install nbformat
#!conda install -c conda-forge ipywidgets
#!conda create --name unsloth_env python=3.10 pytorch-cuda=12.1 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers -y
#!conda activate unsloth_env   # has no effect on the running kernel: activate the env in a terminal, then start Jupyter from it
#!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
#!pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes
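After creating the environment, it can be useful to confirm which versions were actually installed, because the setup pins `trl<0.9.0`. The following sketch uses only the standard library's `importlib.metadata`. The package list mirrors the commands above. `major_minor` is a simplified version comparison (an assumption for illustration, not how pip evaluates specifiers).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the setup commands above
PACKAGES = ["torch", "xformers", "unsloth", "trl", "peft", "accelerate", "bitsandbytes"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def major_minor(v):
    """Return the first two numeric components of a version string, e.g. '0.8.6' -> (0, 8)."""
    numbers = re.findall(r"\d+", v)[:2]
    return tuple(int(n) for n in numbers)


if __name__ == "__main__":
    versions = installed_versions(PACKAGES)
    for name, v in versions.items():
        print(f"{name:14} {v or 'NOT INSTALLED'}")
    # The setup pins trl below 0.9.0
    if versions["trl"] is not None and major_minor(versions["trl"]) >= (0, 9):
        print("warning: trl >= 0.9.0 is installed, but the setup pins trl<0.9.0")
```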

In [2]:
import helper
import torch

import datasets

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

print(torch.__version__)
print(torch.version.cuda)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
2.4.1
12.1


In [3]:
max_seq_length = 2048  # Any value works; Unsloth handles RoPE scaling internally
dtype = None           # None = auto-detect: float16 for Tesla T4/V100, bfloat16 for Ampere and newer
load_in_4bit = True    # 4-bit quantization reduces memory usage; can be set to False
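The two settings above can be illustrated with a rough sketch. The first part is a simplified version of the `dtype = None` rule from the comment: bfloat16 on Ampere or newer GPUs (compute capability 8.x and above) and float16 otherwise. Unsloth's actual check is `is_bfloat16_supported()`, which is used later in the trainer arguments. The second part estimates the weight memory saved by `load_in_4bit`. It assumes TinyLlama's roughly 1.1 billion parameters and counts only the weights. Activations, the KV cache, LoRA adapters, optimizer state, and the layers bitsandbytes keeps in higher precision are not included.

```python
GIB = 1024 ** 3


def auto_dtype(compute_capability):
    """Simplified rule: bfloat16 on Ampere or newer (compute capability 8.x+), else float16."""
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


def weight_memory_gib(num_params, bits_per_param):
    """Rough size of the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


# Assumed model size: TinyLlama has roughly 1.1 billion parameters
params = 1.1e9
print(auto_dtype((8, 6)))  # RTX 3060, compute capability 8.6 -> bfloat16
print(auto_dtype((7, 5)))  # Tesla T4 -> float16
print(f"16-bit weights: {weight_memory_gib(params, 16):.2f} GiB")
print(f" 4-bit weights: {weight_memory_gib(params, 4):.2f} GiB")
```

The 4x ratio between the two estimates is the main point. The memory actually reserved on the GPU will differ from these figures.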

In [4]:
model_name, raw_model, tokenizer = helper.get_model_by_id(1, max_seq_length, dtype, load_in_4bit)  ## Id 1 = "unsloth/tinyllama" in the helper's model list (see output)

Id Model: 1 - Model Name: unsloth/tinyllama
==((====))==  Unsloth 2024.9: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA GeForce RTX 3060. Max memory: 11.65 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


In [5]:
EOS_TOKEN = tokenizer.eos_token  # Must be appended to every training example
def formatting_prompts_func_train(examples):
    # Batched map: 'title' is the prompt input, 'content' is the target response
    titles   = examples['title']
    contents = examples['content']
    texts = []
    for title, content in zip(titles, contents):
        # Without EOS_TOKEN the model never learns to stop, and generation goes on forever
        text = helper.alpaca_prompt.format(title, content) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
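The following standalone sketch shows what a formatted training example looks like. `ALPACA_PROMPT` is a hypothetical stand-in for `helper.alpaca_prompt`. It is reconstructed from the generated text printed in the test cell at the end of this notebook, including its original wording, and the real helper may differ. `EOS_TOKEN` is the literal `</s>`, which is TinyLlama's end-of-sequence token as it appears at the end of the generated answer.

```python
# Hypothetical stand-in for helper.alpaca_prompt (reconstructed from the notebook's output)
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Based on title of a product, get the real description for the follow product.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "</s>"  # TinyLlama's end-of-sequence token


def format_batch(examples):
    """Mimic a batched datasets.map call: 'title'/'content' columns in, one 'text' column out."""
    texts = [
        ALPACA_PROMPT.format(title, content) + EOS_TOKEN
        for title, content in zip(examples["title"], examples["content"])
    ]
    return {"text": texts}


batch = {"title": ["Sample Book Title"], "content": ["A short description."]}
print(format_batch(batch)["text"][0])
```

Each training string ends with the target description followed by `</s>`, so the model learns to emit the EOS token right after the description.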

In [6]:
model = helper.get_fast_language_model(raw_model)

Unsloth 2024.9 patched 22 layers with 22 QKV layers, 22 O layers and 22 MLP layers.
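`helper.get_fast_language_model` is not shown here, but the 12,615,680 trainable parameters reported when training starts match LoRA with rank 16 on all seven attention and MLP projections of TinyLlama's 22 layers. That rank and module list are an inference from the count, not confirmed by the helper's source. The following sketch reproduces the arithmetic and assumes TinyLlama-1.1B's published configuration: hidden size 2048, MLP size 5632, and 4 key/value heads of dimension 64.

```python
# Assumed TinyLlama-1.1B architecture values (from its published config)
HIDDEN = 2048        # hidden_size
INTERMEDIATE = 5632  # intermediate_size (MLP)
KV_DIM = 4 * 64      # num_key_value_heads * head_dim (grouped-query attention)
LAYERS = 22          # num_hidden_layers, matching "patched 22 layers" above

# (in_features, out_features) of each linear layer that LoRA is assumed to wrap
TARGET_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank, modules=TARGET_MODULES, layers=LAYERS):
    """LoRA adds A (rank x in) and B (out x rank) per wrapped layer: rank * (in + out) params."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in modules.values())
    return per_layer * layers


print(f"{lora_trainable_params(16):,}")  # 12,615,680
```

The result is linear in the rank, so halving it to 8 would halve the trainable parameters.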


In [7]:
dataset = datasets.Dataset.from_csv('../data/trn_sample.csv', sep=';')
dataset = dataset.map(formatting_prompts_func_train, batched = True,)
dataset

Dataset({
    features: ['uid', 'title', 'content', 'text'],
    num_rows: 1000
})

In [8]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting True can make training up to 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 2,
        warmup_steps = 10,
        num_train_epochs = 1, # Set this for 1 full training run.
        #max_steps = 60,
        #learning_rate = 2e-4,
        learning_rate = 3e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
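With these arguments the effective batch size is 8 × 2 = 16 on one GPU. The following sketch shows how the run length follows from it. It assumes the counting used by the Transformers version in this environment: the dataloader length divided by `gradient_accumulation_steps`, rounded down. That assumption is consistent with the `Total steps = 62` reported when training starts. `total_optimizer_steps` is a hypothetical helper for illustration.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs=1, num_gpus=1):
    """Optimizer steps for a run: batches per epoch, floor-divided by grad_accum."""
    # Batches per epoch on each device (the last batch may be partial)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    # One optimizer step per grad_accum batches; leftover batches do not add to the count
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


effective_batch = 8 * 2 * 1  # per-device batch * accumulation steps * GPUs
print(effective_batch, total_optimizer_steps(1000, 8, 2))  # 16 62
```

1,000 examples give 125 batches of 8, and 125 // 2 = 62 optimizer steps, rather than the 63 that 1000 / 16 rounded up would suggest.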


In [9]:
start_gpu_memory, max_memory = helper.print_start_memory_usage()

GPU = NVIDIA GeForce RTX 3060. Max memory = 11.65 GB.
0.787 GB of memory reserved.


In [10]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 8 | Gradient Accumulation steps = 2
\        /    Total batch size = 16 | Total steps = 62
 "-____-"     Number of trainable parameters = 12,615,680



{'loss': 3.6714, 'grad_norm': 16.946605682373047, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 3.4999, 'grad_norm': 15.496505737304688, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 3.4363, 'grad_norm': 8.58876895904541, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 3.5097, 'grad_norm': 5.471931457519531, 'learning_rate': 0.00011999999999999999, 'epoch': 0.06}
{'loss': 3.0741, 'grad_norm': 3.6497642993927, 'learning_rate': 0.00015, 'epoch': 0.08}
{'loss': 2.8615, 'grad_norm': 6.896556377410889, 'learning_rate': 0.00017999999999999998, 'epoch': 0.1}
{'loss': 2.7431, 'grad_norm': 7.148976802825928, 'learning_rate': 0.00020999999999999998, 'epoch': 0.11}
{'loss': 2.5746, 'grad_norm': 3.7633118629455566, 'learning_rate': 0.00023999999999999998, 'epoch': 0.13}
{'loss': 2.3007, 'grad_norm': 11.939837455749512, 'learning_rate': 0.00027, 'epoch': 0.14}
{'loss': 2.1797, 'grad_norm': 4.596930027008057, 'learning_rate': 0.0003, 'epoch': 0.1
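The logged `learning_rate` rises by 3e-5 per step until it reaches 3e-4 at step 10. This is the linear warmup from `warmup_steps = 10` combined with `lr_scheduler_type = "linear"`. The following sketch follows the multiplier used by Transformers' `get_linear_schedule_with_warmup`. It assumes the 62 total steps reported in the training banner.

```python
def linear_warmup_lr(step, peak_lr=3e-4, warmup_steps=10, total_steps=62):
    """Learning rate after `step` optimizer steps for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at the end of warmup to 0 at total_steps
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


for step in (1, 2, 10, 11, 62):
    print(step, linear_warmup_lr(step))
```

Values like `2.9999999999999997e-05` in the log are floating-point renderings of 3e-5.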

In [11]:
helper.print_final_memory_usage(start_gpu_memory, max_memory, trainer_stats)

175.9372 seconds used for training.
2.93 minutes used for training.
Peak reserved memory = 3.619 GB.
Peak reserved memory for training = 2.832 GB.
Peak reserved memory % of max memory = 31.064 %.
Peak reserved memory for training % of max memory = 24.309 %.
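The source of `helper.print_final_memory_usage` is not shown. The following sketch assumes it derives its figures the way Unsloth's example notebooks do. Under that assumption, the training share is the peak reserved memory minus the memory reserved before training, and both are expressed as percentages of the GPU's maximum, rounded to 3 decimals. Feeding in the raw numbers from this run reproduces the printed values. `memory_report` is a hypothetical name.

```python
def memory_report(start_reserved_gb, peak_reserved_gb, max_memory_gb, train_seconds):
    """Derive the summary figures from raw measurements, rounding like the printed report."""
    training_gb = round(peak_reserved_gb - start_reserved_gb, 3)
    return {
        "minutes": round(train_seconds / 60, 2),
        "peak_reserved_gb": peak_reserved_gb,
        "training_reserved_gb": training_gb,
        "peak_pct_of_max": round(peak_reserved_gb / max_memory_gb * 100, 3),
        "training_pct_of_max": round(training_gb / max_memory_gb * 100, 3),
    }


# Raw values taken from this notebook's output
report = memory_report(0.787, 3.619, 11.65, 175.9372)
for key, value in report.items():
    print(f"{key}: {value}")
```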


In [12]:
# Test the model after training

df = dataset.to_pandas().sample(n=5).copy()
for _, row in df.iterrows():
    title = row['title']
    print(f"Prediction result for title: [{title}]\n")
    helper.predict_text_streamer(model, tokenizer, title)


Prediction result for title: [The Gentle Birth Method The Monthbymonth Jeyarani Way Programme]

<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Based on title of a product, get the real description for the follow product.

### Input:
The Gentle Birth Method The Monthbymonth Jeyarani Way Programme

### Response:
The Gentle Birth Method is a comprehensive programme for women who want to have a gentle birth. It is based on the principles of the ancient Indian Ayurvedic system of healthcare. The programme is designed to help women prepare for labour and to give them the knowledge and confidence to have a gentle birth.</s>
Prediction result for title: [Farther Shores Exploring How NearDeath Kundalini and Mystical Experiences Can Transform Ordinary Lives]

<s> Below is an instruction that describes a task, paired with an input that provides furth

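The streamed output echoes the entire prompt before the model's answer. To compare a prediction with the real `content` column, only the text after the `### Response:` header is relevant. The following sketch does that on a decoded string. `extract_response` is a hypothetical helper, and the marker assumes the Alpaca-style template seen in the output above.

```python
RESPONSE_MARKER = "### Response:\n"


def extract_response(generated, eos_token="</s>"):
    """Return only the model's answer from a decoded Alpaca-style generation."""
    # Everything after the last response header is the model's continuation
    # (if the marker is missing, rpartition leaves the whole string in `answer`)
    _, _, answer = generated.rpartition(RESPONSE_MARKER)
    # Cut at the end-of-sequence token if the model emitted one
    answer = answer.split(eos_token, 1)[0]
    return answer.strip()


sample = "<s> Below is an instruction.\n\n### Input:\nSome Title\n\n### Response:\nA description.</s>"
print(extract_response(sample))  # A description.
```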
In [13]:
model.save_pretrained('tinyllama')      # Local save: writes the LoRA adapter weights, not a merged model
tokenizer.save_pretrained('tinyllama')

('tinyllama/tokenizer_config.json',
 'tinyllama/special_tokens_map.json',
 'tinyllama/tokenizer.model',
 'tinyllama/added_tokens.json',
 'tinyllama/tokenizer.json')