### Note: If an ImportError occurs, it is probably caused by the installed huggingface-hub version. Pinning the version usually resolves it:
> pip install huggingface-hub==0.25.0
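
Before pinning, it can help to confirm which version is actually installed. The sketch below uses only the standard library; `version_status` is a hypothetical helper name, and the expected version is simply the one pinned above.

```python
from importlib.metadata import PackageNotFoundError, version


def version_status(package: str, expected: str) -> str:
    """Return 'ok', 'mismatch (found X)', or 'missing' for a pinned package."""
    try:
        found = version(package)
    except PackageNotFoundError:
        return "missing"
    return "ok" if found == expected else f"mismatch (found {found})"


print(version_status("huggingface-hub", "0.25.0"))
```

Anything other than `ok` suggests re-running the pinned install and restarting the runtime.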


### Reference: https://medium.com/@hakeemsyd/how-to-fine-tune-your-llama-3-2-model-49a6f8c7621a


### Reference: https://www.datacamp.com/tutorial/fine-tuning-llama-3-2

In [1]:
!pip install transformers
!pip install langchain
!pip install bitsandbytes
!pip install huggingface-hub==0.25.0
!pip install langchain-community trl

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl.metadata (2.9 kB)
Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl (69.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 69.1/69.1 MB 26.9 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.45.0
Collecting huggingface-hub==0.25.0
  Downloading huggingface_hub-0.25.0-py3-none-any.whl.metadata (13 kB)
Downloading huggingface_hub-0.25.0-py3-none-any.whl (436 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 436.4/436.4 kB 8.6 MB/s eta 0:00:00
Installing collected packages: huggingface-hub
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.26.3
    Uninstalling huggingface-hub-0.26.3:
      Successfully uninstalled huggingface-hub-0.26.3
Successfully installed huggingface-hub-0.25.0
Collecting langchain-co

## Import

In [111]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser
from transformers import TextStreamer, Trainer, TrainingArguments, logging, pipeline

from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from langchain import HuggingFacePipeline, PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFDirectoryLoader, DirectoryLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from trl import SFTTrainer, setup_chat_format
from datasets import Dataset, load_dataset
from huggingface_hub import login

import os
import re, json

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
HUGGING_FACE_TOKEN = os.environ.get('HUGGING_FACE_TOKEN') #in terminal: export HUGGING_FACE_TOKEN="YOUR_TOKEN"
#os.environ["WANDB_DISABLED"] = "True"


In [112]:
!huggingface-cli login  # enter the token at the prompt; never paste a real token into the notebook



    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): Traceback (most recent call last):
  File "/usr/lib/python3.10/getpass.
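
The interactive prompt above appears to fail because `getpass` cannot read input in this notebook session. A non-interactive alternative, sketched here under the assumption that the token is exported as `HUGGING_FACE_TOKEN` (as in the import cell), is to pass it to `huggingface_hub.login` directly. `read_token` and `login_from_env` are hypothetical helper names.

```python
import os


def read_token(env=os.environ, name="HUGGING_FACE_TOKEN"):
    """Return the stripped token from the environment, or raise if it is unset."""
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return token


def login_from_env():
    """Log in without a prompt; the import is local so read_token works on its own."""
    from huggingface_hub import login

    login(token=read_token())
```

Calling `login_from_env()` in a cell replaces the CLI login and keeps the token out of the notebook source.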

## Load Model

In [113]:
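# Set torch dtype and attention implementation.
# Compute capability >= 8 (Ampere or newer) supports bfloat16 and FlashAttention-2.
if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8: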
    !pip install -qqq flash-attn
    torch_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    torch_dtype = torch.float16
    attn_implementation = "eager"
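
The same decision can be written as a pure function, which makes it easy to check without a GPU. This is only a sketch: `pick_precision` is a hypothetical helper, the `cuda_available` guard is an addition, and it returns names as strings rather than `torch` objects.

```python
def pick_precision(cuda_available: bool, capability_major: int = 0):
    """Apply the rule used above: bf16 + FlashAttention-2 on capability >= 8, else fp16 + eager."""
    if cuda_available and capability_major >= 8:
        return "bfloat16", "flash_attention_2"
    return "float16", "eager"


print(pick_precision(True, 8))   # capability 8.x, e.g. an Ampere-class GPU
print(pick_precision(True, 7))   # capability 7.x, e.g. an older GPU
print(pick_precision(False))     # no CUDA device at all
```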

In [114]:
'''
Possible Models:
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-11B-Vision-Instruct
'''
model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Load the weights in 4-bit NF4 (QLoRA-style) to reduce GPU memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

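# Authenticate with the token read from the environment variable
tokenizer = AutoTokenizer.from_pretrained(model_id, token=HUGGING_FACE_TOKEN)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=HUGGING_FACE_TOKEN,
    quantization_config=bnb_config,
    torch_dtype=torch_dtype,                  # chosen in the previous cell
    attn_implementation=attn_implementation,  # chosen in the previous cell
)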

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '<|pad|>'})
    model.resize_token_embeddings(len(tokenizer))

`low_cpu_mem_usage` was None, now default to True since model is quantized.
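
As a rough feel for why 4-bit loading helps, weight memory scales with bits per parameter. The sketch below is back-of-the-envelope arithmetic only. The parameter count is a placeholder, not a measured value for any model listed above, and the 0.127 bits per parameter for double-quantized constants is the figure reported in the QLoRA paper. Activations, optimizer state, LoRA weights, and layers that stay in 16-bit are ignored.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 1e9  # placeholder parameter count (assumption)

fp16 = weight_memory_gib(N_PARAMS, 16)
nf4 = weight_memory_gib(N_PARAMS, 4 + 0.127)  # 4-bit weights + double-quantized constants
print(f"fp16: {fp16:.2f} GiB, nf4: {nf4:.2f} GiB, ratio: {fp16 / nf4:.1f}x")
```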


In [115]:
print(repr(tokenizer.pad_token))  # the pad token added above
print(repr(tokenizer.bos_token))  # beginning-of-text token
print(repr(tokenizer.eos_token))  # end-of-turn token

'<|pad|>'
'<|begin_of_text|>'
'<|eot_id|>'


## Prepare Data

In [18]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)


Mounted at /content/drive


In [21]:
!cp /content/drive/MyDrive/data/finetune.jsonl finetune.jsonl
!cp /content/drive/MyDrive/data/finetune_new.jsonl finetune_new.jsonl
!cp /content/drive/MyDrive/data/output.json output.json

## Load Data

In [79]:
# Load the JSON file, shuffle with a fixed seed, and keep 445 rows
dataset = load_dataset("json", data_files="output.json", split="train")
dataset = dataset.shuffle(seed=65).select(range(445))

In [80]:
# The JSON columns are named "0", "1", "2"; give them meaningful names
dataset = dataset.rename_column("0", "system")
dataset = dataset.rename_column("1", "user")
dataset = dataset.rename_column("2", "assistant")

In [104]:
dataset['system'][0]

{'content': 'You are an assistant', 'role': 'system'}

In [105]:
dataset['user'][0]

{'content': 'What is the name of the BaseObstacleCritic in the DWB Controller?',
 'role': 'user'}
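
The two outputs above suggest that each row stores one `{role, content}` dict per column. Below is a minimal sketch of turning such a row into the message list a chat template expects, with a basic sanity check. The assistant text in the sample row is invented for illustration, and `row_to_messages` is a hypothetical helper.

```python
def row_to_messages(row: dict) -> list:
    """Collect the system, user and assistant dicts in order and check their roles."""
    messages = []
    for column in ("system", "user", "assistant"):
        message = row[column]
        if message["role"] != column:
            raise ValueError(f"column {column!r} holds role {message['role']!r}")
        messages.append({"role": message["role"], "content": message["content"]})
    return messages


sample_row = {
    "system": {"content": "You are an assistant", "role": "system"},
    "user": {
        "content": "What is the name of the BaseObstacleCritic in the DWB Controller?",
        "role": "user",
    },
    "assistant": {"content": "(illustrative answer text)", "role": "assistant"},
}
print(row_to_messages(sample_row))
```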

In [106]:
#def tokenize_function(examples):
#    return tokenizer(examples['text'], truncation=True, max_length=512, padding= "longest")
#tokenized_datasets = dataset.map(tokenize_function, batched=True)
'''
def format_chat_template(row):
    row["text"] = tokenizer.apply_chat_template(row, tokenize=True, truncation=True, max_length=512, padding= "longest")
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc= 4,
    batched=True
)
'''

def format_chat_template(row):
    # Each column holds a {"role", "content"} dict; rebuild the message list
    # and render it into one training string with the model's chat template.
    messages = [
        {"role": "system", "content": row["system"]["content"]},
        {"role": "user", "content": row["user"]["content"]},
        {"role": "assistant", "content": row["assistant"]["content"]},
    ]

    row["text"] = tokenizer.apply_chat_template(messages, tokenize=False)
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc= 4,
)


In [107]:
dataset['text'][0]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 07 Dec 2024\n\nYou are an assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is the name of the BaseObstacleCritic in the DWB Controller?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nTitle: BaseObstacleCritic\uf0c1 URL:  Section: getting_started/index.html -------------------------------------------------------------------------------- ## Parameters\uf0c1 <dwbplugin>: DWB plugin name defined in thecontroller_plugin_idsparameter inController Server. <name>: BaseObstacleCritic critic name defined in the<dwb plugin>.criticsparameter defined inDWB Controller.<|eot_id|>'
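
The rendered string follows a simple pattern: each message is wrapped in header tokens and terminated with `<|eot_id|>`. The sketch below reproduces only that pattern, as read off the output above. It is not the tokenizer's actual template, which also injects the `Cutting Knowledge Date` and `Today Date` lines into the system message. `render_llama3` is a hypothetical name.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Simplified rendering of the header/eot layout seen in the output above."""
    text = "<|begin_of_text|>"
    for m in messages:
        text += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


print(render_llama3([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
```

For training data the assistant message is part of `messages`, so `add_generation_prompt` stays off; it is only needed at inference time.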

## Train

In [118]:
import bitsandbytes as bnb

def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if 'lm_head' in lora_module_names:  # needed for 16 bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)

modules = find_all_linear_names(model)

In [119]:
modules

['q_proj', 'up_proj', 'v_proj', 'gate_proj', 'k_proj', 'down_proj', 'o_proj']
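
The helper keeps only the last component of each dotted module path, so every layer's `q_proj` collapses into a single target name. The same step in isolation, with made-up module paths (the paths are illustrative, not read from the model):

```python
def leaf_names(module_paths):
    """Return the sorted set of final path components, excluding lm_head."""
    names = {path.split(".")[-1] for path in module_paths}
    names.discard("lm_head")
    return sorted(names)


paths = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.up_proj",
    "lm_head",
]
print(leaf_names(paths))  # ['q_proj', 'up_proj', 'v_proj']
```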

In [120]:
# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules
)
model = get_peft_model(model, peft_config)
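
For intuition on what `r` and `lora_alpha` control: LoRA adds two low-rank matrices per target layer, A of shape (r, d_in) and B of shape (d_out, r), and scales their product by `lora_alpha / r`. A sketch of the resulting arithmetic follows; the layer dimensions are placeholders, not values read from the model.

```python
def lora_param_count(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters added to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / r


D_IN, D_OUT = 2048, 2048  # placeholder layer size (assumption)
print(lora_param_count(16, D_IN, D_OUT), "trainable vs", D_IN * D_OUT, "frozen")
print("scaling:", lora_scaling(32, 16))
```

With `r=16` and `lora_alpha=32` as configured above, the update is scaled by 2.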

In [121]:
training_args = TrainingArguments(
    # output_dir="./results",
    output_dir="/content/drive/MyDrive/data/output",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective batch size per device: 4 * 2 = 8
    optim="paged_adamw_32bit",
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_dir="/content/drive/MyDrive/data/logs",
    logging_steps=10,
    report_to="none",
)


# The model was already wrapped with get_peft_model above,
# so peft_config is not passed to the trainer a second time.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    max_seq_length=512,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
)

trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.



Step,Training Loss
10,3.4527
20,3.3495
30,3.0508
40,2.8873
50,2.8757
60,2.7785
70,2.8478
80,2.5841
90,2.5372
100,2.7014




TrainOutput(global_step=168, training_loss=2.764658507846651, metrics={'train_runtime': 95.6643, 'train_samples_per_second': 13.955, 'train_steps_per_second': 1.756, 'total_flos': 3946351847768064.0, 'train_loss': 2.764658507846651, 'epoch': 3.0})
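
The reported `global_step=168` can be checked by hand from the settings above: 445 examples in batches of 4 give 112 batches per epoch, 2 accumulation steps turn that into 56 optimizer updates, and 3 epochs give 168. A sketch of that arithmetic, assuming a single device and that the trailing partial batch is kept:

```python
import math


def total_optimizer_steps(n_examples, batch_size, grad_accum, epochs):
    """Optimizer updates for a single-device run that keeps the last partial batch."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


print(total_optimizer_steps(445, 4, 2, 3))  # 168
```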

In [127]:
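messages = [
    {"role": "system", "content": "You are a helpful friend"},
    {"role": "user", "content": "What is the size of each gradient period in the costmap"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the inputs to wherever the model lives instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=150, num_return_sequences=1)

# Decode only the newly generated tokens; splitting the full text on "assistant"
# breaks as soon as that word appears in the prompt or in the reply.
prompt_len = inputs["input_ids"].shape[1]
text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

print(text)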

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




Title: Costmap Gradient Periods URL:  Section: getting_started/index.html -------------------------------------------------------------------------------- ## Gradient Periods Gradient period refers to the time interval between two consecutive updates of the costmap. The costmap is updated at regular intervals to reflect the changes in the environment. The costmap is updated in two stages: the costmap is updated every 5 seconds, and the costmap is updated every 10 seconds. The costmap is updated at a minimum interval of 5 seconds. The costmap is updated at a maximum interval of 10 seconds. The costmap is updated at a minimum interval of 5 seconds. The costmap is updated at a maximum interval of 10 seconds. The costmap is updated at a
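
When pulling the reply out of a generation, slicing off the prompt tokens is more robust than splitting the decoded text on a role name, because the role word can also occur inside the prompt or the answer. A toy illustration with plain lists and strings standing in for token ids and decoded text (all values are made up):

```python
def strip_prompt(output_ids, prompt_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]


# Toy decoded text in which the system prompt itself contains the word "assistant"
decoded = "system\n\nYou are an assistant\n\nuser\n\nHi\n\nassistant\n\nHello!"
print(repr(decoded.split("assistant")[1]))  # part of the prompt, not the reply

prompt_ids = [101, 7, 8, 9]         # stand-in prompt token ids
output_ids = prompt_ids + [42, 43]  # a causal LM returns prompt + new tokens
print(strip_prompt(output_ids, len(prompt_ids)))  # [42, 43]
```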
