<a href="https://colab.research.google.com/github/regraded0101/cgm-remote-monitor/blob/master/unsloth_finetune_llama3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install wandb
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-7shw1ipq/unsloth_cb8c1ef3e08e42f5b8644d87e1cfd9c5
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-7shw1ipq/unsloth_cb8c1ef3e08e42f5b8644d87e1cfd9c5
  Resolved https://github.com/unslothai/unsloth.git to commit a4ab920de9282602d587a40df828674bfa9d650e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [2]:
from getpass import getpass
hf_token = getpass("Enter your Hugging Face token: ")

Enter your Hugging Face token: ··········


In [3]:
import wandb
import os
wandb.login()
os.environ["WANDB_PROJECT"] = "llama-3-finetune-diarisation"  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log all model checkpoints


[34m[1mwandb[0m: Currently logged in as: [33mjonfullertravtus[0m ([33mjonfullertravtus-travtus[0m). Use [1m`wandb login --relogin`[0m to force relogin


In [4]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 10000 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token=hf_token
    )

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.0.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
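
A rough back-of-the-envelope check of why `load_in_4bit = True` matters on the T4 reported in the banner above. This is only a sketch: the parameter count (about 8.03 billion) is an assumption that does not appear in this notebook, and the estimate covers the weights alone, ignoring quantization constants, activations, LoRA adapters, and optimizer state.

```python
# Weights-only memory estimate; an approximation, not a measurement.
N_PARAMS = 8.03e9        # ASSUMPTION: approximate parameter count of an 8B Llama model
GPU_MEMORY_GB = 14.748   # "Max memory" value printed in the Unsloth banner


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for bits in (16, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {GPU_MEMORY_GB} GB)")
```

Under these assumptions the 16-bit weights alone exceed the card's memory, while the 4-bit weights leave room for activations and the LoRA adapters.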


In [5]:

model = FastLanguageModel.get_peft_model(
    model,
    r = 8, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
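
The number of trainable parameters that the training banner later reports (20,971,520) can be reproduced by hand from the LoRA rank and the layer shapes. The sketch below assumes the standard Llama 3 8B dimensions: the 4096-wide `q_proj` appears in the model summary printed further down, while the 1024-wide `k_proj`/`v_proj` outputs and the 14336-wide MLP are assumptions not shown anywhere in this notebook.

```python
# LoRA adds two matrices per adapted layer: A (in_features x r) and
# B (r x out_features), i.e. r * (in_features + out_features) parameters.
R = 8            # LoRA rank passed to get_peft_model above
N_LAYERS = 32    # decoder layers, per the "patched 32 layers" message

HIDDEN = 4096    # q_proj / o_proj width, as printed in the model summary
KV_DIM = 1024    # ASSUMPTION: k_proj / v_proj output width (grouped-query attention)
MLP_DIM = 14336  # ASSUMPTION: MLP intermediate width

# (in_features, out_features) for every target module in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in the A and B matrices for one adapted linear layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in PROJECTIONS.values())
total = per_layer * N_LAYERS
print(f"{total:,}")  # prints 20,971,520
```

Because the count scales linearly with `r`, doubling the rank to 16 would double the trainable parameters.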


In [6]:
prompt = """Your job is to take a call transcript and provide a diarized transcript of the recording.

### Input:
{}

### Response:
{}

"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    inputs  = examples['call_transcript']
    outputs = examples['diarized_transcript']

    texts = [prompt.format(input_text, output_text) + EOS_TOKEN for input_text, output_text in zip(inputs, outputs)]
    return {"text": texts}

from datasets import load_dataset
data = load_dataset("json", data_files="transcriptions_processed.json", split="train")
data = data.map(formatting_prompts_func, batched=True)
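
One thing worth checking after formatting: the inference cell later indexes `data["call_transcript"][0]["text"]`, and the formatted example printed at the end of the notebook begins with `{'text': ...`, which suggests each transcript is a dict rather than a plain string. If so, `prompt.format(...)` embeds the dict's repr in the training text, so training prompts differ from inference prompts. The sketch below illustrates the effect with a made-up record; `as_text`, the shortened template, the `<eos>` placeholder, and the sample strings are all hypothetical.

```python
# Hypothetical stand-ins for the notebook's prompt template and EOS token.
TEMPLATE = "### Input:\n{}\n\n### Response:\n{}\n"
EOS = "<eos>"  # placeholder for tokenizer.eos_token

# Made-up record shaped like the dataset appears to be: the transcript is
# nested under a "text" key instead of being stored as a plain string.
record = {
    "call_transcript": {"text": " Hello? Hello."},
    "diarized_transcript": "Speaker 1: Hello?\nSpeaker 2: Hello.",
}


def as_text(value):
    """Return the transcript string for either a str or a {'text': ...} dict."""
    if isinstance(value, dict):
        return value["text"]
    return value


# Formatting the dict directly embeds its repr in the prompt.
naive = TEMPLATE.format(record["call_transcript"], record["diarized_transcript"]) + EOS
# Unwrapping first yields the same prompt shape that inference uses.
clean = TEMPLATE.format(as_text(record["call_transcript"]), record["diarized_transcript"]) + EOS

print("{'text':" in naive)  # True: dict repr leaked into the prompt
print("{'text':" in clean)  # False
```

If the dataset really has this shape, applying a helper like `as_text` inside `formatting_prompts_func` would keep training and inference prompts consistent.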



In [7]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = data,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb",
    ),
)

max_steps is given, it will override any value given in num_train_epochs
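
Since `max_steps` takes precedence over `num_train_epochs`, it helps to work out how much of the dataset these settings actually cover. The sketch below is plain arithmetic on the values above plus the 596 examples reported by the training banner; the `linear_lr` helper is an illustrative approximation of a linear warmup and decay schedule, not the library's own implementation.

```python
NUM_EXAMPLES = 596       # "Num examples" from the training banner
PER_DEVICE_BATCH = 2     # per_device_train_batch_size
GRAD_ACCUM = 4           # gradient_accumulation_steps
NUM_GPUS = 1
MAX_STEPS = 60
WARMUP_STEPS = 5
PEAK_LR = 2e-4

# Examples consumed per optimizer step and over the whole run.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS
fraction_of_epoch = examples_seen / NUM_EXAMPLES


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))


print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({fraction_of_epoch:.0%} of one epoch)")
print(f"lr at steps 0, 5, 30: {linear_lr(0):.1e}, {linear_lr(5):.1e}, {linear_lr(30):.1e}")
```

With these settings the run stops before one full pass over the data; uncommenting `num_train_epochs = 1` and dropping `max_steps` would cover every example once.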


In [8]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 596 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 20,971,520


Step,Training Loss
1,1.4992
2,1.3427
3,1.524
4,1.5054
5,1.3226
6,1.6361
7,1.8624
8,1.187
9,1.6317
10,1.3836


[34m[1mwandb[0m: Adding directory to artifact (./outputs/checkpoint-60)... Done. 5.0s
max_steps is given, it will override any value given in num_train_epochs


In [10]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear4bit(
        


In [27]:
inputs = tokenizer(
    prompt.format(
        data["call_transcript"][0]["text"],
        ""
    ),
    return_tensors="pt",
).to("cuda")

In [29]:
outputs = model.generate(**inputs, use_cache=True)
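
Decoding `outputs[0]` returns the prompt followed by the completion, so the diarized transcript has to be separated from the echoed input. A minimal sketch that splits the decoded string on the `### Response:` marker from the prompt template; `extract_response` is a hypothetical helper, the default `eos_token` value is an assumption (in the notebook, `tokenizer.eos_token` would be passed instead), and the sample string is made up.

```python
RESPONSE_MARKER = "### Response:"  # marker used in the prompt template


def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Return only the text generated after the response marker."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        return ""  # marker missing, e.g. the prompt was truncated
    return tail.replace(eos_token, "").strip()


# Made-up decoded string, shaped like prompt + completion + EOS.
sample = (
    "<|begin_of_text|>Your job is to take a call transcript and provide a "
    "diarized transcript of the recording.\n\n### Input:\n Hello? Hello.\n\n"
    "### Response:\nSpeaker 1: Hello?\nSpeaker 2: Hello.<|end_of_text|>"
)
print(extract_response(sample))
```

An alternative is to slice the generated ids by the prompt length before decoding, which avoids relying on the marker text.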

In [31]:
tokenizer.decode(outputs[0])

"<|begin_of_text|>Your job is to take a call transcript and provide a diarized transcript of the recording.\n\n### Input:\n Hello? Hello. Hey, Chiquela, it's Carly. I just wanted to follow up with you. I emailed earlier, and I just wanted to see what was going on. Well, we can counsel it. Okay, so you're not able to put in the roommate, really? No. I'm sorry. I talked to the assistant manager there. She said she just needed proof of income. Okay, repeat there. The assistant manager over at the Oliver, she said she just needed proof of income in order to take you off the lease. So is your ex just not willing to provide that? I don't know. Do you want to try to figure it out so, and we can push your moving date? Yeah. Okay, so do you think that you guys would be able to put in the, provide the proof of income tomorrow to do the roommate release? Okay, so I mean, right now, if I cancel your application, I can go ahead and refund the money paid minus the application fee. So there's a $500 

In [34]:
data["text"][0]

'Your job is to take a call transcript and provide a diarized transcript of the recording.\n\n### Input:\n{\'text\': " Hello? Hello. Hey, Chiquela, it\'s Carly. I just wanted to follow up with you. I emailed earlier, and I just wanted to see what was going on. Well, we can counsel it. Okay, so you\'re not able to put in the roommate, really? No. I\'m sorry. I talked to the assistant manager there. She said she just needed proof of income. Okay, repeat there. The assistant manager over at the Oliver, she said she just needed proof of income in order to take you off the lease. So is your ex just not willing to provide that? I don\'t know. Do you want to try to figure it out so, and we can push your moving date? Yeah. Okay, so do you think that you guys would be able to put in the, provide the proof of income tomorrow to do the roommate release? Okay, so I mean, right now, if I cancel your application, I can go ahead and refund the money paid minus the application fee. So there\'s a $500 