<div align="center">
  <img src="logo_branding.png" width="250" alt="kavi.ai Logo">
  <h1>Unsloth: Nitro-Boosted LLM Training</h1>
  <p><b>A Premium Training Module by kavi.ai</b></p>
</div>

---

### 💎 **Smarter Overview**
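Unsloth reports training speedups of roughly 2x to 5x and up to 70% lower memory use, achieved with handwritten OpenAI Triton kernels and a memory-efficient gradient checkpointing scheme. Actual gains depend on the model, the GPU, and the sequence length.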

### 🚀 **Enterprise Use Case**
High-throughput training pipelines in the cloud, where every GPU minute saved translates directly into lower operational costs.

### 📈 **Strategic Advantages**
- **Extreme Throughput**: Maximize GPU utilization.
- **Resource Optimization**: Train larger batches on smaller hardware.
- **Seamless Integration**: Works natively with HuggingFace's TRL and PEFT.

---

In [None]:
!pip install transformers --upgrade
!pip install datasets
!pip install "trl[peft]" --upgrade
!pip install -U git+https://github.com/huggingface/trl
!pip install bitsandbytes loralib
!pip install wandb -U
!pip install hf_transfer

In [None]:
!nvidia-smi

In [None]:
!pip install "unsloth[cu122-ampere-torch211] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes


In [None]:
!python -m xformers.info
!python -m bitsandbytes

In [None]:
%env HF_HUB_ENABLE_HF_TRANSFER=True
%env WANDB_PROJECT=LLM-Training-Course
%env WANDB_RUN_ID=UNSLOTH
%env WANDB_NOTEBOOK_NAME={__vsc_ipynb_file__}

In [None]:
import wandb
wandb.login()

In [None]:
import sys
sys.path.append('/root/llm-training-course/')

In [None]:
from datasets import load_dataset
train_ds, eval_ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split=["train[:10%]","train[10%:15%]"])
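
The split strings above carve two non-overlapping slices out of the same `train` split: the first 10% for training and the next 5% for evaluation. The sketch below shows roughly how such percent bounds map to row indices. It assumes boundaries are rounded to the nearest row, and the row count of 40,000 is a made-up figure for illustration, not the real dataset size.

```python
def percent_slice_bounds(num_rows: int, start_pct: int, end_pct: int) -> tuple[int, int]:
    """Approximate [start, end) row range for a split like 'train[10%:15%]'."""
    # Assumption: each boundary is rounded to the nearest row index.
    start = round(num_rows * start_pct / 100)
    end = round(num_rows * end_pct / 100)
    return start, end


num_rows = 40_000  # hypothetical row count, for illustration only
train_bounds = percent_slice_bounds(num_rows, 0, 10)   # "train[:10%]"
eval_bounds = percent_slice_bounds(num_rows, 10, 15)   # "train[10%:15%]"

print("train rows:", train_bounds)
print("eval rows: ", eval_bounds)
```

Because the evaluation slice starts where the training slice ends, no example is shared between the two sets.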

In [None]:
def to_messages(x):  # prepend the prompt as a system message to the chosen conversation
    return {"messages": [{"role": "system", "content": x["prompt"]}] + x["chosen"]}
train_ds = train_ds.map(to_messages)
eval_ds = eval_ds.map(to_messages)
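
To see what this mapping produces, here is the same transformation applied to a single made-up row. The row layout is an assumption inferred from the list concatenation in the cell above: `prompt` is a string and `chosen` is a list of role/content dictionaries.

```python
def to_messages(row: dict) -> dict:
    """Prepend the prompt as a system message to the chosen conversation."""
    return {"messages": [{"role": "system", "content": row["prompt"]}] + row["chosen"]}


# Hypothetical row, not taken from the dataset.
toy_row = {
    "prompt": "What is 2 + 2?",
    "chosen": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
}

result = to_messages(toy_row)
for message in result["messages"]:
    print(message["role"], "->", message["content"])
```

If `chosen` already begins with the user's prompt, as in this toy row, the same text appears twice: once as the system message and once as the user turn. It may be worth inspecting a few real rows to decide whether that duplication is intended.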

In [None]:
columns_to_remove = [c for c in train_ds.column_names if c not in ["messages"]]
train_ds = train_ds.remove_columns(columns_to_remove)

columns_to_remove = [c for c in eval_ds.column_names if c not in ["messages"]]
eval_ds = eval_ds.remove_columns(columns_to_remove)

In [None]:
# Restart the kernel after this cell so the pinned versions are actually loaded.
!pip install --force-reinstall "xformers<0.0.27"
!pip install --force-reinstall "numpy<2.0"

In [None]:
from unsloth import FastLanguageModel

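model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    dtype = None,         # None lets Unsloth pick fp16 or bf16 for the GPU
    load_in_4bit = True,  # keep the base weights in 4-bit to save memory
)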
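
As a rough sense of why loading in 4-bit matters, the sketch below estimates the memory taken by the base weights alone at 16-bit and 4-bit precision. The parameter count of 8 billion is a round approximation, and the estimate ignores quantization constants, layers that stay unquantized, activations, and optimizer state.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8.0e9  # assumed round figure for an 8B-class model

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{int4_gib:.1f} GiB")
```

The 4-bit footprint is a quarter of the 16-bit one, which is what lets an 8B model fit comfortably on a single mid-range GPU before any training overhead is added.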


In [None]:
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving checkpointing variant
    random_state = 3407,
    max_seq_length = 2048,
    use_rslora = False,
    loftq_config = None
)
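
A back-of-the-envelope count shows how small the trainable part is with these settings. Each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` parameters in total. The layer shapes below are assumptions taken from the published Llama 3 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); they are not read from the loaded model.

```python
R = 16
LORA_ALPHA = 16
NUM_LAYERS = 32  # assumed number of decoder layers

# Assumed (in_features, out_features) for each targeted projection.
MODULE_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in the A (r x in) and B (out x r) matrices of one adapter."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in MODULE_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # standard LoRA scaling when use_rslora is False

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapters add about 42 million trainable parameters, a small fraction of the roughly 8 billion frozen base weights.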


In [None]:
from helpers import stream_responses_for_sample
from transformers import GenerationConfig

generation_config =  GenerationConfig(max_new_tokens=50)
sample_conversations = [
    [{"role": "user", "content": "What is the capital of France?"}],
    [{"role": "user", "content": "Write me a JavaScript function that checks if a string is a palindrome."}],
    [{"role": "user", "content": "Given x^2=36-4 what is x?"}]
]
stream_responses_for_sample(model, tokenizer, sample_conversations, generation_config=generation_config)

In [None]:
import os
import torch
from trl import SFTTrainer, SFTConfig

args = SFTConfig(
    output_dir=os.getenv("WANDB_RUN_ID"),
    report_to="wandb",
    num_train_epochs=1.0,
    do_train=True,
    do_eval=True,
    log_level="debug",
    gradient_checkpointing=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=4,
    lr_scheduler_type="constant",
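    fp16=not torch.cuda.is_bf16_supported(),  # fall back to fp16 on GPUs without bf16
    bf16=torch.cuda.is_bf16_supported(),
    evaluation_strategy="steps",  # named eval_strategy in newer transformers releases
    eval_steps=0.2,  # a float below 1 is read as a fraction of total training steps
    logging_steps=0.1,  # likewise a fraction of total training steps
    max_grad_norm=0.3,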
    learning_rate=5e-5
)
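
The batch and step settings above interact, so it helps to work out what they mean in practice. The sketch below computes the effective batch size and turns the fractional `eval_steps` and `logging_steps` into step counts. It assumes a single GPU and that fractions are rounded up to whole steps. The example count of 4,096 is hypothetical, and the total-step formula is an approximation that may differ by a step from what the trainer reports.

```python
import math


def training_schedule(num_examples: int, per_device_batch_size: int,
                      grad_accum_steps: int, num_devices: int = 1,
                      eval_ratio: float = 0.2, logging_ratio: float = 0.1) -> dict:
    """Approximate one-epoch schedule implied by the batch and ratio settings."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Approximation: one optimizer step per full effective batch.
    total_steps = max(1, num_examples // effective_batch)
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        # Assumption: ratios below 1 are rounded up to a whole number of steps.
        "eval_every": math.ceil(total_steps * eval_ratio),
        "log_every": math.ceil(total_steps * logging_ratio),
    }


schedule = training_schedule(num_examples=4096,  # hypothetical dataset size
                             per_device_batch_size=4,
                             grad_accum_steps=8)
print(schedule)
```

With a per-device batch of 4 and 8 accumulation steps, each optimizer update sees 32 examples, and evaluation runs about five times per epoch.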


In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds
)
trainer.train()