### Merging LoRA weights and saving *.safetensors (required for AWQ quantization)

In [None]:
from unsloth import FastLanguageModel
import os
import torch
import gc
# Create the output directory for the merged models
os.makedirs("./raw_models", exist_ok = True)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
# Define checkpoint paths
model_paths = [
    "training_output/checkpoint-25",
    "training_output/checkpoint-50",
    "training_output/checkpoint-75",
    "training_output/checkpoint-100",
    "training_output/checkpoint-125",
    "training_output/checkpoint-150",
    "training_output/checkpoint-175",
    "training_output/checkpoint-200",
]
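
The list above is written out by hand. As a minimal sketch, assuming the checkpoints keep the `checkpoint-<step>` naming shown above, the same list could be built by scanning the training output directory and sorting by step number. The helper name `find_checkpoints` is hypothetical and not part of Unsloth or Transformers.

```python
import re
from pathlib import Path


def find_checkpoints(output_dir):
    """Return checkpoint directories under output_dir, sorted by step number."""
    # Assumption: checkpoint folders are named "checkpoint-<step>"
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for entry in Path(output_dir).iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry.as_posix()))
    # Sort numerically so that checkpoint-100 comes after checkpoint-75
    return [path for _, path in sorted(found)]
```

With this helper, `model_paths = find_checkpoints("training_output")` would replace the hard-coded list. A plain string sort would not work here, because it places `checkpoint-100` before `checkpoint-25`.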

In [None]:
# Load each checkpoint, merge its LoRA adapter into the base weights,
# and save the merged model as safetensors
max_seq_length = 32768
dtype = None # None lets Unsloth auto-detect the dtype

# Start at index 3, i.e. process checkpoint-100 onward
for model_path in model_paths[3:]:
    print(f"Working on {model_path}")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = model_path,
        max_seq_length = max_seq_length,
        dtype = dtype,
    )
    # e.g. "training_output/checkpoint-100" -> "./raw_models/checkpoint-100"
    raw_model_path = f"./raw_models/{os.path.basename(model_path)}"

    # safe_serialization = None forces Unsloth to write *.safetensors files
    model.save_pretrained_merged(raw_model_path, tokenizer, save_method = "merged_16bit", safe_serialization = None)

    # Free memory before loading the next checkpoint:
    # drop the Python references first, then release cached GPU memory
    del model, tokenizer
    gc.collect()
    torch.cuda.empty_cache()

==((====))==  Unsloth 2025.3.9: Fast Qwen2 patching. Transformers: 4.47.1. vLLM: 0.7.3.
   \\   /|    NVIDIA A100-PCIE-40GB. Num GPUs = 1. Max memory: 39.394 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth 2025.3.9 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 166.35 out of 251.73 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 43.10it/s]


Unsloth: Saving tokenizer... Done.
Done.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 165.93 out of 251.73 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 52.72it/s]


Unsloth: Saving tokenizer... Done.
Done.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 165.81 out of 251.73 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 53.04it/s]


Unsloth: Saving tokenizer... Done.
Done.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 165.73 out of 251.73 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 52.08it/s]


Unsloth: Saving tokenizer... Done.
Done.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 165.58 out of 251.73 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 52.29it/s]


Unsloth: Saving tokenizer... Done.
Done.
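
Because the AWQ step needs *.safetensors weights, it can be worth checking each merged directory before quantizing. The following is a minimal sketch, assuming the `./raw_models/<checkpoint-name>` layout used above. `missing_safetensors` is a hypothetical helper, not part of Unsloth.

```python
from pathlib import Path


def missing_safetensors(root):
    """Return names of subdirectories of root that contain no *.safetensors file."""
    missing = []
    for model_dir in sorted(Path(root).iterdir()):
        # Only the top level of each model directory is inspected
        if model_dir.is_dir() and not any(model_dir.glob("*.safetensors")):
            missing.append(model_dir.name)
    return missing
```

Calling `missing_safetensors("./raw_models")` returns an empty list when every merged checkpoint directory holds at least one safetensors file. Any name it does return points to a checkpoint that would need to be saved again.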
