To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (e.g. for Llama.cpp).

[NEW] Llama-3.1 8b, 70b & 405b are trained on a crazy 15 trillion tokens with 128K long context lengths!

**[NEW] Llama 3.2 1B and 3B now supported!! Trained on up to 9 trillion tokens**

Features in the notebook:
1. Uses Maxime Labonne's [FineTome 100K](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset.
2. Convert ShareGPT to HuggingFace format via `standardize_sharegpt`
3. Train on Completions / Assistant only via `train_on_responses_only`
4. Unsloth now supports Torch 2.4, all TRL & Xformers versions & Python 3.12!

## Kaggle is slow - you'll have to wait **5 minutes** for it to install.

I suggest using our free Colab notebooks instead. Our Llama 3.1 8b Colab notebook is linked here: [notebook](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing)

In [1]:
%%capture
!pip install pip3-autoremove
!pip-autoremove torch torchvision torchaudio -y
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
!pip install unsloth

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
    
    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
] # More models at https://huggingface.co/unsloth

# model_name = "unsloth/Llama-3.2-3B-Instruct"
model_name = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name, # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
#     r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
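As a rough check on the "1 to 10%" figure, the adapter size can be worked out by hand. The sketch below assumes the published Llama-3.1 8B layer shapes (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers) and counts the two low-rank matrices LoRA adds to each targeted projection. The helper name `lora_param_count` is illustrative and not part of Unsloth.

```python
# Shapes (in_features, out_features) of the projections targeted above,
# assuming the Llama-3.1 8B architecture.
PROJ_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32

def lora_param_count(r, shapes=PROJ_SHAPES, num_layers=NUM_LAYERS):
    """LoRA adds A (r x in) and B (out x r) per projection: r * (in + out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

for r in (16, 64):
    print(r, lora_param_count(r))
```

Under these assumptions, `r = 64` gives about 168 million trainable weights, roughly 2% of the 8B base model, and `r = 16` gives a quarter of that.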

### Tools

In [3]:
def print_former_k_dict(dct, former_k=1):
    for i, (k, v) in enumerate(dct.items()):
        if i == former_k:
            break
        print(k)
        print(v)
    print()

### 1. Load ARC dataset

In [2]:
DATA_ROOT = "/kaggle/input/arc-prize-2024/"

train_input_path = f'{DATA_ROOT}/arc-agi_training_challenges.json'
train_output_path = f'{DATA_ROOT}/arc-agi_training_solutions.json'

eval_input_path = f'{DATA_ROOT}/arc-agi_evaluation_challenges.json'
eval_output_path = f'{DATA_ROOT}/arc-agi_evaluation_solutions.json'

test_input_path = f'{DATA_ROOT}/arc-agi_test_challenges.json'
sample_path = f'{DATA_ROOT}/sample_submission.json'

path_dict = dict(
    train_input_path=train_input_path,
    train_output_path=train_output_path,
    eval_input_path=eval_input_path,
    eval_output_path=eval_output_path,
    test_input_path=test_input_path,
    sample_path=sample_path,
)

import os
for k, path in path_dict.items():
    print(k, os.path.isfile(path))

train_input_path True
train_output_path True
eval_input_path True
eval_output_path True
test_input_path True
sample_path True


### 2. Process ARC dataset

In [4]:
PromptTemplate = """
You are given pairs of 2D matrices representing grids. In each matrix, 0 indicates the background, while identical non-zero numbers form specific zones and patterns.  
Your task is to identify the transformation rule that links each input matrix to its corresponding output matrix in the Examples. Then, apply this rule to generate an output matrix for the Test Input Matrix.

Specifically, you need to follow the steps below:
1. Focus on the size relationship between the input matrix and the output matrix in the Examples. There must be a clear dependency between the sizes of the matrices. Based on this, you should accurately determine the size of the output matrix from the Test Input Matrix.
2. Understand the transformation rule between the input matrix and the output matrix. These transformations are based on information from regions formed by identical numbers. This includes absolute positions and shapes of regions, relative positional relationships between regions, etc. You must have a clear definition and description of this transformation rule (but do not output it).
3. Based on the clearly understood transformation rule, strictly follow the output matrix size determined in the first step to generate the output matrix.
4. You only need to output the output matrix.

Examples:
{TRAIN}

Test Input Matrix:
{TEST}
"""

In [5]:
import json
for path in [eval_input_path, eval_output_path]:
    print(path)
    with open(path, "r") as f:
        data = json.load(f)

    for i, (k, v) in enumerate(data.items()):
        print(k, v)
        break

/kaggle/input/arc-prize-2024//arc-agi_evaluation_challenges.json
00576224 {'test': [{'input': [[3, 2], [7, 8]]}], 'train': [{'input': [[8, 6], [6, 4]], 'output': [[8, 6, 8, 6, 8, 6], [6, 4, 6, 4, 6, 4], [6, 8, 6, 8, 6, 8], [4, 6, 4, 6, 4, 6], [8, 6, 8, 6, 8, 6], [6, 4, 6, 4, 6, 4]]}, {'input': [[7, 9], [4, 3]], 'output': [[7, 9, 7, 9, 7, 9], [4, 3, 4, 3, 4, 3], [9, 7, 9, 7, 9, 7], [3, 4, 3, 4, 3, 4], [7, 9, 7, 9, 7, 9], [4, 3, 4, 3, 4, 3]]}]}
/kaggle/input/arc-prize-2024//arc-agi_evaluation_solutions.json
00576224 [[[3, 2, 3, 2, 3, 2], [7, 8, 7, 8, 7, 8], [2, 3, 2, 3, 2, 3], [8, 7, 8, 7, 8, 7], [3, 2, 3, 2, 3, 2], [7, 8, 7, 8, 7, 8]]]
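To make the structure above concrete: every task holds `train` pairs of `input`/`output` grids plus `test` inputs, and each grid is a list of rows. A small sketch, using the first training pair of task `00576224` copied from the output above, shows how to read off the grid sizes. `grid_shape` is a hypothetical helper, not part of the notebook's pipeline.

```python
def grid_shape(grid):
    """Return (rows, cols) of a rectangular grid given as a list of lists."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    # ARC grids are rectangular; fail loudly if a row has a different width.
    assert all(len(row) == cols for row in grid), "ragged grid"
    return rows, cols

# First training pair of task 00576224, copied from the output above.
pair = {
    "input": [[8, 6], [6, 4]],
    "output": [
        [8, 6, 8, 6, 8, 6],
        [6, 4, 6, 4, 6, 4],
        [6, 8, 6, 8, 6, 8],
        [4, 6, 4, 6, 4, 6],
        [8, 6, 8, 6, 8, 6],
        [6, 4, 6, 4, 6, 4],
    ],
}

print(grid_shape(pair["input"]), "->", grid_shape(pair["output"]))
```

Here the 2x2 input maps to a 6x6 output, which is the kind of size relationship step 1 of the prompt asks the model to notice.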


In [6]:
import json
from tqdm import tqdm


def process_input(v):
    test_input_matrix = v["test"][0]
        
    examples_list = v["train"]
    examples = "\n".join([json.dumps(exp) for exp in examples_list])

    input_prompt = PromptTemplate.format(
        TRAIN=examples,
        TEST=test_input_matrix,
    )
    return input_prompt

def process_output(v):
    return json.dumps(v[0])

def process_input_output(input_path, output_path=None):
    
    with open(input_path, "r") as f:
        input_data = json.load(f)
        
    prompts = dict()
            
    for i, (k, v) in enumerate(tqdm(input_data.items())):
        input_prompt = process_input(v)
        prompts[k] = dict(
            input=input_prompt,
        )
        
    if output_path is not None:
        with open(output_path, "r") as f:
            output_data = json.load(f)
        
        for i, (k, v) in enumerate(tqdm(output_data.items())):
            output_prompt = process_output(v)
            prompts[k]["output"] = output_prompt
                                   
    return prompts
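One detail worth noticing in `process_input`: the training examples are serialized with `json.dumps`, while the test item is handed to `str.format` as a Python dict, so it is rendered in Python's `repr` style. A small sketch of the difference, using the test entry of task `00576224` shown earlier:

```python
import json

test_item = {"input": [[3, 2], [7, 8]]}  # the test entry of task 00576224

as_repr = str(test_item)         # what str.format() inserts for a dict
as_json = json.dumps(test_item)  # the style used for the training examples

print(as_repr)
print(as_json)
```

The two differ only in quoting (`'input'` versus `"input"`). If you prefer one style throughout, `json.dumps(v["test"][0])` would be one way to get it, but a model fine-tuned on the current prompts should be queried with the same format it was trained on.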

In [7]:
train_prompts = process_input_output(input_path=train_input_path, output_path=train_output_path)
eval_prompts = process_input_output(input_path=eval_input_path, output_path=eval_output_path)
test_prompts = process_input_output(input_path=test_input_path)
                               
print_former_k_dict(train_prompts)
print_former_k_dict(eval_prompts)
print_former_k_dict(test_prompts)

100%|██████████| 400/400 [00:00<00:00, 5614.23it/s]
100%|██████████| 400/400 [00:00<00:00, 31990.73it/s]
100%|██████████| 400/400 [00:00<00:00, 3586.17it/s]
100%|██████████| 400/400 [00:00<00:00, 21758.36it/s]
100%|██████████| 100/100 [00:00<00:00, 4960.09it/s]

007bbfb7
{'input': '\nYou are given pairs of 2D matrices representing grids. In each matrix, 0 indicates the background, while identical non-zero numbers form specific zones and patterns.  \nYour task is to identify the transformation rule that links each input matrix to its corresponding output matrix in the Examples. Then, apply this rule to generate an output matrix for the Test Input Matrix.\n\nSpecifically, you need to follow the steps below:\n1. Focus on the size relationship between the input matrix and the output matrix in the Examples. There must be a clear dependency between the sizes of the matrices. Based on this, you should accurately determine the size of the output matrix from the Test Input Matrix.\n2. Understand the transformation rule between the input matrix and the output matrix. These transformations are based on information from regions formed by identical numbers. This includes absolute positions and shapes of regions, relative positional relationships between re




### 3. Format 

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompt_func(k, input_output):
    convo = [
        {
            'content': input_output["input"],
            'role': 'user',
        },
        {
            'content': input_output["output"],
            'role': 'assistant',
        }
    ]
    
    text = tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)
    
    return dict(
        key=k,
        conversations=convo,
        text=text,
    )

In [None]:
train_dataset_list = [formatting_prompt_func(k, v) for k, v in tqdm(train_prompts.items())]

In [None]:
for k, v in train_dataset_list[0].items():
    print(k, v, sep=": ")

##### Transform data to Dataset

In [None]:
train_dataset_dict = {
    k: [item[k] for item in train_dataset_list] for k in train_dataset_list[0]
}

from datasets import Dataset
dataset = Dataset.from_dict(train_dataset_dict)

print(type(dataset))
print(len(dataset))
print(dataset)
print(dataset[1])

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This run trains for 5 epochs (`num_train_epochs = 5`); for a quick smoke test, set `max_steps = 60`, which overrides the epoch count. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 5, # Number of full passes over the training set.
#         max_steps = 60, # Uncomment for a short test run (overrides num_train_epochs).
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
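To get a feel for how long this configuration runs, the number of optimizer steps can be estimated from the settings above. The sketch assumes a single GPU and the 400 training prompts reported by the progress bar earlier; the exact rounding may differ slightly between Trainer versions.

```python
import math

num_examples = 400    # size of train_prompts, per the progress bar above
per_device_batch = 2  # per_device_train_batch_size
grad_accum = 4        # gradient_accumulation_steps
num_epochs = 5        # num_train_epochs
num_devices = 1       # assumption: a single GPU

# Examples contributing to one optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices
# Mini-batches per epoch, then optimizer updates per epoch.
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = steps_per_epoch * num_epochs

print(effective_batch, steps_per_epoch, total_steps)
```

Under these assumptions the run makes 50 updates per epoch with an effective batch of 8, or 250 updates in total, so `warmup_steps = 5` covers the first 2% of training.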

We also use Unsloth's `train_on_responses_only` method to compute the loss only on the assistant outputs and ignore the user's inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

We verify masking is actually done:

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

We can see the System and Instruction prompts are successfully masked!
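The idea behind the masking can be sketched in a few lines. This is a simplified illustration with toy token ids, not Unsloth's implementation (which works from the `instruction_part` and `response_part` markers passed above): every label before the assistant's reply is set to `-100`, the value PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_prompt_tokens(input_ids, response_start):
    """Copy input_ids into labels, hiding everything before response_start."""
    return [
        IGNORE_INDEX if i < response_start else tok
        for i, tok in enumerate(input_ids)
    ]

# Toy token ids: the first four stand for the user turn, the rest for the reply.
input_ids = [101, 7, 8, 9, 42, 43, 44]
labels = mask_prompt_tokens(input_ids, response_start=4)
print(labels)
```

Only the last three positions keep their token ids, so only the reply contributes to the loss.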

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

In [None]:
trainer_stats = trainer.train()

In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

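The streaming example further down samples with `min_p = 0.1` and `temperature = 1.5`. Read this [Tweet](https://x.com/menhguin/status/1826132708508213629) for more information on why. The `get_answer` helper instead uses greedy decoding (`do_sample=False`), so the predicted grids are deterministic.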
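For readers unfamiliar with `min_p`, here is a minimal sketch of the filtering rule on a toy distribution: tokens whose probability falls below `min_p` times the top token's probability are dropped, and the remainder is renormalized. The function `min_p_filter` is illustrative only; real samplers apply the same idea to the model's logits.

```python
def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.5, 0.3, 0.15, 0.04, 0.01]  # toy next-token distribution
filtered = min_p_filter(probs, min_p=0.1)
print([round(p, 3) for p in filtered])
```

With a top probability of 0.5 the cutoff is 0.05, so the two least likely tokens are removed. Because the cutoff scales with the model's confidence, a high temperature can be combined with it without letting very unlikely tokens through.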

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
pass

In [9]:
def get_answer(model, tokenizer, prompt, max_new_tokens=1024):
    """Generate a greedy completion for one user prompt and return only the new text."""
    messages = [
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True, # Must add for generation
        return_tensors = "pt",
    ).to("cuda")

    # Greedy decoding (do_sample=False) keeps the predicted grid deterministic.
    gen_tokens = model.generate(input_ids = inputs, max_new_tokens = max_new_tokens,
                                use_cache = True, do_sample = False)

    # generate() returns prompt + completion; keep only the newly generated tokens.
    new_tokens = gen_tokens[:, inputs.shape[1]:]
    answers = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)

    return answers[0].strip()

In [None]:
for i, (k, v) in enumerate(train_prompts.items()):
#     print(k, v, sep=": ")
    gt = v["output"]
    prompt = v["input"]
    answer = get_answer(model, tokenizer, prompt)
    
    print(answer)
    print(gt)
    
    if i == 1:
        break
    

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting the whole time!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

### Load LoRA Model

In [10]:
# model_path = "/kaggle/input/arc_lora_unsloth_llama-3.2-3b-instruct-bnb-4bit_e5/transformers/default/1"
model_path = "/kaggle/input/arc_lora_model_llama31_8b_instruct_r64/transformers/default/1"
# model_path = "lora_model"

In [11]:
from unsloth import FastLanguageModel

max_seq_length = 1024 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_path, # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
pass

==((====))==  Unsloth 2024.11.5: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

Unsloth 2024.11.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [None]:
for i, (k, v) in enumerate(train_prompts.items()):
#     print(k, v, sep=": ")
    gt = v["output"]
    prompt = v["input"]
    answer = get_answer(model, tokenizer, prompt, max_new_tokens=max_seq_length)
    
    print(answer)
    print(gt)
    
    break 

In [None]:
c = 0
n = 0
for i, (k, v) in enumerate(eval_prompts.items()):
    
    if i == 20:
        break
        
#     print(k, v, sep=": ")
    gt = v["output"]
    prompt = v["input"]
    answer = get_answer(model, tokenizer, prompt, max_new_tokens=max_seq_length)
    
    print(i, k)
    print("ans:", answer)
    print("gt: ", gt)
    
    if answer == gt:
        c += 1
    n += 1

    print(c, n, sep="/")     
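The loop above compares `answer` and `gt` as raw strings, so a correct grid with different spacing would be counted as wrong. A possible alternative, sketched here with a hypothetical helper `grids_match`, is to compare the parsed grids and treat unparsable answers as incorrect.

```python
import json

def grids_match(answer, ground_truth):
    """Compare two JSON-encoded grids by value; unparsable answers count as wrong."""
    try:
        return json.loads(answer) == json.loads(ground_truth)
    except (json.JSONDecodeError, TypeError):
        return False

print(grids_match("[[1,2],[3,4]]", "[[1, 2], [3, 4]]"))
print(grids_match("[[1, 2], [3,", "[[1, 2], [3, 4]]"))
```

The first call matches despite the different whitespace; the second is a truncated answer and is rejected.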

In [12]:
from tqdm import tqdm

solutions = dict()

for i, (k, v) in enumerate(tqdm(test_prompts.items())):
        
#     print(k, v, sep=": ")
#     gt = v["output"]
    prompt = v["input"]
    answer = get_answer(model, tokenizer, prompt, max_new_tokens=max_seq_length)
    
    solutions[k] = answer
    
    if i % 10 == 0:
        print(i, k)
        print(prompt)
        print(answer)
        
        with open("solutions.json", "w") as f:
            json.dump(solutions, f)
    
with open("solutions.json", "w") as f:
    json.dump(solutions, f)

  0%|          | 0/100 [00:00<?, ?it/s]The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
  1%|          | 1/100 [00:21<35:06, 21.28s/it]

0 007bbfb7

You are given pairs of 2D matrices representing grids. In each matrix, 0 indicates the background, while identical non-zero numbers form specific zones and patterns.  
Your task is to identify the transformation rule that links each input matrix to its corresponding output matrix in the Examples. Then, apply this rule to generate an output matrix for the Test Input Matrix.

Specifically, you need to follow the steps below:
1. Focus on the size relationship between the input matrix and the output matrix in the Examples. There must be a clear dependency between the sizes of the matrices. Based on this, you should accurately determine the size of the output matrix from the Test Input Matrix.
2. Understand the transformation rule between the input matrix and the output matrix. These transformations are based on information from regions formed by identical numbers. This includes absolute positions and shapes of regions, relative positional relationships between regions, etc. You

 11%|█         | 11/100 [10:35<1:24:01, 56.65s/it]

10 09629e4f

 21%|██        | 21/100 [21:12<1:29:07, 67.68s/it]

20 1190e5a7

 31%|███       | 31/100 [26:36<31:01, 26.98s/it]

30 1cf80156

 41%|████      | 41/100 [33:23<27:46, 28.24s/it]

40 22168020

 51%|█████     | 51/100 [37:23<26:01, 31.86s/it]

50 25d487eb

 61%|██████    | 61/100 [49:12<50:56, 78.38s/it]

60 29ec7d0e

 71%|███████   | 71/100 [57:44<30:21, 62.81s/it]

70 3345333e

 81%|████████  | 81/100 [1:12:03<25:18, 79.94s/it]

80 3aa6fb7a

 91%|█████████ | 91/100 [1:16:31<04:10, 27.80s/it]

90 3f7978a0

100%|██████████| 100/100 [1:26:24<00:00, 51.85s/it]


### Format output

In [14]:
# Outputs were truncated because of output-length and OOM limits

def complete_matrix_string(matrix_string):
    # Count the opening and closing square brackets
    open_brackets = matrix_string.count('[')
    close_brackets = matrix_string.count(']')

    # Work out how many closing brackets are missing and append them
    missing_brackets = open_brackets - close_brackets
    if missing_brackets > 0:
        matrix_string += ']' * missing_brackets

    # Check that the string parses as JSON and return the parsed matrix;
    # fall back to a placeholder [[0]] grid if it is still invalid
    try:
        return json.loads(matrix_string)
    except json.JSONDecodeError:
        return [[0]]
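
To make the bracket-completion idea concrete, here is a small self-contained sketch. The helper name `close_truncated_matrix` is hypothetical (it is not part of this notebook), and it assumes that model outputs are JSON-style nested lists that were cut off at some arbitrary character.

```python
import json

def close_truncated_matrix(text):
    """Append any missing ']' characters, then try to parse the result as JSON."""
    missing = text.count("[") - text.count("]")
    if missing > 0:
        text += "]" * missing
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Still invalid (for example a dangling comma): use a placeholder grid.
        return [[0]]

# Cut off right after a complete number: the grid can be recovered.
print(close_truncated_matrix("[[1, 2], [3, 4"))
# Cut off right after a comma: "[3,]" is not valid JSON, so the placeholder is used.
print(close_truncated_matrix("[[1, 2], [3,"))
```

Note that this kind of repair can yield ragged rows, and a multi-digit value cut off mid-number would be parsed as a shorter number without any warning, so the repaired grid is only a best effort.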

In [22]:
# Temporary: solutions are not yet split per test input
import json
from tqdm import tqdm

with open(test_input_path, "r") as f:
    input_data = json.load(f)
    
ret = {}
    
for k in tqdm(solutions):
    
    try:
        solu = json.loads(solutions[k])
    except json.decoder.JSONDecodeError:
        solu = complete_matrix_string(solutions[k])
    
    # Solutions are not split yet, so every test input gets the same grid for both attempts
    ret[k] = [
        dict(attempt_1=solu, attempt_2=solu)
        for i in range(len(input_data[k]["test"]))
    ]
        
with open("submission.json", "w") as f:
    json.dump(ret, f)

100%|██████████| 100/100 [00:00<00:00, 22499.22it/s]
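
Before uploading, it may be worth sanity-checking the structure that was just written. The sketch below assumes the layout built in the cell above (task ID mapped to a list of dicts, each with `attempt_1` and `attempt_2` grids); the function name `check_submission` is hypothetical and not part of any competition tooling.

```python
def check_submission(submission):
    """Return a list of human-readable problems found in a submission dict."""
    problems = []
    for task_id, entries in submission.items():
        for i, entry in enumerate(entries):
            for key in ("attempt_1", "attempt_2"):
                grid = entry.get(key)
                # A valid grid is a non-empty list of rows, each a list of ints.
                is_grid = (
                    isinstance(grid, list)
                    and len(grid) > 0
                    and all(
                        isinstance(row, list)
                        and all(isinstance(v, int) and not isinstance(v, bool) for v in row)
                        for row in grid
                    )
                )
                if not is_grid:
                    problems.append(f"{task_id}[{i}].{key}: not a list of integer rows")
    return problems

good = {"007bbfb7": [{"attempt_1": [[7, 0], [0, 7]], "attempt_2": [[7, 0], [0, 7]]}]}
bad = {"00d62c1b": [{"attempt_1": "[[0, 1", "attempt_2": [[0]]}]}
print(check_submission(good))
print(check_submission(bad))
```

Calling `check_submission(ret)` right before `json.dump` would flag any entry that is still a raw string rather than a parsed grid.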


In [23]:
print_former_k_dict(ret, former_k=5)

007bbfb7
[{'attempt_1': [[7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 7, 0, 7, 7, 0, 7, 7, 0], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 7, 0, 7, 7, 0, 7, 7, 0], [7, 0, 7, 0, 0, 0, 0, 0, 0], [7, 0, 7, 0, 0, 0, 0, 0, 0], [7, 7, 0, 0, 0, 0, 0, 0, 0]], 'attempt_2': [[7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 7, 0, 7, 7, 0, 7, 7, 0], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 0, 7, 7, 0, 7, 7, 0, 7], [7, 7, 0, 7, 7, 0, 7, 7, 0], [7, 0, 7, 0, 0, 0, 0, 0, 0], [7, 0, 7, 0, 0, 0, 0, 0, 0], [7, 7, 0, 0, 0, 0, 0, 0, 0]]}]
00d62c1b
[{'attempt_1': [[0]], 'attempt_2': [[0]]}]
017c7c7b
[{'attempt_1': [[2, 2, 2], [0, 2, 0], [0, 2, 0], [2, 2, 2], [0, 2, 0], [0, 2, 0], [2, 2, 2], [0, 2, 0], [0, 2, 0]], 'attempt_2': [[2, 2, 2], [0, 2, 0], [0, 2, 0], [2, 2, 2], [0, 2, 0], [0, 2, 0], [2, 2, 2], [0, 2, 0], [0, 2, 0]]}]
025d127b
[{'attempt_1': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 4, 4, 4, 4, 4, 4, 0, 0], [0, 0, 4, 0, 0, 0, 0, 0, 4, 0], [0, 0, 0, 4, 0, 0, 0, 0

# Needless

You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. All other methods, such as `q4_k_m`, are allowed too. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).
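
If you are not sure which GGUF files were actually written, a small helper can list them. This is a minimal sketch using only the standard library. The function name `find_gguf_files` is hypothetical, and it assumes the exported files end in `.gguf` and sit somewhere under the directory you pass in.

```python
from pathlib import Path

def find_gguf_files(directory="."):
    """Return all .gguf files under `directory`, largest first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*.gguf") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_size, reverse=True)
```

For example, `for p in find_gguf_files("."): print(p, p.stat().st_size)` prints each file with its size in bytes, which makes it easy to tell the differently quantized exports apart.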

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to join projects and keep up with the latest LLM news, feel free to join our [Discord](https://discord.gg/u54VK8m8tk)!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
10. [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
11. [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
12. [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>