To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth?tab=readme-ov-file#-installation-instructions).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for llama.cpp).

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

Features in the notebook:
1. Uses Maxime Labonne's [FineTome 100K](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset.
2. Convert ShareGPT to HuggingFace format via `standardize_sharegpt`
3. Train on Completions / Assistant only via `train_on_responses_only`
4. Unsloth now supports Torch 2.4, all TRL & Xformers versions & Python 3.12!

In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git@nightly git+https://github.com/unslothai/unsloth-zoo.git

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, OpenHermes, etc.
* We support 16bit LoRA or 4bit QLoRA. Both are 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
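The RoPE-scaling bullet above can be illustrated with a minimal sketch of linear RoPE scaling (position interpolation): every position is divided by a scale factor, so a sequence longer than the trained context maps back into the position range the model saw during training. The helper names, the rotary dimension of 8 and the 4096 to 8192 lengths are hypothetical, and Unsloth's internal implementation may differ.

```python
def linear_rope_scale_factor(target_len: int, trained_len: int) -> float:
    """Scale factor for linear RoPE scaling; 1.0 means no scaling is needed."""
    return max(1.0, target_len / trained_len)

def rope_angles(position: float, dim: int, base: float = 10000.0, scale: float = 1.0):
    """Rotation angles for one position; linear scaling divides the position by `scale`."""
    pos = position / scale
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# Hypothetical example: a model trained on 4096 tokens asked to handle 8192.
factor = linear_rope_scale_factor(8192, 4096)
print(factor)  # 2.0
# Under scaling, position 8000 gets the same angles as unscaled position 4000.
print(rope_angles(8000, 8, scale=factor) == rope_angles(4000, 8))  # True
```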

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 8192 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
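`dtype = None` lets Unsloth pick a precision automatically; the banner above reports `CUDA: 8.0`, which corresponds to the A100's compute capability, and `Bfloat16 = TRUE`. A sketch of the rule stated in the `dtype` comment, assuming bf16 is available from compute capability 8.x (Ampere) onwards; `pick_dtype` is a hypothetical helper, not an Unsloth function.

```python
def pick_dtype(compute_capability):
    """Hypothetical helper: bfloat16 on Ampere or newer (major >= 8), float16 otherwise."""
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"

print(pick_dtype((7, 5)))  # Tesla T4 -> float16
print(pick_dtype((7, 0)))  # V100 -> float16
print(pick_dtype((8, 0)))  # A100 -> bfloat16
```

On a real GPU, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.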

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.3.19 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
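The "1 to 10%" figure can be checked by hand. LoRA adds two low-rank matrices, A (r × in) and B (out × r), to each targeted projection, i.e. r·(in + out) parameters. The sketch below assumes the public Llama-3.2-3B dimensions (hidden size 3072, MLP size 8192, 28 layers, 24 query heads and 8 key/value heads); they are not stated in this notebook. Under those assumptions it reproduces the `Trainable parameters = 24,313,856` figure printed when training starts.

```python
# Assumed Llama-3.2-3B dimensions (from its public config, not from this notebook).
hidden = 3072
intermediate = 8192
num_layers = 28
num_heads, num_kv_heads = 24, 8
head_dim = hidden // num_heads      # 128
kv_dim = num_kv_heads * head_dim    # 1024 (grouped-query attention)

r = 16  # LoRA rank used in get_peft_model above

# (in_features, out_features) of each targeted projection in one decoder layer.
projections = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x in) and B (out x r): r * (in + out) parameters per projection.
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in projections.values())
total = per_layer * num_layers
print(per_layer, total)  # 868352 24313856
```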


<a name="Data"></a>
### Data Prep
We now use the `Llama-3.1` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style, but we convert it to HuggingFace's normal multiturn format `("role", "content")` instead of `("from", "value")`. Llama-3 renders multi-turn conversations like below:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'm great thanks!<|eot_id|>
```

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3` and more.
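To make the layout above concrete, here is a hypothetical `render_llama3` helper that reproduces the example string. It is a simplification for illustration, not the template `get_chat_template` installs: the real Llama 3.1 template also inserts the "Cutting Knowledge Date" system header and may trim message content. Note that the role string is copied verbatim into the header, so a turn whose role is still `human` renders as `<|start_header_id|>human<|end_header_id|>` rather than `user`.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Hypothetical, simplified Llama-3 renderer (no date header, no trimming)."""
    text = "<|begin_of_text|>"
    for msg in messages:
        # Each turn: header with the role, blank line, content, end-of-turn token.
        text += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant header so the model continues as the assistant.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

convo = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there! How are you?"},
    {"role": "user", "content": "I'm great thanks!"},
]
print(render_llama3(convo))
```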

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversation"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("parquet", data_files="/content/converted_dataset.parquet", split="train")


We now use `standardize_sharegpt` to convert ShareGPT style datasets into HuggingFace's generic format. This changes the dataset from looking like:
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```
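The renaming can be sketched in plain Python. This is a hypothetical helper, not Unsloth's `standardize_sharegpt`; the alias table is an assumption covering the speaker names used above, and unknown speakers raise an error instead of being passed through.

```python
# Hypothetical alias table; extend it if your data uses other speaker names.
ROLE_ALIASES = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}

def sharegpt_to_hf(convo):
    """Map ShareGPT {"from", "value"} turns to {"role", "content"} turns."""
    out = []
    for turn in convo:
        speaker = turn["from"]
        if speaker not in ROLE_ALIASES:
            # Fail loudly instead of silently emitting a role the template does not expect.
            raise ValueError(f"Unknown speaker: {speaker!r}")
        out.append({"role": ROLE_ALIASES[speaker], "content": turn["value"]})
    return out

example_turns = [
    {"from": "system", "value": "You are an assistant"},
    {"from": "human", "value": "What is 2+2?"},
    {"from": "gpt", "value": "It's 4."},
]
print(sharegpt_to_hf(example_turns))
```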

In [None]:
import json
from datasets import load_dataset

# Load your dataset
dataset = load_dataset("parquet", data_files="/content/converted_dataset.parquet", split="train")

def transform_conversation(example):
    new_convos = []
    # Process each conversation (assuming it's a JSON string)
    for convo in example["conversation"]:
        # Parse the JSON string to a list of dictionaries
        try:
            messages = json.loads(convo)
        except Exception as e:
            # If already a list, use it directly
            messages = convo

        new_messages = []
        for msg in messages:
            # Rename keys and update role if needed
            new_role = msg.get("from", "")
            if new_role == "gpt":
                new_role = "assistant"
            # Build new message dict
            new_msg = {"role": new_role, "content": msg.get("value", "")}
            new_messages.append(new_msg)
        # Convert the list back to a JSON string (if you prefer to store it as a string)
        new_convos.append(json.dumps(new_messages, ensure_ascii=False))
    return {"conversation": new_convos}

# Apply the transformation in batched mode
dataset = dataset.map(transform_conversation, batched=True)

# Optional: view the first example to check the result
print(dataset[0])



{'conversation': '[{"role": "system", "content": "You are a climate change research assistant with expertise in adaptation tracking through document analysis.\\nYour task is to identify the evidence regarding the following questions within the context of climate change adaptation:"}, {"role": "human", "content": "Below are 5 relevant contexts from research papers:\\n1) change will lead to reduced forage and livestock waste.|3.75|0.911| | |I believe that climate change will have a negative impact on agriculture in the Kermanshah province.|3.68|0.948| | |I believe that climate change has a negative impact on wheat production in the Kermanshah province.|3.70|0.832| | |I think that the amount of milk and meat cattle will decrease due to climate change.|3.50|1.048| | |I believe that diseases and pests will increase due to climate change.|3.64|0.953| | |I believe that climate change will lead to biodiversity depletion.|3.49|1.023| | |Considering any potential effects of climate change there 

In [None]:
import json
from datasets import load_dataset

# Load dataset
dataset = load_dataset("parquet", data_files="/content/converted_dataset.parquet", split="train")

# Function to standardize keys and role names in conversation messages
def transform_conversation(example):
    new_convos = []

    for convo in example["conversation"]:
        # Ensure convo is a list, not a string
        if isinstance(convo, str):
            try:
                messages = json.loads(convo)  # Convert JSON string to list
            except json.JSONDecodeError:
                # Keep the row (a batched map must return one value per input row) but leave it empty
                print("Error decoding JSON, replacing conversation with an empty list...")
                messages = []
        else:
            messages = convo

        new_messages = []
        for msg in messages:
            new_role = msg.get("from", "")
            # Map ShareGPT speaker names to the roles the chat template and masking expect
            if new_role == "gpt":
                new_role = "assistant"
            elif new_role == "human":
                new_role = "user"

            new_msg = {
                "role": new_role,
                "content": msg.get("value", ""),
            }
            new_messages.append(new_msg)

        new_convos.append(new_messages)  # Keep as list, don't convert to JSON string

    return {"conversation": new_convos}

# Apply transformation
dataset = dataset.map(transform_conversation, batched=True)

# Debugging: Print first example to verify
print(dataset[0])



{'conversation': [{'content': 'You are a climate change research assistant with expertise in adaptation tracking through document analysis.\nYour task is to identify the evidence regarding the following questions within the context of climate change adaptation:', 'role': 'system'}, {'content': "Below are 5 relevant contexts from research papers:\n1) change will lead to reduced forage and livestock waste.|3.75|0.911| | |I believe that climate change will have a negative impact on agriculture in the Kermanshah province.|3.68|0.948| | |I believe that climate change has a negative impact on wheat production in the Kermanshah province.|3.70|0.832| | |I think that the amount of milk and meat cattle will decrease due to climate change.|3.50|1.048| | |I believe that diseases and pests will increase due to climate change.|3.64|0.953| | |I believe that climate change will lead to biodiversity depletion.|3.49|1.023| | |Considering any potential effects of climate change there could be for society

In [None]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)


We look at how the conversations are structured for item 5:

In [None]:
dataset[5]["conversation"]

[{'content': 'You are a climate change research assistant with expertise in adaptation tracking through document analysis.\nYour task is to identify the evidence regarding the following questions within the context of climate change adaptation:',
  'role': 'system'},
 {'content': 'Below are 5 relevant contexts from research papers:\n1) to other crops?|Yes/No| |(12) If yes, which crops are you most likely to switch to?|Open| |(13) Given all other conditions remain the same, if the precipitation is increasing in the long run, will you switch to other crops?|Yes/No| |(14) If yes, which crops are you most likely to switch to?|Open| |(15) Given all other conditions remain the same, if the precipitation is decreasing in the long run, will you switch to other crops?|Yes/No| |(16) If yes, which crops are you most likely to switch to?|Open| not far from each other, the walking time from home to the village border is about 0.58 hours. Household size varies from 1 to 12 members with a mean of 4.9

And we see how the chat template transformed these conversations.

**[Notice]** Llama 3.1 Instruct's default chat template adds `"Cutting Knowledge Date: December 2023\nToday Date: 26 July 2024"`, so do not be alarmed!

In [None]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\nYou are a climate change research assistant with expertise in adaptation tracking through document analysis.\nYour task is to identify the evidence regarding the following questions within the context of climate change adaptation:<|eot_id|><|start_header_id|>human<|end_header_id|>\n\nBelow are 5 relevant contexts from research papers:\n1) to other crops?|Yes/No| |(12) If yes, which crops are you most likely to switch to?|Open| |(13) Given all other conditions remain the same, if the precipitation is increasing in the long run, will you switch to other crops?|Yes/No| |(14) If yes, which crops are you most likely to switch to?|Open| |(15) Given all other conditions remain the same, if the precipitation is decreasing in the long run, will you switch to other crops?|Yes/No| |(16) If yes, which crops are you most likely to switch to?|Open| not far from each othe

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This run trains for 30 full epochs (`max_steps = -1` lets `num_train_epochs` decide the length). For a quick test, set `max_steps = 60` instead; a positive `max_steps` overrides `num_train_epochs`. We also support TRL's `DPOTrainer`!
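With `max_steps = -1`, the number of optimizer steps follows from the dataset size, batch size and epoch count. A sketch of that arithmetic, assuming no packing and a dataloader that keeps the last partial batch (the Hugging Face default); `total_optimizer_steps` is a hypothetical helper. It reproduces the `Total steps = 420` (batch 32, 30 epochs) and `Total steps = 84` (batch 16, 3 epochs) lines in the training logs.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, num_epochs, num_gpus=1):
    """Approximate total steps when the length is set by epochs rather than max_steps."""
    # The last partial batch is kept, hence the ceiling.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    # One optimizer step per `grad_accum` batches, at least one per epoch.
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * num_epochs

print(total_optimizer_steps(440, 32, 1, 30))  # 420
print(total_optimizer_steps(440, 16, 1, 3))   # 84
```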

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,  # A100 can handle long sequences
    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc = 8,   # Utilize more CPU threads
    packing = True,         # Requested, but Unsloth may disable packing (see its warning)
    args = TrainingArguments(
        per_device_train_batch_size = 32,       # 🚀 Big speedup on A100
        gradient_accumulation_steps = 1,        # No accumulation: one optimizer step per batch
        warmup_steps = 100,
        num_train_epochs = 30,                  # Many passes over 440 examples; reduce if the model overfits
        max_steps = -1,                         # -1 = derive the step count from num_train_epochs
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),     # Fall back to fp16 on GPUs without bf16 (e.g. T4)
        bf16 = is_bfloat16_supported(),         # ✅ A100 supports bf16
        logging_steps = 10,
        optim = "adamw_8bit",                   # 8-bit optimizer saves memory
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",                     # Set to "wandb" to enable WandB logging
        save_strategy = "steps",
        save_steps = 420,                       # Equal to the total step count: one checkpoint at the end
    ),
)


Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!


We also use Unsloth's `train_on_responses_only` function to only train on the assistant outputs and ignore the loss on the user's inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

We verify masking is actually done:

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\nYou are a climate change research assistant with expertise in adaptation tracking through document analysis.\nYour task is to identify the evidence regarding the following questions within the context of climate change adaptation:<|eot_id|><|start_header_id|>human<|end_header_id|>\n\nBelow are 5 relevant contexts from research papers:\n1) to other crops?|Yes/No| |(12) If yes, which crops are you most likely to switch to?|Open| |(13) Given all other conditions remain the same, if the precipitation is increasing in the long run, will you switch to other crops?|Yes/No| |(14) If yes, which crops are you most likely to switch to?|Open| |(15) Given all other conditions remain the same, if the precipitation is decreasing in the long run, will you switch to other crops?|Yes/No| |(16) If yes, which crops are you most likely to switch to?|Open| not f

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

We can see the System and Instruction prompts are successfully masked!
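Response-only training works by setting the label of every non-response token to -100, the default `ignore_index` of PyTorch's cross-entropy loss, which is also why the check above swaps -100 for a space before decoding. A toy sketch on string tokens follows; the real `train_on_responses_only` matches multi-token marker sequences in token ids, and the marker names here are hypothetical.

```python
IGNORE_INDEX = -100  # label value PyTorch's cross-entropy loss ignores by default

def mask_non_responses(tokens, instruction_marker, response_marker):
    """Toy response-only masking on a list of string tokens.

    Tokens after `response_marker` keep their labels until the next
    `instruction_marker`; everything else (system, user turns, markers) is ignored.
    """
    labels = []
    in_response = False
    for tok in tokens:
        if tok == response_marker:
            in_response = True
            labels.append(IGNORE_INDEX)  # the header itself is not trained on
        elif tok == instruction_marker:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok if in_response else IGNORE_INDEX)
    return labels

# Hypothetical single-token markers standing in for the real header sequences.
tokens = ["<sys>", "You are...", "<user>", "Hi", "<asst>", "Hello", "<eot>",
          "<user>", "Bye", "<asst>", "Ciao", "<eot>"]
labels = mask_non_responses(tokens, "<user>", "<asst>")
print(labels)
```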

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
22.729 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 440 | Num Epochs = 30 | Total steps = 420
O^O/ \_/ \    Batch size per device = 32 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (32 x 1 x 1) = 32
 "-____-"     Trainable parameters = 24,313,856/3,000,000,000 (0.81% trained)


| Step | Training Loss |
|-----:|--------------:|
| 10 | 1.5325 |
| 20 | 1.4943 |
| 30 | 1.4433 |
| 40 | 1.4464 |
| 50 | 1.4032 |
| 60 | 1.2617 |
| 70 | 1.2659 |
| 80 | 1.1078 |
| 90 | 1.0388 |
| 100 | 0.8803 |


In [None]:
trainer_stats = trainer.train(resume_from_checkpoint = True)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 440 | Num Epochs = 3 | Total steps = 84
O^O/ \_/ \    Batch size per device = 16 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (16 x 1 x 1) = 16
 "-____-"     Trainable parameters = 24,313,856/3,000,000,000 (0.81% trained)


| Step | Training Loss |
|-----:|--------------:|
| 60 | 2.4643 |


KeyboardInterrupt: 

In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

<a name="Inference"></a>
### Inference
Let's run the model! You can change the system prompt and the user message in `messages`.

We use `min_p = 0.1` and `temperature = 1.5`. Read this [Tweet](https://x.com/menhguin/status/1826132708508213629) for more information on why.
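A sketch of what these two settings do to one next-token distribution: temperature 1.5 flattens the softmax, then min-p keeps only tokens whose probability is at least `min_p` times that of the most likely token. It assumes temperature is applied before the min-p cut; `min_p_filter` and the example logits are hypothetical.

```python
import math

def min_p_filter(logits, temperature=1.5, min_p=0.1):
    """Renormalised distribution after temperature scaling and min-p filtering."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # The cut-off scales with the top probability: confident steps prune more tokens.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

# The least likely token falls below 10% of the top probability and is set to 0.0.
print(min_p_filter([3.0, 2.0, 0.0, -3.0]))
```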

In [None]:
messages = [
    {
        "role": "system",
        "content": "You are a climate change research assistant with expertise in adaptation tracking through document analysis. Your task is to identify the evidence regarding the following questions within the context of climate change adaptation."
    },
    {
        "role": "user",
        "content": """Below are 3 relevant contexts from research papers:
1) change (Measham et al., 2011; Preston, Westaway, & Yuen, 2011) and the severity of its adverse impacts, adaptation actions to reduce risks of climate change are urgent (Campbell et al., 2016). In the view of Asfaw et al. (2015) adaptation to climate change is an adjustment in natural or human systems in response to actual or expected climatic stimuli or their effects, which moderates harm or exploits beneficial opportunities. Page 2 of 15 --- # Lemessa et al., Cogent Food & Agriculture (2019), 5: 1640835 https://doi.org/10.1080/23311932.2019.1640835 The literature on climate change impacts and the vulnerability of agricultural sector is increasingly recognizing the important role of adaptation strategies (Smit & Skinner, 2002). Adaptation strategies are required as they contribute to mitigate the high incidence of climate change related problems. These strategies are relevant for smallholder farmers who are highly exposed to the threats of climate changes. The relevance of adaptations strategies plied by smallholders farmers are pronged into four folds: (i) it enables them to evaluate and choose the best alternatives, (ii) it helps them to practice new way of production system to address climate-related shocks and ensures their survival, (iii) it assists them to change/modify production systems in line with the effect of climate change, and (iv) it helps to design policies to tackle the challenges that climate change is posing on smallholder farmers. Moreover, identifying the adaptation strategies for particular crop than for the entire agricultural sector helps to device an appropriate climate change policies. Climate change adaptation strategy studies on agriculture are abundant (Belay et al., 2017; Campbell et al., 2016; Gbegbelegbe et al., 2017; Mulatu, 2013; Tesfaye & Seifu, 2016), but adaptation strategies on root crops, particularly potato (Solanum tuberosum L.) production was forlorn. While potato is the most popular human staple crop in Ethiopia with the
2) |0|13.57|5.54|9.14| | | | |10| | | | | | | |20|13.85| | | | | | |30| | | | | | | |40| | | | | | | 39.34 --- # Figure 3. Types of climate change-induced challenges. | |Water log|Soil erosion|Drought|Frost|Pest and diseases| |---|---|---|---|---|---| |Percentage|100|93.41|95.33|90.91| | |Percentage| |63.97|69.42| | | # Figure 4. Intensity of climate change-induced challenges. Drought Frost 100 92.24 80 57.65 60 29.71 40 20 5 7.65 0 Little bit Medium Severe Little bit Medium Severe Not a problem Not a problem Soil erosion Water log 100 50.52 80 40.85 60 27.15 40 14.43 20 7.56 16.9 0 Little bit Medium Severe Little bit Medium Severe Not a problem Not a problem 60% of the respondents reported the prevalence of all of the aforementioned climate change-induced problems in their locality. Similar to the finding of the study by Tesfahun (2018), drought, soil erosion, and water log were intensively challenging potato production in the study area as reported by quite half of the respondents. Frost (wurchi/amadey) is the worst type of climate change-induced problem to potatoes production as reported by 90% of the respondents, see Figure 4. Frost (wurchi/amadey) which occurs in two-to-three laps of years not only affects potatoes production but also the production of khat (Chata edulis forsk), the most widely grown cash crop in the study area. As smallholder farmers’ livelihood mainly depends on the production of the two crops, the adverse effects of climate change on their production aggravate food insecurity in the area. --- # 3.4. Three-stage probit regression results Smallholder farmers adopt and exercise different strategies that can reduce the adverse effects of climate change on economic livelihoods and food security. Three major climate change adaptation strategies, namely: adoption of IPVs, irrigation usage and intercropping were practiced in
3) commonly called frost (Wurchi/Amadey). According to Tesfaye and Seifu (2016), the eastern Ethiopian highlands suffer from food production deficits and of high livelihoods vulnerability due to the aforementioned climate change-induced risks. Therefore, the rate of food production has failed to keep pace with the high rate of population growth resulting in high levels of food insecurity. Despite the relevance of studying climate change adaptation strategies in response to food insecurity, empirical studies are strikingly scant on climate change adaptations strategies for root crops. Even the few existing empirical studies considered bundles of adaptation strategies as independent of one another. Moreover, these studies failed to capture the possibilities of reverse causation between the strategies (Acquah, 2011; Belay et al., 2017; Deressa, Hassan, Ringler, Alemu, & Yesuf, 2009; Fosu-Mensah, Vlek, & MacCarthy, 2012; Tesfaye & Seifu, 2016). --- # Methodology # Study setting This study was carried out in eastern part of Ethiopia, particularly eastern Hararghe Zone of Oromia National Regional State. The study area is characterized by high population density, rainfall variability, frequent drought occurrence, crop failure, severe land degradation, and increased vulnerability to chronic food insecurity (Tesfaye & Seifu, 2016). Although the amount and pattern vary locally, rainfall is bimodal in distribution; a short season from March to May is known as Belg, and a longer rainy reason from July to September is known as Kiremt seasons. The amount of rainfall varies between 650 and 750 mm, while the average temperature of the study area varies between 25°C and 30°C. The maximum and minimum mean annual temperatures for the area are 23.8°C and 9.6°C, respectively (Alemayehu, Furi, & Legesse, 2007; Mulugeta et al., 2018). Smallholder farmers in the zone have rich experience in root crops and vegetable production, irrigation usage and intercropping. 
The major crops grown are sorghum and maize

User question:
1. Where exactly in terms of geography is this adaptation response observed?
   If there are more than one location, please provide all that apply.
   Provide details in the format:
   Country name: <country name>,
   Sub-national region: <sub-national region>,

2. Please identify the adaptation response undertaken and, for the adaptation response identified,
   please list the stakeholders involved using the following categories:
   - International or multinational governance institutions
   - Government (national)
   - Government (sub-national)
   - Government (local)
   - Private sector (corporations)
   - Private sector (SME)
   - Civil society (international/multinational/national)
   - Civil society (sub-national or local)
   - Individuals or households
   - Other
   Respond in the format: Stakeholders: <your answer>

3. The depth of the climate adaptation response relates to the degree to which the change is new or transformative.
   Classify the depth as one of the following: Low; Medium; High; Not certain / Insufficient information / Not assessed.
   Respond in the format:
   Depth: <your assessment>,
   Explanation: <your reasoning for this assessment>.
   """
    }
]




In [None]:
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

# Ensure tokenizer has a padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Use EOS as padding token

# Apply chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must be added for generation
    return_tensors="pt"
).to("cuda")

# A single unpadded prompt: every position is a real token, so attend to all of them.
# (Comparing against pad_token_id would also hide end-of-turn tokens inside the prompt
# whenever the pad token is the same as the EOS token.)
attention_mask = torch.ones_like(inputs)

# Generate response with attention mask
outputs = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    max_new_tokens=64,
    use_cache=True,
    temperature=1.5,
    min_p=0.1
)

# Decode only the newly generated tokens (skip the echoed prompt)
decoded_output = tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)
print(decoded_output)


['system\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\nYou are a climate change research assistant with expertise in adaptation tracking through document analysis. Your task is to identify the evidence regarding the following questions within the context of climate change adaptation.user\n\nBelow are 3 relevant contexts from research papers:\n1) Chap 122009 Mertz, Ole, Halsnaes, Kirsten, Olesen, J√∏rgen, Rasmussen, Kjeld. Adaptation to Climate Change in Developing Countries. Environmental Management. 2009a; 43:743‚Äì52. [PubMed: 19184576] Mertz, Ole, Cheikh, Mbow, Reenberg, Anette, Diouf, Awa. Farmers‚Äô Perceptions of Climate Change and Agricultural Adaptation Strategies in Rural Sahel. Environmental Management. 2009b; 43:804‚Äì16. [PubMed: 18810526] Mitchell, Timopy. Rule of Experts. Berkeley: University of California Press; 2002. Sociol Dev (Oakl). Author manuscript; available in PMC 2017 October 05. --- # TELLER Mohai, Paul, Pellow, David, Timmons Roberts, J
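Why the attention mask for a single prompt should be all ones: if the pad token shares its id with a token that legitimately appears in the prompt (for example when `pad_token` is set to the EOS token and the chat template closes every turn with that token), a mask built with `inputs.ne(tokenizer.pad_token_id)` also hides those real tokens. The ids below are hypothetical.

```python
# Hypothetical ids: 9 stands in for a token used both as EOS/end-of-turn and as padding.
EOS_ID = PAD_ID = 9
prompt_ids = [1, 5, 6, 9, 5, 7, 8, 9, 5]  # two completed turns, each closed by id 9, then a new header

# Mask derived from the pad id: wrongly zeroes the two real end-of-turn tokens.
mask_from_pad = [int(t != PAD_ID) for t in prompt_ids]
# Correct mask for one unpadded sequence: attend to every position.
mask_all_ones = [1] * len(prompt_ids)

print(mask_from_pad)  # [1, 1, 1, 0, 1, 1, 1, 0, 1]
print(mask_all_ones)  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```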

In [None]:
"""from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference


inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)"""

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\nYou are a climate change research assistant with expertise in adaptation tracking through document analysis. Your task is to identify the evidence regarding the following questions within the context of climate change adaptation.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nBelow are 3 relevant contexts from research papers:\n1) Change (Measham et al., 2011; Preston, Westaway, & Yuen, 2011) and the severity of its adverse impacts, adaptation actions to reduce risks of climate change are urgent (Campbell et al., 2016)... (truncated for brevity)\n\nUser question:\n1. Where exactly in terms of geography is this adaptation response observed?\n   If there are more than one location, please provide all that apply.\n   Provide details in the format:\n   Country name: <country name>,\n   Sub-national region: <sub-national region>,\n\n2. Please identify the

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference


inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

Country name: Ethiopia
Stakeholders:  Individuals or households
Depth: Low
Explanation: |||Small changes<|eot_id|>
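`skip_prompt = True` works because `generate` hands the streamer the prompt IDs first and then the new tokens as they are produced. The class below is a simplified, hypothetical stand-in with a toy vocabulary; the real `TextStreamer` decodes with the tokenizer and buffers text until a word boundary before printing, which this sketch skips.

```python
class ToyStreamer:
    """Hypothetical, simplified stand-in for transformers' TextStreamer.

    generate() first calls put() with the prompt IDs, then once per new token,
    and finally end(). With skip_prompt=True the first call is ignored.
    """

    def __init__(self, vocab, skip_prompt=True):
        self.vocab = vocab
        self.skip_prompt = skip_prompt
        self.next_is_prompt = True
        self.pieces = []  # text emitted so far

    def put(self, ids):
        if self.skip_prompt and self.next_is_prompt:
            self.next_is_prompt = False
            return  # drop the prompt
        self.next_is_prompt = False
        text = "".join(self.vocab[i] for i in ids)
        self.pieces.append(text)
        print(text, end="", flush=True)

    def end(self):
        print()  # final newline once generation stops


# Toy vocabulary and token IDs (made up for illustration)
vocab = {0: "Where", 1: "?", 2: "Country", 3: " name", 4: ":", 5: " Ethiopia"}
streamer = ToyStreamer(vocab)
streamer.put([0, 1])              # prompt: skipped
for token_id in [2, 3, 4, 5]:     # simulated generated tokens, one at a time
    streamer.put([token_id])
streamer.end()
```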


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
save_path = "/content/drive/MyDrive/lora_model_adapters_context_5_latest_30"


In [None]:
model.save_pretrained(save_path) # Local saving
tokenizer.save_pretrained(save_path)
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('/content/drive/MyDrive/lora_model_adapters_context_5_latest_30/tokenizer_config.json',
 '/content/drive/MyDrive/lora_model_adapters_context_5_latest_30/special_tokens_map.json',
 '/content/drive/MyDrive/lora_model_adapters_context_5_latest_30/tokenizer.json')
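The paths printed above are the return value of `tokenizer.save_pretrained`, so they only list tokenizer files. A small stdlib check can confirm that the adapter files landed in the same folder too. The helper is hypothetical, and the expected file names (`adapter_config.json` plus `adapter_model.safetensors` or, on older PEFT versions, `adapter_model.bin`) are an assumption about what PEFT writes.

```python
from pathlib import Path

# File names PEFT usually writes for LoRA adapters (assumption: check your PEFT version;
# newer releases write safetensors, older ones wrote adapter_model.bin)
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(save_dir):
    """Hypothetical helper: list the expected adapter files that are missing from save_dir."""
    folder = Path(save_dir)
    missing = []
    if not (folder / ADAPTER_CONFIG).is_file():
        missing.append(ADAPTER_CONFIG)
    if not any((folder / name).is_file() for name in ADAPTER_WEIGHTS):
        missing.append(" or ".join(ADAPTER_WEIGHTS))
    return missing


# In the notebook, an empty list means both the config and the weights were written:
# print(missing_adapter_files(save_path))
```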

To load saved LoRA adapters for inference, set the `if` condition in the loading cell to `True`. The next cell defines a test conversation to run through the reloaded model:

In [None]:
messages = [
    {
        "role": "system",
        "content": "You are a climate change research assistant with expertise in adaptation tracking through document analysis. Your task is to identify the evidence regarding the following questions within the context of climate change adaptation."
    },
    {
        "role": "user",
        "content": """1) 2007. The Survey System: Sample Size Calculator (Retrieved from: Accessed on 3 February, 2013) http://www.surveysystem.com/sscalc.htm. Cooper, P.J.M., Coe, R., 2011. Assessing and addressing climate-induced risk in sub-Saharan rainfed agriculture. Exp. Agric. 47 (02), 179–184. --- # F.W. Muriu-Ng’ang’a et al. / Agricultural Water Management 194 (2017) 150–159 Rockstrom, J., Barron, J., Fox, P., 2002. Rainwater management for increased productivity among small-holder farmers in drought prone environments. Phys. Chem. Earth. 27, 949–959 (s 11–22). Rockstrom, J., 2000. Water resources management in smallholder farms in eastern and southern africa: an overview. Phys. Chem. Earth B 25 (3), 275–283. Rogers, E.M., 2003. Diffusion of Innovations, 5th ed. Free Press, New York (551pp.). Sakaki, M., Koga, K., 2013. An effective approach to sustainable small-scale irrigation developments in Sub-Saharan Africa. Paddy Water Environ. 11, 1–14. Scherr, S., 2000. A downward spiral? Research evidence on the relationship between poverty and natural resource degradation. Food Policy 25, 479–498. Shiferaw, B.A., Okello, A.J., Reddy, R.V., 2009. Adoption and adaptation of natural resource management innovations in smallholder agriculture: reflections on key lessons and best practices. Environ. Dev. Sustain. 11, 601–619. Shikur, A., Beshah, T., 2013. Analysis of influencing factors in adoption of rainwater harvesting technology to combat the ever changing climate variability in Lanfuro Woreda, Southern region, Ethiopia. J. Agric. Res. 2 (1), 15–27. Sidibe, A., 2005. Farm-level adoption of soil and water conservation techniques in Northern Burkina Faso. Agric. Water Manage. 71, 211–224. Tefera, F., 1983. The Effect of Grass Strips in Controlling Soil Erosion and Runoff on Long Steep Slopes. MSc Thesis. Department of Agricultural Engineering, University of Nairobi. Teklehaimanot, A., 2006. 
Social, Economic and Institutional Factors Affecting Utilization of Rainwater Harvesting Technology, Eastern Tigray, Ethiopia. Mekele University, Ethiopia. Tengberg, A., Muriithi, L., Okoba, B., 1999. Land management on semi-Arid Hillsides in eastern Kenya: learning from farmers’
2) # Agricultural Water Management 194 (2017) 150–159 Contents lists available at ScienceDirect Research Paper Socio-economic factors influencing utilisation of rain water harvesting and saving technologies in Tharaka South, Eastern Kenya F.W. Muriu-Ng’ang’a, M. Mucheru-Muna, F. Waswa, F.S Mairura a Department of Forestry and Land Resources Management, South Eastern Kenya University, P.O.Box 70-9010, Kitui, Kenya b Department of Environmental Science, Kenyatta University, P.O. Box 43844-00100, Nairobi, Kenya c Department of Agricultural Resource Management, Kenyatta University, P.O. Box 43844-00100, Nairobi, Kenya Article history: Received 15 May 2017 Received in revised form 6 September 2017 Accepted 7 September 2017 Available online 18 September 2017 Keywords: Rain water harvesting Dryland agriculture Climate change Resource-intensive technologies The study concluded that farmer age, household size, farm size, farming history, training, and formal education were important factors which influenced utilisation of rain water harvesting and saving technologies in Tharaka sub-county. Specific approaches are needed to scale-up resource-intensive technologies (Fanya juu, Zai pits, and Negarims) compared to less resource-intensive technologies. 1. Introduction 1.1. Background Rain-fed agriculture is estimated to contribute about 60% of the world’s crop production (FAO, 2009). For many countries in Sub-Saharan Africa, agriculture is the most important sector of the economy with significant multiplier effects and about 60% to 70% of the total population from the region living in rural areas and largely depending on small-scale subsistence agriculture for their livelihood (Rockstrom, 2000). Further, rainfall is relatively inadequate and shows high spatial and temporal variability, which is associated with prolonged dry spells during the growing season (Jaetzold et al., 2006). 
Several scientific investigations have shown that rainwater harvesting technologies are important in enhancing productive water use whilst raising crop yield levels (Karpouzoglou and Barron, 2014) in delicate semi-arid agro-ecosystems. In Tharaka South Sub-county, seasonal rainfall varies a lot around the mean, with occasions of subsequent
3) below average rainfall (Ngetich et al., 2014). Persistence of below normal rainfall in Tharaka is a risk to livelihoods which aggravates vulnerability to hunger and famine in this semi-arid area (Icheria, 2015). Various water harvesting and saving technologies have been successfully tested and popularized by governmental and non-governmental agencies (Pachpute et al., 2010). Since the 1950s, several rainwater harvesting and saving technologies have been introduced in Kenya (Tiffen et al., 1994). Despite the biophysical and socio-economic benefits of rainwater harvesting and saving technologies, adoption of soil and water conservation technologies has remained low, especially in Kenya’s semi-arid zones (Kahinda and Taigbenu, 2011). Few of these efforts have succeeded in combining technical efficiency with social acceptability to local farmers (Pachpute et al., 2010). The Kenya’s Vision 2030 and the --- # F.W. Muriu-Ng’ang’a et al. / Agricultural Water Management 194 (2017) 150–159 National Climate Change Action Plan recognizes rainwater harvesting and saving technologies as crucial in achieving the goals of socio-economic development, climate change resilience and sustainability (Republic of Kenya, 2013). However the adoption and utilization of these technologies in Kenya has been shaped by interactions of biophysical and socio-economic factors which have not been sufficiently studied (Furlow et al., 2011). There is lack of scientific knowledge on smallholder farmer’s adoption and utilization behavior of resource-intensive and less resource-intensive rainwater harvesting and saving technologies. Thus the following study aims to establish socio-economic factors influencing their use (or non-use) in Tharaka Sub-County. Six types of in situ rainwater harvesting and saving technologies which were practiced for soil and water conservation in Tharaka South are described. 
The technologies selected included 3 highly resource intensive (in terms of labour and knowledge requirements) [including Fanya juu, Zai pits, and Negarims] and 3 less resource intensive technologies [including Grass strips, stone lines and trash lines]. The

User question:
1. Where exactly in terms of geography is this adaptation response observed?
   If there are more than one location, please provide all that apply.
   Provide details in the format:
   Country name: <country name>,
   Sub-national region: <sub-national region>,

2. Please identify the adaptation response undertaken and, for the adaptation response identified,
   please list the stakeholders involved using the following categories:
   - International or multinational governance institutions
   - Government (national)
   - Government (sub-national)
   - Government (local)
   - Private sector (corporations)
   - Private sector (SME)
   - Civil society (international/multinational/national)
   - Civil society (sub-national or local)
   - Individuals or households
   - Other
   Respond in the format: Stakeholders: <your answer>

3. The depth of the climate adaptation response relates to the degree to which the change is new or transformative.
   Classify the depth as one of the following: Low; Medium; High; Not certain / Insufficient information / Not assessed.
   Respond in the format:
   Depth: <your assessment>,
   Explanation: <your reasoning for this assessment>.
   """
    }
]
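The decoded training prompts earlier in this notebook start the user turn with a header such as `Below are 3 relevant contexts from research papers:`, while the hand-written message above starts directly at `1)`. A hypothetical helper like `build_user_prompt` keeps ad hoc prompts in the same layout. The header and `User question:` wording are copied from that earlier decoded output; the helper names and the example strings are assumptions for illustration.

```python
def build_user_prompt(contexts, question_block):
    """Hypothetical helper: assemble a user message in the layout seen in the training prompts."""
    numbered = "\n".join(f"{i}) {ctx.strip()}" for i, ctx in enumerate(contexts, start=1))
    header = f"Below are {len(contexts)} relevant contexts from research papers:"
    return f"{header}\n{numbered}\n\nUser question:\n{question_block.strip()}"


SYSTEM_PROMPT = (
    "You are a climate change research assistant with expertise in adaptation tracking "
    "through document analysis. Your task is to identify the evidence regarding the "
    "following questions within the context of climate change adaptation."
)


def build_messages(contexts, question_block):
    """Hypothetical helper: return a chat in the role/content format used by apply_chat_template."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": build_user_prompt(contexts, question_block)},
    ]


# Made-up example contexts and question
example = build_messages(["First excerpt.", "Second excerpt."],
                         "1. Where is this adaptation response observed?")
print(example[1]["content"])
```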




In [None]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/content/drive/MyDrive/lora_model_adapters_context_5", # Folder with the saved LoRA adapters
        max_seq_length = 8192,
        dtype = None,
        load_in_4bit = True,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference




==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.3.19 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


In [None]:
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 0.6, min_p = 0.1)

I'll answer the questions based on the provided text.

**1. Where exactly in terms of geography is this adaptation response observed?**

Country name: Kenya,
Sub-national region: Tharaka South Sub-county,

**2. Please identify the adaptation response undertaken and, for the adaptation response identified, please list the stakeholders involved**

Adaptation response: Rainwater harvesting and saving technologies
Stakeholders:
- International or multinational governance institutions: None
- Government (national): Republic of Kenya, Kenya’s Vision 2030 and National Climate Change Action Plan
- Government (sub-national): Government of Tharaka South Sub-count


You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can get your personal access tokens at https://huggingface.co/settings/tokens.
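`merged_16bit` folds the LoRA update into the base weights before saving. Assuming standard LoRA scaling (not rsLoRA), a linear layer with base weight W (d_out × d_in) and adapter matrices A (r × d_in) and B (d_out × r) becomes W' = W + (lora_alpha / r) · B · A. The toy pure-Python sketch below shows that arithmetic on made-up numbers; judging by the merge log, Unsloth also dequantizes the 4-bit base weights first, which this sketch skips.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A), the standard (non-rsLoRA) LoRA merge."""
    scale = lora_alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


# Toy 2x3 base weight with a rank-1 adapter (numbers are made up)
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x d_in  = 1 x 3
B = [[0.5], [-1.0]]     # d_out x r = 2 x 1
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)
```

After merging, the model no longer needs the adapter matrices at inference time, which is why the merged folder can be loaded by tools such as vLLM as an ordinary checkpoint.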

In [None]:
path = "/content/drive/MyDrive/lora_model_merged_16_bit"

In [None]:
# Merge to 16bit
if True: model.save_pretrained_merged(path, tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged(save_path, tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged(save_path, tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

Unsloth: Will remove a cached repo with size 1.5K


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 53.49 out of 83.48 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 112.92it/s]


Unsloth: Saving tokenizer... Done.
Done.


### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we now support it natively! We clone `llama.cpp` and save to `q8_0` by default, and we allow all other methods such as `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
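To get a feel for the size trade-off between these methods, file size scales roughly with bits per weight. This is a rough sketch: the bits-per-weight values are approximate nominal figures for llama.cpp's block formats (Q8_0 stores 32 int8 weights plus one fp16 scale per block, i.e. 8.5 bits), it ignores metadata and tensors that stay unquantized, and `q4_k_m` / `q5_k_m` keep some tensors at Q6_K, so real files come out somewhat larger than the `q4_k` / `q5_k` rows. The parameter count is an assumption based on the roughly 3.2B `all params` figure printed by `print_trainable_parameters` later in this notebook.

```python
# Approximate bits per weight for some llama.cpp formats (assumption: nominal values;
# Q8_0 = 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights = 8.5 bits)
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q6_k": 6.5625,
    "q5_k": 5.5,
    "q4_k": 4.5,
}


def estimate_size_gb(n_params, method):
    """Rough file size in GB (1e9 bytes), ignoring metadata and unquantized tensors."""
    return n_params * BITS_PER_WEIGHT[method] / 8 / 1e9


n_params = 3.2e9  # assumption: roughly the "all params" count reported for this model
for method in BITS_PER_WEIGHT:
    print(f"{method:>5}: ~{estimate_size_gb(n_params, method):.1f} GB")
```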

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or the `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI-based system like `GPT4All`. You can install GPT4All [here](https://gpt4all.io/index.html).
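The exact GGUF file names and location can differ between Unsloth versions, so instead of hard-coding them you can search for `.gguf` files. `find_gguf_files` is a hypothetical helper, and the `"model"` folder mirrors the name passed to `save_pretrained_gguf` above; pass a different folder if your files ended up elsewhere.

```python
from pathlib import Path


def find_gguf_files(folder="model"):
    """Hypothetical helper: list (name, size in GB) for .gguf files under folder, largest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    sized = [(p, p.stat().st_size) for p in root.rglob("*.gguf")]
    sized.sort(key=lambda item: item[1], reverse=True)
    return [(p.name, round(size / 1e9, 2)) for p, size in sized]


for name, size_gb in find_gguf_files("model"):
    print(f"{name}: {size_gb} GB")
```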


And we're done! If you have any questions about Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! If you find any bugs, want to keep up with the latest LLM news, need help, or want to join projects, feel free to join our Discord!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 Hugging Face, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
10. [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
11. [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
12. [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>

### Testing and Evaluation

In [None]:
import pandas as pd
from transformers import TextStreamer

# Load test dataset
test_df = pd.read_parquet('/content/test_data_converted_for_llama_3.2.parquet_context_1')

# List to store model outputs for each test record
results = []

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Loop through each record
for idx, row in test_df.iterrows():
    messages = row["messages"]  # Each record is a list of dicts in the required format

    # Prepare inputs using tokenizer apply_chat_template function
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,  # Must add for generation
        return_tensors="pt",
    ).to("cuda")

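    # Generate output using the fine-tuned model.
    # A single unpadded prompt gets an all-ones attention mask; passing it explicitly
    # avoids the "attention mask is not set" warning when the pad and EOS tokens coincide.
    generated_ids = model.generate(
        input_ids=inputs,
        attention_mask=inputs.new_ones(inputs.shape),
        max_new_tokens=128,
        use_cache=True,
        temperature=0.5,
        min_p=0.1
    )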

    # Decode only the newly generated tokens (excluding input prompt)
    input_length = inputs.shape[1]  # Length of the input tokens
    generated_text = tokenizer.decode(generated_ids[0][input_length:], skip_special_tokens=True)

    results.append(generated_text.strip())  # Strip extra spaces
    print(f"Processed record {idx+1}")
    print(generated_text.strip())

# Save the results to a parquet file
results_df = pd.DataFrame({"result": results})
results_df.to_parquet("/content/test_results.parquet", index=False)

print(f"Saved generated test results to /content/test_results.parquet with {len(results_df)} records.")


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Processed record 1
Country name: Kenya
Stakeholders:  Individuals or households||| National government
Depth: Low
Explanation: |||moderate|||
Processed record 2
Country name: Zambia
Stakeholders:  Norad||| Norwegian Agency of Development Cooperation
Depth: Low
Explanation: The depth is limited and relates to changes in agricultural production practices|||In-depth change to farmers from traditional practices to those prescribed by the program.
Processed record 3
Country name: Niger Delta, Nigeria|||Nigeria
Stakeholders:  Individuals or households||| Civil Society- sub-national or local||| Private sector SMEs
Depth: Medium
Explanation: in-depth|||The response has been widely used in brackish water aquaculture systems.|||medium|||Depth is not discussed but assumed to be moderate|||Depth of change is not reported, but assumed to be significant as this is a practice that has been expanded over the years.|||Not assessed|||Not reported but assumed to be significant as this practice has been e

Unsloth: Input IDs of length 17537 > the model's max sequence length of 8192.
We shall truncate it ourselves. It's imperative if you correct this issue first.


Processed record 97
Country name: Uganda
Stakeholders:  Individuals or households||| Civil Society- sub-national or local
Depth: Medium
Explanation: |||In depth
Processed record 98
[garbled output: a long run of Unicode replacement characters and full-width punctuation]
Processed record 99
Country name: Ethiopia
Stakeholders:  Individuals or households||| National government
Depth: Medium
Explanation: in-depth|||medium depth - row planting is new approach to teff farming but still teff is the main crop
Processed record 100
Country name: India
Stakeholders:  Individuals or households
Depth: Low
Explanation: Limited depth - this is essentially an expansion of existing practices for wheat farmers.|||limited
Processed record 101
Country name: Vietnam
Stakeholders:  Individuals or ho
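The run above hit an input of 17,537 tokens, more than twice the 8,192-token `max_seq_length`, and at least one record (98) produced garbled output. A simple guard is to measure each prompt (for example `inputs.shape[1]` after `apply_chat_template`) before generating and flag records that leave no room for `max_new_tokens`. The helper and the token counts below are hypothetical, and reserving room for `max_new_tokens` inside `max_seq_length` is a conservative assumption.

```python
def check_lengths(token_counts, max_seq_length=8192, max_new_tokens=128):
    """Hypothetical helper: split record indices into those that fit and those that are too long.

    A prompt fits only if it leaves room for max_new_tokens inside max_seq_length.
    """
    budget = max_seq_length - max_new_tokens
    fits, too_long = [], []
    for idx, n_tokens in enumerate(token_counts):
        (fits if n_tokens <= budget else too_long).append(idx)
    return fits, too_long


# Made-up prompt lengths; 17537 matches the length reported in the warning above
counts = [3120, 17537, 8064, 8100]
fits, too_long = check_lengths(counts)
print("fits:", fits)
print("too long:", too_long)
```

Records flagged as too long can then be shortened (for example by passing fewer contexts) or recorded as skipped, so their rows in the results file stay aligned with the test set.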

In [None]:
df = pd.read_parquet('/content/test_results.parquet')
print("Columns in the DataFrame:", df.columns.tolist())

Columns in the DataFrame: ['result']


In [None]:
import pandas as pd


df = pd.read_parquet('/content/test_results.parquet')
for idx, row in df.iterrows():
    print(f"Record {idx+1}:\n")
    print(row['result'])
    print("\n" + "="*80 + "\n")

Record 1:

Country name: China
Stakeholders:  Individuals or households
Depth: Medium
Explanation: Medium - needs more than basic and routine responses, but not necessarily large-scale changes|||In depth change - moving farming locations selectively


Record 2:

Country name: Pakistan
Stakeholders:  Individuals or households
Depth: Medium
Explanation: In depth as the authors mention that the respondents were similar in terms of demographic and socioeconomic characteristics. Also, they mention that the outcomes were similar too.|||limited depth


Record 3:

Country name: Tanzania
Stakeholders:  Individuals or households||| Local government
Depth: Low
Explanation: limited depth|||Limited depth - no real change in underlying values.


Record 4:

Country name: Bangladesh
Stakeholders:  Individuals or households||| National government||| Civil Society- sub-national or local||| Local government
Depth: Low
Explanation: shallow|||limited|||In-depth|||In-depth


Record 5:

Country name: Belgium

In [None]:
import pandas as pd
import re
from transformers import TextStreamer  # if you're using a streamer for debugging
# (Make sure other necessary modules, e.g., torch, are already imported)

# Load test dataset
test_df = pd.read_parquet('/content/test_data_converted_for_llama_3.2.parquet')

# Enable native 2x faster inference (call this once, outside the temperature loop)
FastLanguageModel.for_inference(model)

# List of temperature values to experiment with
temperature_values = [0.1, 0.3, 0.5, 0.7, 0.9, 1.3, 1.5]

# Loop through each temperature value
for temp in temperature_values:
    results = []  # List to store outputs for this temperature

    print(f"\nProcessing for temperature: {temp}\n{'-'*40}")
    # Loop through each record in the test dataframe
    for idx, row in test_df.iterrows():
        messages = row["messages"]  # Each record is a list of dicts in the required format

        # Prepare inputs using tokenizer apply_chat_template function
        inputs = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,  # Must add for generation
            return_tensors="pt",
        ).to("cuda")

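        # Generate output using the fine-tuned model with the current temperature value.
        # A single unpadded prompt gets an all-ones attention mask; passing it explicitly
        # avoids the "attention mask is not set" warning when the pad and EOS tokens coincide.
        generated_ids = model.generate(
            input_ids=inputs,
            attention_mask=inputs.new_ones(inputs.shape),
            max_new_tokens=128,
            use_cache=True,
            temperature=temp,
            min_p=0.1
        )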

        # Decode only the newly generated tokens (excluding input prompt)
        input_length = inputs.shape[1]  # Length of the input tokens
        generated_text = tokenizer.decode(generated_ids[0][input_length:], skip_special_tokens=True)

        results.append(generated_text.strip())  # Strip extra spaces

        print(f"Processed record {idx+1} at temperature {temp}")
        print(generated_text.strip())

    # Save the results for the current temperature to a parquet file
    results_df = pd.DataFrame({"result": results})
    output_file = f"/content/drive/MyDrive/test_results_temperature_{temp}.parquet"
    results_df.to_parquet(output_file, index=False)

    print(f"\nSaved generated test results to {output_file} with {len(results_df)} records.")



Processing for temperature: 0.1
----------------------------------------
Processed record 1 at temperature 0.1
Country name: Kenya
Stakeholders:  Individuals or households||| National government
Depth: Low
Explanation: limited|||Limited
Processed record 2 at temperature 0.1
Country name: Zambia
Stakeholders:  Individuals or households||| Civil Society- international/multinational/national||| National government
Depth: Low
Explanation: limited depth|||Limited depth - the project builds on existing practices and changes only some of their practices.
Processed record 3 at temperature 0.1
Country name: Niger Delta, Nigeria|||Nigeria
Stakeholders:  Individuals or households||| Civil Society- sub-national or local
Depth: Not certain / Insufficient information / Not assessed
Explanation: limited|||Insufficient information available on this topic|||In-depth
Processed record 4 at temperature 0.1
Country name: Sierra Leone|||Sierra Leonean communities
Stakeholders:  Individuals or households|||

Unsloth: Input IDs of length 17556 > the model's max sequence length of 8192.
We shall truncate it ourselves. It's imperative if you correct this issue first.


Processed record 97 at temperature 0.1
Country name: Uganda
Stakeholders:  Individuals or households||| Local government
Depth: Low
Explanation: Limited|||Small
Processed record 98 at temperature 0.1
[garbled output: a long run of Unicode replacement characters and full-width punctuation]
Processed record 99 at temperature 0.1
Country name: Ethiopia
Stakeholders:  Individuals or households||| Local government||| National government
Depth: Medium
Explanation: Medium|||Then compared the control group and the experimental group to see if there was any change in the outcome measures (teff yield and teff crop income)  Since the study shows significant increase in teff crop income, it can be concluded that row planting of teff was successful in increasing teff crop income for farmers.
Processed record 100 at temperature 0.1
Coun
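With one results file per temperature, a quick stability check is to see how often the `Depth` label for each record agrees across temperatures. A sketch, assuming you have already extracted one `Depth` label per record from each `test_results_temperature_*.parquet` file, in the same record order; the helper name and example labels are hypothetical, and majority agreement is just one possible consistency measure.

```python
from collections import Counter


def depth_agreement(depth_by_temp):
    """Hypothetical helper: per record, the majority Depth label across temperatures
    and the fraction of temperatures that agree with it.

    depth_by_temp maps temperature -> list of Depth labels (one per record, same order).
    """
    temps = sorted(depth_by_temp)
    n_records = len(depth_by_temp[temps[0]])
    summary = []
    for i in range(n_records):
        labels = [depth_by_temp[t][i] for t in temps]
        label, count = Counter(labels).most_common(1)[0]
        summary.append((label, count / len(labels)))
    return summary


# Made-up labels for three records at three temperatures
example = {
    0.1: ["Low", "Low", "Medium"],
    0.5: ["Low", "Medium", "Medium"],
    1.5: ["Medium", "Low", "Medium"],
}
for idx, (label, share) in enumerate(depth_agreement(example), start=1):
    print(f"record {idx}: {label} ({share:.0%} agreement)")
```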

In [None]:
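from peft import PeftModel
from transformers import AutoTokenizer

adapter_dir = "/content/drive/MyDrive/lora_model_adapters_context_5"

# Load the saved LoRA adapter weights on top of `model`.
# Assumes `model` is the base model; if it was already loaded from this adapter folder
# (as in the FastLanguageModel cell above), the adapters are already attached.
# PeftModel.from_pretrained defaults to is_trainable=False, i.e. frozen adapters for inference.
model = PeftModel.from_pretrained(model, adapter_dir)

# Optional: reload the tokenizer saved alongside the adapters
tokenizer = AutoTokenizer.from_pretrained(adapter_dir, use_fast=True)

# Expect 0 trainable parameters here, since the adapters were loaded with is_trainable=False
model.print_trainable_parameters()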




trainable params: 0 || all params: 3,237,063,680 || trainable%: 0.0000
