In [1]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git


In [2]:
!nvidia-smi
import torch
print(torch.__version__)


Mon Dec  2 21:25:20 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   57C    P8             30W /  350W |    1443MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [3]:
!pip uninstall -y deepspeed
!pip install ipywidgets



In [4]:
from unsloth import FastLanguageModel

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [5]:
!wandb login

wandb: Currently logged in as: joakimeriksson (joakimeriksson-rise-research-institutes-of-sweden). Use `wandb login --relogin` to force relogin


## Imports and W&B run setup



In [6]:
import json
import torch
from datasets import Dataset
from transformers import AutoTokenizer
import os
from pathlib import Path
import wandb

# Initialize W&B
wandb.init(
    project="contiki-llama-finetuning",
    name="llama-3.2-contiki-run",
    config={
        "model": "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
        "learning_rate": 1e-4,
        "batch_size": 4,
        "gradient_accumulation_steps": 4,
        "max_steps": 500
    }
)

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: joakimeriksson (joakimeriksson-rise-research-institutes-of-sweden). Use `wandb login --relogin` to force relogin


In [8]:
# Load the dataset
def load_jsonl_dataset(file_path):
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            item = json.loads(line)
            data.append({
                'text': f"### Instruction: {item['instruction']}\n\n### Response: {item['response']}"
            })
    return data

# Load all datasets from the dataset directory
dataset_dir = Path('dataset')
all_data = []
for file in dataset_dir.glob('*.jsonl'):
    all_data.extend(load_jsonl_dataset(file))

print(f"Loaded {len(all_data)} examples from datasets")
wandb.log({"dataset_size": len(all_data)})

# Convert to HuggingFace dataset
dataset = Dataset.from_list(all_data)

Loaded 3725 examples from datasets
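
The loader above assumes every line is valid JSON with both an `instruction` and a `response` key. Below is a minimal sketch of a more defensive variant. The helper name `load_jsonl_checked` and its skip-and-count behaviour are assumptions made for illustration and are not part of the original notebook.

```python
import json

PROMPT_TEMPLATE = "### Instruction: {instruction}\n\n### Response: {response}"

def load_jsonl_checked(file_path):
    """Return (records, skipped) for one JSONL file.

    Hypothetical helper: malformed JSON and records missing either key are
    counted in `skipped` instead of raising. Blank lines are ignored.
    """
    records, skipped = [], 0
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # blank line, not counted as an error
            try:
                item = json.loads(line)
                text = PROMPT_TEMPLATE.format(
                    instruction=item["instruction"],
                    response=item["response"],
                )
            except (json.JSONDecodeError, KeyError, TypeError):
                skipped += 1
                continue
            records.append({"text": text})
    return records, skipped
```

Logging the `skipped` count next to `dataset_size` would make silent data loss visible in the W&B run.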


In [24]:
# Initialize model and tokenizer
model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"

# Alternative models if needed
backup_models = [
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
]

def try_load_model(model_names):
    for name in model_names:
        try:
            print(f"Attempting to load {name}...")
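            # Load model and tokenizer together with Unsloth optimizations.
            # FastLanguageModel returns its own tokenizer, so a separate
            # AutoTokenizer.from_pretrained call would simply be discarded.
            model, tokenizer = FastLanguageModel.from_pretrained(
                model_name=name,
                max_seq_length=2048,
                dtype=None,  # None lets Unsloth auto-detect the dtype
                load_in_4bit=True,
            )
            # Fall back to EOS for padding only if no pad token is defined.
            if tokenizer.pad_token is None:
                tokenizer.pad_token = tokenizer.eos_token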
            print(f"Successfully loaded model: {name}")
            wandb.log({"model_loaded": name})
            return model, tokenizer
        except Exception as e:
            print(f"Failed to load {name}: {str(e)}")
            wandb.log({"model_load_error": {"model": name, "error": str(e)}})
    raise Exception("Failed to load any model")

# Try to load models in order of preference
model, tokenizer = try_load_model([model_name] + backup_models)

Attempting to load unsloth/Llama-3.2-3B-Instruct-bnb-4bit...


loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unsloth--Llama-3.2-3B-Instruct-bnb-4bit/snapshots/7048abecd492a1f5d53981cb175431ec01bbced0/config.json
Model config LlamaConfig {
  "_name_or_path": "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4b

==((====))==  Unsloth 2024.11.10: Fast Llama patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 24.0 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unslothai--other/snapshots/43d9e0f2f19a5d7836895f648dc0e762816acf77/config.json
loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unslothai--repeat/snapshots/7c48478c02f84ed89f149b0815cc0216ee831fb0/config.json
loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unslothai--vram-24/snapshots/61324ceeacd75b2b31f7a789a9c9d82058e6118c/config.json
loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unslothai--1/snapshots/7ec782b7604cd9ea0781c23a4270f031650f5617/config.json
loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unsloth--Llama-3.2-3B-Instruct-bnb-4bit/snapshots/7048abecd492a1f5d53981cb175431ec01bbced0/config.json
Model config LlamaConfig {
  "_name_or_path": "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
  "ar

Successfully loaded model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit




In [25]:
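# Prepare the dataset
def preprocess_function(examples):
    # Tokenize a batch of formatted prompts. Every example is truncated or
    # padded to exactly 2048 tokens. return_tensors is omitted because
    # Dataset.map stores the result as plain lists anyway.
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=2048,
        padding="max_length",
    )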

# Tokenize the dataset
tokenized_dataset = dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=dataset.column_names
)

wandb.log({"tokenized_examples": len(tokenized_dataset)})

Map:   0%|          | 0/3725 [00:00<?, ? examples/s]
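
Padding every example to 2048 tokens means short examples consist mostly of pad tokens. A rough way to gauge the cost is to look at the unpadded token counts first. The sketch below works on a plain list of lengths (for example `len(tokenizer(text)["input_ids"])` per example). The function name `padding_report` and the sample lengths are made up for illustration.

```python
def padding_report(lengths, max_length=2048):
    """Summarise how much of a fixed-length dataset would be padding.

    `lengths` are unpadded token counts per example; values above
    `max_length` are treated as truncated to `max_length`.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    clipped = [min(n, max_length) for n in lengths]
    real_tokens = sum(clipped)
    total_slots = max_length * len(clipped)
    return {
        "longest": max(lengths),
        "truncated": sum(1 for n in lengths if n > max_length),
        "pad_fraction": 1 - real_tokens / total_slots,
    }

# Made-up token counts, for illustration only.
report = padding_report([120, 512, 2048, 3000], max_length=2048)
print(report)
```

A high `pad_fraction` would suggest dynamic padding or a smaller `max_length`; a non-zero `truncated` count means some responses are being cut off.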

In [29]:
# Training configuration with W&B logging
from trl import SFTTrainer
from transformers import AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
from unsloth import is_bfloat16_supported


training_args = dict(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=50,
    max_steps=500,
    learning_rate=1e-4,
    fp16=True,
    logging_steps=10,
    output_dir="contiki_llama32_model",
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=1.0,
    report_to="wandb",  # Enable W&B logging
    run_name="llama-3.2-contiki-run"
)
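# Define PEFT configuration
# NOTE: peft_config is never passed to the trainer; the adapters that are
# trained come from FastLanguageModel.get_peft_model below.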
peft_config = LoraConfig(
    r=8,  # Rank of the LoRA update matrices
    lora_alpha=32,  # Scaling factor
    lora_dropout=0.05,  # Dropout probability for LoRA layers
    bias="none",  # Bias type for LoRA layers
    task_type="CAUSAL_LM"  # Task type for LoRA layers
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

# First trainer, built from the training_args dict above.
# NOTE: it is overwritten by the second SFTTrainer below, which is the one that actually trains.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=TrainingArguments(**training_args), # Convert dict to TrainingArguments
    train_dataset=tokenized_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
#    callbacks=[WandbCallback()]
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = tokenized_dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 50,
        max_steps=500,
        num_train_epochs = 1, # Ignored while max_steps is set; remove max_steps to train for exactly one epoch.
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 5,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb", # Use this for WandB etc
        run_name="llama-3.2-contiki-run",
        log_level="info",
        logging_strategy="steps",
        metric_for_best_model="loss",
    ),
)



Unsloth: Already have LoRA adapters! We shall skip this step.
PyTorch: setting up devices
PyTorch: setting up devices
max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend
PyTorch: setting up devices
PyTorch: setting up devices
max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend
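
The second trainer combines `warmup_steps = 50`, `lr_scheduler_type = "cosine"`, `max_steps = 500` and `learning_rate = 2e-4`. The sketch below shows the shape this schedule is expected to have, assuming the usual linear-warmup, half-cosine-decay formula. It is an illustration only; `lr_at_step` is a hypothetical helper, and the scheduler inside the library remains the authority on the exact values.

```python
import math

def lr_at_step(step, base_lr=2e-4, warmup_steps=50, total_steps=500):
    """Linear warmup to base_lr, then cosine decay to zero (assumed formula)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Print a few points along the schedule.
for step in (0, 25, 50, 275, 500):
    print(step, f"{lr_at_step(step):.2e}")
```

Under these assumptions the rate peaks at step 50, is back to half the peak at step 275, and reaches zero at step 500, which agrees with the final `train/learning_rate` of 0 in the run summary.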


In [30]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA GeForce RTX 3090. Max memory = 24.0 GB.
6.307 GB of memory reserved.


In [31]:

# Start training
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 3,725 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 500
 "-____-"     Number of trainable parameters = 24,313,856
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
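
The trainable-parameter count in the banner can be checked by hand. A LoRA adapter of rank `r` on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below applies that to the seven target modules, using the shapes from the `LlamaConfig` printed earlier and `r = 16` from `get_peft_model`. The variable names are illustrative.

```python
# Shapes taken from the LlamaConfig printed earlier in this notebook.
hidden_size = 3072
intermediate_size = 8192
num_layers = 28
head_dim = 128
num_attention_heads = 24
num_key_value_heads = 8
r = 16  # LoRA rank passed to FastLanguageModel.get_peft_model

q_out = num_attention_heads * head_dim   # 3072
kv_out = num_key_value_heads * head_dim  # 1024 (grouped-query attention)

# (in_features, out_features) of each adapted linear layer
projections = {
    "q_proj": (hidden_size, q_out),
    "k_proj": (hidden_size, kv_out),
    "v_proj": (hidden_size, kv_out),
    "o_proj": (q_out, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

def lora_params(in_features, out_features, rank):
    # Matrix A is rank x in_features, matrix B is out_features x rank.
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, r) for i, o in projections.values())
total = per_layer * num_layers
print(f"{per_layer:,} per layer, {total:,} in total")
# 868,352 per layer, 24,313,856 in total
```

The total matches the 24,313,856 reported above, which confirms that the `r = 16` adapters are the ones being trained rather than the unused `r=8` `LoraConfig`.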


Step,Training Loss
5,2.7298
10,2.1017
15,1.5434
20,1.4797
25,1.171
30,1.0477
35,0.9352
40,0.7864
45,1.1342
50,0.6728


Saving model checkpoint to outputs/checkpoint-500
loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--unsloth--Llama-3.2-3B-Instruct-bnb-4bit/snapshots/7048abecd492a1f5d53981cb175431ec01bbced0/config.json
Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
 

In [32]:
# Save the model and log to W&B
trainer.save_model("contiki_llama32_model_final")
wandb.log_artifact("contiki_llama32_model_final", type="model")

# Close W&B run
wandb.finish()


Saving model checkpoint to contiki_llama32_model_final


Run history:
dataset_size,▁▁
tokenized_examples,▁▁
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▄▄▄▄▄▅▅▆▆▆▇▇▇██
train/global_step,▁▁▁▁▁▁▁▁▁▁▁▂▂▁▁▁▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▇▇▇▇▇██
train/grad_norm,▁▁▁▁▁▁▁▁▁▁▁▁█▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/learning_rate,█████████▇▇▇▂▃▅▇▇█████▇▇▇▅▅▄▄▄▄▃▃▃▂▂▂▁▁▁
train/loss,▄█▄▅▅▅▁▃▁▃▁▂▂▆▃▂▃▃▃▂▃▃▃▂▂▂▂▂▄▂▃▂▂▂▂▂▂▂▂▂

Run summary:
dataset_size,3725
model_loaded,unsloth/Llama-3.2-3B...
tokenized_examples,3725
total_flos,2.7927467970369946e+17
train/epoch,2.14592
train/global_step,500
train/grad_norm,0.40911
train/learning_rate,0
train/loss,0.5906
train_loss,0.74516
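
The final `train/epoch` of 2.14592 and the "Num Epochs = 3" line in the training banner both follow from the batch settings. The sketch below reproduces them, assuming the trainer counts optimizer updates per epoch as the number of batches divided by the accumulation steps, rounded down. That assumption is consistent with the numbers logged in this run.

```python
import math

num_examples = 3725
per_device_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 500

# Batches per pass over the data (the last batch is partial).
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
# Optimizer updates per pass (assumed: integer division, at least 1).
updates_per_epoch = max(1, batches_per_epoch // gradient_accumulation_steps)

epochs_completed = max_steps / updates_per_epoch
epochs_started = math.ceil(epochs_completed)

print(batches_per_epoch, updates_per_epoch)   # 932 233
print(round(epochs_completed, 5), epochs_started)  # 2.14592 3
```

So 500 steps at an effective batch size of 16 cover the 3,725 examples a little more than twice, and `num_train_epochs = 1` has no influence on the run.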


In [36]:
inference_model = FastLanguageModel.for_inference(model)

In [37]:
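# Test the model
# Generating without the FastLanguageModel.for_inference(model) call in the previous cell fails with:
#   RuntimeError: Unsloth: You must call `FastLanguageModel.for_inference(model)` before doing inference for Unsloth models.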


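def generate_response(prompt, max_length=512, temperature=0.7):
    """Generate a completion for `prompt`; `max_length` caps the number of new tokens."""
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = inference_model.generate(
        **inputs,
        max_new_tokens=max_length,
        do_sample=True,  # temperature, top_p and top_k only take effect when sampling
        temperature=temperature,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )
    # outputs[0] holds the prompt tokens followed by the generated tokens,
    # so the decoded string starts with the prompt itself.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)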

# Test with sample questions and log results to W&B
test_prompts = [
    "### Instruction: Explain what Contiki-NG is and its main features.\n\n### Response:",
    "### Instruction: How does Contiki-NG handle network protocols?\n\n### Response:",
    "### Instruction: What are the key differences between Contiki-NG and the original Contiki?\n\n### Response:"
]

# Initialize a new W&B run for testing
wandb.init(project="contiki-llama-finetuning", name="model-testing")

for prompt in test_prompts:
    response = generate_response(prompt)
    print("\nPrompt:", prompt)
    print("\nResponse:", response)
    print("\n" + "-"*80)

    # Log test results to W&B
    wandb.log({
        "test_examples": wandb.Table(
            columns=["prompt", "response"],
            data=[[prompt, response]]
        )
    })

wandb.finish()


Prompt: ### Instruction: Explain what Contiki-NG is and its main features.

### Response:

Response: ### Instruction: Explain what Contiki-NG is and its main features.

### Response: Contiki-NG
Contiki-NG is an open-source, flexible, and scalable operating system for IoT devices. Its main features are:
* Support for various platforms (e.g., TI CC26xx, Zolertia RPL Lite, Sensortag)
* Wireless communication protocols (CCM13, 6LoWPAN, TSCH)
* Low-power mode with dynamic voltage scaling (DVS) support
* Secure firmware over air updates using CoAP/TLS or REST/HTTPS
* Software development kits (SDKs) in C, C++, and Python
* A large community of developers who maintain and contribute to the project.
Main Features
The core idea behind Contiki-NG is that it should be simple enough to start from scratch, yet powerful enough to support a wide range of applications. The platform provides the following key features:
* Support for wireless communication protocols (CCM13, 6lowpan, TSCH)
* Support for
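
The response printed above repeats the prompt because a causal language model returns the prompt tokens followed by the continuation. A minimal sketch of dropping the echoed part at token level is shown below. `new_tokens_only` is a hypothetical helper and the ids are toy values; with tensors, convert to lists first (for example with `.tolist()`).

```python
def new_tokens_only(output_ids, prompt_ids):
    """Drop the echoed prompt from a generated sequence of token ids.

    Raises ValueError if the output does not start with the prompt, so a
    mismatch is noticed instead of silently cutting the wrong tokens.
    """
    n = len(prompt_ids)
    if list(output_ids[:n]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt")
    return list(output_ids[n:])

# Toy ids for illustration; in the notebook these would come from
# inputs["input_ids"][0].tolist() and outputs[0].tolist().
prompt_ids = [101, 7, 8, 9]
output_ids = [101, 7, 8, 9, 42, 43, 2]
generated = new_tokens_only(output_ids, prompt_ids)
print(generated)  # [42, 43, 2]
```

Decoding only the returned ids would log just the model's answer to W&B instead of the prompt plus the answer.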

In [39]:
prompt = "### Instruction: Who are Contiki-NG contributors?\n\n### Response:"
response = generate_response(prompt)
print("\nPrompt:", prompt)
print("\nResponse:", response)
print("\n" + "-"*80)


Prompt: ### Instruction: Who are Contiki-NG contributors?

### Response:

Response: ### Instruction: Who are Contiki-NG contributors?

### Response: Contiki-NG is a contribution-driven project, meaning that it is built and maintained by a community of developers who work together to achieve common goals. The Contiki-NG team consists of:
* Edvard Pettersen (e.pettersen@ti.com)
* Joakim Eriksson (joakime@sics.se) - Lead developer
* Niclas Finne (nfi@sics.se)
* Nicolas Tsiftes (nicos.tsiftes@zolertia.com)
* Simon Duquennoy (simon.duquennoy@inria.fr)
* Thomas Helwerda (thelwerd@tugraz.at)
* Zorana Popovic (popovic.zora@gmail.com)
* Yago Rejes (yago.rejes@tin.COM)
* Antonio Lignan (antoiniolignan@gmail.com) 
* Pierre Tavaire (pierre.tavaire@inf.ethz.ch) 
* Luca Bellati (lbellati@csl.unibas.ch) 
* Alexander Kretzin (kretzin@inf.ethz.ch)
* Marco Langenstein (m.langenstein@inf.ethz.ch)
* Mario Demirovski (mdemirovski@databeer.de)
* Jean-Paul Centulme (jpcentulme@student.ufrance.fr)

---------